I’m working on a new character encoding converter library for Gecko. The new library is written in Rust and is called encoding_rs. Here are some performance numbers as of the end of November 2016.
The columns are grouped into decode results and encode results. Those groups, in turn, are split by the internal Unicode representation: UTF-16 or UTF-8. (encoding_rs supports both in order to cater both for the legacy UTF-16 needs of Gecko and for Rust code, which normally uses UTF-8.) Within each of these groups, there is a column for each library that encoding_rs is compared with. uconv is Gecko’s current encoding converter library. ICU is ICU 55. kernel32 is kernel32.dll included in Windows 10. stdlib is Rust’s standard library. rust-encoding is the rust-encoding crate. glibc is glibc’s iconv.
Each row names a language and an external encoding to convert from or to. The numbers are factors relative to the library named in the column. 2.0 means that encoding_rs is twice as fast as the reference library. 0.5 means that the reference library is twice as fast as encoding_rs. 0.00 means that encoding_rs is really slow and the non-zero decimals didn’t show up in the second decimal position.
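To make the arithmetic concrete, here is a minimal sketch of how such a factor could be computed and rounded to two decimals. The function name and the timing values are made up for illustration and are not taken from the actual benchmark harness.

```rust
/// Speed factor as described above: time taken by the reference library
/// divided by the time taken by encoding_rs for the same input.
/// (Hypothetical helper; the real harness may compute this differently.)
fn speed_factor(reference_secs: f64, encoding_rs_secs: f64) -> f64 {
    reference_secs / encoding_rs_secs
}

fn main() {
    // The reference library takes twice as long, so encoding_rs is twice as fast.
    println!("{:.2}", speed_factor(2.0, 1.0)); // prints 2.00
    // encoding_rs takes twice as long as the reference library.
    println!("{:.2}", speed_factor(1.0, 2.0)); // prints 0.50
    // encoding_rs is 300 times slower: the non-zero digits fall off the end.
    println!("{:.2}", speed_factor(1.0, 300.0)); // prints 0.00
}
```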
When decoding from UTF-8, the test case is the Wikipedia article for Mars, the planet, for the language in question.
Reasons for choosing Wikipedia were:
The topic Mars, the planet, was chosen, because it is the most-featured topic across the different-language Wikipedias and, indeed, had non-trivial articles in all the languages needed. Trying to choose a typical-length article for each language separately wasn’t feasible in the Wikidata data set.
When decoding from a non-UTF-8 encoding, the test case is synthesized from the UTF-8 test case by converting the Wikipedia article to the encoding in question and replacing unmappable characters with numeric character references (and, in the case of Big5, removing a couple of characters that glibc couldn’t deal with).
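The replacement of unmappable characters with numeric character references can be sketched as follows. This is only an illustration: the target is a hypothetical Latin-1-style encoding in which code points up to U+00FF map to the identical byte, not one of the encodings actually tested, and the function name is made up.

```rust
/// Encodes `text` into a hypothetical Latin-1-style single-byte encoding.
/// Characters above U+00FF are unmappable and are written as decimal
/// numeric character references such as "&#8364;".
fn encode_with_ncr(text: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(text.len());
    for c in text.chars() {
        let code_point = c as u32;
        if code_point <= 0xFF {
            // Mappable: the byte value equals the code point.
            out.push(code_point as u8);
        } else {
            // Unmappable: fall back to a numeric character reference.
            out.extend_from_slice(format!("&#{};", code_point).as_bytes());
        }
    }
    out
}

fn main() {
    // U+00E9 maps to a single byte; U+20AC (8364 in decimal) does not.
    let bytes = encode_with_ncr("é€");
    println!("{:02X?}", bytes);
}
```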
When testing x-user-defined decode, the test case is a JPEG image, because loading binary data over XHR is the main performance-sensitive use case for x-user-defined.
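For context, x-user-defined decode is a simple mapping: per the Encoding Standard, ASCII bytes decode to themselves and bytes 0x80 through 0xFF decode to U+F780 through U+F7FF, so script code can recover the original byte by masking with 0xFF. The sketch below shows that mapping to UTF-16 code units; the function name is made up and this is not the encoding_rs implementation.

```rust
/// Decodes x-user-defined bytes to UTF-16 code units: ASCII is passed
/// through and bytes 0x80..=0xFF map to U+F780..=U+F7FF.
fn decode_x_user_defined(bytes: &[u8]) -> Vec<u16> {
    bytes
        .iter()
        .map(|&b| {
            if b < 0x80 {
                b as u16
            } else {
                0xF780 + (b as u16 - 0x80)
            }
        })
        .collect()
}

fn main() {
    // A JPEG file starts with the bytes FF D8.
    let units = decode_x_user_defined(&[0xFF, 0xD8, 0x4A]);
    println!("{:04X?}", units);
    // Masking each code unit with 0xFF recovers the original bytes.
    let recovered: Vec<u8> = units.iter().map(|&u| (u & 0xFF) as u8).collect();
    println!("{:02X?}", recovered);
}
```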
Decoding JS or CSS wasn’t tested, but it’s safe to assume that the result would be faster than English, since JS and CSS tend to be 100% ASCII but English Wikipedia isn’t quite.
The encoder workloads use plain-text extracts from the decoder test cases in order to simulate form submission (textarea) workloads.
The other Web-relevant case for the encoders is the parsing of URL query strings. In the absence of errors, the query strings are ASCII, so it’s safe to assume that the result would be faster than English as above.
ICU and glibc are tested in the form shipped by Ubuntu 16.04. While both Chrome and Safari use ICU, it is possible that the compiler and compiler options of ICU as shipped by Ubuntu cause the performance of ICU as tested here to differ from actual performance as used in competing browsers. Still, hopefully ICU as shipped by Ubuntu gives a ballpark understanding of performance relative to Chrome and Safari. The entry point to kernel32.dll appears to be a non-streaming API. Logically, Edge and IE need streaming encoding converters and it’s not clear if the underlying converters used by Edge and IE are the same as the converters exposed by kernel32.dll, but hopefully the comparison with kernel32.dll gives a ballpark understanding of performance relative to Edge and IE. Comparing with rust-encoding is relevant to understanding performance relative to Servo and the Rust ecosystem in general. glibc is included mainly as a matter of curiosity but could be considered relevant in terms of writing Linux apps in C using the Linux C ecosystem libraries vs. writing Linux apps in Rust using encoding_rs. The Rust standard library isn’t interesting for now, since encoding_rs delegates UTF-8 validation to the Rust standard library at the moment. Any deviation from a factor of 1.00 is more likely to be jitter in the tests than anything substantive.
Some of the comparisons could be considered to compare things that aren’t commensurable. In particular:
Except for kernel32, the measurements exclude the initialization and destruction of the converter. This is to the advantage of uconv, ICU and glibc, which perform more work during converter initialization than encoding_rs does. kernel32 does not expose converter initialization as a distinct operation, and it’s not clear if there is an initialization cost the first time a given converter is used or every time.
When converting to and from UTF-8, in the comparison with rust-encoding, rust-encoding targets String and Vec<u8> while encoding_rs uses Cows. In this case, instead of trying to make the comparison fair by making encoding_rs make a useless copy, the comparison demonstrates the benefits of conditionally copy-free Rust API design.
Since the reference libraries do not fully conform to the Encoding Standard, the work being performed isn’t exactly the same. Instead, the closest approximation of a given legacy encoding is used.
Arguably, UTF-8 isn’t the native application-side Unicode representation of glibc. However, since e.g. glib (the infrastructure library used by GTK+) uses UTF-8 as its native application-side Unicode representation and wraps glibc for the conversions, testing glibc’s performance to and from UTF-8 is relevant even if arguably unfair.
When encoding from UTF-8, encoding_rs and rust-encoding assume the input is valid, but glibc does not.
The windows-1258 numbers might be bogus due to mismatches between NFC and windows-1258 using combining characters for some diacritics.
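The point above about Cow-based, conditionally copy-free API design can be illustrated with the standard library alone. The sketch below is not the encoding_rs API. It uses String::from_utf8_lossy, which returns a Cow that borrows valid input and allocates only when malformed sequences have to be replaced. The two wrapper function names are made up.

```rust
use std::borrow::Cow;

/// Conditionally copy-free: borrows the input when it is already valid
/// UTF-8 and allocates only when a malformed sequence must be replaced
/// with U+FFFD.
fn decode_utf8_cow(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

/// Always-copying variant, analogous to an API that targets `String`.
fn decode_utf8_copy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let valid: &[u8] = b"Mars";
    let invalid: &[u8] = b"Ma\xFFrs";
    // Valid input is borrowed: no allocation and no copy.
    println!("{}", matches!(decode_utf8_cow(valid), Cow::Borrowed(_)));
    // Invalid input forces an owned, repaired copy.
    println!("{}", matches!(decode_utf8_cow(invalid), Cow::Owned(_)));
    // The String-returning variant copies even when the input is valid.
    println!("{}", decode_utf8_copy(valid));
}
```

With valid input, which is the common case, the Cow variant does only validation work, whereas the String variant also pays for an allocation and a copy.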
For decode performance with SSE2 enabled, encoding_rs outperforms uconv already, except for UTF-16LE and UTF-16BE.
UTF-16LE and UTF-16BE are rare on the Web and, therefore, good performance for them isn’t a goal for encoding_rs. The implementation is totally naïve, directly from the spec. There is a lot of room for optimization, but it makes sense to address other concerns first.
Decoding UTF-8-encoded English to UTF-16 is slower in encoding_rs without SSE2 than in uconv (which uses SSE2 for UTF-8 to UTF-16 conversion). Since this is the most performance-sensitive conversion for Firefox startup and for Web performance, without a way to use explicit SIMD code in Rust in Firefox, it seems that landing encoding_rs is blocked. (Solutions could involve explicit SIMD being allowed on release-channel Rust or Firefox allowing the use of nightly-channel Rust.)
Conversion from UTF-8 needs more work. The current algorithm didn’t outperform the Rust standard library, so encoding_rs currently delegates to the standard library for UTF-8 to UTF-8 decode. That, in turn, is why explicit SIMD in encoding_rs makes no difference in the perf numbers for UTF-8 decode.
Conversion from UTF-16 to UTF-8 fails to outperform uconv for many languages. This clearly needs more work even though I thought the code was pretty good.
Legacy encode is expected to be slow in encoding_rs, because unlike the other libraries, encoding_rs uses decode-optimized data structures even on the encoder side in the interest of binary size and initialization speed. The rationale is that the need for non-ASCII legacy encode occurs in only two places in the Web platform: form submission (between user interaction and waiting for the network) and URL parsing, but there only for error correction. (It’s unclear if letting the legacy encoders be slow is the right call. Chinese and Korean encoders are currently extremely slow.)
Even though encoding_rs doesn’t have encode-specific data structures and uses the decode-optimized data structures for encode, it still outperforms uconv for single-byte encode except for Thai. (And Russian, but Russian is almost at the same speed.) The reason why Thai is slower is that the search order is the same for all single-byte encodings and the structure of windows-874 differs from the windows-125* series.
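To illustrate what encoding with decode-optimized data structures means, here is a minimal sketch in which the only data is a decode table for the upper half of a single-byte encoding, and the encoder finds a byte by searching that table linearly. The table contents are made up rather than taken from a real encoding, all names are hypothetical, and the actual search order in encoding_rs is not shown here.

```rust
/// Builds a hypothetical decode table for the upper half of a single-byte
/// encoding: byte 0x80 + i decodes to table[i]. The mapping is made up.
fn hypothetical_decode_table() -> [u16; 128] {
    let mut table = [0u16; 128];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = 0x0400 + i as u16;
    }
    table
}

/// Decode is a single indexed load, which is what the table is optimized for.
fn decode_byte(table: &[u16; 128], byte: u8) -> u16 {
    if byte < 0x80 {
        byte as u16
    } else {
        table[(byte - 0x80) as usize]
    }
}

/// Encode reuses the decode table by searching it linearly, trading speed
/// for not having to ship a separate encode table.
fn encode_unit(table: &[u16; 128], unit: u16) -> Option<u8> {
    if unit < 0x80 {
        return Some(unit as u8);
    }
    table
        .iter()
        .position(|&entry| entry == unit)
        .map(|i| 0x80 + i as u8)
}

fn main() {
    let table = hypothetical_decode_table();
    let byte = encode_unit(&table, 0x0410).expect("mappable in the made-up table");
    println!("{:#04X} decodes to {:#06X}", byte, decode_byte(&table, byte));
    // A code unit that is absent from the table is unmappable.
    println!("{:?}", encode_unit(&table, 0x20AC));
}
```

With a fixed search order like this, how quickly a character is found depends on where the commonly used characters sit in the table, which is consistent with one encoding's layout being a worse fit than another's.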
EUC-KR decode is really slow in uconv.
Big5 in uconv isn’t like the rest of uconv, because I rewrote it recently. It’s closer to encoding_rs than the rest of uconv in structure.
The ARM numbers show that conversion to and from UTF-8 needs more care. The ALU-based ASCII acceleration code needs more attention, especially on 32-bit systems.
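For readers unfamiliar with the term, ALU-based ASCII acceleration generally means checking several bytes at once with ordinary integer instructions instead of SIMD. The following is a generic sketch of that idea, assuming a word-at-a-time high-bit test. It is not the encoding_rs code, and the function name is made up.

```rust
/// Returns the length of the leading all-ASCII run in `bytes`, checking one
/// machine word (4 bytes on 32-bit, 8 bytes on 64-bit) per iteration.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x80 repeated in every byte of the word.
    const HIGH_BITS: usize = usize::MAX / 255 * 0x80;

    let mut i = 0;
    while i + WORD <= bytes.len() {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(&bytes[i..i + WORD]);
        // A byte with its high bit set is non-ASCII; byte order is
        // irrelevant for this test.
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            break;
        }
        i += WORD;
    }
    // Finish the tail, or locate the exact non-ASCII byte, one byte at a time.
    while i < bytes.len() && bytes[i] < 0x80 {
        i += 1;
    }
    i
}

fn main() {
    let text = "abcdefghijé";
    println!("{}", ascii_prefix_len(text.as_bytes())); // prints 10
}
```

On a 32-bit system, each iteration covers only four bytes, which is one reason this kind of code deserves separate attention there.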
encoding_rs hasn’t undergone CJK lookup table compaction yet. Simply moving Hiragana and Katakana to conversion by range check followed by an offset could bring Japanese legacy encode to par with uconv.
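The range-check-plus-offset idea can be sketched as below, using EUC-JP as the target. Hiragana (U+3041 to U+3093) and Katakana (U+30A1 to U+30F6) are contiguous both in Unicode and in JIS X 0208 rows 4 and 5, so no table lookup is needed. The function name is made up and this is not code from encoding_rs.

```rust
/// Encodes a Hiragana or Katakana character to its two EUC-JP bytes using
/// only a range check and an offset. Returns None for other characters,
/// which would go through the regular table-based path.
fn encode_kana_euc_jp(c: char) -> Option<[u8; 2]> {
    let code_point = c as u32;
    match code_point {
        // Hiragana: JIS X 0208 row 4, EUC-JP lead byte 0xA4.
        0x3041..=0x3093 => Some([0xA4, (code_point - 0x3041 + 0xA1) as u8]),
        // Katakana: JIS X 0208 row 5, EUC-JP lead byte 0xA5.
        0x30A1..=0x30F6 => Some([0xA5, (code_point - 0x30A1 + 0xA1) as u8]),
        _ => None,
    }
}

fn main() {
    println!("{:02X?}", encode_kana_euc_jp('あ'));
    println!("{:02X?}", encode_kana_euc_jp('ン'));
    println!("{:02X?}", encode_kana_euc_jp('火'));
}
```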
Encoding from UTF-8 to UTF-8 is so fast compared to rust-encoding thanks to there being no copy in the encoding_rs case. (When comparing with glibc, the encoding_rs side is basically a memcpy.)
When decoding from UTF-8 to UTF-8, in comparison with rust-encoding, encoding_rs just validates and borrows, but rust-encoding copies.
Finally, here are the numbers.
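Language, encoding | Decode to UTF-16 vs. uconv | Decode to UTF-16 vs. ICU | Decode to UTF-16 vs. kernel32 | Decode to UTF-8 vs. stdlib | Decode to UTF-8 vs. rust-encoding | Decode to UTF-8 vs. glibc | Encode from UTF-16 vs. uconv | Encode from UTF-16 vs. ICU | Encode from UTF-16 vs. kernel32 | Encode from UTF-8 vs. rust-encoding | Encode from UTF-8 vs. glibc |
---|---|---|---|---|---|---|---|---|---|---|---|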
Arabic, UTF-8 | 2.01 | 1.81 | 1.05 | 1.09 | 2.36 | 4.19 | 0.65 | 0.72 | 0.70 | 3657.56 | 116.78 |
Czech, UTF-8 | 2.13 | 2.13 | 1.35 | 1.12 | 3.24 | 5.68 | 0.64 | 0.69 | 0.63 | 9007.22 | 106.67 |
German, UTF-8 | 2.37 | 3.92 | 2.01 | 1.04 | 5.91 | 11.02 | 2.25 | 2.32 | 1.12 | 4260.44 | 79.39 |
Greek, UTF-8 | 2.04 | 2.02 | 1.12 | 1.04 | 2.79 | 4.95 | 0.65 | 0.71 | 0.71 | 4859.56 | 109.65 |
English, UTF-8 | 2.39 | 7.23 | 3.16 | 1.15 | 22.21 | 23.22 | 5.51 | 7.00 | 2.79 | 737.67 | 71.95 |
French, UTF-8 | 2.26 | 3.09 | 1.69 | 1.19 | 5.78 | 8.67 | 1.06 | 1.11 | 0.78 | 14715.00 | 81.61 |
Hebrew, UTF-8 | 2.02 | 1.73 | 1.09 | 1.06 | 2.22 | 3.95 | 0.69 | 0.72 | 0.71 | 8624.33 | 120.95 |
Portuguese, UTF-8 | 2.32 | 3.60 | 1.92 | 1.18 | 6.18 | 9.91 | 1.42 | 1.42 | 0.84 | 5460.33 | 81.75 |
Russian, UTF-8 | 2.06 | 1.89 | 1.09 | 1.04 | 2.51 | 4.42 | 0.69 | 0.71 | 0.73 | 18626.22 | 110.38 |
Thai, UTF-8 | 2.47 | 2.48 | 1.33 | 1.12 | 4.49 | 7.19 | 0.78 | 1.04 | 0.64 | 15117.22 | 72.96 |
Turkish, UTF-8 | 2.10 | 1.97 | 1.31 | 1.09 | 2.89 | 5.15 | 0.70 | 0.74 | 0.66 | 9548.89 | 109.67 |
Vietnamese, UTF-8 | 2.07 | 1.75 | 1.17 | 1.08 | 2.49 | 4.22 | 0.72 | 0.76 | 0.68 | 25492.11 | 147.91 |
Simplified Chinese, UTF-8 | 2.31 | 2.19 | 1.29 | 1.08 | 3.02 | 5.42 | 0.79 | 1.06 | 0.60 | 3422.56 | 75.57 |
Traditional Chinese, UTF-8 | 2.32 | 3.29 | 1.34 | 1.08 | 3.01 | 5.43 | 0.79 | 1.06 | 0.59 | 3408.67 | 76.55 |
Japanese, UTF-8 | 2.44 | 2.10 | 1.27 | 0.99 | 2.58 | 5.00 | 0.78 | 1.08 | 0.61 | 2684.89 | 75.00 |
Korean, UTF-8 | 2.03 | 1.64 | 1.06 | 0.86 | 1.79 | 4.09 | 0.93 | 1.16 | 0.62 | 3751.89 | 106.87 |
Arabic, windows-1256 | 1.57 | 1.20 | 0.81 | 5.37 | 3.93 | 1.43 | 0.14 | 0.02 | 0.33 | 0.44 | |
Czech, windows-1250 | 2.22 | 1.70 | 1.12 | 4.74 | 6.18 | 2.60 | 0.61 | 0.12 | 0.82 | 1.27 | |
German, windows-1252 | 5.42 | 4.14 | 2.74 | 8.81 | 16.01 | 13.19 | 2.69 | 0.67 | 3.32 | 4.75 | |
Greek, windows-1253 | 2.04 | 1.56 | 1.03 | 6.33 | 4.95 | 1.16 | 0.22 | 0.03 | 0.48 | 0.50 | |
English, windows-1252 | 9.18 | 7.02 | 4.66 | 18.23 | 43.34 | 54.12 | 10.94 | 2.86 | 18.34 | 25.10 | |
French, windows-1252 | 3.64 | 2.78 | 1.86 | 9.17 | 10.49 | 7.71 | 1.62 | 0.38 | 2.03 | 2.97 | |
Hebrew, windows-1255 | 1.85 | 1.12 | 0.75 | 5.05 | 4.98 | 1.63 | 0.30 | 0.04 | 0.58 | 0.58 | |
Portuguese, windows-1252 | 4.49 | 3.43 | 2.29 | 7.67 | 13.62 | 10.03 | 2.10 | 0.52 | 2.59 | 3.67 | |
Russian, windows-1251 | 1.60 | 1.22 | 0.81 | 4.15 | 4.04 | 0.98 | 0.27 | 0.04 | 0.55 | 0.54 | |
Thai, windows-874 | 3.13 | 2.40 | 1.58 | 5.64 | 6.85 | 0.50 | 0.08 | 0.01 | 0.22 | 0.21 | |
Turkish, windows-1254 | 2.04 | 1.56 | 1.05 | 6.43 | 5.60 | 3.40 | 0.60 | 0.12 | 0.81 | 1.26 | |
Vietnamese, windows-1258 | 2.31 | 1.77 | 1.19 | 6.83 | 11.97 | 5.74 | 1.12 | 0.25 | 1.43 | 1.74 | |
Simplified Chinese, gb18030 | 3.47 | 2.79 | 4.60 | 6.15 | 4.56 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | |
Traditional Chinese, Big5 | 2.87 | 2.17 | 1.59 | 4.43 | 4.12 | 1.32 | 0.01 | 0.00 | 0.01 | 0.02 | |
Japanese, EUC-JP | 4.03 | 3.30 | 6.07 | 5.53 | 0.72 | 0.01 | 0.02 | 0.10 | |||
Japanese, ISO-2022-JP | 1.10 | 2.24 | 2.11 | 1.68 | 0.36 | 0.04 | 0.02 | 0.09 | |||
Japanese, Shift_JIS | 1.85 | 2.10 | 1.47 | 3.73 | 4.06 | 0.58 | 0.01 | 0.01 | 0.03 | 0.03 | |
Korean, EUC-KR | 40.69 | 3.44 | 2.22 | 4.89 | 4.80 | 0.39 | 0.00 | 0.00 | 0.00 | 0.00 | |
x-user-defined | 1.16 | ||||||||||
Arabic, UTF-16LE | 0.68 | 0.32 | 2.04 | 0.94 | |||||||
Czech, UTF-16LE | 0.69 | 0.32 | 1.19 | 0.96 | |||||||
German, UTF-16LE | 0.68 | 0.32 | 1.25 | 0.99 | |||||||
Greek, UTF-16LE | 0.68 | 0.32 | 1.19 | 0.95 | |||||||
English, UTF-16LE | 0.68 | 0.32 | 1.13 | 0.99 | |||||||
French, UTF-16LE | 0.68 | 0.32 | 1.20 | 0.98 | |||||||
Hebrew, UTF-16LE | 0.69 | 0.32 | 1.17 | 0.94 | |||||||
Portuguese, UTF-16LE | 0.68 | 0.32 | 1.12 | 0.99 | |||||||
Russian, UTF-16LE | 0.69 | 0.32 | 1.19 | 0.94 | |||||||
Thai, UTF-16LE | 0.68 | 0.32 | 1.20 | 0.98 | |||||||
Turkish, UTF-16LE | 0.68 | 0.32 | 1.10 | 0.96 | |||||||
Vietnamese, UTF-16LE | 0.69 | 0.32 | 1.09 | 0.94 | |||||||
Simplified Chinese, UTF-16LE | 0.69 | 0.32 | 1.16 | 0.96 | |||||||
Traditional Chinese, UTF-16LE | 0.68 | 0.32 | 1.16 | 0.96 | |||||||
Japanese, UTF-16LE | 0.69 | 0.32 | 1.22 | 0.96 | |||||||
Korean, UTF-16LE | 0.69 | 0.32 | 1.21 | 0.95 | |||||||
Arabic, UTF-16BE | 0.65 | 0.30 | 1.32 | 1.00 | |||||||
Czech, UTF-16BE | 0.64 | 0.30 | 1.17 | 1.02 | |||||||
German, UTF-16BE | 0.65 | 0.30 | 1.23 | 1.02 | |||||||
Greek, UTF-16BE | 0.65 | 0.30 | 1.16 | 1.00 | |||||||
English, UTF-16BE | 0.64 | 0.30 | 1.11 | 1.02 | |||||||
French, UTF-16BE | 0.64 | 0.30 | 1.19 | 1.02 | |||||||
Hebrew, UTF-16BE | 0.64 | 0.30 | 1.16 | 1.00 | |||||||
Portuguese, UTF-16BE | 0.64 | 0.30 | 1.11 | 1.02 | |||||||
Russian, UTF-16BE | 0.64 | 0.30 | 1.18 | 0.99 | |||||||
Thai, UTF-16BE | 0.64 | 0.30 | 1.18 | 1.01 | |||||||
Turkish, UTF-16BE | 0.65 | 0.30 | 1.10 | 1.02 | |||||||
Vietnamese, UTF-16BE | 0.64 | 0.30 | 1.09 | 1.19 | |||||||
Simplified Chinese, UTF-16BE | 0.64 | 0.30 | 1.14 | 1.00 | |||||||
Traditional Chinese, UTF-16BE | 0.65 | 0.30 | 1.14 | 1.00 | |||||||
Japanese, UTF-16BE | 0.65 | 0.30 | 1.15 | 1.00 | |||||||
Korean, UTF-16BE | 0.65 | 0.30 | 1.19 | 0.99 |
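Language, encoding | Decode to UTF-16 vs. uconv | Decode to UTF-16 vs. ICU | Decode to UTF-16 vs. kernel32 | Decode to UTF-8 vs. stdlib | Decode to UTF-8 vs. rust-encoding | Decode to UTF-8 vs. glibc | Encode from UTF-16 vs. uconv | Encode from UTF-16 vs. ICU | Encode from UTF-16 vs. kernel32 | Encode from UTF-8 vs. rust-encoding | Encode from UTF-8 vs. glibc |
---|---|---|---|---|---|---|---|---|---|---|---|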
Arabic, UTF-8 | 1.59 | 1.42 | 0.82 | 1.07 | 2.33 | 4.20 | 0.62 | 0.68 | 0.67 | 3657.56 | 117.08 |
Czech, UTF-8 | 1.48 | 1.49 | 0.94 | 1.12 | 3.23 | 5.69 | 0.72 | 0.77 | 0.70 | 9007.22 | 108.03 |
German, UTF-8 | 1.04 | 1.72 | 0.88 | 1.18 | 6.71 | 10.77 | 1.45 | 1.50 | 0.73 | 4260.44 | 80.71 |
Greek, UTF-8 | 1.50 | 1.49 | 0.83 | 1.03 | 2.76 | 4.93 | 0.61 | 0.67 | 0.67 | 4859.56 | 112.11 |
English, UTF-8 | 0.63 | 1.91 | 0.83 | 1.14 | 22.12 | 23.19 | 1.73 | 2.20 | 0.87 | 737.67 | 73.06 |
French, UTF-8 | 1.21 | 1.65 | 0.91 | 1.17 | 5.67 | 8.63 | 1.02 | 1.07 | 0.75 | 14715.00 | 81.77 |
Hebrew, UTF-8 | 1.62 | 1.39 | 0.87 | 1.06 | 2.21 | 3.93 | 0.65 | 0.68 | 0.67 | 8624.33 | 121.95 |
Portuguese, UTF-8 | 1.09 | 1.70 | 0.91 | 1.16 | 6.06 | 9.85 | 1.19 | 1.19 | 0.71 | 5460.33 | 81.64 |
Russian, UTF-8 | 1.58 | 1.45 | 0.84 | 1.04 | 2.50 | 4.38 | 0.65 | 0.66 | 0.67 | 18626.22 | 106.78 |
Thai, UTF-8 | 1.63 | 1.64 | 0.88 | 1.07 | 4.31 | 6.94 | 0.71 | 0.94 | 0.58 | 15117.22 | 70.17 |
Turkish, UTF-8 | 1.55 | 1.45 | 0.96 | 1.08 | 2.87 | 5.14 | 0.79 | 0.84 | 0.75 | 9548.89 | 109.61 |
Vietnamese, UTF-8 | 1.59 | 1.34 | 0.89 | 1.07 | 2.47 | 4.21 | 0.71 | 0.75 | 0.67 | 25492.11 | 150.07 |
Simplified Chinese, UTF-8 | 1.60 | 1.51 | 0.89 | 1.09 | 3.03 | 5.45 | 0.72 | 0.96 | 0.54 | 3422.56 | 77.90 |
Traditional Chinese, UTF-8 | 1.61 | 2.28 | 0.93 | 0.80 | 2.22 | 5.46 | 0.72 | 0.96 | 0.54 | 3408.67 | 76.91 |
Japanese, UTF-8 | 1.80 | 1.55 | 0.94 | 1.07 | 2.78 | 5.07 | 0.70 | 0.98 | 0.55 | 2684.89 | 75.96 |
Korean, UTF-8 | 1.54 | 1.24 | 0.81 | 1.07 | 2.23 | 4.08 | 0.85 | 1.05 | 0.56 | 3751.89 | 113.51 |
Arabic, windows-1256 | 1.11 | 0.85 | 0.57 | 4.13 | 3.02 | 1.42 | 0.14 | 0.02 | 0.34 | 0.45 | |
Czech, windows-1250 | 1.45 | 1.11 | 0.73 | 3.45 | 4.07 | 2.76 | 0.65 | 0.12 | 0.83 | 1.29 | |
German, windows-1252 | 1.95 | 1.49 | 0.99 | 4.48 | 5.36 | 10.24 | 2.09 | 0.52 | 2.47 | 3.53 | |
Greek, windows-1253 | 1.28 | 0.98 | 0.65 | 4.34 | 3.40 | 1.16 | 0.22 | 0.03 | 0.49 | 0.52 | |
English, windows-1252 | 2.14 | 1.63 | 1.09 | 5.81 | 6.86 | 19.19 | 3.88 | 1.01 | 4.34 | 5.83 | |
French, windows-1252 | 1.78 | 1.36 | 0.91 | 4.91 | 4.90 | 7.49 | 1.58 | 0.37 | 1.94 | 2.84 | |
Hebrew, windows-1255 | 1.35 | 0.82 | 0.55 | 3.93 | 3.88 | 1.63 | 0.30 | 0.04 | 0.61 | 0.61 | |
Portuguese, windows-1252 | 1.88 | 1.44 | 0.96 | 4.10 | 5.20 | 8.86 | 1.85 | 0.46 | 2.25 | 3.18 | |
Russian, windows-1251 | 1.13 | 0.86 | 0.57 | 3.31 | 3.04 | 0.98 | 0.27 | 0.04 | 0.57 | 0.57 | |
Thai, windows-874 | 1.61 | 1.23 | 0.81 | 3.82 | 4.00 | 0.46 | 0.07 | 0.01 | 0.22 | 0.21 | |
Turkish, windows-1254 | 1.41 | 1.07 | 0.72 | 4.46 | 3.88 | 3.58 | 0.63 | 0.13 | 0.83 | 1.28 | |
Vietnamese, windows-1258 | 1.50 | 1.14 | 0.77 | 4.43 | 7.76 | 6.12 | 1.19 | 0.27 | 1.47 | 1.79 | |
Simplified Chinese, gb18030 | 2.26 | 1.82 | 3.00 | 4.34 | 3.29 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | |
Traditional Chinese, Big5 | 2.13 | 1.62 | 1.18 | 3.48 | 3.22 | 1.31 | 0.01 | 0.00 | 0.01 | 0.02 | |
Japanese, EUC-JP | 2.73 | 2.24 | 4.38 | 3.94 | 0.72 | 0.01 | 0.02 | 0.10 | |||
Japanese, ISO-2022-JP | 1.03 | 2.11 | 2.12 | 1.69 | 0.36 | 0.04 | 0.02 | 0.09 | |||
Japanese, Shift_JIS | 1.47 | 1.66 | 1.17 | 3.67 | 3.21 | 0.58 | 0.01 | 0.01 | 0.03 | 0.03 | |
Korean, EUC-KR | 26.69 | 2.26 | 1.46 | 3.38 | 3.19 | 0.35 | 0.00 | 0.00 | 0.00 | 0.00 | |
x-user-defined | 1.16 | ||||||||||
Arabic, UTF-16LE | 0.68 | 0.32 | 1.99 | 0.91 | |||||||
Czech, UTF-16LE | 0.69 | 0.32 | 1.18 | 0.96 | |||||||
German, UTF-16LE | 0.68 | 0.32 | 1.24 | 0.98 | |||||||
Greek, UTF-16LE | 0.68 | 0.32 | 1.16 | 0.93 | |||||||
English, UTF-16LE | 0.68 | 0.32 | 1.12 | 0.99 | |||||||
French, UTF-16LE | 0.68 | 0.32 | 1.19 | 0.98 | |||||||
Hebrew, UTF-16LE | 0.68 | 0.32 | 1.15 | 0.92 | |||||||
Portuguese, UTF-16LE | 0.68 | 0.32 | 1.11 | 0.98 | |||||||
Russian, UTF-16LE | 0.68 | 0.32 | 1.16 | 0.92 | |||||||
Thai, UTF-16LE | 0.67 | 0.32 | 1.19 | 0.97 | |||||||
Turkish, UTF-16LE | 0.68 | 0.32 | 1.10 | 0.95 | |||||||
Vietnamese, UTF-16LE | 0.68 | 0.32 | 1.09 | 0.94 | |||||||
Simplified Chinese, UTF-16LE | 0.68 | 0.32 | 1.15 | 0.95 | |||||||
Traditional Chinese, UTF-16LE | 0.68 | 0.32 | 1.15 | 0.95 | |||||||
Japanese, UTF-16LE | 0.54 | 0.25 | 1.21 | 0.95 | |||||||
Korean, UTF-16LE | 0.68 | 0.32 | 1.20 | 0.94 | |||||||
Arabic, UTF-16BE | 0.65 | 0.30 | 1.30 | 0.98 | |||||||
Czech, UTF-16BE | 0.64 | 0.30 | 1.14 | 1.00 | |||||||
German, UTF-16BE | 0.65 | 0.30 | 1.21 | 1.01 | |||||||
Greek, UTF-16BE | 0.65 | 0.30 | 1.14 | 0.99 | |||||||
English, UTF-16BE | 0.64 | 0.30 | 1.10 | 1.02 | |||||||
French, UTF-16BE | 0.64 | 0.30 | 1.17 | 1.00 | |||||||
Hebrew, UTF-16BE | 0.64 | 0.30 | 1.14 | 0.98 | |||||||
Portuguese, UTF-16BE | 0.64 | 0.30 | 1.09 | 1.01 | |||||||
Russian, UTF-16BE | 0.64 | 0.30 | 1.16 | 0.98 | |||||||
Thai, UTF-16BE | 0.64 | 0.30 | 1.16 | 1.00 | |||||||
Turkish, UTF-16BE | 0.65 | 0.30 | 1.08 | 1.00 | |||||||
Vietnamese, UTF-16BE | 0.64 | 0.30 | 1.07 | 1.17 | |||||||
Simplified Chinese, UTF-16BE | 0.64 | 0.30 | 1.13 | 1.00 | |||||||
Traditional Chinese, UTF-16BE | 0.65 | 0.30 | 1.13 | 1.00 | |||||||
Japanese, UTF-16BE | 0.65 | 0.30 | 1.14 | 0.99 | |||||||
Korean, UTF-16BE | 0.65 | 0.30 | 1.18 | 0.99 |
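Language, encoding | Decode to UTF-16 vs. uconv | Decode to UTF-16 vs. ICU | Decode to UTF-16 vs. kernel32 | Decode to UTF-8 vs. stdlib | Decode to UTF-8 vs. rust-encoding | Decode to UTF-8 vs. glibc | Encode from UTF-16 vs. uconv | Encode from UTF-16 vs. ICU | Encode from UTF-16 vs. kernel32 | Encode from UTF-8 vs. rust-encoding | Encode from UTF-8 vs. glibc |
---|---|---|---|---|---|---|---|---|---|---|---|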
Arabic, UTF-8 | 1.21 | 1.45 | 1.00 | 4.80 | 5.33 | 0.65 | 0.69 | 3147.96 | 119.19 | ||
Czech, UTF-8 | 0.76 | 0.89 | 1.00 | 7.84 | 7.79 | 1.02 | 0.85 | 5926.37 | 104.63 | ||
German, UTF-8 | 0.64 | 0.86 | 1.00 | 13.45 | 11.42 | 1.76 | 2.73 | 11320.06 | 96.01 | ||
Greek, UTF-8 | 1.09 | 1.34 | 1.00 | 5.76 | 6.10 | 0.64 | 1.38 | 8810.62 | 120.38 | ||
English, UTF-8 | 0.58 | 0.86 | 0.99 | 20.58 | 13.35 | 1.96 | 3.06 | 4057.53 | 90.11 | ||
French, UTF-8 | 0.65 | 0.85 | 1.00 | 12.20 | 9.37 | 1.50 | 1.17 | 12590.74 | 47.34 | ||
Hebrew, UTF-8 | 1.16 | 1.38 | 1.00 | 4.73 | 5.21 | 0.66 | 0.70 | 6779.61 | 58.82 | ||
Portuguese, UTF-8 | 0.63 | 0.87 | 1.00 | 13.57 | 10.64 | 1.60 | 1.25 | 5584.95 | 97.60 | ||
Russian, UTF-8 | 1.18 | 1.46 | 1.00 | 5.37 | 5.42 | 0.64 | 0.69 | 15840.13 | 116.02 | ||
Thai, UTF-8 | 1.18 | 1.26 | 1.00 | 6.78 | 3.29 | 0.66 | 1.06 | 22381.14 | 98.15 | ||
Turkish, UTF-8 | 0.78 | 0.90 | 1.01 | 7.23 | 7.03 | 1.07 | 0.89 | 6750.38 | 104.90 | ||
Vietnamese, UTF-8 | 0.91 | 0.98 | 1.00 | 5.76 | 5.50 | 0.82 | 0.78 | 13237.81 | 112.05 | ||
Simplified Chinese, UTF-8 | 1.04 | 1.11 | 1.00 | 5.90 | 12.15 | 0.67 | 1.06 | 4373.86 | 104.74 | ||
Traditional Chinese, UTF-8 | 2.03 | 1.09 | 1.00 | 5.91 | 6.11 | 0.67 | 1.06 | 4373.17 | 102.60 | ||
Japanese, UTF-8 | 2.37 | 1.20 | 1.00 | 5.05 | 5.34 | 0.66 | 1.06 | 4519.20 | 102.65 | ||
Korean, UTF-8 | 1.13 | 1.05 | 1.00 | 4.80 | 5.33 | 0.70 | 1.02 | 2906.59 | 109.17 | ||
Arabic, windows-1256 | 1.03 | 0.79 | 5.16 | 3.10 | 2.27 | 0.19 | 0.56 | 0.66 | |||
Czech, windows-1250 | 1.51 | 1.17 | 3.87 | 4.11 | 4.88 | 0.66 | 1.30 | 1.85 | |||
German, windows-1252 | 1.73 | 1.34 | 4.04 | 4.75 | 13.41 | 3.36 | 5.71 | 4.58 | |||
Greek, windows-1253 | 1.12 | 0.86 | 5.34 | 3.34 | 2.20 | 0.58 | 1.71 | 0.76 | |||
English, windows-1252 | 1.68 | 1.30 | 4.61 | 4.95 | 17.93 | 4.41 | 3.63 | 5.84 | |||
French, windows-1252 | 1.65 | 1.25 | 4.31 | 4.56 | 11.21 | 1.43 | 2.52 | 1.96 | |||
Hebrew, windows-1255 | 1.04 | 0.80 | 5.16 | 3.68 | 3.35 | 0.37 | 1.04 | 0.81 | |||
Portuguese, windows-1252 | 1.65 | 1.28 | 4.51 | 4.65 | 13.03 | 1.65 | 2.84 | 4.47 | |||
Russian, windows-1251 | 1.05 | 0.81 | 4.94 | 3.11 | 1.84 | 0.34 | 0.97 | 0.85 | |||
Thai, windows-874 | 1.26 | 0.97 | 4.69 | 3.82 | 0.90 | 0.11 | 0.37 | 0.34 | |||
Turkish, windows-1254 | 1.47 | 1.13 | 4.62 | 4.09 | 5.49 | 0.63 | 1.25 | 1.79 | |||
Vietnamese, windows-1258 | 1.52 | 1.18 | 5.10 | 5.55 | 10.21 | 1.22 | 2.13 | 2.18 | |||
Simplified Chinese, gb18030 | 1.99 | 1.71 | 4.40 | 2.38 | 0.05 | 0.00 | 0.00 | 0.00 | |||
Traditional Chinese, Big5 | 2.64 | 1.56 | 4.05 | 2.71 | 0.85 | 0.00 | 0.01 | 0.02 | |||
Japanese, EUC-JP | 2.29 | 1.75 | 4.08 | 2.57 | 0.41 | 0.01 | 0.02 | 0.10 | |||
Japanese, ISO-2022-JP | 0.68 | 1.54 | 2.01 | 1.14 | 0.23 | 0.03 | 0.02 | 0.14 | |||
Japanese, Shift_JIS | 1.57 | 1.48 | 4.25 | 2.43 | 0.24 | 0.00 | 0.02 | 0.01 | |||
Korean, EUC-KR | 26.55 | 1.85 | 3.90 | 2.48 | 0.25 | 0.00 | 0.00 | 0.00 | |||
x-user-defined | 1.43 | ||||||||||
Arabic, UTF-16LE | 0.47 | 0.30 | 1.34 | 0.63 | |||||||
Czech, UTF-16LE | 0.47 | 0.30 | 1.19 | 0.58 | |||||||
German, UTF-16LE | 0.47 | 0.30 | 0.95 | 0.58 | |||||||
Greek, UTF-16LE | 0.93 | 0.30 | 1.24 | 0.62 | |||||||
English, UTF-16LE | 0.47 | 0.31 | 1.00 | 0.57 | |||||||
French, UTF-16LE | 0.47 | 0.30 | 1.00 | 0.58 | |||||||
Hebrew, UTF-16LE | 0.47 | 0.30 | 1.30 | 0.63 | |||||||
Portuguese, UTF-16LE | 0.47 | 0.31 | 0.98 | 0.58 | |||||||
Russian, UTF-16LE | 0.47 | 0.31 | 1.37 | 0.63 | |||||||
Thai, UTF-16LE | 0.47 | 0.31 | 1.23 | 0.63 | |||||||
Turkish, UTF-16LE | 0.47 | 0.30 | 0.99 | 0.59 | |||||||
Vietnamese, UTF-16LE | 0.47 | 0.31 | 1.06 | 0.59 | |||||||
Simplified Chinese, UTF-16LE | 0.47 | 0.30 | 1.11 | 0.61 | |||||||
Traditional Chinese, UTF-16LE | 0.24 | 0.15 | 1.11 | 0.61 | |||||||
Japanese, UTF-16LE | 0.47 | 0.30 | 1.19 | 0.62 | |||||||
Korean, UTF-16LE | 0.70 | 0.30 | 1.18 | 0.62 | |||||||
Arabic, UTF-16BE | 0.53 | 0.27 | 1.28 | 0.79 | |||||||
Czech, UTF-16BE | 0.53 | 0.28 | 0.92 | 0.74 | |||||||
German, UTF-16BE | 0.53 | 0.27 | 0.87 | 0.73 | |||||||
Greek, UTF-16BE | 0.27 | 0.14 | 1.18 | 0.77 | |||||||
English, UTF-16BE | 0.53 | 0.28 | 0.93 | 0.73 | |||||||
French, UTF-16BE | 0.53 | 0.28 | 0.93 | 0.73 | |||||||
Hebrew, UTF-16BE | 0.53 | 0.27 | 2.49 | 0.39 | |||||||
Portuguese, UTF-16BE | 0.53 | 0.28 | 0.91 | 0.73 | |||||||
Russian, UTF-16BE | 0.53 | 0.28 | 1.31 | 0.78 | |||||||
Thai, UTF-16BE | 0.53 | 0.28 | 1.16 | 0.78 | |||||||
Turkish, UTF-16BE | 0.53 | 0.27 | 0.92 | 0.74 | |||||||
Vietnamese, UTF-16BE | 0.53 | 0.28 | 0.99 | 0.75 | |||||||
Simplified Chinese, UTF-16BE | 0.53 | 0.28 | 1.04 | 0.76 | |||||||
Traditional Chinese, UTF-16BE | 0.53 | 0.27 | 1.04 | 0.76 | |||||||
Japanese, UTF-16BE | 0.53 | 0.27 | 1.12 | 0.77 | |||||||
Korean, UTF-16BE | 0.52 | 0.26 | 1.11 | 0.76 |