Commit eb12788
committed
t3900: ISO-2022-JP has more than one popular variants
When converting from other encodings (e.g. EUC-JP or UTF-8), there are
subtly different variants of ISO-2022-JP, all of which are valid. At the
end of line or when a run of string switches to 1-byte sequence, ESC ( B
can be used to switch to ASCII or ESC ( J can be used to switch to ISO
646:JP (JIS X 0201) but they essentially are the same character set and
are used interchangeably. Similarly the set ESC $ @ switches to (JIS X
0208-1978) and ESC $ B switches to (JIS X 0208-1983) are in practice used
interchangeably.
Depending on the iconv library and the locale definition on the system, a
program that converts from another encoding to ISO-2022-JP can produce
different byte sequence, and GIT_TEST_CMP (aka "diff -u") will report the
difference as a failure.
Fix this by converting the expected and the actual output to UTF-8 before
comparing when the end result is ISO-2022-JP. The test vector string in
t3900/ISO-2022-JP.txt is expressed with ASCII and JIS X 0208-1983, but it
can be expressed with any other possible variant, and when converted back
to UTF-8, these variants produce identical byte sequences.
Signed-off-by: Junio C Hamano <gitster@pobox.com>1 parent 6345d7a commit eb12788
1 file changed
+16
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | | - | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
13 | 21 | | |
14 | 22 | | |
15 | 23 | | |
| |||
103 | 111 | | |
104 | 112 | | |
105 | 113 | | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
106 | 120 | | |
107 | 121 | | |
108 | 122 | | |
109 | 123 | | |
110 | | - | |
| 124 | + | |
111 | 125 | | |
112 | 126 | | |
113 | 127 | | |
| |||
0 commit comments