Commit 0d00dec
ARROW-7404: [C++][Gandiva] Fix utf8 char length error on Arm64
The current code checks whether a UTF-8 code unit (one byte) is within
0x00~0x7F with "if (c >= 0)", where c is declared as "char". This check
assumes char is always signed, which is not true[1]. On Arm64, char is
unsigned by default, so the check is always true and some Gandiva unit
tests fail.
Fix it by casting to "signed char" explicitly.
[1] Cited from https://en.cppreference.com/w/cpp/language/types
The signedness of char depends on the compiler and the target platform:
the defaults for ARM and PowerPC are typically unsigned, the defaults
for x86 and x64 are typically signed.
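The signedness issue described above can be illustrated with a minimal sketch. This is not the Gandiva source: the names utf8_seq_len and naive_is_ascii are hypothetical, and the function only classifies a lead byte under the assumption that the input is a single UTF-8 code unit. Before C++20, converting an out-of-range value to signed char is implementation-defined, though mainstream compilers wrap it as expected.

```cpp
// Hypothetical helper (not the Gandiva implementation): returns the byte
// length of the UTF-8 sequence that starts with lead byte `c`, or 0 if `c`
// is a continuation byte or otherwise not a valid lead byte.
inline int utf8_seq_len(char c) {
  // The explicit cast makes the ASCII test independent of whether plain
  // char is signed or unsigned on the target platform.
  if (static_cast<signed char>(c) >= 0) return 1;  // 0x00..0x7F

  unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                          // 10xxxxxx or invalid
}

// The pattern the commit message describes. Where plain char is unsigned,
// this returns true for every byte, so multi-byte sequences are miscounted.
inline bool naive_is_ascii(char c) { return c >= 0; }
```

With the cast in place, a lead byte such as 0xC3 is classified as the start of a two-byte sequence whether the compiler treats plain char as signed or unsigned.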
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Closes apache#6043 from cyb70289/utf8_char_len and squashes the following commits:
a18f43d <Yibo Cai> ARROW-7404: Fix utf8 char length error on Arm64
Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
1 parent bce0899 commit 0d00dec
1 file changed
Lines changed: 1 addition & 1 deletion