Commit 0d00dec

cyb70289 authored and Pindikura Ravindra committed
ARROW-7404: [C++][Gandiva] Fix utf8 char length error on Arm64
Current code checks if a UTF-8 eight-bit code unit is within 0x00~0x7F by "if (c >= 0)", where c is defined as "char". This check assumes char is always signed, which is not true[1]. On Arm64, char is unsigned by default, which causes some Gandiva unit tests to fail. Fix it by casting to "signed char" explicitly.

[1] Cited from https://en.cppreference.com/w/cpp/language/types
The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>

Closes apache#6043 from cyb70289/utf8_char_len and squashes the following commits:

a18f43d <Yibo Cai> ARROW-7404: Fix utf8 char length error on Arm64

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
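The platform dependence described above can be reproduced on any machine by spelling out the signedness instead of relying on plain char. The following is a minimal sketch for illustration only: the helper names are hypothetical and do not exist in Gandiva, and the checks assume the usual two's-complement conversion when an out-of-range value is cast to a signed type.

```cpp
// Hypothetical helpers for illustration only; none of these names exist in Gandiva.

// Models `c >= 0` where plain char is signed (typical x86/x64 default).
constexpr bool one_byte_if_char_signed(signed char c) { return c >= 0; }

// Models `c >= 0` where plain char is unsigned (typical ARM default).
// The value is promoted to a non-negative int, so the test is always true.
constexpr bool one_byte_if_char_unsigned(unsigned char c) {
  return static_cast<int>(c) >= 0;
}

// The check after this commit: the cast pins the signedness explicitly.
constexpr bool one_byte_fixed(char c) { return (signed char)c >= 0; }

// 0xC3 is the leading byte of a 2-byte UTF-8 sequence (for example U+00E9).
static_assert(!one_byte_if_char_signed(static_cast<signed char>(0xC3)),
              "signed char: 0xC3 is rejected as a 1-byte char");
static_assert(one_byte_if_char_unsigned(0xC3),
              "unsigned char: 0xC3 is wrongly accepted as a 1-byte char");
static_assert(!one_byte_fixed(static_cast<char>(0xC3)),
              "fixed check: 0xC3 is rejected regardless of char signedness");
static_assert(one_byte_fixed('A'), "fixed check: ASCII is still a 1-byte char");
```

Because the helpers are constexpr, the static_assert lines are evaluated at compile time, so the file compiling at all is the check.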
1 parent bce0899 commit 0d00dec

1 file changed

Lines changed: 1 addition & 1 deletion

cpp/src/gandiva/precompiled/string_ops.cc

@@ -92,7 +92,7 @@ bool ends_with_utf8_utf8(const char* data, int32 data_len, const char* suffix,
 
 FORCE_INLINE
 int32 utf8_char_length(char c) {
-  if (c >= 0) {  // 1-byte char
+  if ((signed char)c >= 0) {  // 1-byte char (0x00 ~ 0x7F)
     return 1;
   } else if ((c & 0xE0) == 0xC0) {  // 2-byte char
     return 2;
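
Only the first branch needed the fix. The mask tests such as (c & 0xE0) == 0xC0 give the same result for signed and unsigned char, because the mask discards any sign-extended high bits. As a sketch of an equivalent, signedness-independent formulation, the function below converts to unsigned char once and then uses masks throughout. This is not the Gandiva implementation: the name utf8_lead_byte_length is hypothetical, the 3-byte and 4-byte masks follow the standard UTF-8 leading-byte patterns rather than anything shown in this diff, and returning 0 for a byte that cannot start a sequence is an assumption made for the example.

```cpp
#include <cstdint>

// Hypothetical sketch, not the Gandiva implementation.
// Returns the length in bytes of the UTF-8 sequence introduced by `c`,
// or 0 when `c` cannot start a sequence (continuation or invalid byte).
constexpr std::int32_t utf8_lead_byte_length(char c) {
  // Convert once so every test below is independent of char's signedness.
  const unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0x80) == 0x00) return 1;  // 0xxxxxxx: 0x00 ~ 0x7F
  if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                          // assumed error value for this sketch
}

static_assert(utf8_lead_byte_length('A') == 1, "ASCII");
static_assert(utf8_lead_byte_length(static_cast<char>(0xC3)) == 2, "2-byte lead");
static_assert(utf8_lead_byte_length(static_cast<char>(0xE2)) == 3, "3-byte lead");
static_assert(utf8_lead_byte_length(static_cast<char>(0xF0)) == 4, "4-byte lead");
static_assert(utf8_lead_byte_length(static_cast<char>(0xA9)) == 0, "continuation byte");
```

The (u & 0x80) == 0 test and the commit's (signed char)c >= 0 test accept exactly the same bytes, 0x00 through 0x7F. The cast used in the commit keeps the change to a single line.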
