
Commit 4e2443b

avar authored and gitster committed
log tests: test regex backends in "--encode=<enc>" tests
Improve the tests added in 04deccd ("log: re-encode commit messages before grepping", 2013-02-11) to test the regex backends. Those tests never worked as advertised, due to the is_fixed() optimization in grep.c (which was in place at the time), and the needle in the tests being a fixed string.

We'd thus always use the "fixed" backend during the tests, which would use the kwset() backend. This backend liberally accepts any garbage input, so invalid encodings would be silently accepted.

In a follow-up commit we'll fix this bug; this test just demonstrates the existing issue.

In practice this issue happened on Windows, see [1], but due to the structure of the existing tests and how liberal the kwset code is about garbage, we missed this.

Cover this blind spot by testing all our regex engines. The PCRE backend will spot these invalid encodings. It's possible that this test breaks the "basic" and "extended" backends on some systems that are stricter than glibc about encoding or locale issues in the POSIX functions, though I can't remember any; PCRE is just more careful about the validation.

1. https://public-inbox.org/git/nycvar.QRO.7.76.6.1906271113090.44@tvgsbejvaqbjf.bet/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
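To illustrate why a plain needle never reached the regex engines, here is a minimal sketch of the idea behind the is_fixed() check described above. The function name `classify_needle` and its metacharacter list are assumptions made for illustration; this is not git's actual code from grep.c.

```shell
#!/bin/sh
# Hypothetical helper: report whether a needle could be handled by a
# plain substring search ("fixed") or needs a regex engine ("regex").
# The metacharacter set is an approximation chosen for this sketch.
classify_needle () {
	if printf '%s' "$1" | grep -q '[][\\.*^$(){}?+|]'
	then
		echo regex
	else
		echo fixed
	fi
}

classify_needle 'latin1'     # prints "fixed"
classify_needle '.*latin1'   # prints "regex"
```

Under this reading, a needle with no metacharacters takes the fixed-string path no matter which pattern type was requested, which is why the new tests prepend `.*` to the needle for every engine except "fixed".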
1 parent 8dca754 commit 4e2443b

1 file changed: +40 -1 lines changed


t/t4210-log-i18n.sh

Lines changed: 40 additions & 1 deletion
@@ -1,12 +1,15 @@
 #!/bin/sh
 
 test_description='test log with i18n features'
-. ./test-lib.sh
+. ./lib-gettext.sh
 
 # two forms of é
 utf8_e=$(printf '\303\251')
 latin1_e=$(printf '\351')
 
+# invalid UTF-8
+invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
+
 test_expect_success 'create commits in different encodings' '
 	test_tick &&
 	cat >msg <<-EOF &&
@@ -53,4 +56,40 @@ test_expect_success 'log --grep does not find non-reencoded values (latin1)' '
 	test_must_be_empty actual
 '
 
+for engine in fixed basic extended perl
+do
+	prereq=
+	result=success
+	if test $engine = "perl"
+	then
+		result=failure
+		prereq="PCRE"
+	else
+		prereq=""
+	fi
+	force_regex=
+	if test $engine != "fixed"
+	then
+		force_regex=.*
+	fi
+	test_expect_$result GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
+		cat >expect <<-\EOF &&
+		latin1
+		utf8
+		EOF
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
+		test_cmp expect actual
+	"
+
+	test_expect_success GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&
+		test_must_be_empty actual
+	"
+
+	test_expect_$result GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
+		test_must_be_empty actual
+	"
+done
+
 test_done
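The three needles defined at the top of the test script differ only in their raw bytes, so it may help to see them spelled out. The following is a small sketch; the helper name `hex_bytes` is hypothetical and not part of the test suite, and it assumes a POSIX `od` is available.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string as space-separated hex.
hex_bytes () {
	# Unquoted on purpose: word splitting collapses od's padding.
	echo $(printf '%s' "$1" | od -An -tx1)
}

utf8_e=$(printf '\303\251')      # é encoded as UTF-8
latin1_e=$(printf '\351')        # é encoded as ISO-8859-1
invalid_e=$(printf '\303\50)')   # UTF-8 lead byte followed by "(" and ")"

hex_bytes "$utf8_e"      # prints "c3 a9"
hex_bytes "$latin1_e"    # prints "e9"
hex_bytes "$invalid_e"   # prints "c3 28 29"
```

The lead byte 0xc3 announces a two-byte UTF-8 sequence, but 0x28 is plain ASCII "(" rather than a continuation byte in the range 0x80 to 0xbf, so the third needle is not valid UTF-8. The trailing ")" is only there to balance the "(" when the needle is used as a regex.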
