Commit 0e7696c

dscho authored and gitster committed
ci: disallow directional formatting
As described in https://trojansource.codes/trojan-source.pdf, it is possible to abuse directional formatting (a feature of Unicode) to deceive human readers into interpreting code differently from compilers. For example, an "if ()" expression could be enclosed in a comment, but rendered as if it were outside of that comment. In effect, this could fool a reviewer into misinterpreting the code flow as benign when it is not.

It is highly unlikely that Git's source code needs to contain such directional formatting in the first place, so let's just disallow it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent af6d1d6 commit 0e7696c

2 files changed (+28 -0 lines)

.github/workflows/main.yml

Lines changed: 1 addition & 0 deletions
@@ -287,6 +287,7 @@ jobs:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh
     - run: ci/run-static-analysis.sh
+    - run: ci/check-directional-formatting.bash
   sparse:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'

ci/check-directional-formatting.bash

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/bin/bash
+
+# This script verifies that the non-binary files tracked in the Git index do
+# not contain any Unicode directional formatting: such formatting could be used
+# to deceive reviewers into interpreting code differently from the compiler.
+# This is intended to run on an Ubuntu agent in a GitHub workflow.
+#
+# To allow translated messages to introduce such directional formatting in the
+# future, we exclude the `.po` files from this validation.
+#
+# Neither GNU grep nor `git grep` (not even with `-P`) handles `\u` as a way to
+# specify UTF-8.
+#
+# To work around that, we use `printf` to produce the pattern as a byte
+# sequence, and then feed that to `git grep` as a byte sequence (setting
+# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
+#
+# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
+# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
+# for example, would use a `printf` that does not understand that syntax.
+
+# U+202a..U+202e: LRE, RLE, PDF, LRO and RLO
+# U+2066..U+2069: LRI, RLI, FSI and PDI
+regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'
+
+! LC_CTYPE=C git grep -El "$(LC_CTYPE=C.UTF-8 printf "$regex")" \
+	-- ':(exclude,attr:binary)' ':(exclude)*.po'
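The script's dependency on Bash comes only from `printf` understanding `\uNNNN`. As a hedged alternative that is not part of this commit, the same nine code points can be spelled out as their UTF-8 bytes using octal escapes, which POSIX `printf` accepts in its format string, so `dash` could build the pattern too. The sketch below assumes a POSIX shell and a `grep -E` running in the C locale, where every byte is matched individually. `has_bidi` is a hypothetical helper name, and the bracket ranges are a compaction of the script's nine-way alternation, valid because each group of code points is contiguous.

```sh
#!/bin/sh
# Sketch (not from the commit): build the directional-formatting pattern
# from raw UTF-8 bytes so that any POSIX printf (dash included) can produce it.
#
# UTF-8 encodings of the code points listed in the script:
#   U+202A..U+202E (LRE, RLE, PDF, LRO, RLO) -> E2 80 AA .. E2 80 AE
#   U+2066..U+2069 (LRI, RLI, FSI, PDI)      -> E2 81 A6 .. E2 81 A9
# POSIX printf turns \ddd (octal) escapes in its format string into bytes:
#   0xE2 = \342, 0x80 = \200, 0x81 = \201,
#   0xAA..0xAE = \252..\256, 0xA6..0xA9 = \246..\251
pattern=$(printf '\342\200[\252-\256]|\342\201[\246-\251]')

# Hypothetical helper: succeeds if stdin contains at least one of the nine
# directional formatting characters. LC_ALL=C makes grep compare single
# bytes, so the bracket expressions above are ranges of byte values.
has_bidi () {
	LC_ALL=C grep -Eq "$pattern"
}
```

Plugged into the script, the final command might become `! LC_CTYPE=C git grep -El "$pattern" -- ':(exclude,attr:binary)' ':(exclude)*.po'`; that substitution is an untested assumption about `git grep`'s regex engine in the C locale, offered only as an illustration.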
