Commit bd16b3c

Merge branch 'js/ci-no-directional-formatting'
CI has been taught to catch some Unicode directional formatting
sequences that can be used for certain mischief.

* js/ci-no-directional-formatting:
  ci: disallow directional formatting
2 parents 3d2dce1 + 0e7696c commit bd16b3c

2 files changed: 28 additions, 0 deletions

.github/workflows/main.yml

Lines changed: 1 addition & 0 deletions
@@ -289,6 +289,7 @@ jobs:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh
     - run: ci/run-static-analysis.sh
+    - run: ci/check-directional-formatting.bash
   sparse:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'

ci/check-directional-formatting.bash

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/bin/bash
+
+# This script verifies that the non-binary files tracked in the Git index do
+# not contain any Unicode directional formatting: such formatting could be used
+# to deceive reviewers into interpreting code differently from the compiler.
+# This is intended to run on an Ubuntu agent in a GitHub workflow.
+#
+# To allow translated messages to introduce such directional formatting in the
+# future, we exclude the `.po` files from this validation.
+#
+# Neither GNU grep nor `git grep` (not even with `-P`) handle `\u` as a way to
+# specify UTF-8.
+#
+# To work around that, we use `printf` to produce the pattern as a byte
+# sequence, and then feed that to `git grep` as a byte sequence (setting
+# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
+#
+# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
+# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
+# for example, would use a `printf` that does not understand that syntax.
+
+# U+202a..U+202e: LRE, RLE, PDF, LRO and RLO
+# U+2066..U+2069: LRI, RLI, FSI and PDI
+regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'
+
+! LC_CTYPE=C git grep -El "$(LC_CTYPE=C.UTF-8 printf "$regex")" \
+	-- ':(exclude,attr:binary)' ':(exclude)*.po'
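
The byte-sequence trick described in the script's comments can be tried outside a Git repository. What follows is a minimal sketch, not part of the commit: it expands the same `regex` with Bash's `printf` and feeds the result to plain `grep` instead of `git grep`. The helper name `has_directional_formatting` is hypothetical, and the sketch assumes Bash, a grep that accepts `-E` and `-q`, and an installed `C.UTF-8` locale (the same assumptions the script makes about the Ubuntu agent). It sets `LC_ALL` rather than `LC_CTYPE` only so that an `LC_ALL` inherited from the caller's environment cannot override the choice.

```shell
#!/bin/bash
# Sketch only: reuse the alternation from the commit, but match against
# standard input with plain grep rather than against the Git index.
regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'

# Expand the \uNNNN escapes into UTF-8 bytes. This needs Bash's printf and
# assumes the C.UTF-8 locale exists on the local system.
pattern=$(LC_ALL=C.UTF-8 printf "$regex")

# Hypothetical helper: succeeds if any line on stdin contains one of the
# nine code points. LC_ALL=C makes grep compare the pattern byte by byte.
has_directional_formatting () {
	LC_ALL=C grep -Eq "$pattern"
}

# Example input: U+202E (RLO) spelled out as its three UTF-8 bytes in octal,
# so that this file itself stays free of directional formatting.
if printf 'access = "user\342\200\256"\n' | has_directional_formatting
then
	echo "directional formatting found"
fi
```

Writing the sample input with octal escapes keeps the sketch itself from tripping the check it illustrates.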
