@edmorley edmorley commented Oct 24, 2025

While looking through errors in Honeycomb, I noticed that some of the
builds failing due to an invalid Python version (in either their
`.python-version` or `runtime.txt` file) were caused by leading/trailing
invisible Unicode characters, such as zero width space or zero width
non-breaking space.

Previously, the error message for those cases was hard to understand,
since the invisible nature of those characters means they don't show
up in the message, e.g.:

```
 !     Error: Invalid Python version in runtime.txt.
 !
 !     The Python version specified in your runtime.txt file isn't
 !     in the correct format.
 !
 !     The following file contents were found, which aren't valid:
 !     python-3.11.9
...
```

The reason for this is that `sed` uses the current locale, and so
the `[:space:]` character class used by the sed call in
`utils::read_file_with_special_chars_substituted()` was previously
matching Unicode whitespace. (In contrast, the Bash extended regex
used when parsing the read contents does not.)
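
The locale dependence described above can be illustrated with a minimal sketch. The `show_invisible_chars` helper below is hypothetical and is not the buildpack's implementation. It assumes that forcing `LC_ALL=C` is an acceptable way to restrict the `sed` character classes to ASCII, and it uses `?` as a stand-in for the Unicode replacement character:

```shell
# Hypothetical helper for illustration only, not the buildpack's code.
# Replaces every byte that is neither printable nor whitespace with '?'.
# LC_ALL=C makes sed treat its input as single bytes with ASCII-only
# character classes, so the result does not depend on the caller's locale.
show_invisible_chars() {
    LC_ALL=C sed 's/[^[:print:][:space:]]/?/g'
}

# U+200B (zero width space) is the three bytes E2 80 8B in UTF-8,
# written here as octal escapes for portability across printf versions.
result="$(printf '\342\200\213python-3.11.9\n' | show_invisible_chars)"
printf '%s\n' "${result}"
```

With the C locale forced, each of the three bytes is substituted, so the invisible prefix becomes visible as `???python-3.11.9`. Input that contains only printable ASCII passes through unchanged.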

After this change, the Unicode replacement character is now shown
(as it already is for other Unicode non-whitespace characters
and ASCII control characters), e.g.:

```
...
 !     The following file contents were found, which aren't valid:
 !     ���python-3.11.9
...
```

See:
https://www.gnu.org/software/sed/manual/html_node/Locale-Considerations.html

GUS-W-20034162.
@edmorley edmorley self-assigned this Oct 24, 2025
@edmorley edmorley changed the title from "Improve Python version error message for Unicode whitespace" to "Improve Python version error message for invisible Unicode whitespace" Oct 24, 2025
@edmorley edmorley marked this pull request as ready for review October 24, 2025 13:03
@edmorley edmorley requested a review from a team as a code owner October 24, 2025 13:03
@edmorley edmorley enabled auto-merge (squash) October 24, 2025 13:03
@edmorley edmorley merged commit 3b25a5b into main Oct 24, 2025
5 of 6 checks passed
@edmorley edmorley deleted the read_file_locale branch October 24, 2025 13:22
@heroku-linguist heroku-linguist bot mentioned this pull request Oct 27, 2025