Commit 94209ab

Further improve .python-version and runtime.txt error handling (#1962)
Some further improvements on top of those made in #1958, based on build error message scenarios seen in Honeycomb.

Now, if `.python-version` contains a single ESC control code (which gets categorised as "very short file"), or contains any of the ASCII control codes that result in the file being categorised as "data" (such as NUL), then the "invalid python version" error message variant is shown instead of the "invalid text encoding" variant.

In addition, any NULs in the file are substituted with a placeholder to avoid this Bash warning:

```
/tmp/buildpack/lib/python_version.sh: line 104: warning: command substitution: ignored null byte in input
```

See: https://manpages.ubuntu.com/manpages/noble/en/man1/file.1.html

GUS-W-20220514.
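The categorisation described above can be sketched as a small standalone function. This is a minimal illustration, not the buildpack's own code: `classify_file_description` and its two return labels are hypothetical names, and the patterns simply mirror the `case` statement changed in this commit. The function takes a description string of the kind `file --brief` prints, so it does not depend on `file` being installed.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the buildpack): maps a description string, as
# printed by `file --brief`, to the error message variant that would be shown.
classify_file_description() {
	local description="${1}"
	case "${description}" in
		# The text encoding isn't the issue, so the version itself must be invalid.
		*"ASCII text"* | *"UTF-8 text"* | *"very short file"* | "data")
			echo "invalid-version"
			;;
		# Anything else, such as UTF-8 with BOM or UTF-16.
		*)
			echo "unsupported-encoding"
			;;
	esac
}

classify_file_description "ASCII text, with CRLF line terminators"
classify_file_description "very short file (no magic)"
classify_file_description "Unicode text, UTF-8 (with BOM) text"
```

Note that `UTF-8 (with BOM) text` does not contain the substring `UTF-8 text`, which is why the BOM case falls through to the catch-all branch.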
1 parent 9ca9559 commit 94209ab

4 files changed: +19 -14 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 ## [Unreleased]
 
 - Updated uv from 0.9.7 to 0.9.9. ([#1961](https://github.com/heroku/heroku-buildpack-python/pull/1961))
+- Improved the error message shown for `.python-version` files that contain unexpected ASCII control code characters. ([#1962](https://github.com/heroku/heroku-buildpack-python/pull/1962))
+- Fixed Bash command substitution warnings from being shown if `runtime.txt` contains null byte characters. ([#1962](https://github.com/heroku/heroku-buildpack-python/pull/1962))
 
 ## [v318] - 2025-11-12
 

lib/python_version.sh

Lines changed: 15 additions & 12 deletions
@@ -152,28 +152,28 @@ function python_version::parse_python_version_file() {
 continue
 fi
 
-# If we didn't find a valid Python version string, we check the file encoding so that we
+# If we didn't find a valid Python version string, we check the text encoding so that we
 # can display a more helpful error message if it turns out that the version was valid but
-# that the file was just saved in the wrong encoding.
+# that the file was just saved in the wrong encoding (such as UTF-8 with BOM or UTF-16).
 #
-# Example valid values:
+# Example values `file` can return:
 # `ASCII text`
 # `ASCII text, with CRLF line terminators`
 # `ASCII text, with no line terminators`
 # `Unicode text, UTF-8 text`
-#
-# Example invalid values:
 # `Unicode text, UTF-8 (with BOM) text`
 # `Unicode text, UTF-16, little-endian text, with CRLF line terminators`
-# `data` (for example when NUL or CTRL characters found)
+# `data` (such as when the file contains a NUL or other control code characters)
+# `very short file (no magic)` (such as when the file contains a single ESC character)
 #
-# Note: File can also return `very short file (no magic)` (eg a file that contains just a newline)
-# and `empty`, but we won't see those here since we're iterating over trimmed lines.
+# Note: File can also return `empty` but in that case we wouldn't be iterating over found lines.
 local file_encoding
-file_encoding="$(file --brief --dereference "${python_version_file_path}")"
+# We exclude some file type tests to avoid false positives, since we only need the encoding.
+file_encoding="$(file --brief --dereference --exclude json --exclude soft "${python_version_file_path}")"
 
 case "${file_encoding}" in
-*"ASCII text"* | *"UTF-8 text"*)
+# Cases where the text encoding isn't the issue, and so the version itself must be invalid.
+*"ASCII text"* | *"UTF-8 text"* | *"very short file"* | "data")
 # Replace everything but printable ASCII, spaces and tabs with the Unicode replacement
 # character, so any invisible unwanted characters (such as ASCII control codes or the
 # Unicode zero width space character) are visible in the error message.
@@ -211,19 +211,20 @@ function python_version::parse_python_version_file() {
 build_data::set_string "failure_detail" "${version:0:100}"
 exit 1
 ;;
+# Unsupported text encodings such as UTF-8 with BOM or UTF-16.
 *)
 output::error <<-EOF
 Error: Unable to read .python-version.
 
 Your .python-version file couldn't be read because it's using
-an unsupported file encoding:
+an unsupported text encoding:
 ${file_encoding}
 
 Configure your editor to save files as UTF-8, without a BOM,
 then delete and recreate the file using the correct encoding.
 
 If that doesn't work, make sure you don't have a .gitattributes
-file that's overriding the file encoding.
+file that's overriding the text encoding.
 
 Note: On Windows, if you pipe or redirect output to a file
 it can result in the file being encoded in UTF-16 LE when
@@ -290,12 +291,14 @@ function python_version::parse_python_version_file() {
 
 # Outputs all populated (non-empty and not commented with '#') lines from the passed file,
 # with leading/trailing whitespace (including Unicode whitespace) trimmed from each line.
+# We replace any NUL characters with a placeholder since Bash variables can't store them.
 function python_version::read_trimmed_version_lines() {
 local file="${1}"
 LC_ALL=C.UTF-8 sed \
 --regexp-extended \
 --expression 's/^[[:space:]]+//' \
 --expression 's/[[:space:]]+$//' \
+--expression 's/\x0/␀/' \
 --expression '/^(#|$)/d' \
 "${file}"
 }
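
As a rough sketch of the NUL substitution added to `python_version::read_trimmed_version_lines`, the standalone script below replaces NUL bytes with a visible placeholder before the output is captured by a command substitution. The helper name `replace_nul_bytes` and the sample file are hypothetical. The script assumes GNU sed (for the `\x00` escape and long options) and, unlike the expression in the diff, adds the `g` flag so that every NUL on a line is replaced, not only the first.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the buildpack): prints the contents of a file with
# every NUL byte replaced by the visible "␀" placeholder. Assumes GNU sed.
replace_nul_bytes() {
	local file="${1}"
	LC_ALL=C.UTF-8 sed --expression 's/\x00/␀/g' "${file}"
}

# Create a sample file containing a version string followed by a NUL byte.
sample_file="$(mktemp)"
printf '3.13\0\n' >"${sample_file}"

# Since the NUL has already been substituted by sed, the command substitution
# receives no null bytes and Bash has nothing to warn about.
sanitised_contents="$(replace_nul_bytes "${sample_file}")"
printf '%s\n' "${sanitised_contents}"

rm -f "${sample_file}"
```

Doing the substitution inside `sed` keeps the fix next to the existing whitespace trimming, and the placeholder stays visible if the value later appears in an error message.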
2 Bytes
Binary file not shown.

spec/hatchet/python_version_spec.rb

Lines changed: 2 additions & 2 deletions
@@ -365,14 +365,14 @@
 remote: ! Error: Unable to read .python-version.
 remote: !
 remote: ! Your .python-version file couldn't be read because it's using
-remote: ! an unsupported file encoding:
+remote: ! an unsupported text encoding:
 remote: ! Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
 remote: !
 remote: ! Configure your editor to save files as UTF-8, without a BOM,
 remote: ! then delete and recreate the file using the correct encoding.
 remote: !
 remote: ! If that doesn't work, make sure you don't have a .gitattributes
-remote: ! file that's overriding the file encoding.
+remote: ! file that's overriding the text encoding.
 remote: !
 remote: ! Note: On Windows, if you pipe or redirect output to a file
 remote: ! it can result in the file being encoded in UTF-16 LE when
