@edmorley
Member

To help work out which file encodings are present in the wild, in order to improve the UX for file-parsing errors that are believed to be due to encoding issues.

The `file --brief` command will return values like:

- `ASCII text`
- `Unicode text, UTF-8 (with BOM) text`
- `Unicode text, UTF-16, little-endian text, with CRLF line terminators`
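
For illustration only, here is a rough sketch of how descriptions like the ones above could be collected and grouped into coarse buckets. This is not the code from this PR: the function names `describe_file` and `encoding_bucket`, and the bucket labels, are hypothetical and chosen just for the example.

```python
import subprocess


def describe_file(path: str) -> str:
    """Return the one-line description printed by `file --brief` for `path`."""
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def encoding_bucket(description: str) -> str:
    """Map a `file --brief` description to a coarse label (labels are hypothetical)."""
    # Check the BOM variant first, since it also contains the substring "UTF-8".
    if "UTF-8 (with BOM)" in description:
        return "utf-8-bom"
    if "UTF-16" in description:
        return "utf-16"
    if "UTF-8" in description:
        return "utf-8"
    if description.startswith("ASCII"):
        return "ascii"
    return "other"
```

Calling `encoding_bucket(describe_file("requirements.txt"))` would give a single label per file (the filename here is only an example). Grouping into buckets like this is one possible design; recording the raw description string instead would keep details such as the line terminator type.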

These metrics will be removed later.

See:
https://manpages.ubuntu.com/manpages/noble/en/man1/file.1.html

GUS-W-20049083.

@edmorley edmorley self-assigned this Oct 27, 2025
@edmorley edmorley marked this pull request as ready for review October 27, 2025 11:46
@edmorley edmorley requested a review from a team as a code owner October 27, 2025 11:46
@edmorley edmorley enabled auto-merge (squash) October 27, 2025 11:47
@edmorley edmorley merged commit d134601 into main Oct 27, 2025
5 of 6 checks passed
@edmorley edmorley deleted the file-encoding branch October 27, 2025 11:55
@heroku-linguist heroku-linguist bot mentioned this pull request Oct 27, 2025