Skip to content

Handling of BOM (byte order mark) in source files #4353

@squiddy

Description

@squiddy

Feature

PEP 263 specifies that

To aid with platforms such as Windows, which add Unicode BOM marks to the beginning of Unicode files, the UTF-8 signature \xef\xbb\xbf will be interpreted as ‘utf-8’ encoding as well (even if no magic encoding comment is given).

However, RustPython currently fails on files that include a BOM.

$ printf '\xEF\xBB\xBF' > f.py
$ cargo run f.py
SyntaxError: Got unexpected token  at line 1 column 1

^

Python Documentation

Not familiar with CPython source code, but some pointers:

https://github.com/python/cpython/blob/3.11/Lib/tokenize.py#L299 (detect_encoding)
https://github.com/python/cpython/blob/3.11/Lib/test/test_importlib/source/test_source_encoding.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions