fix: use errors='replace' in Frame.__str__() for partial UTF-8 frames (fixes #1695) #1704

Open
naarob wants to merge 1 commit into python-websockets:main from naarob:main
naarob commented Mar 26, 2026

Fixes UnicodeDecodeError when DEBUG logging is enabled and a large text message is fragmented at byte boundaries. See issue #1695 for full details.

data = repr(bytes(self.data).decode(errors="replace"))

Adds 9 new tests. The 79 upstream tests still pass, with 0 regressions.

…python-websockets#1695)

Frame.__str__() decoded OP_TEXT frame data with a bare .decode(), which
raises UnicodeDecodeError when the frame ends in the middle of a multi-byte
UTF-8 sequence. This happens when the websockets library itself fragments a
large text message at byte boundaries (not at character boundaries) into
continuation frames (fin=False), e.g. for Japanese, Chinese, or emoji text.
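
The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR: the sample text and the split offset are assumptions chosen so that the cut falls inside a three-byte character.

```python
# Hypothetical payload and split point, chosen only for illustration.
text = "こんにちは"                  # 5 characters, 3 bytes each in UTF-8
payload = text.encode("utf-8")      # 15 bytes in total

# Split at byte offset 4, which falls inside the second character.
first, rest = payload[:4], payload[4:]

try:
    first.decode()                  # strict decoding, like a bare .decode()
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False               # the fragment alone is not valid UTF-8

# Joining the fragments gives back the original, valid message.
reassembled = (first + rest).decode()
```

Each fragment can be invalid on its own even though the reassembled message is valid, which is why decoding a single frame strictly for logging is unsafe.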

When DEBUG logging is enabled, the UnicodeDecodeError propagated and caused
the connection to close with code 1007 (INVALID_DATA), even though the
message was valid. The data itself was fine; only the logging was broken.

Fix: add errors='replace' to the .decode() call in Frame.__str__().
This replaces incomplete sequences with U+FFFD (replacement character),
making the log entry human-readable while never crashing the connection.
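
The effect of errors='replace' can be sketched in isolation. The helper name `format_text_payload` below is hypothetical and is not part of websockets; it only wraps the expression quoted in the PR description so that the behavior can be checked without the library.

```python
def format_text_payload(data: bytes) -> str:
    """Hypothetical stand-in for the changed line in Frame.__str__()."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return repr(bytes(data).decode(errors="replace"))


payload = "こんにちは".encode("utf-8")

# A fragment that ends one byte into the second character.
partial = payload[:4]
decoded_partial = partial.decode(errors="replace")

# Complete payloads are unaffected by the error handler.
shown_complete = format_text_payload(payload)
shown_partial = format_text_payload(partial)
```

Since the replacement only affects the string built for the log entry, the frame's payload bytes are left untouched.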

Tests: 9 new tests covering partial Japanese, partial emoji, complete frames,
ASCII, binary, and ping frames. 79 upstream tests unchanged.