How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.39.2
Issue
The strip_string function isn't working properly.
Here we calculate the size of the string in bytes as length. But once we determine that the string needs trimming, we slice it by characters (code points) instead of by bytes. As a result, we may also report the wrong number in the metadata.
from sentry_sdk.utils import strip_string
strip_string("éê", 2) # == AnnotatedValue(value="éê", ...)
Both é and ê take two bytes each in UTF-8, so the string "éê" is 4 bytes long. Yet strip_string will not strip it to two bytes.
- It'll get encoded into bytes here.
- The size of the encoded version is 4, so length will be set to 4.
- This check will be True, because 4 > 2.
- But when we actually try to trim here, we're trimming the string "éê" to two (characters/code points), as opposed to the encoded bytes representation.
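The steps above can be reproduced without the SDK. This is only a rough sketch of the mismatch, not the SDK's actual code: the variable names are made up for illustration, and the slice ignores any room the real function may reserve for an ellipsis.

```python
text = "éê"
max_bytes = 2  # the limit passed to strip_string in the example above

# Step 1 and 2: the size is measured on the UTF-8 encoded form.
encoded = text.encode("utf-8")
byte_length = len(encoded)   # 4 bytes
char_length = len(text)      # 2 code points

# Step 3: the size check compares bytes against the limit.
needs_trimming = byte_length > max_bytes  # True

# Step 4: slicing a str counts code points, not bytes,
# so a slice of two code points removes nothing here.
sliced_by_chars = text[:max_bytes]
sliced_size = len(sliced_by_chars.encode("utf-8"))  # still 4 bytes

print(byte_length, char_length, needs_trimming, sliced_by_chars, sliced_size)
```

The check and the slice use different units, which is why the result can still exceed the byte limit.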
Solution
Probably something to the effect of
string.encode("utf-8")[: max_bytes - 3].decode("utf-8", errors="ignore")
The [: max_bytes - 3] slice might end up cutting a multi-byte code point in two; .decode with errors="ignore" will drop the resulting incomplete bytes instead of raising an error.
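A minimal sketch of how that could look as a standalone helper. The name truncate_utf8 and its suffix parameter are hypothetical and not part of sentry_sdk; the real fix would also need to update the metadata that strip_string attaches.

```python
def truncate_utf8(text: str, max_bytes: int, suffix: str = "...") -> str:
    """Trim text so its UTF-8 encoding fits in max_bytes, suffix included.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        # Already within the byte limit, nothing to trim.
        return text

    # Reserve room for the suffix, never going below zero.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)

    # Slice the bytes, then drop any partial code point left at the end.
    trimmed = encoded[:budget].decode("utf-8", errors="ignore")
    return trimmed + suffix


print(truncate_utf8("ééééé", 6))        # 10 bytes in, at most 6 bytes out
print(truncate_utf8("hello world", 8))
```

Note that if max_bytes is smaller than the suffix itself, this sketch still returns the bare suffix and so exceeds the limit; how to handle that case is a design decision left open here.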