bpo-34222: Lib/email: Fix infinite loop when folding #8990
Xiami2012 wants to merge 2 commits into python:master from Xiami2012:fix-bpo-34222
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it!
Currently, when folding headers with length > maxlen, _fold_as_ew tries to split to_encode into multiple parts to stay within the maxlen limit, but it does so in an inappropriate way.

If a long header contains non-ASCII characters, in some situations (e.g. a Subject: consisting entirely of CJK characters), it splits to_encode into ["", to_encode]. The first part is empty, so no progress is made and the function enters an infinite loop.

This commit fixes the problem by introducing a smarter way to split. In addition, when a header needs to be folded, every line except the last now tries to get as close to maxlen as possible, and the split point is found in O(log N) time.

It also passes the previously missing charset= parameter to _ew.encode.

The bug was introduced in commit 85d5c18
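The splitting idea described above can be sketched as a binary search over prefix lengths. This is only an illustration under stated assumptions, not the code from this PR: the helper names `encoded_len`, `longest_fitting_prefix` and `split_for_folding` are hypothetical, the encoding is pinned to base64 (`encoding="b"`) so that the encoded length never shrinks as the prefix grows, and line-continuation details of real header folding are ignored.

```python
# Private stdlib module; the header folding code refers to it as _ew.
from email import _encoded_words as _ew


def encoded_len(text, charset="utf-8"):
    # Assumption: base64 is forced here so the length is monotonic in len(text).
    return len(_ew.encode(text, charset=charset, encoding="b"))


def longest_fitting_prefix(to_encode, remaining_space, charset="utf-8"):
    """Return the largest x with encoded_len(to_encode[:x]) <= remaining_space.

    Returns 0 when not even a single character fits.
    """
    lo, hi = 0, len(to_encode)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if encoded_len(to_encode[:mid], charset) <= remaining_space:
            lo = mid
        else:
            hi = mid - 1
    return lo


def split_for_folding(to_encode, maxlen, charset="utf-8"):
    """Split text into encoded words, each at most maxlen characters long."""
    parts = []
    while to_encode:
        x = longest_fitting_prefix(to_encode, maxlen, charset)
        if x == 0:
            # An empty first part would repeat forever, so fail loudly instead.
            raise ValueError("maxlen is too small to hold any encoded character")
        parts.append(_ew.encode(to_encode[:x], charset=charset, encoding="b"))
        to_encode = to_encode[x:]
    return parts
```

Each call to `longest_fitting_prefix` performs O(log N) encode operations, and the explicit check for `x == 0` is what rules out the `["", to_encode]` split that caused the loop.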
if len(ew) > remaining_space:
    # Find the longest first_part
    # since len(_ew.encode(to_encode[:x])) is a non-linear
    # monotonically increasing function, and calculating the
_ew.encode is biased towards the 'q' encoding. This might violate the assumption of a monotonically increasing function for some corner cases. (This was already the case for the old code.)
I hope to find the time to write a test case for this.
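A possible starting point for such a test case is a brute-force scan that looks for prefix lengths at which the encoded length goes down. This is a sketch only: `find_monotonicity_violations` is a hypothetical helper, and it relies on `_ew.encode` choosing between 'q' and 'b' by itself when no `encoding` argument is given.

```python
# Private stdlib module; the header folding code refers to it as _ew.
from email import _encoded_words as _ew


def find_monotonicity_violations(text, charset="utf-8"):
    """Return (x, previous_len, current_len) for each prefix length x at which
    len(_ew.encode(text[:x])) is smaller than it was for text[:x - 1]."""
    violations = []
    prev = len(_ew.encode("", charset=charset))
    for x in range(1, len(text) + 1):
        # encoding is left unset so the built-in bias towards 'q' applies.
        cur = len(_ew.encode(text[:x], charset=charset))
        if cur < prev:
            violations.append((x, prev, cur))
        prev = cur
    return violations
```

An empty result for a given input means the monotonicity assumption held for every prefix of that input; any tuple returned marks a point where a binary search over prefix lengths could pick a suboptimal split.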
Thank you for the contribution. This was fixed in GH-12020, so I'm closing this as a duplicate.
https://bugs.python.org/issue34222