annotate roundup/anypy/email_.py @ 7752:b2dbab2b34bc
fix(refactor): multiple fixups using ruff linter; more testing.

Converted to using the ruff linter and its rulesets. Fixed a number of issues.

admin.py:

- sorted imports
- used immutable tuples as default value markers for parameters where a None value is valid
- reduced some loops to list comprehensions for performance
- used ternary expressions to simplify some if statements
- named some variables to make them less magic (e.g. _default_savepoint_setting = 1000)
- fixed some tests for argument counts: < 2 becomes != 2, so 3 arguments is an error
- moved exception handlers outside of loops for performance where the exception handler will abort the loop anyway
- renamed variables called 'id' or 'dir' as they shadow builtin functions
- fixed translations of the form _("string %s" % value) -> _("string %s") % value so the translation is looked up with the key before substitution
- ended dicts and tuples with a trailing comma to reduce missing-comma errors if modified
- simplified sorted(list(self.setting.keys())) to sorted(self.setting.keys()) as sorted consumes the whole list
- in if conditions, put the compared variable on the left and the threshold condition on the right (no yoda conditions)
- multiple noqa: suppressions
- removed unneeded noqa comments as the lint rulesets are a bit different
- do_get - refactored output printing logic: use a fast return if no special formatting is requested; use isinstance with a tuple rather than two isinstance calls; cleaned up flow and removed comments on the algorithm as it can be easily read from the code
- do_filter, do_find - refactored output printing logic to reduce duplicate code
- do_find - renamed the variable 'value' that was set inside a loop; the loop index variable was also named 'value'
- do_pragma - added a hint to use the list subcommand if the setting was not found; replaced the condition 'type(x) is bool' with 'isinstance(x, bool)' for various types
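Three of the patterns listed above can be sketched in a few lines. This is only an illustration, not code from admin.py: the names `_UNSET`, `describe` and `is_sequence` are hypothetical, and `gettext.NullTranslations` stands in for a real message catalogue.

```python
import gettext

# Stand-in for a real catalogue: NullTranslations returns the key unchanged.
_ = gettext.NullTranslations().gettext

# Hypothetical sentinel: an immutable tuple used as the "no argument given"
# marker, so that None remains a legitimate value for the parameter.
_UNSET = ("unset",)


def describe(value=_UNSET):
    """Return a message, distinguishing 'not passed' from an explicit None."""
    if value is _UNSET:
        return _("no value given")
    # Look up the untranslated template first, then substitute the value.
    # Writing _("value is %s" % value) would look up the already formatted
    # string, which no catalogue contains.
    return _("value is %s") % (value,)


def is_sequence(x):
    """One isinstance call with a tuple of types instead of two calls."""
    return isinstance(x, (list, tuple))
```

With these definitions, `describe()` reports that no value was given, while `describe(None)` formats None like any other value.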
test_admin.py:

- added testing for do_list
- better test coverage for do_get, including -S and -d for multilinks and the error case for -d with a non-link
- better testing for do_find, including all output modes
- better testing for do_filter, including all output modes
- fixed expected output for do_pragma, which now includes a hint to use pragma list if the setting is not found

| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Fri, 01 Mar 2024 14:53:18 -0500 |
| parents | f21ec1414591 |
| children | 609c5fd638e8 |

Changesets referenced in the annotation below:

| rev | changeset | description | author | parents |
|---|---|---|---|---|
| 4575 | c426cb251bc7 | Be more tolerant when parsing RFC2047 encoded mail headers. | Ralf Schlatterbeck <rsc@runtux.com> | 4447 |
| 4979 | f1a2bd1dea77 | issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_. | Bernhard Reiter <bernhard@intevation.de> | 4575 |
| 5090 | 89c2c1a88927 | issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7 | John Rouillard <rouilj@ieee.org> | 4983 |
| 5238 | 758edaa61ec0 | pylint flagged HeaderParseError as an Undefined variable. | John Rouillard <rouilj@ieee.org> | 5090 |
| 5494 | b7fa56ced601 | use gpg module instead of pyme module for PGP encryption | Christof Meerwald <cmeerw@cmeerw.org> | 5421 |
| 5542 | 29346d92d80c | Fix email interfaces with Python 3 (issue 2550974, issue 2551000). | Joseph Myers <jsm@polyomino.org.uk> | 5494 |
| 5761 | cacef71b3a54 | working branch for fixing https://issues.roundup-tracker.org/issue2551008 | John Rouillard <rouilj@ieee.org> | 5542 |
| 5764 | 611737bc7261 | Handle the issue in _decode_header by decoding only when decode_header returns bytes. | Ezio Melotti <ezio.melotti@gmail.com> | 5761 |
| 6022 | 70a6ebad4df4 | flake8 fix: rm unused sys import; formatting fixes. | John Rouillard <rouilj@ieee.org> | 5764 |
| 6278 | f21ec1414591 | issue2551092 - fix crash due to use of string not bytes under py3 | John Rouillard <rouilj@ieee.org> | 6022 |

| rev | line source |
|---|---|
| 4575 | `import re` |
| 4575 | `import binascii` |
| 4979 | `import email` |
| 4575 | `from email import quoprimime, base64mime` |
| 5761 | `from email import charset as _charset` |
| 5761 | |
| 5494 | `if str == bytes:` |
| 5494 | `    message_from_bytes = email.message_from_string` |
| 5542 | `    message_from_binary_file = email.message_from_file` |
| 5494 | `else:` |
| 5494 | `    message_from_bytes = email.message_from_bytes` |
| 5542 | `    message_from_binary_file = email.message_from_binary_file` |
| 5494 | |
| 4979 | `## please import this file if you are using the email module` |
| 4979 | |
| 4575 | `# Match encoded-word strings in the form =?charset?q?Hello_World?=` |
| 4575 | `ecre = re.compile(r'''` |
| 4575 | `  =\?                   # literal =?` |
| 4575 | `  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset` |
| 4575 | `  \?                    # literal ?` |
| 4575 | `  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive` |
| 4575 | `  \?                    # literal ?` |
| 4575 | `  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string` |
| 4575 | `  \?=                   # literal ?=` |
| 4575 | `  ''', re.VERBOSE \| re.IGNORECASE \| re.MULTILINE)` |
| 4575 | |
| 4575 | |
| 4575 | `# Fixed header parser, see my proposed patch and discussions:` |
| 4575 | `# http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"` |
| 4575 | `# http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"` |
| 4575 | `# This implements the decode_header specific parts of my proposed patch` |
| 4575 | `# backported to python2.X` |
| 4575 | `def decode_header(header):` |
| 4575 | `    """Decode a message header value without converting charset.` |
| 4575 | |
| 4575 | `    Returns a list of (string, charset) pairs containing each of the decoded` |
| 4575 | `    parts of the header.  Charset is None for non-encoded parts of the header,` |
| 4575 | `    otherwise a lower-case string containing the name of the character set` |
| 4575 | `    specified in the encoded string.` |
| 4575 | |
| 4575 | `    header may be a string that may or may not contain RFC2047 encoded words,` |
| 4575 | `    or it may be a Header object.` |
| 4575 | |
| 4575 | `    An email.errors.HeaderParseError may be raised when certain decoding error` |
| 4575 | `    occurs (e.g. a base64 decoding exception).` |
| 4575 | `    """` |
| 4575 | `    # If it is a Header object, we can just return the encoded chunks.` |
| 4575 | `    if hasattr(header, '_chunks'):` |
| 4575 | `        return [(_charset._encode(string, str(charset)), str(charset))` |
| 6022 | `                for string, charset in header._chunks]` |
| 4575 | `    # If no encoding, just return the header with no charset.` |
| 4575 | `    if not ecre.search(header):` |
| 5764 | `        return [(header, None)]` |
| 4575 | `    # First step is to parse all the encoded parts into triplets of the form` |
| 4575 | `    # (encoded_string, encoding, charset).  For unencoded strings, the last` |
| 4575 | `    # two parts will be None.` |
| 4575 | `    words = []` |
| 4575 | `    for line in header.splitlines():` |
| 4575 | `        parts = ecre.split(line)` |
| 4575 | `        first = True` |
| 4575 | `        while parts:` |
| 4575 | `            unencoded = parts.pop(0)` |
| 4575 | `            if first:` |
| 4575 | `                unencoded = unencoded.lstrip()` |
| 4575 | `                first = False` |
| 4575 | `            if unencoded:` |
| 4575 | `                words.append((unencoded, None, None))` |
| 4575 | `            if parts:` |
| 4575 | `                charset = parts.pop(0).lower()` |
| 4575 | `                encoding = parts.pop(0).lower()` |
| 4575 | `                encoded = parts.pop(0)` |
| 4575 | `                words.append((encoded, encoding, charset))` |
| 4575 | `    # Now loop over words and remove words that consist of whitespace` |
| 4575 | `    # between two encoded strings.` |
| 4575 | `    droplist = []` |
| 4575 | `    for n, w in enumerate(words):` |
| 6022 | `        if n > 1 and w[1] and words[n-2][1] and words[n-1][0].isspace():` |
| 4575 | `            droplist.append(n-1)` |
| 4575 | `    for d in reversed(droplist):` |
| 4575 | `        del words[d]` |
| 4575 | |
| 4575 | `    # The next step is to decode each encoded word by applying the reverse` |
| 4575 | `    # base64 or quopri transformation.  decoded_words is now a list of the` |
| 4575 | `    # form (decoded_word, charset).` |
| 4575 | `    decoded_words = []` |
| 4575 | `    for encoded_string, encoding, charset in words:` |
| 4575 | `        if encoding is None:` |
| 4575 | `            # This is an unencoded word.` |
| 4575 | `            decoded_words.append((encoded_string, charset))` |
| 4575 | `        elif encoding == 'q':` |
| 4575 | `            word = quoprimime.header_decode(encoded_string)` |
| 4575 | `            decoded_words.append((word, charset))` |
| 4575 | `        elif encoding == 'b':` |
| 6022 | `            # Postel's law: add missing padding` |
| 6022 | `            paderr = len(encoded_string) % 4` |
| 4575 | `            if paderr:` |
| 4575 | `                encoded_string += '==='[:4 - paderr]` |
| 4575 | `            try:` |
| 4575 | `                word = base64mime.decode(encoded_string)` |
| 4575 | `            except binascii.Error:` |
| 5238 | `                raise email.errors.HeaderParseError('Base64 decoding error')` |
| 4575 | `            else:` |
| 4575 | `                decoded_words.append((word, charset))` |
| 4575 | `        else:` |
| 4575 | `            raise AssertionError('Unexpected encoding: ' + encoding)` |
| 4575 | `    # Now convert all words to bytes and collapse consecutive runs of` |
| 4575 | `    # similarly encoded words.` |
| 4575 | `    collapsed = []` |
| 4575 | `    last_word = last_charset = None` |
| 4575 | `    for word, charset in decoded_words:` |
| 6278 | `        if isinstance(word, str) and bytes != str:` |
| 6278 | `            word = bytes(word, 'raw-unicode-escape')` |
| 4575 | `        if last_word is None:` |
| 4575 | `            last_word = word` |
| 4575 | `            last_charset = charset` |
| 4575 | `        elif charset != last_charset:` |
| 5764 | `            collapsed.append((last_word, last_charset))` |
| 4575 | `            last_word = word` |
| 4575 | `            last_charset = charset` |
| 4575 | `        elif last_charset is None:` |
| 5090 | `            BSPACE = b' '` |
| 4575 | `            last_word += BSPACE + word` |
| 4575 | `        else:` |
| 4575 | `            last_word += word` |
| 5764 | `    collapsed.append((last_word, last_charset))` |
| 4575 | `    return collapsed` |
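
Two steps of decode_header in the listing above are easier to follow with concrete input: splitting a line with the encoded-word pattern, and repairing missing base64 padding. The sketch below is a standalone illustration under some assumptions. It repeats the pattern so it runs on its own, `pad_base64` is a hypothetical helper name, and it uses the standard library's `base64.b64decode` where the listing uses `base64mime.decode`.

```python
import base64
import re

# Same pattern as ecre in the listing, repeated so the sketch is standalone.
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # charset name
  \?                    # literal ?
  (?P<encoding>[qb])    # "q" or "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # encoded text
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


def pad_base64(encoded):
    """Append the '=' padding that a sloppy sender left off."""
    paderr = len(encoded) % 4
    if paderr:
        encoded += '==='[:4 - paderr]
    return encoded


# split() with capturing groups interleaves plain text with the three
# captured groups of each encoded word:
# [text, charset, encoding, encoded, text, ...]
parts = ecre.split('Re: =?utf-8?b?SGVsbG8?= world')

# 'SGVsbG8' lacks its trailing '='; padding makes it decodable.
decoded = base64.b64decode(pad_base64(parts[3]))
```

Here `parts` is `['Re: ', 'utf-8', 'b', 'SGVsbG8', ' world']` and `decoded` is `b'Hello'`. The loop in decode_header pops items off such a list four at a time: one unencoded run followed by the charset, encoding and encoded text of the next encoded word.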
