annotate roundup/anypy/email_.py @ 7752:b2dbab2b34bc

fix(refactor): multiple fixups using ruff linter; more testing.

Converting to using the ruff linter and its rulesets. Fixed a number of issues.

admin.py:

  sort imports

  use immutable tuples as default value markers for parameters where a None value is valid.

  reduced some loops to list comprehensions for performance

  used ternary to simplify some if statements

  named some variables to make them less magic (e.g. _default_savepoint_setting = 1000)

  fixed some tests for argument counts < 2 becomes != 2 so 3 is an error.

  moved exception handlers outside of loops for performance where exception handler will abort loop anyway.

  renamed variables called 'id' or 'dir' as they shadow builtin commands.

  fix translations of form _("string %s" % value) -> _("string %s") % value so translation will be looked up with the key before substitution.

  end dicts, tuples with a trailing comma to reduce missing comma errors if modified

  simplified sorted(list(self.setting.keys())) to sorted(self.setting.keys()) as sorted consumes whole list.

  in if conditions put compared variable on left and threshold condition on right. (no yoda conditions)

  multiple noqa: suppression

  removed unneeded noqa as lint rulesets are a bit different

  do_get - refactor output printing logic: Use fast return if not special formatting is requested; use isinstance with a tuple rather than two isinstance calls; cleaned up flow and removed comments on algorithm as it can be easily read from the code.

  do_filter, do_find - refactor output printing logic. Reduce duplicate code.

  do_find - renamed variable 'value' that was set inside a loop. The loop index variable was also named 'value'.

  do_pragma - added hint to use list subcommand if setting was not found.

  Replaced condition 'type(x) is bool' with 'isinstance(x, bool)' for various types.

test_admin.py

  added testing for do_list

  better test coverage for do_get includes: -S and -d for multilinks, error case for -d with non-link.

  better testing for do_find including all output modes

  better testing for do_filter including all output modes

  fixed expected output for do_pragma that now includes hint to use pragma list if setting not found.
author John Rouillard <rouilj@ieee.org>
date Fri, 01 Mar 2024 14:53:18 -0500
parents f21ec1414591
children 609c5fd638e8
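
The translation fix named in the commit message can be illustrated with a small sketch. The catalog and the `_` function below are hypothetical stand-ins for a gettext-style lookup, not Roundup's actual i18n code; the point is only that the lookup key must be the unsubstituted template.

```python
# Hypothetical one-entry message catalog standing in for a gettext lookup.
CATALOG = {"Unknown setting %s": "Unbekannte Einstellung %s"}


def _(msgid):
    """Return the translation for msgid, or msgid itself when none exists."""
    return CATALOG.get(msgid, msgid)


value = "savepoint_limit"

# Substituting first builds a key that is not in the catalog,
# so the untranslated text comes back.
wrong = _("Unknown setting %s" % value)

# Looking up the template first finds the translation;
# the value is substituted afterwards.
right = _("Unknown setting %s") % value

print(wrong)
print(right)
```

With the first form every distinct value would need its own catalog entry, which is why the lookup is done on the template.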
Changesets referenced below (rev, node, description, author, parents):

4575  c426cb251bc7  Be more tolerant when parsing RFC2047 encoded mail headers.
      Ralf Schlatterbeck <rsc@runtux.com>  parents: 4447
4979  f1a2bd1dea77  issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
      Bernhard Reiter <bernhard@intevation.de>  parents: 4575
5090  89c2c1a88927  issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
      John Rouillard <rouilj@ieee.org>  parents: 4983
5238  758edaa61ec0  pylint flagged HeaderParseError as an Undefined variable.
      John Rouillard <rouilj@ieee.org>  parents: 5090
5494  b7fa56ced601  use gpg module instead of pyme module for PGP encryption
      Christof Meerwald <cmeerw@cmeerw.org>  parents: 5421
5542  29346d92d80c  Fix email interfaces with Python 3 (issue 2550974, issue 2551000).
      Joseph Myers <jsm@polyomino.org.uk>  parents: 5494
5761  cacef71b3a54  working branch for fixing https://issues.roundup-tracker.org/issue2551008
      John Rouillard <rouilj@ieee.org>  parents: 5542
5764  611737bc7261  Handle the issue in _decode_header by decoding only when decode_header returns bytes.
      Ezio Melotti <ezio.melotti@gmail.com>  parents: 5761
6022  70a6ebad4df4  flake8 fix: rm unused sys import; formatting fixes.
      John Rouillard <rouilj@ieee.org>  parents: 5764
6278  f21ec1414591  issue2551092 - fix crash due to use of string not bytes under py3
      John Rouillard <rouilj@ieee.org>  parents: 6022

rev   line source
4575     1 import re
4575     2 import binascii
4979     3 import email
4575     4 from email import quoprimime, base64mime
5761     5 from email import charset as _charset
5761     6
5494     7 if str == bytes:
5494     8     message_from_bytes = email.message_from_string
5542     9     message_from_binary_file = email.message_from_file
5494    10 else:
5494    11     message_from_bytes = email.message_from_bytes
5542    12     message_from_binary_file = email.message_from_binary_file
5494    13
4979    14 ## please import this file if you are using the email module
4979    15
4575    16 # Match encoded-word strings in the form =?charset?q?Hello_World?=
4575    17 ecre = re.compile(r'''
4575    18   =\?                   # literal =?
4575    19   (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
4575    20   \?                    # literal ?
4575    21   (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
4575    22   \?                    # literal ?
4575    23   (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
4575    24   \?=                   # literal ?=
4575    25   ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
4575    26
4575    27
4575    28 # Fixed header parser, see my proposed patch and discussions:
4575    29 # http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
4575    30 # http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
4575    31 # This implements the decode_header specific parts of my proposed patch
4575    32 # backported to python2.X
4575    33 def decode_header(header):
4575    34     """Decode a message header value without converting charset.
4575    35
4575    36     Returns a list of (string, charset) pairs containing each of the decoded
4575    37     parts of the header. Charset is None for non-encoded parts of the header,
4575    38     otherwise a lower-case string containing the name of the character set
4575    39     specified in the encoded string.
4575    40
4575    41     header may be a string that may or may not contain RFC2047 encoded words,
4575    42     or it may be a Header object.
4575    43
4575    44     An email.errors.HeaderParseError may be raised when certain decoding error
4575    45     occurs (e.g. a base64 decoding exception).
4575    46     """
4575    47     # If it is a Header object, we can just return the encoded chunks.
4575    48     if hasattr(header, '_chunks'):
4575    49         return [(_charset._encode(string, str(charset)), str(charset))
6022    50                 for string, charset in header._chunks]
4575    51     # If no encoding, just return the header with no charset.
4575    52     if not ecre.search(header):
5764    53         return [(header, None)]
4575    54     # First step is to parse all the encoded parts into triplets of the form
4575    55     # (encoded_string, encoding, charset). For unencoded strings, the last
4575    56     # two parts will be None.
4575    57     words = []
4575    58     for line in header.splitlines():
4575    59         parts = ecre.split(line)
4575    60         first = True
4575    61         while parts:
4575    62             unencoded = parts.pop(0)
4575    63             if first:
4575    64                 unencoded = unencoded.lstrip()
4575    65                 first = False
4575    66             if unencoded:
4575    67                 words.append((unencoded, None, None))
4575    68             if parts:
4575    69                 charset = parts.pop(0).lower()
4575    70                 encoding = parts.pop(0).lower()
4575    71                 encoded = parts.pop(0)
4575    72                 words.append((encoded, encoding, charset))
4575    73     # Now loop over words and remove words that consist of whitespace
4575    74     # between two encoded strings.
4575    75     droplist = []
4575    76     for n, w in enumerate(words):
6022    77         if n > 1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
4575    78             droplist.append(n-1)
4575    79     for d in reversed(droplist):
4575    80         del words[d]
4575    81
4575    82     # The next step is to decode each encoded word by applying the reverse
4575    83     # base64 or quopri transformation. decoded_words is now a list of the
4575    84     # form (decoded_word, charset).
4575    85     decoded_words = []
4575    86     for encoded_string, encoding, charset in words:
4575    87         if encoding is None:
4575    88             # This is an unencoded word.
4575    89             decoded_words.append((encoded_string, charset))
4575    90         elif encoding == 'q':
4575    91             word = quoprimime.header_decode(encoded_string)
4575    92             decoded_words.append((word, charset))
4575    93         elif encoding == 'b':
6022    94             # Postel's law: add missing padding
6022    95             paderr = len(encoded_string) % 4
4575    96             if paderr:
4575    97                 encoded_string += '==='[:4 - paderr]
4575    98             try:
4575    99                 word = base64mime.decode(encoded_string)
4575   100             except binascii.Error:
5238   101                 raise email.errors.HeaderParseError('Base64 decoding error')
4575   102             else:
4575   103                 decoded_words.append((word, charset))
4575   104         else:
4575   105             raise AssertionError('Unexpected encoding: ' + encoding)
4575   106     # Now convert all words to bytes and collapse consecutive runs of
4575   107     # similarly encoded words.
4575   108     collapsed = []
4575   109     last_word = last_charset = None
4575   110     for word, charset in decoded_words:
6278   111         if isinstance(word, str) and bytes != str:
6278   112             word = bytes(word, 'raw-unicode-escape')
4575   113         if last_word is None:
4575   114             last_word = word
4575   115             last_charset = charset
4575   116         elif charset != last_charset:
5764   117             collapsed.append((last_word, last_charset))
4575   118             last_word = word
4575   119             last_charset = charset
4575   120         elif last_charset is None:
5090   121             BSPACE = b' '
4575   122             last_word += BSPACE + word
4575   123         else:
4575   124             last_word += word
5764   125     collapsed.append((last_word, last_charset))
4575   126     return collapsed
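
The way decode_header leans on re.split can be seen in isolation. The sketch below copies the ecre pattern from the listing; the sample header string is made up for illustration, and the grouping loop is a simplified version of the one in decode_header (it skips the per-line lstrip). Because the pattern has three capture groups, the split result alternates one unencoded piece with a charset, encoding, encoded triple.

```python
import re

# Same pattern as ecre in the listing.
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

# Made-up header: plain text followed by two adjacent encoded words.
header = "Re: =?utf-8?q?Hello?= =?utf-8?q?_World?="

flat = ecre.split(header)
print(flat)

# Group the flat list into (text, encoding, charset) triples.
parts = list(flat)
words = []
while parts:
    unencoded = parts.pop(0)
    if unencoded:
        words.append((unencoded, None, None))
    if parts:
        charset = parts.pop(0).lower()
        encoding = parts.pop(0).lower()
        words.append((parts.pop(0), encoding, charset))

# Whitespace sitting between two encoded words is dropped, as RFC 2047 asks.
words = [w for n, w in enumerate(words)
         if not (0 < n < len(words) - 1 and w[1] is None and w[0].isspace()
                 and words[n - 1][1] and words[n + 1][1])]
print(words)
```

The single space between the two encoded words disappears, while the plain "Re: " prefix keeps its trailing space.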

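The "Postel's law" padding step in the base64 branch can also be tried on its own. This is a minimal sketch: pad_base64 is a hypothetical helper name, and the standard library's base64.b64decode is swapped in for the email.base64mime.decode call used in the listing.

```python
import base64


def pad_base64(encoded_string):
    """Append the '=' characters needed to reach a multiple of four."""
    paderr = len(encoded_string) % 4
    if paderr:
        # Same slice trick as the listing: take 4 - paderr '=' characters.
        encoded_string += '==='[:4 - paderr]
    return encoded_string


# "SGVsbG8" is "Hello" encoded with its trailing '=' stripped.
padded = pad_base64("SGVsbG8")
decoded = base64.b64decode(padded)
print(padded, decoded)
```

Input whose length is already a multiple of four passes through unchanged, so well-formed encoded words are not affected.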
Roundup Issue Tracker: http://roundup-tracker.org/