annotate roundup/anypy/email_.py @ 7531:913a73b9fab5 2.3.0

Update for 2.3.0 release
author John Rouillard <rouilj@ieee.org>
date Wed, 12 Jul 2023 23:00:25 -0400
parents f21ec1414591
children 609c5fd638e8
rev   line source
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
  1 import re
  2 import binascii
4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
  3 import email
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
  4 from email import quoprimime, base64mime
5761
cacef71b3a54 working branch for fixing https://issues.roundup-tracker.org/issue2551008
John Rouillard <rouilj@ieee.org>
parents: 5542
  5 from email import charset as _charset
  6
5494
b7fa56ced601 use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5421
  7 if str == bytes:
  8     message_from_bytes = email.message_from_string
5542
29346d92d80c Fix email interfaces with Python 3 (issue 2550974, issue 2551000).
Joseph Myers <jsm@polyomino.org.uk>
parents: 5494
  9     message_from_binary_file = email.message_from_file
5494
b7fa56ced601 use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5421
 10 else:
 11     message_from_bytes = email.message_from_bytes
5542
29346d92d80c Fix email interfaces with Python 3 (issue 2550974, issue 2551000).
Joseph Myers <jsm@polyomino.org.uk>
parents: 5494
 12     message_from_binary_file = email.message_from_binary_file
5494
b7fa56ced601 use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5421
 13
4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
 14 ## please import this file if you are using the email module
 15
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
 16 # Match encoded-word strings in the form =?charset?q?Hello_World?=
 17 ecre = re.compile(r'''
 18   =\?                   # literal =?
 19   (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
 20   \?                    # literal ?
 21   (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
 22   \?                    # literal ?
 23   (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
 24   \?=                   # literal ?=
 25   ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
 26
 27
 28 # Fixed header parser, see my proposed patch and discussions:
 29 # http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
 30 # http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
 31 # This implements the decode_header specific parts of my proposed patch
 32 # backported to python2.X
 33 def decode_header(header):
 34     """Decode a message header value without converting charset.
 35
 36     Returns a list of (string, charset) pairs containing each of the decoded
 37     parts of the header. Charset is None for non-encoded parts of the header,
 38     otherwise a lower-case string containing the name of the character set
 39     specified in the encoded string.
 40
 41     header may be a string that may or may not contain RFC2047 encoded words,
 42     or it may be a Header object.
 43
 44     An email.errors.HeaderParseError may be raised when certain decoding
 45     errors occur (e.g. a base64 decoding exception).
 46     """
 47     # If it is a Header object, we can just return the encoded chunks.
 48     if hasattr(header, '_chunks'):
 49         return [(_charset._encode(string, str(charset)), str(charset))
6022
70a6ebad4df4 flake8 fix: rm unused sys import; formatting fixes.
John Rouillard <rouilj@ieee.org>
parents: 5764
 50                 for string, charset in header._chunks]
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
 51     # If no encoding, just return the header with no charset.
 52     if not ecre.search(header):
5764
611737bc7261 Handle the issue in _decode_header by decoding only when decode_header returns bytes.
Ezio Melotti <ezio.melotti@gmail.com>
parents: 5761
 53         return [(header, None)]
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
 54     # First step is to parse all the encoded parts into triplets of the form
 55     # (encoded_string, encoding, charset). For unencoded strings, the last
 56     # two parts will be None.
 57     words = []
 58     for line in header.splitlines():
 59         parts = ecre.split(line)
 60         first = True
 61         while parts:
 62             unencoded = parts.pop(0)
 63             if first:
 64                 unencoded = unencoded.lstrip()
 65                 first = False
 66             if unencoded:
 67                 words.append((unencoded, None, None))
 68             if parts:
 69                 charset = parts.pop(0).lower()
 70                 encoding = parts.pop(0).lower()
 71                 encoded = parts.pop(0)
 72                 words.append((encoded, encoding, charset))
 73     # Now loop over words and remove words that consist of whitespace
 74     # between two encoded strings.
 75     droplist = []
 76     for n, w in enumerate(words):
6022
70a6ebad4df4 flake8 fix: rm unused sys import; formatting fixes.
John Rouillard <rouilj@ieee.org>
parents: 5764
 77         if n > 1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
 78             droplist.append(n-1)
 79     for d in reversed(droplist):
 80         del words[d]
 81
 82     # The next step is to decode each encoded word by applying the reverse
 83     # base64 or quopri transformation. decoded_words is now a list of the
 84     # form (decoded_word, charset).
 85     decoded_words = []
 86     for encoded_string, encoding, charset in words:
 87         if encoding is None:
 88             # This is an unencoded word.
 89             decoded_words.append((encoded_string, charset))
 90         elif encoding == 'q':
 91             word = quoprimime.header_decode(encoded_string)
 92             decoded_words.append((word, charset))
 93         elif encoding == 'b':
6022
70a6ebad4df4 flake8 fix: rm unused sys import; formatting fixes.
John Rouillard <rouilj@ieee.org>
parents: 5764
 94             # Postel's law: add missing padding
 95             paderr = len(encoded_string) % 4
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
 96             if paderr:
 97                 encoded_string += '==='[:4 - paderr]
 98             try:
 99                 word = base64mime.decode(encoded_string)
100             except binascii.Error:
5238
758edaa61ec0 pylint flagged HeaderParseError as an Undefined variable.
John Rouillard <rouilj@ieee.org>
parents: 5090
101                 raise email.errors.HeaderParseError('Base64 decoding error')
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
102             else:
103                 decoded_words.append((word, charset))
104         else:
105             raise AssertionError('Unexpected encoding: ' + encoding)
106     # Now convert all words to bytes and collapse consecutive runs of
107     # similarly encoded words.
108     collapsed = []
109     last_word = last_charset = None
110     for word, charset in decoded_words:
6278
f21ec1414591 issue2551092 - fix crash due to use of string not bytes under py3
John Rouillard <rouilj@ieee.org>
parents: 6022
111         if isinstance(word, str) and bytes != str:
112             word = bytes(word, 'raw-unicode-escape')
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
113         if last_word is None:
114             last_word = word
115             last_charset = charset
116         elif charset != last_charset:
5764
611737bc7261 Handle the issue in _decode_header by decoding only when decode_header returns bytes.
Ezio Melotti <ezio.melotti@gmail.com>
parents: 5761
117             collapsed.append((last_word, last_charset))
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
118             last_word = word
119             last_charset = charset
120         elif last_charset is None:
5090
89c2c1a88927 issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
John Rouillard <rouilj@ieee.org>
parents: 4983
121             BSPACE = b' '
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
122             last_word += BSPACE + word
123         else:
124             last_word += word
5764
611737bc7261 Handle the issue in _decode_header by decoding only when decode_header returns bytes.
Ezio Melotti <ezio.melotti@gmail.com>
parents: 5761
125     collapsed.append((last_word, last_charset))
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
126     return collapsed
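
The main steps of decode_header in the listing (split on encoded words, drop whitespace between adjacent encoded words, repair base64 padding, decode) can be tried out with a standalone Python 3 sketch. This is a simplified illustration and not Roundup's code: the names ENCODED_WORD, pad_base64, decode_q, decode_words and to_text are hypothetical, a single-line header is assumed, base64.b64decode stands in for email.base64mime, and the final step that collapses runs with the same charset is left out.

```python
import base64
import binascii
import re
from email.errors import HeaderParseError

# Same pattern shape as ecre in the listing; the name is illustrative.
ENCODED_WORD = re.compile(
    r'=\?(?P<charset>[^?]*?)\?(?P<encoding>[qb])\?(?P<encoded>.*?)\?=',
    re.IGNORECASE)


def pad_base64(text):
    """Append '=' padding that a sloppy mailer may have left off."""
    paderr = len(text) % 4
    return text + '==='[:4 - paderr] if paderr else text


def decode_q(text):
    """Decode RFC 2047 'Q' text: '_' means space, '=XX' is a hex byte."""
    raw = text.replace('_', ' ').encode('ascii')
    return re.sub(rb'=([0-9A-Fa-f]{2})',
                  lambda m: bytes.fromhex(m.group(1).decode('ascii')), raw)


def decode_words(header):
    """Return a list of (bytes, charset) pairs for a single-line header."""
    # split() yields: text, charset, encoding, encoded, text, ...
    parts = ENCODED_WORD.split(header)
    words = []
    while parts:
        unencoded = parts.pop(0)
        if unencoded:
            words.append((unencoded, None, None))
        if parts:
            charset = parts.pop(0).lower()
            encoding = parts.pop(0).lower()
            encoded = parts.pop(0)
            words.append((encoded, encoding, charset))
    # Whitespace that only separates two encoded words is not content.
    droplist = [n - 1 for n, w in enumerate(words)
                if n > 1 and w[1] and words[n - 2][1]
                and words[n - 1][0].isspace()]
    for d in reversed(droplist):
        del words[d]
    result = []
    for text, encoding, charset in words:
        if encoding is None:
            result.append((text.encode('raw-unicode-escape'), None))
        elif encoding == 'q':
            result.append((decode_q(text), charset))
        else:
            try:
                decoded = base64.b64decode(pad_base64(text))
            except binascii.Error:
                raise HeaderParseError('Base64 decoding error')
            result.append((decoded, charset))
    return result


def to_text(header):
    """Join the decoded chunks into one str (ASCII assumed if no charset)."""
    return ''.join(chunk.decode(charset or 'ascii')
                   for chunk, charset in decode_words(header))


# The base64 word below is missing its trailing '='.
print(to_text('=?utf-8?b?SGVsbG8?= =?utf-8?q?_W=C3=B6rld?= plain'))
```

The space between the two encoded words is dropped, while the space encoded as '_' inside the second word and the space before the unencoded text are kept, so the call prints "Hello Wörld plain".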

Roundup Issue Tracker: http://roundup-tracker.org/