annotate roundup/anypy/email_.py @ 5852:44b6a79f4e70 maint-1.6

pyme no longer exists on PyPI. Try using the git repo of record so we can run the tests on this year-old tree. Right after this release in mid-2018, pyme was deprecated on 2018-10-16.
author John Rouillard <rouilj@ieee.org>
date Wed, 21 Aug 2019 20:59:32 -0400
parents 758edaa61ec0
children 45bfb4bf59c2
rev   source
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
import re
import binascii
4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
import email
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
from email import quoprimime, base64mime

4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
## please import this file if you are using the email module
#
# a "monkey patch" to unify the behaviour of python 2.5 2.6 2.7
# when generating header files, see http://bugs.python.org/issue1974
# and https://hg.python.org/cpython/rev/5deb27042e5a/
# can go away once the minimum requirement is python 2.7
_oldheaderinit = email.Header.Header.__init__
def _unifiedheaderinit(self, *args, **kw):
    # override continuation_ws
    kw['continuation_ws'] = ' '
    _oldheaderinit(self, *args, **kw)
email.Header.Header.__dict__['__init__'] = _unifiedheaderinit
##

4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


# Fixed header parser, see my proposed patch and discussions:
# http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
# http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
# This implements the decode_header specific parts of my proposed patch
# backported to python2.X
def decode_header(header):
    """Decode a message header value without converting charset.

    Returns a list of (string, charset) pairs containing each of the decoded
    parts of the header. Charset is None for non-encoded parts of the header,
    otherwise a lower-case string containing the name of the character set
    specified in the encoded string.

    header may be a string that may or may not contain RFC2047 encoded words,
    or it may be a Header object.

    An email.errors.HeaderParseError may be raised when certain decoding error
    occurs (e.g. a base64 decoding exception).
    """
    # If it is a Header object, we can just return the encoded chunks.
    if hasattr(header, '_chunks'):
        return [(_charset._encode(string, str(charset)), str(charset))
                for string, charset in header._chunks]
    # If no encoding, just return the header with no charset.
    if not ecre.search(header):
        return [(header, None)]
    # First step is to parse all the encoded parts into triplets of the form
    # (encoded_string, encoding, charset). For unencoded strings, the last
    # two parts will be None.
    words = []
    for line in header.splitlines():
        parts = ecre.split(line)
        first = True
        while parts:
            unencoded = parts.pop(0)
            if first:
                unencoded = unencoded.lstrip()
                first = False
            if unencoded:
                words.append((unencoded, None, None))
            if parts:
                charset = parts.pop(0).lower()
                encoding = parts.pop(0).lower()
                encoded = parts.pop(0)
                words.append((encoded, encoding, charset))
    # Now loop over words and remove words that consist of whitespace
    # between two encoded strings.
    import sys
    droplist = []
    for n, w in enumerate(words):
        if n>1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
            droplist.append(n-1)
    for d in reversed(droplist):
        del words[d]

    # The next step is to decode each encoded word by applying the reverse
    # base64 or quopri transformation. decoded_words is now a list of the
    # form (decoded_word, charset).
    decoded_words = []
    for encoded_string, encoding, charset in words:
        if encoding is None:
            # This is an unencoded word.
            decoded_words.append((encoded_string, charset))
        elif encoding == 'q':
            word = quoprimime.header_decode(encoded_string)
            decoded_words.append((word, charset))
        elif encoding == 'b':
            paderr = len(encoded_string) % 4  # Postel's law: add missing padding
            if paderr:
                encoded_string += '==='[:4 - paderr]
            try:
                word = base64mime.decode(encoded_string)
            except binascii.Error:
5238
758edaa61ec0 pylint flagged HeaderParseError as an Undefined variable.
John Rouillard <rouilj@ieee.org>
parents: 5090
                raise email.errors.HeaderParseError('Base64 decoding error')
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
            else:
                decoded_words.append((word, charset))
        else:
            raise AssertionError('Unexpected encoding: ' + encoding)
    # Now convert all words to bytes and collapse consecutive runs of
    # similarly encoded words.
    collapsed = []
    last_word = last_charset = None
    for word, charset in decoded_words:
        if isinstance(word, str):
            pass
        if last_word is None:
            last_word = word
            last_charset = charset
        elif charset != last_charset:
            collapsed.append((last_word, last_charset))
            last_word = word
            last_charset = charset
        elif last_charset is None:
5090
89c2c1a88927 issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
John Rouillard <rouilj@ieee.org>
parents: 4983
            BSPACE = b' '
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
            last_word += BSPACE + word
        else:
            last_word += word
    collapsed.append((last_word, last_charset))
    return collapsed

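
The splitting and whitespace-dropping steps of decode_header can be tried in isolation. The following is a minimal Python 3 sketch and not part of the Roundup tree: it re-declares the ecre pattern from the listing so that it runs standalone, and the helper names split_words and drop_interword_whitespace, as well as the sample header, are assumptions made for illustration.

```python
import re

# Same pattern as ecre in the listing, re-declared so the sketch is standalone.
ECRE = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


def split_words(line):
    """Hypothetical helper: (text, encoding, charset) triplets for one line."""
    # With three capture groups, split() yields
    # [unencoded, charset, encoding, encoded, unencoded, ...].
    parts = ECRE.split(line)
    words = []
    first = True
    while parts:
        unencoded = parts.pop(0)
        if first:
            unencoded = unencoded.lstrip()
            first = False
        if unencoded:
            words.append((unencoded, None, None))
        if parts:
            charset = parts.pop(0).lower()
            encoding = parts.pop(0).lower()
            encoded = parts.pop(0)
            words.append((encoded, encoding, charset))
    return words


def drop_interword_whitespace(words):
    """Hypothetical helper: remove whitespace-only words that sit between
    two encoded words. Works on a copy and returns it."""
    words = list(words)
    droplist = [n - 1 for n, w in enumerate(words)
                if n > 1 and w[1] and words[n - 2][1]
                and words[n - 1][0].isspace()]
    for d in reversed(droplist):
        del words[d]
    return words


raw = split_words('Re: =?UTF-8?Q?Gr=C3=BC=C3=9Fe?= =?utf-8?q?_aus?= Wien')
cleaned = drop_interword_whitespace(raw)
for word in cleaned:
    print(word)
```

In this sample the single space between the two encoded words is dropped, while the unencoded text before and after them is kept. This matches RFC 2047, which says that whitespace separating adjacent encoded words is ignored on display.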

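The "Postel's law" padding repair in the base64 branch can be sketched the same way. This is a Python 3 illustration under stated assumptions, not the code from the listing: the function name decode_b_word is hypothetical, and base64.b64decode from the standard library stands in for the base64mime.decode call used in the Python 2 module.

```python
import base64
import binascii
import email.errors


def decode_b_word(encoded_string):
    """Hypothetical helper: decode the payload of a "b" encoded word,
    tolerating missing '=' padding."""
    # Base64 text must be a multiple of 4 characters long; add whatever
    # padding the sender left off.
    paderr = len(encoded_string) % 4
    if paderr:
        encoded_string += '==='[:4 - paderr]
    try:
        # Assumption: b64decode replaces base64mime.decode from the listing.
        return base64.b64decode(encoded_string)
    except binascii.Error:
        raise email.errors.HeaderParseError('Base64 decoding error')


print(decode_b_word('SGVsbG8='))  # correctly padded input
print(decode_b_word('SGVsbG8'))   # padding missing, repaired before decoding
```

Padding only repairs input that was truncated at the '=' characters. A payload whose length is one more than a multiple of four cannot be valid base64, so it still ends in HeaderParseError.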
Roundup Issue Tracker: http://roundup-tracker.org/