annotate roundup/anypy/email_.py @ 5840:b68d3d8531d5 maint-1.6 1.6.1
Changes to prepare for 1.6.1 release.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Wed, 10 Jul 2019 10:35:29 -0400 |
| parents | 758edaa61ec0 |
| children | 45bfb4bf59c2 |
| rev | line source |
|---|---|
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
import re
import binascii
|
4979
f1a2bd1dea77
issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents:
4575
diff
changeset
|
import email
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
from email import quoprimime, base64mime

|
4979
f1a2bd1dea77
issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents:
4575
diff
changeset
|
## please import this file if you are using the email module
#
# a "monkey patch" to unify the behaviour of python 2.5 2.6 2.7
# when generating header files, see http://bugs.python.org/issue1974
# and https://hg.python.org/cpython/rev/5deb27042e5a/
# can go away once the minimum requirement is python 2.7
_oldheaderinit = email.Header.Header.__init__
def _unifiedheaderinit(self, *args, **kw):
    # override continuation_ws
    kw['continuation_ws'] = ' '
    _oldheaderinit(self, *args, **kw)
email.Header.Header.__dict__['__init__'] = _unifiedheaderinit
##

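The wrapping pattern used by this monkey patch can be tried in isolation. The following is a minimal sketch, not the code from this file: `Header` here is a hypothetical stand-in class (an assumption for illustration, not `email.Header.Header`), and the names `_old_init` and `_unified_init` are invented for the example.

```python
class Header:
    """Hypothetical stand-in for a library class with a keyword default."""

    def __init__(self, value, continuation_ws='\t'):
        self.value = value
        self.continuation_ws = continuation_ws


# Keep a reference to the original initialiser before replacing it.
_old_init = Header.__init__


def _unified_init(self, *args, **kw):
    # Force the keyword regardless of what the caller passed or omitted.
    kw['continuation_ws'] = ' '
    _old_init(self, *args, **kw)


# Install the wrapper on the class, so every new instance goes through it.
Header.__init__ = _unified_init

default_case = Header('Subject text')
explicit_case = Header('Subject text', continuation_ws='\t')
```

One caveat of this design: the override only works for callers that pass `continuation_ws` by keyword or not at all. A caller that passes it positionally would supply the argument twice and get a `TypeError`.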
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


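To see what the `ecre` pattern captures, and why `ecre.split()` is convenient for the parser in this file, here is a small sketch. The pattern is repeated so the snippet is self-contained; the sample header value is made up for illustration.

```python
import re

# Same pattern as `ecre` in this file, repeated so the snippet runs alone.
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # charset, up to the next ?
  \?                    # literal ?
  (?P<encoding>[qb])    # "q" or "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # encoded text, up to the next ?=
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

sample = 'Re: =?ISO-8859-1?Q?Gr=FC=DFe?= from Bob'  # made-up header value

# The named groups give the three fields of one encoded word.
match = ecre.search(sample)
fields = match.group('charset', 'encoding', 'encoded')
# fields is ('ISO-8859-1', 'Q', 'Gr=FC=DFe')

# Because the pattern has three capture groups, split() returns
# unencoded text interleaved with (charset, encoding, encoded) triples.
parts = ecre.split(sample)
# parts is ['Re: ', 'ISO-8859-1', 'Q', 'Gr=FC=DFe', ' from Bob']
```

This interleaved layout is what lets a parser pop one unencoded piece and then three fields in turn.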
# Fixed header parser, see my proposed patch and discussions:
# http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
# http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
# This implements the decode_header specific parts of my proposed patch
# backported to python2.X
def decode_header(header):
    """Decode a message header value without converting charset.

    Returns a list of (string, charset) pairs containing each of the decoded
    parts of the header.  Charset is None for non-encoded parts of the header,
    otherwise a lower-case string containing the name of the character set
    specified in the encoded string.

    header may be a string that may or may not contain RFC2047 encoded words,
    or it may be a Header object.

    An email.errors.HeaderParseError may be raised when certain decoding
    errors occur (e.g. a base64 decoding exception).
    """
    # If it is a Header object, we can just return the encoded chunks.
    # NOTE: `_charset` is not imported anywhere in this module, so this
    # branch raises NameError if it is reached.
    if hasattr(header, '_chunks'):
        return [(_charset._encode(string, str(charset)), str(charset))
                    for string, charset in header._chunks]
    # If no encoding, just return the header with no charset.
    if not ecre.search(header):
        return [(header, None)]
    # First step is to parse all the encoded parts into triplets of the form
    # (encoded_string, encoding, charset).  For unencoded strings, the last
    # two parts will be None.
    words = []
    for line in header.splitlines():
        parts = ecre.split(line)
        first = True
        while parts:
            unencoded = parts.pop(0)
            if first:
                unencoded = unencoded.lstrip()
                first = False
            if unencoded:
                words.append((unencoded, None, None))
            if parts:
                charset = parts.pop(0).lower()
                encoding = parts.pop(0).lower()
                encoded = parts.pop(0)
                words.append((encoded, encoding, charset))
    # Now loop over words and remove words that consist of whitespace
    # between two encoded strings.
    import sys
    droplist = []
    for n, w in enumerate(words):
        if n>1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
            droplist.append(n-1)
    for d in reversed(droplist):
        del words[d]

    # The next step is to decode each encoded word by applying the reverse
    # base64 or quopri transformation.  decoded_words is now a list of the
    # form (decoded_word, charset).
    decoded_words = []
    for encoded_string, encoding, charset in words:
        if encoding is None:
            # This is an unencoded word.
            decoded_words.append((encoded_string, charset))
        elif encoding == 'q':
            word = quoprimime.header_decode(encoded_string)
            decoded_words.append((word, charset))
        elif encoding == 'b':
            paderr = len(encoded_string) % 4   # Postel's law: add missing padding
            if paderr:
                encoded_string += '==='[:4 - paderr]
            try:
                word = base64mime.decode(encoded_string)
            except binascii.Error:
|
5238
758edaa61ec0
pylint flagged HeaderParseError as an Undefined variable.
John Rouillard <rouilj@ieee.org>
parents:
5090
diff
changeset
|
                raise email.errors.HeaderParseError('Base64 decoding error')
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
            else:
                decoded_words.append((word, charset))
        else:
            raise AssertionError('Unexpected encoding: ' + encoding)
    # Now convert all words to bytes and collapse consecutive runs of
    # similarly encoded words.
    collapsed = []
    last_word = last_charset = None
    for word, charset in decoded_words:
        if isinstance(word, str):
            pass
        if last_word is None:
            last_word = word
            last_charset = charset
        elif charset != last_charset:
            collapsed.append((last_word, last_charset))
            last_word = word
            last_charset = charset
        elif last_charset is None:
|
5090
89c2c1a88927
issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
John Rouillard <rouilj@ieee.org>
parents:
4983
diff
changeset
|
            BSPACE = b' '
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
            last_word += BSPACE + word
        else:
            last_word += word
    collapsed.append((last_word, last_charset))
    return collapsed

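The base64 branch of `decode_header` repairs missing `=` padding before decoding. The following minimal sketch shows that step on its own. It assumes the standard-library `base64.b64decode` in place of `base64mime.decode`, and raises `ValueError` instead of `email.errors.HeaderParseError` to stay self-contained; the helper name `decode_b64_tolerant` is hypothetical.

```python
import base64
import binascii


def decode_b64_tolerant(encoded):
    """Decode base64 text, adding any '=' padding the sender left off."""
    # Valid base64 text has a length that is a multiple of 4.
    paderr = len(encoded) % 4
    if paderr:
        # Append just enough '=' characters to reach a multiple of 4.
        encoded += '==='[:4 - paderr]
    try:
        return base64.b64decode(encoded)
    except binascii.Error:
        # Stand-in for the HeaderParseError raised in the original code.
        raise ValueError('Base64 decoding error')


missing_one = decode_b64_tolerant('SGVsbG8')    # correct form is 'SGVsbG8='
missing_two = decode_b64_tolerant('SGVsbA')     # correct form is 'SGVsbA=='
already_ok = decode_b64_tolerant('SGVsbG8=')
```

Input that is already padded correctly passes through unchanged, so the repair is safe to apply unconditionally.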
