annotate roundup/anypy/email_.py @ 5460:87f22a5d65ca
container modification while iterating over it
| author | Christof Meerwald <cmeerw@cmeerw.org> |
|---|---|
| date | Tue, 24 Jul 2018 21:25:38 +0100 |
| parents | 45bfb4bf59c2 |
| children | b7fa56ced601 |
# annotate: rev 4575 (c426cb251bc7), Ralf Schlatterbeck <rsc@runtux.com>, parents: 4447
#   Be more tolerant when parsing RFC2047 encoded mail headers.
import re
import binascii
# annotate: rev 4979 (f1a2bd1dea77), Bernhard Reiter <bernhard@intevation.de>, parents: 4575
#   issue2550877: Writing headers with the email module will use
#   continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
import email
from email import quoprimime, base64mime
# Used below: email.errors provides HeaderParseError, and email.charset
# provides _charset._encode (a private helper of Python 3's email.charset).
import email.errors
from email import charset as _charset

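The `quoprimime` and `base64mime` modules imported here do the actual transfer decoding of encoded-words. Here is a quick sketch, assuming Python 3, of what they return for a Q-encoded and a B-encoded payload. The sample strings are made up.

```python
from email import base64mime, quoprimime

# Q encoding (RFC 2047): '_' stands for a space and =XX is a hex-escaped byte.
# Under Python 3, header_decode returns str with one character per byte.
q_word = quoprimime.header_decode('Hello_W=F6rld')

# B encoding is base64; under Python 3, base64mime.decode returns bytes.
b_word = base64mime.decode('SGVsbG8=')

print(repr(q_word))  # 'Hello Wörld'
print(repr(b_word))  # b'Hello'
```

The result types differ: under Python 3, Q-decoding yields `str` and B-decoding yields `bytes`. That is why the last loop of `decode_header` has to bring every word to one type before concatenating.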
## please import this file if you are using the email module

# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)


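To see what `decode_header` gets back from `ecre.split()`, here is a self-contained sketch that compiles the same pattern and splits a made-up subject line (Python 3 assumed):

```python
import re

# Same pattern as ecre: three named groups (charset, encoding, encoded).
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

parts = ecre.split('Re: =?ISO-8859-1?Q?Gr=FC=DFe?= from =?utf-8?B?SGVsbG8=?=')
print(parts)
# ['Re: ', 'ISO-8859-1', 'Q', 'Gr=FC=DFe', ' from ', 'utf-8', 'B', 'SGVsbG8=', '']
```

Because the pattern has three capturing groups, the list alternates one unencoded chunk with a (charset, encoding, encoded) triple. It always ends with an unencoded chunk, which may be empty. This is the order in which the `while parts:` loop pops items. Charset and encoding come back in their original case, which is why that loop lowercases them.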
# Fixed header parser, see my proposed patch and discussions:
# http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
# http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
# This implements the decode_header specific parts of my proposed patch
# backported to python2.X
def decode_header(header):
    """Decode a message header value without converting charset.

    Returns a list of (string, charset) pairs containing each of the decoded
    parts of the header. Charset is None for non-encoded parts of the header,
    otherwise a lower-case string containing the name of the character set
    specified in the encoded string.

    header may be a string that may or may not contain RFC2047 encoded words,
    or it may be a Header object.

    An email.errors.HeaderParseError may be raised when certain decoding errors
    occur (e.g. a base64 decoding exception).
    """
    # If it is a Header object, we can just return the encoded chunks.
    if hasattr(header, '_chunks'):
        return [(_charset._encode(string, str(charset)), str(charset))
                for string, charset in header._chunks]
    # If no encoding, just return the header with no charset.
    if not ecre.search(header):
        return [(header, None)]
    # First step is to parse all the encoded parts into triplets of the form
    # (encoded_string, encoding, charset). For unencoded strings, the last
    # two parts will be None.
    words = []
    for line in header.splitlines():
        parts = ecre.split(line)
        first = True
        while parts:
            unencoded = parts.pop(0)
            if first:
                unencoded = unencoded.lstrip()
                first = False
            if unencoded:
                words.append((unencoded, None, None))
            if parts:
                charset = parts.pop(0).lower()
                encoding = parts.pop(0).lower()
                encoded = parts.pop(0)
                words.append((encoded, encoding, charset))
    # Now loop over words and remove words that consist of whitespace
    # between two encoded strings. Indices are collected first and deleted
    # afterwards, highest first, so the list is not modified while it is
    # being iterated over.
    droplist = []
    for n, w in enumerate(words):
        if n > 1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
            droplist.append(n-1)
    for d in reversed(droplist):
        del words[d]

    # The next step is to decode each encoded word by applying the reverse
    # base64 or quopri transformation. decoded_words is now a list of the
    # form (decoded_word, charset).
    decoded_words = []
    for encoded_string, encoding, charset in words:
        if encoding is None:
            # This is an unencoded word.
            decoded_words.append((encoded_string, charset))
        elif encoding == 'q':
            word = quoprimime.header_decode(encoded_string)
            decoded_words.append((word, charset))
        elif encoding == 'b':
            paderr = len(encoded_string) % 4   # Postel's law: add missing padding
            if paderr:
                encoded_string += '==='[:4 - paderr]
            try:
                word = base64mime.decode(encoded_string)
            except binascii.Error:
                # annotate: rev 5238 (758edaa61ec0), John Rouillard <rouilj@ieee.org>, parents: 5090
                #   pylint flagged HeaderParseError as an Undefined variable.
                raise email.errors.HeaderParseError('Base64 decoding error')
            else:
                decoded_words.append((word, charset))
        else:
            raise AssertionError('Unexpected encoding: ' + encoding)
    # Now convert all words to bytes and collapse consecutive runs of
    # similarly encoded words.
    collapsed = []
    last_word = last_charset = None
    for word, charset in decoded_words:
        if not isinstance(word, bytes):
            # Text words (unencoded or Q-decoded under Python 3) become bytes,
            # as in Python 3's email.header; bytes pass through unchanged.
            word = word.encode('raw-unicode-escape')
        if last_word is None:
            last_word = word
            last_charset = charset
        elif charset != last_charset:
            collapsed.append((last_word, last_charset))
            last_word = word
            last_charset = charset
        elif last_charset is None:
            # annotate: rev 5090 (89c2c1a88927), John Rouillard <rouilj@ieee.org>, parents: 4983
            #   issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
            BSPACE = b' '
            last_word += BSPACE + word
        else:
            last_word += word
    collapsed.append((last_word, last_charset))
    return collapsed
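RFC 2047 (section 6.2) says that linear whitespace separating two adjacent encoded-words is ignored on display. The `droplist` pass in `decode_header` implements that rule. The sketch below pulls the pass out into a standalone function. The name `drop_whitespace_between_encoded` and the sample data are hypothetical.

```python
def drop_whitespace_between_encoded(words):
    """Remove whitespace-only unencoded words that sit between two encoded words.

    words is a list of (text, encoding, charset) triples; encoding is None
    for unencoded text, as in decode_header.
    """
    # Pass 1: only record indices, so the list is never mutated while
    # enumerate() is still walking over it.
    droplist = []
    for n, w in enumerate(words):
        if n > 1 and w[1] and words[n - 2][1] and words[n - 1][0].isspace():
            droplist.append(n - 1)
    # Pass 2: delete from the highest index down, so the indices still to be
    # deleted keep pointing at the same elements.
    for d in reversed(droplist):
        del words[d]
    return words


words = [('Gr=FC', 'q', 'iso-8859-1'), (' ', None, None),
         ('32U=', 'b', 'iso-8859-1'), (' from Bob', None, None)]
print(drop_whitespace_between_encoded(words))
# [('Gr=FC', 'q', 'iso-8859-1'), ('32U=', 'b', 'iso-8859-1'), (' from Bob', None, None)]
```

Consider deleting `words[n-1]` directly inside the `enumerate()` loop instead. Each deletion would shift every later element one place left, so the loop would skip an element and compare against the wrong neighbours. Collecting the indices first and deleting them in reverse avoids that.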
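Before decoding, the B branch adds any missing `=` padding (the "Postel's law" comment). Here is a standalone sketch of that arithmetic, assuming Python 3. `lenient_b64decode` is a hypothetical name.

```python
import binascii
from email import base64mime


def lenient_b64decode(encoded):
    """Base64-decode a B-encoded word, adding '=' padding the sender left out.

    Same arithmetic as decode_header: a remainder of 2 gets '==', a remainder
    of 3 gets '='. A remainder of 1 is never valid base64, so the '==='
    appended in that case still fails with binascii.Error.
    """
    paderr = len(encoded) % 4
    if paderr:
        encoded += '==='[:4 - paderr]
    return base64mime.decode(encoded)


print(lenient_b64decode('SGVsbG8='))  # b'Hello' (already padded)
print(lenient_b64decode('SGVsbG8'))   # b'Hello' (one '=' added)
print(lenient_b64decode('SA'))        # b'H' (two '=' added)
try:
    lenient_b64decode('SGVsb')        # 5 characters: cannot be repaired
except binascii.Error as exc:
    print('still invalid:', exc)
```

In `decode_header`, that remaining `binascii.Error` is what gets turned into `email.errors.HeaderParseError`.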
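Under Python 3, the words reaching the final loop are a mix of `str` (unencoded and Q-decoded) and `bytes` (B-decoded), and `BSPACE` is bytes. They therefore need a common type before they can be concatenated. The sketch below is a standalone version of that loop that converts text to bytes first, mirroring what Python 3's `email.header.decode_header` does. The function name `collapse_words` and the text-decoding step at the end are illustrative assumptions, not roundup API.

```python
def collapse_words(decoded_words):
    """Merge consecutive (word, charset) pairs that share a charset.

    Text words are first turned into bytes; runs of unencoded words
    (charset None) are joined with a single space, and runs of encoded
    words with the same charset are concatenated directly.
    """
    collapsed = []
    last_word = last_charset = None
    for word, charset in decoded_words:
        if not isinstance(word, bytes):
            word = word.encode('raw-unicode-escape')
        if last_word is None:
            last_word, last_charset = word, charset
        elif charset != last_charset:
            collapsed.append((last_word, last_charset))
            last_word, last_charset = word, charset
        elif last_charset is None:
            last_word += b' ' + word
        else:
            last_word += word
    collapsed.append((last_word, last_charset))
    return collapsed


pairs = collapse_words([('Re: ', None), ('Gr\xfc', 'iso-8859-1'),
                        (b'\xdfe', 'iso-8859-1'), (' from Bob', None)])
print(pairs)
# [(b'Re: ', None), (b'Gr\xfc\xdfe', 'iso-8859-1'), (b' from Bob', None)]

# One way a caller could turn the pairs into text (assumes ASCII when the
# charset is None).
text = ''.join(word.decode(charset or 'ascii') for word, charset in pairs)
print(text)  # Re: Grüße from Bob
```

Two unencoded words in a row, for example the tail of one folded header line followed by the start of the next, come out joined by one space. For instance, `[('first line', None), ('second line', None)]` collapses to `[(b'first line second line', None)]`.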
