annotate roundup/anypy/email_.py @ 5840:b68d3d8531d5 maint-1.6 1.6.1

Changes to prepare for 1.6.1 release.
author John Rouillard <rouilj@ieee.org>
date Wed, 10 Jul 2019 10:35:29 -0400
parents 758edaa61ec0
children 45bfb4bf59c2
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
1 import re
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
2 import binascii
4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
3 import email
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
4 from email import quoprimime, base64mime
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
5
4979
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
6 ## please import this file if you are using the email module
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
7 #
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
8 # a "monkey patch" to unify the behaviour of python 2.5 2.6 2.7
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
9 # when generating header files, see http://bugs.python.org/issue1974
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
10 # and https://hg.python.org/cpython/rev/5deb27042e5a/
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
11 # can go away once the minimum requirement is python 2.7
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
12 _oldheaderinit = email.Header.Header.__init__
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
13 def _unifiedheaderinit(self, *args, **kw):
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
14 # override continuation_ws
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
15 kw['continuation_ws'] = ' '
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
16 _oldheaderinit(self, *args, **kw)
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
17 email.Header.Header.__dict__['__init__'] = _unifiedheaderinit
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
18 ##
f1a2bd1dea77 issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents: 4575
diff changeset
19
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
20 # Match encoded-word strings in the form =?charset?q?Hello_World?=
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
21 ecre = re.compile(r'''
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
22 =\? # literal =?
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
23 (?P<charset>[^?]*?) # non-greedy up to the next ? is the charset
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
24 \? # literal ?
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
25 (?P<encoding>[qb]) # either a "q" or a "b", case insensitive
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
26 \? # literal ?
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
27 (?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
28 \?= # literal ?=
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
29 ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
30
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
31
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
32 # Fixed header parser, see my proposed patch and discussions:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
33 # http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047"
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
34 # http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces"
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
35 # This implements the decode_header specific parts of my proposed patch
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
36 # backported to python2.X
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
37 def decode_header(header):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
38 """Decode a message header value without converting charset.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
39
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
40 Returns a list of (string, charset) pairs containing each of the decoded
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
41 parts of the header. Charset is None for non-encoded parts of the header,
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
42 otherwise a lower-case string containing the name of the character set
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
43 specified in the encoded string.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
44
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
45 header may be a string that may or may not contain RFC2047 encoded words,
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
46 or it may be a Header object.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
47
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
48 An email.errors.HeaderParseError may be raised when certain decoding error
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
49 occurs (e.g. a base64 decoding exception).
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
50 """
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
51 # If it is a Header object, we can just return the encoded chunks.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
52 if hasattr(header, '_chunks'):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
53 return [(_charset._encode(string, str(charset)), str(charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
54 for string, charset in header._chunks]
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
55 # If no encoding, just return the header with no charset.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
56 if not ecre.search(header):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
57 return [(header, None)]
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
58 # First step is to parse all the encoded parts into triplets of the form
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
59 # (encoded_string, encoding, charset). For unencoded strings, the last
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
60 # two parts will be None.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
61 words = []
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
62 for line in header.splitlines():
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
63 parts = ecre.split(line)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
64 first = True
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
65 while parts:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
66 unencoded = parts.pop(0)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
67 if first:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
68 unencoded = unencoded.lstrip()
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
69 first = False
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
70 if unencoded:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
71 words.append((unencoded, None, None))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
72 if parts:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
73 charset = parts.pop(0).lower()
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
74 encoding = parts.pop(0).lower()
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
75 encoded = parts.pop(0)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
76 words.append((encoded, encoding, charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
77 # Now loop over words and remove words that consist of whitespace
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
78 # between two encoded strings.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
79 import sys
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
80 droplist = []
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
81 for n, w in enumerate(words):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
82 if n>1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
83 droplist.append(n-1)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
84 for d in reversed(droplist):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
85 del words[d]
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
86
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
87 # The next step is to decode each encoded word by applying the reverse
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
88 # base64 or quopri transformation. decoded_words is now a list of the
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
89 # form (decoded_word, charset).
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
90 decoded_words = []
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
91 for encoded_string, encoding, charset in words:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
92 if encoding is None:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
93 # This is an unencoded word.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
94 decoded_words.append((encoded_string, charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
95 elif encoding == 'q':
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
96 word = quoprimime.header_decode(encoded_string)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
97 decoded_words.append((word, charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
98 elif encoding == 'b':
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
99 paderr = len(encoded_string) % 4 # Postel's law: add missing padding
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
100 if paderr:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
101 encoded_string += '==='[:4 - paderr]
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
102 try:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
103 word = base64mime.decode(encoded_string)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
104 except binascii.Error:
5238
758edaa61ec0 pylint flagged HeaderParseError as an Undefined variable.
John Rouillard <rouilj@ieee.org>
parents: 5090
diff changeset
105 raise email.errors.HeaderParseError('Base64 decoding error')
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
106 else:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
107 decoded_words.append((word, charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
108 else:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
109 raise AssertionError('Unexpected encoding: ' + encoding)
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
110 # Now convert all words to bytes and collapse consecutive runs of
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
111 # similarly encoded words.
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
112 collapsed = []
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
113 last_word = last_charset = None
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
114 for word, charset in decoded_words:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
115 if isinstance(word, str):
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
116 pass
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
117 if last_word is None:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
118 last_word = word
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
119 last_charset = charset
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
120 elif charset != last_charset:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
121 collapsed.append((last_word, last_charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
122 last_word = word
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
123 last_charset = charset
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
124 elif last_charset is None:
5090
89c2c1a88927 issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
John Rouillard <rouilj@ieee.org>
parents: 4983
diff changeset
125 BSPACE = b' '
4575
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
126 last_word += BSPACE + word
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
127 else:
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
128 last_word += word
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
129 collapsed.append((last_word, last_charset))
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
130 return collapsed
c426cb251bc7 Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents: 4447
diff changeset
131

Roundup Issue Tracker: http://roundup-tracker.org/