Mercurial > p > roundup > code
annotate roundup/anypy/email_.py @ 5543:bc3e00a3d24b
MySQL backend fixes for Python 3.
With Python 2, text sent to and from MySQL is treated as bytes in
Python. The database may be recorded by MySQL as having some other
encoding (latin1 being the default in some MySQL versions - Roundup
does not set an encoding explicitly, unlike in back_postgresql), but
as long as MySQL's notion of the connection encoding agrees with its
notion of the database encoding, no conversions actually take place
and the bytes are stored and returned as-is.
With Python 3, text sent to and from MySQL is treated as Python
Unicode strings. When the database and connection encoding is latin1,
that means the bytes stored in the database under Python 2 are
interpreted as latin1 and converted from that to Unicode, producing
incorrect results for any non-ASCII characters; furthermore, if trying
to store new non-ASCII data in the database under Python 3, any
non-latin1 characters produce errors.
This patch arranges for both the connection and database character
sets to be UTF-8 when using Python 3, and documents a need to export
and import the database when moving from Python 2 to Python 3 with
this backend.
| author | Joseph Myers <jsm@polyomino.org.uk> |
|---|---|
| date | Sun, 16 Sep 2018 16:19:20 +0000 |
| parents | 29346d92d80c |
| children | cacef71b3a54 |
| rev | line source |
|---|---|
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
1 import re |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
2 import binascii |
|
4979
f1a2bd1dea77
issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents:
4575
diff
changeset
|
3 import email |
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
4 from email import quoprimime, base64mime |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
5 |
|
5494
b7fa56ced601
use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5421
diff
changeset
|
6 if str == bytes: |
|
b7fa56ced601
use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5421
diff
changeset
|
7 message_from_bytes = email.message_from_string |
|
5542
29346d92d80c
Fix email interfaces with Python 3 (issue 2550974, issue 2551000).
Joseph Myers <jsm@polyomino.org.uk>
parents:
5494
diff
changeset
|
8 message_from_binary_file = email.message_from_file |
|
5494
b7fa56ced601
use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5421
diff
changeset
|
9 else: |
|
b7fa56ced601
use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5421
diff
changeset
|
10 message_from_bytes = email.message_from_bytes |
|
5542
29346d92d80c
Fix email interfaces with Python 3 (issue 2550974, issue 2551000).
Joseph Myers <jsm@polyomino.org.uk>
parents:
5494
diff
changeset
|
11 message_from_binary_file = email.message_from_binary_file |
|
5494
b7fa56ced601
use gpg module instead of pyme module for PGP encryption
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5421
diff
changeset
|
12 |
|
4979
f1a2bd1dea77
issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents:
4575
diff
changeset
|
13 ## please import this file if you are using the email module |
|
f1a2bd1dea77
issue2550877: Writing headers with the email module will use continuation_ws = ' ' now for python 2.5 and 2.6 when importing anypy.email_.
Bernhard Reiter <bernhard@intevation.de>
parents:
4575
diff
changeset
|
14 |
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
15 # Match encoded-word strings in the form =?charset?q?Hello_World?= |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
16 ecre = re.compile(r''' |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
17 =\? # literal =? |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
18 (?P<charset>[^?]*?) # non-greedy up to the next ? is the charset |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
19 \? # literal ? |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
20 (?P<encoding>[qb]) # either a "q" or a "b", case insensitive |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
21 \? # literal ? |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
22 (?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
23 \?= # literal ?= |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
24 ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
25 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
26 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
27 # Fixed header parser, see my proposed patch and discussions: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
28 # http://bugs.python.org/issue1079 "decode_header does not follow RFC 2047" |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
29 # http://bugs.python.org/issue1467619 "Header.decode_header eats up spaces" |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
30 # This implements the decode_header specific parts of my proposed patch |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
31 # backported to python2.X |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
32 def decode_header(header): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
33 """Decode a message header value without converting charset. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
34 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
35 Returns a list of (string, charset) pairs containing each of the decoded |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
36 parts of the header. Charset is None for non-encoded parts of the header, |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
37 otherwise a lower-case string containing the name of the character set |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
38 specified in the encoded string. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
39 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
40 header may be a string that may or may not contain RFC2047 encoded words, |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
41 or it may be a Header object. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
42 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
43 An email.errors.HeaderParseError may be raised when certain decoding error |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
44 occurs (e.g. a base64 decoding exception). |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
45 """ |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
46 # If it is a Header object, we can just return the encoded chunks. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
47 if hasattr(header, '_chunks'): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
48 return [(_charset._encode(string, str(charset)), str(charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
49 for string, charset in header._chunks] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
50 # If no encoding, just return the header with no charset. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
51 if not ecre.search(header): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
52 return [(header, None)] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
53 # First step is to parse all the encoded parts into triplets of the form |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
54 # (encoded_string, encoding, charset). For unencoded strings, the last |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
55 # two parts will be None. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
56 words = [] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
57 for line in header.splitlines(): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
58 parts = ecre.split(line) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
59 first = True |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
60 while parts: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
61 unencoded = parts.pop(0) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
62 if first: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
63 unencoded = unencoded.lstrip() |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
64 first = False |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
65 if unencoded: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
66 words.append((unencoded, None, None)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
67 if parts: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
68 charset = parts.pop(0).lower() |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
69 encoding = parts.pop(0).lower() |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
70 encoded = parts.pop(0) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
71 words.append((encoded, encoding, charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
72 # Now loop over words and remove words that consist of whitespace |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
73 # between two encoded strings. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
74 import sys |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
75 droplist = [] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
76 for n, w in enumerate(words): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
77 if n>1 and w[1] and words[n-2][1] and words[n-1][0].isspace(): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
78 droplist.append(n-1) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
79 for d in reversed(droplist): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
80 del words[d] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
81 |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
82 # The next step is to decode each encoded word by applying the reverse |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
83 # base64 or quopri transformation. decoded_words is now a list of the |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
84 # form (decoded_word, charset). |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
85 decoded_words = [] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
86 for encoded_string, encoding, charset in words: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
87 if encoding is None: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
88 # This is an unencoded word. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
89 decoded_words.append((encoded_string, charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
90 elif encoding == 'q': |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
91 word = quoprimime.header_decode(encoded_string) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
92 decoded_words.append((word, charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
93 elif encoding == 'b': |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
94 paderr = len(encoded_string) % 4 # Postel's law: add missing padding |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
95 if paderr: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
96 encoded_string += '==='[:4 - paderr] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
97 try: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
98 word = base64mime.decode(encoded_string) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
99 except binascii.Error: |
|
5238
758edaa61ec0
pylint flagged HeaderParseError as an Undefined variable.
John Rouillard <rouilj@ieee.org>
parents:
5090
diff
changeset
|
100 raise email.errors.HeaderParseError('Base64 decoding error') |
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
101 else: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
102 decoded_words.append((word, charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
103 else: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
104 raise AssertionError('Unexpected encoding: ' + encoding) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
105 # Now convert all words to bytes and collapse consecutive runs of |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
106 # similarly encoded words. |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
107 collapsed = [] |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
108 last_word = last_charset = None |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
109 for word, charset in decoded_words: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
110 if isinstance(word, str): |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
111 pass |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
112 if last_word is None: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
113 last_word = word |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
114 last_charset = charset |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
115 elif charset != last_charset: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
116 collapsed.append((last_word, last_charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
117 last_word = word |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
118 last_charset = charset |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
119 elif last_charset is None: |
|
5090
89c2c1a88927
issue2550850 anypy/email_.py uses BSPACE which is not defined in python 2.7
John Rouillard <rouilj@ieee.org>
parents:
4983
diff
changeset
|
120 BSPACE = b' ' |
|
4575
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
121 last_word += BSPACE + word |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
122 else: |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
123 last_word += word |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
124 collapsed.append((last_word, last_charset)) |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
125 return collapsed |
|
c426cb251bc7
Be more tolerant when parsing RFC2047 encoded mail headers.
Ralf Schlatterbeck <rsc@runtux.com>
parents:
4447
diff
changeset
|
126 |
