annotate roundup/token.py @ 6941:bd2c3b2010c3

issue2551232 - modify in-reply-to threading when multiple matches if an email is missing an issue designator, in-reply-to threading is attempted. In this change if in-reply-to threading matches multiple issues, fall back to matching on subject. It used to just arbitrairly choose the first matching issue.
author John Rouillard <rouilj@ieee.org>
date Thu, 08 Sep 2022 18:49:46 -0400
parents f023edbeb24d
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
1 #
475
a1a44636bace Fix breakage caused by transaction changes.
Richard Jones <richard@users.sourceforge.net>
parents: 470
diff changeset
2 # Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
3 # This module is free software, and you may redistribute it and/or modify
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
4 # under the same terms as Python, so long as this copyright message and
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
5 # disclaimer are retained in their original form.
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
6 #
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
7 # This module is distributed in the hope that it will be useful,
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
6017
f023edbeb24d flake8 cleanup: remove unused var; whitespace changes.
John Rouillard <rouilj@ieee.org>
parents: 5813
diff changeset
10 #
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
11
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
12 """This module provides the tokeniser used by roundup-admin.
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
13 """
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
14 __docformat__ = 'restructuredtext'
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
15
6017
f023edbeb24d flake8 cleanup: remove unused var; whitespace changes.
John Rouillard <rouilj@ieee.org>
parents: 5813
diff changeset
16
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
17 def token_split(s, whitespace=' \r\n\t', quotes='\'"',
6017
f023edbeb24d flake8 cleanup: remove unused var; whitespace changes.
John Rouillard <rouilj@ieee.org>
parents: 5813
diff changeset
18 escaped={'r': '\r', 'n': '\n', 't': '\t'}):
5813
4b2c6f7bc9b4 Try make doc string for token_split into a raw string. \o and other
John Rouillard <rouilj@ieee.org>
parents: 5811
diff changeset
19 r'''Split the string up into tokens. An occurence of a ``'`` or ``"`` in
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
20 the input will cause the splitter to ignore whitespace until a matching
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
21 quote char is found. Embedded non-matching quote chars are also skipped.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
22
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
23 Whitespace and quoting characters may be escaped using a backslash.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
24 ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
25 tab. All other backslashed characters are left as-is.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
26
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
27 Valid examples::
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
28
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
29 hello world (2 tokens: hello, world)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
30 "hello world" (1 token: hello world)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
31 "Roch'e" Compaan (2 tokens: Roch'e Compaan)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
32 Roch\'e Compaan (2 tokens: Roch'e Compaan)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
33 address="1 2 3" (1 token: address=1 2 3)
5813
4b2c6f7bc9b4 Try make doc string for token_split into a raw string. \o and other
John Rouillard <rouilj@ieee.org>
parents: 5811
diff changeset
34 \\ (1 token: \)
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
35 \n (1 token: a newline)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
36 \o (1 token: \o)
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
37
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
38 Invalid examples::
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
39
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
40 "hello world (no matching quote)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
41 Roch'e Compaan (no matching quote)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
42 '''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
43 l = []
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
44 pos = 0
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
45 NEWTOKEN = 'newtoken'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
46 TOKEN = 'token'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
47 QUOTE = 'quote'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
48 ESCAPE = 'escape'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
49 quotechar = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
50 state = NEWTOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
51 oldstate = '' # one-level state stack ;)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
52 length = len(s)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
53 token = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
54 while 1:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
55 # end of string, finish off the current token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
56 if pos == length:
5245
bc16d91b7a50 Fix token_split() so its one error throws ValueError w/out extra arg.
Eric S. Raymond <esr@thyrsus.com>
parents: 4570
diff changeset
57 if state == QUOTE: raise ValueError
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
58 elif state == TOKEN: l.append(token)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
59 break
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
60 c = s[pos]
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
61 if state == NEWTOKEN:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
62 # looking for a new token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
63 if c in quotes:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
64 # quoted token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
65 state = QUOTE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
66 quotechar = c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
67 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
68 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
69 elif c in whitespace:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
70 # skip whitespace
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
71 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
72 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
73 elif c == '\\':
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
74 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
75 oldstate = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
76 state = ESCAPE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
77 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
78 # otherwise we have a token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
79 state = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
80 elif state == TOKEN:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
81 if c in whitespace:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
82 # have a token, and have just found a whitespace terminator
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
83 l.append(token)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
84 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
85 state = NEWTOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
86 token = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
87 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
88 elif c in quotes:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
89 # have a token, just found embedded quotes
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
90 state = QUOTE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
91 quotechar = c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
92 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
93 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
94 elif c == '\\':
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
95 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
96 oldstate = state
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
97 state = ESCAPE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
98 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
99 elif state == QUOTE and c == quotechar:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
100 # in a quoted token and found a matching quote char
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
101 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
102 # now we're looking for whitespace
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
103 state = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
104 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
105 elif state == ESCAPE:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
106 # escaped-char conversions (t, r, n)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
107 # TODO: octal, hexdigit
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
108 state = oldstate
5381
0942fe89e82e Python 3 preparation: change "x.has_key(y)" to "y in x".
Joseph Myers <jsm@polyomino.org.uk>
parents: 5245
diff changeset
109 if c in escaped:
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
110 c = escaped[c]
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
111 # just add this char to the token and move along
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
112 token = token + c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
113 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
114 return l
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
115
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
116 # vim: set filetype=python ts=4 sw=4 et si

Roundup Issue Tracker: http://roundup-tracker.org/