annotate roundup/token.py @ 5245:bc16d91b7a50

Fix token_split() so its one error throws ValueError w/out extra arg. In Python 3, raise has to throw a single object value without extra arguments. This commit also checked the function's callsites. One needed armoring, didn't have it, and got it. There didn't seem to be anything to do about the other than add a warning comment. All tests pass.
author Eric S. Raymond <esr@thyrsus.com>
date Thu, 24 Aug 2017 17:27:49 -0400
parents 6e3e4f24c753
children 0942fe89e82e
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
1 #
475
a1a44636bace Fix breakage caused by transaction changes.
Richard Jones <richard@users.sourceforge.net>
parents: 470
diff changeset
2 # Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
3 # This module is free software, and you may redistribute it and/or modify
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
4 # under the same terms as Python, so long as this copyright message and
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
5 # disclaimer are retained in their original form.
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
6 #
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
7 # This module is distributed in the hope that it will be useful,
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
10 #
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
11
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
12 """This module provides the tokeniser used by roundup-admin.
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
13 """
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
14 __docformat__ = 'restructuredtext'
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
15
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
16 def token_split(s, whitespace=' \r\n\t', quotes='\'"',
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
17 escaped={'r':'\r', 'n':'\n', 't':'\t'}):
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
18 '''Split the string up into tokens. An occurence of a ``'`` or ``"`` in
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
19 the input will cause the splitter to ignore whitespace until a matching
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
20 quote char is found. Embedded non-matching quote chars are also skipped.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
21
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
22 Whitespace and quoting characters may be escaped using a backslash.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
23 ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
24 tab. All other backslashed characters are left as-is.
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
25
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
26 Valid examples::
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
27
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
28 hello world (2 tokens: hello, world)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
29 "hello world" (1 token: hello world)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
30 "Roch'e" Compaan (2 tokens: Roch'e Compaan)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
31 Roch\'e Compaan (2 tokens: Roch'e Compaan)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
32 address="1 2 3" (1 token: address=1 2 3)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
33 \\ (1 token: \)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
34 \n (1 token: a newline)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
35 \o (1 token: \o)
2005
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
36
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
37 Invalid examples::
fc52d57c6c3e documentation cleanup
Richard Jones <richard@users.sourceforge.net>
parents: 1090
diff changeset
38
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
39 "hello world (no matching quote)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
40 Roch'e Compaan (no matching quote)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
41 '''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
42 l = []
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
43 pos = 0
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
44 NEWTOKEN = 'newtoken'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
45 TOKEN = 'token'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
46 QUOTE = 'quote'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
47 ESCAPE = 'escape'
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
48 quotechar = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
49 state = NEWTOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
50 oldstate = '' # one-level state stack ;)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
51 length = len(s)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
52 finish = 0
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
53 token = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
54 while 1:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
55 # end of string, finish off the current token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
56 if pos == length:
5245
bc16d91b7a50 Fix token_split() so its one error throws ValueError w/out extra arg.
Eric S. Raymond <esr@thyrsus.com>
parents: 4570
diff changeset
57 if state == QUOTE: raise ValueError
470
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
58 elif state == TOKEN: l.append(token)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
59 break
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
60 c = s[pos]
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
61 if state == NEWTOKEN:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
62 # looking for a new token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
63 if c in quotes:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
64 # quoted token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
65 state = QUOTE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
66 quotechar = c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
67 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
68 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
69 elif c in whitespace:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
70 # skip whitespace
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
71 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
72 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
73 elif c == '\\':
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
74 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
75 oldstate = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
76 state = ESCAPE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
77 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
78 # otherwise we have a token
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
79 state = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
80 elif state == TOKEN:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
81 if c in whitespace:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
82 # have a token, and have just found a whitespace terminator
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
83 l.append(token)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
84 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
85 state = NEWTOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
86 token = ''
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
87 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
88 elif c in quotes:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
89 # have a token, just found embedded quotes
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
90 state = QUOTE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
91 quotechar = c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
92 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
93 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
94 elif c == '\\':
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
95 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
96 oldstate = state
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
97 state = ESCAPE
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
98 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
99 elif state == QUOTE and c == quotechar:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
100 # in a quoted token and found a matching quote char
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
101 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
102 # now we're looking for whitespace
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
103 state = TOKEN
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
104 continue
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
105 elif state == ESCAPE:
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
106 # escaped-char conversions (t, r, n)
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
107 # TODO: octal, hexdigit
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
108 state = oldstate
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
109 if escaped.has_key(c):
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
110 c = escaped[c]
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
111 # just add this char to the token and move along
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
112 token = token + c
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
113 pos = pos + 1
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
114 return l
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
115
9f7320624bc2 Added better tokenising to roundup-admin - handles spaces and stuff.
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
116 # vim: set filetype=python ts=4 sw=4 et si

Roundup Issue Tracker: http://roundup-tracker.org/