annotate roundup/token_r.py @ 7228:07ce4e4110f5

flake8 fixes: whitespace, remove unused imports
author John Rouillard <rouilj@ieee.org>
date Sat, 18 Mar 2023 14:16:31 -0400
parents db06d4aeb978
children 9a74dfeb8620
Annotation: every line was last changed in rev 7178 (db06d4aeb978 "unshadow stdlib token from roundup's token.", John Rouillard <rouilj@ieee.org>), except the two "# noqa: E701" one-liners in the end-of-string check, which were last changed in rev 7228 (07ce4e4110f5).

#
# Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
# This module is free software, and you may redistribute it and/or modify
# under the same terms as Python, so long as this copyright message and
# disclaimer are retained in their original form.
#
# This module is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#

"""This module provides the tokeniser used by roundup-admin.
"""
__docformat__ = 'restructuredtext'


def token_split(s, whitespace=' \r\n\t', quotes='\'"',
                escaped={'r': '\r', 'n': '\n', 't': '\t'}):
    r'''Split the string up into tokens. An occurrence of a ``'`` or ``"`` in
    the input will cause the splitter to ignore whitespace until a matching
    quote char is found. Non-matching quote chars inside a quoted section are
    kept as ordinary characters.

    Outside quotes, whitespace and quoting characters may be escaped using a
    backslash. ``\r``, ``\n`` and ``\t`` are converted to carriage-return,
    newline and tab. Any other backslashed character is kept literally, and
    the backslash itself is dropped.

    Valid examples::

           hello world      (2 tokens: hello, world)
           "hello world"    (1 token: hello world)
           "Roch'e" Compaan (2 tokens: Roch'e, Compaan)
           Roch\'e Compaan  (2 tokens: Roch'e, Compaan)
           address="1 2 3"  (1 token: address=1 2 3)
           \\               (1 token: \)
           \n               (1 token: a newline)
           \o               (1 token: o)

    Invalid examples::

           "hello world     (no matching quote)
           Roch'e Compaan   (no matching quote)
    '''
    l = []
    pos = 0
    NEWTOKEN = 'newtoken'
    TOKEN = 'token'
    QUOTE = 'quote'
    ESCAPE = 'escape'
    quotechar = ''
    state = NEWTOKEN
    oldstate = ''    # one-level state stack ;)
    length = len(s)
    token = ''
    while 1:
        # end of string, finish off the current token
        if pos == length:
            if state == QUOTE: raise ValueError  # noqa: E701
            elif state == TOKEN: l.append(token)  # noqa: E701
            break
        c = s[pos]
        if state == NEWTOKEN:
            # looking for a new token
            if c in quotes:
                # quoted token
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c in whitespace:
                # skip whitespace
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = TOKEN
                state = ESCAPE
                continue
            # otherwise we have a token
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # have a token, and have just found a whitespace terminator
                l.append(token)
                pos = pos + 1
                state = NEWTOKEN
                token = ''
                continue
            elif c in quotes:
                # have a token, just found embedded quotes
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = state
                state = ESCAPE
                continue
        elif state == QUOTE and c == quotechar:
            # in a quoted token and found a matching quote char
            pos = pos + 1
            # now we're looking for whitespace
            state = TOKEN
            continue
        elif state == ESCAPE:
            # escaped-char conversions (t, r, n)
            # TODO: octal, hexdigit
            state = oldstate
            if c in escaped:
                c = escaped[c]
        # just add this char to the token and move along
        token = token + c
        pos = pos + 1
    return l

# vim: set filetype=python ts=4 sw=4 et si
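token_split walks its input with a while loop over four states (NEWTOKEN, TOKEN, QUOTE, ESCAPE). As a minimal sketch of the same scanner in a more compact shape, the block below folds that loop into a single for loop over the characters. It is not part of roundup: the name split_tokens and the ESCAPES table are hypothetical, and it hard-codes token_split's default whitespace, quote and escape sets, with which it is intended to produce the same tokens.

```python
from typing import Dict, List

# Hypothetical helper for illustration only; not part of roundup.
# Escape translations, matching token_split's default `escaped` argument.
ESCAPES: Dict[str, str] = {'r': '\r', 'n': '\n', 't': '\t'}


def split_tokens(s: str, whitespace: str = ' \r\n\t',
                 quotes: str = '\'"') -> List[str]:
    """Split s using the same four states as token_split."""
    tokens: List[str] = []
    buf: List[str] = []        # characters of the token being built
    state = 'NEWTOKEN'
    quotechar = ''
    for c in s:
        if state == 'ESCAPE':
            # \r, \n, \t are translated; any other char is kept, minus the backslash
            buf.append(ESCAPES.get(c, c))
            state = 'TOKEN'
        elif state == 'QUOTE':
            if c == quotechar:
                state = 'TOKEN'    # closing quote; the token may continue
            else:
                buf.append(c)      # whitespace, backslashes, other quotes are literal
        elif c in whitespace:
            if state == 'TOKEN':   # whitespace ends a token; runs of it are skipped
                tokens.append(''.join(buf))
                buf = []
                state = 'NEWTOKEN'
        elif c in quotes:
            state, quotechar = 'QUOTE', c
        elif c == '\\':
            state = 'ESCAPE'
        else:
            buf.append(c)
            state = 'TOKEN'
    # end of input: same handling as token_split's `pos == length` branch
    if state == 'QUOTE':
        raise ValueError('no matching quote')
    if state == 'TOKEN':
        tokens.append(''.join(buf))
    return tokens


print(split_tokens('address="1 2 3" hello world'))
# prints ['address=1 2 3', 'hello', 'world']
```

As in token_split, an escape always resumes in the TOKEN state, and a lone trailing backslash leaves the scanner in the escape state at end of input, so the pending token is discarded: split_tokens('abc\\') returns [].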

Roundup Issue Tracker: http://roundup-tracker.org/