annotate roundup/token_r.py @ 8523:b6b0da04e768
chore: ruff cleanup.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 19 Feb 2026 22:24:17 -0500 |
| parents | 9a74dfeb8620 |
| children |
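| rev | changeset | description | author | parent rev | lines |
|---|---|---|---|---|---|
| 7178 | db06d4aeb978 | unshadow stdlib token from roundup's token. | John Rouillard <rouilj@ieee.org> | | 1-56, 59-98, 105-122 |
| 7228 | 07ce4e4110f5 | flake8 fixes: whitespace, remove unused imports | John Rouillard <rouilj@ieee.org> | 7178 | 57-58 |
| 7859 | 9a74dfeb8620 | feat: can use escaped tokens inside quotes including quotes. | John Rouillard <rouilj@ieee.org> | 7228 | 99-104 |
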
#
# Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
# This module is free software, and you may redistribute it and/or modify
# under the same terms as Python, so long as this copyright message and
# disclaimer are retained in their original form.
#
# This module is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#

"""This module provides the tokeniser used by roundup-admin.
"""
__docformat__ = 'restructuredtext'


def token_split(s, whitespace=' \r\n\t', quotes='\'"',
                escaped={'r': '\r', 'n': '\n', 't': '\t'}):
    r'''Split the string up into tokens. An occurrence of a ``'`` or ``"`` in
       the input will cause the splitter to ignore whitespace until a matching
       quote char is found. Embedded non-matching quote chars are also skipped.

       Whitespace and quoting characters may be escaped using a backslash.
       ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
       tab. For any other backslashed character the backslash is dropped and
       the character itself is kept.

       Valid examples::

               hello world      (2 tokens: hello, world)
               "hello world"    (1 token: hello world)
               "Roch'e" Compaan (2 tokens: Roch'e, Compaan)
               Roch\'e Compaan  (2 tokens: Roch'e, Compaan)
               address="1 2 3"  (1 token: address=1 2 3)
               \\               (1 token: \)
               \n               (1 token: a newline)
               \o               (1 token: o)

       Invalid examples::

               "hello world     (no matching quote)
               Roch'e Compaan   (no matching quote)
    '''
    l = []
    pos = 0
    NEWTOKEN = 'newtoken'
    TOKEN = 'token'
    QUOTE = 'quote'
    ESCAPE = 'escape'
    quotechar = ''
    state = NEWTOKEN
    oldstate = ''    # one-level state stack ;)
    length = len(s)
    token = ''
    while 1:
        # end of string, finish off the current token
        if pos == length:
            if state == QUOTE: raise ValueError  # noqa: E701
            elif state == TOKEN: l.append(token)  # noqa: E701
            break
        c = s[pos]
        if state == NEWTOKEN:
            # looking for a new token
            if c in quotes:
                # quoted token
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c in whitespace:
                # skip whitespace
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = TOKEN
                state = ESCAPE
                continue
            # otherwise we have a token
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # have a token, and have just found a whitespace terminator
                l.append(token)
                pos = pos + 1
                state = NEWTOKEN
                token = ''
                continue
            elif c in quotes:
                # have a token, just found embedded quotes
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = state
                state = ESCAPE
                continue
        elif state == QUOTE and c == '\\':
            # in a quoted token and found an escape sequence
            pos = pos + 1
            oldstate = state
            state = ESCAPE
            continue
        elif state == QUOTE and c == quotechar:
            # in a quoted token and found a matching quote char
            pos = pos + 1
            # now we're looking for whitespace
            state = TOKEN
            continue
        elif state == ESCAPE:
            # escaped-char conversions (t, r, n)
            # TODO: octal, hexdigit
            state = oldstate
            if c in escaped:
                c = escaped[c]
        # just add this char to the token and move along
        token = token + c
        pos = pos + 1
    return l

# vim: set filetype=python ts=4 sw=4 et si
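
To illustrate how the four states interact, here is a condensed sketch of the same state machine. It is not the module's code: the name `split_tokens`, the `for` loop, and the `ValueError` message are assumptions made for this example. It is only meant to mirror `token_split` for the default `whitespace`, `quotes` and `escaped` arguments.

```python
def split_tokens(s, whitespace=' \r\n\t', quotes='\'"', escaped=None):
    """Condensed sketch of the tokeniser state machine (hypothetical helper)."""
    if escaped is None:
        escaped = {'r': '\r', 'n': '\n', 't': '\t'}
    tokens = []
    token = ''
    state = 'newtoken'   # one of: newtoken, token, quote, escape
    oldstate = ''        # state to return to once an escape is consumed
    quotechar = ''
    for c in s:
        if state == 'escape':
            # \r, \n and \t are translated; any other character is kept.
            token += escaped.get(c, c)
            state = oldstate
        elif c == '\\':
            # An escape at the start of a token resumes in the token state.
            oldstate = 'token' if state == 'newtoken' else state
            state = 'escape'
        elif state == 'quote':
            if c == quotechar:
                # Closing quote: the token continues until whitespace.
                state = 'token'
            else:
                # Whitespace and the other quote char are ordinary here.
                token += c
        elif c in quotes:
            state = 'quote'
            quotechar = c
        elif c in whitespace:
            if state == 'token':
                tokens.append(token)
                token = ''
                state = 'newtoken'
        else:
            token += c
            state = 'token'
    if state == 'quote':
        raise ValueError('no matching quote')
    if state == 'token':
        tokens.append(token)
    return tokens


print(split_tokens('address="1 2 3" hello'))  # ['address=1 2 3', 'hello']
print(split_tokens(r'"a \"b\" c"'))           # ['a "b" c']
```

The second call exercises the escape-inside-quotes path: the backslash is seen while in the quote state, so the escaped `"` is added to the token instead of closing it. An unterminated quote such as `"hello world` raises `ValueError`, as in the original function.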
