annotate roundup/token_r.py @ 7178:db06d4aeb978
unshadow stdlib token from roundup's token.
This bites me every now and again when running pytest and pdb. Some
submodules want to load the stdlib token module and end up getting roundup's
token module, and things break with N_TOKENS not defined etc.
So rename token.py to token_r.py (token_r(oundup)... hey, naming things
is hard) and change code as needed.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sun, 26 Feb 2023 12:00:35 -0500 |
| parents | |
| children | 07ce4e4110f5 |
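
The shadowing problem described in the commit message can be checked from Python. The following is a minimal sketch, not part of the changeset: the helper name `looks_like_stdlib_token` is hypothetical, and it assumes that the presence of `N_TOKENS` and `tok_name` is a good enough signal that the imported `token` module is the standard library one.

```python
import token


def looks_like_stdlib_token(mod):
    """Return True if `mod` exposes names the stdlib token module defines.

    Hypothetical helper: a project-local token.py that shadows the stdlib
    module would normally lack N_TOKENS, which is the symptom the commit
    message mentions.
    """
    return hasattr(mod, 'N_TOKENS') and hasattr(mod, 'tok_name')


# Show which file the name "token" resolved to and whether it looks right.
print(token.__file__)
print(looks_like_stdlib_token(token))
```

If the printed path points into a project directory rather than the Python installation, a local module is shadowing the standard library one, which is the situation the rename to token_r.py avoids.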
Every line of this file is annotated with changeset 7178:db06d4aeb978 (John Rouillard <rouilj@ieee.org>, "unshadow stdlib token from roundup's token."). Line source:

    #
    # Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
    # This module is free software, and you may redistribute it and/or modify
    # under the same terms as Python, so long as this copyright message and
    # disclaimer are retained in their original form.
    #
    # This module is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #

    """This module provides the tokeniser used by roundup-admin.
    """
    __docformat__ = 'restructuredtext'


    def token_split(s, whitespace=' \r\n\t', quotes='\'"',
                    escaped={'r': '\r', 'n': '\n', 't': '\t'}):
        r'''Split the string up into tokens. An occurrence of a ``'`` or ``"`` in
        the input will cause the splitter to ignore whitespace until a matching
        quote char is found. Embedded non-matching quote chars are also skipped.

        Whitespace and quoting characters may be escaped using a backslash.
        ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
        tab. All other backslashed characters are left as-is.

        Valid examples::

               hello world      (2 tokens: hello, world)
               "hello world"    (1 token: hello world)
               "Roch'e" Compaan (2 tokens: Roch'e Compaan)
               Roch\'e Compaan  (2 tokens: Roch'e Compaan)
               address="1 2 3"  (1 token: address=1 2 3)
               \\               (1 token: \)
               \n               (1 token: a newline)
               \o               (1 token: \o)

        Invalid examples::

               "hello world     (no matching quote)
               Roch'e Compaan   (no matching quote)
        '''
        l = []
        pos = 0
        NEWTOKEN = 'newtoken'
        TOKEN = 'token'
        QUOTE = 'quote'
        ESCAPE = 'escape'
        quotechar = ''
        state = NEWTOKEN
        oldstate = ''    # one-level state stack ;)
        length = len(s)
        token = ''
        while 1:
            # end of string, finish off the current token
            if pos == length:
                if state == QUOTE: raise ValueError
                elif state == TOKEN: l.append(token)
                break
            c = s[pos]
            if state == NEWTOKEN:
                # looking for a new token
                if c in quotes:
                    # quoted token
                    state = QUOTE
                    quotechar = c
                    pos = pos + 1
                    continue
                elif c in whitespace:
                    # skip whitespace
                    pos = pos + 1
                    continue
                elif c == '\\':
                    pos = pos + 1
                    oldstate = TOKEN
                    state = ESCAPE
                    continue
                # otherwise we have a token
                state = TOKEN
            elif state == TOKEN:
                if c in whitespace:
                    # have a token, and have just found a whitespace terminator
                    l.append(token)
                    pos = pos + 1
                    state = NEWTOKEN
                    token = ''
                    continue
                elif c in quotes:
                    # have a token, just found embedded quotes
                    state = QUOTE
                    quotechar = c
                    pos = pos + 1
                    continue
                elif c == '\\':
                    pos = pos + 1
                    oldstate = state
                    state = ESCAPE
                    continue
            elif state == QUOTE and c == quotechar:
                # in a quoted token and found a matching quote char
                pos = pos + 1
                # now we're looking for whitespace
                state = TOKEN
                continue
            elif state == ESCAPE:
                # escaped-char conversions (t, r, n)
                # TODO: octal, hexdigit
                state = oldstate
                if c in escaped:
                    c = escaped[c]
            # just add this char to the token and move along
            token = token + c
            pos = pos + 1
        return l

    # vim: set filetype=python ts=4 sw=4 et si
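
The docstring's examples can be exercised with a short script. The sketch below is not the module's code: `token_split_sketch` is a hypothetical, condensed restatement of the same four-state machine (new token, token, quote, escape), written so that the example runs on its own without roundup installed. It is meant to illustrate the documented behaviour, and it assumes the same defaults for whitespace, quote characters and escape conversions.

```python
def token_split_sketch(s, whitespace=' \r\n\t', quotes='\'"', escaped=None):
    """Condensed restatement of the tokeniser's state machine (hypothetical)."""
    if escaped is None:
        escaped = {'r': '\r', 'n': '\n', 't': '\t'}
    tokens = []
    token = ''
    state = 'newtoken'   # one of: newtoken, token, quote, escape
    oldstate = ''        # state to return to after an escaped character
    quotechar = ''
    for c in s:
        if state == 'newtoken':
            if c in quotes:
                # opening quote starts a quoted token; the quote is dropped
                state, quotechar = 'quote', c
                continue
            if c in whitespace:
                continue
            if c == '\\':
                oldstate, state = 'token', 'escape'
                continue
            state = 'token'
        elif state == 'token':
            if c in whitespace:
                # whitespace terminates the current token
                tokens.append(token)
                token, state = '', 'newtoken'
                continue
            if c in quotes:
                # quote embedded in a token, e.g. address="1 2 3"
                state, quotechar = 'quote', c
                continue
            if c == '\\':
                oldstate, state = state, 'escape'
                continue
        elif state == 'quote' and c == quotechar:
            # matching close quote: back to looking for whitespace
            state = 'token'
            continue
        elif state == 'escape':
            state = oldstate
            c = escaped.get(c, c)
        token += c
    if state == 'quote':
        raise ValueError('no matching quote')
    if state == 'token':
        tokens.append(token)
    return tokens


for text in ['hello world', '"hello world"', '"Roch\'e" Compaan',
             r"Roch\'e Compaan", 'address="1 2 3"']:
    print(repr(text), '->', token_split_sketch(text))

# Inputs with an unmatched quote are rejected with ValueError.
for text in ['"hello world', "Roch'e Compaan"]:
    try:
        token_split_sketch(text)
    except ValueError as err:
        print(repr(text), '-> ValueError:', err)
```

Unlike the module's function, this sketch attaches a message to the `ValueError`; callers that only catch the exception type are unaffected by that difference.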
