annotate roundup/token_r.py @ 7178:db06d4aeb978

unshadow stdlib token from roundup's token. This bites me every now and again when running pytest and pdb. Some submodules want to load the stdlib token module and end up getting roundup's token module, and things break with N_TOKENS not defined, etc. So rename token.py to token_r.py (token_r(oundup)... hey, naming things is hard) and change code as needed.
author John Rouillard <rouilj@ieee.org>
date Sun, 26 Feb 2023 12:00:35 -0500
parents
children 07ce4e4110f5
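
The shadowing problem described in the commit message can be checked from an interpreter. The following is a minimal sketch, not part of the changeset: the helper name `looks_like_stdlib_token` is hypothetical, and it only assumes that the standard library's `token` module exposes `N_TOKENS` and `tok_name`, which a tokeniser module like roundup's would not.

```python
import token


def looks_like_stdlib_token(mod):
    """Heuristic: the stdlib token module defines N_TOKENS and tok_name."""
    return hasattr(mod, "N_TOKENS") and hasattr(mod, "tok_name")


# Show which file "import token" actually resolved to. If this path points
# into a project directory instead of the Python installation, a local
# token.py is shadowing the standard library module.
print(getattr(token, "__file__", "<no file>"))
print(looks_like_stdlib_token(token))
```

If the second line prints `False`, some directory earlier on `sys.path` contains its own `token.py`, which is the situation the rename to `token_r.py` is meant to avoid.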
#
# Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
# This module is free software, and you may redistribute it and/or modify
# under the same terms as Python, so long as this copyright message and
# disclaimer are retained in their original form.
#
# This module is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#

"""This module provides the tokeniser used by roundup-admin.
"""
__docformat__ = 'restructuredtext'


def token_split(s, whitespace=' \r\n\t', quotes='\'"',
                escaped={'r': '\r', 'n': '\n', 't': '\t'}):
    r'''Split the string up into tokens. An occurrence of a ``'`` or ``"`` in
    the input will cause the splitter to ignore whitespace until a matching
    quote char is found. Embedded non-matching quote chars are also skipped.

    Whitespace and quoting characters may be escaped using a backslash.
    ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
    tab. All other backslashed characters are left as-is.

    Valid examples::

        hello world       (2 tokens: hello, world)
        "hello world"     (1 token: hello world)
        "Roch'e" Compaan  (2 tokens: Roch'e, Compaan)
        Roch\'e Compaan   (2 tokens: Roch'e, Compaan)
        address="1 2 3"   (1 token: address=1 2 3)
        \\                (1 token: \)
        \n                (1 token: a newline)
        \o                (1 token: \o)

    Invalid examples::

        "hello world      (no matching quote)
        Roch'e Compaan    (no matching quote)
    '''
    l = []
    pos = 0
    NEWTOKEN = 'newtoken'
    TOKEN = 'token'
    QUOTE = 'quote'
    ESCAPE = 'escape'
    quotechar = ''
    state = NEWTOKEN
    oldstate = ''  # one-level state stack ;)
    length = len(s)
    token = ''
    while True:
        # end of string, finish off the current token
        if pos == length:
            if state == QUOTE:
                # ran out of input while still inside a quoted section
                raise ValueError
            elif state == TOKEN:
                l.append(token)
            break
        c = s[pos]
        if state == NEWTOKEN:
            # looking for a new token
            if c in quotes:
                # quoted token
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c in whitespace:
                # skip whitespace
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = TOKEN
                state = ESCAPE
                continue
            # otherwise we have a token
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # have a token, and have just found a whitespace terminator
                l.append(token)
                pos = pos + 1
                state = NEWTOKEN
                token = ''
                continue
            elif c in quotes:
                # have a token, just found embedded quotes
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = state
                state = ESCAPE
                continue
        elif state == QUOTE and c == quotechar:
            # in a quoted token and found a matching quote char
            pos = pos + 1
            # now we're looking for whitespace
            state = TOKEN
            continue
        elif state == ESCAPE:
            # escaped-char conversions (t, r, n)
            # TODO: octal, hexdigit
            state = oldstate
            if c in escaped:
                c = escaped[c]
        # just add this char to the token and move along
        token = token + c
        pos = pos + 1
    return l

# vim: set filetype=python ts=4 sw=4 et si
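
The quoting rules in the `token_split` docstring resemble those of the standard library's `shlex.split`. As a rough illustration only, the sketch below runs several of the docstring's examples through `shlex.split` (default POSIX mode). It is a stdlib analogue, not roundup's implementation, and the helper name `try_split` is hypothetical. One known difference: `shlex` does not turn `\n` into a newline, whereas the docstring says `token_split` does.

```python
import shlex


def try_split(text):
    """Return the list of tokens, or None if the quoting is unbalanced."""
    try:
        return shlex.split(text)
    except ValueError:
        # shlex raises ValueError when a quote is never closed
        return None


print(try_split('hello world'))        # ['hello', 'world']
print(try_split('"hello world"'))      # ['hello world']
print(try_split('"Roch\'e" Compaan'))  # ["Roch'e", 'Compaan']
print(try_split(r"Roch\'e Compaan"))   # ["Roch'e", 'Compaan']
print(try_split('address="1 2 3"'))    # ['address=1 2 3']
print(try_split('"hello world'))       # None (no matching quote)
print(try_split("Roch'e Compaan"))     # None (no matching quote)
print(try_split(r'\n'))                # ['n'], unlike the docstring's newline
```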

Roundup Issue Tracker: http://roundup-tracker.org/