annotate roundup/token.py @ 1655:0feb34b2de71 0.6.0b3
*** empty log message ***
| author | Richard Jones <richard@users.sourceforge.net> |
|---|---|
| date | Mon, 09 Jun 2003 23:51:14 +0000 |
| parents | 9b910e8d987d |
| children | fc52d57c6c3e |
#
# Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
# This module is free software, and you may redistribute it and/or modify
# under the same terms as Python, so long as this copyright message and
# disclaimer are retained in their original form.
#
# This module is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# $Id: token.py,v 1.3 2002-09-10 00:18:20 richard Exp $
#

__doc__ = """
This module provides the tokeniser used by roundup-admin.
"""

def token_split(s, whitespace=' \r\n\t', quotes='\'"',
        escaped={'r':'\r', 'n':'\n', 't':'\t'}):
    r'''Split the string up into tokens. An occurrence of a ' or " in the
    input will cause the splitter to ignore whitespace until a matching
    quote char is found. Embedded non-matching quote chars are also
    skipped.
    Whitespace and quoting characters may be escaped using a backslash.
    \r, \n and \t are converted to carriage-return, newline and tab.
    All other backslashed characters are left as-is.
    Valid:
        hello world      (2 tokens: hello, world)
        "hello world"    (1 token: hello world)
        "Roch'e" Compaan (2 tokens: Roch'e Compaan)
        Roch\'e Compaan  (2 tokens: Roch'e Compaan)
        address="1 2 3"  (1 token: address=1 2 3)
        \\               (1 token: \)
        \n               (1 token: a newline)
        \o               (1 token: \o)
    Invalid:
        "hello world     (no matching quote)
        Roch'e Compaan   (no matching quote)
    '''
    l = []
    pos = 0
    NEWTOKEN = 'newtoken'
    TOKEN = 'token'
    QUOTE = 'quote'
    ESCAPE = 'escape'
    quotechar = ''
    state = NEWTOKEN
    oldstate = ''    # one-level state stack ;)
    length = len(s)
    finish = 0
    token = ''
    while 1:
        # end of string, finish off the current token
        if pos == length:
            if state == QUOTE: raise ValueError("unmatched quote")
            elif state == TOKEN: l.append(token)
            break
        c = s[pos]
        if state == NEWTOKEN:
            # looking for a new token
            if c in quotes:
                # quoted token
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c in whitespace:
                # skip whitespace
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = TOKEN
                state = ESCAPE
                continue
            # otherwise we have a token
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # have a token, and have just found a whitespace terminator
                l.append(token)
                pos = pos + 1
                state = NEWTOKEN
                token = ''
                continue
            elif c in quotes:
                # have a token, just found embedded quotes
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = state
                state = ESCAPE
                continue
        elif state == QUOTE and c == quotechar:
            # in a quoted token and found a matching quote char
            pos = pos + 1
            # now we're looking for whitespace
            state = TOKEN
            continue
        elif state == ESCAPE:
            # escaped-char conversions (t, r, n)
            # TODO: octal, hexdigit
            state = oldstate
            if c in escaped:
                c = escaped[c]
        # just add this char to the token and move along
        token = token + c
        pos = pos + 1
    return l

# vim: set filetype=python ts=4 sw=4 et si
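
Several of the docstring's examples can be written as executable checks. The block below is a condensed Python 3 sketch of the same quote-and-escape state machine, not the module's own code. The name `token_split_sketch`, the constants `WHITESPACE`, `QUOTES` and `ESCAPED`, and the flag variables are assumptions introduced for illustration. It has only been checked against the docstring cases shown in the comments, so the `\o` case and edge cases such as a trailing backslash are deliberately left out.

```python
# Hypothetical constants mirroring token_split's default arguments.
WHITESPACE = ' \r\n\t'
QUOTES = '\'"'
ESCAPED = {'r': '\r', 'n': '\n', 't': '\t'}


def token_split_sketch(s):
    """Split s into tokens, honouring quotes and backslash escapes."""
    tokens = []
    token = ''
    in_token = False   # True once the current token has started
    quotechar = ''     # non-empty while inside a quoted run
    escape = False     # True when the previous char was a backslash
    for c in s:
        if escape:
            # \r, \n and \t are converted; other chars are taken literally
            token += ESCAPED.get(c, c)
            escape = False
        elif quotechar:
            if c == quotechar:
                quotechar = ''   # closing quote; the token continues
            else:
                token += c       # whitespace and the other quote char are kept
        elif c == '\\':
            escape = True
            in_token = True
        elif c in QUOTES:
            quotechar = c
            in_token = True
        elif c in WHITESPACE:
            if in_token:
                # whitespace outside quotes terminates the token
                tokens.append(token)
                token = ''
                in_token = False
        else:
            token += c
            in_token = True
    if quotechar:
        raise ValueError("unmatched quote")
    if in_token:
        tokens.append(token)
    return tokens


if __name__ == '__main__':
    print(token_split_sketch('hello world'))        # ['hello', 'world']
    print(token_split_sketch('"hello world"'))      # ['hello world']
    print(token_split_sketch('address="1 2 3"'))    # ['address=1 2 3']
    try:
        token_split_sketch('"hello world')
    except ValueError as err:
        print(err)                                  # unmatched quote
```

Tracking the quote and escape conditions as separate flags plays the same role as the `QUOTE`/`ESCAPE` states and the one-level `oldstate` stack in `token_split`. As in the original, a backslash inside a quoted run is kept literally, because escapes are only recognised outside quotes.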
