annotate roundup/token.py @ 3192:eb00a2fa0e0e maint-0.8 0.8.0
pre-release stuff
| author | Richard Jones <richard@users.sourceforge.net> |
|---|---|
| date | Wed, 16 Feb 2005 00:29:18 +0000 |
| parents | fc52d57c6c3e |
| children | 6e3e4f24c753 |
| rev | changeset | parent | description |
|---|---|---|---|
| 470 | 9f7320624bc2 | | Added better tokenising to roundup-admin - handles spaces and stuff. |
| 475 | a1a44636bace | 470 | Fix breakage caused by transaction changes. |
| 2005 | fc52d57c6c3e | 1090 | documentation cleanup |

All annotated changesets are by Richard Jones <richard@users.sourceforge.net>. Rev 475 accounts for the copyright line (line 2). Rev 2005 accounts for the `$Id$` line (line 11), the first line of the module docstring (line 14), the `__docformat__` assignment (line 16), and the prose and section headings of the `token_split` docstring (lines 20-29 and 38-40). Every other line dates from rev 470.

    #
    # Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
    # This module is free software, and you may redistribute it and/or modify
    # under the same terms as Python, so long as this copyright message and
    # disclaimer are retained in their original form.
    #
    # This module is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #
    # $Id: token.py,v 1.4 2004-02-11 23:55:08 richard Exp $
    #

    """This module provides the tokeniser used by roundup-admin.
    """
    __docformat__ = 'restructuredtext'

    def token_split(s, whitespace=' \r\n\t', quotes='\'"',
            escaped={'r':'\r', 'n':'\n', 't':'\t'}):
        '''Split the string up into tokens. An occurrence of a ``'`` or ``"`` in
        the input will cause the splitter to ignore whitespace until a matching
        quote char is found. Embedded non-matching quote chars are also skipped.

        Whitespace and quoting characters may be escaped using a backslash.
        ``\r``, ``\n`` and ``\t`` are converted to carriage-return, newline and
        tab. All other backslashed characters are left as-is.

        Valid examples::

            hello world      (2 tokens: hello, world)
            "hello world"    (1 token: hello world)
            "Roch'e" Compaan (2 tokens: Roch'e Compaan)
            Roch\'e Compaan  (2 tokens: Roch'e Compaan)
            address="1 2 3"  (1 token: address=1 2 3)
            \\               (1 token: \)
            \n               (1 token: a newline)
            \o               (1 token: \o)

        Invalid examples::

            "hello world     (no matching quote)
            Roch'e Compaan   (no matching quote)
        '''
        l = []
        pos = 0
        NEWTOKEN = 'newtoken'
        TOKEN = 'token'
        QUOTE = 'quote'
        ESCAPE = 'escape'
        quotechar = ''
        state = NEWTOKEN
        oldstate = ''    # one-level state stack ;)
        length = len(s)
        finish = 0
        token = ''
        while 1:
            # end of string, finish off the current token
            if pos == length:
                if state == QUOTE: raise ValueError, "unmatched quote"
                elif state == TOKEN: l.append(token)
                break
            c = s[pos]
            if state == NEWTOKEN:
                # looking for a new token
                if c in quotes:
                    # quoted token
                    state = QUOTE
                    quotechar = c
                    pos = pos + 1
                    continue
                elif c in whitespace:
                    # skip whitespace
                    pos = pos + 1
                    continue
                elif c == '\\':
                    pos = pos + 1
                    oldstate = TOKEN
                    state = ESCAPE
                    continue
                # otherwise we have a token
                state = TOKEN
            elif state == TOKEN:
                if c in whitespace:
                    # have a token, and have just found a whitespace terminator
                    l.append(token)
                    pos = pos + 1
                    state = NEWTOKEN
                    token = ''
                    continue
                elif c in quotes:
                    # have a token, just found embedded quotes
                    state = QUOTE
                    quotechar = c
                    pos = pos + 1
                    continue
                elif c == '\\':
                    pos = pos + 1
                    oldstate = state
                    state = ESCAPE
                    continue
            elif state == QUOTE and c == quotechar:
                # in a quoted token and found a matching quote char
                pos = pos + 1
                # now we're looking for whitespace
                state = TOKEN
                continue
            elif state == ESCAPE:
                # escaped-char conversions (t, r, n)
                # TODO: octal, hexdigit
                state = oldstate
                if escaped.has_key(c):
                    c = escaped[c]
            # just add this char to the token and move along
            token = token + c
            pos = pos + 1
        return l

    # vim: set filetype=python ts=4 sw=4 et si
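
The annotated revision is Python 2 era code (`raise ValueError, "..."` and `dict.has_key`), so it will not run unchanged on Python 3. The block below is a minimal Python 3 sketch of the same four-state machine, not the project's own code. It assumes the control flow reads the way the comments in the listing suggest. Three details are choices made for this sketch rather than anything taken from the source: a `for` loop replaces the manual `pos` index, the result list is named `tokens` instead of `l`, and `escaped` defaults to `None` so that no mutable dict is shared between calls.

```python
def token_split(s, whitespace=' \r\n\t', quotes='\'"', escaped=None):
    """Split s into tokens, honouring quotes and backslash escapes."""
    if escaped is None:
        # Same conversions as the listing: \r, \n and \t.
        escaped = {'r': '\r', 'n': '\n', 't': '\t'}
    NEWTOKEN, TOKEN, QUOTE, ESCAPE = 'newtoken', 'token', 'quote', 'escape'
    tokens = []
    token = ''
    quotechar = ''
    state = NEWTOKEN
    oldstate = ''  # one-level state stack, as in the listing
    for c in s:
        if state == NEWTOKEN:
            if c in quotes:
                # Start of a quoted token.
                state, quotechar = QUOTE, c
                continue
            if c in whitespace:
                # Skip whitespace between tokens.
                continue
            if c == '\\':
                # An escape opens a new token.
                oldstate, state = TOKEN, ESCAPE
                continue
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # Whitespace terminates the current token.
                tokens.append(token)
                token = ''
                state = NEWTOKEN
                continue
            if c in quotes:
                # Embedded quote: keep building the same token.
                state, quotechar = QUOTE, c
                continue
            if c == '\\':
                oldstate, state = state, ESCAPE
                continue
        elif state == QUOTE and c == quotechar:
            # Matching close quote: back to looking for whitespace.
            state = TOKEN
            continue
        elif state == ESCAPE:
            # Convert r, n, t; any other character is taken literally.
            state = oldstate
            c = escaped.get(c, c)
        # Ordinary character (or converted escape): add it to the token.
        token += c
    if state == QUOTE:
        raise ValueError("unmatched quote")
    if state == TOKEN:
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    print(token_split('address="1 2 3" title=hello'))
    # ['address=1 2 3', 'title=hello']
```

Inside a quoted section only the matching quote character is special, so whitespace and the other quote character are copied into the token as they are. An input that ends while still inside quotes raises `ValueError`, which matches the "Invalid examples" in the docstring.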
