annotate roundup/token.py @ 1669:17ec0bd6ecc5 maint-0.5

*** empty log message ***
author Richard Jones <richard@users.sourceforge.net>
date Wed, 18 Jun 2003 23:52:54 +0000
parents 9b910e8d987d
children fc52d57c6c3e
Every line below comes from changeset 9f7320624bc2 (rev 470, "Added better tokenising to roundup-admin - handles spaces and stuff."), except the copyright line, last changed in a1a44636bace (rev 475, "Fix breakage caused by transaction changes.", parent 470), and the $Id$ line, last changed in 9b910e8d987d (rev 1090, "removed Log", parent 475). All three changesets are by Richard Jones <richard@users.sourceforge.net>.

#
# Copyright (c) 2001 Richard Jones, richard@bofh.asn.au.
# This module is free software, and you may redistribute it and/or modify
# under the same terms as Python, so long as this copyright message and
# disclaimer are retained in their original form.
#
# This module is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# $Id: token.py,v 1.3 2002-09-10 00:18:20 richard Exp $
#

__doc__ = """
This module provides the tokeniser used by roundup-admin.
"""

def token_split(s, whitespace=' \r\n\t', quotes='\'"',
        escaped={'r':'\r', 'n':'\n', 't':'\t'}):
    '''Split the string up into tokens. An occurrence of a ' or " in the
       input will cause the splitter to ignore whitespace until a matching
       quote char is found. Non-matching quote chars embedded in a quoted
       token do not end it and are kept as ordinary characters.
       Whitespace and quoting characters may be escaped using a backslash.
       \r, \n and \t are converted to carriage-return, newline and tab.
       All other backslashed characters are left as-is.
       Valid:
            hello world      (2 tokens: hello, world)
            "hello world"    (1 token: hello world)
            "Roch'e" Compaan (2 tokens: Roch'e, Compaan)
            Roch\'e Compaan  (2 tokens: Roch'e, Compaan)
            address="1 2 3"  (1 token: address=1 2 3)
            \\               (1 token: \)
            \n               (1 token: a newline)
            \o               (1 token: \o)
       Invalid:
            "hello world     (no matching quote)
            Roch'e Compaan   (no matching quote)
    '''
    l = []
    pos = 0
    NEWTOKEN = 'newtoken'
    TOKEN = 'token'
    QUOTE = 'quote'
    ESCAPE = 'escape'
    quotechar = ''
    state = NEWTOKEN
    oldstate = '' # one-level state stack ;)
    length = len(s)
    finish = 0
    token = ''
    while 1:
        # end of string, finish off the current token
        if pos == length:
            if state == QUOTE: raise ValueError, "unmatched quote"
            elif state == TOKEN: l.append(token)
            break
        c = s[pos]
        if state == NEWTOKEN:
            # looking for a new token
            if c in quotes:
                # quoted token
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c in whitespace:
                # skip whitespace
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = TOKEN
                state = ESCAPE
                continue
            # otherwise we have a token
            state = TOKEN
        elif state == TOKEN:
            if c in whitespace:
                # have a token, and have just found a whitespace terminator
                l.append(token)
                pos = pos + 1
                state = NEWTOKEN
                token = ''
                continue
            elif c in quotes:
                # have a token, just found embedded quotes
                state = QUOTE
                quotechar = c
                pos = pos + 1
                continue
            elif c == '\\':
                pos = pos + 1
                oldstate = state
                state = ESCAPE
                continue
        elif state == QUOTE and c == quotechar:
            # in a quoted token and found a matching quote char
            pos = pos + 1
            # now we're looking for whitespace
            state = TOKEN
            continue
        elif state == ESCAPE:
            # escaped-char conversions (t, r, n)
            # TODO: octal, hexdigit
            state = oldstate
            if escaped.has_key(c):
                c = escaped[c]
        # just add this char to the token and move along
        token = token + c
        pos = pos + 1
    return l

# vim: set filetype=python ts=4 sw=4 et si

Roundup Issue Tracker: http://roundup-tracker.org/