gh-140797: Forbid capturing groups in re.Scanner lexicon patterns #140944
This change adds validation to re.Scanner.__init__ that rejects lexicon patterns containing capturing groups. If any user-supplied pattern contains a capturing group, Scanner now raises ValueError with a message advising the use of non-capturing groups (?:...) instead.
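As a rough illustration of the same check from user code, the sketch below flags lexicon phrases that contain capturing groups before a Scanner is ever built. It relies only on the public Pattern.groups attribute rather than the _parser internals touched by this change; the helper name find_capturing_phrases and the sample lexicon are hypothetical.

```python
import re


def find_capturing_phrases(lexicon, flags=0):
    """Return the lexicon phrases that contain at least one capturing group.

    Hypothetical helper: it approximates the new validation using the public
    ``Pattern.groups`` attribute instead of ``re``'s private parser.
    """
    offending = []
    for phrase, _action in lexicon:
        # Pattern.groups is the number of capturing groups in the pattern;
        # non-capturing groups (?:...) are not counted.
        if re.compile(phrase, flags).groups:
            offending.append(phrase)
    return offending


# Sample lexicon (assumed for illustration only).
lexicon = [
    (r"(a)b", None),          # plain capturing group
    (r"(?P<name>a)", None),   # named capturing group
    (r"(?:a)b", None),        # non-capturing group
]
offending = find_capturing_phrases(lexicon)
```

A pre-check like this may be useful for code that has to run on interpreter versions both with and without the new ValueError.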
@@ -397,9 +397,16 @@ def __init__(self, lexicon, flags=0):
         s = _parser.State()
         s.flags = flags
         for phrase, action in lexicon:
+            sub_pattern = _parser.parse(phrase, flags)
+            if sub_pattern.state.groups != 1:  # 1 means only the implicit group 0 (\0) exists
+                raise ValueError(
+                    "re.Scanner lexicon patterns must not contain capturing groups;\n"
+                    "Please use non-capturing groups (?:...) instead"
+                )
+
             gid = s.opengroup()
             p.append(_parser.SubPattern(s, [
-                (SUBPATTERN, (gid, 0, 0, _parser.parse(phrase, flags))),
+                (SUBPATTERN, (gid, 0, 0, sub_pattern)),
                 ]))
             s.closegroup(gid, p[-1])
         p = _parser.SubPattern(s, [(BRANCH, (None, p))])
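The hunk above wraps each lexicon phrase in one group obtained from s.opengroup(). To show why a one-group-per-rule scheme is sensitive to extra capturing groups, the sketch below builds an ordinary alternation with the public re API and maps Match.lastindex back to a rule position. This is an analogy under stated assumptions, not Scanner's internal code; the names build and rule_index are hypothetical.

```python
import re


def build(rules):
    # Wrap every rule in exactly one capturing group and join them with "|".
    return re.compile("|".join(f"({phrase})" for phrase in rules))


def rule_index(pattern, text):
    # lastindex is the number of the last capturing group that closed
    # during the match; subtracting 1 assumes one group per rule.
    m = pattern.match(text)
    return m.lastindex - 1


# Group 1 -> rule 0, group 2 -> rule 1.
safe = build([r"(?:a)b", r"c"])
# Group 1 -> rule 0, group 2 -> the inner (a), group 3 -> rule 1.
shifted = build([r"(a)b", r"c"])

safe_index = rule_index(safe, "c")        # position of the second rule
shifted_index = rule_index(shifted, "c")  # off by one: past a two-rule list
```

In this analogy the inner capturing group shifts the numbering, so the computed index no longer points at the rule that matched. Rejecting capturing groups up front avoids that class of mismatch.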
@@ -1639,6 +1639,25 @@ def s_int(scanner, token): return int(token)
                          (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5,
                            'op+', 'bar'], ''))
 
+    def test_bug_140797(self):
+        # bug 140797: forbid capturing groups in re.Scanner lexicon patterns
+
+        # A capturing group raises ValueError
+        lex = [("(a)b", None)]
+        with self.assertRaises(ValueError):
+            Scanner(lex)
+
+        # Non-capturing groups are accepted and scan normally
+        s = Scanner([("(?:a)b", lambda scanner, token: token)])
+        result, rem = s.scan("ab")
+        self.assertEqual(result, ['ab'])
+        self.assertEqual(rem, '')
+
+        # A named capturing group is rejected as well
+        pattern = "(?P<name>a)"
+        with self.assertRaises(ValueError):
+            Scanner([(pattern, None)])
+
     def test_bug_448951(self):
         # bug 448951 (similar to 429357, but with single char match)
         # (Also test greedy matches.)
@@ -0,0 +1,4 @@
+The re.Scanner class now forbids regular expressions containing capturing
+groups in its lexicon patterns. Patterns using capturing groups could
+previously lead to a segmentation fault. Use non-capturing groups
+(?:...) instead.
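For code that previously relied on capturing groups inside a lexicon pattern, one possible migration is sketched below: the lexicon pattern uses non-capturing groups only, and the action callback re-matches the token with a separate compiled pattern to extract the parts. The token format and the names VERSION, VERSION_PARTS, and on_version are hypothetical; re.Scanner itself is an undocumented class, so treat this as a sketch rather than a recommended API.

```python
import re

# Lexicon pattern: non-capturing groups only, so it passes the new validation.
VERSION = r"(?:\d+)\.(?:\d+)"
# Separate pattern with capturing groups, used only inside the action.
VERSION_PARTS = re.compile(r"(\d+)\.(\d+)")


def on_version(scanner, token):
    # The token is already known to match VERSION, so fullmatch succeeds.
    major, minor = VERSION_PARTS.fullmatch(token).groups()
    return ("version", int(major), int(minor))


scanner = re.Scanner([
    (VERSION, on_version),
    (r"\s+", None),  # an action of None skips the matched text
])

tokens, remainder = scanner.scan("3.14 2.7")
```

Because the lexicon contains no capturing groups, this form should behave the same with or without the new check.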