Merged
This PR references issue gh-140797
It adds validation to re.Scanner.__init__ that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains a capturing group, Scanner now raises ValueError with a message advising the use of non-capturing groups (?:...) instead.
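The check described above can be approximated with public API only. The following is a minimal sketch, not the code from this PR: check_lexicon is a hypothetical helper, and it relies on Pattern.groups, which counts capturing groups in a compiled pattern, rather than on the parser internals the PR touches.

```python
import re


def check_lexicon(lexicon, flags=0):
    """Hypothetical helper: reject lexicon patterns with capturing groups."""
    for phrase, _action in lexicon:
        # Pattern.groups counts capturing groups only; (?:...) adds nothing.
        if re.compile(phrase, flags).groups:
            raise ValueError(
                f"pattern {phrase!r} contains capturing groups; "
                "use non-capturing groups (?:...) instead"
            )


# A non-capturing group passes silently.
check_lexicon([("(?:a)b", None)])

# Plain and named capturing groups are both rejected.
for bad in ("(a)b", "(?P<name>a)"):
    try:
        check_lexicon([(bad, None)])
    except ValueError as exc:
        print(exc)
```

Running such a helper over an existing lexicon is one way to find patterns that would need rewriting before they reach a Scanner that enforces this rule.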
Abhi210 committed Nov 3, 2025
commit 29db6ca11a3db0d04006bfe844c0c97bb179e60d
9 changes: 8 additions & 1 deletion Lib/re/__init__.py
Expand Up @@ -397,9 +397,16 @@ def __init__(self, lexicon, flags=0):
         s = _parser.State()
         s.flags = flags
         for phrase, action in lexicon:
+            sub_pattern = _parser.parse(phrase, flags)
+            if sub_pattern.state.groups != 1: # <- 1 means always has \0
+                raise ValueError(

You can write this on one line. A line should be <= 80 characters.

raise ValueError("Can not use capturing groups in re.Scanner.")

Contributor Author

Thank you for the suggestion. I have resolved it now.

+                    "re.Scanner lexicon patterns must not contain capturing groups;\n"
+                    "Please use non-capturing groups (?:...) instead"
+                )
+
             gid = s.opengroup()
             p.append(_parser.SubPattern(s, [
-                (SUBPATTERN, (gid, 0, 0, _parser.parse(phrase, flags))),
+                (SUBPATTERN, (gid, 0, 0, sub_pattern)),
                 ]))
             s.closegroup(gid, p[-1])
         p = _parser.SubPattern(s, [(BRANCH, (None, p))])
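The comparison against 1 rather than 0 in the hunk above follows from how the parser counts groups: the whole match (group 0) is always included. The sketch below illustrates this with the same private parser module. It assumes Python 3.8 or later; group_count is a hypothetical helper, and since these are undocumented internals the behaviour may change between versions.

```python
try:
    from re import _parser  # location of the parser since Python 3.11
except ImportError:
    import sre_parse as _parser  # assumed fallback for older versions


def group_count(phrase, flags=0):
    """Hypothetical helper: the parser's group count, including group 0."""
    return _parser.parse(phrase, flags).state.groups


print(group_count("(?:a)b"))       # 1: only the implicit group 0
print(group_count("(a)b"))         # 2: group 0 plus one capturing group
print(group_count("(?P<name>a)"))  # 2: named groups are capturing too
```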
19 changes: 19 additions & 0 deletions Lib/test/test_re.py
Expand Up @@ -1639,6 +1639,25 @@ def s_int(scanner, token): return int(token)
(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5,
'op+', 'bar'], ''))

+    def test_bug_140797(self):

My mistake: please use test_bug_gh140797 as the name.
Without the "gh" prefix, the number may be read as referring to the previous bug tracker.
Sorry about that.

- # bug 140797: remove capturing groups compilation form re.Scanner
+ # gh140797: capturing groups is not allowed in re.Scanner

Contributor Author

Done! Thank you again for your time and suggestions!

+        #bug 140797: remove capturing groups compilation form re.Scanner

Please add a space after # in comments.

- #Capturing group throws an error
+ # Capturing group throws an error

Also add a space after each comma in function arguments.

- with self.assertRaisesRegex(ValueError,msg):
+ with self.assertRaisesRegex(ValueError, msg):

Then it looks good to me.

Contributor Author

Oops! Thank you again! Resolved


+        #Presence of Capturing group throws an error
+        lex = [("(a)b", None)]
+        with self.assertRaises(ValueError):

You could also check the exception message.

msg = "Can not use capturing groups in re.Scanner"
with self.assertRaisesRegex(ValueError, msg):
    ...

Contributor Author

Thank you for the suggestion! I have resolved it now. I still have a lot to learn!

+            Scanner(lex)

This saves a line...
My fault again.

- Scanner(lex)
+ Scanner([("(a)b", None)])

Contributor Author

Done! Testing sure takes time 😂


+        #Presence of non-capturing groups should pass normally
+        s = Scanner([("(?:a)b", lambda scanner, token: token)])
+        result, rem = s.scan("ab")
+        self.assertEqual(result,['ab'])
+        self.assertEqual(rem,'')
+
+        #Testing a very complex capturing group
+        pattern= "(?P<name>a)"
+        with self.assertRaises(ValueError):
+            Scanner([(pattern, None)])
+
     def test_bug_448951(self):
         # bug 448951 (similar to 429357, but with single char match)
         # (Also test greedy matches.)
@@ -0,0 +1,4 @@
The re.Scanner class now forbids regular expressions containing capturing
Member

Mention that the class is undocumented. You can also use some formatting, even if the link does not work: :class:`!re.Scanner`.

Contributor Author

Done! Thank you 😊

groups in its lexicon patterns. Patterns using capturing groups could
previously lead to crashes with segmentation fault. Use non-capturing groups
(?:...) instead.
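
As an illustration of the migration the entry recommends, here is a small sketch of a lexicon written with non-capturing groups only. The token patterns and callbacks are made up for this example and are not part of the PR; re.Scanner itself is undocumented, so treat the details as an assumption about its current behaviour.

```python
import re

# A number pattern like r"(\d+)(\.\d+)?" uses capturing groups; the
# equivalent below uses (?:...) so no groups are captured.
scanner = re.Scanner([
    (r"\d+(?:\.\d+)?", lambda s, tok: float(tok)),  # numbers become floats
    (r"[-+*/]", lambda s, tok: tok),                # operators kept as-is
    (r"\s+", None),                                 # whitespace is skipped
])

tokens, remainder = scanner.scan("3.14 + 2")
print(tokens)     # [3.14, '+', 2.0]
print(remainder)  # empty string: the whole input was consumed
```

Each callback receives the full matched text as its second argument, so grouping is only needed for structure and non-capturing groups are sufficient.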