Bug report
The pure-Python zoneinfo parser accepts a POSIX TZ Mm.w.d transition rule whose second separator (the one between week and day) is not ., while the C accelerator (and POSIX) rejects it.
Reproduced on main (3.16.0a0) via the from_file footer path, which goes through _parse_tz_str -> _parse_dst_start_end:
python: 3.16.0a0
tzstr : EST5EDT,M3.2X0,M11.1.0
pure zoneinfo._zoneinfo.ZoneInfo.from_file: ZoneInfo(key='EST5EDT,M3.2X0,M11.1.0') -> ACCEPTED
C _zoneinfo.ZoneInfo.from_file: ValueError: Malformed transition rule in TZ string: b'EST5EDT,M3.2X0,M11.1.0'
The pure parser also accepts M3.2!0, M3.2-0, M3.2:0 and a bad separator in the second rule (M11.1X0); the C accelerator rejects all of them.
Cause
Lib/zoneinfo/_zoneinfo.py:710:
m = re.fullmatch(r"M(\d{1,2})\.(\d).(\d)", date, re.ASCII)
The first separator is escaped (\.) but the second, between week and day, is a bare ., which in a regex matches any single character. So M3.2X0 matches with week 2 and day 0, and the separator X is consumed by the wildcard.
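The wildcard behaviour can be checked in isolation, without going through zoneinfo at all. The sketch below applies the pattern quoted above to a few rule strings; the helper name match_buggy is made up for illustration and is not part of the module.

```python
import re

# Pattern quoted in the report: the separator between week and day is an
# unescaped ".", which matches any single character (except a newline).
BUGGY = r"M(\d{1,2})\.(\d).(\d)"


def match_buggy(date):
    """Return the (month, week, day) groups as strings, or None on no match.

    Hypothetical helper for illustration only.
    """
    m = re.fullmatch(BUGGY, date, re.ASCII)
    return m.groups() if m else None


print(match_buggy("M3.2.0"))  # well-formed rule -> ('3', '2', '0')
print(match_buggy("M3.2X0"))  # malformed, still matches -> ('3', '2', '0')
print(match_buggy("M3X2.0"))  # first separator is escaped -> None
```

Only the second separator is affected: a bad first separator is already rejected because that position is escaped.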
Which side is wrong
The pure parser is wrong. POSIX (IEEE Std 1003.1-2024, Issue 8, Base Definitions, 8.3 "Other Environment Variables", the TZ Mm.w.d rule) specifies the form Mm.w.d with literal . separators between month, week, and day. The C accelerator already enforces both periods at Modules/_zoneinfo.c:1862 and Modules/_zoneinfo.c:1868:
if (*ptr++ != '.') {
    return -1;
}
so the pure path should match.
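For comparison, here is a rough Python sketch of the same character-by-character strictness. It is loosely modeled on the check quoted above and is not a translation of the C source; parse_mwd and its inner helpers are hypothetical names, and the sketch deliberately does no range validation of month, week, or day.

```python
ASCII_DIGITS = "0123456789"


def parse_mwd(rule):
    """Parse 'Mm.w.d' one character at a time (illustrative sketch only).

    Returns (month, week, day) as ints, or raises ValueError if a literal
    'M' or '.' is missing, a digit is missing, or trailing text remains.
    """
    pos = 0

    def expect(ch):
        # Require exactly this literal character at the current position.
        nonlocal pos
        if pos >= len(rule) or rule[pos] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos} in {rule!r}")
        pos += 1

    def take_digits(max_len):
        # Consume between 1 and max_len ASCII digits.
        nonlocal pos
        start = pos
        while (pos < len(rule) and pos - start < max_len
               and rule[pos] in ASCII_DIGITS):
            pos += 1
        if pos == start:
            raise ValueError(f"expected digit at offset {start} in {rule!r}")
        return int(rule[start:pos])

    expect("M")
    month = take_digits(2)
    expect(".")  # first separator must be a literal period
    week = take_digits(1)
    expect(".")  # second separator must be a literal period too
    day = take_digits(1)
    if pos != len(rule):
        raise ValueError(f"trailing characters at offset {pos} in {rule!r}")
    return month, week, day


print(parse_mwd("M3.2.0"))   # -> (3, 2, 0)
print(parse_mwd("M11.1.0"))  # -> (11, 1, 0)
```

With this approach M3.2X0 fails at the second expect(".") call, which is the behaviour the pure-Python path lacks.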
Fix
Escape the second separator so it is a literal .:
m = re.fullmatch(r"M(\d{1,2})\.(\d)\.(\d)", date, re.ASCII)
Non-breaking: every well-formed Mm.w.d rule and all bundled IANA zones still parse; only the previously-accepted malformed separators now raise, matching the C accelerator.
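A quick standalone check of the corrected pattern, as a sketch rather than the actual regression test: it runs the fixed regex over the malformed strings listed in this report plus a few well-formed rules. M10.5.0 is added here only as an extra well-formed example, and accepts is a hypothetical helper name.

```python
import re

# Corrected pattern from the Fix section: both separators are literal periods.
FIXED = r"M(\d{1,2})\.(\d)\.(\d)"

# Well-formed rules; the first two come from the report, the third is an
# extra illustrative value.
WELL_FORMED = ["M3.2.0", "M11.1.0", "M10.5.0"]

# Malformed rules that the report says the pure parser used to accept.
MALFORMED = ["M3.2X0", "M3.2!0", "M3.2-0", "M3.2:0", "M11.1X0"]


def accepts(date):
    """Return True if the corrected pattern matches the whole string."""
    return re.fullmatch(FIXED, date, re.ASCII) is not None


assert all(accepts(d) for d in WELL_FORMED)
assert not any(accepts(d) for d in MALFORMED)
print("fixed pattern accepts well-formed rules and rejects the rest")
```

This exercises only the regex, not from_file or the C accelerator, so it shows the pattern-level change but not end-to-end parity.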
Related
Distinct from #152212 / PR #152213, which fix the separate missing-STD-offset parity gap. This issue concerns only the Mm.w.d separator regex.
Linked PRs
zoneinfo accepting invalid seperators in POSIX TZ (Mm.w.d) rules #152247
zoneinfo accepting invalid seperators in POSIX TZ rules (GH-152247) #152265
zoneinfo accepting invalid seperators in POSIX TZ rules (GH-152247) #152266
zoneinfo accepting invalid seperators in POSIX TZ rules (GH-152247) #152267