-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add missing xmlparser attributes: namespace_prefixes, ordered_attributes, specified_attributes, intern #6494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the 📝 WalkthroughWalkthroughAdded an internal helper macro and expanded the PyExpatLikeXmlParser struct with 22 new fields—including namespace_prefixes, ordered_attributes, specified_attributes, intern, and 17 handler fields—initializing them with appropriate defaults and exposing them as read-write properties. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Comment |
- Added namespace_prefixes, ordered_attributes, specified_attributes (boolean attributes) - Added intern dictionary attribute - Added stub handlers for all missing handler types to ensure compatibility - Created bool_property macro to ensure boolean attributes are converted correctly Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
- Tests for buffer_text, namespace_prefixes, ordered_attributes, and specified_attributes now pass Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
- Use proper error propagation in bool_property setter instead of unwrap_or - Add comment explaining the purpose of the intern dictionary attribute Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
|
Code has been automatically formatted The code in this PR has been formatted using git pull origin copilot/fix-attributeerror-xmlparser |
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Resolve additional unexpected success:
UNEXPECTED SUCCESS: test_parse_file (test.test_pyexpat.ParseTest.test_parse_file)
UNEXPECTED SUCCESS: test_parse_str (test.test_pyexpat.ParseTest.test_parse_str)
Also investigate why test_socket fails.
These tests now pass with the added xmlparser attributes. Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
Fixed in c9be4ac - removed Regarding test_socket: I verified it passes successfully (39.9s) on this branch. The test doesn't use any XML/pyexpat functionality, and my changes only modified |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_pyexpat.pyis excluded by!Lib/**
📒 Files selected for processing (1)
crates/stdlib/src/pyexpat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/stdlib/src/pyexpat.rs
🧬 Code graph analysis (1)
crates/stdlib/src/pyexpat.rs (2)
crates/vm/src/vm/context.rs (1)
new_static_getset(589-603)crates/vm/src/protocol/object.rs (1)
is_true(47-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Check Rust code with clippy
🔇 Additional comments (4)
crates/stdlib/src/pyexpat.rs (4)
28-46: LGTM! Well-structured boolean property macro.The macro correctly handles boolean conversion using
is_true(vm)?and properly propagates errors. The pattern mirrors the existingcreate_property!macro while adding type-safe boolean handling.
74-96: Verify that stub implementation satisfies the linked issue requirements.The new fields are declared and exposed as properties, but they are not integrated into the parsing logic:
- The boolean flags (
namespace_prefixes,ordered_attributes,specified_attributes) don't affect parser behavior increate_configordo_parse.- The 17 new handler fields are never invoked during parsing (only
start_element,end_element, andcharacter_dataare called indo_parse).- The
interndictionary is initialized but never used for string interning.This stub implementation prevents
AttributeErrorwhen setting these attributes, which may satisfy issue #6487. However, if client code expects these attributes to actually modify parser behavior (e.g.,namespace_prefixes=Trueshould include namespace info in callbacks), further implementation will be needed.Please confirm whether:
- The linked issue only requires these attributes to be settable without error, or
- They must also function correctly (affecting parse output/behavior)
If (2), consider opening follow-up issues to track the actual implementation of these handlers and flags.
117-141: LGTM! Proper initialization with good documentation.All new fields are correctly initialized with appropriate defaults:
- Boolean flags default to
false(matching CPython behavior)internis initialized as an empty dictionary with a helpful comment explaining its purpose- Handler fields default to
None
184-273: LGTM! Handler properties registered consistently.All 17 handler properties are registered following a consistent pattern using the
create_property!macro. The naming convention (PascalCase with "Handler" suffix) aligns with CPython's pyexpat handlers.
| create_bool_property!(ctx, attributes, "buffer_text", class, buffer_text); | ||
| create_bool_property!( | ||
| ctx, | ||
| attributes, | ||
| "namespace_prefixes", | ||
| class, | ||
| namespace_prefixes | ||
| ); | ||
| create_bool_property!( | ||
| ctx, | ||
| attributes, | ||
| "ordered_attributes", | ||
| class, | ||
| ordered_attributes | ||
| ); | ||
| create_bool_property!( | ||
| ctx, | ||
| attributes, | ||
| "specified_attributes", | ||
| class, | ||
| specified_attributes | ||
| ); | ||
| create_property!(ctx, attributes, "intern", class, intern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
CPython xml.parsers.expat xmlparser attribute names buffer_text namespace_prefixes ordered_attributes specified_attributes
💡 Result:
Summary of xml.parsers.expat xmlparser attributes
-
buffer_text: when True, the parser buffers character data to avoid many small CharacterDataHandler calls (Expat normally splits text at line endings). Default False; can be changed at any time. [1]
-
buffer_size / buffer_used: buffer_size sets the buffer capacity used when buffer_text is True (changing it flushes the buffer). buffer_used reports how many bytes (UTF‑8) are currently stored. [1]
-
ordered_attributes: when nonzero, attributes passed to StartElementHandler are a list (name, value, name, value, ...) in the document order instead of a dict mapping names→values. Default False. [1]
-
specified_attributes: when nonzero, the parser reports only attributes that were explicitly specified in the instance document (not those defaulted from DTD attribute declarations). Use with care if you rely on attribute-declaration defaults. Default False. [1]
-
namespace handling / namespace_prefixes: Expat can do namespace processing when created via ParserCreate with a namespace separator (the Python wrapper exposes this via namespace_separator). With namespace processing enabled, Expat consumes xmlns/xmlns:... attributes (so they won’t appear as normal attributes) and expands element and attribute names to "URIlocalname". Namespace declaration events can be observed with StartNamespaceDeclHandler/EndNamespaceDeclHandler. (The behavior of attribute names and namespace-declaration attributes follows Expat’s namespace processing rules.) [2][1]
Sources
- Python xml.parsers.expat (pyexpat) documentation. [1]
- Expat reference: Namespace processing and attribute handling. [2]
🏁 Script executed:
# First, let me check the actual code to see the full context of what's being registered
head -200 crates/stdlib/src/pyexpat.rs | tail -50Repository: RustPython/RustPython
Length of output: 1783
🌐 Web query:
CPython pyexpat intern attribute dictionary
💡 Result:
Short answer: pyexpat interns attribute names/strings when it builds the Python attributes dictionary so the attribute keys/other XML name strings become interned Python str objects (reducing allocations and speeding dict lookups). The C implementation calls its string-intern helper when converting Expat's XML_Char strings to Python strings before putting them into the attributes dict; the public API then hands you a normal dict (or ordered list if ordered_attributes is enabled). See the StartElementHandler docs and the pyexpat.c source where string_intern is used.
Sources: Python pyexpat documentation (StartElementHandler / attributes). [1] PyExpat C implementation showing string_intern usage when constructing handler arguments. [2]
References
[1] xml.parsers.expat — StartElementHandler / attributes (Python stdlib docs).
[2] Modules/pyexpat.c (source: string_intern usage when converting element/attribute names).
🌐 Web query:
CPython xml.parsers.expat ParserCreate namespace_prefixes attribute xmlparser
💡 Result:
Short answer: There is no documented xmlparser.namespace_prefixes attribute in CPython's xml.parsers.expat. To enable namespace processing you pass namespace_separator to ParserCreate(); namespace declarations are reported via StartNamespaceDeclHandler/EndNamespaceDeclHandler and element/attribute names are expanded using the separator. Expat itself can be instructed to return the (uri, localname, prefix) triplet (XML_SetReturnNSTriplet), but that API is part of libexpat and is not exposed as a simple xmlparser.namespace_prefixes attribute in the Python docs. [1][2]
References
- xml.parsers.expat.ParserCreate / xmlparser handlers (namespace_separator, StartNamespaceDeclHandler). [1]
- Expat namespace processing and XML_SetReturnNSTriplet (triplet return). [2]
Sources
[1] Python docs — xml.parsers.expat (ParserCreate, StartNamespaceDeclHandler).
[2] Expat reference — Namespace Processing / XML_SetReturnNSTriplet.
🌐 Web query:
CPython xml.parsers.expat xmlparser complete list of attributes properties
💡 Result:
Below is a concise, complete listing (as given by the CPython docs) of the xml.parsers.expat xmlparser type: its public methods, instance attributes (properties) and the handler attributes you can set. Source: Python standard-library documentation for xml.parsers.expat. [1]
Methods
- Parse(data[, isfinal])
- ParseFile(file)
- SetBase(base)
- GetBase()
- GetInputContext()
- ExternalEntityParserCreate(context[, encoding])
- SetParamEntityParsing(flag)
- UseForeignDTD([flag])
- SetReparseDeferralEnabled(enabled)
- GetReparseDeferralEnabled()
- SetBillionLaughsAttackProtectionActivationThreshold(threshold)
- SetBillionLaughsAttackProtectionMaximumAmplification(max_factor)
- SetAllocTrackerActivationThreshold(threshold)
- SetAllocTrackerMaximumAmplification(max_factor)
Instance attributes / properties
- buffer_size
- buffer_text
- buffer_used
- ordered_attributes
- specified_attributes
Error-related attributes (valid after ExpatError)
- ErrorByteIndex
- ErrorCode
- ErrorColumnNumber
- ErrorLineNumber
Current parse-position attributes
- CurrentByteIndex
- CurrentColumnNumber
- CurrentLineNumber
Handler attributes (assign callable functions: o.handlername = func)
- XmlDeclHandler(version, encoding, standalone)
- StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset)
- EndDoctypeDeclHandler()
- ElementDeclHandler(name, model)
- AttlistDeclHandler(elname, attname, type, default, required)
- StartElementHandler(name, attributes)
- EndElementHandler(name)
- ProcessingInstructionHandler(target, data)
- CharacterDataHandler(data)
- UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName)
- EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName)
- NotationDeclHandler(notationName, base, systemId, publicId)
- StartNamespaceDeclHandler(prefix, uri)
- EndNamespaceDeclHandler(prefix)
- StartCdataSectionHandler()
- EndCdataSectionHandler()
- CommentHandler(data)
- StartResumeHandler() / (see docs for Expat resume-related handlers)
- ExternalEntityRefHandler(context, base, systemId, publicId)
- SkippedEntityHandler(entityName)
- DefaultHandler(data)
- UnknownEncodingHandler(encoding)
- StartDoctypeDeclHandler / EndDoctypeDeclHandler (listed above)
- (Other Expat-specific handlers exposed in the docs — assignable by the same o.handlername = callable pattern)
Notes
- Many handler names and a few newer methods/attributes (e.g., reparse-deferral and protection-related setters) were added in recent CPython/Expat versions; availability depends on your Python/Expat version. See the docs for the exact signatures and version notes.
Reference
[1] Python documentation — xml.parsers.expat (XMLParser / xmlparser) (CPython stdlib).
🏁 Script executed:
# Check the actual implementation in the file to see if there are comments explaining these attributes
grep -A 5 -B 5 "namespace_prefixes\|intern" crates/stdlib/src/pyexpat.rs | head -60Repository: RustPython/RustPython
Length of output: 2765
Remove non-standard attributes namespace_prefixes and intern or add comments explaining why they diverge from CPython's xml.parsers.expat API.
According to the official CPython documentation, the xmlparser instance attributes are: buffer_size, buffer_text, buffer_used, ordered_attributes, and specified_attributes. The RustPython implementation registers buffer_text, ordered_attributes, and specified_attributes correctly, but also includes:
namespace_prefixes— not documented in CPython's public APIintern— not a public xmlparser attribute in CPython (though internal string interning occurs in the C implementation)
Either remove these or add inline documentation explaining them as RustPython extensions to match CPython's behavior and prevent compatibility issues.
🤖 Prompt for AI Agents
In crates/stdlib/src/pyexpat.rs around lines 160–182, the parser registers
non-standard attributes "namespace_prefixes" and "intern" that are not part of
CPython's xml.parsers.expat public API; either remove the create_bool_property!
for "namespace_prefixes" and the create_property! for "intern" to restore
CPython compatibility, or keep them but add concise inline comments immediately
above each registration stating they are RustPython extensions, citing that they
diverge from CPython's documented attributes and why they are present (e.g.,
implementation convenience or internal interning exposure), so callers and
future maintainers understand the intentional API difference.
|
@copilot please check @coderabbitai's review |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 9427 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 690 🌐 Web query: 💡 Result: The CPython implementation of xml.dom.expatbuilder (including the handling of namespace_prefixes) is in the standard-library file Lib/xml/dom/expatbuilder.py in the CPython repo / Python source distributions. You can view the source there; examples/archives with the file:
If you want I can fetch and paste the specific function/lines that mention namespace_prefixes (or open the file from the CPython GitHub master branch and show the exact code). Sources: Citations:
🌐 Web query: 💡 Result: Short answer: CPython's ElementTree/expat integration exposes a parser attribute named namespace_prefixes (xmlparser.namespace_prefixes) implemented in the pyexpat C module; setting it turns on Expat's "return namespace triplets" mode (XML_SetReturnNSTriplet) so namespace declarations/prefixes are reported to handlers (StartNamespaceDecl/EndNamespaceDecl) instead of being hidden. ElementTree's XMLParser uses the pyexpat parser (ParserCreate/ParserCreateNS) and installs the namespace-decl handlers that Surface these events to TreeBuilder/targets. Sources:
References: If you want, I can paste the exact lines from expatbuilder/ElementTree and pyexpat.c that show the attribute and handler wiring. Citations:
🌐 Web query: 💡 Result: Short answer: No — CPython's xml.parsers.expat (pyexpat) xmlparser type does not expose an "intern" attribute. The documented xmlparser attributes (buffer_size, buffer_text, ordered_attributes, specified_attributes, error info, etc.) do not include any "intern" flag. [1] (docs.python.org) If you need name/attribute interning in Python XML parsing, use the SAX expat-based reader which supports the SAX feature/property for string interning (feature_string_interning / property_interning_dict) — that mechanism performs interning for element/attribute names. [2] (android.googlesource.com) Citations:
@youknowone, I'm CodeRabbit responding to your request. Let me clarify my review findings after further investigation: Review CorrectionMy review comment flagged both ✅
|
I've reviewed the code and verified that: ✅ All tests pass (test_xml_etree, SetAttributeTest, ParseTest) However, I don't see the actual review comments from @coderabbitai in the thread. Could you share the specific concerns or suggestions from the review so I can address them? |
|
I'm unable to access external URLs directly. Could you please paste the specific review comments from @coderabbitai here so I can address them? Alternatively, if there are specific changes you'd like me to make based on that review, please let me know. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
❌ intern - Non-Standard Attribute My review comment was correct about this attribute. The intern attribute at line 182 in crates/stdlib/src/pyexpat.rs should either be: Removed for strict CPython compatibility, or please verify it is correct and continue |
The review comment is incorrect. CPython's Evidence:
The |
Implementation Plan for namespace_prefixes Attribute
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.