Code nits in csv.rs#7834
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefactors the CSV stdlib to change dialect exposure and parsing, standardize type-error messages (using class names), add quotechar keyword handling, apply per-call overrides for dialect options, and adjust reader/writer builder configuration and runtime error strings. ChangesCSV Module Dialect & Reader/Writer Updates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/stdlib/src/csv.rs (3)
1004-1024:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThis
skipinitialspacepre-pass rewrites valid CSV data.
split(|x| x == delimiter)is not quote-aware, so rows likea," b , c",dget mutated beforecsv_coreparses them. It also trims trailing spaces, whichskipinitialspaceshould not do. This needs to be handled during parsing rather than by rewriting the raw line first.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/csv.rs` around lines 1004 - 1024, The current pre-pass using split + trim_spaces (functions/variables: trim_spaces, skipinitialspace, delimiter, input) rewrites quoted fields and trims trailing spaces incorrectly before csv_core parsing; remove this pre-pass and instead implement skipinitialspace behavior inside the CSV parser/tokenizer (the csv_core parsing logic) so you only ignore a single optional space immediately after a delimiter when not inside a quoted field, without trimming trailing spaces or touching characters inside quotes; adjust the parser's state machine to, on seeing a delimiter and then a space, skip that single space only if not in a quoted context, and delete the trim_spaces/split usage and the String::from_utf8 roundtrip so raw input is preserved for csv_core.
866-927:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove the reachable panic in writer quotechar handling.
from_argscan producequotechar = Some(None), and this branch turns that intotodo!(). That makescsv.writer(..., quotechar=None)crash instead of raising a Python exception. This builder also only appliesself.quoting, so a dialect's ownquotingvalue is lost unless it came from kwargs. Resolve the effective dialect once and validatequotechar=Nonebefore building. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/csv.rs` around lines 866 - 927, In to_writer, avoid the reachable panic from the todo!() when self.quotechar == Some(None) (produced by from_args) and ensure a dialect's quoting is preserved: first resolve an effective dialect/options by merging DialectItem::Str (lookup in GLOBAL_HASHMAP) or DialectItem::Obj with the explicit self.* overrides (delimiter, quotechar, doublequote, lineterminator, escapechar, quoting), validate that an effective quotechar of None is turned into a Python-level error (return a PyErr) instead of calling todo!(), and then build the csv_core::Writer once using those resolved values (extract the differing quotechar/quoting values first, then call builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...) as appropriate) so the dialect quoting is preserved when self.quoting is not provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP, and the fields quotechar and quoting when making these changes.
793-863:⚠️ Potential issue | 🟠 Major | ⚡ Quick winApply the selected dialect's
quotingandlineterminatorhere.When
self.quoting/self.lineterminatorare unset, this block still forces reader settings from the override fields instead of the selected dialect. A registered/object dialect withQUOTE_NONEor a custom terminator therefore reads with the wrong configuration. Resolve one effectivePyDialectfirst, then layer explicit overrides on top. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/csv.rs` around lines 793 - 863, The to_reader function currently applies reader settings per-branch and then layers overrides in a way that ignores the selected dialect's quoting and lineterminator when those fields are unset; fix it by first resolving one effective PyDialect (handle DialectItem::Str via GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default "excel") to extract base values for delimiter, doublequote, quotechar, quoting, and lineterminator, then create a single csv_core::ReaderBuilder from those resolved base values and finally apply explicit overrides from self (self.delimiter, self.quotechar, self.quoting, self.lineterminator, self.doublequote, self.escapechar) only when they are Some, removing duplicated builder logic and the redundant reader.terminator(...) at the end so the final reader reflects the selected dialect unless explicitly overridden.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 613-623: The lineterminator handling currently forces a single
byte and rejects CRLF; before converting to a single-byte Terminator::Any, check
the extracted string value (from args.kwargs.swap_remove("lineterminator") ->
try_to_value::<&str>(vm)) for the two-character "\r\n" sequence and if matched
set res.lineterminator = Some(csv_core::Terminator::CRLF) (or the equivalent
CRLF variant on csv_core::Terminator); otherwise proceed with the existing
behavior that converts the &str to bytes and enforces exactly_one to create
Terminator::Any, returning the same type error on other invalid lengths.
---
Outside diff comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 1004-1024: The current pre-pass using split + trim_spaces
(functions/variables: trim_spaces, skipinitialspace, delimiter, input) rewrites
quoted fields and trims trailing spaces incorrectly before csv_core parsing;
remove this pre-pass and instead implement skipinitialspace behavior inside the
CSV parser/tokenizer (the csv_core parsing logic) so you only ignore a single
optional space immediately after a delimiter when not inside a quoted field,
without trimming trailing spaces or touching characters inside quotes; adjust
the parser's state machine to, on seeing a delimiter and then a space, skip that
single space only if not in a quoted context, and delete the trim_spaces/split
usage and the String::from_utf8 roundtrip so raw input is preserved for
csv_core.
- Around line 866-927: In to_writer, avoid the reachable panic from the todo!()
when self.quotechar == Some(None) (produced by from_args) and ensure a dialect's
quoting is preserved: first resolve an effective dialect/options by merging
DialectItem::Str (lookup in GLOBAL_HASHMAP) or DialectItem::Obj with the
explicit self.* overrides (delimiter, quotechar, doublequote, lineterminator,
escapechar, quoting), validate that an effective quotechar of None is turned
into a Python-level error (return a PyErr) instead of calling todo!(), and then
build the csv_core::Writer once using those resolved values (extract the
differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.
- Around line 793-863: The to_reader function currently applies reader settings
per-branch and then layers overrides in a way that ignores the selected
dialect's quoting and lineterminator when those fields are unset; fix it by
first resolving one effective PyDialect (handle DialectItem::Str via
GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default "excel") to extract
base values for delimiter, doublequote, quotechar, quoting, and lineterminator,
then create a single csv_core::ReaderBuilder from those resolved base values and
finally apply explicit overrides from self (self.delimiter, self.quotechar,
self.quoting, self.lineterminator, self.doublequote, self.escapechar) only when
they are Some, removing duplicated builder logic and the redundant
reader.terminator(...) at the end so the final reader reflects the selected
dialect unless explicitly overridden.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d4bde050-1a4f-490e-b563-51c12439dee6
📒 Files selected for processing (1)
crates/stdlib/src/csv.rs
| if let Some(lineterminator) = args.kwargs.swap_remove("lineterminator") { | ||
| res.lineterminator = Some(csv_core::Terminator::Any( | ||
| lineterminator | ||
| .try_to_value::<&str>(vm)? | ||
| .bytes() | ||
| .exactly_one() | ||
| .map_err(|_| { | ||
| let msg = r#""lineterminator" must be a 1-character string"#; | ||
| vm.new_type_error(msg.to_owned()) | ||
| vm.new_type_error(r#""lineterminator" must be a 1-character string"#) | ||
| })?, | ||
| )) | ||
| }; |
There was a problem hiding this comment.
Keep "\r\n" accepted for lineterminator kwargs.
This path now requires exactly one byte, so explicit CRLF terminators fail here even though the dialect parser and getter still treat CRLF as valid. Please special-case "\r\n" before falling back to Terminator::Any.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/stdlib/src/csv.rs` around lines 613 - 623, The lineterminator handling
currently forces a single byte and rejects CRLF; before converting to a
single-byte Terminator::Any, check the extracted string value (from
args.kwargs.swap_remove("lineterminator") -> try_to_value::<&str>(vm)) for the
two-character "\r\n" sequence and if matched set res.lineterminator =
Some(csv_core::Terminator::CRLF) (or the equivalent CRLF variant on
csv_core::Terminator); otherwise proceed with the existing behavior that
converts the &str to bytes and enforces exactly_one to create Terminator::Any,
returning the same type error on other invalid lengths.
Summary by CodeRabbit
Bug Fixes
Improvements