Skip to content

Code nits in csv.rs#7834

Merged
youknowone merged 2 commits into
RustPython:mainfrom
ShaharNaveh:csv-nits
May 11, 2026
Merged

Code nits in csv.rs#7834
youknowone merged 2 commits into
RustPython:mainfrom
ShaharNaveh:csv-nits

Conversation

@ShaharNaveh

@ShaharNaveh ShaharNaveh commented May 11, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Bug Fixes

    • Clearer, more consistent error messages for CSV parsing/writing (type names shown instead of raw objects).
    • Fixed several error paths for malformed fields and encoding issues.
  • Improvements

    • Adjusted dialect/format option exposure and parsing for more predictable override behavior.
    • Refined handling of quote/escape character and stricter validation of quoting-related keywords.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0794b5df-fef6-4276-8bee-ed65b47f55f7

📥 Commits

Reviewing files that changed from the base of the PR and between 72c0270 and 6f7c5ae.

📒 Files selected for processing (1)
  • crates/stdlib/src/csv.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/csv.rs

📝 Walkthrough

Walkthrough

Refactors the CSV stdlib to change dialect exposure and parsing, standardize type-error messages (using class names), add quotechar keyword handling, apply per-call overrides for dialect options, and adjust reader/writer builder configuration and runtime error strings.

Changes

CSV Module Dialect & Reader/Writer Updates

Layer / File(s) Summary
Type Derives & Module Constants
crates/stdlib/src/csv.rs
Adds Eq, PartialEq to QuoteStyle derives and reorders QUOTE_* constants.
Dialect Python Interface & Getsets
crates/stdlib/src/csv.rs
PyDialect getsets now expose quoting, escapechar, and strict; Reader dialect getter reshaped.
Dialect Attribute Parsers & Error Messages
crates/stdlib/src/csv.rs
parse_* helpers (delimiter, quotechar, escapechar, lineterminator, quoting) now format type errors using attr.class().name().
Dialect Registration & Lookup
crates/stdlib/src/csv.rs
register_dialect, get_dialect, unregister_dialect use class name strings for invalid name type errors.
Writer file API check
crates/stdlib/src/csv.rs
writer constructor error message when file lacks write updated to a raw-string message.
QuoteStyle Derive
crates/stdlib/src/csv.rs
QuoteStyle gains Eq and PartialEq derives.
DialectItem Default
crates/stdlib/src/csv.rs
Introduces DialectItem with Default derive and explicit default variant.
FormatOptions Argument Parsing
crates/stdlib/src/csv.rs
FormatOptions::from_args parses quotechar (handles None), enforces quotechar vs quoting rules, updates strict parsing and leftover-kw error formatting; removes inline quoting keyword block.
FormatOptions -> PyDialect Fill
crates/stdlib/src/csv.rs
update_py_dialect changes how delimiter, escapechar, and quotechar overrides are applied when producing a PyDialect.
Per-call Overrides
crates/stdlib/src/csv.rs
get_skipinitialspace and get_delimiter apply per-call FormatOptions overrides on top of selected dialect values.
Reader Builder Configuration
crates/stdlib/src/csv.rs
to_reader sets/disable quoting when quotechar provided or derives quoting otherwise; centralizes terminator handling.
Writer Builder Configuration
crates/stdlib/src/csv.rs
to_writer applies terminator default and writer-side quotechar/doublequote/escapechar/quoting in updated order.
Reader Record Parsing & Error Handling
crates/stdlib/src/csv.rs
Reader::next refactors newline-in-unquoted-field message and revises field-length and UTF‑8 decode error handling; adds TODO in conversion path.
Writer Row Serialization
crates/stdlib/src/csv.rs
Writer::writerow updates non-iterable row type error to include the row's class name.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I nibble commas, tabs, and quote,
I tidy dialects and notes they wrote,
Error names now wear a kinder face,
Readers and writers find their place,
A little hop for CSV grace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Code nits in csv.rs' is vague and non-descriptive. While it indicates changes to the csv.rs file, it doesn't convey what specific improvements or changes were made, using the generic term 'nits' which obscures the actual substance of the work. Provide a more specific title that describes the main changes, such as 'Refactor error messages and type handling in csv.rs' or 'Update dialect attribute parsing and improve error formatting in csv.rs'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
crates/stdlib/src/csv.rs (3)

1004-1024: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This skipinitialspace pre-pass rewrites valid CSV data.

split(|x| x == delimiter) is not quote-aware, so rows like a," b , c",d get mutated before csv_core parses them. It also trims trailing spaces, which skipinitialspace should not do. This needs to be handled during parsing rather than by rewriting the raw line first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 1004 - 1024, The current pre-pass
using split + trim_spaces (functions/variables: trim_spaces, skipinitialspace,
delimiter, input) rewrites quoted fields and trims trailing spaces incorrectly
before csv_core parsing; remove this pre-pass and instead implement
skipinitialspace behavior inside the CSV parser/tokenizer (the csv_core parsing
logic) so you only ignore a single optional space immediately after a delimiter
when not inside a quoted field, without trimming trailing spaces or touching
characters inside quotes; adjust the parser's state machine to, on seeing a
delimiter and then a space, skip that single space only if not in a quoted
context, and delete the trim_spaces/split usage and the String::from_utf8
roundtrip so raw input is preserved for csv_core.

866-927: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove the reachable panic in writer quotechar handling.

from_args can produce quotechar = Some(None), and this branch turns that into todo!(). That makes csv.writer(..., quotechar=None) crash instead of raising a Python exception. This builder also only applies self.quoting, so a dialect's own quoting value is lost unless it came from kwargs. Resolve the effective dialect once and validate quotechar=None before building. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 866 - 927, In to_writer, avoid the
reachable panic from the todo!() when self.quotechar == Some(None) (produced by
from_args) and ensure a dialect's quoting is preserved: first resolve an
effective dialect/options by merging DialectItem::Str (lookup in GLOBAL_HASHMAP)
or DialectItem::Obj with the explicit self.* overrides (delimiter, quotechar,
doublequote, lineterminator, escapechar, quoting), validate that an effective
quotechar of None is turned into a Python-level error (return a PyErr) instead
of calling todo!(), and then build the csv_core::Writer once using those
resolved values (extract the differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.

793-863: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply the selected dialect's quoting and lineterminator here.

When self.quoting/self.lineterminator are unset, this block still forces reader settings from the override fields instead of the selected dialect. A registered/object dialect with QUOTE_NONE or a custom terminator therefore reads with the wrong configuration. Resolve one effective PyDialect first, then layer explicit overrides on top. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 793 - 863, The to_reader function
currently applies reader settings per-branch and then layers overrides in a way
that ignores the selected dialect's quoting and lineterminator when those fields
are unset; fix it by first resolving one effective PyDialect (handle
DialectItem::Str via GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default
"excel") to extract base values for delimiter, doublequote, quotechar, quoting,
and lineterminator, then create a single csv_core::ReaderBuilder from those
resolved base values and finally apply explicit overrides from self
(self.delimiter, self.quotechar, self.quoting, self.lineterminator,
self.doublequote, self.escapechar) only when they are Some, removing duplicated
builder logic and the redundant reader.terminator(...) at the end so the final
reader reflects the selected dialect unless explicitly overridden.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 613-623: The lineterminator handling currently forces a single
byte and rejects CRLF; before converting to a single-byte Terminator::Any, check
the extracted string value (from args.kwargs.swap_remove("lineterminator") ->
try_to_value::<&str>(vm)) for the two-character "\r\n" sequence and if matched
set res.lineterminator = Some(csv_core::Terminator::CRLF) (or the equivalent
CRLF variant on csv_core::Terminator); otherwise proceed with the existing
behavior that converts the &str to bytes and enforces exactly_one to create
Terminator::Any, returning the same type error on other invalid lengths.

---

Outside diff comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 1004-1024: The current pre-pass using split + trim_spaces
(functions/variables: trim_spaces, skipinitialspace, delimiter, input) rewrites
quoted fields and trims trailing spaces incorrectly before csv_core parsing;
remove this pre-pass and instead implement skipinitialspace behavior inside the
CSV parser/tokenizer (the csv_core parsing logic) so you only ignore a single
optional space immediately after a delimiter when not inside a quoted field,
without trimming trailing spaces or touching characters inside quotes; adjust
the parser's state machine to, on seeing a delimiter and then a space, skip that
single space only if not in a quoted context, and delete the trim_spaces/split
usage and the String::from_utf8 roundtrip so raw input is preserved for
csv_core.
- Around line 866-927: In to_writer, avoid the reachable panic from the todo!()
when self.quotechar == Some(None) (produced by from_args) and ensure a dialect's
quoting is preserved: first resolve an effective dialect/options by merging
DialectItem::Str (lookup in GLOBAL_HASHMAP) or DialectItem::Obj with the
explicit self.* overrides (delimiter, quotechar, doublequote, lineterminator,
escapechar, quoting), validate that an effective quotechar of None is turned
into a Python-level error (return a PyErr) instead of calling todo!(), and then
build the csv_core::Writer once using those resolved values (extract the
differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.
- Around line 793-863: The to_reader function currently applies reader settings
per-branch and then layers overrides in a way that ignores the selected
dialect's quoting and lineterminator when those fields are unset; fix it by
first resolving one effective PyDialect (handle DialectItem::Str via
GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default "excel") to extract
base values for delimiter, doublequote, quotechar, quoting, and lineterminator,
then create a single csv_core::ReaderBuilder from those resolved base values and
finally apply explicit overrides from self (self.delimiter, self.quotechar,
self.quoting, self.lineterminator, self.doublequote, self.escapechar) only when
they are Some, removing duplicated builder logic and the redundant
reader.terminator(...) at the end so the final reader reflects the selected
dialect unless explicitly overridden.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d4bde050-1a4f-490e-b563-51c12439dee6

📥 Commits

Reviewing files that changed from the base of the PR and between 320355f and 72c0270.

📒 Files selected for processing (1)
  • crates/stdlib/src/csv.rs

Comment thread crates/stdlib/src/csv.rs
Comment on lines 613 to 623
if let Some(lineterminator) = args.kwargs.swap_remove("lineterminator") {
res.lineterminator = Some(csv_core::Terminator::Any(
lineterminator
.try_to_value::<&str>(vm)?
.bytes()
.exactly_one()
.map_err(|_| {
let msg = r#""lineterminator" must be a 1-character string"#;
vm.new_type_error(msg.to_owned())
vm.new_type_error(r#""lineterminator" must be a 1-character string"#)
})?,
))
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep "\r\n" accepted for lineterminator kwargs.

This path now requires exactly one byte, so explicit CRLF terminators fail here even though the dialect parser and getter still treat CRLF as valid. Please special-case "\r\n" before falling back to Terminator::Any.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 613 - 623, The lineterminator handling
currently forces a single byte and rejects CRLF; before converting to a
single-byte Terminator::Any, check the extracted string value (from
args.kwargs.swap_remove("lineterminator") -> try_to_value::<&str>(vm)) for the
two-character "\r\n" sequence and if matched set res.lineterminator =
Some(csv_core::Terminator::CRLF) (or the equivalent CRLF variant on
csv_core::Terminator); otherwise proceed with the existing behavior that
converts the &str to bytes and enforces exactly_one to create Terminator::Any,
returning the same type error on other invalid lengths.

@youknowone youknowone merged commit fc00fc2 into RustPython:main May 11, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants