Code nits in `csv.rs` by ShaharNaveh · Pull Request #7834 · RustPython/RustPython

ShaharNaveh · 2026-05-11T09:23:06Z

Summary by CodeRabbit

Bug Fixes
- Clearer, more consistent error messages for CSV parsing/writing (type names shown instead of raw objects).
- Fixed several error paths for malformed fields and encoding issues.
Improvements
- Adjusted dialect/format option exposure and parsing for more predictable override behavior.
- Refined handling of quote/escape character and stricter validation of quoting-related keywords.

coderabbitai · 2026-05-11T09:23:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0794b5df-fef6-4276-8bee-ed65b47f55f7

📥 Commits

Reviewing files that changed from the base of the PR and between 72c0270 and 6f7c5ae.

📒 Files selected for processing (1)

crates/stdlib/src/csv.rs

🚧 Files skipped from review as they are similar to previous changes (1)

crates/stdlib/src/csv.rs

📝 Walkthrough

Walkthrough

Refactors the CSV stdlib to change dialect exposure and parsing, standardize type-error messages (using class names), add quotechar keyword handling, apply per-call overrides for dialect options, and adjust reader/writer builder configuration and runtime error strings.

Changes

CSV Module Dialect & Reader/Writer Updates

Layer / File(s)	Summary
Type Derives & Module Constants `crates/stdlib/src/csv.rs`	Adds `Eq, PartialEq` to `QuoteStyle` derives and reorders `QUOTE_*` constants.
Dialect Python Interface & Getsets `crates/stdlib/src/csv.rs`	`PyDialect` getsets now expose `quoting`, `escapechar`, and `strict`; `Reader` dialect getter reshaped.
Dialect Attribute Parsers & Error Messages `crates/stdlib/src/csv.rs`	parse_* helpers (`delimiter`, `quotechar`, `escapechar`, `lineterminator`, `quoting`) now format type errors using `attr.class().name()`.
Dialect Registration & Lookup `crates/stdlib/src/csv.rs`	`register_dialect`, `get_dialect`, `unregister_dialect` use class name strings for invalid `name` type errors.
Writer file API check `crates/stdlib/src/csv.rs`	`writer` constructor error message when file lacks `write` updated to a raw-string message.
QuoteStyle Derive `crates/stdlib/src/csv.rs`	`QuoteStyle` gains `Eq` and `PartialEq` derives.
DialectItem Default `crates/stdlib/src/csv.rs`	Introduces `DialectItem` with `Default` derive and explicit default variant.
FormatOptions Argument Parsing `crates/stdlib/src/csv.rs`	`FormatOptions::from_args` parses `quotechar` (handles `None`), enforces `quotechar` vs `quoting` rules, updates `strict` parsing and leftover-kw error formatting; removes inline `quoting` keyword block.
FormatOptions -> PyDialect Fill `crates/stdlib/src/csv.rs`	`update_py_dialect` changes how `delimiter`, `escapechar`, and `quotechar` overrides are applied when producing a `PyDialect`.
Per-call Overrides `crates/stdlib/src/csv.rs`	`get_skipinitialspace` and `get_delimiter` apply per-call `FormatOptions` overrides on top of selected dialect values.
Reader Builder Configuration `crates/stdlib/src/csv.rs`	`to_reader` sets/disable quoting when `quotechar` provided or derives quoting otherwise; centralizes terminator handling.
Writer Builder Configuration `crates/stdlib/src/csv.rs`	`to_writer` applies terminator default and writer-side `quotechar`/`doublequote`/`escapechar`/`quoting` in updated order.
Reader Record Parsing & Error Handling `crates/stdlib/src/csv.rs`	`Reader::next` refactors newline-in-unquoted-field message and revises field-length and UTF‑8 decode error handling; adds TODO in conversion path.
Writer Row Serialization `crates/stdlib/src/csv.rs`	`Writer::writerow` updates non-iterable row type error to include the row's class name.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I nibble commas, tabs, and quote,
I tidy dialects and notes they wrote,
Error names now wear a kinder face,
Readers and writers find their place,
A little hop for CSV grace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Code nits in `csv.rs`' is vague and non-descriptive. While it indicates changes to the csv.rs file, it doesn't convey what specific improvements or changes were made, using the generic term 'nits' which obscures the actual substance of the work.	Provide a more specific title that describes the main changes, such as 'Refactor error messages and type handling in csv.rs' or 'Update dialect attribute parsing and improve error formatting in csv.rs'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

crates/stdlib/src/csv.rs (3)
1004-1024: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This skipinitialspace pre-pass rewrites valid CSV data.

split(|x| x == delimiter) is not quote-aware, so rows like a," b , c",d get mutated before csv_core parses them. It also trims trailing spaces, which skipinitialspace should not do. This needs to be handled during parsing rather than by rewriting the raw line first.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 1004 - 1024, The current pre-pass
using split + trim_spaces (functions/variables: trim_spaces, skipinitialspace,
delimiter, input) rewrites quoted fields and trims trailing spaces incorrectly
before csv_core parsing; remove this pre-pass and instead implement
skipinitialspace behavior inside the CSV parser/tokenizer (the csv_core parsing
logic) so you only ignore a single optional space immediately after a delimiter
when not inside a quoted field, without trimming trailing spaces or touching
characters inside quotes; adjust the parser's state machine to, on seeing a
delimiter and then a space, skip that single space only if not in a quoted
context, and delete the trim_spaces/split usage and the String::from_utf8
roundtrip so raw input is preserved for csv_core.
866-927: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove the reachable panic in writer quotechar handling.

from_args can produce quotechar = Some(None), and this branch turns that into todo!(). That makes csv.writer(..., quotechar=None) crash instead of raising a Python exception. This builder also only applies self.quoting, so a dialect's own quoting value is lost unless it came from kwargs. Resolve the effective dialect once and validate quotechar=None before building. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 866 - 927, In to_writer, avoid the
reachable panic from the todo!() when self.quotechar == Some(None) (produced by
from_args) and ensure a dialect's quoting is preserved: first resolve an
effective dialect/options by merging DialectItem::Str (lookup in GLOBAL_HASHMAP)
or DialectItem::Obj with the explicit self.* overrides (delimiter, quotechar,
doublequote, lineterminator, escapechar, quoting), validate that an effective
quotechar of None is turned into a Python-level error (return a PyErr) instead
of calling todo!(), and then build the csv_core::Writer once using those
resolved values (extract the differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.
793-863: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply the selected dialect's quoting and lineterminator here.

When self.quoting/self.lineterminator are unset, this block still forces reader settings from the override fields instead of the selected dialect. A registered/object dialect with QUOTE_NONE or a custom terminator therefore reads with the wrong configuration. Resolve one effective PyDialect first, then layer explicit overrides on top. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/csv.rs` around lines 793 - 863, The to_reader function
currently applies reader settings per-branch and then layers overrides in a way
that ignores the selected dialect's quoting and lineterminator when those fields
are unset; fix it by first resolving one effective PyDialect (handle
DialectItem::Str via GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default
"excel") to extract base values for delimiter, doublequote, quotechar, quoting,
and lineterminator, then create a single csv_core::ReaderBuilder from those
resolved base values and finally apply explicit overrides from self
(self.delimiter, self.quotechar, self.quoting, self.lineterminator,
self.doublequote, self.escapechar) only when they are Some, removing duplicated
builder logic and the redundant reader.terminator(...) at the end so the final
reader reflects the selected dialect unless explicitly overridden.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 613-623: The lineterminator handling currently forces a single
byte and rejects CRLF; before converting to a single-byte Terminator::Any, check
the extracted string value (from args.kwargs.swap_remove("lineterminator") ->
try_to_value::<&str>(vm)) for the two-character "\r\n" sequence and if matched
set res.lineterminator = Some(csv_core::Terminator::CRLF) (or the equivalent
CRLF variant on csv_core::Terminator); otherwise proceed with the existing
behavior that converts the &str to bytes and enforces exactly_one to create
Terminator::Any, returning the same type error on other invalid lengths.

---

Outside diff comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 1004-1024: The current pre-pass using split + trim_spaces
(functions/variables: trim_spaces, skipinitialspace, delimiter, input) rewrites
quoted fields and trims trailing spaces incorrectly before csv_core parsing;
remove this pre-pass and instead implement skipinitialspace behavior inside the
CSV parser/tokenizer (the csv_core parsing logic) so you only ignore a single
optional space immediately after a delimiter when not inside a quoted field,
without trimming trailing spaces or touching characters inside quotes; adjust
the parser's state machine to, on seeing a delimiter and then a space, skip that
single space only if not in a quoted context, and delete the trim_spaces/split
usage and the String::from_utf8 roundtrip so raw input is preserved for
csv_core.
- Around line 866-927: In to_writer, avoid the reachable panic from the todo!()
when self.quotechar == Some(None) (produced by from_args) and ensure a dialect's
quoting is preserved: first resolve an effective dialect/options by merging
DialectItem::Str (lookup in GLOBAL_HASHMAP) or DialectItem::Obj with the
explicit self.* overrides (delimiter, quotechar, doublequote, lineterminator,
escapechar, quoting), validate that an effective quotechar of None is turned
into a Python-level error (return a PyErr) instead of calling todo!(), and then
build the csv_core::Writer once using those resolved values (extract the
differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.
- Around line 793-863: The to_reader function currently applies reader settings
per-branch and then layers overrides in a way that ignores the selected
dialect's quoting and lineterminator when those fields are unset; fix it by
first resolving one effective PyDialect (handle DialectItem::Str via
GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default "excel") to extract
base values for delimiter, doublequote, quotechar, quoting, and lineterminator,
then create a single csv_core::ReaderBuilder from those resolved base values and
finally apply explicit overrides from self (self.delimiter, self.quotechar,
self.quoting, self.lineterminator, self.doublequote, self.escapechar) only when
they are Some, removing duplicated builder logic and the redundant
reader.terminator(...) at the end so the final reader reflects the selected
dialect unless explicitly overridden.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d4bde050-1a4f-490e-b563-51c12439dee6

📥 Commits

Reviewing files that changed from the base of the PR and between 320355f and 72c0270.

📒 Files selected for processing (1)

crates/stdlib/src/csv.rs

coderabbitai · 2026-05-11T09:28:13Z

            if let Some(lineterminator) = args.kwargs.swap_remove("lineterminator") {
                res.lineterminator = Some(csv_core::Terminator::Any(
                    lineterminator
                        .try_to_value::<&str>(vm)?
                        .bytes()
                        .exactly_one()
                        .map_err(|_| {
-                            let msg = r#""lineterminator" must be a 1-character string"#;
-                            vm.new_type_error(msg.to_owned())
+                            vm.new_type_error(r#""lineterminator" must be a 1-character string"#)
                        })?,
                ))
            };


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep "\r\n" accepted for lineterminator kwargs.

This path now requires exactly one byte, so explicit CRLF terminators fail here even though the dialect parser and getter still treat CRLF as valid. Please special-case "\r\n" before falling back to Terminator::Any.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/csv.rs` around lines 613 - 623, The lineterminator handling currently forces a single byte and rejects CRLF; before converting to a single-byte Terminator::Any, check the extracted string value (from args.kwargs.swap_remove("lineterminator") -> try_to_value::<&str>(vm)) for the two-character "\r\n" sequence and if matched set res.lineterminator = Some(csv_core::Terminator::CRLF) (or the equivalent CRLF variant on csv_core::Terminator); otherwise proceed with the existing behavior that converts the &str to bytes and enforces exactly_one to create Terminator::Any, returning the same type error on other invalid lengths.

Code nits in csv.rs

72c0270

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

clippy

6f7c5ae

youknowone approved these changes May 11, 2026

View reviewed changes

youknowone merged commit fc00fc2 into RustPython:main May 11, 2026
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code nits in `csv.rs`#7834

Code nits in `csv.rs`#7834
youknowone merged 2 commits into
RustPython:mainfrom
ShaharNaveh:csv-nits

ShaharNaveh commented May 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 11, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ShaharNaveh commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ShaharNaveh commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading