Skip to content

Conversation

@lselden
Copy link
Contributor

@lselden lselden commented Jul 13, 2021

Fix for issue #9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter

PR Summary

This PR fixes the case of CSV fields that include double quotes or newlines in their values. If a Column value contains ",\r,\n or the delimiter then it will be marked as needing to be surrounded by quotes.

This follows RFC-4180 rules as to what characters trigger a field to be quoted (except it allows for different delimiters than the comma). This also follows how ConvertFrom-Csv parses input CSV files.

PR Context

-UseQuotes AsNeeded currently only looks for delimiters, which means that the field Hello,World will be surrounded in quotes, however the field Hello"World would not be surrounded in quotes, causing this kind of embarassing slip-up:

$testObj = [pscustomobject]@{ Greet = "Hello"; Me = "`""; Here = "World" };
# Greet Me Here
# ----- -- ----
# Hello "  World

$str = $testObj | convertto-csv -UseQuotes AsNeeded
# Greet,Me,Here
# Hello,",World  <-- Should be 'Hello,"""",World'

$str | convertfrom-csv
# Greet Me     Here
# ----- --     ----
# Hello ,World

Newlines are another risk, since they're interpreted as completely different rows.

$testObj = [pscustomobject]@{ FirstColumn = "Hello"; SecondColumn = "World`r`nGoodbye" }
FirstColumn,SecondColumn
Hello,World
Goodbye

Performance Caveat: This version is searching for the existence of 4 characters (using string.IndexOfAny(chars) instead of the original 1 (string.Contains(char). Based on my benchmarks this change does have a miniscule increase in time for that single operation. However, in my naïve in-memory benchmarks against a 100MB CSV file, using -UseQuotes AsNeeded was slightly faster than -UseQuotes Always or -QuoteFields. My guess is that I didn't need to allocate as much memory for those unneeded quotes.

PR Checklist

Fix for issue PowerShell#9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
@ghost ghost assigned TravisEz13 Jul 13, 2021
@ghost
Copy link

ghost commented Jul 13, 2021

CLA assistant check
All CLA requirements met.

Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test with new line chars.

Added additional tests for different character scenarios that should be handled
@lselden
Copy link
Contributor Author

lselden commented Jul 13, 2021

Re: Testing - Added additional tests for detecting delimiters, newlines and quotes. I did put the test objects inside the It block, since they were only used per-test - let me know if they should be hoisted up to the Context's BeforeAll section.

Cast string as ReadOnlySpan to avoid allocation of char array
@iSazonov iSazonov added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Jul 13, 2021
Co-authored-by: Ilya <darpa@yandex.ru>
@ghost
Copy link

ghost commented Jul 22, 2021

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@TravisEz13 TravisEz13 changed the title ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded Aug 4, 2021
@TravisEz13 TravisEz13 merged commit 3e86f9b into PowerShell:master Aug 4, 2021
@ghost ghost removed the Review - Needed The PR is being reviewed label Aug 4, 2021
@TravisEz13 TravisEz13 added this to the 7.2.0-preview.9 milestone Aug 4, 2021
@lselden lselden deleted the csv-usequotes-asneeded-rfc4180-compat branch August 11, 2021 13:19
@ghost
Copy link

ghost commented Aug 23, 2021

🎉v7.2.0-preview.9 has been released which incorporates this pull request.:tada:

Handy links:

@ghost
Copy link

ghost commented Sep 28, 2021

🎉v7.2.0-preview.10 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants