-
Notifications
You must be signed in to change notification settings - Fork 8.1k
ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded
#15765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded
#15765
Conversation
Fix for issue PowerShell#9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
iSazonov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test with new line chars.
src/Microsoft.PowerShell.Commands.Utility/commands/utility/CsvCommands.cs
Outdated
Show resolved
Hide resolved
Added additional tests for different character scenarios that should be handled
|
Re: Testing - Added additional tests for detecting delimiters, newlines and quotes. I did put the test objects inside the |
Cast string as ReadOnlySpan to avoid allocation of char array
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
Co-authored-by: Ilya <darpa@yandex.ru>
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded
|
🎉 Handy links: |
|
🎉 Handy links: |
Fix for issue #9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
PR Summary
This PR fixes the case of CSV fields that include double quotes or newlines in their values. If a Column value contains
",\r,\nor the delimiter then it will be marked as needing to be surrounded by quotes.This follows RFC-4180 rules as to what characters trigger a field to be quoted (except it allows for different delimiters than the comma). This also follows how
ConvertFrom-Csvparses input CSV files.PR Context
-UseQuotes AsNeededcurrently only looks for delimiters, which means that the fieldHello,Worldwill be surrounded in quotes, however the fieldHello"Worldwould not be surrounded in quotes, causing this kind of embarassing slip-up:Newlines are another risk, since they're interpreted as completely different rows.
Performance Caveat: This version is searching for the existence of 4 characters (using
string.IndexOfAny(chars)instead of the original 1 (string.Contains(char). Based on my benchmarks this change does have a miniscule increase in time for that single operation. However, in my naïve in-memory benchmarks against a 100MB CSV file, using-UseQuotes AsNeededwas slightly faster than-UseQuotes Alwaysor-QuoteFields. My guess is that I didn't need to allocate as much memory for those unneeded quotes.PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.(which runs in a different PS Host).