Skip to content

Conversation

@jshigetomi
Copy link
Collaborator

@jshigetomi jshigetomi commented Feb 26, 2025

PR Summary

Fixes: #24698

This pull request includes changes to improve the handling of empty and null values in the ConvertFrom-Csv command and updates the corresponding unit tests to verify the new behavior. The most important changes include modifying the Import method in CsvCommands.cs and adding new unit tests in ConvertFrom-Csv.Tests.ps1.

Enhancements to ConvertFrom-Csv command:

Unit tests updates:

PR Context

PR Checklist

@jshigetomi
Copy link
Collaborator Author

I've adjusted the behavior of ConvertFrom-CSV according to the discussion on issue #24698. Essentially I removed the feature that skips blank lines in the Import method in ImportCsvHelper. It still keeps the last cell null so as to keep this from being a breaking change. #17702

@PowerShell PowerShell deleted a comment from azure-pipelines bot Feb 26, 2025
if (values.Count == 1 && string.IsNullOrEmpty(values[0]) && !(_cmdlet is ConvertFromCsvCommand))
{
// skip the blank lines
// Skip empty lines when Import-CSV but keep empty lines when ConvertFromCSV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug (#24698) affects both ConvertFrom-Csv and Import-Csv; I think the fix should be applied to both commands for consistency. ConvertFrom-Csv was used throughout the original issue for demonstration purposes only.

Do we know if there's a specific reason why the check for values.Count == 1 && string.IsNullOrEmpty(values[0]) was included in the first place?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from Windows PowerShell.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I disable it for Import-CSV as well, `n's are captured as an extra element. Is this the behavior we want?

image

Describe "Import-Csv with different newlines" -Tags "CI" {
    It "Test import-csv with '<name>' newline" -TestCases @(
        @{ name = "CR"; newline = "`r" }
        @{ name = "LF"; newline = "`n" }
        @{ name = "CRLF"; newline = "`r`n" }
        ) {
        param($newline)
        $csvFile = Join-Path $TestDrive -ChildPath $((New-Guid).Guid)
        $delimiter = ','
        "h1,h2,h3$($newline)11,12,13$($newline)21,22,23$($newline)" | Out-File -FilePath $csvFile
        $returnObject = Import-Csv -Path $csvFile -Delimiter $delimiter
        $returnObject.Count | Should -Be 2
        $returnObject[0].h1 | Should -Be 11
        $returnObject[0].h2 | Should -Be 12
        $returnObject[0].h3 | Should -Be 13
        $returnObject[1].h1 | Should -Be 21
        $returnObject[1].h2 | Should -Be 22
        $returnObject[1].h3 | Should -Be 23
    }
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the behavior we want?

No, it isn't.

This is more about ensuring whatever fix is chosen for #24698 is applied to both ConvertFrom-Csv and Import-Csv. In other words, the example CSV data below should deserialize consistently, irrespective of the source being a file or a string.

"P1"
""
""
""

Can #24698 be fixed for both commands without introducing the newline issue?

Copy link
Collaborator Author

@jshigetomi jshigetomi Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it can be deserialized consistently but would result in just one empty string value per header.
For example,

"P1"
""
""
""
# Would produce [pscustomobject] @{ P1 = '' } ignoring the other empty cells
@'
"P1","P2"
"",
"",
'@ | ConvertFrom-Csv
# Would produce [pscustomobject] @{ P1 = ''; P2 = '' } ignoring the other empty cells again

This way we can produce a header only output consistently.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I had in mind when I filed the issue. For each non-empty line in input data, an object should be emitted by ConvertFrom-Csv/Import-Csv.

The following (using either command) produces 3 objects:

@'
"P1"
"a"
"b"
"c"
'@ | ConvertFrom-Csv

Therefore, the following (using either command) should also produce 3 objects:

@'
"P1"
""
""
""
'@ | ConvertFrom-Csv

Likewise, this produces 2 objects, because the empty line is correctly ignored:

@'
"P1"
"a"

"c"
'@ | ConvertFrom-Csv

Therefore, this should also produce 2 objects:

@'
"P1"
""

""
'@ | ConvertFrom-Csv

@iSazonov
Copy link
Collaborator

@jshigetomi Root of the issue in ParseNextRecord method. It does not distinguish between an input string without characters and a string with two double quotes "".

result.Add(current.ToString());
current.Remove(0, current.Length);
// New line outside quote is end of word and end of record
break;

Here current.ToString() returns string.Empty for these two cases. We can check current.Length > 0 before call ToString(). I suggest the fix (with removing

if (values.Count == 1 && string.IsNullOrEmpty(values[0]) && !(_cmdlet is ConvertFromCsvCommand))
{
// Skip empty lines when Import-CSV but keep empty lines when ConvertFromCSV
continue;
}
)

But we get to Zugzwang - sometimes we want to, sometimes we don't. Of course, it would be possible to add more conditions, but this would require a more significant code change in same methods. My suggestion is not to complicate the code. If there are concerns, then make this change an experimental feature.

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 1, 2025

@jshigetomi I checked my guess. I had to make a few more changes. You can see the result here:

master...iSazonov:PowerShell:csvblank

@jshigetomi
Copy link
Collaborator Author

@iSazonov I'm seeing test failure for Import-CSV

image

Maybe a warning to the user would be better for no value csvs?

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 3, 2025

I don't see errors in my branch.

@jshigetomi
Copy link
Collaborator Author

Hmm I'm having some trouble reproducing your results.

master...iSazonov:PowerShell:csvblank#diff-2b77454e7d234e46e6ebdb6c1be4ac0cb9c02b88bbe0e61c9acdfa3f66491e81R1383-R1390

How are you resolving the brackets here?

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 4, 2025

@jshigetomi Please look #25120

@jshigetomi
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@PowerShell PowerShell deleted a comment from azure-pipelines bot Mar 6, 2025
@PowerShell PowerShell deleted a comment from azure-pipelines bot Mar 6, 2025
@jshigetomi
Copy link
Collaborator Author

@surfingoldelephant @iSazonov I added the proposed tests but there seems to be a slight difference in behavior between ConvertFrom-CSV and Import-CSV. Import-CSV is not leaving the last cell $null but placing an empty string.

I think the test cases where the CSV is written to a file then read from the file using Import-CSV is causing the change in behavior. This might be simply a testing limitation. What do you think @iSazonov ?

@surfingoldelephant
Copy link

surfingoldelephant commented Mar 12, 2025

@jshigetomi I believe that's because the test file has an empty line at the end (default Set-Content behavior).

You should see the same result with ConvertFrom-Csv if its input also ends with an empty line.

# Import-Csv:
$file = [IO.Path]::GetTempFileName()
@'
"P1","P2"
,
a,
'@ | Set-Content -LiteralPath $file # Includes empty line at end
$result = Import-Csv -LiteralPath $file

$null -eq $result[1].P2 # False


# ConvertFrom-Csv:
$result = @'
"P1","P2"
,
a,

'@  | ConvertFrom-Csv

$null -eq $result[1].P2 # False

Note this is also the existing behavior (without this PR's fix).

If the same CSV data is used without an empty line at the end, both commands should deserialize the last cell as $null.

# Import-Csv:
$file = [IO.Path]::GetTempFileName()
@'
"P1","P2"
,
a,
'@ | Set-Content -LiteralPath $file -NoNewline # No empty line at end
$result = Import-Csv -LiteralPath $file

$null -eq $result[1].P2 # True


# ConvertFrom-Csv:
$result = @'
"P1","P2"
,
a,
'@  | ConvertFrom-Csv

$null -eq $result[1].P2 # True

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 12, 2025

@jshigetomi I believe that's because the test file has an empty line at the end (default Set-Content behavior).

I think so too. It would be nice to test both cases with last empty line and without it. We could directly use .Net classes to create such files.

One more test I think for Import-Csv if there is a source file with header only.

@surfingoldelephant
Copy link

We could directly use .Net classes to create such files.

Providing Set-Content's input is a single, multi-line string (which is the case with all the tests currently), -NoNewLine also avoids a trailing empty line.

See the Set-Content -LiteralPath $file -NoNewline example in my comment above.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Review - Needed The PR is being reviewed label Mar 19, 2025
@jshigetomi
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jshigetomi
Copy link
Collaborator Author

jshigetomi commented Mar 27, 2025

@surfingoldelephant @iSazonov Thanks for the pointer for Set-Content. I was not aware of the newline behavior.

I made the ConvertFrom-CSV and Import-CSV tests the same and both behave the same.


Describe "Import-Csv with empty and null values" {

Context 'Empty CSV Fields' {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the PR is about edge cases I think about additional tests:

  1. only header presents without newline
  2. header presents with one empty newline

And more:

  1. empty file (-Header parameter presents)
  2. file with one empty line (-Header parameter presents)

Copy link

@surfingoldelephant surfingoldelephant Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, we're expecting those tests to produce no output (zero custom objects).

Let's also include Import-Csv/ConvertFrom-Csv tests with multiple empty lines.

  1. Header only

    @'    
    P1,P2
    '@ | ConvertFrom-Csv
  2. Header followed by one empty line

    @'    
    P1,P2
    
    '@ | ConvertFrom-Csv
  3. Header followed by multiple empty lines

    @'    
    P1,P2
    
    
    '@ | ConvertFrom-Csv
  4. Empty input and -Header specified

    '' | ConvertFrom-Csv -Header P1, P2
  5. One empty line and -Header specified

    @'
    
    '@ | ConvertFrom-Csv -Header P1, P2
  6. Multiple empty lines and -Header specified

    @'
    
    
    '@ | ConvertFrom-Csv -Header P1, P2

Perhaps additional -Header tests to cover the fix:

@'
,
A1,A2
B1,B2
,
'@ | ConvertFrom-Csv -Header P1, P2

# Expected:
[pscustomobject] @{ P1 = '';   P2 = '' }
[pscustomobject] @{ P1 = 'A1'; P2 = 'A2' }
[pscustomobject] @{ P1 = 'B1'; P2 = 'B2' }
[pscustomobject] @{ P1 = '';   P2 = $null }

Copy link

@surfingoldelephant surfingoldelephant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to take a look at no detected delimiter scenarios.

# Detected delimiter:

@'
P1,P2,P3
A1,,
B1,,
C1,,
'@ | ConvertFrom-Csv

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = ''; P3 = '' }
[pscustomobject] @{ P1 = 'B1'; P2 = ''; P3 = '' }
[pscustomobject] @{ P1 = 'C1'; P2 = ''; P3 = $null }
# No detected delimiter:

@'
P1,P2,P3
A1
B1
C1
'@ | ConvertFrom-Csv

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'B1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'C1'; P2 = $null; P3 = $null }


@'
P1;P2;P3
A1
B1
C1
'@ | ConvertFrom-Csv -Delimiter ';'

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'B1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'C1'; P2 = $null; P3 = $null }

Notice when there is no (detected) delimiter, the resultant value is always $null. Can we add tests for this?

@jshigetomi
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jshigetomi
Copy link
Collaborator Author

/azp run PowerShell-Windows-Packaging-CI, PowerShell-CI-linux-packaging

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Review - Needed The PR is being reviewed

Projects

Status: PR In Progress

Development

Successfully merging this pull request may close these issues.

CSV deserialization outputs nothing if there are only a certain number of empty fields

4 participants