Skip to content

Conversation

@Francisco-Gamino
Copy link
Contributor

@Francisco-Gamino Francisco-Gamino commented May 19, 2017

Addressing issue #3284: ConvertFrom-JSON fails to parse syntactically-correct JSON array

For PowerShell on CoreCLR, we use Json.Net. However, this deserialzer does not throw an exception if an invalid array is passed, so we need to add extra logic to handle this. An invalid array can occurred when the user reads a file via get-content as a collection of PSObjects. E.g.,

# test.json has the following content:
PS D:\> Get-Content .\test.json
[
    1,
    2
]

# The output of Get-Content is a collection of Object[]
PS D:\> (Get-Content .\test.json).GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

# When we pipe the output of Get-Content to ConvertFrom-Json, the logic in the cmdlet
# sends the first object to be deserialized, and if an Argument exception 
# is thrown, then it sends the whole collection of objects as one single Json content. 
# For CoreCLR PowerShell, we use Json.Net which does not throw an error when an 
# invalid array is deserialize, so this PR addresses this issue. 

# Here is the output of ConvertFrom-Json after this code change:
PS D:\> Get-Content .\test.json | ConvertFrom-Json
1
2

Added tests for the following:

  • Deserialize a Json array of strings (in multiple lines)
  • Deserialize a Json array of PSObjects (in multiple lines)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a && so we check for existence of [ and ] ? If any of those is missing it is an invalid array anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this should be an '||' because we are trying to detect an invalid array. These are some examples:

  1. '["a"'
  2. '['
  3. '"a"]'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trim will only remove newline characters at the start and end. To remove all newline characters use Replace instead. And .ToCharArray() is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a String.Empty instead of "". More efficient as it does not create a new zero length string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a && here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the comment above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use [Environment]::NewLine instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test code updated as per Jim's recommendation below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the variations be converted to use -TestCase for It? So if one of the Should fails the others are still run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented two different test cases instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just be:

$array = [pscustomobject]@{ objectName = "object1Name"; objectValue = "object1Value" },
         [pscustomobject]@{ objectName = "object2Name"; objectValue = "object2Value" }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion. I will update the test code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this (or the above) needs to be done with a file at all. Does the following cover the case?

$result = "[1,","2,","3]" | convertfrom-json
$result.count | Should be 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test code updated.

Copy link
Member

@adityapatwardhan adityapatwardhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JamesWTruher
Copy link
Collaborator

tests are much cleaner now - LGTM

@SteveL-MSFT
Copy link
Member

@JamesWTruher can you mark your review as Approved?

@Francisco-Gamino
Copy link
Contributor Author

@PowerShell/powershell-maintainers : This is ready to be merged, please let me know if you need anything else. Thanks.

@TravisEz13 TravisEz13 self-assigned this May 19, 2017
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says contains, but the code says starts with/ends with.

Besides that, it just looks more complicated than it needs to be, e.g. 2 calls to IndexOf, then strip newlines then spaces. Why not just use a regex, like:

if (Regex.Match(input, "(^\s*\[)|(\s*\]$)"))
{
    JArray.Parse(input);
}

I'd also like to see some sort of note about whether or not this is considered a NewtonSoft bug - ideally a link to an issue so we can track it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as these markers for

Do you mean these markers are for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't match the code. Carriage return is \r, but Environemnt.NewLine is \r\n on windows and \n on Unix. And what if the json string is has \r\n but we are calling ConvertFrom-Json on Unix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about other white spaces? For example what if there is \t?
By the way, new lines are white spaces too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

white spaces include new lines.

[char]::IsWhiteSpace([environment]::NewLine, 0)
True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is can be

Typo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here are confusing to me -- what is the buffer here? Could you please make the comments easier to understand?

@daxian-dbw
Copy link
Member

@Francisco-Gamino One more question -- why do you have to remove all white spaces before calling JArray.Parse? Won't it be enough to just trim the leading and trailing white spaces?

@TravisEz13
Copy link
Member

TravisEz13 commented May 19, 2017

Please update the title and description based on the instructions in Contributing - Pull Request Submission

  • Remove the issues number from the title and into the description (it's already in the description, so, just remove it from the title.)
  • Update the title to use "the present tense and imperative mood" to describe the change, not the issue you are fixing.

@daxian-dbw
Copy link
Member

Chatted with @Francisco-Gamino offline, and it turns out the JsonException thrown by JArray.Parse will be captured and re-thrown as an ArgumentException, and then the enclosing code will capture the ArgumentException and assume that it's caused by partial input and hen keep accumulating pipeline input. Francisco will add more comments to explain this process.

@Francisco-Gamino
Copy link
Contributor Author

@lzybkr: Thank you Jason for your suggestion. Using regular expressions makes the code much simpler.
@lzybkr, @daxian-dbw, and @TravisEz13 : Could you please take another look?

@Francisco-Gamino Francisco-Gamino changed the title Fixing #3284 ConvertFrom-JSON fails to parse syntactically-correct JSON array ConvertFrom-Json deserializes an array of objects in multiple lines. May 24, 2017
@Francisco-Gamino Francisco-Gamino changed the title ConvertFrom-Json deserializes an array of objects in multiple lines. Make ConvertFrom-Json deserialize an array of objects with multiple lines. May 25, 2017
@TravisEz13 TravisEz13 merged commit 7aa7f38 into PowerShell:master May 25, 2017
// To work around this, we need to identify when input is a Json array, and then try to parse it via JArray.Parse().

// If input starts with '[' or ends with ']' (ignoring white spaces).
if ((Regex.Match(input, @"(^\s*\[)|(\s*\]$)")).Success)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reg expression used here seems not right ... @Francisco-Gamino let's discuss it tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants