Skip to content

Conversation

@powercode
Copy link
Collaborator

@powercode powercode commented Jul 30, 2018

Fixes #7409.

PR Summary

code cleanup
object[] -> IList in PSType.ArrayToTuple signature to reduce allocations.
Sort input before grouping. Use sorting to only consider last group when comparing.
adding DebuggerDisplay for GroupInfo.

PR Checklist

Refactoring only - no functional changes.
@powercode
Copy link
Collaborator Author

Appveyor error seem unrelated to my changes.

{
if (propValue != null && propValue.PropertyValue != null)
var propValuePropertyValue = propValue?.PropertyValue;
if (propValuePropertyValue != null)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the check really needed?

Copy link
Collaborator Author

@powercode powercode Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is dereferenced in the else clause. #Closed.

OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be currentObjectOrderValues?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment and fix case.

//add this inputObject to an existing group
currentGroupInfo.Add(currentObjectEntry.inputObject);
}
//add this inputObject to an existing group
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add space after //.

foreach (GroupInfo _grp in _groups)
foreach (GroupInfo grp in _groups)
{
if (AsString)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to move the cycle into the if (AsString) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure i understand what you suggest here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant

if (AsString)
{
    foreach (GroupInfo grp in _groups)
    {
        table.Add(grp.Name, grp.Group); 
    }
}
else
{
    foreach (GroupInfo grp in _groups)
    {
        if (grp.Values.Count == 1)
        {
             ...
        }
    }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that is clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about performancethough :-) From this point of view it seems for() better than foreach - less allocations. Is the _groups so much that we can get a win?

Also some additional CodeFactor complaints
@powercode
Copy link
Collaborator Author

I would like to fix the parameter sets too.

[Parameter(ParameterSetName = "HashTable", Mandatory = true)]
public SwitchParameter AsHashTable { get; set; }

[Parameter(ParameterSetName = "HashTable")]
public SwitchParameter AsString { get; set; }

Moving parameter checking earlier.

Adding back the slow quadratic code path for the cases where we have more than one set of input types.
@powercode powercode changed the title Improve performance of Group-Object WIP: Improve performance of Group-Object Jul 31, 2018
@powercode powercode changed the title WIP: Improve performance of Group-Object Improve performance of Group-Object Jul 31, 2018
@iSazonov
Copy link
Collaborator

iSazonov commented Aug 1, 2018

I believe we shouldn't mess performance changes with other and it is better fix parameter sets in new PR.

if (AsHashTable
&& !AsString
&& (Property != null && Property.Length > 1
|| _orderByProperty.MshParameterList.Count > 1))
Copy link
Collaborator

@iSazonov iSazonov Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd move this line on previous line.

Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a comment

{
internal GroupInfoNoElement(OrderByPropertyEntry groupValue)
: base(groupValue)
: base(groupValue)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move to previous line 115.

/// <summary>
/// Gets or sets the AsString parameter.
/// </summary>
/// <value></value>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the empty XML comment.

OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment and fix case.

@daxian-dbw daxian-dbw self-assigned this Aug 3, 2018
Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two minor comments.

else
{
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString()));
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems string.Format was optimized in .Net Core 2 but we could still use AppendFormat() here and in line 160 to get more readable code and exclude extra method calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely should use AppendFormat()

OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentObjectorderValues should be currentObjectOrderValues.

{
Diagnostics.Assert(inputObjects != null, "inputObjects is null");
Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty");
Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is redundant. ArrayToTuple(IList, int> has the exact same code. Suggest removing both Assert statements.

{
Diagnostics.Assert(inputObjects != null, "inputObjects is null");
Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty");
Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts should be backed up with runtime checks. Either document it as returning NULL when constraints are not met for throw the appropriate exceptions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly common pattern in the engine, that there are asserts, but no backing runtime checks. I have always wondered about it...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the thinking is that since it's an internal class, the usage can be demonstrated to be safe and the error checking skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't have an official policy around Assert, I'll not hold up the PR for it and you can ignore the rest. :)

I've been involved in three different threads recently where Dbg.Assert in internal code is hit in debug builds. In two cases, crashes were occurring the release build and the third the lack of the check was masking a bug in another component.

IMHO: Since we don't test debug builds, Dbg.Assert is a placebo at best without a release check.

else
{
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString()));
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely should use AppendFormat()

/// <param name="groups">List containing Groups.</param>
/// <param name="groupInfoDictionary">Dictionary used to keep track of the groups with hash of the property values being the key.</param>
/// <param name="orderByPropertyComparer">The Comparer to be used while comparing to check if new group has to be created.</param>
internal static void DoOrderedGrouping(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be private

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was internal, and I just kept it so, but I'm happy to make it private.


GroupInfo currentGroupInfo = null;
if (groupInfoDictionary.TryGetValue(currentTupleObject, out currentGroupInfo))
if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for currentGroupInfo being null in the groupInfoDictionary? If it is expected, a comment is needed; otherwise, it looks like currentObjectEntry.inputObject is dropped on the floor.

Copy link
Collaborator Author

@powercode powercode Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is a case that will ever occur. Just defensive programming.

Both cases are behaving as the original implementation.

One could argue for removing the null check as it is a condition that shouldn't occur, and if it does, it is a bug that should be fixed.

if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo))
{
// add this inputObject to an existing group
currentGroupInfo?.Add(currentObjectEntry.inputObject);
Copy link
Contributor

@dantraMSFT dantraMSFT Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, when is currentTupleObject present but currentGroupInfo is null.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm removing the null checks.

Removing unnecessary call to .ToArray().

Making internal methods private.
@daxian-dbw
Copy link
Member

@dantraMSFT Can you please take another look?

Copy link
Contributor

@dantraMSFT dantraMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daxian-dbw daxian-dbw merged commit 037e12e into PowerShell:master Aug 9, 2018
@iSazonov
Copy link
Collaborator

@powercode Many thanks for the great perf improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve performance of Group-Object

4 participants