-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Improve performance of Group-Object #7410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Refactoring only - no functional changes.
This reducing alloctions and object copying.
…hen comparing. Much faster when the number of unique values grows large.
|
Appveyor error seem unrelated to my changes. |
| { | ||
| if (propValue != null && propValue.PropertyValue != null) | ||
| var propValuePropertyValue = propValue?.PropertyValue; | ||
| if (propValuePropertyValue != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the check really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is dereferenced in the else clause. #Closed.
| OrderByPropertyComparer orderByPropertyComparer) | ||
| { | ||
| if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0) | ||
| var currentObjectorderValues = currentObjectEntry.orderValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be currentObjectOrderValues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment and fix case.
| //add this inputObject to an existing group | ||
| currentGroupInfo.Add(currentObjectEntry.inputObject); | ||
| } | ||
| //add this inputObject to an existing group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add space after //.
| foreach (GroupInfo _grp in _groups) | ||
| foreach (GroupInfo grp in _groups) | ||
| { | ||
| if (AsString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to move the cycle into the if (AsString) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure i understand what you suggest here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant
if (AsString)
{
foreach (GroupInfo grp in _groups)
{
table.Add(grp.Name, grp.Group);
}
}
else
{
foreach (GroupInfo grp in _groups)
{
if (grp.Values.Count == 1)
{
...
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought more about performancethough :-) From this point of view it seems for() better than foreach - less allocations. Is the _groups so much that we can get a win?
Also some additional CodeFactor complaints
|
I would like to fix the parameter sets too. |
Moving parameter checking earlier. Adding back the slow quadratic code path for the cases where we have more than one set of input types.
|
I believe we shouldn't mess performance changes with other and it is better fix parameter sets in new PR. |
| if (AsHashTable | ||
| && !AsString | ||
| && (Property != null && Property.Length > 1 | ||
| || _orderByProperty.MshParameterList.Count > 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd move this line on previous line.
iSazonov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave a comment
| { | ||
| internal GroupInfoNoElement(OrderByPropertyEntry groupValue) | ||
| : base(groupValue) | ||
| : base(groupValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move to previous line 115.
| /// <summary> | ||
| /// Gets or sets the AsString parameter. | ||
| /// </summary> | ||
| /// <value></value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the empty XML comment.
| OrderByPropertyComparer orderByPropertyComparer) | ||
| { | ||
| if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0) | ||
| var currentObjectorderValues = currentObjectEntry.orderValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment and fix case.
iSazonov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with two minor comments.
| else | ||
| { | ||
| sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString())); | ||
| sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems string.Format was optimized in .Net Core 2 but we could still use AppendFormat() here and in line 160 to get more readable code and exclude extra method calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely should use AppendFormat()
| OrderByPropertyComparer orderByPropertyComparer) | ||
| { | ||
| if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0) | ||
| var currentObjectorderValues = currentObjectEntry.orderValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentObjectorderValues should be currentObjectOrderValues.
| { | ||
| Diagnostics.Assert(inputObjects != null, "inputObjects is null"); | ||
| Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty"); | ||
| Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert is redundant. ArrayToTuple(IList, int> has the exact same code. Suggest removing both Assert statements.
| { | ||
| Diagnostics.Assert(inputObjects != null, "inputObjects is null"); | ||
| Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty"); | ||
| Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts should be backed up with runtime checks. Either document it as returning NULL when constraints are not met for throw the appropriate exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly common pattern in the engine, that there are asserts, but no backing runtime checks. I have always wondered about it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the thinking is that since it's an internal class, the usage can be demonstrated to be safe and the error checking skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't have an official policy around Assert, I'll not hold up the PR for it and you can ignore the rest. :)
I've been involved in three different threads recently where Dbg.Assert in internal code is hit in debug builds. In two cases, crashes were occurring the release build and the third the lack of the check was masking a bug in another component.
IMHO: Since we don't test debug builds, Dbg.Assert is a placebo at best without a release check.
| else | ||
| { | ||
| sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString())); | ||
| sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely should use AppendFormat()
| /// <param name="groups">List containing Groups.</param> | ||
| /// <param name="groupInfoDictionary">Dictionary used to keep track of the groups with hash of the property values being the key.</param> | ||
| /// <param name="orderByPropertyComparer">The Comparer to be used while comparing to check if new group has to be created.</param> | ||
| internal static void DoOrderedGrouping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was internal, and I just kept it so, but I'm happy to make it private.
|
|
||
| GroupInfo currentGroupInfo = null; | ||
| if (groupInfoDictionary.TryGetValue(currentTupleObject, out currentGroupInfo)) | ||
| if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for currentGroupInfo being null in the groupInfoDictionary? If it is expected, a comment is needed; otherwise, it looks like currentObjectEntry.inputObject is dropped on the floor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is a case that will ever occur. Just defensive programming.
Both cases are behaving as the original implementation.
One could argue for removing the null check as it is a condition that shouldn't occur, and if it does, it is a bug that should be fixed.
| if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo)) | ||
| { | ||
| // add this inputObject to an existing group | ||
| currentGroupInfo?.Add(currentObjectEntry.inputObject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, when is currentTupleObject present but currentGroupInfo is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm removing the null checks.
Removing unnecessary call to .ToArray(). Making internal methods private.
|
@dantraMSFT Can you please take another look? |
dantraMSFT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@powercode Many thanks for the great perf improvement! |
Fixes #7409.
PR Summary
code cleanup
object[] -> IList in PSType.ArrayToTuple signature to reduce allocations.
Sort input before grouping. Use sorting to only consider last group when comparing.
adding DebuggerDisplay for GroupInfo.
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:to the beginning of the title and remove the prefix when the PR is ready.[feature]if the change is significant or affects feature tests