Improve performance of Group-Object #7409

@powercode

As of pwsh 6.1 preview-4, Group-Object has quadratic performance as the number of unique values approaches n, the number of input items.

By gathering the input first, sorting it, and then comparing each new value only to the last group, we can come much closer to n * log(n) instead of n * n.
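The sort-then-compare idea can be sketched roughly as follows. This is only an illustration in PowerShell, not the prototype itself: `Group-SortedSketch` is a hypothetical helper name, and it assumes simple scalar keys (such as strings) compared with the default `Sort-Object` and `-eq` semantics.

```powershell
# Hypothetical helper for illustration; not the actual Group-Object code.
function Group-SortedSketch {
    param([object[]] $InputObject)

    # Gather and sort all input up front: O(n log n).
    $sorted = @($InputObject | Sort-Object)

    $groups  = [System.Collections.Generic.List[object]]::new()
    $current = $null

    foreach ($item in $sorted) {
        # After sorting, equal values are adjacent, so a new value only
        # needs to be compared with the most recent group.
        if ($null -ne $current -and $current.Name -eq $item) {
            $current.Group.Add($item)
        }
        else {
            $current = [pscustomobject]@{
                Name  = $item
                Group = [System.Collections.Generic.List[object]]::new()
            }
            $current.Group.Add($item)
            $groups.Add($current)
        }
    }

    # Emit the groups in sorted key order.
    $groups
}

Group-SortedSketch -InputObject 'b','a','b','c','a' |
    ForEach-Object { '{0}: {1}' -f $_.Name, $_.Group.Count }
# a: 2
# b: 2
# c: 1
```

Each item is compared against one group only, so the cost after the sort is linear; the sort dominates.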

I have a prototype with the following perf measurements:

| Count | Unique | OldImpl | NewImpl | Speedup | Command |
| ---: | ---: | --- | --- | ---: | --- |
| 10689 | 8220 | 00:00:06.81 | 00:00:00.23 | 29.1 | `$allItemsInPowerShellSrcTree \| group {[io.path]::GetFileName($_)}` |
| 1690765 | 3761 | 00:02:30.34 | 00:00:22.32 | 6.7 | `$u \| group` |

where $u is a dataset of string values, of which roughly 3,700 are unique.
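For anyone wanting to collect comparable timings, a measurement along these lines should work. It is a minimal sketch: the `$data` set is synthetic and hypothetical (it is not the `$u` data set above), and the numbers it produces will depend on the machine and the pwsh version.

```powershell
# Synthetic, hypothetical input: 20,000 strings with 3,700 distinct keys.
$data = 1..20000 | ForEach-Object { 'key{0}' -f ($_ % 3700) }

# Measure-Command returns a TimeSpan for the script block it runs.
$elapsed = Measure-Command { $data | Group-Object }

'{0:N2} s' -f $elapsed.TotalSeconds
```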

The only downside I have seen is that both the order of the output objects and the order of the items within each group are different.
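One way such an ordering difference can arise is sketched below. It rests on an assumption: that an implementation which appends a group whenever it meets an unseen key emits groups in first-seen order, whereas a sort-based one emits them in sorted key order. Neither snippet is the real Group-Object code, and the variable names are illustrative.

```powershell
$values = 'b','a','b','c','a'

# First-seen order: a key is appended the first time it is encountered.
$firstSeen = [ordered]@{}
foreach ($v in $values) {
    if (-not $firstSeen.Contains($v)) { $firstSeen[$v] = 0 }
    $firstSeen[$v] = $firstSeen[$v] + 1
}
$firstSeenOrder = @($firstSeen.Keys)               # b, a, c

# Sorted order: what a sort-then-group approach would naturally emit.
$sortedOrder = @($values | Sort-Object -Unique)    # a, b, c

$firstSeenOrder -join ', '
$sortedOrder -join ', '
```

The order within a group can change for a similar reason if the sort used is not stable, since equal keys may then be rearranged relative to their input order.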

Is that part of the public contract?
Is it worth a PR?

Labels

Breaking-Change: breaking change that may affect users
Committee-Reviewed: PS-Committee has reviewed this and made a decision
Issue-Enhancement: the issue is more of a feature request than a bug
Resolution-Fixed: The issue is fixed.
WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module
