
Get-Content is slow on large text files. Could it have a parameter to speed it up by not adding NoteProperties? #7537

@HumanEquivalentUnit

I'm using Get-Content to read an example wordlist text file of 170,000 lines:

# Default use, slow. Roughly 6 seconds. Over 100x longer than the alternatives.

$lines = Get-Content -Path '/path/to/bigfile.txt'



# Fast. Roughly 40ms - 90ms.

$lines = [System.IO.File]::ReadAllLines('/path/to/bigfile.txt')



# Fast. Roughly 50ms - 100ms. NB: -ReadCount has to be at least the file's line count,
# otherwise $lines is an array of arrays rather than a flat array of strings, i.e. you
# need to know the file's line count to be able to do this in one move.

$lines = Get-Content -Path '/path/to/bigfile.txt' -ReadCount 200kb



# Fastest. Roughly 30ms - 50ms.

$lines = Get-Content -Path '/path/to/bigfile.txt' -ReadCount 100 | foreach { $_ }
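
For anyone who wants to reproduce the comparison, here is a rough timing sketch. It is not the exact harness behind the numbers above: the temp file, its generated contents, and the variable names are assumptions made for illustration, and absolute timings will vary by machine and PowerShell version. It also includes `-ReadCount 0`, which is documented to send all of the content at one time and so avoids needing the line count up front.

```powershell
# Build a throwaway test file (assumed stand-in for the 170,000 line wordlist).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'get-content-timing.txt'
$words = foreach ($i in 1..170000) { "word$i" }
[System.IO.File]::WriteAllLines($path, [string[]]$words)

# Each entry is one way of reading the whole file into a flat array of strings.
$variants = [ordered]@{
    'Get-Content (default)'      = { Get-Content -Path $path }
    '[IO.File]::ReadAllLines'    = { [System.IO.File]::ReadAllLines($path) }
    'Get-Content -ReadCount 0'   = { Get-Content -Path $path -ReadCount 0 }
    'Get-Content -ReadCount 100' = { Get-Content -Path $path -ReadCount 100 | ForEach-Object { $_ } }
}

# Time each variant once with a Stopwatch and record how many lines came back.
$results = foreach ($name in $variants.Keys) {
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $lines = & $variants[$name]
    $sw.Stop()
    [pscustomobject]@{
        Variant      = $name
        Lines        = $lines.Count
        Milliseconds = [math]::Round($sw.Elapsed.TotalMilliseconds)
    }
}

$results | Format-Table -AutoSize
Remove-Item -Path $path
```

Each variant runs only once here, so the first row may also absorb one-off warm-up costs. Repeating the loop a few times and comparing the later runs gives a fairer picture.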

The reason for the slow version is explained here, apparently by Bruce Payette in 2006:

This is a known issue with the way Get-Content works. For each object
returned from the pipe, it adds a bunch of extra information to that object
in the form of NoteProperties.

These properties are being added for every object processed in the
pipeline. We do this to allow cmdlets to work more effectively together.
It's important because things like the Path property may vary across
different object types. In effect, we're doing "property name
normalization". Unfortunately, while this technique provides significant
benefits by making the system more consistent, it isn't free. It adds
significant overhead both in terms of processing time and memory space.
We're investigating ways to reduce these costs without losing the benefits
but in the end, we may need to add a way to suppress adding this extra
information.
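
The extra properties described in that quote can be seen directly. A minimal sketch, assuming a small temp file created just for the demonstration (the file name and variable names are mine, not from the quoted post): it compares a line returned by Get-Content with the same line read through .NET.

```powershell
# Write a tiny sample file so the example is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'get-content-noteproperties.txt'
Set-Content -Path $path -Value 'alpha', 'beta'

# A string that came through Get-Content carries provider metadata as NoteProperties.
$fromCmdlet = Get-Content -Path $path | Select-Object -First 1
$cmdletProps = @($fromCmdlet.PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty' |
    ForEach-Object Name)

# The same line read through .NET is a plain string with no added properties.
$fromDotNet = [System.IO.File]::ReadAllLines($path)[0]
$dotNetProps = @($fromDotNet.PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty' |
    ForEach-Object Name)

"Get-Content : $($cmdletProps -join ', ')"
"ReadAllLines: $($dotNetProps.Count) NoteProperties"
Remove-Item -Path $path
```

On the PowerShell versions I am aware of, the Get-Content line lists names such as PSPath, PSParentPath, PSChildName, PSDrive, PSProvider and ReadCount. Attaching those to every one of 170,000 strings is the per-line cost being discussed.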

I think it's a shame that the default usage of Get-Content is the slow version, and that's likely not going to change. But, 12 years on from this posting, is it time to add a way to suppress adding this extra information?

For example, a parameter to Get-Content which switches off the NoteProperties. I have no good suggestion for the parameter name. Ideally it would communicate "this is faster" to people who see it in written code, or who read the documentation wondering how they can speed up Get-Content on large files.

Labels: Issue-Enhancement, Resolution-No Activity, Up-for-Grabs, WG-Cmdlets-Core
