Description
Timings for using Get-Content to read an example 170,000-line wordlist text file:
# Default use, slow. Roughly 6 seconds. Over 100x longer than the alternatives.
$lines = Get-Content -Path '/path/to/bigfile.txt'
# Fast. Roughly 40ms - 90ms.
$lines = [System.IO.File]::ReadAllLines('/path/to/bigfile.txt')
# Fast. Roughly 50-100ms. NB: -ReadCount has to be larger than the file's line count,
# otherwise $lines is an array of line arrays (one per chunk) rather than a flat,
# 1-dimensional array of lines, i.e. you need to know the line count up front to do
# this in one step.
$lines = Get-Content -Path '/path/to/bigfile.txt' -ReadCount 200kb
# Fastest. Roughly 30ms - 50ms.
$lines = Get-Content -Path '/path/to/bigfile.txt' -ReadCount 100 | ForEach-Object { $_ }
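To make the ReadCount behaviour and the timing comparison easier to reproduce, here is a minimal sketch. It assumes a synthetic 170,000-line file in the temp directory (the file name and the `word$_` contents are made up for illustration, not the original wordlist), and it adds a `-ReadCount 0` case, which the Get-Content documentation describes as sending all content at once. Absolute timings will differ by machine and PowerShell version.

```powershell
# Throwaway sample file: 170,000 synthetic lines (hypothetical data, not the original wordlist).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-readcount-sample.txt'
[System.IO.File]::WriteAllLines($path, [string[]](1..170000 | ForEach-Object { "word$_" }))

# ReadCount smaller than the line count: the result is an array of 1000-line chunks.
$chunked = Get-Content -Path $path -ReadCount 1000
"chunks: $($chunked.Count), lines in first chunk: $($chunked[0].Count)"

# Piping through ForEach-Object unrolls each chunk into one flat array of lines.
$flat = Get-Content -Path $path -ReadCount 1000 | ForEach-Object { $_ }
"flat lines: $($flat.Count)"

# -ReadCount 0 emits the whole file as a single chunk, so no line count is needed up front.
$all = Get-Content -Path $path -ReadCount 0
"lines via -ReadCount 0: $($all.Count)"

# Rough timings for each approach, in the same order as the list above.
$tests = [ordered]@{
    'Get-Content (default)'       = { $lines = Get-Content -Path $path }
    'ReadAllLines'                = { $lines = [System.IO.File]::ReadAllLines($path) }
    'Get-Content -ReadCount 200kb' = { $lines = Get-Content -Path $path -ReadCount 200kb }
    'Get-Content -ReadCount 100'   = { $lines = Get-Content -Path $path -ReadCount 100 | ForEach-Object { $_ } }
}
foreach ($name in $tests.Keys) {
    $ms = (Measure-Command -Expression $tests[$name]).TotalMilliseconds
    '{0,-30} {1,8:N0} ms' -f $name, $ms
}

# Clean up the sample file.
Remove-Item -LiteralPath $path
```

A single Measure-Command run is noisy; repeating each case a few times and discarding the first run would give steadier numbers.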
The reason for the slow version is explained here, apparently by Bruce Payette in 2006:
This is a known issue with the way Get-Content works. For each object
returned from the pipe, it adds a bunch of extra information to that object
in the form of NoteProperties.
These properties are being added for every object processed in the
pipeline. We do this to allow cmdlets to work more effectively together.
It's important because things like the Path property may vary across
different object types. In effect, we're doing "property name
normalization". Unfortunately, while this technique provides significant
benefits by making the system more consistent, it isn't free. It adds
significant overhead both in terms of processing time and memory space.
We're investigating ways to reduce these costs without losing the benefits
but in the end, we may need to add a way to suppress adding this extra
information.
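The extra properties described in that quote can be seen directly. The following is a small sketch, assuming a three-line sample file in the temp directory (the file name and contents are hypothetical); it lists the NoteProperties attached to a line returned by Get-Content and compares that with a line returned by ReadAllLines.

```powershell
# Small sample file (hypothetical name and contents, for illustration only).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-noteprops-sample.txt'
[System.IO.File]::WriteAllLines($path, [string[]]@('alpha', 'beta', 'gamma'))

$viaCmdlet = Get-Content -Path $path
$viaDotNet = [System.IO.File]::ReadAllLines($path)

# Collect the names of the NoteProperties attached to the first line from each source.
$cmdletProps = @($viaCmdlet[0].PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty' |
    ForEach-Object Name)
$dotNetProps = @($viaDotNet[0].PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty' |
    ForEach-Object Name)

# Lines from Get-Content carry provider metadata such as PSPath and ReadCount.
"Get-Content line:  $($cmdletProps -join ', ')"
# Lines from ReadAllLines are plain strings with no NoteProperties.
"ReadAllLines line: $($dotNetProps.Count) NoteProperties"

# Clean up the sample file.
Remove-Item -LiteralPath $path
```

Every line emitted by the default Get-Content call gets its own set of these properties, which is the per-object cost the quote refers to.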
I think it's a shame that the default usage of Get-Content is the slow version, though that's unlikely to change. Still, 12 years on from that posting, is it time to add a way to suppress this extra information?
For example, a parameter on Get-Content that switches off the NoteProperties. I have no good suggestion for the parameter name; ideally it would communicate "this is faster" to people who see it in written code, or who read the documentation wondering how they can speed up Get-Content on large files.
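Until such a parameter exists, one possible stopgap is a small wrapper around ReadAllLines. This is only a sketch: the function name `Read-FileLines` and its `-LiteralPath` parameter are hypothetical, not an existing or proposed cmdlet. It resolves the path through PowerShell first, since .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper, not part of PowerShell: reads a whole file as plain strings.
function Read-FileLines {
    [CmdletBinding()]
    [OutputType([string[]])]
    param(
        [Parameter(Mandatory, Position = 0)]
        [string] $LiteralPath
    )

    # Resolve relative paths against the current PowerShell location.
    $resolved = Convert-Path -LiteralPath $LiteralPath

    # The unary comma emits the array as one object instead of streaming each line.
    , [System.IO.File]::ReadAllLines($resolved)
}

# Usage with a small sample file (hypothetical name and contents).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'read-filelines-sample.txt'
[System.IO.File]::WriteAllLines($path, [string[]]@('one', 'two', 'three'))

$lines = Read-FileLines -LiteralPath $path
"read $($lines.Count) lines"

# Clean up the sample file.
Remove-Item -LiteralPath $path
```

Unlike Get-Content, this reads the whole file into memory before returning and handles only filesystem paths, so it trades streaming and provider support for speed.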