-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Enhance -split operator with RightToLeft splitting #5125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
removed -importsystemmodules switch from powershell.exe and related code removed -consolefile parameter from powershell.exe and related code
…ndows PowerShell replaced LengthInBufferCells() method with Unicode adapted code from PSReadline
This commit adds tests for negative limits to the -split operator. The intended change to allow -split to accept a negative integer as a limit for "backwards" (right-to-left) splitting should work both when splitting using a scriptblock/predicate or a regex pattern
This commit adds the RightToLeft option to the Regex instance in SplitWithPattern() when the limit value specified by the user is negative. Using the RightToLeft options would ensure minimal overhead to existing cases with a positive or no limit. SplitWithPredicate() is modified to inspect characters from right to left by introducing a cursor from which the strIndex derives its value. Works well but requires explicit checking and reversing of the string buffer before populating the resulting list when working right-to-left.
| $res[2] | Should Be "c" | ||
| $res[3] | Should Be "d" | ||
| $res.count | Should Be 1 | ||
| $res[0] | Should Be "a b c d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we want to get "d" as a last element.
/cc @mklement0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While that might seem intuitive, it would make the Max-substrings operand asymmetrical.
The guiding principal behind the change is that the sign (+/-) signifies only the direction of the split operation, whereas the absolute value determines the maximum resulting number of substrings, with 0 preserving the meaning of "unlimited"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I wasn't thinking of direction, but what end to apply the split to - elements returned should always reflect the input order.
In this case, due to -1, the last 1 element should be returned, preceded by the remaining prefix, so we should get (as @iSazonov suggests):
$rest, $result = 'a b c d' -split '', -1
$rest | Should Be 'a b c'
$result.Count | Should be 1
$result[0] | Should Be 'd'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether you call it direction or not doesn't change the fact that by splitting from the end rather than the start we are in fact modifying the direction of the operation :-)
Wouldn't your proposed result be confusing given the name of the operand (Max-substrings rather than Number-Of-Times-To-Split or similar)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direction is ambiguous, as that could be interpreted as wanting to also reverse the order in which the resulting elements are returned (e.g., 'a b c d' -split ' ', -2 resulting in 'a b', 'd, 'c', which we don't want), so I wanted to clarify.
I guess from the end carries the same ambiguity, but I hope we're on the same page now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think about how negative array indices are treated in PowerShell, it makes perfect sense, and as 1d59be0 shows, shifting to your original suggestion is trivial, but I would like to keep it symmetric (again, given that the second operand is not an index but a boundary value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood and agreed.
I indeed originally had array-index logic in mind, but neglected to consider that the current -split behavior uses the-number-of-elements-I-want-plus-one-for-the-rest logic.
Just to spell it out: with your approach (which we should go with), both 1 and -1 are no-ops - both request no splitting at all, and which end to apply it to is then moot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the symmetric behavior makes sense and updating the documentation with good examples will clear up any confusion. cc @lzybkr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mklement0 Could you please modify your Issue description to satisfy the behavior (for future docs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov: Thanks for reminding me - done.
| } | ||
|
|
||
| if(rightToLeft) | ||
| split.Reverse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we exclude reverse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the List<string>.Reverse() call for performance reasons or...?
Removing it would funtionally result in the substrings for each input being returned in the wrong order, ie:
PS> 'a b c d' -split {$_ -eq ' '},-3
d
c
a b
Rather than the expected:
PS> 'a b c d' -split {$_ -eq ' '},-3
a b
c
d
I see two possible alternatives:
- Change all invocations of
split.Add()tosplit.Insert()conditional onrightToLeft - Implement two completely separate routines for negative and positive
limitvalues, duplicating the code slightly (but keeping performance overhead at a minimum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my thoughts was about performance. I don't know what case is better. I guess we would use only one routine. At a minimum, we have to start with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting it into 2 loops would mean getting rid of at least one non-local variable assignment per input item and a number of branch points, the intent (splitting from the end) would also be clearer. Any thoughts @lzybkr ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List.Reverse is not expensive, it swaps in place, see the implementation (List is really an Array).
Insert would be more expensive, shifting everything on each insert instead of swapping once like in Reverse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do reverse copy in next line in ExtendList.
By the way, what are we doing this recopy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up.
| List<string> results = new List<string>(); | ||
| foreach (string item in content) | ||
| { | ||
| List<string> split = new List<String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we could move this to line 653 as
List<string> split = new List<String>(limit); This can help exclude Reverse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep it without specifying capacity up front, because you'd still need to support unbounded capacity in the special case of (limit == 0) and the (very common) case of no Max-substrings specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov What if someone put 1000000 as limit? I do not think you should pre-allocate so much memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skipped limit == 0
Closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shouldn't we specify the limit in the ctor?
I'd gladly have powershell allocate the list once to the size I had given than have the list being reallocated and copied many times when the value is larger?
If someone specifies 1000000 they will get a list of roughly 8MB, which would soon be garbage collected, but with a single allocation, instead of the 19 we would get with the 0 default capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - it is good. Reallocation is much worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about that Min(limit, item.Length +2) comment above, I just realized we're talking about regex with capture support, including 'abcde' -split '((?=.)(.)(?<=.))' similarly terrifying things.
I've added the suggested changes + comments for the List.Reverse() call in c423d8d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still like your thought to use Min(limit, item.Length +2). It covers the common case sufficiently and protects against complete stupidity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a look at the List<T> source, the default capacity reallocation behavior (multiply by 2) actually fits our needs perfectly if we do Min(limit, item.Length + 2) - matches worst case with no captures, and each nested capture-group added will at most require 1 extra re-allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code will never get executed, and its logic is duplicated in the HandleStopJobCompleted() method.
…s PowerShell (#5133) Currently, it's using the same location as Windows PowerShell and the two should not be sharing one as the PSModulePaths are different between the two.
* wrapped STA code sections in `#if STAMODE` for potential future use removed -importsystemmodules switch from powershell.exe and related code removed -consolefile parameter from powershell.exe and related code * removed font and codepage handling code that is only applicable to Windows PowerShell replaced LengthInBufferCells() method with Unicode adapted code from PSReadline * removed `#if CORECLR` statements merging code base * [feature] removed code to show a GUI prompt for credentials as PSCore6 prompts in console * Remove unncessary method 'LengthInBufferCellsFE'
…, otherwise the built script will not run (#5137)
- Add Get-WebListenerUrl Based Examples to WebListener README.md - Fix Cert test example
- Rename powershell.exe to pwsh.exe - Fixe appveyor.psm1 - Update MSI to include 'pwsh' in path and app paths - Revert change for hyper-v powershell direct - Update names in packaging.psm1. - Fix check for SxS
Closes #4274 Adds an -Authentication parameter to Invoke-RestMethod and Invoke-WebRequest Adds an -Token parameter to Invoke-RestMethod and Invoke-WebRequest Adds an -AllowUnencryptedAuthentication parameter to Invoke-RestMethod and Invoke-WebRequest Adds tests for various -Authorization uses -Authentication Parameter has 3 options: Basic, OAuth, and Bearer Basic requires -Credential and provides RFC-7617 Basic Authorization credentials to the remote server OAuth and Bearer require the -Token which is a SecureString containing the bearer token to send to the remote server If any authentication is provided for any transport scheme other than HTTPS, the request will result in an error. A user may use the -AllowUnencryptedAuthentication switch to bypass this behavior and send their secrets unencrypted at their own risk. -Authentication does not work with -UseDefaultCredentials and will result in an error. The existing behavior with -Credential is left untouched. When not supplying -Authentication, A user will not receive an error when using -Credential over unencrypted connections. Code design choice is meant to accommodate more Authentication types in the future. Documentation Needed The 3 new parameters will need to be added to the Invoke-RestMethod and Invoke-WebRequest documentation along with examples. Syntax will need to be updated.
The old implementation always set the new object's properties to null.
For PSCore 6, we are only supporting InitialSessionState. The RunspaceConfiguration APIs were already made internal. This PR removes all the code related to RunspaceConfiguration. This also means that some public APIs have changed. Was deciding between leaving the RunspaceConfiguration parameters and throwing Unsupported, but thought it was better to have it a compile-time error. This should simplify the code base.
This commit changes instantiatiation of the list used for substring chunks in SplitWithPredicate(), to have a specific capacity whenever known up front. Fixed inconsistent bracing style with if/else as well
- Exclude 'Get-PSHostProcessInfo', 'Enter-PSHostProcess' and 'Exit-PSHostProcess' from Unix platforms. - Update tests and add additional cmdlets need to be excluded
This reverts commit 1d59be0.
This commit changes instantiatiation of the list used for substring chunks in SplitWithPredicate(), to have a specific capacity whenever known up front. Fixed inconsistent bracing style with if/else as well
In case no Max-substrings argument is passed when calling -split with a predicate, set the capacity of the resulting list to item.Length + 1 to avoid re-allocating the underlying array multiple times
-split supports array input, but using -split with a scriptblock doesn't currently have any tests for array input
A previously introduced modification to the limit parameter caused subsequent strings to be processed left-to-right when using negative Max-substrings against array input. Moved to local variable maxSubstrings
Simplified logic for Right-to-left splitting by moving Abs(limit) assignment out of loop. Pre-emptive return on limit == 1 moved out as well
Re-instantiating the string buffer in SplitWithPredicate() seems unnecessary, moved declaration outside loop, replaced reassignments with buf.Clear()
In order to avoid having to either call buf.Insert() or implement a custom Reverse() method for StringBuilder, I've replaced the char buffer in SplitWithPredicate() with a List<char>
…owerShell into enhance-Split-RTL-1
|
Alright, I rebased like you suggested, then did a pull and a push (as there were no merge conflicts), but it still dragged all the intermediate commits from the master branch into my PR :-\ is this normal/expected or am I overlooking something obvious? @TravisEz13 |
|
I use I think you need remove the merge commit and try rebase again. |
|
@TravisEz13 @iSazonov Tried to revert the last merge locally with but this just adds a new commit reverting all the changes of the others at once, super messy. How can I "reset" the branch ( Maybe it's easier to just make a diff patch of the relevant files, create a new branch and submit a new PR? Apologies for the git n00bness |
|
@IISResetMe the easier thing is probably to copy off the files you changed, reset this branch to before your first commit, copy over your changed files, then rebase off upstream/master, and push those changes. If you want to do it the |
|
You can remove last commits by git reset --hard ooooooo |
|
Talked to @SteveL-MSFT offline, and decided to start over and push the changes to a new branch with the following commits: IISResetMe@81873b3 (tests) Only concern is that we lose a bit of context from this PR, so wanted to check with you @TravisEz13 @iSazonov before submitting a new PR off the new branch |
|
Ok, only please add a comment in new PR that it overload the PR. |
|
Re-submitted as #5270, please review. Thanks |
This pull request adds RightToLeft support to the binary -split operator, as requested in #4765
I've added tests for negative integer input for the
Max-substringsoperand to work with both regex patterns and scriptblocks.