Skip to content

Conversation

@IISResetMe
Copy link
Collaborator

This pull request adds RightToLeft support to the binary -split operator, as requested in #4765

I've added tests for negative integer input for the Max-substrings operand to work with both regex patterns and scriptblocks.

SteveL-MSFT and others added 6 commits October 12, 2017 10:14
removed -importsystemmodules switch from powershell.exe and related code
removed -consolefile parameter from powershell.exe and related code
…ndows PowerShell

replaced LengthInBufferCells() method with Unicode adapted code from PSReadline
removed code to show a GUI prompt for credentials as PSCore6 prompts in console
This commit adds tests for negative limits to the -split operator. The intended change to allow -split to accept a negative integer as a limit for "backwards" (right-to-left) splitting should work both when splitting using a scriptblock/predicate or a regex pattern
This commit adds the RightToLeft option to the Regex instance in SplitWithPattern() when the limit value specified by the user is negative. Using the RightToLeft options would ensure minimal overhead to existing cases with a positive or no limit.

SplitWithPredicate() is modified to inspect characters from right to left by introducing a cursor from which the strIndex derives its value. Works well but requires explicit checking and reversing of the string buffer before populating the resulting list when working right-to-left.
$res[2] | Should Be "c"
$res[3] | Should Be "d"
$res.count | Should Be 1
$res[0] | Should Be "a b c d"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we want to get "d" as a last element.
/cc @mklement0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While that might seem intuitive, it would make the Max-substrings operand asymmetrical.

The guiding principal behind the change is that the sign (+/-) signifies only the direction of the split operation, whereas the absolute value determines the maximum resulting number of substrings, with 0 preserving the meaning of "unlimited"

Copy link
Contributor

@mklement0 mklement0 Oct 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I wasn't thinking of direction, but what end to apply the split to - elements returned should always reflect the input order.

In this case, due to -1, the last 1 element should be returned, preceded by the remaining prefix, so we should get (as @iSazonov suggests):

$rest, $result = 'a b c d' -split '', -1
$rest | Should Be 'a b c'
$result.Count | Should be 1
$result[0] | Should Be 'd'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether you call it direction or not doesn't change the fact that by splitting from the end rather than the start we are in fact modifying the direction of the operation :-)

Wouldn't your proposed result be confusing given the name of the operand (Max-substrings rather than Number-Of-Times-To-Split or similar)?

Copy link
Contributor

@mklement0 mklement0 Oct 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction is ambiguous, as that could be interpreted as wanting to also reverse the order in which the resulting elements are returned (e.g., 'a b c d' -split ' ', -2 resulting in 'a b', 'd, 'c', which we don't want), so I wanted to clarify.
I guess from the end carries the same ambiguity, but I hope we're on the same page now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think about how negative array indices are treated in PowerShell, it makes perfect sense, and as 1d59be0 shows, shifting to your original suggestion is trivial, but I would like to keep it symmetric (again, given that the second operand is not an index but a boundary value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood and agreed.

I indeed originally had array-index logic in mind, but neglected to consider that the current -split behavior uses the-number-of-elements-I-want-plus-one-for-the-rest logic.

Just to spell it out: with your approach (which we should go with), both 1 and -1 are no-ops - both request no splitting at all, and which end to apply it to is then moot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the symmetric behavior makes sense and updating the documentation with good examples will clear up any confusion. cc @lzybkr

Copy link
Collaborator

@iSazonov iSazonov Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 Could you please modify your Issue description to satisfy the behavior (for future docs)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov: Thanks for reminding me - done.

}

if(rightToLeft)
split.Reverse();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we exclude reverse?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the List<string>.Reverse() call for performance reasons or...?

Removing it would funtionally result in the substrings for each input being returned in the wrong order, ie:

PS> 'a b c d' -split {$_ -eq ' '},-3
d
c
a b

Rather than the expected:

PS> 'a b c d' -split {$_ -eq ' '},-3
a b
c
d

I see two possible alternatives:

  1. Change all invocations of split.Add() to split.Insert() conditional on rightToLeft
  2. Implement two completely separate routines for negative and positive limit values, duplicating the code slightly (but keeping performance overhead at a minimum)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my thoughts was about performance. I don't know what case is better. I guess we would use only one routine. At a minimum, we have to start with this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting it into 2 loops would mean getting rid of at least one non-local variable assignment per input item and a number of branch points, the intent (splitting from the end) would also be clearer. Any thoughts @lzybkr ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List.Reverse is not expensive, it swaps in place, see the implementation (List is really an Array).

Insert would be more expensive, shifting everything on each insert instead of swapping once like in Reverse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do reverse copy in next line in ExtendList.
By the way, what are we doing this recopy?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up.

On request in #5125 behavior is changed to allow -n to signify n substrings from the end + remainder of string.
List<string> results = new List<string>();
foreach (string item in content)
{
List<string> split = new List<String>();
Copy link
Collaborator

@iSazonov iSazonov Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we could move this to line 653 as

List<string> split = new List<String>(limit); 

This can help exclude Reverse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep it without specifying capacity up front, because you'd still need to support unbounded capacity in the special case of (limit == 0) and the (very common) case of no Max-substrings specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov What if someone put 1000000 as limit? I do not think you should pre-allocate so much memory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skipped limit == 0
Closed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn't we specify the limit in the ctor?
I'd gladly have powershell allocate the list once to the size I had given than have the list being reallocated and copied many times when the value is larger?
If someone specifies 1000000 they will get a list of roughly 8MB, which would soon be garbage collected, but with a single allocation, instead of the 19 we would get with the 0 default capacity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - it is good. Reallocation is much worse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget about that Min(limit, item.Length +2) comment above, I just realized we're talking about regex with capture support, including 'abcde' -split '((?=.)(.)(?<=.))' similarly terrifying things.

I've added the suggested changes + comments for the List.Reverse() call in c423d8d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like your thought to use Min(limit, item.Length +2). It covers the common case sufficiently and protects against complete stupidity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a look at the List<T> source, the default capacity reallocation behavior (multiply by 2) actually fits our needs perfectly if we do Min(limit, item.Length + 2) - matches worst case with no captures, and each nested capture-group added will at most require 1 extra re-allocation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

travisty- and others added 13 commits October 16, 2017 10:06
This code will never get executed, and its logic is
duplicated in the HandleStopJobCompleted() method.
…s PowerShell (#5133)

Currently, it's using the same location as Windows PowerShell and the two should not be sharing one as the PSModulePaths are different between the two.
* wrapped STA code sections in `#if STAMODE` for potential future use
removed -importsystemmodules switch from powershell.exe and related code
removed -consolefile parameter from powershell.exe and related code

* removed font and codepage handling code that is only applicable to Windows PowerShell
replaced LengthInBufferCells() method with Unicode adapted code from PSReadline

* removed `#if CORECLR` statements merging code base

* [feature]
removed code to show a GUI prompt for credentials as PSCore6 prompts in console

* Remove unncessary method 'LengthInBufferCellsFE'
- Add Get-WebListenerUrl Based Examples to WebListener README.md
- Fix Cert test example
- Rename powershell.exe to pwsh.exe
- Fixe appveyor.psm1
- Update MSI to include 'pwsh' in path and app paths
- Revert change for hyper-v powershell direct
- Update names in packaging.psm1.
- Fix check for SxS
Closes #4274

Adds an -Authentication parameter to Invoke-RestMethod and Invoke-WebRequest
Adds an -Token parameter to Invoke-RestMethod and Invoke-WebRequest
Adds an -AllowUnencryptedAuthentication parameter to Invoke-RestMethod and Invoke-WebRequest
Adds tests for various -Authorization uses
-Authentication Parameter has 3 options: Basic, OAuth, and Bearer
Basic requires -Credential and provides RFC-7617 Basic Authorization credentials to the remote server
OAuth and Bearer require the -Token which is a SecureString containing the bearer token to send to the remote server
If any authentication is provided for any transport scheme other than HTTPS, the request will result in an error. A user may use the -AllowUnencryptedAuthentication switch to bypass this behavior and send their secrets unencrypted at their own risk.
-Authentication does not work with -UseDefaultCredentials and will result in an error.
The existing behavior with -Credential is left untouched. When not supplying -Authentication, A user will not receive an error when using -Credential over unencrypted connections.

Code design choice is meant to accommodate more Authentication types in the future.

Documentation Needed

The 3 new parameters will need to be added to the Invoke-RestMethod and Invoke-WebRequest documentation along with examples. Syntax will need to be updated.
The old implementation always set the new object's properties to null.
For PSCore 6, we are only supporting InitialSessionState. The RunspaceConfiguration APIs were already made internal. This PR removes all the code related to RunspaceConfiguration. This also means that some public APIs have changed. Was deciding between leaving the RunspaceConfiguration parameters and throwing Unsupported, but thought it was better to have it a compile-time error. This should simplify the code base.
This commit changes instantiatiation of the list used for substring chunks in SplitWithPredicate(), to have a specific capacity whenever known up front.

Fixed inconsistent bracing style with if/else as well
- Exclude 'Get-PSHostProcessInfo', 'Enter-PSHostProcess' and 'Exit-PSHostProcess' from Unix platforms.
- Update tests and add additional cmdlets need to be excluded
IISResetMe and others added 10 commits October 25, 2017 20:02
This commit changes instantiatiation of the list used for substring chunks in SplitWithPredicate(), to have a specific capacity whenever known up front.

Fixed inconsistent bracing style with if/else as well
In case no Max-substrings argument is passed when calling -split with a predicate, set the capacity of the resulting list to item.Length + 1 to avoid re-allocating the underlying array multiple times
-split supports array input, but using -split with a scriptblock doesn't currently have any tests for array input
A previously introduced modification to the limit parameter caused subsequent strings to be processed left-to-right when using negative Max-substrings against array input. Moved to local variable maxSubstrings
Simplified logic for Right-to-left splitting by moving Abs(limit) assignment out of loop. Pre-emptive return on limit == 1 moved out as well
Re-instantiating the string buffer in SplitWithPredicate() seems unnecessary, moved declaration outside loop, replaced reassignments with buf.Clear()
In order to avoid having to either call buf.Insert() or implement a custom Reverse() method for StringBuilder, I've replaced the char buffer in SplitWithPredicate() with a List<char>
@IISResetMe
Copy link
Collaborator Author

IISResetMe commented Oct 25, 2017

Alright, I rebased like you suggested, then did a pull and a push (as there were no merge conflicts), but it still dragged all the intermediate commits from the master branch into my PR :-\ is this normal/expected or am I overlooking something obvious? @TravisEz13

@iSazonov
Copy link
Collaborator

I use

git chechout <branch>
git pull --rebase PowerShell master

I think you need remove the merge commit and try rebase again.

@IISResetMe
Copy link
Collaborator Author

@TravisEz13 @iSazonov Tried to revert the last merge locally with

git revert c8eb3407ac839e5191b99c3a719a7629d9fdd746 -m 2

but this just adds a new commit reverting all the changes of the others at once, super messy. How can I "reset" the branch (--reset hard [latest PR-speficic commit] doesn't seem to work because of the previous rebase/fast-forwarding)?

Maybe it's easier to just make a diff patch of the relevant files, create a new branch and submit a new PR?

Apologies for the git n00bness

@SteveL-MSFT
Copy link
Member

@IISResetMe the easier thing is probably to copy off the files you changed, reset this branch to before your first commit, copy over your changed files, then rebase off upstream/master, and push those changes.

If you want to do it the git way, you should git rebase -i HEAD~56 (for your 56 commits), drop all the commits that aren't yours. As it rebases, you'll hit lots of merge conflicts. You'll have to fix them, git add <file> then git rebase --continue until it's clean. Then git fetch upstream and git rebase upstream/master and resolve any merge conflicts.

@iSazonov
Copy link
Collaborator

You can remove last commits by

git reset --hard ooooooo

@IISResetMe
Copy link
Collaborator Author

Talked to @SteveL-MSFT offline, and decided to start over and push the changes to a new branch with the following commits:

IISResetMe@81873b3 (tests)
IISResetMe@5e5e3e2 (SplitWithPattern())
IISResetMe@cf47352 (SplitWithPredicate())

Only concern is that we lose a bit of context from this PR, so wanted to check with you @TravisEz13 @iSazonov before submitting a new PR off the new branch

@iSazonov
Copy link
Collaborator

Ok, only please add a comment in new PR that it overload the PR.

@IISResetMe
Copy link
Collaborator Author

Re-submitted as #5270, please review. Thanks

@IISResetMe IISResetMe closed this Oct 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.