-
Notifications
You must be signed in to change notification settings - Fork 133
Enhance some cmdlets with Culture parameter #167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vexx32
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up! Couple small nits.
Overall I think this is a good idea, but I'm not sure about the case-sensitive by default for most of the suggested values. It does match the c# underpinnings but it's also the reverse of PowerShell's default, where case-_in_sensitive is the assumption. I don't know about suddenly switching that around with these parameters.
1-Draft/RFC0000-RFC-Enhance-some-cmdlets-with-Culture-and-Ordinal.md
Outdated
Show resolved
Hide resolved
1-Draft/RFC0000-RFC-Enhance-some-cmdlets-with-Culture-and-Ordinal.md
Outdated
Show resolved
Hide resolved
…nal.md Co-Authored-By: iSazonov <darpa@yandex.ru>
…nal.md Co-Authored-By: iSazonov <darpa@yandex.ru>
@vexx32 The RFC doesn't suggest to change the case defaults. |
|
Ah, you're right. That was the alternate proposal, I misread. I'm good with the primary proposal. 👍 |
Alternative proposal is not a breaking change. It only keep consistency - one API for script and binary. |
|
@SteveL-MSFT @vexx32 @mklement0 Could you please ping community to discuss the RFC? |
|
(As an aside: I personally cannot ping any community - I have no blog and have only a handful of Twitter followers; I rarely tweet myself.) As currently written, your RFC doesn't provide adequate motivation; there's more to be included from PowerShell/PowerShell#9348. As currently written, the RFC reads like a passive-aggressive resistance to the decision made in PowerShell/PowerShell#9348 (comment): Your desire for unconditional consistency between PowerShell and CoreFx / C# APIs is misguided, and such a discussion doesn't belong in this RFC. |
|
@PowerShell/powershell-committee had a limited review on this today (only 3 out of 6 of us in attendance). A couple notes:
|
|
Before adding commits, I want to discuss it here. Several cmdlets perform operations based solely on the current culture:
while the full set of capabilities should include operations with an indication of a specific culture, and not only linguistic operations but also non-linguistic operations. There are areas of scenarios that rather require non-linguistic operations (such as searching in logs). Linguistic operations include CurrentCulture, specific Culture and InvariantCulture. Non-linguistic is Ordinal. We can also see such a comment (Another confirmation This suggests that OrdinalIgnoreCase is based on InvariantCulture (linguistic) .Net Core actively uses local performance optimization for ASCII characters. Measurements show that OrdinalIgnoreCase for ASCII characters is twice as fast as for other characters (which also confirms the use of InvariantCulture). Such optimization leads to the discrimination of languages other than English that looks unjustified. To achieve this, the CoreFXLab repository package is implemented with support for Simple Case Folding (SCF) based on Unicode standard. It should be noted that Unicode support in Core is based on the implementation of the underlying OS, which sometimes implies different behavior. With SFC, we will always have the same behavior on all platforms. (This is by the way important for PowerShell identifiers, but it’s not in the scope of this RFC) (Also there was an issue for HashSet fixed with InvariantCulture but it seems SFC is more suitable.) So the suggestion is to
Suggestions for implementation:
Select-String -Culture ru-RU -CaseSensetive # lingustic
Select-String -Culture ru-RU # lingustic
Select-String -Culture Ordinal -CaseSensetive # non-lingustic
Select-String -Culture Ordinal # lingustic - OrdinalIgnoreCase
Select-String -Culture Invariant -CaseSensetive # lingustic
Select-String -Culture Invariant # lingustic - InvariantCultureIgnoreCase
Select-String -Culture SimpleCaseFolding # non-lingustic
Select-String -Culture ru-RU -CaseSensetive # lingustic as expected for Culture term
Select-String -Culture ru-RU # lingustic
Select-String -Comparison Ordinal -CaseSensetive # non-lingustic
Select-String -Comparison Ordinal # lingustic - OrdinalIgnoreCase
Select-String -Comparison Invariant -CaseSensetive # lingustic
Select-String -Comparison Invariant # lingustic - InvariantCultureIgnoreCase
Select-String -Comparison SimpleCaseFolding # non-lingustic
Select-String -SimpleMatch "^Error.+count" # Current default is CurrentCulture - we'll have to change to SCF
Select-String -FastMatch "^Error.+count" # Based on SCF
Select-String -SimpleMatch "^Error.+count" -Match "<Full Regex>"
Select-String -FastMatch "^Error (.+) count" -Match "<Full Regex>" # First do fast search a candidate string in large file
# then parse it with full-featured RegexA separate problem is that the grep / ripgrep utilities have an option to interpret the input stream from the file as byte ("binary" in their terms) which can be useful. FileSystemProvider offers us the AsByteStream parameter, which can also be useful here. Select-String -Path source.txt -AsByteStream -SimpleMatch "qwerty"
Select-String -Path source.txt -Encoding Utf8 -SimpleMatch "qwerty" |
|
@PowerShell/powershell-committee reviewed your latest reply. First, really appreciate the additional examples and exposition. Feel free to incorporate them into the RFC when you're ready. Right now, we're leaning towards your proposal 2 with some notes:
|
I agree with best design ( what is best? :-) ). And it depends on how we resolve:
It's a vicious circle. :-) Core team want to see use-cases before. We want to see the API in .Net Core before. @joeyaiello Could you please discuss the SimpleCaseFolding design and incorporating in .Net Core with Core team? After that, a way will be opened without danger of making breaking changes in future. |
|
@mklement0 Could you please take another look? |
|
@PowerShell/powershell-committee reviewed some more:
|
Thanks! My main concern - if .Net Core will have another API design we will fall in a breaking change. If you could speed up the start of this design process it would be very good. Of cause we can postpone SimpleCaseFolding implementation without removing from the RFC. My current thoughts is to implement Culture, then Comparison, then FastMatch.
This was added only to show the full picture. It is not in the RFC and we could have another RFC for this proposal.
If you ask about My vision today is that we could split the work on 2 main parts:
I hope now the RFC ready for final review. |
|
@joeyaiello @SteveL-MSFT The RFC is ready for final review. |
| ##### Samples | ||
|
|
||
| ```powershell | ||
| Select-String -Culture ru-RU -CaseSensitive # lingustic as expected for Culture term |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some of the examples you already have in the code PR tests as examples here? Specifically looking to have both inputs and expected outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Thanks for the updates, @iSazonov! We'll put this one on our list to review |
|
@iSazonov given our desire to move towards the 2nd proposal you floated in this comment (and it looks like you're in agreement), the next step here would be for you to update the RFC to reflect that proposal (if you're still interested, of course). Unfortunately, we were not able to move the ball forward with the .NET team on the SimpleCaseFolding changes, but I think that there are still benefits here with the We'd also want to confirm that the default, parameter-less invocation of |
Currently .Net 6.0 was moved to ICU and if I understand correctly, now OrdinalIgnoreCase works as SimpleCaseFolding. But they lost a lot of performance because of p/invokes. They are going to invest in ICU and benefit from PGO.
You want working groups, don't you? Let's leave it to them. |
SteveL-MSFT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, WG is discussing this one
No description provided.