-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Improve performance of CheckSuspiciousContent #18665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
I tested this and found the switch statement is more faster (looking asm code I guess C# compiler tries to do a binary search). |
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
Outdated
Show resolved
Hide resolved
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will blindly doing h |= 0x20 cause any problem? h could be any character, not just ASCII ones.
The original code is way more readable than this, and I double if the perf gain is worth sacrificing the readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not problem because we reject all except A-Z, a-z, and -.
Perf win of the change is ~10% because:
- old code optimized for upper case chars
- new code does less operations for any chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are talking about improvements in nanosecond level. I don't think it's the bottleneck to the overall script execution, and because of that, I prefer to keep the original code for readability reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doing a Slice here? Can't we use runningHash directly in the loop below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change allows compiler to remove boundary checks in loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use Slice here, so as to make a separate fix easier. Just replacing the array with span would already be a great change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the Slice but this stopping compiler from removing boundary checks in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have the Slice with comment:
// We need cut unused tail. Since 'i' is current position, the actual length is 'i + 1'. Ex., if i = 0 (first char in the string) the span length is 1, if i = 1 (second char) the span length is 2, and so on.
Span<uint> rh = runningHash.Slice(0, Math.Min(i + 1, runningHash.Length));There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that rh[j - 1] is used in the loop, I doubt the boundary check on that can be eliminated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this works for rh[j] only.
I was thinking about vectorization, but I assume that you will be against the complication of the code. :-)
Since we have #18693 with tests can we return span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, let's not using Slice here, just to keep the code simple and readable.
If you want, you can eliminate the call to Math.Min by doing j = val1 < val2 ? val1 : val2 directly before the loop.
Or better than that, have a local Min method that implements Math.Min. So, using Min(a, b) allows the jitter to inline the method as needed, while still giving us the same readable code.
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
Outdated
Show resolved
Hide resolved
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
Outdated
Show resolved
Hide resolved
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to change how the hashes are generated, you need to update the hashes in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still used on line 2095
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still used on line 2095
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is not a for (i = 0; i < span.Length; i++) pattern, are you sure the boundary check will be eliminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't reflect the actual situation of this method. This does.
Having Slice here moves the boundary check from within the loop to outside the loop, so it is better. Also, it doesn't affect readability, so it's good to have this change.
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
This reverts commit c5ef64148cce901daf06476d95b07b12b286238b.
|
@daxian-dbw I changed the direction of the loop so that all the boundary checks are removed. Previously, I removed all allocations and made a quick character filter. So now this is the fastest version. |
|
@iSazonov @TravisEz13's comments were not addressed. Also, I doubt if Travis still has his original tests available, but will let Travis to confirm. |
|
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
|
LOG const remove at all. |
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
PR Summary
By commit:
Emit) has length 4.Results for
Match("aaaaaaaEmitaaaaaaa"):PR Context
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.(which runs in a different PS Host).