@iSazonov
Collaborator

@iSazonov iSazonov commented Nov 26, 2022

PR Summary

By commit:

  1. Use MIN_LEN and MAX_LEN constants for better readability.
  2. Skip hash searches for length < MIN_LEN.
    • Previously we searched for strings of length 1, 2, and 3, but the shortest pattern (Emit) has length 4.
  3. Use Span<> instead of array
    • Eliminate array allocations
    • Eliminate boundary checks in loops
  4. Replace multiplication with more efficient operations
    • Multiplying a value by 31 is the same as multiplying it by 32 and subtracting the value once, so the multiplication is replaced with a shift and a subtraction. (The compiler knows how to replace multiplication by a power of two with a shift, but we do this explicitly to be safe.)
  5. Improve to-lower-case
    • Before: 4-8 operations, 6 in most cases
    • After: 3-5 operations, 3 in most cases
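
A minimal sketch of item 4, assuming a 31-based polynomial hash as the summary describes. `Times31` and `HashStep` are hypothetical names for illustration, not the identifiers used in the PR:

```csharp
using System;

// Hypothetical helper: h * 31 written as (h << 5) - h.
// unchecked makes the wraparound explicit, so both forms agree for every uint.
static uint Times31(uint h) => unchecked((h << 5) - h);

// Hypothetical rolling-hash step for one (already lower-cased) character.
static uint HashStep(uint hash, char ch) => unchecked(Times31(hash) + ch);

uint viaShift = 0;
uint viaMultiply = 0;
foreach (char c in "emit")
{
    viaShift = HashStep(viaShift, c);
    viaMultiply = unchecked(viaMultiply * 31 + c);
}

Console.WriteLine(viaShift == viaMultiply); // True
```

Because unsigned arithmetic wraps modulo 2^32 in both forms, the shift version produces the same hashes as the multiplication, so precomputed hash values would not need to change for this step alone.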

Results for Match("aaaaaaaEmitaaaaaaa"):

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2251/21H2/November2021Update)
Intel Core i5-2410M CPU 2.30GHz (Sandy Bridge), 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX

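| Method | text               | Mean     | Ratio | Code Size | Allocated | Alloc Ratio |
|--------|--------------------|----------|-------|-----------|-----------|-------------|
| Orig   | aaaaaaaEmitaaaaaaa | 423.8 ns | 1.00  | 4,445 B   | 144 B     | 1.00        |
| Fast   | aaaaaaaEmitaaaaaaa | 266.0 ns | 0.63  | 4,493 B   | -         | 0.00        |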

PR Context

PR Checklist

@iSazonov iSazonov added the CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log label Nov 26, 2022
@SteveL-MSFT
Member

LookUpHash() uses a large switch statement, would it be faster to change to a dictionary?

@iSazonov
Collaborator Author

LookUpHash() uses a large switch statement, would it be faster to change to a dictionary?

I tested this and found that the switch statement is faster (judging by the generated asm code, I guess the C# compiler turns it into a binary search).

Comment on lines +2031 to +2036
Member

Will blindly doing h |= 0x20 cause any problem? h could be any character, not just ASCII ones.
The original code is way more readable than this, and I doubt the perf gain is worth sacrificing the readability.

Collaborator Author

It is not a problem because we reject everything except A-Z, a-z, and -.
The perf win from the change is ~10% because:

  • the old code was optimized for upper-case chars
  • the new code does fewer operations for any char.
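
To illustrate why the unconditional `| 0x20` is safe once a range check follows it, here is a minimal sketch. `TryFold` is a hypothetical helper, not the code under review; it assumes the filter accepts only A-Z, a-z, and '-' as stated above:

```csharp
using System;

// Hypothetical filter: accepts only A-Z, a-z, and '-', returning the lower-case form.
static bool TryFold(char c, out char lower)
{
    // '-' is compared before folding so that '\r' (0x0D | 0x20 == 0x2D) is not mistaken for it.
    if (c == '-')
    {
        lower = c;
        return true;
    }

    // Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z' unchanged.
    char folded = (char)(c | 0x20);

    // Only the 52 ASCII letters land in 'a'..'z' after the OR; any other
    // character, ASCII or not, falls outside the range and is rejected.
    if (folded >= 'a' && folded <= 'z')
    {
        lower = folded;
        return true;
    }

    lower = default;
    return false;
}

Console.WriteLine(TryFold('E', out char e) && e == 'e'); // True
Console.WriteLine(TryFold('\u0141', out _));             // False
```

The OR can only set a bit, so a character whose folded value lies in 'a'..'z' must itself have been an ASCII letter. Non-ASCII input such as U+0141 folds to another non-letter and is rejected by the range check.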

Member

We are talking about improvements at the nanosecond level. I don't think it's the bottleneck for the overall script execution, and because of that, I prefer to keep the original code for readability.

Member

Why do a Slice here? Can't we use runningHash directly in the loop below?

Collaborator Author

@iSazonov iSazonov Nov 30, 2022

The change allows the compiler to remove boundary checks in the loop.

Member

@daxian-dbw daxian-dbw Nov 30, 2022

Please don't use Slice here, so as to make a separate fix easier. Just replacing the array with span would already be a great change.

Collaborator Author

I removed the Slice, but this stops the compiler from removing boundary checks in the loop.

Collaborator Author

We can keep the Slice with a comment:

// We need to cut off the unused tail. Since 'i' is the current position, the actual length is 'i + 1'. E.g., if i = 0 (first char in the string) the span length is 1, if i = 1 (second char) the span length is 2, and so on.
Span<uint> rh = runningHash.Slice(0, Math.Min(i + 1, runningHash.Length));

Member

Given that rh[j - 1] is used in the loop, I doubt the boundary check on that can be eliminated too.

Collaborator Author

@iSazonov iSazonov Dec 1, 2022

Yes, this works for rh[j] only.
I was thinking about vectorization, but I assume that you will be against the complication of the code. :-)

Since we have #18693 with tests can we return span?

Member

@daxian-dbw daxian-dbw Dec 1, 2022

Nope, let's not use Slice here, just to keep the code simple and readable.

If you want, you can eliminate the call to Math.Min by doing j = val1 < val2 ? val1 : val2 directly before the loop.
Or better than that, have a local Min method that implements Math.Min. So, using Min(a, b) allows the jitter to inline the method as needed, while still giving us the same readable code.
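
A minimal sketch of the local `Min` suggestion. The helper body and the surrounding variables are illustrative assumptions, not code from the PR:

```csharp
using System;

// Hypothetical local helper with the same result as Math.Min for int arguments.
static int Min(int a, int b) => a < b ? a : b;

int i = 2; // current position in the input (illustrative value)
Span<uint> runningHash = stackalloc uint[8];
int count = Min(i + 1, runningHash.Length);

Console.WriteLine(count); // 3
```

The call site reads the same as with `Math.Min`, and the JIT remains free to inline the small helper.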

@PowerShell PowerShell deleted a comment from iSazonov Nov 30, 2022
Member

If you are going to change how the hashes are generated, you need to update the hashes in the code.

Member

still used on line 2095

Member

still used on line 2095

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Nov 30, 2022
Member

Given this is not a for (i = 0; i < span.Length; i++) pattern, are you sure the boundary check will be eliminated?

Collaborator Author

Member

That doesn't reflect the actual situation of this method. This does.

Having Slice here moves the boundary check from within the loop to outside the loop, so it is better. Also, it doesn't affect readability, so it's good to have this change.
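
The pattern under discussion can be sketched as follows. `SumUsedPrefix` is a hypothetical stand-in for the real loop, and whether the JIT actually drops the per-element check is an assumption that should be confirmed in the disassembly:

```csharp
using System;

// Hypothetical stand-in for the loop under review; it sums the used prefix.
static uint SumUsedPrefix(Span<uint> runningHash, int i)
{
    // The range is validated once here, when the slice is created.
    Span<uint> rh = runningHash.Slice(0, Math.Min(i + 1, runningHash.Length));

    uint sum = 0;
    // Canonical form: the index runs from 0 to rh.Length, a shape the JIT is
    // generally able to prove in-bounds (assumption; verify in the disassembly).
    for (int j = 0; j < rh.Length; j++)
    {
        sum = unchecked(sum + rh[j]);
    }

    return sum;
}

uint[] hashes = { 1, 2, 3, 4 };
Console.WriteLine(SumUsedPrefix(hashes, 1)); // 3
```

Note that this sketch only indexes `rh[j]`. As pointed out earlier in the thread, an access such as `rh[j - 1]` inside the same loop may still keep its own check.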

@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Dec 1, 2022
@ghost ghost added the Review - Needed The PR is being reviewed label Dec 9, 2022
@ghost

ghost commented Dec 9, 2022

This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@iSazonov
Collaborator Author

@daxian-dbw I changed the direction of the loop so that all the boundary checks are removed. Previously, I removed all allocations and added a quick character filter. So now this is the fastest version.
In addition, I added a result length check, which should noticeably reduce the number of collisions in real-world scenarios. If Travis sees it in his tests, maybe you (the team) can approve this PR. Otherwise just close it.

@ghost ghost removed the Review - Needed The PR is being reviewed label Dec 16, 2022
@daxian-dbw
Member

@iSazonov @TravisEz13's comments were not addressed. Also, I doubt Travis still has his original tests available, but I will let Travis confirm.

@pull-request-quantifier-deprecated

This PR has 29 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +16 -13
Percentile : 11.6%

Total files changed: 1

Change summary by file extension:
.cs : +16 -13

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

@iSazonov
Collaborator Author

The LOG const is removed entirely.

@ghost ghost added the Review - Needed The PR is being reviewed label Dec 24, 2022
@ghost

ghost commented Dec 24, 2022

This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@iSazonov iSazonov closed this Apr 23, 2023
@ghost ghost removed the Review - Needed The PR is being reviewed label Apr 23, 2023
Labels

CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log Extra Small

5 participants