Skip to content

Conversation

@CarloToso
Copy link
Contributor

PR Summary

  • Replace var
  • Use GeneratedRegex
  • Remove EnsureHtmlParser() (no longer needed)
  • Remove unneeded code (we already use UTF8 as default)
                if (string.IsNullOrEmpty(characterSet) && ContentHelper.IsJson(contentType))
                {
                    characterSet = Encoding.UTF8.HeaderName;
                }

PR Context

PR Checklist

@CarloToso CarloToso changed the title BasicHtmlWebResponseObject.Common.cs cleanup Webcmdlets cleanup Mar 10, 2023
private static Regex s_linkRegex;
private static Regex s_tagRegex;
[GeneratedRegex(@"([^""'>/=\s\p{Cc}]+)(?:\s*=\s*(?:""(.*?)""|'(.*?)'|([^'"">\s]+)))?", RegexOptions.Singleline | RegexOptions.IgnoreCase)]
private static partial Regex s_attribNameValueRegex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point of this change. The original Regex is compiled and works just as fast, but it is more compact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to this change we can remove EnsureHtmlParser();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what are the benefits? The final code is still noticeably larger, but the performance does not increase.

Copy link
Contributor Author

@CarloToso CarloToso Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will revert the regex stuff, move the remaining changes to the other webcmdlet cleanup PR and close this PR

Copy link

@PaulusParssinen PaulusParssinen Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's significant difference between RegexOptions.Compiled and GeneratedRegex. Regex Source Generator will not only reduce startup time but also increase performance.
I can't simply put it into words how much better generated regex is, thankfully Stephen Toub has great post about them here https://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7/#source-generation (In fact, .NET team wants people to switch to GeneratedRegex so bad that they want to intercept every new Regex(..) call and delegate them to source generated one instead)

Stephen also talks about them here https://www.youtube.com/live/rwfNDyBBgks?t=1549
Nick Chapsas has also a quick summary of them https://www.youtube.com/watch?v=WosEhlHATOk

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't simply put it into words how much better generated regex is,

When it comes to code performance neither words nor emotions work - you have to prove with tests that a particular code will become smaller and faster.

In this particular case, the code will only get bigger because PowerShell is dynamic and can never completely get rid of the Regex compiler. The initial improvements won't be noticeable since the initialization of the cmdlet takes a significant amount of time.

Where we would like Regex SG and where it is justified is the pwsh startup scenario. And there are indeed places where this SG can be applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov if you pointed me to such cases I could try to apply Regex SG

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it comes to code performance neither words nor emotions work - you have to prove with tests that a particular code will become smaller and faster.

You're correct here. I should have provided data for my claims.

In this particular case, the code will only get bigger because PowerShell is dynamic and can never completely get rid of the Regex compiler. The initial improvements won't be noticeable since the initialization of the cmdlet takes a significant amount of time.

Where we would like Regex SG and where it is justified is the pwsh startup scenario. And there are indeed places where this SG can be applied.

I didn't realize this code wasn't used in the pwsh itself, I admit my mistake. In this case delaying parsing and doing all the complex IL emitting is justified, if this code is not called in a cmdlet.

This also makes me want to compare the size of the IL emitted for RegexOptions.Compiled and the IL compiled by Roslyn for the source generated one.. interesting.

@pull-request-quantifier-deprecated

This PR has 76 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Small
Size       : +28 -48
Percentile : 30.4%

Total files changed: 2

Change summary by file extension:
.cs : +28 -48

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

CarloToso added a commit to CarloToso/PowerShell that referenced this pull request Mar 10, 2023
@CarloToso CarloToso mentioned this pull request Mar 10, 2023
22 tasks
@CarloToso CarloToso closed this Mar 10, 2023
@CarloToso CarloToso deleted the basichtmlwebresponse-regex branch March 10, 2023 23:14
@SavageHaxCash
Copy link

SavageHaxCash commented Mar 11, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants