-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Optimize IEnumerable variant of replace operator #14221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize IEnumerable variant of replace operator #14221
Conversation
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
| string replacement = PSObject.ToStringParser(context, substitute); | ||
| return regex.Replace(input, replacement); | ||
| cachedReplacementString = PSObject.ToStringParser(context, substitute); | ||
| return regex.Replace(input, cachedReplacementString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if we can reduce the return statements. How about the following?
if (cachedReplacementString is null && cachedMatchEvaluator is null)
{
switch (substitute)
{
case string replacement:
cachedReplacementString = replacement;
break;
case ScriptBlock sb:
cachedMatchEvaluator = GetMatchEvaluator(context, sb);
break;
case object val when LanguagePrimitives.TryConvertTo(val, out cachedMatchEvaluator):
break;
default:
cachedReplacementString = PSObject.ToStringParser(context, substitute);
break;
}
}
if (cachedReplacementString is not null)
{
return regex.Replace(input, cachedReplacementString);
}
else
{
return regex.Replace(input, cachedMatchEvaluator);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better
Do you say about readability or anything else?
Readability with "returns" looks good for me - it explicitly shows design intensions.
.Net generates optimal code https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxNMAfAAgJgIwCwAUFgAwAEWeAdACpwAeM1ASnAOYCuANhAgKIMADgjhQoASwD2AOygBuYrnLxYxAN7Fy2ndqwBmSngBs5KcABWcAMYxybIb2twA8kMQQYUhAEkwjgAotXRC9PAoJGSFOGCRg0N02dkZyUWSGOKIEkPMrW3IoTmBYCRgYuEzs3VEAMyMKawhrAAs4ABMHJzgwOBkYAGUYBEj2SqrtWvIAWU8W/gA3CG5OT29yRpb2mZg5xeXVhABKeJDNLPHyAHpL8gBBGvgEchqJBFh1pe5yAHc4H4luF9mhB5n8ZDYxFA+ACAJ7kOClVpPDatDpwRxNbq9AZDEZmZFNVHbXZLFZeBDUE7ZCR1AIo9qdTE9PqDYYydjkCRQcgyKR2GQ8bjHc7jM4XEJYADsqQ4jFY6K6AUi0ViH02aIxzmZOLZ7EOihFVQAvsQqQkaeQ6YStrNmgtSQdOdzefzBcLxWLxTopTL0vLNXAlVEYkg1UTbfb9uT9WbQiaiLGQlBvqUWpbCsUYKVyu6Lp6vR8oH8qBRRAHtSBE+L6RqutrWXiALwy8vYg0F0I+tJyxnOIMq0M13tYlm49kxhOG6vQP79azDIQwABC3Ck1gA1gVQFWLkOFUzsQ32eRmwAiOcL5erjen9sd3Rd2UMf2K5UhsM2nZ2vZk7wTnehI0RZmJYNh2HsPytDI5AQVyPJ8jygqVlOHZ7q2o66ie5CnrkYG3gBVSPn6w79u+NbEt+DrRgaBG6G0cA1BAPCoLRCRoXWh5jhyZ70YxzH4ShBZET2+59m+qrsQeGEjBO4rxtk8bxkoODkBeEiLiua7rhoRpAA===
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or
if (cachedReplacementString is null && cachedMatchEvaluator is null)
{
(cachedReplacementString, cachedMatchEvaluator) = substitute switch
{
string replacement => (replacement, (MatchEvaluator) null),
ScriptBlock sb => (null, GetMatchEvaluator(context, sb)),
{ } val when LanguagePrimitives.TryConvertTo(val, out MatchEvaluator me) => (null, me),
_ => (PSObject.ToStringParser(context, substitute), null),
};
}
// After first call we will have necessarily either cachedReplacementString or cachedMatchEvaluator.
return cachedReplacementString is { } replacementString
? regex.Replace(input, replacementString)
: regex.Replace(input, cachedMatchEvaluator);
rjmholt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't seem to add any tests... Do we know that this code is tested?
| // After first call we will have necessarily either cachedReplacementString or cachedMatchEvaluator. | ||
| if (cachedReplacementString is not null) | ||
| { | ||
| return regex.Replace(input, cachedReplacementString); | ||
| } | ||
|
|
||
| if (cachedMatchEvaluator is not null) | ||
| { | ||
| return regex.Replace(input, cachedMatchEvaluator); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the semantics here.
Let's say we have an IEnumerable of two strings. On the first string we hit line 1031, and cache that string replacement.
On the second string, it looks like we have a cached value from the first call, so we ignore the second string and immediately use the first again?
How does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacement string is not input string - I guess you lost this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Ok, so why on earth did the original implementation (i.e. before your change) do it like this? It's pretty easy to convert the substitute to a known result ahead of time without complicated caching logic. Once again, we've very seriously forgotten how to use object-oriented programming.
I think we need to:
- Create the substitute object once, above the loop
- Then pass it into the
ReplaceOperatorImplmethod
That's much much simpler than anything else.
private abstract class RegexReplacer
{
protected RegexReplacer(Regex regex)
{
ReplacementRegex = regex;
}
public abstract string Replace(string input);
protected Regex ReplacementRegex { get; }
}
private class StringReplacer : RegexReplacer
{
private readonly string _replacementString;
public StringReplacer(Regex regex, string replacement)
: base(regex)
{
_replacementString = str;
}
public override string Replace(string input)
{
return ReplacementRegex.Replace(input, _replacementString);
}
}
private class MatchEvaluatorReplacer : RegexReplacer
{
private readonly MatchEvaluator _replacementMatchEvaluator;
public MatchEvaluatorReplacer(Regex regex, MatchEvaluator replacement)
: base(regex)
{
_replacementMatchEvaluator = replacement;
}
public override string Replace(string input)
{
return ReplacementRegex.Replace(input, _replacementMatchEvaluator);
}
}
...
private static RegexReplacer GetSubstituteReplacer(
ExecutionContext context,
Regex regex,
object substitute)
{
switch (substitute)
{
// When the substitution is a string, we can skip all the conversion logic
case string replacement:
return new StringReplacer(regex, replacement);
// ScriptBlocks implement custom regex evaluators, so we must handle them specially
case ScriptBlock sb:
return new MatchEvaluatorReplacer(regex, GetMatchEvaluator(context, sb));
}
// Try once more to convert the substitute to a match evaluator
if (LanguagePrimitives.TryConvertTo(subtitute, out MatchEvaluator matchEvaluator))
{
return new MatchEvaluatorReplacer(regex, matchEvaluator);
}
// Finally, we just treat the substitute as a string
return new StringReplacer(regex, PSObject.ToStringParser(context, substitute));
}
...
// <main method where we're performing the logic>
{
RegexReplacer replacer = GetSubstituteReplacer(context, rr, substitute);
if (list == null)
{
...
return replacer.Replace(lvalString);
}
else
{
List<object> resultList = new List<object>();
while (ParserOps.MoveNext(context, errorPosition, list))
{
string lvalString = PSObject.ToStringParser(context, ParserOps.Current(errorPosition, list));
resultList.Add(replacer.Replace(lvalString));
}
}
}With this:
- There's no
ReplaceOperatorImplmethod at all (which I especially like becauseImplis a smell) - We only look at the substitute once, before looking at the inputs, and execute that logic once (no more reflection-heavy
switchinside the loop) - No more extraneous cached values to juggle and pass around. We only allocate the
RegexReplacerobject we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think creating classes is redundant in the context so I combined the proposal with your previous based on struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not redundant. The class encapsulates a pre-computed result. We shouldn't be revisiting that switch statement inside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I can live with your most recent proposal
I added one test. |
Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| /// <param name="substitute">The substitute value.</param> | ||
| /// <returns>The result of the regex.Replace operation.</returns> | ||
| private static object ReplaceOperatorImpl(ExecutionContext context, string input, Regex regex, object substitute) | ||
| private struct ReplaceOperator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out ReplaceOperator is already the name of a static method. I guess let's just call it ReplaceOperatorImpl then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-))) Sorry, I'm already confused about the names.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never liked Impl because it widens the "least representational gap", but I totally acknowledge it's fine in this case. Just want to make the case that I think we should be removing Impls from the code in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never liked
Impl
I tried to avoid it but .... :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ReplaceOperatorProcessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change it! You've done great work here @iSazonov -- thanks!
|
@PoshChan remind me in 4 hours |
|
@iSazonov can you describe the documentation we need for this? I want to follow up on the label you added. |
|
@rjmholt It seems we haven't docs for the replace operator kind. I will open new issue in docs repo. |
|
@rjmholt, this is the reminder you requested 4 hours ago |
|
🎉 Handy links: |
PR Summary
The PR does an optimization of IEnumerable variant of replace operator:
$ABigList -replace "search pattern", <substitution>.Before the change we evaluate the substitution for every element from $ABigList that can be very expensive if the list is large.
If the substitution is a string we do no extra operations.
But if the substitution is:
MatchEvaluatorclassPSObject.ToStringParser()(seeReplaceOperatorImpl()method)evaluating the substitution for every element becomes expensive.
PSObject.ToStringParser()is PowerShell magic and it is important to have it fast.Conversion to
MatchEvaluatorlooks like an edge case but it is beforePSObject.ToStringParser()so we should optimize the case too.The PR does simple optimization - after first evaluation the substitution is stored in a variable and reused for follow elements of the list.
Follow demo script takes 5 seconds before the change and 1 second after:
PR Context
Follow #14128 and #14087.
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.