Skip to content

Conversation

@iSazonov
Copy link
Collaborator

@iSazonov iSazonov commented Aug 20, 2019

PR Summary

  • Remove extra cast to PSObject in ToStringEnumerable() method
  • Remove extra Remove() call

PerfView screenshot: left - before the change, right - after the fix.
image

PR Context

Fix #10377

For the scenario:

$w2=${C:\tmp\source2.csv}
for ($i=1; $i -le 100; $i++) { $tmp = $w2 -join "`r`n" }

in $w2 we have an array of strings (not PSObjects!) and join the strings. Before the fix in ToStringEnumerable() we did extra cast every string from $w2 to PSObject which caused unnecessary memory allocations and slow down the scenario because of skipping fast code path for value types.

PR Checklist

@iSazonov iSazonov added the CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log label Aug 20, 2019
@iSazonov iSazonov added this to the 7.0.0-preview.4 milestone Aug 20, 2019
@iSazonov iSazonov self-assigned this Aug 20, 2019
Copy link
Member

@daxian-dbw daxian-dbw Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By wrapping the object into a PSObject, PSObject.ToString will be able to find the ToString(...) script method/code method defined for the type of the object in types.ps1xml files, so as to use that in precedence.
So simply removing this step will be a breaking change.

I looked at this issue myself, and currently ToStringEnumerator and ToStringEnumerable behave differently in terms of how the ToString(...) methods defined in types.ps1xml files are honored for the element objects.
From the completeness perspective, ToStringEnumerator should do the same as ToStringEnumerable -- wrapping the object before passing to PSObject.ToString.

A simpler and narrow-scoped fix would be to special case string objects:

if (obj != null)
{
    object mshObj = obj -is string ? obj : PSObject.AsPSObject(obj);
    returnValue.Append(PSObject.ToString(context, mshObj, separator, format, formatProvider, false, false));
}

It's technically possible that a ToString method is defined for string type in types.ps1xml, but I don't think that makes any sense practically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts were that:

  • having TryFastTrackPrimitiveTypes() implies that we don't want to find ToString() in types.ps1xml for primitive types.
  • it is unlikely breaking change. Why would anybody want to define custom ToString() for primitive types? I see no use. Even if so then I believe right way is to define a custom type with custom ToString().
  • only -Join is affected and affected only on half as you mentioned - ToStringEnumerator() does not cast to PSObject. So we should fix either ToStringEnumerable() or ToStringEnumerator().

I tend to accept the unlikely breaking change and fix ToStringEnumerable(), not ToStringEnumerator() - we get more benefits (great perf win as minimum).

Copy link
Member

@daxian-dbw daxian-dbw Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would anybody want to define custom ToString() for primitive types?

The thing is, you don't know whether the element of enumerable here is a primitive type. I can perfectly do (Get-ChildItem c:\) -join ";".
For primitive types, I agree it's unlikely that a ToString method is defined in ETS for them, but they are more likely than string (yes, string is not a primitive type).

only -Join is affected and affected only on half as you mentioned

This is not true. PSObject.ToStringEnumerable is used in PSObject.ToString recursively, and thus any calls to PSObject.ToString could be affected.

ToStringEnumerator() does not cast to PSObject. So we should fix either ToStringEnumerable() or ToStringEnumerator().

ToStringEnumerator() is obviously the one that should be fixed. It's missing a functionality that powershell should have provided -- honor the custom ToString method defined in ETS for a type.

Copy link
Collaborator Author

@iSazonov iSazonov Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanations!
If we call PSObject.ToString() for non-PSObject object we always falls in IFormattable code path as last resort and don't honor ETS - is it right?

The IFormattable code path doesn't allow me to fix PSObject.ToString() in universal way so I think we could call TryFastTrackPrimitiveTypes() in ToStringEnumerable() (and ToStringEnumerator()) before casting to PSObject and calling PSObject.ToString().
Last commit demo this.

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 21, 2019
@daxian-dbw daxian-dbw assigned daxian-dbw and unassigned iSazonov Aug 21, 2019
@daxian-dbw
Copy link
Member

Before the fix in ToStringEnumerable() we did extra cast every string from $w2 to PSObject which caused unnecessary memory allocations and slow down the scenario because of skipping fast code path for value types.

You will need to update this PR description since casting to PSObject is intentional. Also, please list the execution time before/after your changes.
It would be great if you can add a test to cover the case where a ToString method is defined in a type .ps1xml file for a type, and see if -join honors that method.

@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 22, 2019
@iSazonov
Copy link
Collaborator Author

I will add perf results after we get a consensus about that the fix should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateTime has a higher chance to get a custom ToString definition in ETS.
I think we should just strictly restrict the special handling to string and true primitive types (namely type.IsPrimitive is true).

var objType = obj.GetType();
if (objType != typeof(string) && !objType.IsPrimitive)
{
    obj = PSObject.AsPSObject(obj);
}

returnValue.Append(PSObject.ToString(context, obj, separator, format, formatProvider, false, false));
returnValue.Append(separatorToUse);

Copy link
Collaborator Author

@iSazonov iSazonov Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Decimal? Also the PR intention is performance. In your sample we add more checks.

Then I think about new test and get:

 >1,$PSVersionTable,2 -join ";"

1;System.Management.Automation.PSVersionHashTable;2

>1,@(3,4),2 -join ";"

1;System.Object[];2

I would expect recursion to work. What do you thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, a perf change shouldn't result in any behavioral changes, and that's why I'm concerned about you being aggressive here.
I would much prefer to be conservative

  • Only treat string specially and keep on wrapping other values into PSObject if we go for the most conservative manner;
  • Only treat IsPrimitive objects specially and keep wrapping other values if we go a bit less conservative

As for decimal, it's OK that we don't cover it. Again, it's not a functionality change that we should cover all scenarios, but a perf change that targets improve common scenarios.
As for IntPtr being a primitive type, I think it's fine too, as it's not really possible to use IntPtr directly in PowerShell script anyways.

My personal preference: only treat string specially. /cc @JamesWTruher @rjmholt what's your thoughts on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a perf change shouldn't result in any behavioral changes, and that's why I'm concerned about you being aggressive here

I am full agree and very appreciate your help!
The "aggressive" come from original issue where we consider primitive types too. I agree that for join operator strings are the most expected types for the operator. Nevertheless, I believe that there are many scenarios (like custom output formatting) when users will use the join operator in loops with primitive types. Notice that we change only an intermediate code and we added tests to exclude breaking change. So I believe we could be less conservative here. But I do not found how test the enumerator change and I'd revert the change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @JamesWTruher @rjmholt what's your thoughts on this?

I think specialising for string only makes the most sense. It's by far the most common, and the place where we don't break things. I don't think there are many people out there doing lots of string joins on other primitive types, but by breaking ETS we'd lose functionality

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 23, 2019
@daxian-dbw
Copy link
Member

Tests should be added to verify that both ToStringEnumerator and ToStringEnumerable honor the custom definition of ToString in ETS except for primitive types and the string type.

@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 23, 2019
@iSazonov iSazonov force-pushed the remove-join-extra-cast-psobject branch from 17306e7 to 8c72633 Compare August 26, 2019 05:08
@iSazonov
Copy link
Collaborator Author

I did not find how make a test for ToStringEnumerator().

@daxian-dbw daxian-dbw added the Breaking-Change breaking change that may affect users label Aug 30, 2019
@JamesWTruher
Copy link
Collaborator

The thing I'm most concerned about is the breaking nature of this change (and not being labeled initially as such). I think that we need to be very careful about addressing performance issues which result in behavioral change. And in this case, I'm not sure it's needed. I hope that we can query for type extensions (and specifically a ToString), without wrapping something in a psobject. If there's no extension for the type, then I totally agree we shouldn't wrap. But if there is, we need to use it.

I think there's far too much utility in ETS to turn our back on them. It doesn't matter if I can't see why anyone would do that, I'm sure our customers would be able to provide a number of reasons. That's the nature of a platform - it will be used in ways we can't imagine.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 2, 2019

@JamesWTruher Thanks! I see your point about ETS and a breaking change.
It looks very general for the PR - we only bypass ETS for primitive types and strings. The repo docs classifiers this as "unlikely breaking change". (Notice that there already is TryFastTrackPrimitiveTypes() for some code paths.)

Please see the follow example:

Add-Type -TypeDefinition @"
namespace TestNS
{
   public class TestClass
   {
       public override string ToString()
       {
           return "1";
       }
   }
}
"@

$a1=[TestNS.TestClass]::new()

$a1
# Output is "1"

$a1 -join ";"
# Output is "1"

Update-TypeData -TypeName "TestNS.TestClass" -MemberType ScriptMethod -MemberName "ToString" -Value { return "test"}

$a1
# Output is "test"

$a1 -join ";"
# Output is "1" - unexpected?

$a1 -like "1"
# Output is "True" - unexpected?

$a1,$a1 -like "1"
# Output is - unexpected?
# "1"
# "1"

The example demonstrates two inconsistency (possible fix will be a breaking change):

  • -join with one argument works another - it calls PSObject.ToStringParser() without wrapping to psobject and ignores ETS
  • -like (and other string operators) also uses PSObject.ToStringParser() and ignores ETS

Now question is - should we have a consistency here and accept a breaking change,
in other words, must -join work as other operators and ignore ETS too? Or do we should review and fix all string operators (-join, -like, -match,...) to honor ETS?

@TravisEz13 TravisEz13 removed this from the 7.0.0-preview.4 milestone Sep 12, 2019
@TravisEz13 TravisEz13 added this to the 7.0.0-preview.5 milestone Sep 12, 2019
@iSazonov
Copy link
Collaborator Author

iSazonov commented Oct 28, 2019

@daxian-dbw It seems we have gathered the opinions of all who wanted to speak out. Common conclusion is a concern about ETS and that only string type could be fixed. Based on my previous comment I thing the string type case is also subject to the concern and I think close the PR and related issue as "By-design" with workaround to use [string]::join() (Also we could consider a switch to bypass ETS in -join).

@iSazonov iSazonov closed this Oct 29, 2019
@daxian-dbw daxian-dbw removed this from the 7.0.0-preview.6 milestone Jan 9, 2020
@daxian-dbw daxian-dbw added CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log and removed CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log labels Feb 11, 2020
@iSazonov iSazonov deleted the remove-join-extra-cast-psobject branch May 15, 2020 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Breaking-Change breaking change that may affect users CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance: -join slower than [string]::Join()

6 participants