WIP: Use new Compile() overload in CompileTree() #7731

iSazonov · 2018-09-07T03:52:10Z

PR Summary

Related #2230.

PR Checklist

lzybkr · 2018-09-07T06:27:38Z

I'm not sure this is a good idea - why would we want to use two different interpreters?

iSazonov · 2018-09-07T06:50:51Z

I don't think it makes sense to port (update) the code from CoreFX here; it would be better to migrate to the CoreFX interpreter. I see that it doesn't support a Compile() overload with a threshold, so the first step is to use Compile(preferInterpretation: true).
The second step is in question. We could use tiered compilation (instead of a Compile() overload with a threshold) and remove our interpreter.
https://blogs.msdn.microsoft.com/dotnet/2018/08/02/tiered-compilation-preview-in-net-core-2-1/
There is already a knob for this, TieredCompilation_Tier1CallCountThreshold = 30:
https://github.com/dotnet/coreclr/blob/f6174b93d100d46f4641f040b6de5fa254c1ee71/Documentation/project-docs/clr-configuration-knobs.md

From https://github.com/dotnet/coreclr/issues/4331 I see that we can get benefits for crossgened code too.
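For anyone reproducing this: tiered compilation in .NET Core 2.1 is opt-in. Below is a minimal sketch of enabling it for a child process from PowerShell, assuming `pwsh` is on PATH. The `COMPlus_TieredCompilation` switch comes from the linked blog post; the child command is only an illustration.

```powershell
# Opt in for processes started from this session (.NET Core 2.1 switch).
$env:COMPlus_TieredCompilation = '1'

# Child processes inherit the environment; ask a fresh pwsh to echo the value
# to confirm the new runtime will see it. Assumes pwsh is on PATH.
$seen = pwsh -NoProfile -Command '$env:COMPlus_TieredCompilation'
"child sees COMPlus_TieredCompilation=$seen"
```

The already-running session is not affected; only runtimes started after the variable is set pick it up.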

iSazonov · 2018-09-07T07:53:41Z

I see #7729 - crossgen doesn't work with framework-dependent deployment, so tiered compilation will come in handy.
/cc @SteveL-MSFT

iSazonov · 2018-09-07T14:12:00Z

I checked with PerfView that TC works. The full test suite seems to take about the same time on CI.

lzybkr · 2018-09-07T16:05:12Z

Why do you think it's better to migrate? Do you have data that shows it's faster?

The CoreFX interpreter does not JIT compile, so loops will be ~50X slower.

daxian-dbw · 2018-09-08T05:52:23Z

@iSazonov The interpreter in PowerShell was updated to act like tiered compilation. For a script block, or a loop in it, that contains fewer than 300 statements, the script block and the loop are initially evaluated by the interpreter (fast startup); after running a certain number of times, they are compiled and executed as jitted native code (better steady-state performance).

The JIT tiered compilation cannot replace this optimization. The tiered compilation will optimize the Run methods from certain instructions, but no matter how JIT is able to optimize those individual methods, the script is still being evaluated in an interpreted way -- fetch an instruction, data gets pushed to a stack in the interpreter, run some C# code, pop the data, save to local variable list, fetch the next instruction, etc. However, after a compiled delegate gets created on demand, the script will be running directly in the jitted native code, and it's possible for the compiled delegate to further benefit from the tiered compilation and get even better performance.

You can see it as a 3-tiered compilation.
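The interpret-first, compile-on-demand behaviour described above can be pictured as a call counter in front of two execution paths. This is only a sketch: `New-TieredRunner`, the threshold value and the two script blocks standing in for the interpreted and compiled paths are hypothetical and are not PowerShell's internal types.

```powershell
# Hypothetical sketch of "interpret first, compile on demand".
# Neither path is a real interpreter or JIT; they only mark which tier ran.
function New-TieredRunner {
    param(
        [scriptblock] $Interpreted,   # cheap to start, slow per call
        [scriptblock] $Compiled,      # costly to create, fast per call
        [int] $Threshold = 3          # illustrative value only
    )
    # A hashtable is a reference type, so the closure can update the counter.
    $state = @{ Calls = 0 }
    return {
        $state.Calls++
        if ($state.Calls -le $Threshold) { & $Interpreted } else { & $Compiled }
    }.GetNewClosure()
}

$run = New-TieredRunner -Interpreted { 'interpreted' } -Compiled { 'compiled' } -Threshold 3
$tiers = 1..5 | ForEach-Object { & $run }
$tiers -join ','
```

The first three calls take the interpreted path and the remaining two take the compiled one. The runtime's own tiered compilation would then be a further tier applied to the compiled delegate.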

iSazonov · 2018-09-10T14:25:25Z

I added a commit with a test hook to switch between the interpreter and the compiler.
Test script:

# First measurement: ExpressionCompile test hook set to $true.
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)

$step1 = Measure-Command {
    for ($i = 0; $i -lt 30; $i++) {
        $a += 1
    }
}

# Second measurement: ExpressionCompile test hook set to $false.
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)

$step2 = Measure-Command {
    for ($i = 0; $i -lt 30; $i++) {
        $a += 1
    }
}

$step1.TotalMilliseconds
$step2.TotalMilliseconds

Results in milliseconds (script copy-pasted into the console):
<Updated because of a bug in test script>

| Iterations | TC Crossgened interpreter | TC Crossgened compiler | Crossgened interpreter | Crossgened compiler | 6.1 RC |
|---|---|---|---|---|---|
| 30 | 4 | 9 | 4 | 10 | 4 |
| 300 | 6 | 10 | 7 | 11 | 6 |
| 1000 | 17 | 15 | 14 | 13 | 9 |
| 3000 | 34 | 19 | 34 | 19 | 17 |
| 30000 | 266 | 88 | 296 | 94 | 92 |

~~The results (although I cannot consider them reliable) show that with the current change we get better results than with 6.1 RC.~~

I think it's worth studying further.

/cc @powercode maybe you will be interested.

lzybkr · 2018-09-10T14:54:07Z

Are you certain your experiment is valid? It's possible you only execute one code path because of caching.

iSazonov · 2018-09-10T16:44:40Z

@lzybkr What cache do you mean? I added a test hook to explicitly switch from interpreter (CoreFX) to compiler.

lzybkr · 2018-09-10T16:58:40Z

PowerShell caches script block definitions to avoid recompiling, so I was just asking that you confirm your experiment hits the code paths you expect.

I measured a 50X slowdown when switching to the CoreFx interpreter and this did not surprise me because their interpreter no longer supports JIT.

Your results do surprise me - if JIT is happening at some point I'm happy, but I'd like pointers to where that happens or at least a solid explanation of what magic is making the new interpreter faster.
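One way to rule out the script block cache when repeating such a measurement is to build the script from text that differs on every run. Here is a minimal sketch, assuming the cache is keyed on the script text (an assumption, not something confirmed in this thread). `Measure-FreshLoop` is a hypothetical helper.

```powershell
# Build the loop from text so every run gets a script block the session
# has not seen before. The trailing comment only makes the text unique;
# that this defeats the cache is an assumption, not verified here.
function Measure-FreshLoop {
    param([int] $Iterations = 30)
    $text = "for (`$i = 0; `$i -lt $Iterations; `$i++) { `$a += 1 }  # $([guid]::NewGuid())"
    $block = [scriptblock]::Create($text)
    (Measure-Command $block).TotalMilliseconds
}

# Five independent samples, each with its own freshly created script block.
$samples = 1..5 | ForEach-Object { Measure-FreshLoop -Iterations 30 }
$samples
```

Comparing these samples with repeated runs of one fixed script block gives a rough idea of how much of the difference comes from caching.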

iSazonov · 2018-09-10T17:32:12Z

I ran the test script in an interactive session by copy-pasting it manually, and used the debugger to confirm that Compile(true)/Compile(false) is called after each paste.
I also tried running the script in a loop and got different results (many times faster); I guess a cache was involved there.
I am also surprised by these results. So far I'm inclined not to trust myself. Maybe I'm doing something wrong.

daxian-dbw · 2018-09-10T18:05:42Z

The measurement is too specific and doesn't reflect real scenarios. Here are the reasons:

1. The test script contains very few statements, so the cost of JIT compiling is low and does not reflect what you would see for a real script with many more statements.
2. When there are too many statements in a script block, it would be too expensive to JIT compile it (causing very slow startup). The tradeoff PowerShell makes is to NeverCompile the script block in that case. With the current CompileOnDemand policy, even though the script block as a whole will never be compiled, the loops in it will still be JIT compiled after running a certain number of times, as long as they don't contain too many statements (> 300). However, with your change, nothing will be JIT compiled in that case, and performance will very likely decrease compared to the current CompileOnDemand.

And BTW, I guess tiered compilation was turned on in your local builds when compared with PS 6.1-RC. That would be another factor that changes the numbers you get from the measurements, even though I don't know how much difference that would make.
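The size-based policy in the second point can be modelled as a single comparison applied separately to the script block and to each loop inside it. This is a hypothetical model: the function name is invented for illustration, and only the 300-statement limit and the NeverCompile/CompileOnDemand names are taken from the comment above.

```powershell
# Hypothetical model of the policy described above; not PowerShell's code.
function Get-CompilePolicy {
    param([int] $StatementCount, [int] $Limit = 300)
    if ($StatementCount -gt $Limit) { 'NeverCompile' } else { 'CompileOnDemand' }
}

# A large script block that contains one small loop: the block as a whole
# is never compiled, but the loop can still be compiled once it gets hot.
$blockPolicy = Get-CompilePolicy -StatementCount 1200
$loopPolicy  = Get-CompilePolicy -StatementCount 12
"block: $blockPolicy, loop: $loopPolicy"
```

With an interpreter that never compiles, both cases collapse to interpretation, which is the regression being pointed out.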

iSazonov · 2018-09-11T13:27:15Z

Yes, tiered compilation was turned on. I have now tested without it and see a ~20% decrease in performance in this scenario.

My main concern was that the interpreter would be much faster in an interactive session, but this fear did not materialize. It seems we could compile even small scripts in an interactive session. We may even benefit, because even a one-line script can be a CPU-expensive loop, and compilation plus tiered compilation seems to bring improvements.

The next test I did was for the parser.

# Read every test script once; -Raw returns each file as a single string.
$text = dir -Recurse -Path .\test\powershell\ -Filter "*.ps1" | Get-Content -Raw

Invoke-Command -ScriptBlock {
    for ($j = 0; $j -lt 100; $j++) {
        $step3 = Measure-Command {
            foreach ($t in $text) {
                $tokens = $null
                $errors = $null
                [Management.Automation.Language.Parser]::ParseInput($t, [ref]$tokens, [ref]$errors) | Out-Null
            }
        }
        # TotalMilliseconds rather than Milliseconds, which is only the 0-999 ms component.
        $step3.TotalMilliseconds
    }
} | Measure-Object -Average

Results for compile: 659 ms for RC1 and 662 ms for the TC build (~0.5% slower).
Results for interpreter: 673 ms for the TC build (~2.0% slower).

(I should note that the variance of the TC build is greater in all tests.)

In this scenario TC does not seem to give any advantage, most likely because the parser is high-quality code and crossgen does a very good job too.

This test also shows that the following slow-start test (@daxian-dbw's point 2) will only show the difference between compilation and interpretation, because parsing takes the same time in both cases.

iSazonov · 2018-09-11T15:36:15Z

Test for small script:

[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)

$step4a = Measure-Command {
    for ($j1 = 0l; $j1 -lt 30000; $j1++) {
        Invoke-Command -ScriptBlock {
            # The branch is never taken, so the statements inside are never executed.
            $no = $false
            if ($no) {
                # 1
                $a += $a + 1
                $b -= $b - 1
                $c *= $c * 1Mb * 1Kb
                $d /= $d / 1Tb
                $e = "a,b,c,d" -split ","
                $f = "a", "b", "c", "d" -join ";"
                $g = New-Guid
                $i = [Math]::Max(1234567890, 12345678901234567890)
                $j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
                $k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
            }
        } # end Invoke-Command
    } # end for
} # end measure-command


[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)

$step4b = Measure-Command {
    for ($j1 = 0l; $j1 -lt 30000; $j1++) {
        Invoke-Command -ScriptBlock {
            # Same script block as above, measured with the hook flipped.
            $no = $false
            if ($no) {
                # 1
                $a += $a + 1
                $b -= $b - 1
                $c *= $c * 1Mb * 1Kb
                $d /= $d / 1Tb
                $e = "a,b,c,d" -split ","
                $f = "a", "b", "c", "d" -join ";"
                $g = New-Guid
                $i = [Math]::Max(1234567890, 12345678901234567890)
                $j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
                $k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
            }
        } # end Invoke-Command
    } # end for
} # end measure-command


$step4a.TotalMilliseconds
$step4b.TotalMilliseconds

Results:
<Updated because of a bug in test script>

| Iterations | RC1 | Crossgened Interpreter | Crossgened Compile |
|---|---|---|---|
| 30 | 11 | 11 | 24 |
| 300 | 32 | 34 | 36 |
| 3000 | 199 | 252 | 190 |
| 30000 | 1863 | 1928 | 1310 |

~~This result shows that even on RC1 we possibly do not need to compile small scripts. Perhaps this is due to a change between .NET Core and .NET Framework.~~

iSazonov · 2018-09-12T15:29:23Z

Another test:

[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)
$step1 = Measure-Command {
    for ($ii = 0l; $ii -lt 300; $ii++) {
        $a += 1
        $a += $a + 1
        $b -= $b - 1
        $c *= $c * 1Mb * 1Kb
        $d = 1; $d /= $d / 1Tb
        $e = "a,b,c,d" -split ","
        $f = "a", "b", "c", "d" -join ";"
        $g = New-Guid
        $i = [Math]::Max(123, 1234)
        $j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
        $k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
    }
}


[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)
$step2 = Measure-Command {
    for ($ii = 0l; $ii -lt 300; $ii++) {
        $a += 1
        $a += $a + 1
        $b -= $b - 1
        $c *= $c * 1Mb * 1Kb
        $d = 1; $d /= $d / 1Tb
        $e = "a,b,c,d" -split ","
        $f = "a", "b", "c", "d" -join ";"
        $g = New-Guid
        $i = [Math]::Max(123, 1234)
        $j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
        $k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
    }
}

$step1.TotalMilliseconds
$step2.TotalMilliseconds

Results:

| Iterations | RC1 | Crossgened Interpreter | Crossgened Compile |
|---|---|---|---|
| 30 | 11 | 23 | 19 |
| 300 | 50 | 78 | 53 |
| 3000 | 386 | 594 | 411 |
| 5000 | 634 | 978 | 670 |
| 10000 | 1256 | 1923 | 1299 |
| 20000 | 2534 | 3684 | 2522 |
| 30000 | 3747 | 5560 | 3819 |

iSazonov · 2018-09-12T15:33:55Z

I updated the previous test results because of a bug in the test script.
Now I see that most likely neither compilation nor the CoreFX interpreter will give an improvement.
On the other hand, TC does give a performance improvement.

iSazonov · 2018-09-13T13:39:57Z

The .NET Core team announced .NET Core 2.2.0 Preview 2 with TC enabled by default.
https://blogs.msdn.microsoft.com/dotnet/2018/09/12/announcing-net-core-2-2-preview-2/

We should definitely continue to investigate the effect of TC on PowerShell Core.

daxian-dbw · 2018-09-13T21:00:12Z

In my naive measurement, there is a 7% startup time improvement with crossgen'ed pwsh + tiered compilation enabled.

iSazonov · 2018-09-14T03:24:20Z

@daxian-dbw The blog article mentions a 20% PowerShell startup improvement. Could you contact the people who ran the test directly? Perhaps they have more PowerShell performance tests and could advise on how to fine-tune TC for PowerShell.

daxian-dbw · 2018-09-18T19:16:00Z

@iSazonov I asked for details of the measurement, and it turned out the measurement was made with PSCore 6.0 code base built with release configuration without crossgen. PowerShell was used as a Fx Dependent application in the measurement and executed by dotnet .\bin\release\netcoreapp2.0\win7-x64\pwsh.dll -command exit.

So, the result of the measurements indicates that with a snapshot of the code base at 6.0 timeframe, the perf improvement in .NET Core 2.1 runtime and tiered compilation combined together offer a 20% startup improvement in the Fx Dependent scenario.

However, we got some degradation in startup time in the 6.1 timeframe, and the rough sources include TaskbarJumpList, Experimental Feature Flag (configuration file access) and more. From my naive measurement, without tiered compilation, 6.1 is about 7% slower than 6.0 in startup time (crossgen'ed), and after turning on tiered compilation, 6.1 is about the same as 6.0. I'm looking into the degradation.
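Startup comparisons like these are easier to trust when the launch is repeated. Below is a rough sketch that times several launches and reports the median, assuming `pwsh` is on PATH. The run count and `-NoProfile` are choices made for this illustration and were not part of the measurements quoted above.

```powershell
# Time several launches of a child process that exits immediately.
# 'pwsh' on PATH and the run count of 7 are assumptions for illustration.
$runs = 1..7 | ForEach-Object {
    (Measure-Command { pwsh -NoProfile -Command exit }).TotalMilliseconds
}

# With an odd number of samples the median is the middle element after sorting.
$sorted = @($runs | Sort-Object)
$median = $sorted[[int][math]::Floor($sorted.Count / 2)]
"median startup: {0:N0} ms" -f $median
```

The median is less sensitive than the average to a single slow launch, such as the first one after a build.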

iSazonov · 2018-09-19T06:06:26Z

@daxian-dbw Thanks! It is interesting!

> TaskbarJumpList

I think we could move this to the install phase (into an MSI custom action).

stale · 2018-10-19T06:52:20Z

This PR has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs within 10 days.
Thank you for your contributions.
Community members are welcome to grab these works.

iSazonov · 2018-10-23T12:03:26Z

Does it make sense to ask the CoreFX team to implement LightCompiler(threshold) (3-tiered compilation)?

lzybkr · 2018-10-23T17:29:07Z

You can certainly ask, though it might be enough to add support for custom instructions in the interpreter.

Today, PowerShell loops are implemented as a custom instruction which switches to the jit compiled version of the loop after sufficient iterations. Our core interpreter works similarly, but PowerShell loops were just special enough that it was a little easier to create a new instruction.

If CoreFX allowed this sort of extension, both loops and entire functions could implement their own tiered compilation strategy.
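The loop instruction described above switches execution paths in the middle of a running loop, rather than between calls. Here is a sketch of that idea; `Invoke-HotSwapLoop`, the threshold and the two script blocks standing in for the interpreted and jit-compiled bodies are all hypothetical and do not reflect the real instruction.

```powershell
# Hypothetical loop driver: runs the body through a slow path until the
# iteration count reaches a threshold, then swaps to a fast path mid-loop.
function Invoke-HotSwapLoop {
    param([int] $Count, [int] $Threshold = 16)   # threshold is illustrative
    $slow = { param($x) $x + 1 }   # stands in for the interpreted body
    $fast = { param($x) $x + 1 }   # stands in for the jit-compiled body
    $body = $slow
    $total = 0
    $swappedAt = -1
    for ($i = 0; $i -lt $Count; $i++) {
        if ($swappedAt -lt 0 -and $i -ge $Threshold) {
            $body = $fast          # one-time switch; later iterations stay fast
            $swappedAt = $i
        }
        $total = & $body $total
    }
    [pscustomobject]@{ Total = $total; SwappedAt = $swappedAt }
}

$result = Invoke-HotSwapLoop -Count 100
$result
```

Both paths must produce the same result for the swap to be safe, which is why the two stand-in bodies are identical here.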

stale · 2018-11-22T18:21:48Z

This PR has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs within 10 days.
Thank you for your contributions.
Community members are welcome to grab these works.

Use new Compile() overload in CompileTree()

d9ee43a

iSazonov requested review from BrucePay and daxian-dbw as code owners September 7, 2018 03:52

iSazonov requested a review from lzybkr September 7, 2018 03:52

iSazonov added 2 commits September 7, 2018 17:59

Enable Tiered Compilation

f520a8d

Use CoreFX interpreter

ad04d74

iSazonov requested review from TravisEz13, adityapatwardhan, anmenaga and dantraMSFT as code owners September 7, 2018 13:01

Add test hook to conditionally compile

a9d60ac

iSazonov changed the title ~~Use new Compile() overload in CompileTree()~~ WIP: Use new Compile() overload in CompileTree() Sep 12, 2018

iSazonov mentioned this pull request Oct 6, 2018

Question: .Net Core Roadmap: 2.2 vs 3.0 #7956

Closed

stale bot added the Stale label Oct 19, 2018

stale bot removed the Stale label Oct 23, 2018

stale bot added the Stale label Nov 22, 2018

iSazonov closed this Nov 29, 2018

3 participants
