WIP: Use new Compile() overload in CompileTree() #7731
Conversation
|
I'm not sure this is a good idea - why would we want to use 2 different interpreters? |
|
I don't think it makes sense to port (update) the code from CoreFX here; it's better to migrate to the CoreFX interpreter. I see that it doesn't support the Compile() overload with a threshold. So the first step is to use From https://github.com/dotnet/coreclr/issues/4331 I see that we can get benefits for crossgened code too. |
|
I see #7729 - crossgen doesn't work with framework-dependent deployment, so tiered compilation will come in handy. |
|
I checked with PerfView that TC works. It seems the full set of tests runs on CI in about the same time. |
|
Why do you think it's better to migrate? Do you have data that shows it's faster? It does not jit compile, so loops will be ~50X slower. |
|
@iSazonov The interpreter in PowerShell was updated to act like the tiered compilation. For a script block or a loop in it that contains less than 300 statements, the script block and the loop will initially be evaluated in the interpreted way (fast startup), and after running for a certain number of times, they will be compiled and executed in the jitted native code (better stable performance). The JIT tiered compilation cannot replace this optimization. The tiered compilation will optimize the You can see it as a 3-tiered compilation. |
|
I added a commit with a test hook to switch between the interpreter and the compiler.
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)
$step1 = measure-command {
for ($i = 0; $i -lt 30; $i++) {
$a+=1
}
}
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)
$step2 = measure-command {
for ($i = 0; $i -lt 30; $i++) {
$a+=1
}
}
$step1.TotalMilliseconds
$step2.TotalMilliseconds
Results (copy-paste the script into the console):
I think it's worth it to study further. /cc @powercode maybe you will be interested. |
|
Are you certain your experiment is valid? It's possible you only execute one code path because of caching. |
|
@lzybkr What cache do you mean? I added a test hook to explicitly switch from interpreter (CoreFX) to compiler. |
|
PowerShell caches script block definitions to avoid recompiling, so I was just asking that you confirm your experiment hits the code paths you expect. I measured a 50X slowdown when switching to the CoreFx interpreter and this did not surprise me because their interpreter no longer supports JIT. Your results do surprise me - if JIT is happening at some point I'm happy, but I'd like pointers to where that happens or at least a solid explanation of what magic is making the new interpreter faster. |
|
I ran the test script in an interactive session by manually copy-pasting it, and used the debugger to confirm that Compile(true)/Compile(false) is called after each copy-paste. |
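For readers unfamiliar with the overload being toggled here: `LambdaExpression.Compile(bool preferInterpretation)` lives in System.Linq.Expressions. Below is a minimal sketch of the two calls, assuming PowerShell Core on .NET Core 2.0 or later. The `x + 1` lambda is purely illustrative and is not what CompileTree() actually builds.

```powershell
# Build the expression tree for: x => x + 1 (illustrative only)
$x          = [System.Linq.Expressions.Expression]::Parameter([int], 'x')
$one        = [System.Linq.Expressions.Expression]::Constant(1)
$body       = [System.Linq.Expressions.Expression]::Add($x, $one)
$parameters = [System.Linq.Expressions.ParameterExpression[]]@($x)
$lambda     = [System.Linq.Expressions.Expression]::Lambda([Func[int,int]], $body, $parameters)

# Compile($true) asks for the interpreted form, if the runtime provides one
$interpreted = $lambda.Compile($true)
# Compile($false) emits IL, which the runtime then JIT-compiles on first call
$compiled    = $lambda.Compile($false)

# Both delegates compute the same result; only the execution strategy differs
$interpreted.Invoke(41)
$compiled.Invoke(41)
```

Both delegates have the same type, so a caller can swap one for the other without changing call sites.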
|
The measurement is too specific and doesn't reflect real scenarios, here are the reasons:
And BTW, I guess tiered compilation was turned on in your local builds when compared with PS 6.1-RC. That would be another factor that changes the numbers you get from the measurements, even though I don't know how much difference that would make. |
|
Yes, tiered compilation was turned on. Now I tested without it and see a ~20% decrease in performance in this scenario. My main concern was that the interpreter would be much faster in an interactive session, but this fear did not materialize. It seems we could compile even small scripts in an interactive session. We can even get some benefit, because even a one-line script can be a CPU-expensive loop, and compilation plus tiered compilation seem to bring improvements. The next test I did was for the parser.
$text = ""
foreach ($file in dir -Recurse -Path .\test\powershell\ -Filter "*.ps1") {
$text = dir -Recurse -Path C:\Users\sie\Documents\GitHub\iSazonov\PowerShell\test\powershell\ -Filter "*.ps1" | Get-Content -Raw
}
Invoke-Command -ScriptBlock { for ($j = 0; $j -lt 100; $j++) {
$step3 = measure-command {
foreach ($t in $text) {
$tokens = $null
$errors = $null
[Management.Automation.Language.Parser]::ParseInput($t, [ref]$tokens, [ref]$errors) | Out-Null
}
}
$step3.TotalMilliseconds
}} | Measure-Object -Average
Results for compile (in TC build): 659 ms for RC1 and 662 ms for the TC build, ~0.5% slower. (I should note that the variance of the TC build is greater in all tests.) In this scenario TC does not seem to give any advantage, but more likely the parser is high-quality code and crossgen is very good too. This test also shows that the following slow-start test (@daxian-dbw's point 2) will only show the difference between compilation and interpretation, because parsing will take the same time. |
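Since the variance differs between builds, it may help to report the spread alongside the average. Here is a minimal sketch, assuming PowerShell Core 6.1 or later for `Measure-Object -StandardDeviation`. The `Measure-Timing` helper and the sample workload are hypothetical and not part of this PR.

```powershell
# Hypothetical helper: run a script block several times and summarize the timings
function Measure-Timing {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Iterations = 20
    )
    # Collect one TotalMilliseconds sample per run
    $samples = for ($n = 0; $n -lt $Iterations; $n++) {
        (Measure-Command $ScriptBlock).TotalMilliseconds
    }
    # Report the average together with the spread
    $samples | Measure-Object -Average -StandardDeviation -Minimum -Maximum
}

# Placeholder workload; substitute the parser loop or any other scenario
$stats = Measure-Timing -ScriptBlock { 1..1000 | ForEach-Object { $_ * 2 } } -Iterations 10
$stats.Average
$stats.StandardDeviation
```

Comparing the standard deviation of two builds shows whether a sub-1% difference in the averages is within the noise.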
|
Test for small script:
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)
$step4a = measure-command {
for ($j1 = 0l; $j1 -lt 30000; $j1++) {
Invoke-Command -ScriptBlock {
$no = $false
if ($no) {
# 1
$a += $a + 1
$b -= $b - 1
$c *= $c * 1Mb * 1Kb
$d /= $d / 1Tb
$e = "a,b,c,d" -split ","
$f = "a", "b", "c", "d" -join ";"
$g = New-Guid
$i = [Math]::Max(1234567890, 12345678901234567890)
$j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
$k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
}
} # end Invoke-Command
} # end for
} # end measure-command
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)
$step4b = measure-command {
for ($j1 = 0l; $j1 -lt 30000; $j1++) {
Invoke-Command -ScriptBlock {
$no = $false
if ($no) {
# 1
$a += $a + 1
$b -= $b - 1
$c *= $c * 1Mb * 1Kb
$d /= $d / 1Tb
$e = "a,b,c,d" -split ","
$f = "a", "b", "c", "d" -join ";"
$g = New-Guid
$i = [Math]::Max(1234567890, 12345678901234567890)
$j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
$k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object
}
} # end Invoke-Command
} # end for
} # end measure-command
$step4a.TotalMilliseconds
$step4b.TotalMilliseconds
Results:
|
|
Another test:
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $false)
$step1 = measure-command {
for ($ii = 0l; $ii -lt 300; $ii++) {
$a+=1
$a += $a + 1
$b -= $b - 1
$c *= $c * 1Mb * 1Kb
$d = 1;$d /= $d / 1Tb
$e = "a,b,c,d" -split ","
$f = "a", "b", "c", "d" -join ";"
$g = New-Guid
$i = [Math]::Max(123, 1234)
$j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
$k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object}
}
[System.Management.Automation.Internal.InternalTestHooks]::SetTestHook("ExpressionCompile", $true)
$step2 = measure-command {
for ($ii = 0l; $ii -lt 300; $ii++) {
$a+=1
$a += $a + 1
$b -= $b - 1
$c *= $c * 1Mb * 1Kb
$d = 1;$d /= $d / 1Tb
$e = "a,b,c,d" -split ","
$f = "a", "b", "c", "d" -join ";"
$g = New-Guid
$i = [Math]::Max(123, 1234)
$j = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)[9]
$k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 | Measure-Object}
}
$step1.TotalMilliseconds
$step2.TotalMilliseconds
Results:
|
|
I updated the previous test results because of a bug in the test script. |
|
The .NET Core team announced .NET Core 2.2.0 Preview 2 with TC enabled by default. We should definitely continue to investigate the effect of TC on PowerShell Core. |
|
In my native measurement, there is a 7% startup time improvement with crossgen'ed pwsh + tiered compilation enabled. |
|
@daxian-dbw The blog article mentioned a 20% PowerShell startup improvement. Could you directly contact the people who did the test? Perhaps they have more PowerShell performance tests and could give advice on how to fine-tune TC for PowerShell. |
|
@iSazonov I asked for details of the measurement, and it turned out the measurement was made with PSCore 6.0 code base built with So, the result of the measurements indicates that with a snapshot of the code base at 6.0 timeframe, the perf improvement in .NET Core 2.1 runtime and tiered compilation combined together offer a 20% startup improvement in the Fx Dependent scenario. However, we got some degradation in startup time in the 6.1 timeframe, and the rough sources include |
|
@daxian-dbw Thanks! It is interesting!
I think we could move this to install phase (to msi custom action). |
|
This PR has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs within 10 days. |
|
Make sense ask CoreFX team to implement |
|
You can certainly ask, though it might be enough to add support for custom instructions in the interpreter. Today, PowerShell loops are implemented as a custom instruction which switches to the jit compiled version of the loop after sufficient iterations. Our core interpreter works similarly, but PowerShell loops were just special enough that it was a little easier to create a new instruction. If CoreFX allowed this sort of extension, both loops and entire functions could implement their own tiered compilation strategy. |
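The switch-after-enough-iterations strategy described above can be sketched in a few lines. This is an illustration only, not PowerShell's actual loop instruction. The `TieredFunc` class, its member names, and the threshold value are all hypothetical, and it assumes PowerShell Core, where `Compile(bool)` is available.

```powershell
# Hypothetical sketch: start interpreted, promote to compiled code after a call threshold
class TieredFunc {
    [System.Linq.Expressions.LambdaExpression] $Lambda
    [Func[int,int]] $Target          # delegate currently used for calls
    [int] $Calls = 0
    [int] $Threshold
    [bool] $Promoted = $false

    TieredFunc([System.Linq.Expressions.LambdaExpression] $lambda, [int] $threshold) {
        $this.Lambda    = $lambda
        $this.Threshold = $threshold
        # Tier 0: interpreted form, cheap to create
        $this.Target    = $lambda.Compile($true)
    }

    [int] Run([int] $arg) {
        $this.Calls++
        if (-not $this.Promoted -and $this.Calls -gt $this.Threshold) {
            # Tier 1: compile to IL once the code has proven to be hot
            $this.Target   = $this.Lambda.Compile($false)
            $this.Promoted = $true
        }
        return $this.Target.Invoke($arg)
    }
}

# Illustrative lambda: x => x + 1
$x          = [System.Linq.Expressions.Expression]::Parameter([int], 'x')
$body       = [System.Linq.Expressions.Expression]::Add($x, [System.Linq.Expressions.Expression]::Constant(1))
$parameters = [System.Linq.Expressions.ParameterExpression[]]@($x)
$lambda     = [System.Linq.Expressions.Expression]::Lambda([Func[int,int]], $body, $parameters)

$f = [TieredFunc]::new($lambda, 3)   # threshold of 3 is arbitrary
$results = for ($n = 1; $n -le 5; $n++) { $f.Run($n) }
$results
$f.Promoted
```

The first three calls use the interpreted delegate and the fourth triggers promotion, so short-lived code never pays the compilation cost.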
|
This PR has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs within 10 days. |
PR Summary
Related #2230.
PR Checklist
.h, .cpp, .cs, .ps1 and .psm1 files have the correct copyright header
Add WIP: to the beginning of the title and remove the prefix when the PR is ready.
Add [feature] if the change is significant or affects feature tests