-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Improve startup time by pushing some work off the main thread to thread pool #14304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "xWebAdministration", | ||
| "xWindowsUpdate", | ||
| }; | ||
| s_knownModules = new Lazy<HashSet<string>>(GetKnownModules); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we could avoid the Lazy allocation with LazyInitializer.EnsureInitialized, like:
private static HashSet<string> KnownModules => LazyInitializer.EnsureInitialized(ref s_knownModules, GetKnownModules);|
Eh, I've always said that MSFT developers shouldn't be given the latest hardware :-) I especially have doubts about GetPolicySetting(). It is slow totally because of newtonsoft.json. It would be more promising for us to migrate to .Net Json and see the result. It will definitely be better. If we see problems we can ask the .Net team, which continues to actively work on it. (Perhaps we need take new look on EarlyStartup.Init().) |
|
@iSazonov Thanks for raising your concerns. I run the benchmark on a 2017-year 4 logical core machine (so not a latest hardware 😄), and I still observe over 10% improvements in the startup.
|
2017? :-) How would you classify my 2011 laptop? :-) Note that it is significantly newer than Windows 7 which still runs PowerShell 7. :-) Seriously, my suggestion is to first make direct corrections, open cases in .Net if necessary (they already shared plans to speed up runtime initialization in 6.0 milestone), and only before next RC to evaluate how much we need to do additional threads. |
|
I agree that leveraging worker thread for parallel processing won't result in much difference in high-load system or low-power/low-rank equipment. However, that's not a reason to not use it, because it's obvious that many more users, who are using the suitable equipments, will benefit from it. Parallel processing is not the answer to everything (Amdahl's law), but it surely is a good tool as long as you use it cautiously -- be careful about the number of threads you spin up and be careful about what work loads to process in parallel, so as to avoid contention/blocking as much as possible. Also, it doesn't contradict with the work you have been doing or the improvement that .NET 6 will bring to us, does it? |
I don't think the OS version matters too much. It's more about how many logical cores the machine has, and of course, how much work load the machine is having at the moment. But can you please run the benchmark on your 2011 machine and share the results? I would like to see if there is any penalty from context switching on a low-power machine. Please make sure you build my branch with |
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1198 (1909/November2018Update/19H2)
Intel Core i5-2410M CPU 2.30GHz (Sandy Bridge), 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=5.0.100
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
It is stable result in some runs. |
|
Thanks for sharing the results!
Nah, this is actually not bad at all, as it shows no penalty (context switching and etc.) on your 2011 laptop. |
I say not about a penalty. From my experience Windows works well on 2 cores (including applications) and I guess we will see the same result for our test on more slow hardware. |
|
Don't think parallelization as a workaround, think it as a tool. There is no good or bad about a tool, but only right way or wrong way to use it. As I said above, the right way is "be careful about the number of threads you spin up and be careful about what work loads to process in parallel, so as to avoid contention/blocking as much as possible." An example as to the work load aspect:
Together, they make
This sounds like a pretty extragerated hypothetical example :) But, if we cannot find the right workload for parallelization, then that just means we should not use that tool. I want to emphsize again, parallelization doesn't contradict with the work that improves individual code paths. If it's handled correctly, they amplify the final result.
Yeah, I agree to not rush in this PR. As I mentioned in our discussion at #14281 (comment), more work and testing is needed to "fine-tune the decisions like how many tasks to use, and what work load shall be done in tasks." And decisions will change as more perf work on our side and .NET side get in. But I believe we cannot wait till the RC timeframe, as changes in this PR (or similar PRs) need sufficient bake time. |
I mean last Preview before RC is good time to check, investigate and fix performance issues. |
|
Today my test show regression up to 20% because ... slow DNS request from ApplicationInsights! I saw this previously. So any network issue delays PowerShell startup due to ApplicationInsights - good argument to move it in a different thread. |
|
@iSazonov Are you sure it's due to ApplicationInsights? Did you try disabling the telemetry to prove that's the cause? As we discussed in #14281 (comment), ApplicationInsight doesn't send telemetry packets synchronously. What you saw looks to me more likely a variant of #10983 -- DNS issue may cause the certificate security check to block. |
I saw DnsResolver in ApplicationInsights constructor. |
|
I'm pretty sure ApplicationInsight sends telemetries in an asynchronous way. We are using the default configuraiton with
When using in-memory channel, the But, interestingly, this also indicates that today for the |
|
Sources https://github.com/microsoft/ApplicationInsights-dotnet :-)
|
|
Open PRs should not be assigned to milestone, so they are not assigned to the wrong milestone after they are merged. For backport consideration, use a |
|
This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment. |
|
This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment. |
PR Summary
Comparing with v7.1.0, the changes in this PR improves the startup of a stable version PowerShell by about 19% on a 8 logical core machine, by about 10% on a 4 logical core machine.
Stable version of PowerShell means: experimental features are disabled in the build by default, just like a GA version.
Major changes:
InitialSessionState.CreateDefault2()to a worker thread.GetConfigurationNameFromGroupPolicy()to a worker thread.The method
Utils.GetPolicySetting<T>(...)is pretty expensive at startup time, including loadingsystem.collections.concurrent.dll,system.io.filesytem.dll,newtonsoft.json.dll, the overhead of usingConcurrentDictionary, and the overhead of reading settings from the configuration file.SendPSCoreStartupTelemetryto a worker thread.This is also an expensive call at startup time, including loading
microsoft.applicationinsights.dll, intializing the telemetry client and actually write out telemetry.By pushing them to worker threads, the assembly loading and type initalization are all taken place on the worker thread, so as to have the main thread move forward until it has to block for any of the tasks to finish.
Implementation-wise,
ConsoleHost.DefaultInitialSessionStateis changed to make its getter able to either return a previously setInitialSessionStatevalue, or waiting for a task result and return the result.CommandLineParameterParser.ConfigurationNameis changed to have its getter wait for the 'GetConfigurationNameFromGroupPolicy' task to finish.CreateRunspace(object runspaceCreationArgs)andRunspaceCreationEventArgs. They add an intermediate layer of method call, which is not needed at all.Startup time comparison
The following results are 3 continuous run of the benchmarck program, which compares v7.1.0, v7.2-wip (this pr), and Windows PS 5.1.
Notes:
Start-PSBuild -Clean -ReleaseTag v7.2.0 -CrossGen -Configuration Releaseto diable all experimental features (-ReleaseTag v7.2.0means building for a stable version).Benchmark Run
Benchmark program
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.