Skip to content

Conversation

@daxian-dbw
Copy link
Member

@daxian-dbw daxian-dbw commented Jan 28, 2020

PR Summary

Fix #11659
Use dedicated threads to read the redirected output and error streams from the child process for out-of-proc jobs.

PR Context

Currently, when Start-Job starts a child PowerShell process, it depends on the Process events OutputDataReceived and ErrorDataReceived to handle messages from the output and error streams.
Internally, the Process implementation uses 2 thread-pool threads to read messages from those 2 streams. Even though the implementation calls stream.ReadAsync on those 2 streams, the actual read operation is synchronous because the underlying Win32 API CreatePipe() only allows the pipe handle to be created in "synchronous" mode.

This results in 2 thread-pool threads being held up indefinitely for each out-of-proc job created. When more out-of-proc jobs are created at the same time, it quickly depletes the thread pool, and causing thread starvation for other tasks that depend on thread-pool threads until the thread pool manager realize it's time to increase the pool.

Thread-pool threads should be used for transient tasks that finish quickly, not for long-running tasks. In this case, reading from pipe in a blocking manner is obviously a long-running tasks and should be done with dedicated threads.

PR Checklist

@ghost ghost assigned iSazonov Jan 28, 2020
@daxian-dbw daxian-dbw requested a review from PaulHigin January 28, 2020 23:11
@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Jan 28, 2020
@iSazonov
Copy link
Collaborator

  • Is ForEach -Parallel affected in a way?
  • In original issue many jobs is created. Make sense to have dedicated thread pool for jobs?

@iSazonov iSazonov added the CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log label Jan 29, 2020
@iSazonov iSazonov added this to the 7.1.0-preview.1 milestone Jan 29, 2020
@daxian-dbw
Copy link
Member Author

Is ForEach -Parallel affected in a way?

It creates its own thread, so it's not affected.

In original issue many jobs is created. Make sense to have dedicated thread pool for jobs?

I don't think that's needed. Thread pool threads should only for short living tasks. When it comes to long-living tasks, even a separate dedicated thread pool will have the same starvation problem.

@PaulHigin
Copy link
Contributor

Dedicated thread pool for thread jobs is not needed because both ThreadJob and ForEach-Object -Parallel have throttling and queuing, which is basically performs the same function as a thread pool.

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Jan 29, 2020
Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daxian-dbw
Copy link
Member Author

@PoshChan Please retry mac

@PoshChan
Copy link
Collaborator

@daxian-dbw, I do not understand the build target(s) mac; I only allow static, windows, macos, linux, all

@daxian-dbw
Copy link
Member Author

@PoshChan Please retry macos

@PoshChan
Copy link
Collaborator

@daxian-dbw, successfully started retry of PowerShell-CI-macOS

@daxian-dbw
Copy link
Member Author

Turns out the failure in Windows build is a real issue that needs to be looked into. I will investigate and report back.

@iSazonov
Copy link
Collaborator

@daxian-dbw Could you please address CI Windows fail?

@PaulHigin
Copy link
Contributor

@daxian-dbw
I think the issue may be in handling expected errors that reflect various ways listener loop can end. You can see example code for the NamedPipe and SSH transports.

@daxian-dbw
Copy link
Member Author

Thanks @PaulHigin for the pointers!
I was busy with some other work with higher priority. Will get back to this soon.

@daxian-dbw daxian-dbw closed this Mar 26, 2020
@daxian-dbw daxian-dbw reopened this Mar 26, 2020
@daxian-dbw
Copy link
Member Author

daxian-dbw commented Mar 27, 2020

All tests passed now 😄
It turns out I forgot to remove the following two lines from KillServerProcess():

_serverProcess.CancelOutputRead();
_serverProcess.CancelErrorRead();

They should be called only if we are actively using the OutputDataReceived and ErrorDataReceived events to asynchronously read the redirected output and error. Otherwise, they will throw exception which got swollen silently by KillServerProcess() (by design), and thus the line _serverProcess.Kill() will never be executed, which causes the child process started by Start-Job lingers around forever.

The issue results in 34 child pwsh processes not cleaned up after the jobs finish, which eat up the memory in the Windows CI environment. That's why we see the stack overflow and out-of-memory error thrown from CoreCLR (Failed to create CoreCLR, HRESULT: 0x8007000E).

@PaulHigin @iSazonov Can you please take another look at the changes? Thank you!

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test to ensure the child process is cleaned up on background job completion? Or whatever the scenario is that invokes KillServerProcess()?

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Mar 27, 2020
@daxian-dbw
Copy link
Member Author

Sure, I will add a test to verify the server process is terminated after the job is gone.

@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Mar 27, 2020
@daxian-dbw
Copy link
Member Author

@PaulHigin, tests are added to verify that the background process is cleaned up after job is gone. Please review it when you have time. Thanks!

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Mar 27, 2020
@daxian-dbw
Copy link
Member Author

@iSazonov I think this is now ready to merge. But please take another look before you do so as there are changes after your last review.

@iSazonov iSazonov added CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log and removed CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log labels Mar 28, 2020
@iSazonov iSazonov merged commit 1ec7c82 into PowerShell:master Mar 28, 2020
@daxian-dbw daxian-dbw deleted the thread branch March 28, 2020 18:10
@ghost
Copy link

ghost commented Apr 23, 2020

🎉v7.1.0-preview.2 has been released which incorporates this pull request.:tada:

Handy links:

@ghost ghost mentioned this pull request Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Powershell jobs surprisingly slow...

4 participants