-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Implement ForEach-Object -Parallel runspace reuse
#12122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ForEach-Object -Parallel runspace reuse
#12122
Conversation
TravisEz13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing significant
TravisEz13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing significant
|
I see a comment in code that ResetRunspaceState() resets a debugger state. Is it ok? |
|
@iSazonov I am not aware of any issues. But let me know if there is something specific you are concerned about. |
|
@PaulHigin I am thinking about a scenario in which an user set a breakpoint - will the breakpoint works well if we reset and reuse runspaces? |
|
That would be advanced debugging indeed. We currently don't support setting overall foreach parallel session state (although that is something we can consider for the future), so like any other state, breakpoints would need to be set for each loop iteration. In this case we definitely want to reset the runspace breakpoint state otherwise all instances will be stopped in the debugger. |
|
Adding committee review tag, to consider whether this change can be taken. |
|
I think we can accept and document the debugger behavior (maybe add warning in code) - if user want debug the user can remove For reference PowerShell/src/System.Management.Automation/engine/InitialSessionState.cs Lines 3295 to 3296 in 320656c
From the comment in the code we need to review "the event, transaction and debug managers". |
|
@PowerShell/powershell-committee reviewed this, we believe that the majority of use cases are not impacted by state leaking by not using a new runspace and the benefits for perf and memory usage outweigh the risks. We should optionally add a |
|
We need to document the new behavior and this "Reset the event, transaction and debug managers". |
|
@iSazonov There should be no new behavior. But if there is a breaking change then we can address it at that time. |
|
|
||
| // Dispose all active runspaces | ||
| DisposeRunspaces(); | ||
| _stopping = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is setting to false still needed after stopping everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. I just included it because it felt complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I'm asking is because when looking at the following code in HandleTaskStateChanged, I was wondering maybe this code is possible to run even after StopAll(). It might be better to remove this line, but it's up to you :)
if (!_stopping)
{
// StopAll disposes tasks.
task.Dispose();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stop is done synchronously, so I am more worried about tasks that complete before the stop command that might not get disposed. I have done quite a bit of adhoc testing with abrupt stops so I am inclined to will leave things as-is unless there is a compelling reason to change :). But I really appreciate your input!
| task.Dispose(); | ||
| } | ||
| tasksToStop = new PSTaskBase[_taskPool.Values.Count]; | ||
| _taskPool.Values.CopyTo(tasksToStop, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this copy is needed. It's a reference copy, right? So both tasksToStop and _taskPool will pointing to the same set of PSTaskBase objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but _taskPool dictionary is modified when a task is disposed. Also the task must be disposed outside the _syncObject lock to avoid a deadlock. To solve these two problems I create a temporary local copy and use that to dispose the final list of runspaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. So is there a possibility that new task get added to _taskPool after the copy is done? In that case, the new task will not be disposed, but I guess it will be handled elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like to use the word 'impossible' with multi-threading, but it should not be possible to add anymore tasks since the pool is closed and adding tasks is done on a single thread.
| } | ||
| } | ||
|
|
||
| RemoveActiveRunspace(runspace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to try dequeue again in case the runspace is broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it is better to replace a broken runspace. The task pool will ensure maximum number of threads/runspaces are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When the pipeline objects are not coming in very fast and the parallel script block takes a small time to finish, it's likely there is no need to replace the broken runspace -- rest of the available runspaces can handle all incoming tasks.
- When tasks are coming in fast, the broken runspace will be replaced when all available runspace are used up.
So it's possible that dequeuing again until the queue is empty can have better performance. But I guess the chance of having a broken runspace in the queue is low, so the difference won't be noticeable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. But this is a tail off scenario. Another strategy is steady state, maintaining maximum pool tasks. But I agree that a broken runspace is unlikely (except for a pathological script which would probably crash the process anyway). At this point I would like to see actual perf scenarios (such as the great work precipitating this change) before making more changes. I have been thinking about pooling threads as well, but there are problems with that and it is not clear what perf improvement would result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for a few comments and questions, nothing blocking though.
ForEach-Object -Parallel runspace reuse
|
🎉 Handy links: |
PR Summary
This PR addresses Issue #11478, by implementing runspace reuse.
PR Context
The current ForEach-Object -Parallel implementation creates a new runspace for each loop iteration to ensure maximum isolation. However, this can be a significant performance and resource hit, especially for loops that do a small amount of work compared to creating runspaces. Or if there are a lot of iterations where each performs significant work, memory usage climbs drastically even though runspace resources are released timely. The dotNet GC allows very large memory usage before collecting available resources. Manually calling GC.Collect() helps significantly, bringing memory usage way down, but is onerous to implement in script.
The easiest and cleanest solution is to reuse runspaces in a pool. However, there may be side effects of state being leaked between iterations, although runspace.ResetRunspaceState() is used.
The fix is to add a runspace pool from which runspace objects are drawn and returned.
Also added -UseNewRunspace switch to ForEach-Object -Parallel parameter set, to allow option to create new runspace per iteration, per review feedback from committee.
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.