Implement `ForEach-Object -Parallel` runspace reuse #12122

PaulHigin · 2020-03-13T18:19:02Z

PR Summary

This PR addresses Issue #11478 by implementing runspace reuse.

PR Context

The current ForEach-Object -Parallel implementation creates a new runspace for each loop iteration to ensure maximum isolation. However, this can be a significant performance and resource hit, especially for loops whose iterations do little work compared with the cost of creating a runspace. And when there are many iterations that each perform significant work, memory usage climbs drastically even though runspace resources are released promptly, because the .NET GC tolerates very high memory usage before it collects reclaimable objects. Manually calling GC.Collect() helps significantly, bringing memory usage way down, but it is onerous to implement in script.

The easiest and cleanest solution is to reuse runspaces in a pool. However, state may still leak between iterations as a side effect, even though runspace.ResetRunspaceState() is used to reset each runspace.

The fix is to add a runspace pool from which runspace objects are drawn and returned.

Also added a -UseNewRunspace switch to the ForEach-Object -Parallel parameter set, which provides the option of creating a new runspace for each iteration, per review feedback from the committee.
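To make the reuse-versus-isolation trade-off concrete, here is a minimal sketch of a reset-on-reuse pool. It is not the PR's implementation: `FakeRunspace`, `FakeRunspacePool`, `ResetState`, `Rent` and `Return` are hypothetical names invented for illustration, and `ResetState` only stands in for what runspace.ResetRunspaceState() is described as doing above.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a runspace; NOT the real System.Management.Automation type.
sealed class FakeRunspace : IDisposable
{
    // Models state that one loop iteration could leave behind.
    public string LeftoverState { get; set; } = "";

    // Stand-in for runspace.ResetRunspaceState(): clears state from the previous iteration.
    public void ResetState() => LeftoverState = "";

    public void Dispose() { }
}

// Hypothetical pool: reuses an idle runspace when allowed, otherwise creates a new one.
sealed class FakeRunspacePool
{
    private readonly ConcurrentQueue<FakeRunspace> _idle = new ConcurrentQueue<FakeRunspace>();
    private readonly bool _useNewRunspace;

    public FakeRunspacePool(bool useNewRunspace) => _useNewRunspace = useNewRunspace;

    public FakeRunspace Rent()
    {
        if (!_useNewRunspace && _idle.TryDequeue(out FakeRunspace runspace))
        {
            runspace.ResetState();   // reset before handing it to the next iteration
            return runspace;
        }

        return new FakeRunspace();
    }

    public void Return(FakeRunspace runspace)
    {
        if (_useNewRunspace)
        {
            runspace.Dispose();      // maximum isolation: never reuse
        }
        else
        {
            _idle.Enqueue(runspace); // keep it for a later iteration
        }
    }
}

static class Program
{
    static void Main()
    {
        var pool = new FakeRunspacePool(useNewRunspace: false);

        FakeRunspace first = pool.Rent();
        first.LeftoverState = "set by iteration 1";
        pool.Return(first);

        FakeRunspace second = pool.Rent();
        Console.WriteLine(ReferenceEquals(first, second)); // True: the same object is reused
        Console.WriteLine(second.LeftoverState == "");     // True: the modeled state was reset
    }
}
```

Constructing the pool with `useNewRunspace: true` models the -UseNewRunspace switch: every `Rent` call then returns a fresh object. The sketch only resets the state it models; the concern raised in this PR is precisely that a real reset may not cover everything.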

PR Checklist

src/System.Management.Automation/engine/hostifaces/PSTask.cs

TravisEz13

Nothing significant

TravisEz13

Nothing significant


iSazonov · 2020-03-16T17:14:21Z

I see a comment in code that ResetRunspaceState() resets a debugger state. Is it ok?

PaulHigin · 2020-03-16T17:44:00Z

@iSazonov I am not aware of any issues. But let me know if there is something specific you are concerned about.

iSazonov · 2020-03-16T17:48:17Z

@PaulHigin I am thinking about a scenario in which a user sets a breakpoint. Will the breakpoint work well if we reset and reuse runspaces?

PaulHigin · 2020-03-16T18:10:52Z

That would be advanced debugging indeed. We currently don't support setting overall foreach parallel session state (although that is something we can consider for the future), so like any other state, breakpoints would need to be set for each loop iteration. In this case we definitely want to reset the runspace breakpoint state; otherwise all instances will be stopped in the debugger.

PaulHigin · 2020-03-16T18:23:23Z

Adding committee review tag, to consider whether this change can be taken.

iSazonov · 2020-03-17T03:16:52Z

I think we can accept and document the debugger behavior (maybe add a warning in code). If a user wants to debug, they can remove the -Parallel parameter and debug.

For reference

PowerShell/src/System.Management.Automation/engine/InitialSessionState.cs

Lines 3295 to 3296 in 320656c

    
    // Reset the event, transaction and debug managers.
    context.ResetManagers();

From the comment in the code we need to review "the event, transaction and debug managers".

SteveL-MSFT · 2020-03-18T22:34:41Z

@PowerShell/powershell-committee reviewed this. We believe that the majority of use cases are not impacted by state leaking between iterations when a runspace is reused, and that the benefits for perf and memory usage outweigh the risks. We should optionally add a -UseNewRunspace type switch, but would not block this PR for that. We would consider changing the default behavior if feedback indicates such a decision is warranted.

iSazonov · 2020-03-19T04:30:58Z

We need to document the new behavior and this "Reset the event, transaction and debug managers".

PaulHigin · 2020-03-19T17:30:41Z

@iSazonov There should be no new behavior. But if there is a breaking change then we can address it at that time.

daxian-dbw · 2020-03-19T23:00:16Z

src/System.Management.Automation/engine/hostifaces/PSTask.cs

+
+            // Dispose all active runspaces
+            DisposeRunspaces();
+            _stopping = false;


Is setting to false still needed after stopping everything?

Probably not. I just included it because it felt complete.

The reason I'm asking is because when looking at the following code in HandleTaskStateChanged, I was wondering maybe this code is possible to run even after StopAll(). It might be better to remove this line, but it's up to you :)

    if (!_stopping)
    {
        // StopAll disposes tasks.
        task.Dispose();
    }

The stop is done synchronously, so I am more worried about tasks that complete before the stop command and might not get disposed. I have done quite a bit of ad hoc testing with abrupt stops, so I am inclined to leave things as-is unless there is a compelling reason to change :). But I really appreciate your input!

daxian-dbw · 2020-03-19T23:00:47Z

src/System.Management.Automation/engine/hostifaces/PSTask.cs

-                    task.Dispose();
-                }
+                tasksToStop = new PSTaskBase[_taskPool.Values.Count];
+                _taskPool.Values.CopyTo(tasksToStop, 0);


I'm curious why this copy is needed. It's a reference copy, right? So both tasksToStop and _taskPool will be pointing to the same set of PSTaskBase objects.

Yes, but the _taskPool dictionary is modified when a task is disposed. Also, the task must be disposed outside the _syncObject lock to avoid a deadlock. To solve these two problems I create a temporary local copy and use that to dispose the final list of runspaces.

Got it. So is there a possibility that a new task gets added to _taskPool after the copy is done? In that case, the new task will not be disposed, but I guess it will be handled elsewhere?

I don't like to use the word 'impossible' with multi-threading, but it should not be possible to add any more tasks since the pool is closed and adding tasks is done on a single thread.
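The copy-then-dispose pattern discussed in this thread can be sketched in isolation as follows. This is an illustrative assumption, not the code in PSTask.cs: `DemoTask` and `DemoTaskPool` are hypothetical types, and only the field names `_taskPool`, `_syncObject` and `tasksToStop` are borrowed from the diff above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical task type: disposing it removes it from the owning pool,
// mirroring the point that the dictionary is modified when a task is disposed.
sealed class DemoTask : IDisposable
{
    private readonly DemoTaskPool _owner;
    public int Id { get; }

    public DemoTask(DemoTaskPool owner, int id)
    {
        _owner = owner;
        Id = id;
    }

    public void Dispose() => _owner.Remove(Id);
}

sealed class DemoTaskPool
{
    private readonly object _syncObject = new object();
    private readonly Dictionary<int, DemoTask> _taskPool = new Dictionary<int, DemoTask>();

    public int Count
    {
        get { lock (_syncObject) { return _taskPool.Count; } }
    }

    public void Add(int id)
    {
        lock (_syncObject) { _taskPool.Add(id, new DemoTask(this, id)); }
    }

    public void Remove(int id)
    {
        lock (_syncObject) { _taskPool.Remove(id); }
    }

    public void StopAll()
    {
        DemoTask[] tasksToStop;
        lock (_syncObject)
        {
            // Reference copy: the array holds the same task objects as the dictionary.
            tasksToStop = new DemoTask[_taskPool.Values.Count];
            _taskPool.Values.CopyTo(tasksToStop, 0);
        }

        // Dispose outside the lock. Each Dispose() mutates _taskPool, which is safe
        // because the loop walks the snapshot rather than the dictionary itself.
        foreach (DemoTask task in tasksToStop)
        {
            task.Dispose();
        }
    }
}

static class Program
{
    static void Main()
    {
        var pool = new DemoTaskPool();
        for (int i = 0; i < 3; i++)
        {
            pool.Add(i);
        }

        pool.StopAll();
        Console.WriteLine(pool.Count); // prints 0
    }
}
```

Iterating `_taskPool.Values` directly while `Dispose()` removes entries would throw an InvalidOperationException, which is the first problem the snapshot avoids. The deadlock concern involves other threads and is not reproduced in this single-threaded sketch.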

daxian-dbw · 2020-03-19T23:04:04Z

src/System.Management.Automation/engine/hostifaces/PSTask.cs

+                    }
+                }
+
+                RemoveActiveRunspace(runspace);


Do we want to try dequeue again in case the runspace is broken?

I feel it is better to replace a broken runspace. The task pool will ensure maximum number of threads/runspaces are used.

When the pipeline objects are not coming in very fast and the parallel script block takes only a short time to finish, there is likely no need to replace the broken runspace, because the rest of the available runspaces can handle all incoming tasks.

When tasks are coming in fast, the broken runspace will be replaced when all available runspaces are used up.

So it's possible that dequeuing again until the queue is empty can have better performance. But I guess the chance of having a broken runspace in the queue is low, so the difference won't be noticeable.

I understand. But this is a tail off scenario. Another strategy is steady state, maintaining maximum pool tasks. But I agree that a broken runspace is unlikely (except for a pathological script which would probably crash the process anyway). At this point I would like to see actual perf scenarios (such as the great work precipitating this change) before making more changes. I have been thinking about pooling threads as well, but there are problems with that and it is not clear what perf improvement would result.
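The two strategies weighed in this thread can be compared with a small sketch. It rests on assumptions: `PooledItem`, `TakeOrReplace` and `TakeFirstHealthy` are hypothetical names, and a plain `Queue<T>` stands in for whatever structure the PR actually uses to hold idle runspaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical runspace stand-in with a health flag.
sealed class PooledItem
{
    public bool IsBroken { get; set; }
}

static class RunspaceSelection
{
    // Strategy A (replace): take one idle item; if it is broken, drop it and create a new one.
    public static PooledItem TakeOrReplace(Queue<PooledItem> idle)
    {
        if (idle.Count > 0)
        {
            PooledItem item = idle.Dequeue();
            if (!item.IsBroken)
            {
                return item;
            }
        }

        return new PooledItem();
    }

    // Strategy B (dequeue again): keep dequeuing until a healthy item is found
    // or the queue is empty, and only then create a new one.
    public static PooledItem TakeFirstHealthy(Queue<PooledItem> idle)
    {
        while (idle.Count > 0)
        {
            PooledItem item = idle.Dequeue();
            if (!item.IsBroken)
            {
                return item;
            }
        }

        return new PooledItem();
    }
}

static class Program
{
    static void Main()
    {
        var broken = new PooledItem { IsBroken = true };
        var healthy = new PooledItem();

        var queueA = new Queue<PooledItem>(new[] { broken, healthy });
        PooledItem a = RunspaceSelection.TakeOrReplace(queueA);
        Console.WriteLine(ReferenceEquals(a, healthy)); // False: a replacement was created
        Console.WriteLine(queueA.Count);                // 1: the healthy item stays idle

        var queueB = new Queue<PooledItem>(new[] { broken, healthy });
        PooledItem b = RunspaceSelection.TakeFirstHealthy(queueB);
        Console.WriteLine(ReferenceEquals(b, healthy)); // True: the idle item was reused
        Console.WriteLine(queueB.Count);                // 0
    }
}
```

Strategy B avoids one allocation when a healthy item sits behind a broken one, which is the tail-off benefit described above. Strategy A keeps the pool at its maximum size. As the thread notes, broken runspaces should be rare enough that the difference is unlikely to be noticeable.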

daxian-dbw

LGTM except for a few comments and questions, nothing blocking though.

ghost · 2020-04-23T18:03:26Z

🎉 v7.1.0-preview.2 has been released, which incorporates this pull request.


PaulHigin added 2 commits March 12, 2020 16:48

Implement foreach parallel runspace reuse

19daa56

Change runspace dispose

0b30889

ghost assigned TravisEz13 Mar 13, 2020

PaulHigin requested review from daxian-dbw and iSazonov March 13, 2020 18:19

TravisEz13 reviewed Mar 13, 2020

src/System.Management.Automation/engine/hostifaces/PSTask.cs (outdated)

TravisEz13 reviewed Mar 13, 2020

src/System.Management.Automation/engine/hostifaces/PSTask.cs

TravisEz13 reviewed Mar 13, 2020

PaulHigin mentioned this pull request Mar 13, 2020

Possible Memory Leak in Foreach -Parallel, version 7.rc1 #11478

Closed

Refactor runspace reset check

0267869

PoshChan reviewed Mar 13, 2020

src/System.Management.Automation/engine/InternalCommands.cs

PoshChan reviewed Mar 13, 2020

src/System.Management.Automation/engine/InternalCommands.cs

iSazonov reviewed Mar 13, 2020

src/System.Management.Automation/engine/hostifaces/PSTask.cs

src/System.Management.Automation/engine/hostifaces/PSTask.cs

src/System.Management.Automation/engine/hostifaces/PSTask.cs

PaulHigin added 2 commits March 13, 2020 15:05

Fix race condition.

87e2050

Fix CodFactor issues

bb14550

PoshChan reviewed Mar 13, 2020

src/System.Management.Automation/engine/InternalCommands.cs

iSazonov reviewed Mar 14, 2020

src/System.Management.Automation/engine/InternalCommands.cs (outdated)

src/System.Management.Automation/engine/hostifaces/PSTask.cs (outdated)

PaulHigin added the Review - Committee label Mar 16, 2020

SteveL-MSFT added the Committee-Reviewed label and removed the Review - Committee label Mar 18, 2020

iSazonov added the Documentation Needed in this repo label Mar 19, 2020

iSazonov added this to the 7.1.0-preview.1 milestone Mar 19, 2020

PaulHigin removed the Documentation Needed in this repo label Mar 19, 2020

Implement -UseNewRunspace parameter switch, add tests

384314a

PaulHigin changed the title ~~[WIP] Implement ForEach-Object -Parallel runspace reuse~~ Implement ForEach-Object -Parallel runspace reuse Mar 19, 2020

PaulHigin requested review from PoshChan, TravisEz13 and iSazonov March 19, 2020 21:56

Fix Codacy error

7083a7c

daxian-dbw reviewed Mar 19, 2020

PaulHigin mentioned this pull request Mar 20, 2020

Add documentation for new ForEach-Object -Parallel '-UseNewRunspace' parameter switch MicrosoftDocs/PowerShell-Docs#5609

Closed

daxian-dbw approved these changes Mar 20, 2020

iSazonov approved these changes Mar 20, 2020

TravisEz13 changed the title ~~Implement ForEach-Object -Parallel runspace reuse~~ Implement `ForEach-Object -Parallel` runspace reuse Mar 23, 2020

TravisEz13 merged commit cce214e into PowerShell:master Mar 23, 2020

PaulHigin deleted the foreach-parallel-rsreuse branch March 23, 2020 21:04

PaulHigin mentioned this pull request Apr 6, 2020

foreach-object Memory usage #12263

Closed

PaulHigin mentioned this pull request Apr 14, 2020

Accessing global functions from inside ForEach-Object -Parallel #12313

Closed

TravisEz13 added the CL-General label Apr 22, 2020
