Skip to content

Conversation

@PaulHigin
Copy link
Contributor

@PaulHigin PaulHigin commented Mar 13, 2020

PR Summary

This PR addresses Issue #11478, by implementing runspace reuse.

PR Context

The current ForEach-Object -Parallel implementation creates a new runspace for each loop iteration to ensure maximum isolation. However, this can be a significant performance and resource hit, especially for loops that do a small amount of work compared to creating runspaces. Or if there are a lot of iterations where each performs significant work, memory usage climbs drastically even though runspace resources are released timely. The dotNet GC allows very large memory usage before collecting available resources. Manually calling GC.Collect() helps significantly, bringing memory usage way down, but is onerous to implement in script.

The easiest and cleanest solution is to reuse runspaces in a pool. However, there may be side effects of state being leaked between iterations, although runspace.ResetRunspaceState() is used.

The fix is to add a runspace pool from which runspace objects are drawn and returned.

Also added -UseNewRunspace switch to ForEach-Object -Parallel parameter set, to allow option to create new runspace per iteration, per review feedback from committee.

PR Checklist

@ghost ghost assigned TravisEz13 Mar 13, 2020
Copy link
Member

@TravisEz13 TravisEz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing significant

Copy link
Member

@TravisEz13 TravisEz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing significant

@iSazonov
Copy link
Collaborator

I see a comment in code that ResetRunspaceState() resets a debugger state. Is it ok?

@PaulHigin
Copy link
Contributor Author

@iSazonov I am not aware of any issues. But let me know if there is something specific you are concerned about.

@iSazonov
Copy link
Collaborator

@PaulHigin I am thinking about a scenario in which an user set a breakpoint - will the breakpoint works well if we reset and reuse runspaces?

@PaulHigin
Copy link
Contributor Author

That would be advanced debugging indeed. We currently don't support setting overall foreach parallel session state (although that is something we can consider for the future), so like any other state, breakpoints would need to be set for each loop iteration. In this case we definitely want to reset the runspace breakpoint state otherwise all instances will be stopped in the debugger.

@PaulHigin
Copy link
Contributor Author

Adding committee review tag, to consider whether this change can be taken.

@PaulHigin PaulHigin added the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Mar 16, 2020
@iSazonov
Copy link
Collaborator

I think we can accept and document the debugger behavior (maybe add warning in code) - if user want debug the user can remove -Parallel parameter and debug.

For reference

// Reset the event, transaction and debug managers.
context.ResetManagers();

From the comment in the code we need to review "the event, transaction and debug managers".

@SteveL-MSFT
Copy link
Member

@PowerShell/powershell-committee reviewed this, we believe that the majority of use cases are not impacted by state leaking by not using a new runspace and the benefits for perf and memory usage outweigh the risks. We should optionally add a -UseNewRunspace type switch, but would not block this PR for that. We would consider changing the default behavior if feedback indicates such a decision is warranted.

@SteveL-MSFT SteveL-MSFT added Committee-Reviewed PS-Committee has reviewed this and made a decision and removed Review - Committee The PR/Issue needs a review from the PowerShell Committee labels Mar 18, 2020
@iSazonov
Copy link
Collaborator

iSazonov commented Mar 19, 2020

We need to document the new behavior and this "Reset the event, transaction and debug managers".

@iSazonov iSazonov added the Documentation Needed in this repo Documentation is needed in this repo label Mar 19, 2020
@iSazonov iSazonov added this to the 7.1.0-preview.1 milestone Mar 19, 2020
@PaulHigin PaulHigin removed the Documentation Needed in this repo Documentation is needed in this repo label Mar 19, 2020
@PaulHigin
Copy link
Contributor Author

@iSazonov There should be no new behavior. But if there is a breaking change then we can address it at that time.

@PaulHigin PaulHigin changed the title [WIP] Implement ForEach-Object -Parallel runspace reuse Implement ForEach-Object -Parallel runspace reuse Mar 19, 2020

// Dispose all active runspaces
DisposeRunspaces();
_stopping = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is setting to false still needed after stopping everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. I just included it because it felt complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm asking is because when looking at the following code in HandleTaskStateChanged, I was wondering maybe this code is possible to run even after StopAll(). It might be better to remove this line, but it's up to you :)

                    if (!_stopping)
                    {
                        // StopAll disposes tasks.
                        task.Dispose();
                    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stop is done synchronously, so I am more worried about tasks that complete before the stop command that might not get disposed. I have done quite a bit of adhoc testing with abrupt stops so I am inclined to will leave things as-is unless there is a compelling reason to change :). But I really appreciate your input!

task.Dispose();
}
tasksToStop = new PSTaskBase[_taskPool.Values.Count];
_taskPool.Values.CopyTo(tasksToStop, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this copy is needed. It's a reference copy, right? So both tasksToStop and _taskPool will pointing to the same set of PSTaskBase objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but _taskPool dictionary is modified when a task is disposed. Also the task must be disposed outside the _syncObject lock to avoid a deadlock. To solve these two problems I create a temporary local copy and use that to dispose the final list of runspaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. So is there a possibility that new task get added to _taskPool after the copy is done? In that case, the new task will not be disposed, but I guess it will be handled elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like to use the word 'impossible' with multi-threading, but it should not be possible to add anymore tasks since the pool is closed and adding tasks is done on a single thread.

}
}

RemoveActiveRunspace(runspace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to try dequeue again in case the runspace is broken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is better to replace a broken runspace. The task pool will ensure maximum number of threads/runspaces are used.

Copy link
Member

@daxian-dbw daxian-dbw Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When the pipeline objects are not coming in very fast and the parallel script block takes a small time to finish, it's likely there is no need to replace the broken runspace -- rest of the available runspaces can handle all incoming tasks.
  • When tasks are coming in fast, the broken runspace will be replaced when all available runspace are used up.

So it's possible that dequeuing again until the queue is empty can have better performance. But I guess the chance of having a broken runspace in the queue is low, so the difference won't be noticeable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. But this is a tail off scenario. Another strategy is steady state, maintaining maximum pool tasks. But I agree that a broken runspace is unlikely (except for a pathological script which would probably crash the process anyway). At this point I would like to see actual perf scenarios (such as the great work precipitating this change) before making more changes. I have been thinking about pooling threads as well, but there are problems with that and it is not clear what perf improvement would result.

Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for a few comments and questions, nothing blocking though.

@TravisEz13 TravisEz13 changed the title Implement ForEach-Object -Parallel runspace reuse Implement ForEach-Object -Parallel runspace reuse Mar 23, 2020
@TravisEz13 TravisEz13 merged commit cce214e into PowerShell:master Mar 23, 2020
@PaulHigin PaulHigin deleted the foreach-parallel-rsreuse branch March 23, 2020 21:04
@TravisEz13 TravisEz13 added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Apr 22, 2020
@ghost
Copy link

ghost commented Apr 23, 2020

🎉v7.1.0-preview.2 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Committee-Reviewed PS-Committee has reviewed this and made a decision

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants