Skip to content

Conversation

@daxian-dbw
Copy link
Member

@daxian-dbw daxian-dbw commented Oct 1, 2018

PR Summary

Fix BeginInvoke()/EndInvoke(IAsyncResult) to return results when Stop() or BeginStop(callback, state)/EndStop(IAsyncResult) was called previously.

Without this fix, EndInvoke(IAsyncResult) returns null when it should return a collection of results:

PS:1> $ps = [powershell]::Create()
PS:2> $ps.AddScript("'Hello'; Sleep 2; 'World'") > $null
PS:3> $a = $ps.BeginInvoke(); $ps.Stop()
PS:4> $a = $ps.BeginInvoke(); $m = $ps.EndInvoke($a)
PS:5> $m -eq $null
True

The root cause is that OutputBuffer is not set to null in Stop and BeginStop/EndStop when the powershell instance owns the OutputBuffer object. Since the pipeline has been intentionally stopped by the caller, the OutputBuffer should be set to null as well when it's owned by the PowerShell instance.

PR Checklist

@daxian-dbw daxian-dbw requested a review from BrucePay October 1, 2018 23:30
@daxian-dbw daxian-dbw self-assigned this Oct 1, 2018
Describe "EndInvoke() should return a collection of results" -Tags "CI" {
BeforeAll {
$ps = [powershell]::Create()
$ps.AddScript("'Hello'; Sleep 1; 'World'") > $null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sleep 1

We should use Wait-UntilTrue and use flags to sync threads.

Copy link

@anmenaga anmenaga Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daxian-dbw I also don't like sleep timeouts-based synchronization in tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for the comments. The Sleep 1 is not really necessary, so I removed it.

Copy link
Collaborator

@rjmholt rjmholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but have left a comment as well

}

// PowerShell no longer owns the output buffer when it is passed back to the caller.
ResetOutputBufferAsNeeded();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have this repeated call in a finally block? Or is there a scenario where an error occurs and we don't want to do it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question and I'm not really sure if we should put it in a finally block. Don't know about other exceptions, but if the NewArgumentException is thrown from 4 lines above, maybe we don't want to reset the output? I'm inclined to keep it as is given the original code is not in a finally block and we haven't yet heard any issues regarding that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okie doke -- thought I'd comment since we call it at the end of 3/4 branches of the code.

}

EndInvokeAsyncResult = psAsyncResult;
psAsyncResult.EndInvoke();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario seems to be completely broken, however I don't think this is the right solution. My understanding is that the pattern is that EndInvoke is always called after BeginInvoke, and the output buffer is not released until EndInvoke is called. However, this line will throw a "pipeline stopped" exception, and the output buffer is never released.

psAsyncResult.EndInvoke();              // Stop throws System.Management.Automation.PipelineStoppedException

I think we should catch the pipeline stopped exception and ensure EndInvoke() always releases the output buffer.

try
{
    psAsyncResult.EndInvoke();
}
catch (System.Management.Automation.PipelineStoppedException) { }
finally
{
    if (OutputBuffer) { ... }
}

...
return psAsyncResult.Output

Then the pattern is:

$ps = [powershell]::Create()
$ps.AddScript('1..10 | % { sleep 1; "Output $_" }')
$async = $ps.BeginInvoke()
sleep 3
$ps.Stop()
$output = $ps.EndInvoke($async)
$async = $ps.BeginInvoke()
...

Copy link
Member Author

@daxian-dbw daxian-dbw Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to @PaulHigin offline and the concern is more like "if an terminating exception is thrown from $ps.EndInvoke, there should be a way to get the partial output". However, I found that's already the case now:

$ps.Commands.Clear()
$ps.AddScript("'Hello World'; throw blah") > $null
$a=$ps.BeginInvoke()
$b = $ps.EndInvoke($a)
$b
> Hello World
$ps.AddScript("'Hello World'; [System.IO.FileInfo]::new(`$null)") > $null
$a = $ps.BeginInvoke()
$b = $ps.EndInvoke($a)
$b
> Hello World

As for the PipelineStoppedException thrown from EndInvoke due to $ps.Stop() being called previously, the cleanup is not needed as it's already done in Stop(), and we have to do the cleanup in Stop because with the current behavior, a developer will never call $ps.EndInvoke after calling $ps.Stop because that will raise an exception.

If we really want the user be able to get the partial results when $ps.Stop() is called, then the partial results probably should be returned somehow from the Stop method. The discussion of this would be a separate issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want the user be able to get the partial results when $ps.Stop() is called,

I think it could be in EndInvoke() as option (and perhaps timeout option too).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daxian-dbw You have an error in your example script.

PS C:\> $ps = [powershell]::Create()
PS C:\> $ps.AddScript('"Hello!"; throw "blah"') > $null
PS C:\> $a = $ps.BeginInvoke()
PS C:\> $b = $ps.EndInvoke($a)
Exception calling "EndInvoke" with "1" argument(s): "blah"
At line:1 char:1
+ $b = $ps.EndInvoke($a)
+ ~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : RuntimeException

PS C:\> $b
PS C:\>

I feel this is fundamentally broken because if a terminating error occurs (including stop) then any data generated up to that point is lost ... a major sin in my book.

I feel that updating PowerShell.Stop() to throw away the output buffer only compounds this problem, while breaking the existing contract of BeginInvoke/EndInvoke pattern.

Some ideas to remedy:

  1. Catch pipeline stop exception and insert into Error stream as non-terminating error. But this doesn't address a general terminating exception.
  2. Catch all terminating exceptions (in EndInvoke) and insert into Error stream, marking the error as terminating in some way.
  3. Create a new "EndInvoke" exception class that is thrown, which wraps the actual terminating error and also provides the output buffer as a parameter.

Copy link
Member Author

@daxian-dbw daxian-dbw Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have an error in your example script.

Good catch. Yes, I see the problem with throw now. It's weird though [System.IO.FileInfo]::new($null) works as you expected in my second example, which makes it so inconsistent. Please feel free to open an issue for your concern on this behavior (along with the ideas for the potential fix).

I feel that updating PowerShell.Stop() to throw away the output buffer only compounds this problem.

Base on the code below

OutputBuffer = new PSDataCollection<PSObject>();
OutputBufferOwner = true;
if (_isBatching || ExtraCommands.Count != 0)
{
return BeginBatchInvoke<T, PSObject>(input, OutputBuffer, settings, callback, state);
}
return CoreInvokeAsync<T, PSObject>(input, OutputBuffer, settings, callback, state, OutputBuffer);

it looks to me setting OutputBuffer = null does not throw away the output buffer. The PSDataCollection object referenced by OutputBuffer gets passed to CoreInvokeAsync via the parameters output and asyncResultOutput, then output is used by the Worker as the output stream and asyncResultOutput is assigned to PowerShellAsyncResult.Output, which will be used to return the results from EndInvoke.

So, even though OutputBuffer is set to null in Stop() and EndStop, it won't affect the output stream used by the worker, and theoretically the partial results are still available in psAsyncResult.Output when getting to EndInvoke.

while breaking the existing contract of BeginInvoke/EndInvoke pattern.

Having a pair of BeginXXX and EndXXX is the pattern only when you want the execution to complete, either to success or failure.

However, when Stop is called after BeginInvok, I don't think it's a must-have for the user to call EndInvoke(), because

  1. that's not something the user can do today and we won't be able to change their existing habit;
  2. there is a good chance that the user just doesn't care about the execution anymore after calling Stop;
  3. it would be a breaking change if we suddenly tell the user you have to call EndInvok again after Stop to clean up some state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that calling PowerShell.Stop() should not lose data, and the change you suggest does this (let me know if I am incorrect here). I admit that EndInvoke() is also broken and loses data but I feel we should fix this rather than compound the problem. I also respectfully disagree that Stop() should work the same as EndInvoke(). My reading of the code is that that was not the intention. Perhaps we should submit this to the committee to get their view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that calling PowerShell.Stop() should not lose data, and the change you suggest does this (let me know if I am incorrect here)

Offline @daxian-dbw Dongbo showed me that this is in fact incorrect (I didn't look at the code closely enough).
The issue of EndInvoke() losing data can be treated separately and Dongbo will create an issue to track it. I have no objection to this change.

Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#8005 is opened to track @PaulHigin's concern on the behavior of PowerShell.EndInvoke.

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But please create EndInvoke() losing data issue to track separately.

@daxian-dbw daxian-dbw merged commit 9b94f18 into PowerShell:master Oct 12, 2018
@daxian-dbw daxian-dbw deleted the endinvoke branch October 12, 2018 17:47
adityapatwardhan pushed a commit to adityapatwardhan/PowerShell that referenced this pull request Apr 9, 2019
…dStop was called previously (PowerShell#7917)

The root cause is that `OutputBuffer` is not cleaned up (set to `null`) in `Stop` and `BeginStop/EndStop` when the powershell instance owns the `OutputBuffer` object. Since the pipeline has been intentionally stopped by the caller, the `OutputBuffer` should be set to `null` as well when it's owned by the PowerShell instance, just like how it's cleaned up in `EndInvoke`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants