-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Fix web cmdlets hang after network connectivity lost #18005
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| if (previousLength != output.Length) | ||
| { | ||
| // Reset cancelation timer only if we get an information from network on the interation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a dotNet bug, where cancelation timer is ineffective during stream copy task. Does it make sense to submit an issue report about this?
Also, the timeout time is now only for when there is no network connectivity. User expectation is that the timeout time is for the entire process including both connected and non-connected times. We would need to update the documentation to make clear what the -TimeOutSec parameter does.
| // Reset cancelation timer only if we get an information from network on the interation. | |
| // Reset cancelation timer while information continues to flow from network. | |
| // Cancelation timer applies only during no network connectivity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a dotNet bug, where cancelation timer is ineffective during stream copy task. Does it make sense to submit an issue report about this?
This works for whole CopyToAsync but it is not that we want since full download can take unpredictable time. We need only monitor every internal request to target service for hang - the CopyToAsync has internal cycle where ReadAsync() is used. It is just the ReadAsync hangs and we need reset the timer for every iteration in the cycle.
That you assumed it was a bug in the .Net Runtime is an interesting psychological experiment. :-) That was my assumption too, and I opened a discussion there a few days ago. Only the .Net team offers workarounds and doesn't agree that HttpClient.Timeout should apply to operations on streams it creates itself, even though it works with all its other operations.
You could share your expectations there too if you want.
We would need to update the documentation to make clear what the -TimeOutSec parameter does.
Both .Net and PowerShell docs should be updated for the Timeout parameter - that it applies to every request to target service and protects from too long (infinite) network operations.
…Cmdlet/StreamHelper.cs Co-authored-by: Paul Higinbotham <paulhi@microsoft.com>
PaulHigin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM
Please create a document bug to clarify that the timeout parameter applies to no-network-connectivity rather than total operation time.
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs
Outdated
Show resolved
Hide resolved
…Cmdlet/StreamHelper.cs Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
…Cmdlet/StreamHelper.cs Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
…Cmdlet/StreamHelper.cs Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
…Cmdlet/StreamHelper.cs Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
|
|
||
| if (previousLength != output.Length) | ||
| { | ||
| // Reset cancelation timer while information continues to flow from network. | ||
| // Cancelation timer applies only during no network connectivity. | ||
| previousLength = output.Length; | ||
| cts.CancelAfter(timeout); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this is not needed.
We should timeout just in the timespan specified by -Timeout, but what you are doing here is to refresh the timer after every 1 second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case you say the Timeout must be longer than the expected time to fully load the file. This immediately indicates a lot of problems. For example, the time to fully load a file is unpredictable even if there are no problems, because this time is determined by many factors (client, server, network). In general, even the size of the file is unknown. Thus, neither the script developer nor the user can set a reasonable timeout value - it will be very long for the script to accurately execute. But such a protective timeout is already possible in Task Scheduler or CIs - it can be an hour or two.
If you look from HttpClient side then before uploading it performs sending request and receiving response, these two operations are performed with the same timeout, but these are instantaneous operations - here you need timeout of a minute or two, but not an hour or two.
If you look at it from the service side, any service closes the session when it loses contact with the client after a short time. With Kestrel, for example, this is 130 seconds by default. Usually it's faster (about a minute). This means that it doesn't make sense for the client to wait an hour or two. This also makes it clear that it makes sense to specify a timeout just for each request and response operation. That's exactly what this code does. We can explicitly specify in the documentation that if network connectivity is lost, cmdlet will crash after a given timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the doc for -TimeoutSec:
Specifies how long the request can be pending before it times out. Enter a value in seconds. The default value, 0, specifies an indefinite time-out.
To an end user, it doesn't matter it's because of sending/receiving http request or downloading a file, when specifying -TimtoutSec, what the user wants is for Invoke-WebRequest/RestMethod to timeout if the operation cannot finish in X seconds. So, we need to make sure that's the case (maybe not accurately X seconds, but closely around that time).
I prefer to use the existing -TimeoutSec. However, even if we have a separate parameter just for downloading timeout, it's semantics should be "timeout if the downloading doesn't finish in X seconds". An end user doesn't need to know it didn't finish because of bad network speed, or because the network disconnected and then reconnected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Timeout did never work for download scenario in all PowerShell versions.. So we can not say "the user want" for the download operation. In IO operations timeout is always protection from pending (which is just in the documentation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JamesWTruher and @SeeminglyScience, can you please share your thoughts on this (summary captured in #18005 (comment))?
Mark the PR to be reviewed by cmdlet WG.
| cts.CancelAfter(timeout); | ||
| Task copyTask = input.CopyToAsync(output, cts.Token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cts.CancelAfter(timeout); | |
| Task copyTask = input.CopyToAsync(output, cts.Token); | |
| Task copyTask = input.CopyToAsync(output, cts.Token); | |
| cts.CancelAfter(timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think it is important?
I follow HttpClient implementation. It activates timeout before starting an operation.
https://source.dot.net/#System.Net.Http/System/Net/Http/HttpClient.cs,185
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that important, and you can choose to not accept this suggestion.
But in the HttpClient.cs implementation, it's using await and thus you cannot start the timer after calling await base.SendAsync. So, they have to start the timer before calling SendAsync there.
| } | ||
| } | ||
| catch (OperationCanceledException) | ||
| catch (OperationCanceledException exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this exception will be thrown when the cancellation happens?
In my script test, I get an AggregateException that wraps a TaskCanceledException instead.
$s = [System.Threading.CancellationTokenSource]::new()
try {
$t = [System.Threading.Tasks.Task]::Delay(100000, $s.Token); ## delay for 100 seconds
$s.CancelAfter(2000); ## cancel after 2 seconds
$t.Wait()
} catch {
$e = $_
}
PS:32> $e.Exception.InnerException.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True AggregateException System.Exception
PS:33> $e.Exception.InnerException.InnerExceptions[0].GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True TaskCanceledException System.OperationCanceledException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we could fall in race condition if timeout was raised while we are in the loop body. So I replaced Wait with another overload which checks cancelation token first. https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs,2670
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any problem with timeout being raised while in the loop body. Wait(millisecondsTimeout, cancellationToken) will call cancellationToken.ThrowIfCancellationRequested within InternalWait, after a quick SpinWait.
| catch (OperationCanceledException) | ||
| catch (OperationCanceledException exc) | ||
| { | ||
| // We cannot distinct 'cancel` (user press Ctrl-C) and 'timeout' so we report the exception as-is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we can differentiate them. Just check if cancellationToken is cancelled. If Ctrl+c is pressed, that token will be cancelled, otherwise, it's the timeout. So, we should write out error only if the cancellation is caused by the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, we can (cancellationToken vs cts). Fixed. Also renamed cts to timeoutCTS.
|
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
|
For "WG-Cmdlets-Utility" group, please review the use of the existing parameter What this PR does is to have a timeout for the interval between receiving any bytes during a downloading operation. More concretely, it's -- if no byte is received during X seconds (value of the timeout), then stop the downloading operation. From the implementation perspective, today, However, from the user's perspective, it's likely a user is treating it as " The WG should review if it's OK for this PR to use the same |
|
Added to WG agenda |
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
|
@PowerShell/wg-powershell-cmdlets reviewed this. Based on the documentation currently, |
|
@stevenebutler the comments in this PR could be useful |

PR Summary
Use cancelation timer to unblock web cmdlets after lost network connectivity with target service.
Tested manually.
PR Context
Repro steps:
It is not big issue for interactive session since user can press Ctrl-C but for script scenario this can block the script infinitely and it is already critical issue.
The root of this problem is that although HttpClient generates a special stream for getting Content, it does not extend Timeout to it. So we have to use a workaround.
With the fix the command from example above will be terminated after 5 sec beginning with network connectivity lost and write "Invoke-WebRequest: The operation was canceled." error to inform users that result file is corrupted.
Notice, this doesn't change behavior if user press Ctrl-C - no error is written although I would expect it to write, but that's another issue.
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.(which runs in a different PS Host).