Skip to content

Conversation

@iSazonov
Copy link
Collaborator

@iSazonov iSazonov commented Sep 1, 2022

PR Summary

Use cancelation timer to unblock web cmdlets after lost network connectivity with target service.

Tested manually.

PR Context

Repro steps:

  1. Start download a file
Invoke-WebRequest https://github.com/PowerShell/PowerShell/releases/download/v7.3.0-preview.7/PowerShell-7.3.0-preview.7-win-x64.zip -OutFile C:\temp\PowerShell-7.3.0-preview.7-win-x64.zip -TimeoutSec 5
  1. Turn off Wi-Fi Internet connection (or unplug cable from Internet router)
  2. Wait ~1min (the cmdlet is not terminated after 5 sec)
  3. Restore network connectivity
  4. The cmdlet hangs infinitely (Ctrl-C works!)

It is not big issue for interactive session since user can press Ctrl-C but for script scenario this can block the script infinitely and it is already critical issue.

The root of this problem is that although HttpClient generates a special stream for getting Content, it does not extend Timeout to it. So we have to use a workaround.

With the fix the command from example above will be terminated after 5 sec beginning with network connectivity lost and write "Invoke-WebRequest: The operation was canceled." error to inform users that result file is corrupted.

image

image

Notice, this doesn't change behavior if user press Ctrl-C - no error is written although I would expect it to write, but that's another issue.

PR Checklist

@iSazonov iSazonov added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Sep 1, 2022
@iSazonov iSazonov requested a review from PaulHigin as a code owner September 1, 2022 13:01
@ghost ghost assigned adityapatwardhan Sep 1, 2022

if (previousLength != output.Length)
{
// Reset cancelation timer only if we get an information from network on the interation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a dotNet bug, where cancelation timer is ineffective during stream copy task. Does it make sense to submit an issue report about this?
Also, the timeout time is now only for when there is no network connectivity. User expectation is that the timeout time is for the entire process including both connected and non-connected times. We would need to update the documentation to make clear what the -TimeOutSec parameter does.

Suggested change
// Reset cancelation timer only if we get an information from network on the interation.
// Reset cancelation timer while information continues to flow from network.
// Cancelation timer applies only during no network connectivity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a dotNet bug, where cancelation timer is ineffective during stream copy task. Does it make sense to submit an issue report about this?

This works for whole CopyToAsync but it is not that we want since full download can take unpredictable time. We need only monitor every internal request to target service for hang - the CopyToAsync has internal cycle where ReadAsync() is used. It is just the ReadAsync hangs and we need reset the timer for every iteration in the cycle.

That you assumed it was a bug in the .Net Runtime is an interesting psychological experiment. :-) That was my assumption too, and I opened a discussion there a few days ago. Only the .Net team offers workarounds and doesn't agree that HttpClient.Timeout should apply to operations on streams it creates itself, even though it works with all its other operations.
You could share your expectations there too if you want.

We would need to update the documentation to make clear what the -TimeOutSec parameter does.

Both .Net and PowerShell docs should be updated for the Timeout parameter - that it applies to every request to target service and protects from too long (infinite) network operations.

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Sep 1, 2022
…Cmdlet/StreamHelper.cs

Co-authored-by: Paul Higinbotham <paulhi@microsoft.com>
@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Sep 2, 2022
Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM

Please create a document bug to clarify that the timeout parameter applies to no-network-connectivity rather than total operation time.

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Sep 2, 2022
…Cmdlet/StreamHelper.cs

Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Sep 3, 2022
iSazonov and others added 3 commits September 3, 2022 23:30
…Cmdlet/StreamHelper.cs

Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
…Cmdlet/StreamHelper.cs

Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
…Cmdlet/StreamHelper.cs

Co-authored-by: Dongbo Wang <dongbow@microsoft.com>
@iSazonov iSazonov added the Documentation Needed in this repo Documentation is needed in this repo label Sep 7, 2022
Comment on lines 318 to 325

if (previousLength != output.Length)
{
// Reset cancelation timer while information continues to flow from network.
// Cancelation timer applies only during no network connectivity.
previousLength = output.Length;
cts.CancelAfter(timeout);
}
Copy link
Member

@daxian-dbw daxian-dbw Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this is not needed.
We should timeout just in the timespan specified by -Timeout, but what you are doing here is to refresh the timer after every 1 second.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case you say the Timeout must be longer than the expected time to fully load the file. This immediately indicates a lot of problems. For example, the time to fully load a file is unpredictable even if there are no problems, because this time is determined by many factors (client, server, network). In general, even the size of the file is unknown. Thus, neither the script developer nor the user can set a reasonable timeout value - it will be very long for the script to accurately execute. But such a protective timeout is already possible in Task Scheduler or CIs - it can be an hour or two.
If you look from HttpClient side then before uploading it performs sending request and receiving response, these two operations are performed with the same timeout, but these are instantaneous operations - here you need timeout of a minute or two, but not an hour or two.
If you look at it from the service side, any service closes the session when it loses contact with the client after a short time. With Kestrel, for example, this is 130 seconds by default. Usually it's faster (about a minute). This means that it doesn't make sense for the client to wait an hour or two. This also makes it clear that it makes sense to specify a timeout just for each request and response operation. That's exactly what this code does. We can explicitly specify in the documentation that if network connectivity is lost, cmdlet will crash after a given timeout.

Copy link
Member

@daxian-dbw daxian-dbw Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the doc for -TimeoutSec:

Specifies how long the request can be pending before it times out. Enter a value in seconds. The default value, 0, specifies an indefinite time-out.

To an end user, it doesn't matter it's because of sending/receiving http request or downloading a file, when specifying -TimtoutSec, what the user wants is for Invoke-WebRequest/RestMethod to timeout if the operation cannot finish in X seconds. So, we need to make sure that's the case (maybe not accurately X seconds, but closely around that time).

I prefer to use the existing -TimeoutSec. However, even if we have a separate parameter just for downloading timeout, it's semantics should be "timeout if the downloading doesn't finish in X seconds". An end user doesn't need to know it didn't finish because of bad network speed, or because the network disconnected and then reconnected.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Timeout did never work for download scenario in all PowerShell versions.. So we can not say "the user want" for the download operation. In IO operations timeout is always protection from pending (which is just in the documentation)

Copy link
Member

@daxian-dbw daxian-dbw Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesWTruher and @SeeminglyScience, can you please share your thoughts on this (summary captured in #18005 (comment))?
Mark the PR to be reviewed by cmdlet WG.

Comment on lines 309 to 310
cts.CancelAfter(timeout);
Task copyTask = input.CopyToAsync(output, cts.Token);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cts.CancelAfter(timeout);
Task copyTask = input.CopyToAsync(output, cts.Token);
Task copyTask = input.CopyToAsync(output, cts.Token);
cts.CancelAfter(timeout);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it is important?
I follow HttpClient implementation. It activates timeout before starting an operation.
https://source.dot.net/#System.Net.Http/System/Net/Http/HttpClient.cs,185

Copy link
Member

@daxian-dbw daxian-dbw Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that important, and you can choose to not accept this suggestion.
But in the HttpClient.cs implementation, it's using await and thus you cannot start the timer after calling await base.SendAsync. So, they have to start the timer before calling SendAsync there.

}
}
catch (OperationCanceledException)
catch (OperationCanceledException exc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this exception will be thrown when the cancellation happens?
In my script test, I get an AggregateException that wraps a TaskCanceledException instead.

$s = [System.Threading.CancellationTokenSource]::new()
try {
    $t = [System.Threading.Tasks.Task]::Delay(100000, $s.Token); ## delay for 100 seconds
    $s.CancelAfter(2000);   ## cancel after 2 seconds
    $t.Wait()
} catch {
    $e = $_
}
PS:32> $e.Exception.InnerException.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     AggregateException                       System.Exception

PS:33> $e.Exception.InnerException.InnerExceptions[0].GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     TaskCanceledException                    System.OperationCanceledException

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no AggregateException. I guess because we use Wait(cts.Token) but in your example Wait().
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we could fall in race condition if timeout was raised while we are in the loop body. So I replaced Wait with another overload which checks cancelation token first. https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs,2670

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any problem with timeout being raised while in the loop body. Wait(millisecondsTimeout, cancellationToken) will call cancellationToken.ThrowIfCancellationRequested within InternalWait, after a quick SpinWait.

catch (OperationCanceledException)
catch (OperationCanceledException exc)
{
// We cannot distinct 'cancel` (user press Ctrl-C) and 'timeout' so we report the exception as-is
Copy link
Member

@daxian-dbw daxian-dbw Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can differentiate them. Just check if cancellationToken is cancelled. If Ctrl+c is pressed, that token will be cancelled, otherwise, it's the timeout. So, we should write out error only if the cancellation is caused by the timeout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, we can (cancellationToken vs cts). Fixed. Also renamed cts to timeoutCTS.

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Sep 9, 2022
@pull-request-quantifier-deprecated

This PR has 20 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +15 -5
Percentile : 8%

Total files changed: 1

Change summary by file extension:
.cs : +15 -5

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

@daxian-dbw daxian-dbw added the WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module label Sep 12, 2022
@daxian-dbw
Copy link
Member

daxian-dbw commented Sep 12, 2022

For "WG-Cmdlets-Utility" group, please review the use of the existing parameter TimeoutSec in the file downloading scenario that this PR is trying to address. See #18005 (review) as reference.

What this PR does is to have a timeout for the interval between receiving any bytes during a downloading operation. More concretely, it's -- if no byte is received during X seconds (value of the timeout), then stop the downloading operation.

From the implementation perspective, today, -TimeoutSec only applies to the httpClient.Timeout, and thus is only for sending http request and receiving http response, and it doesn't cover the file downloading time, so it's possible that today Invoke-WebRequest takes way longer than the specified TimeoutSec to return because the downloading takes a long time.

However, from the user's perspective, it's likely a user is treating it as "Invoke-WebRequest call should be cancelled if it cannot be finished in TimeoutSec".

The WG should review if it's OK for this PR to use the same -TimeoutSec parameter and overload its meaning.

@SteveL-MSFT
Copy link
Member

Added to WG agenda

@ghost ghost added the Review - Needed The PR is being reviewed label Sep 27, 2022
@ghost
Copy link

ghost commented Sep 27, 2022

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@SteveL-MSFT
Copy link
Member

@PowerShell/wg-powershell-cmdlets reviewed this. Based on the documentation currently, -TimeoutSec is for the initial connection and the definition of this parameter should not change. We suggest renaming this parameter to -ConnectTimeoutSec and alias to -TimeoutSec. For this new capability, we propose adding a new -NetworkTimeoutSec (default to 0 meaning infinite) to handle the case of intermittent network issues.

@ghost ghost removed the Review - Needed The PR is being reviewed label Oct 5, 2022
@iSazonov iSazonov closed this Oct 5, 2022
@CarloToso
Copy link
Contributor

@stevenebutler the comments in this PR could be useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Documentation Needed in this repo Documentation is needed in this repo Extra Small WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants