Skip to content

Conversation

@CarloToso
Copy link
Contributor

@CarloToso CarloToso commented Jan 23, 2023

PR Summary

In Invoke-WebRequest and Invoke-RestMethod -Outfile only works if you specify a file path
After this PR:

  • If you specify an existing directory, the file will be placed by its server-designated name there.
  • If you specify a file path, that path is saved to as-is; a file path is either:
    - the path to an existing file (which is then quietly replaced, as before)
    - the path to an existing directory followed by the name of a new file to be created

@mklement0
The filename is taken from the last segment of the Uri

Problems:

  • it currently doesn't work with -Resume and I'm not sure it can
  • I don't know how to make this an experimental feature

PR Context

#11671

PR Checklist

else
{
// Get file name from last segment of Uri
OutFile = Path.Combine(OutFile, System.Net.WebUtility.UrlDecode(response.RequestMessage.RequestUri.Segments[^1]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't value. It is better to generate an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're wrong, I have tested some websites and they rarely (never) use the Content-Disposition header. So it would always generate an error.
Tested downloads: Firefox, Chrome, 7zip, Adobe Reader

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case there is no point in analyzing Content-Disposition at all :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please update the PR description.

Copy link
Contributor Author

@CarloToso CarloToso Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. What should we do about the problem with -Resume ?
I'll add a new error
Added error when Resume is used and outfile doesn't point to a file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content-Disposition may not be used often (I can't speak to that), but it is definitely used, including on GitHub:

E.g., in an archive download for a given project, such as https://github.com/mklement0/fileicon/archive/stable.zip, where using the last URL segment wouldn't even tell you the name of the product, the Content-Disposition field provides a helpful file name:

# -> 'attachment; filename=fileicon-stable.zip'
(Invoke-WebRequest -Method Head  https://github.com/mklement0/fileicon/archive/stable.zip).Headers['Content-Disposition']

Both curl and wget support Content-Disposition, albeit on an opt-in basis (-J and --content-disposition)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 Please describe the desired behaviour and I will implement it, I removed the check for Content-Disposition only because @iSazonov requested it

Copy link
Contributor

@mklement0 mklement0 Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @CarloToso.

My suggestion is as follows - though I wonder if it requires a wider discussion:

I suggest also making the feature opt-in, given the following warning from man curl with respect to its opt-in (-J / --remote-header-name):

WARNING: Exercise judicious use of this option, especially on
Windows. A rogue server could send you the name of a DLL or
other file that could be loaded automatically by Windows or some
third party software.

That requires a new switch, say -UseSuggestedFileName, which would have to report an error if the -OutFile argument targets a file.

There's also a discoverability problem - also with the use-last-URI-segment approach, given that redirections may be involved, and that the new default behavior uses the redirected-to URI's last segment:

  • How does the user learn what file name was ultimately used (short of examining the target directory for the most recently added file)?
    • At the very least, -Verbose should include this information.
    • -PassThru should use an ETS instance member to decorate the passed-through Microsoft.PowerShell.Commands.BasicHtmlWebResponseObject instance with a .FullName property that reflects the output file's full path.

As for how to parse Content-Disposition and fallback behavior:

  • Based on the previously linked description, the C# equivalent of the following is probably good enough in practice, though the recommendation is to give precedence to filename* (sic) entries:

    • <value-of-Content-Disposition-field(s)> -replace '^.*\bfilename\*?="?([^;"]+)?.*', '$1'
    • If there's no Content-Disposition field or if the file-name extraction fails, fall back to the last-URI-segment approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could move this discussion to issue #11671

@CarloToso
Copy link
Contributor Author

The test error looks like a random failure unrelated to this PR

@CarloToso CarloToso closed this Jan 25, 2023
@CarloToso CarloToso reopened this Jan 25, 2023
}

// OutFile must not be a directory to use Resume.
if (Resume.IsPresent && Directory.Exists(QualifiedOutFile))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If we resolve the path we should cache it here and use later.
  2. It seems it is not correct check since we could get the file name from URL.

Copy link
Contributor Author

@CarloToso CarloToso Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I agree with you, I'll do it later
  2. I think it's correct, this is the logic:
  • if -Outfile points to a file, old logic you can use resume
  • if -Outfile points to a folder (Directory.Exists(QualifiedOutFile)) you can't use the resume switch (throws an error) and the filename is taken later from the url

Copy link
Contributor Author

@CarloToso CarloToso Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov I think I need a hand with caching the resolved path

This comment was marked as duplicate.

@iSazonov
Copy link
Collaborator

@CarloToso We need new tests for null, empty, folder.

@CarloToso
Copy link
Contributor Author

@iSazonov I added the tests for null and empty -OutFile to #19044

@CarloToso CarloToso marked this pull request as draft January 29, 2023 17:04
@ghost ghost added the Review - Needed The PR is being reviewed label Feb 17, 2023
@ghost
Copy link

ghost commented Feb 17, 2023

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@CarloToso
Copy link
Contributor Author

#firefox download
curl "https://download.mozilla.org/?product=firefox-stub&os=win&lang=it" -L -O
#--> _product=firefox-stub&os=win&lang=it
Invoke-WebRequest "https://download.mozilla.org/?product=firefox-stub&os=win&lang=it" -OutFile .\
#--> Firefox Installer.exe (BETTER)

#github download
curl "https://github.com/mklement0/ttab/archive/stable.zip" -L -O
#--> stable.zip (BETTER)
Invoke-WebRequest "https://github.com/mklement0/ttab/archive/stable.zip" -OutFile .\
#--> stable

@mklement0 It seems to me that both approaches have some issues... What should we do?

@mklement0
Copy link
Contributor

@CarloToso, yes, both approaches are defensible.

  • The curl approach has one advantage: it is obvious to the user what the output file name will be - though it may not be a meaningful one.

  • Your current approach may provide a more meaningful filename, but the user will have to investigate the output directory to find out what filename was used.


The curl approach is the simplest one (but note that curl also offers -J to honor a Content-Disposition filename).

If we stick with what you have now, we should offer a way to discover the filename via (a) -Verbose output and (b) via the object returned when you use -PassThru (in the simplest form with an ETS instance property containing a [System.IO.FileInfo] instance describing the output file) - this is why I asked for this aspect to be decided on first - see #11671 (comment)

The latter will also become important if and when we implement Content-Disposition support later.

(Curiously curl -JO also doesn't report the filename ultimately used, and while you can glean it from the -v (verbose) output, it's buried in a lot of other info; I think we can do better here).

@mklement0
Copy link
Contributor

@CarloToso, realistically speaking, given that my plea to decide on the pass-through issue before merging this PR are falling on deaf ears:

In the absence of a feature that allows discovery of the output filename via -PassThru, I suggest changing the implementation to follow curl'ls lead: use the input URI's last segment as the filename.

Also - I'm not sure how you're handling this - note that curl reports an error if the input URI lacks a server-relative path; e.g.:

PS> curl -LO http://example.org
curl: Remote file name has no length!
curl: (23) Failed writing received data to disk/application

This can notably also happen if the last URI segment happens to be invalid as a filename.
It happens with your https://download.mozilla.org/?product=firefox-stub&os=win&lang=it example, but, curiously, only non non-Windows platforms (perhaps because curl.exe actively strips out chars. such as ? on Windows.

@CarloToso
Copy link
Contributor Author

@mklement0 I'm slowly working on addressing your suggestions about -PassThru, I created WebResponseHelper.GetOutFilePath as a stepping stone. I became aware of the problem you just described some hours ago and I'm working on a solution for a follow up PR. My current Roadmap is:

  • This PR --> first implementation, it only works for files using the last segment of the uri (fails for http://example.org), filename shown if -Verbose

One or Two other PR:

  • Add Content-Disposition support, choose each time between curl method and my method for filenames
  • Handle http://example.org case
  • Add warnings (like curl) each time we choose a folder as download target, discovery of the output filename via -PassThru, add more tests

I don't think I should add more commits to this PR

@mklement0
Copy link
Contributor

Thank you, @CarloToso, that makes sense, except I'm unclear on what you mean by:

choose each time between curl method and my method for filenames

Add warnings (like curl) each time we choose a folder as download target

@CarloToso
Copy link
Contributor Author

@mklement0

  1. Each time choose what filename works better between:
  • Uri.AbsoluteUri.Split("/")[^1] (sanitized) [curl]
  • response.RequestMessage.RequestUri.Segments[^1] [my method]
  1. Add a warning message when the user chooses a folder path with -OutFile, you said it could be unsafe for the user

WARNING: Exercise judicious use of this option, especially on
Windows. A rogue server could send you the name of a DLL or
other file that could be loaded automatically by Windows or some
third party software.

@mklement0
Copy link
Contributor

Thanks for clarifying , @CarloToso.

Just food for thought, we can defer further discussion to your future PRs:

Re 1: How would you decide in cases where both methods yield a viable filename?

Re 2: Personally, I think warning in the documentation is sufficient (which is what curl does).

@CarloToso
Copy link
Contributor Author

@iSazonov @SteveL-MSFT Can we merge this? So I con continue working on the name from Content-Disposition feature

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Review - Needed The PR is being reviewed labels Mar 20, 2023
…s.Tests.ps1

Co-authored-by: Steve Lee <slee@microsoft.com>
@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Mar 20, 2023
…s.Tests.ps1

Co-authored-by: Steve Lee <slee@microsoft.com>
@pull-request-quantifier-deprecated

This PR has 44 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +41 -3
Percentile : 17.6%

Total files changed: 6

Change summary by file extension:
.cs : +16 -3
.resx : +3 -0
.ps1 : +22 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

@ghost
Copy link

ghost commented Apr 20, 2023

🎉v7.4.0-preview.3 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Extra Small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants