fix: clean URLs before fetching content #257
base: main
Conversation
This prevents submitting requests to URLs containing `../`, which could trigger security proxies. Signed-off-by: Paul Mars <paul.mars@canonical.com>
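For illustration, here is a standalone sketch of the difference this makes (not the project's code; the archive URL and package path are only examples):

package main

import (
	"fmt"
	"net/url"
)

func main() {
	base := "http://archive.ubuntu.com/ubuntu/dists/noble/"
	suffix := "../../pool/main/h/hello/hello_2.10.deb"

	// Plain string concatenation keeps the "../" segments in the request URL,
	// which is what a security proxy may flag as path traversal.
	fmt.Println(base + suffix)

	// url.JoinPath joins the elements and cleans "./" and "../" segments,
	// so the resulting URL no longer contains traversal sequences.
	clean, err := url.JoinPath(base, suffix)
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)
}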
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated, using strace or similar, that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut under strace, with and without a clean cache, might give us more reassurance.
EDIT: Thinking more about it, strace is a terrible tool for this job.
internal/archive/archive.go (Outdated)
	suffix = "dists/" + index.suite + "/" + suffix
}

u, err := url.JoinPath(baseURL, suffix)
Maybe another option would be to keep the original logic and then do url.JoinPath(baseURL, "") to signal that what is important here is to clean the URL of the possible `../`. What is your opinion?
Another option is to use cleanURL instead as a variable name.
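As a standalone sketch of that first suggestion (again not the project's code), joining with an empty element keeps the path but still cleans it:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// URL built by plain concatenation, still containing "../".
	raw := "http://archive.ubuntu.com/ubuntu/dists/noble/../../pool/main/h/hello/hello_2.10.deb"

	// Joining an empty element leaves the path contents alone but cleans
	// "./" and "../" segments, which is the property being relied on here.
	clean, err := url.JoinPath(raw, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(clean) // no "../" left in the printed URL
}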
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the cleanURL name: it makes sense when we think of it as the result of the cleaning, but not when we think of how it will be consumed. I am proposing reqURL. WDYT?
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)
Sure, but I wouldn't know "why", only that the function does it. The "why" is actually only documented by the fact that the test fails if I change it to +. I don't have a strong opinion here, but I would like to preserve the intent somehow if possible.
Good point. However, the more I think about it, the more I think calling index.fetch("../../"+suffix, ...) is the real problem, especially since it seems to be there to "remove" the dists/<suite>/ part added to the URL. I will try to see if we can make it cleaner.
Using ../../ was indeed a workaround for the addition of dists/<suite>/ to the suffix. So I removed the ../../ and added a comment to clarify why fetch() manipulates the prefix.
I have tested running
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Do not use `../../` in the path. It was trying to manipulate the behavior of `fetch()`, making assumptions on its implementation. Signed-off-by: Paul Mars <paul.mars@canonical.com>
letFunny left a comment
Some more comments. As we discussed offline, changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages, but please remember to do that again before the final submission.
internal/archive/archive.go (Outdated)
// Scope content fetching with the suite unless fetching a package from the pool
if !strings.HasPrefix(suffix, "pool/") {
	suffix = "dists/" + index.suite + "/" + suffix
}
Another option would be to move the magic one layer up, so that instead of:
index.fetch("InRelease") // relative to dists/.../
index.fetch(packagesPath + ...) // relative to dists/.../
index.fetch(suffix) // full path
we make all of the fetching ask for the full path relative to the archive root, which sounds cleaner to me.
If we instead keep the changes, I would suggest making the comment a bit simpler, adding punctuation, and moving it after the if statement, for example:
if !strings.HasPrefix(suffix, "pool/") {
	// If path is not a package then it is relative to the suite.
	suffix = "dists/" + index.suite + "/" + suffix
}
We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.
I think it is the other way around: fetching calls made by index.fetch() should by default be relative to the suite path, and fetch() takes care of removing this scope when fetching a package. As a method of ubuntuIndex (configured with a specific suite), that makes more sense to me than fetching from the root of the archive.
I agree with your suggestion for the comment.
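For illustration, here is a standalone sketch of the scoping being described (function and parameter names are made up for the example, not chisel's actual code):

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fetchURL mirrors the scoping discussed above: non-package suffixes are
// resolved relative to the suite, package paths (under "pool/") relative to
// the archive root, and url.JoinPath cleans any "./" or "../" segments
// before the request would be made.
func fetchURL(baseURL, suite, suffix string) (string, error) {
	if !strings.HasPrefix(suffix, "pool/") {
		// If the path is not a package then it is relative to the suite.
		suffix = "dists/" + suite + "/" + suffix
	}
	return url.JoinPath(baseURL, suffix)
}

func main() {
	base := "http://archive.ubuntu.com/ubuntu/"
	u1, _ := fetchURL(base, "noble", "InRelease")
	u2, _ := fetchURL(base, "noble", "pool/main/h/hello/hello_2.10.deb")
	fmt.Println(u1) // .../ubuntu/dists/noble/InRelease
	fmt.Println(u2) // .../ubuntu/pool/main/h/hello/hello_2.10.deb
}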
Signed-off-by: Paul Mars <paul.mars@canonical.com>
suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
- reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
+ reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
[Note to reviewer]: Adding ../../ to the suffix aimed at forcing fetch() to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This was working against the implementation of fetch(), which already handles suffixes of URLs pointing at packages (prefixed with pool/). There is no change in behavior after removing ../../, except that the URL is now already clean.
This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:
These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy
Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change, but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives, for example.
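For context, a typical stanza in a Packages index looks roughly like this (values are illustrative), with Filename given relative to the archive root:

Package: hello
Version: 2.10-3build1
Architecture: amd64
Filename: pool/main/h/hello/hello_2.10-3build1_amd64.deb
SHA256: ...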
letFunny left a comment
Thanks for working on this, Paul. I think the latest iteration is the least controversial: you are fixing the bug without doing refactors that are not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.
	return nil, fmt.Errorf("cannot clean requested URL: %v", err)
}
if cleanURL != req.URL.String() {
	return nil, fmt.Errorf("test expected a clean URL %q, got %q", cleanURL, req.URL.String())
Nit: a clean URL -> clean URL
letFunny left a comment
Discussed offline with Paul: there are tradeoffs for both approaches, having absolute paths vs. keeping it as it is. I see the point about trying to do the minimal change now, because it is extremely unlikely that /pool/ is going to change. Anyway, I don't want to block the PR; it looks good to me. Thank you Paul!
Clean URLs before fetching content from the archive, to avoid triggering security proxies.
Fixes #255