
Conversation

@upils commented Nov 21, 2025

  • Have you signed the CLA?

Clean URLs before fetching content from the archive, preventing them from triggering security proxies.

Fixes #255

This prevents submitting requests to URLs containing `../` sequences that could trigger security proxies.
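
For reference, a minimal sketch of the difference `url.JoinPath` (Go's `net/url`, Go 1.19+) makes here; the archive base URL and package path below are illustrative, not the actual values chisel uses:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	base := "http://archive.ubuntu.com/ubuntu/dists/noble/"
	pkg := "../../pool/main/h/hello/hello_2.10.deb"

	// Naive concatenation leaves the traversal in the request path, which is
	// what security proxies reject.
	fmt.Println(base + pkg)
	// http://archive.ubuntu.com/ubuntu/dists/noble/../../pool/main/h/hello/hello_2.10.deb

	// url.JoinPath resolves "." and ".." segments, so the URL is already
	// clean before the request is made.
	clean, err := url.JoinPath(base, pkg)
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)
	// http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.10.deb
}
```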

Signed-off-by: Paul Mars <paul.mars@canonical.com>
github-actions bot commented Nov 21, 2025

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| BASE | 9.193 ± 0.046 | 9.160 | 9.318 | 1.00 |
| HEAD | 9.210 ± 0.052 | 9.158 | 9.326 | 1.00 ± 0.01 |

@upils changed the title from "fix: Resolve URL path before fetching content" to "fix: clean URL before fetching content" on Nov 21, 2025
@upils changed the title from "fix: clean URL before fetching content" to "fix: clean URLs before fetching content" on Nov 21, 2025
@letFunny (Collaborator) left a comment

Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with strace, both with and without a clean cache, might give us more reassurance.

EDIT: Thinking more about it, strace is a terrible tool for this job.

suffix = "dists/" + index.suite + "/" + suffix
}

u, err := url.JoinPath(baseURL, suffix)
Collaborator:

Maybe another option would be to keep the original logic and then do `url.JoinPath(baseURL, "")` to signal that what is important here is to clean the URL of the possible `..`. What is your opinion?

Another option is to use `cleanURL` as the variable name instead.
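
For reference, `url.JoinPath` with an empty element still cleans the path, so the first alternative would look roughly like this (the URL below is illustrative, and this is only one reading of the suggestion):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Illustrative URL with a traversal left over from plain concatenation.
	raw := "http://archive.ubuntu.com/ubuntu/dists/noble/../../pool/main/h/hello/hello_2.10.deb"

	// Joining with an empty element keeps the original logic intact while
	// still cleaning any "." or ".." segments from the path.
	clean, err := url.JoinPath(raw, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)
	// http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.10.deb
}
```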

Author:

I think using `url.JoinPath` already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the `cleanURL` name because it makes sense when we think of the value as the result of the cleaning, but not when we think of how it will be consumed. I am proposing `reqURL`. WDYT?

Collaborator:

> I think using `url.JoinPath` already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)

Sure, but I wouldn't know "why", only that the function does it. The "why" is currently only documented by the fact that the test fails if I change it to `+`. I don't have a strong opinion here, but I would like to preserve the intent somehow if possible.

Author:

Good point. However, the more I think about it, the more I think calling `index.fetch("../../" + suffix, ...)` is the real problem, especially since it seems to exist only to "remove" the `dists/<suite>/` part added to the URL. I will try to see if we can make it cleaner.

Author:

Using `../../` was indeed a way to work around the addition of `dists/<suite>/` to the suffix. So I removed the `../../` and added a comment to clarify why `fetch()` manipulates the prefix.

@upils (Author) commented Nov 24, 2025

> Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with strace, both with and without a clean cache, might give us more reassurance.

I have tested running `chisel debug check-release-archives --release ubuntu-25.04` with mitmproxy, both before and after the fix, confirming URLs are now clean.

Do not use `../../` in the path. It was trying to manipulate the behavior of `fetch()`, making assumptions about its implementation.

Signed-off-by: Paul Mars <paul.mars@canonical.com>
@letFunny (Collaborator) left a comment

Some more comments. As we discussed offline, changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages, but please remember to do that again before the final submission.

Comment on lines 382 to 385
// Scope content fetching with the suite unless fetching a package from the pool
if !strings.HasPrefix(suffix, "pool/") {
suffix = "dists/" + index.suite + "/" + suffix
}
Collaborator:

Another option would be to move the magic one layer up, so that instead of:

index.fetch("InRelease") // relative to dist/.../
index.fetch(packagesPath + ...) // relative to dist/.../
index.fetch(suffix) // full path

We make all of the fetching ask for the full path relative to the archive root, which sounds cleaner to me.

If we instead keep the changes, I would suggest making the comment a bit simpler, adding punctuation, and moving it after the if statement, for example:

	if !strings.HasPrefix(suffix, "pool/") {
		// If path is not a package then it is relative to the suite.
		suffix = "dists/" + index.suite + "/" + suffix
	}

Author:

> We make all of the fetching ask for the full path relative to the archive root, which sounds cleaner to me.

I think it is the other way around: fetch calls made through `index.fetch()` should by default be relative to the suite path, and `fetch()` takes care of removing this scope when fetching a package. As a method of `ubuntuIndex` (configured with a specific suite), that makes more sense to me than fetching from the root of the archive.

I agree with your suggestion for the comment.

suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
Author:

[Note to reviewer]: Adding `../../` to the suffix was meant to force `fetch()` to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This worked against the implementation of `fetch()`, which already handles suffixes pointing at packages (prefixed with `pool/`). There is no change in behavior after removing `../../`, except that the URL is now already clean.
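
For reference, a simplified sketch of how the request URL ends up being built after this change, pieced together from the hunks quoted in this review (the helper name and signature are illustrative, not chisel's actual code):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// requestURL sketches fetch()'s URL construction after this change
// (illustrative helper, not the actual method).
func requestURL(baseURL, suite, suffix string) (string, error) {
	// Suffixes are relative to the suite by default; package paths taken
	// from the Packages index ("pool/...") are relative to the archive root.
	if !strings.HasPrefix(suffix, "pool/") {
		suffix = "dists/" + suite + "/" + suffix
	}
	// JoinPath also cleans any remaining "." or ".." elements.
	return url.JoinPath(baseURL, suffix)
}

func main() {
	base := "http://archive.ubuntu.com/ubuntu/"
	u1, _ := requestURL(base, "noble", "InRelease")
	u2, _ := requestURL(base, "noble", "pool/main/h/hello/hello_2.10.deb")
	fmt.Println(u1) // http://archive.ubuntu.com/ubuntu/dists/noble/InRelease
	fmt.Println(u2) // http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.10.deb
}
```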

Collaborator:

This is now subtly more brittle. I checked the Debian documentation for archives, and the guarantee about `Filename` is that:

> These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy

Which means that we are relying on the implicit assumption that `pool/` is the prefix. Maybe it is not important and it will never change, but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives, for example.

@upils requested a review from letFunny on November 25, 2025
@letFunny (Collaborator) left a comment

Thanks for working on this, Paul. I think the latest iteration is the least controversial: you are fixing the bug without doing refactors not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.

return nil, fmt.Errorf("cannot clean requested URL: %v", err)
}
if cleanURL != req.URL.String() {
return nil, fmt.Errorf("test expected a clean URL %q, got %q", cleanURL, req.URL.String())
Collaborator:

Nit: "a clean URL" -> "clean URL"

@letFunny (Collaborator) left a comment

Discussed offline with Paul: there are tradeoffs for both approaches of using absolute paths vs. keeping it as it is. I see the point about trying to do the minimal change now, because it is extremely unlikely that `/pool/` is going to change. Anyway, I don't want to block the PR; it looks good to me. Thank you Paul!

Successfully merging this pull request may close these issues:

HTTP requests to Ubuntu archive with path traversal in the URL not working in restricted environment