fix: clean URLs before fetching content #257
base: main
Conversation
This prevents submitting requests to URLs containing `../`, which could trigger security proxies. Signed-off-by: Paul Mars <paul.mars@canonical.com>
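For illustration, here is a standalone sketch of the difference this makes (not the project's code; the archive URL and package path are only examples):

package main

import (
	"fmt"
	"net/url"
)

func main() {
	base := "http://archive.ubuntu.com/ubuntu/dists/noble/"
	suffix := "../../pool/main/h/hello/hello_2.10.deb"

	// Plain string concatenation keeps the "../" segments in the request URL,
	// which is what a security proxy may flag as path traversal.
	fmt.Println(base + suffix)

	// url.JoinPath joins the elements and cleans "./" and "../" segments,
	// so the resulting URL no longer contains traversal sequences.
	clean, err := url.JoinPath(base, suffix)
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)
}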
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated, using strace or similar, that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut under strace, with and without a clean cache, might give us more reassurance.
EDIT: Thinking more about it, strace is a terrible tool for this job.
internal/archive/archive.go (Outdated)
	suffix = "dists/" + index.suite + "/" + suffix
}

u, err := url.JoinPath(baseURL, suffix)
Maybe another option would be to keep the original logic and then do url.JoinPath(baseURL, "") to signal that what is important here is to clean the URL of the possible `../`. What is your opinion?
Another option is to use cleanURL instead as a variable name.
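As a standalone sketch of that first suggestion (again not the project's code), joining with an empty element keeps the path but still cleans it:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// URL built by plain concatenation, still containing "../".
	raw := "http://archive.ubuntu.com/ubuntu/dists/noble/../../pool/main/h/hello/hello_2.10.deb"

	// Joining an empty element leaves the path contents alone but cleans
	// "./" and "../" segments, which is the property being relied on here.
	clean, err := url.JoinPath(raw, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(clean) // no "../" left in the printed URL
}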
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the cleanURL name: it makes sense when we think of it as the result of the cleaning, but not when we think of how it will be consumed. I am proposing reqURL. WDYT?
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)
Sure, but I wouldn't know "why", only that the function does it. The "why" is actually only documented by the fact that the test fails if I change it to +. I don't have a strong opinion here, but I would like to preserve the intent somehow if possible.
Good point. However, the more I think about it, the more I think calling index.fetch("../../"+suffix, ...) is the real problem, especially since it seems to be there to "remove" the dists/<suite>/ part added to the URL. I will try to see if we can make it cleaner.
Using ../../ was indeed a workaround for the addition of dists/<suite>/ to the suffix. So I removed the ../../ and added a comment to clarify why fetch() manipulates the prefix.
I have tested running
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Do not use `../../` in the path. It was trying to manipulate the behavior of `fetch()`, making assumptions on its implementation. Signed-off-by: Paul Mars <paul.mars@canonical.com>
letFunny left a comment
Some more comments. As we discussed offline, changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages, but please remember to do that again before the final submission.
internal/archive/archive.go (Outdated)
// Scope content fetching with the suite unless fetching a package from the pool
if !strings.HasPrefix(suffix, "pool/") {
	suffix = "dists/" + index.suite + "/" + suffix
}
Another option would be to move the magic one layer up, so that instead of:
index.fetch("InRelease") // relative to dists/.../
index.fetch(packagesPath + ...) // relative to dists/.../
index.fetch(suffix) // full path
we make all of the fetching ask for the full path relative to the archive root, which sounds cleaner to me.
If we instead keep the changes, I would suggest making the comment a bit simpler, adding punctuation, and moving it after the if statement, for example:
if !strings.HasPrefix(suffix, "pool/") {
	// If path is not a package then it is relative to the suite.
	suffix = "dists/" + index.suite + "/" + suffix
}
We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.
I think it is the other way around: fetching calls made by index.fetch() should by default be relative to the suite path, and fetch() takes care of removing this scope when fetching a package. As a method of ubuntuIndex (configured with a specific suite), that makes more sense to me than fetching from the root of the archive.
I agree with your suggestion for the comment.
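For illustration, here is a standalone sketch of the scoping being described (function and parameter names are made up for the example, not chisel's actual code):

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fetchURL mirrors the scoping discussed above: non-package suffixes are
// resolved relative to the suite, package paths (under "pool/") relative to
// the archive root, and url.JoinPath cleans any "./" or "../" segments
// before the request would be made.
func fetchURL(baseURL, suite, suffix string) (string, error) {
	if !strings.HasPrefix(suffix, "pool/") {
		// If the path is not a package then it is relative to the suite.
		suffix = "dists/" + suite + "/" + suffix
	}
	return url.JoinPath(baseURL, suffix)
}

func main() {
	base := "http://archive.ubuntu.com/ubuntu/"
	u1, _ := fetchURL(base, "noble", "InRelease")
	u2, _ := fetchURL(base, "noble", "pool/main/h/hello/hello_2.10.deb")
	fmt.Println(u1) // .../ubuntu/dists/noble/InRelease
	fmt.Println(u2) // .../ubuntu/pool/main/h/hello/hello_2.10.deb
}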
Signed-off-by: Paul Mars <paul.mars@canonical.com>
suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
- reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
+ reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
[Note to reviewer]: Adding ../../ to the suffix aimed at forcing fetch() to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This was working against the implementation of fetch(), which already handles suffixes of URLs pointing at packages (prefixed with pool/). There is no change in behavior after removing ../../, except that the URL is now already clean.
This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:
These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy
Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change, but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives, for example.
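For context, a typical stanza in a Packages index looks roughly like this (values are illustrative), with Filename given relative to the archive root:

Package: hello
Version: 2.10-3build1
Architecture: amd64
Filename: pool/main/h/hello/hello_2.10-3build1_amd64.deb
SHA256: ...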
letFunny left a comment
Thanks for working on this, Paul. I think the latest iteration is the least controversial: you are fixing the bug without doing refactors that are not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.
	return nil, fmt.Errorf("cannot clean requested URL: %v", err)
}
if cleanURL != req.URL.String() {
	return nil, fmt.Errorf("test expected a clean URL %q, got %q", cleanURL, req.URL.String())
Nit: a clean URL -> clean URL
letFunny left a comment
Discussed offline with Paul: there are tradeoffs for both approaches, having absolute paths vs. keeping it as it is. I see the point about trying to do the minimal change now, because it is extremely unlikely that /pool/ is going to change. Anyway, I don't want to block the PR; it looks good to me. Thank you Paul!
Clean URLs before fetching content from the archive, to avoid triggering security proxies.
Fixes #255