Skip to content

avoid moving/renaming the hosts file - fix docker container issue#759

Merged
brianmay merged 2 commits intosshuttle:masterfrom
nikatlas:fix-docker-container-cross-device-link-error
Jun 2, 2022
Merged

avoid moving/renaming the hosts file - fix docker container issue#759
brianmay merged 2 commits intosshuttle:masterfrom
nikatlas:fix-docker-container-cross-device-link-error

Conversation

@nikatlas
Copy link
Contributor

@nikatlas nikatlas commented May 18, 2022

Docker mounts the /etc/hosts file when running a container.
The rest of the /etc folder is not mounted on the same mountpoint as the hosts file.

sshuttle is trying to backup the file by creating a hard link.
Hard links in linux require the files to be in the same mount.

There does not seem to be a reason a hard link is generated just for a backup.
Also, after the backup is captured sshuttle tries to overwrite the hosts file by os.rename(TMPFILE, HOSTSFILE).

In docker containers the hosts file is mounted and is not allowed to be manipulated resulting in a Resource Busy error.
However the user can edit the file normally by manipulating the content of the file.

THIS PR:

  • copies the /etc/hosts to /etc/hosts.sbak instead of creating a hard link
  • opens the /etc/hosts file and writes the contents of /etc/hosts.XXX.tmp file instead of using os.rename to overwrite the file

Troubleshooting - my problem:
I had this problem with my container and patched the problem with this PR.

  • I was receiving a OSError: [Errno 18] Cross-device link: '/etc/hosts' -> '/etc/hosts.sbak'.
  • I initially created a placeholder file since I didn't care about a backup in a container
  • Then the tmp file that contained the DNS entries could not be stored in the /etc/hosts file because the file could not be overwritten(not allowed by docker)
  • Then i commented out the os.rename(tmpname, HOSTSFILE) on firewall.py:L50
  • The tunneling worked but i was missing the DNS entries from the hosts file
  • So, I made this PR to write to the existing file instead of overwritting the file

@skuhl
Copy link
Contributor

skuhl commented May 18, 2022

Thanks for pull request and details about what docker does.

I can explain why the sshuttle behaves as it currently does (and perhaps the code should be updated to include comments to explain this better).

To make the backup, we create a hard link to the original hosts file...and then eventually the original host file gets overwritten by renaming a file on top of it. The end result should be that the backup file contains the old contents. The link() call is used because it provides a way to atomically create a copy of the file. Making a copy of the file atomically might be useful if another process is editing hosts or if another instance of sshuttle is doing so.

Using rename() to overwrite the old hosts file with the new one is also done to copy the data into /etc/hosts atomically. If it isn't written atomically, other processes doing DNS lookups might not behave correctly while sshuttle is editing the file. Similarly, it would be possible to have two instances of sshuttle trying to write the file at the same time.

And, although those two steps are atomic, it is possible that differences in timing would cause some data to get lost temporarily since we can't read the contents of the file, add our entries, and write all in an atomic fashion. However, my understanding of this code is that this problem would sort itself out with enough time as long as the timing conflict doesn't happen repeatedly and more hosts get added.

The best solution might be to use the existing technique and, if failure, fallback to the technique in the PR. We might want to notify the user that they might encounter trouble running multiple copies of sshuttle simultaneously. Either way, more comments in the code would improve the readability. I'm open to further suggestions. I don't use the feature that overwrites /etc/hosts and I don't typically run multiple copies of sshuttle simultaneously.

@skuhl
Copy link
Contributor

skuhl commented May 18, 2022

I implemented an alternative fix that uses nikatlas's approach as a fallback, and prints a warning. I can probably send a new PR request tomorrow once I test it a little more.

I noticed a couple more things about this code and this PR: We default to setting /etc/hosts to 600 permissions instead of 644 in the unlikely situation where stat() on the original file fails. I believe we can use shutil.move() to move the file instead of reading/writing as is done in this PR. This PR doesn't delete the temporary file after we are done using it.

@nikatlas nikatlas force-pushed the fix-docker-container-cross-device-link-error branch 3 times, most recently from 033e2bd to 359369a Compare May 19, 2022 13:58
@nikatlas
Copy link
Contributor Author

nikatlas commented May 19, 2022

@skuhl
I pushed an enhanced version that:

  • Fallbacks to my approach
  • Removes the tmpfile.
  • Added a simple unit test.
  • Uses shutil.move tested it and works. This command supports cross-partition moves since it runs in 2 steps as i understand, copy and delete. Not sure if it copies the contents or the file.

I think the permission issue is not related and can be addressed in another PR.

@skuhl
Copy link
Contributor

skuhl commented May 19, 2022

Looks good to me! The code could use some more comments and a warning about the non-atomic situation might be helpful, but I'm happy to add those later (unless someone else beats me to it). Thanks!

@nikatlas nikatlas force-pushed the fix-docker-container-cross-device-link-error branch from e0e21c1 to a2face7 Compare May 20, 2022 09:17
@nikatlas
Copy link
Contributor Author

Added a comment and a warning message.

@nikatlas
Copy link
Contributor Author

@skuhl Should we run the checks and merge if looks good?

@skuhl
Copy link
Contributor

skuhl commented May 25, 2022

Yeah, it still looks good to me. My only remaining thought is that it might make sense to only print the warning once, and it seems like it would get printed repeatedly? I haven't tried running it in docker to test myself. Either way, I think the fix is fine as it and can only improve the situation for folks using Docker.

I can't do anything else. @brianmay merges things in and has control over the github workflow approvals.

@alwaysastudent
Copy link

alwaysastudent commented May 27, 2022

Is there a Dockerfile you used to test this fix? Could you please share? I am getting a different error unfortunately on a debian-backed image.

@nikatlas
Copy link
Contributor Author

I actually used atlantis(ghcr.io/runatlantis/atlantis) Docker image which is based on alpine which is not Debian.
Build an alpine image and should be the same.
@alwaysastudent

@nikatlas
Copy link
Contributor Author

nikatlas commented Jun 1, 2022

@alwaysastudent @brianmay Any update on this one?

@alwaysastudent
Copy link

alwaysastudent commented Jun 1, 2022

@alwaysastudent @brianmay Any update on this one?

@nikatlas -I don't maintain/contribute to this project, but I believe change helps when I tested with debian:stretch and ubuntu images. @brianmay or @skuhl might help with merging this fix?

The error/warn, quoted below, I see is from here and I think it has nothing to do with your fix

sd_bus_open_system: No such file or directory
fw: Received non-zero return code 1 when flushing DNS resolver cache.

thank you for this fix.

@brianmay
Copy link
Member

brianmay commented Jun 1, 2022

Is this ready to merge? Reading the comments, I am not really sure.

@alwaysastudent
Copy link

Is this ready to merge? Reading the comments, I am not really sure.

@brianmay I think the PR is ready to be merged. I tested this and it is helpful. @nikatlas can confirm on this.

@nikatlas
Copy link
Contributor Author

nikatlas commented Jun 2, 2022

@brianmay I have implemented most of the changes requested by @skuhl.
This PR will help those who work with Docker which uses a different mount for the /etc/hosts file.

The PR is ready to be merged.

@brianmay brianmay merged commit 93200f7 into sshuttle:master Jun 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants