refdb_fs: refresh packed refs after loose during iteration#4981
refdb_fs: refresh packed refs after loose during iteration#4981
Conversation
|
One note: if you add back in the |
|
I poked at a few related areas, wondering if there were other instances:
|
|
It seems like this just swaps one sort of race for another, though, doesn't it?
IOW, there's no guarantee that you ever see anything like an atomic snapshot of the state of references... but this new race is superior because you at least see that a reference with a given name exists, even if you're seeing that it's outdated. I guess this is better? And I'm guessing that this is solving an actual problem that you're having in production, since you're opening this issue here. Since I missed the Git Contributor Summit this year, I would be remiss if I didn't mention that it's disappointing to spend a lot of time in libgit2 to bring the loose/packed reference storage system up to parity with git, when that parity already has so many obvious deficiencies. |
|
With my ObComplaint out of the way, thanks for investigating and opening this issue. I'm sure I'm not alone in appreciating you coming over to the dark side that is libgit2 to bring us fixes. 😀 |
|
One final piece of information to dump: if you want to recreate the race, you can run diff --git a/src/refdb_fs.c b/src/refdb_fs.c
index d582cb8a2..1e006f439 100644
--- a/src/refdb_fs.c
+++ b/src/refdb_fs.c
@@ -678,6 +678,13 @@ static int refdb_fs_backend__iterator(
if ((error = packed_reload(backend)) < 0)
return error;
+ /*
+ * This is obviously contrived, but imagine that we are scheduled out
+ * by the OS and this pack-refs is run simultaneously by some other
+ * session.
+ */
+ system("git pack-refs --all --prune");
+
iter = git__calloc(1, sizeof(refdb_fs_iter));
GIT_ERROR_CHECK_ALLOC(iter);
|
Yes, it's impossible to have an atomic view of the whole ref namespace given the file formats. And yes, it may be outdated, but for a pretty narrow definition of outdated. If you read simultaneously with somebody updating a ref or packing a ref, you are guaranteed to see either the old value or the new value, atomically (which is the best we can ever do, since there's no locking on the reading side!). But across multiple refs (e.g., say renaming X to Y) it's possible that you may see "old" from one and "new" from another (e.g., not see either X or Y, depending on the ordering). There's nothing we can do to fix that. So that's strictly better than the current behavior (which may say "does not exist", or may even report an outdated value from a stale
Yes, we've actually seen this in production, though it seems to be fairly rare (OTOH, we only detect it when it causes some catastrophic failure, and since it's a problem on the read side, we might just be silently serving bad results without realizing it). And calling this "solving" is generous, because it does not (yet) actually work. 😉
Yeah, knowing first hand how much work has gone into performance and race-fixing on the git.git side, I'm sad at the thought of having to do it all again (especially since I convinced somebody much smarter than me to do most of the heavy lifting in git.git!). The ref code in git.git is actually pretty clean these days. I wonder to what degree it would be possible to lift it (especially the low-level packed/loose iterator bits) more or less wholesale. Most of that code is from @mhagger, who I suspect would be amenable to including it in libgit2. It may be a giant rabbit hole, though. |
|
As far as licenses are concerned, I've already agreed to allow my code from Git to be incorporated into libgit2. Unfortunately my reference-related code in Git is mixed with lots of other people's code, and I haven't checked whether those people have also agreed to cross-licensing. As far as actually doing the work is concerned, I'm afraid that I don't have any spare bandwidth. I'd be happy to offer tips and encouragement, but not much more than that. Sharing more reference code between the two projects would be good for another reason: we could implement the reftable format for storing references once and have it in both projects. It has significantly better performance, uses less disk space, and gives you atomic reading and updates of references without the need for a read lock. |
I'm very much here for this. The sooner we can add a portable ref storage format, the sooner we can start the incredibly long and painful process of deprecating the loose/packed ref storage mechanisms. It sounds like you're busy with other things, @mhagger -- as am I -- but if you wanted to pair up so that we had a common implementation between libgit2 and git.git, then I'd be willing to find some time for this very noble project. |
There's a possible race between loose and packed refs when iterating
over refs. It works like this:
0. A loose ref X exists.
1. libgit2 reads the packed-refs file, which does not mention X.
2. A simultaneously-running `git pack-refs` commands packs X and
prunes the loose version.
3. libgit2 reads the loose directory, which no longer has X.
At this point, libgit2 claims that X does not exist, even though it
does. We can solve this by reading loose refs before packed. This works
because refs always migrate from loose to packed, but not the other way
around (technically an update to a packed ref creates a new loose file,
but it doesn't _delete_ the packed one, so we're guaranteed to always
see either the old or the new version).
See also git/git@98eeb09, which fixed
this same race in upstream Git, and discusses the race in more detail.
Note that just naively flipping the order of packed_load() and
iter_load_loose_paths() doesn't quite work. The latter expects the
packed refs to already be loaded, so that it can mark them with
PACKREF_SHADOWED. When doing it in reverse order, we have two options:
1. packed_reload() should take the "loose" vector and check to see
if refs it loads are shadowed. Doing this efficiently requires the
loose vector to be sorted, which might impact the output.
2. After loading both, take an additional pass over the loose vector
and mark shadowed packrefs. This is slightly less efficient than
(1) because of the extra pass, but it's big-O identical.
This patch does (2), since it seems less the risky route, and the
efficiency difference should be minimal. If we really care about
efficiency, then we should scrap the current PACKREF_SHADOWED bit
entirely, and instead do a sorted merge of the two lists.
Note that in this patch we do leave the existing PACKREF_SHADOWED
code in iter_load_loose_paths(). If the packed-refs file has already
been loaded, then that code would kick in (and our subsequent
packed-load would be a noop, as we only reload the file if the timestamp
indicates it has changed).
When checking if a ref exists, we load the packed-refs file and then return true if the ref is found either loose or in the packed-refs cache. This is subject to the same race described in the previous commit, where a simultaneous pack-refs may migrate a ref from loose to packed and cause us to see neither. We can fix it by making sure we load packed-refs only after checking the loose store. As a bonus, this is a minor optimization: we'll avoid even looking at the packed-refs file if we see a loose version.
e99b2de to
5df823f
Compare
|
Noble projects aside, I think it's worth fixing this in the short term if we can. And I think I've got it working, and this is ready for real review.
That's not exactly right. It does keep two separate lists, but when creating the "loose" list, it consults the "packed" list to mark any refs there as "shadowed". When flipping the order of load, we have to do the reverse (or take a separate pass to resolve the shadowing -- see the discussion in 83d4060).
This one was easy to fix, and is another instance of the same race, so I handled it in 5df823f.
I didn't fix this one. It's really orthogonal to this PR. And while in theory the fix is simple (flip the order of the |
The problem here is not due to our Correct would be It should be easily possible to reverse this, though: instead of setting the flag when loading loose refs, we should be able to set it when loading packed refs by searching through the sorted loose refs vector. |
|
Please see #4984 for a different approach, fixing the above ordering dependency. I didn't yet create proper commit messages, so please regard it as RFC |
I have a PR around renaming/deleting which might fix it already (known as #4301, #4292, PR at #4316). I'd be interested in doing the "noble project" thing, since I've already looked pretty hard at that code, and having a shared implementation would indeed be ❤️ (especially the reftable thing). |
|
@pks-t Yeah, I think we crossed comments/pushes. Looking at your solution in #4984, I think it's roughly the same approach as what I have here, with the exception that I "resolve" the shadow flag with a loop immediately after loading the loose/packed refs into memory. Whereas you do it progressively as the caller actually iterates. They're big-O equivalent, but yours may be slightly faster in practice due to piggy-backing on the existing iteration and using the lockless copy of the cache. I'm happy if either solution gets merged. |
|
@tiennou If I'm reading your #4316 right, I don't think it solves the particular problem in |
Seems so, sorry for the duplicated work.
There are some additional cleanups in #4984 to prepare the actual fix, which is why I'd personally prefer if that one got merged. |
OK. I'll close this then, and will assume you can shepherd your PR through to getting merge (FWIW, it looks fine to me, but I am also pretty far from an expert on libgit2 😉). |
There's a possible race between loose and packed refs when iterating over refs. It works like this:
A loose ref X exists.
libgit2 reads the packed-refs file, which does not mention X.
A simultaneously-running
git pack-refscommands packs X and prunes the loose version.libgit2 reads the loose directory, which no longer has X.
At this point, libgit2 claims that X does not exist, even though it does. We can solve this by reading loose refs before packed. This works because refs always migrate from loose to packed, but not the other way around (technically an update to a packed ref creates a new loose file, but it doesn't delete the packed one, so we're guaranteed to always see either the old or the new version).
See also git/git@98eeb09, which fixed this same race in upstream Git, and discusses the race in more detail.
The commit here unfortunately breaks a bunch of tests! The problem seems to be that one cannot naively swap the order of the loose and packed loads. I think the code does not actually understand the precedence between "loose" and "packed", but rather just loads the packed refs and then overlays the loose ones on top in a last-one-wins fashion. By flipping the order, we need a first-one-wins strategy. I don't see a way to easily do that with thegit_sortedcachedata structure.So until that is sorted, this PR is garbage; but I wanted to open it to identify the problem and show what the general form of the solution should look like.[I've force-pushed up a non-broken version].