Conversation
8f53ad6 to
83ceed9
Compare
src/refdb_reftable.c
Outdated
| &entry->oid_old, &entry->oid_cur, | ||
| NULL, entry->msg)) < 0) | ||
| goto out; | ||
| } |
There was a problem hiding this comment.
I'll probably need to deduplicate pre-existing reflog entries. With filesystem-based reflog, we're rewriting the complete reflog every time, I think, but with reftables I'll need to figure out which entries are new.
There was a problem hiding this comment.
yes, I think you are right. (I don't understand why you get duplicate entries, though.)
There was a problem hiding this comment.
This is mostly about git_reflog semantics in libgit2. When you pass it to the filesystem-based reflog, you need to have the complete reflog history in order to re-write it in full. So with reftable I'll need to see which entry is the last one that we've already written to disk and only start writing after that one.
There was a problem hiding this comment.
ok. In reftable, the log entries are stored newest-first, so a simple seek to the refname will tell you the last entry written.
hanwen
left a comment
There was a problem hiding this comment.
filesystem-based one, our support for that is... lacking. We have
dozens of places where we assume $GITDIR/HEAD exists, we write
references like MERGE_HEAD, ORIG_HEAD manually without consulting the
refdb, lack support to create new repos with a different refdb as we
create the filesystem structures like $GITDIR/refs and $GITDIR/HEAD
directly in repository.c without consulting the refdb at all. We'll
have to carefully vet our code to remove all accesses to anything
refish outside of the refdb and instead call into the refdb layer.
Yes, I think git-core also suffers from this.
It seems to me that reftables currently don't support a few operations
that we expect from any given refdb implementation. Most importantly I
couldn't figure out how to delete and rename a reflog and how to
implement explicit locking (e.g. what the file backend does when
creating .git/refs/heads/master.lock).
For deleting a complete reflog, you'd have to delete all the entries
individually. For renaming a reflog, the suggestion is to record a
deletion + creation. This would let you retractively understand that
the rename took place.
Why do you want explicit locking? The API is written as if you'd be
doing transactions, i.e. you try to make the change, and if you get a
LOCK_ERROR, you'll have to try again.
I found it quite hard to get into reftables as the interface exposes
about quite a lot of the low-level details. This is probably my own
fault as I usually tend to dive heads-deep into anything new without
looking left and right first. But one resource that would've helped
quite a lot is something like high-level examples that represent
common git operations. I feel like I've got it right by now, but I'd
be happy to get feedback on this to see how wrong I am.
the best resource is probably code
gitgitgadget/git#539, which adds support to
git-core.
Are there any testvectors around? Like a repository that one may embed
into libgit2 itself that uses reftables to verify we behave as
expected. I couldn't get a hold of any and as it's currently
non-trivial to use reftables anywhere I wasn't able to come up with
tests yet. As a result, my current code will probably fail very
quickly.
The easiest is to compile JGit,
bazel build org.eclipse.jgit.pgm:pgm
this leaves a binary in
bazel-bin/org.eclipse.jgit.pgm/jgit
then you can do
jgit init
jgit convert-ref-storage --format reftable
There currently aren't test vectors, but it's a good suggestion. I
could add some to the repository.
At some places I feel like there's high potential for namespacing
issues. E.g. struct iterator is highly generic and may trivially clash
with any preexisting types. It would be nice to use namespace prefixes
for all exported symbols.
I want to do this, but I've been putting it off, because I don't have
an IDE that will help me with renaming, and doing it by hand is
somewhat labor intensive.
I think it would help if the C files were layed out in a more
"standard" library way. Most importantly, separate out the
"reftable.h" header file from the rest into its own folder so that one
can add it to the include path without having to care about
accidentally overriding "tree.h" that's part of libgit2 itself.
I'll get to that.
I didn't find any way to override memory allocation functions. I think
we've discussed that at the Contributor's summit already.
Anyway, thanks a lot for working on reftables @hanwen, this is really
exciting! I'd be happy to get feedback on my current implementation,
which basically represents how I think it should work. Chances are
high I've got it completely wrong, though ;)
you're welcome! I am glad to see it get uptake :)
src/refdb_reftable.c
Outdated
| &entry->oid_old, &entry->oid_cur, | ||
| NULL, entry->msg)) < 0) | ||
| goto out; | ||
| } |
There was a problem hiding this comment.
yes, I think you are right. (I don't understand why you get duplicate entries, though.)
Because we're kind of in a special place with libgit2. With filesystem based refs, you can explicitly lock references to keep other applications from updating them by themselves. As libgit2 may be part of a long-running application (e.g. an IDE, but it could be anything) where the author of that application may have the need to lock references for a longer time, we support a transactional API that keeps other Git applications from updating them. While it's true that you would get an error anyway with the reftable API, this kind of opportunistic locking wouldn't help in such a usecase. Thanks for your answers, I'll make sure to update the PR soonish with your feedback! |
ee17225 to
0bfe212
Compare
|
On Wed, Mar 25, 2020 at 3:09 PM Patrick Steinhardt <notifications@github.com>
wrote
-
At some places I feel like there's high potential for namespacing
issues. E.g. struct iterator is highly generic and may trivially clash
with any preexisting types. It would be nice to use namespace prefixes for
all exported symbols.
I fixed this.
-
-
I think it would help if the C files were layed out in a more
"standard" library way. Most importantly, separate out the "reftable.h"
header file from the rest into its own folder so that one can add it to the
include path without having to care about accidentally overriding "tree.h"
that's part of libgit2 itself.
I moved reftable.h in a different directory.
-
I didn't find any way to override memory allocation functions. I think
we've discussed that at the Contributor's summit already.
I added this. reftable_set_malloc().
…--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
|
5944c49 to
65c0d65
Compare
| #include <sys/time.h> | ||
| #include <sys/types.h> | ||
| #include <unistd.h> | ||
| #include <zlib.h> |
There was a problem hiding this comment.
This currently doesn't compile on Windows due to <sys/time.h> not being found. I think it should work just fine if changing this to <time.h>. Instead, we could probably have the equivalent of REFTABLE_IN_GITCORE, but I think I'd rather prefer the header to be agnostic of libgit2.
There was a problem hiding this comment.
On linux, this would yield:
c/stack.c:226:12: error: implicit declaration of function 'gettimeofday' [-Werror=implicit-function-declaration]
226 | int err = gettimeofday(&deadline, NULL);
| ^~~~~~~~~~~~
|
Thanks a lot for addressing my concerns, @hanwen! I've updated the PR to include all of them. |
|
On Fri, Mar 27, 2020 at 07:05:28AM -0700, Han-Wen Nienhuys wrote:
@hanwen commented on this pull request.
On linux, this would yield:
c/stack.c:226:12: error: implicit declaration of function 'gettimeofday' [-Werror=implicit-function-declaration]
226 | int err = gettimeofday(&deadline, NULL);
| ^~~~~~~~~~~~
Too bad. So the reftable would need to check for some Windows-specific
defines and adjust accordingly.
By the way, getting back to a different topic: `HEAD` and its various
other incarnations `*_HEAD`. Are those expected to be managed by
reftable, too? It would make sense for some of them, but on the other
hand we have some that are references, but have a special format. The
most common one that comes to my mind is `FETCH_HEAD`, but there are
others, too.
|
|
For the ref-like objects, this is what JGit does In JGit, HEAD is expected to be managed by RefDatabase. Unfortunately, nobody has been able to articulate to me what the expectations/abstractions for this in git-core should do. Can you do transactional updates with HEAD and another branch together in libgit2? |
|
On Sat, Mar 28, 2020 at 08:18:02AM -0700, Han-Wen Nienhuys wrote:
In JGit, HEAD is expected to be managed by RefDatabase. Unfortunately,
nobody has been able to articulate to me what the
expectations/abstractions for this in git-core should do. Can you do
transactional updates with HEAD and another branch together in
libgit2?
Yes, this can be done via our transaction API. It does require the
reference backend to support locking though, which is not currently
possible with the reftable refdb.
I don't know if the issue is that reftables do not support locking or
whether our own assumptions were too centered on file refs and not
really on the transactional part. Anyway, this is nothing we can easily
change as it's part of our public interface.
|
|
Hey Patrick, can you check out https://godoc.org/github.com/google/reftable#Addition and https://godoc.org/github.com/google/reftable#Stack.NewAddition would that work as an API for you? |
|
Just to make sure I understand it, this is how I'd imagine it to look like in C: /* Create a new transactional addition for the stack */
reftable_stack_new_addition(&addition, stack);
/* Write a set of ref/log records via the usual writer callback */
reftable_addition_add(addition, reftable_backend_write_records, &args1);
/* Writing multiple times is okay for a single addition */
reftable_addition_add(addition, reftable_backend_write_records, &args2);
/* Now commit changes to persist it to disk and update the stack */
reftable_addition_commit(addition);If I understand it correctly, then If my above assumptions are correct then the API should work for us, thanks a lot! |
deps/reftable/include/reftable.h
Outdated
| /* 4-byte identifier ("sha1", "s256") of the hash. | ||
| * Defaults to SHA1 if unset | ||
| */ | ||
| uint32_t hash_id; |
There was a problem hiding this comment.
I think this doesn't quite work as advertised. I've set up reftable_write_options opts ={0}, where I'd expect it to just default to "sha1". But reftable_new_merged_table() fails when comparing r->hash_id != hash_id.
There was a problem hiding this comment.
It would also be helpful to have some macros available so that one may set up the expected values without having to define them locally.
There was a problem hiding this comment.
There was a problem hiding this comment.
Sorry, forgot to check in for comments. The following fails if I do not set up options.hash_id:
const char *dir, *table;
struct reftable_write_options options = {0};
struct reftable_ref_record ref = {0};
struct reftable_stack *stack;
dir = "/path/to/testrepo-reftable/.git/reftable";
table = "/path/to/testrepo-reftable/.git/reftable/tables.list";
/* This line is required, otherwise `reftable_new_stack` fails. */
options.hash_id = 0x73686131;
cl_must_pass(reftable_new_stack(&stack, dir, table, options));
cl_must_pass(reftable_stack_reload(stack));
cl_must_pass(reftable_stack_read_ref(stack, "refs/heads/master", &ref));There was a problem hiding this comment.
Note that I'm currently not using an up to date version of the reftable library as I first want to try to get what I have completely functioning (which it mostly is already, but there's still some issues with writing for preexisting branches).
There was a problem hiding this comment.
which version are you using?
btw, if you are writing existing branches, make sure you sure you set an update_index that is beyond the update_index of the existing branch.
65c0d65 to
7b53cfc
Compare
|
I've added a test vector and was able to get a first test to pass with the reftable backend, which reads a reference and then renames it. More tests and bug fixes will follow. I've fixed the refdb detection to now consult "extensions.refStorage" as mandated by the documentation. I've noticed that another requirement is to get repository format version 1 implemented. There's a temporary commit in here that just removes the error check. |
|
google/reftable@7aabfe0 for the transaction api in C. |
|
BTW, - I would be grateful for any feedback about reftable on windows. I don't have windows, and I'd like to keep it that way. |
7b53cfc to
4c2ab8f
Compare
|
Rebased on current master, which now includes support for repository format v1 and extensions.
I may dig up my Windows VM as soon as I've got the backend working on my own machine. Until then there's currently only the CI that provides any kind of feedback on Windows. |
4c2ab8f to
4ab39b1
Compare
65fc25f to
70fd8b1
Compare
|
@pks-t - does libgit2 already have code to detect file / directory conflicts? Looks like I may need to write and/or refactor that to check transactions, and wondering if that makes sense to put into the reftable library. |
|
On Mon, Apr 27, 2020 at 11:19:38AM -0700, Han-Wen Nienhuys wrote:
@pks-t - does libgit2 already have code to detect file / directory
conflicts? Looks like I may need to write and/or refactor that to
check transactions, and wondering if that makes sense to put into the
reftable library.
No, not yet, but it was on my todo list. I wouldn't complain if it was
moved into reftable, though.
Didn't have much time lately due to the world being even crazier than it
used to be, but I hope to tackle the missing pieces soonish again.
|
70fd8b1 to
6276b81
Compare
|
Rebased on latest master, updated the reftable library to 66d38fe (C: move block_iter_next comment to header, 2020-04-28), fixed a set of bugs in the reflog implementation and improved error handling in most places. All in all the PR is close to being finished, but I've stumbled over one weird bug where compressing of the reftable stack seems to fail after writing symbolic references. No idea yet whether that's a fault of mine or in the reftable library, but I'll investigate. |
There are a bunch of tests where we read or write references via the filesystem directly. This only works with the "files" backend, but naturally breaks if we supported any other reference format. Refactor these tests to instead use the refdb to access those.
When testing conditional includes we overwrite the repository's config file with the relevant conditions. This causes us to fully overwrite all repository configuration, including the repository format version and any extensions. While the test repository used in this test does not have any extensions, we will add a reftable-enabled repository that does rely on the "refStorage" extension eventually. Fix this by only modifying the relevant config keys.
We have a bunch of checks for properties of the "files" reference
backend:
- Whether a specific reference has been packed or whether it still
exists as a loose reference.
- Whether empty ref directories get pruned.
- Whether we properly fsync data to disk.
These checks continue to be sensible for that backend, but for any other
backend they plain don't work. Adapt the tests so that we only run them
in case the repository uses the "files" backend.
When initializing the "files" refdb we read a couple of values from the Git configuration. Unfortunately, this causes a chicken-and-egg problem when reading configuration with "includeIf.onbranch" conditionals: we need to instantiate the refdb to evaluate the condition, but we need to read the configuration to initialize the refdb. We currently work around the issue by reading the "HEAD" file directly when evaluating the conditional include. But while that works with the "files" backend, any other backends that store "HEAD" anywhere else will break. Prepare for a fix by deferring reading the configuration. We really only need to be able to execute `git_refdb_lookup()`, so all we need to ensure is that we can look up a branch without triggering any config reads.
With the preceding commit we have refactored the "files" backend so that it can be both instantiated and used to look up a reference without reading any configuration. With this change in place we don't cause infinite recursion anymore when using the refdb to evaluate "onbranch" conditions. Refactor the code to use the refdb to look up "HEAD". Note that we cannot use `git_reference_lookup()` here, as that function itself tries to normalize reference names, which in turn involves reading the Git configuration. So instead, we use the lower-level `git_refdb_lookup()` function, as we don't need the normalization anyway.
Expose a function to read a loose reference. This function will be used in a subsequent commit to read pseudo-refs on the generic refdb layer.
Regardless of which reference storage format is used, pseudorefs will always be looked up via the filesystem as loose refs. This is because pseudorefs do not strictly follow the reference format and may contain additional metadata that is not present in a normal reference. We don't honor this in `git_reference_lookup()` though but instead defer to the refdb to read such references. This obviously works just fine with the "files" backend, but any other backend would have to grow custom logic to handle reading pseudorefs. Refactor `git_reference_lookup_resolved()` so that it knows to always read pseudorefs as loose references. This allows refdb implementations to not care about pseudoref handling at all.
With a recent upgrade to a newer version of MSVC we now get a bunch of warnings when two operands use different enum types. While sensible in theory, in practice we have a couple of non-public enums that extend public enums, like for example with `GIT_SUBMODULE_STATUS`. Let's for now disable this warning to unblock our builds. The alternative would be to add casts all over the place, but that feels rather cumbersome.
392cffb to
4884111
Compare
|
In the current state all tests now pass on Linux. There's a couple patches I've sent upstream to fix issues with the reftable library in Git, but those are rather minor. Still needed is an adjustment of "deps/reftable/system.c" to work with non-Linux systems as well as CI changes to actually run tests with reftables enabled. I should be able to wrap that up next week, I guess. |
|
There are also a bunch of PRs that are dependencies in libgit2:
|
Import the reftable library from commit f1228cd12c1 (reftable: make REFTABLE_UNUSED C99 compatible, 2025-05-29). This is an exact copy of the reftable library. The library will be wired into libgit2 over the next couple of commits.
While the reftable library is mostly decoupled from the Git codebase, some specific functionality intentionally taps into Git subsystems. To make porting of the reftable library easier all of these are contained in "system.h" and "system.c". Reimplement those compatibility shims so that they work for libgit2.
The reftable writer accidentally uses the Git-specific `QSORT()` macro. This macro removes the need for the caller to provide the element size, but other than that it's mostly equivalent to `qsort()`. Replace the macro accordingly. This incompatibility is a bug in Git that needs to be fixed upstream.
While perfectly legal, older compiler toolchains complain when
zero-initializing structs that contain nested structs with `{0}`:
/home/libgit2/source/deps/reftable/stack.c:862:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct reftable_addition empty = REFTABLE_ADDITION_INIT;
^~~~~~~~~~~~~~~~~~~~~~
/home/libgit2/source/deps/reftable/stack.c:707:33: note: expanded from macro 'REFTABLE_ADDITION_INIT'
#define REFTABLE_ADDITION_INIT {0}
^
Silence this warning by using `{{0}}` instead.
Wire up the reftable library with CMake. At the current point in time the library is not yet plugged into libgit2 itself.
While it is possible to pass flags to `reftable_stack_new_addition()` to alter the behaviour of a stack addition, it is not possible to do so via `reftable_stack_add()`. Plug this gap.
4884111 to
eb9ed40
Compare
Implement the reftable backend that is used to read and write reftables. The backend is not yet used anywhere after this commit.
Wire up the "reftable" storage format via the newly implemented reftable backend.
Generate test resources for reftables. These resources are basically the
Git repositories we already have, but converted to use the "reftable"
format. For most of the part, this conversion is done by executing `git
refs migrate`.
A couple notes:
- This require a recent Git upstream version with not-yet-upstreamed
patches due to a bug in `git refs migrate` with reflogs.
- The migration command does not yet support repositories with
worktrees. Those were converted by first removing the worktrees,
migrating the refs and then recreating them.
- The HEAD_TRACKER reference in testrepo.git is not recognized as a
root ref and is thus not automatically migrated.
- testrepo.git has an empty reflog for refs/heads/with-empty-log that
does not get migrated.
Wire up reftable tests so that they can be executed by setting the `CLAR_REF_FORMAT` environment variable. This only catches tests that use `cl_git_sandbox_init()`, but that should cover most of our tests. So this infrastructure isn't perfect, but for now it's good enough. We may want to iterate on it in the future.
eb9ed40 to
22b7bd2
Compare
|
Closing in favor of #7117, as I'm working on this with my GitLab account right now. |
So this is a very rough reftables refdb implementation for libgit2. It's main intent is to start a discussion, see what prerequisites we have in libgit2 to properly implement refdbs other than refdb_fs and whether I'm using the reftable API correctly. Thoughts:
While we theoretically support reference databases other than the filesystem-based one, our support for that is... lacking. We have dozens of places where we assume $GITDIR/HEAD exists, we write references like MERGE_HEAD, ORIG_HEAD manually without consulting the refdb, lack support to create new repos with a different refdb as we create the filesystem structures like $GITDIR/refs and $GITDIR/HEAD directly in repository.c without consulting the refdb at all. We'll have to carefully vet our code to remove all accesses to anything refish outside of the refdb and instead call into the refdb layer.
It seems to me that reftables currently don't support a few operations that we expect from any given refdb implementation. Most importantly I couldn't figure out how to delete and rename a reflog and how to implement explicit locking (e.g. what the file backend does when creating .git/refs/heads/master.lock).
I found it quite hard to get into reftables as the interface exposes about quite a lot of the low-level details. This is probably my own fault as I usually tend to dive heads-deep into anything new without looking left and right first. But one resource that would've helped quite a lot is something like high-level examples that represent common git operations. I feel like I've got it right by now, but I'd be happy to get feedback on this to see how wrong I am.
Are there any testvectors around? Like a repository that one may embed into libgit2 itself that uses reftables to verify we behave as expected. I couldn't get a hold of any and as it's currently non-trivial to use reftables anywhere I wasn't able to come up with tests yet. As a result, my current code will probably fail very quickly.
At some places I feel like there's high potential for namespacing issues. E.g.
struct iteratoris highly generic and may trivially clash with any preexisting types. It would be nice to use namespace prefixes for all exported symbols.I think it would help if the C files were layed out in a more "standard" library way. Most importantly, separate out the "reftable.h" header file from the rest into its own folder so that one can add it to the include path without having to care about accidentally overriding "tree.h" that's part of libgit2 itself.
I didn't find any way to override memory allocation functions. I think we've discussed that at the Contributor's summit already.
Anyway, thanks a lot for working on reftables @hanwen, this is really exciting! I'd be happy to get feedback on my current implementation, which basically represents how I think it should work. Chances are high I've got it completely wrong, though ;)