reftable library by hanwen · Pull Request #847 · git/git

hanwen · 2020-09-16T18:42:20Z

This splits the giant commit from gitgitgadget#539 into a series of smaller commits, which build and have unittests.

Changes relative to last series: version 19 Apr 2021 (tip: b8729fe)

drop EINVAL for broken refs.
fix 2 UBSAN warnings. Thanks to Andrzej Hunt for reporting.
make tests pass against master (use oidread)
handle worktree/xx/refname refs.
handle main-worktree/ref
fix double free in branch renaming
handle relative paths (used in git upload-pack invocation) for ref backend creation.

cc: Han-Wen Nienhuys hanwen@google.com
cc: Jeff King peff@peff.net
cc: Ramsay Jones ramsay@ramsayjones.plus.com
cc: Jonathan Nieder jrnieder@gmail.com
cc: Johannes Schindelin Johannes.Schindelin@gmx.de
cc: Jonathan Tan jonathantanmy@google.com
cc: Josh Steadmon steadmon@google.com
cc: Emily Shaffer emilyshaffer@google.com
cc: Patrick Steinhardt ps@pks.im
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Felipe Contreras felipe.contreras@gmail.com
cc: Derrick Stolee stolee@gmail.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Han-Wen Nienhuys hanwen@google.comg
cc: "brian m. carlson" sandals@crustytoothpaste.net

hanwen · 2020-09-16T19:09:19Z

/submit

gitgitgadget-git · 2020-09-16T19:10:39Z

Submitted as pull.847.git.git.1600283416.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-git-847/hanwen/libreftable-v1

To fetch this version to local tag pr-git-847/hanwen/libreftable-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-git-847/hanwen/libreftable-v1

gitgitgadget-git · 2020-09-16T23:58:03Z

This branch is now known as hn/reftable.

gitgitgadget-git · 2020-09-16T23:58:04Z

This patch series was integrated into seen via c00808e.

gitgitgadget-git · 2020-09-17T01:09:25Z

This patch series was integrated into seen via 4f1956c.

gitgitgadget-git · 2020-09-17T21:22:49Z

This patch series was integrated into seen via f9f83c1.

gitgitgadget-git · 2020-09-17T22:20:16Z

This patch series was integrated into seen via 488ad6a.

gitgitgadget-git · 2020-09-18T00:59:16Z

This patch series was integrated into seen via ab283b9.

gitgitgadget-git · 2020-09-18T22:12:20Z

This patch series was integrated into seen via a0bd498.

gitgitgadget-git · 2020-09-19T01:46:53Z

This patch series was integrated into seen via 0417227.

gitgitgadget-git · 2020-09-19T22:51:38Z

This patch series was integrated into seen via e227917.

gitgitgadget-git · 2020-09-20T01:04:30Z

Makefile

@@ -720,6 +720,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o


On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static void reftable_log_record_key(const void *r, struct strbuf *dest) > +{ > + const struct reftable_log_record *rec = > + (const struct reftable_log_record *)r; > + int len = strlen(rec->refname); > + uint8_t i64[8]; > + uint64_t ts = 0; > + strbuf_reset(dest); > + strbuf_add(dest, (uint8_t *)rec->refname, len + 1); > + > + ts = (~ts) - rec->update_index; > + put_be64(&i64[0], ts); > + strbuf_add(dest, i64, sizeof(i64)); > +} We seem to be getting reftable/record.c: In function 'reftable_log_record_key': reftable/record.c:578:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] put_be64(&i64[0], ts); ^ CC reftable/refname.o when this series is merged to 'seen'. cf. e.g. https://travis-ci.org/github/git/git/jobs/728655368 Thanks.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Sun, Sep 20, 2020 at 3:00 AM Junio C Hamano <gitster@pobox.com> wrote: > > +static void reftable_log_record_key(const void *r, struct strbuf *dest) > > +{ > > + const struct reftable_log_record *rec = > > + (const struct reftable_log_record *)r; > > + int len = strlen(rec->refname); > > + uint8_t i64[8]; > > + uint64_t ts = 0; > > + strbuf_reset(dest); > > + strbuf_add(dest, (uint8_t *)rec->refname, len + 1); > > + > > + ts = (~ts) - rec->update_index; > > + put_be64(&i64[0], ts); > > + strbuf_add(dest, i64, sizeof(i64)); > > +} > > We seem to be getting > > reftable/record.c: In function 'reftable_log_record_key': > reftable/record.c:578:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] > put_be64(&i64[0], ts); > ^ > CC reftable/refname.o > > when this series is merged to 'seen'. Thanks for bringing this up. I did see this before, but I've been unable to reproduce this locally, and I forgot about it. The problem is actually triggered by the Git-provided put_be64() (which appears unused in the Git source code). The definition #define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0) is trying to reinterpret a char* as a uint64_t* , which is illegal in strict aliasing rules? I originally had +void put_be64(uint8_t *out, uint64_t v) +{ + int i = sizeof(uint64_t); + while (i--) { + out[i] = (uint8_t)(v & 0xff); + v >>= 8; + } +} in my reftable library, which is portable. Is there a reason for the magic with htonll and friends? -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Sep 21, 2020 at 03:13:05PM +0200, Han-Wen Nienhuys wrote: > The problem is actually triggered by the Git-provided put_be64() > (which appears unused in the Git source code). The definition > > #define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0) > > is trying to reinterpret a char* as a uint64_t* , which is illegal in > strict aliasing rules? Ah, thanks for finding that. I think this resolves the mystery I had around aliasing warnings with put_be32() in f39ad38410 (fast-export: use local array to store anonymized oid, 2020-06-25). I got confused because when I looked at the implementation of put_be32(), I stupidly looked at the NO_ALIGNED_LOADS fallback code, which indeed does not alias. But the one we use on x86 does. I thought this would be OK because C allows aliasing through "char *" pointers. But I think it may only go the other way. I.e., you can type-pun a uint64_t as a char, but not the other way around. But then why doesn't basically every use of put_be32() and put_be64() complain? > I originally had > > +void put_be64(uint8_t *out, uint64_t v) > +{ > + int i = sizeof(uint64_t); > + while (i--) { > + out[i] = (uint8_t)(v & 0xff); > + v >>= 8; > + } > +} > > in my reftable library, which is portable. Is there a reason for the > magic with htonll and friends? Presumably it was thought to be faster. This comes originally from the block-sha1 code in 660231aa97 (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12). I don't know how it compares in practice, and especially these days. Our fallback routines are similar to an unrolled version of what you wrote above. -Peff

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Sep 24, 2020 at 03:21:51AM -0400, Jeff King wrote: > > I originally had > > > > +void put_be64(uint8_t *out, uint64_t v) > > +{ > > + int i = sizeof(uint64_t); > > + while (i--) { > > + out[i] = (uint8_t)(v & 0xff); > > + v >>= 8; > > + } > > +} > > > > in my reftable library, which is portable. Is there a reason for the > > magic with htonll and friends? > > Presumably it was thought to be faster. This comes originally from the > block-sha1 code in 660231aa97 (block-sha1: support for architectures > with memory alignment restrictions, 2009-08-12). I don't know how it > compares in practice, and especially these days. > > Our fallback routines are similar to an unrolled version of what you > wrote above. We should be able to measure it pretty easily, since block-sha1 uses a lot of get_be32/put_be32. I generated a 4GB random file, built with BLK_SHA1=Yes and -O2, and timed: t/helper/test-tool sha1 <foo.rand Then I did the same, but building with -DNO_UNALIGNED_LOADS. The latter actually ran faster, by a small margin. Here are the hyperfine results: [stock] Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s] Range (min … max): 6.550 s … 6.841 s 10 runs [-DNO_UNALIGNED_LOADS] Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s] Range (min … max): 6.394 s … 6.447 s 10 runs For casual use as in reftables I doubt the difference is even measurable. But this result implies that perhaps we ought to just be using the fallback version all the time. -Peff

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes: > Then I did the same, but building with -DNO_UNALIGNED_LOADS. The latter > actually ran faster, by a small margin. Here are the hyperfine results: > > [stock] > Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s] > Range (min … max): 6.550 s … 6.841 s 10 runs > > [-DNO_UNALIGNED_LOADS] > Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s] > Range (min … max): 6.394 s … 6.447 s 10 runs > > For casual use as in reftables I doubt the difference is even > measurable. But this result implies that perhaps we ought to just be > using the fallback version all the time. I like that one. One less configurable knob that makes us execute different codepaths is one less thing to be worried about.

gitgitgadget-git · 2020-09-21T05:15:39Z

This patch series was integrated into seen via 7b0ab69.

gitgitgadget-git · 2020-09-21T13:16:08Z

User Han-Wen Nienhuys <hanwen@google.com> has been added to the cc: list.

gitgitgadget-git · 2020-09-21T23:12:06Z

This patch series was integrated into seen via d1726fb.

gitgitgadget-git · 2020-09-22T06:20:42Z

This patch series was integrated into seen via 0c5022d.

gitgitgadget-git · 2020-09-22T17:58:59Z

This patch series was integrated into seen via 3b8fedc.

gitgitgadget-git · 2020-09-22T21:27:45Z

This patch series was integrated into seen via f5586d7.

gitgitgadget-git · 2020-09-23T18:01:14Z

This patch series was integrated into seen via 75d9f0d.

gitgitgadget-git · 2020-09-24T07:23:28Z

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget-git · 2020-09-25T00:22:29Z

This patch series was integrated into seen via 4ec7939.

gitgitgadget-git · 2020-09-25T21:05:49Z

This patch series was integrated into seen via 0875b0a.

gitgitgadget-git · 2020-09-25T23:01:18Z

This patch series was integrated into seen via 8bb79dc.

gitgitgadget-git · 2020-09-27T00:51:41Z

This patch series was integrated into seen via 503b919.

gitgitgadget-git · 2020-09-27T22:15:53Z

This patch series was integrated into seen via 9312a58.

gitgitgadget-git · 2020-09-28T06:25:00Z

This patch series was integrated into seen via 4d8ca35.

gitgitgadget-git · 2020-09-28T23:14:28Z

This patch series was integrated into seen via 8cb3f1f.

gitgitgadget-git · 2020-09-29T22:03:05Z

This patch series was integrated into seen via 98d584a.

The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

The reftable format includes support for an (OID => ref) map. This map can speed up visibility and reachability checks. In particular, various operations along the fetch/push path within Gerrit have ben sped up by using this structure. The map is constructed with help of a binary tree. Object IDs are hashes, so they are uniformly distributed. Hence, the tree does not attempt forced rebalancing. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

This supports reading a single reftable file. The commit introduces an abstract iterator type, which captures the usecases both of reading individual refs, and iterating over a segment of the ref namespace. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

With support for reading and writing files in place, we can construct files (in memory) and attempt to read them back. Because some sections of the format are optional (eg. indices, log entries), we have to exercise this code using multiple sizes of input data Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

This is needed to create a merged view multiple reftables Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

This adds an abstract, read-only interface to the ref database. This primitive is used to construct the read view of the ref database (the read view is constructed by merging several *.ref files). It also provides the mechanism to provide a unified view of the refs in the main repository and the per-worktree refs. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

The packed/loose format has restrictions on refnames: a and a/b cannot coexist. This limitation does not apply to reftable per se, but must be maintained for interoperability. This code adds validation routines to abort transactions that are trying to add invalid names. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

provide a command-line utility for inspecting individual tables, and inspecting a complete ref database Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

For background, see Documentation/technical/reftable.txt. This introduces the file refs/reftable-backend.c containing a reftable-powered ref storage backend. It can be activated by setting GIT_TEST_REFTABLE in the environment. When GIT_TEST_REFTABLE is set, the test prerequisite !REFFILES is set. There is no option to git-init for now, as the test suite still shows failures with GIT_TEST_REFTABLE=1. Example use: see t/t0031-reftable.sh Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Junio Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Co-authored-by: Jeff King <peff@peff.net>

In our git-prompt script we strive to use Bash builtins wherever possible, because fork()-ing subshells for command substitutions and fork()+exec()-ing Git commands are expensive on some platforms. We even read and parse '.git/HEAD' using Bash builtins to get the name of the current branch [1]. However, the upcoming reftable refs backend won't use '.git/HEAD' at all, but will write an invalid refname as placeholder for backwards compatibility instead, which will break our git-prompt script. Update the git-prompt script to recognize the placeholder '.git/HEAD' written by the reftable backend (its content is specified in the reftable specs), and then fall back to use 'git symbolic-ref' to get the name of the current branch. [1] 3a43c4b (bash prompt: use bash builtins to find out current branch, 2011-03-31) Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>

This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

* Reftable for now lacks detailed error messages for directory/file conflicts. Skip message comparisons. * Mark tests that muck with .git directly as REFFILES. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

gitgitgadget-git · 2021-07-07T01:21:05Z

There was a status update in the "Stalled" section about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget-git · 2021-07-07T01:58:32Z

This patch series was integrated into seen via 92fd260.

gitgitgadget-git · 2021-07-07T01:58:33Z

This patch series was integrated into next via 0d6f79d.

gitgitgadget-git · 2021-07-08T21:28:15Z

This patch series was integrated into seen via 171a6b6.

gitgitgadget-git · 2021-07-08T22:32:09Z

There was a status update in the "Stalled" section about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget-git · 2021-07-14T00:51:31Z

This patch series was integrated into seen via c9780bb.

gitgitgadget-git · 2021-07-14T00:51:32Z

This patch series was integrated into next via c9780bb.

gitgitgadget-git · 2021-07-14T00:51:32Z

This patch series was integrated into master via c9780bb.

gitgitgadget-git · 2021-07-14T00:51:36Z

Closed via c9780bb.

gitgitgadget-git bot added the seen label Sep 16, 2020

neyzo6226 approved these changes Sep 18, 2020

View reviewed changes

gitgitgadget-git bot reviewed Sep 20, 2020

View reviewed changes

hanwen and others added 17 commits July 6, 2021 15:18

reftable: write reftable files

9a916a5

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

reftable: generic interface to tables

cc18267

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

reftable: add a heap-based priority queue for reftable records

c55448a

This is needed to create a merged view multiple reftables Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

reftable: implement stack, a mutable database of reftable files.

eeaf64f

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

reftable: add dump utility

0f65aaf

provide a command-line utility for inspecting individual tables, and inspecting a complete ref database Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Add "test-tool dump-reftable" command.

9917594

This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

t1301: document what needs to be done for reftable

7db7791

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

t1401,t2011: parameterize HEAD.lock for REFFILES

b1ea7eb

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

t1404: annotate test cases with REFFILES

ee6def5

* Reftable for now lacks detailed error messages for directory/file conflicts. Skip message comparisons. * Mark tests that muck with .git directly as REFFILES. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

gitgitgadget-git bot added the next label Jul 7, 2021

gitgitgadget-git bot added the master label Jul 14, 2021

gitgitgadget-git bot closed this Jul 14, 2021

hanwen mentioned this pull request Jul 19, 2021

Support reftable ref backend for Git #1054

Closed

		@@ -720,6 +720,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
		TEST_BUILTINS_OBJS += test-read-graph.o

Conversation

hanwen commented Sep 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hanwen commented Sep 16, 2020

Uh oh!

gitgitgadget-git bot commented Sep 16, 2020

Uh oh!

gitgitgadget-git bot commented Sep 16, 2020

Uh oh!

gitgitgadget-git bot commented Sep 16, 2020

Uh oh!

gitgitgadget-git bot commented Sep 17, 2020

Uh oh!

gitgitgadget-git bot commented Sep 17, 2020

Uh oh!

gitgitgadget-git bot commented Sep 17, 2020

Uh oh!

gitgitgadget-git bot commented Sep 18, 2020

Uh oh!

gitgitgadget-git bot commented Sep 18, 2020

Uh oh!

gitgitgadget-git bot commented Sep 19, 2020

Uh oh!

gitgitgadget-git bot commented Sep 19, 2020

Uh oh!

gitgitgadget-git bot Sep 20, 2020

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 21, 2020

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 24, 2020

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 24, 2020

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 24, 2020

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot commented Sep 21, 2020

Uh oh!

gitgitgadget-git bot commented Sep 21, 2020

Uh oh!

gitgitgadget-git bot commented Sep 21, 2020

Uh oh!

gitgitgadget-git bot commented Sep 22, 2020

Uh oh!

gitgitgadget-git bot commented Sep 22, 2020

Uh oh!

gitgitgadget-git bot commented Sep 22, 2020

Uh oh!

gitgitgadget-git bot commented Sep 23, 2020

Uh oh!

gitgitgadget-git bot commented Sep 24, 2020

Uh oh!

gitgitgadget-git bot commented Sep 25, 2020

Uh oh!

gitgitgadget-git bot commented Sep 25, 2020

Uh oh!

gitgitgadget-git bot commented Sep 25, 2020

Uh oh!

gitgitgadget-git bot commented Sep 27, 2020

Uh oh!

gitgitgadget-git bot commented Sep 27, 2020

Uh oh!

gitgitgadget-git bot commented Sep 28, 2020

Uh oh!

gitgitgadget-git bot commented Sep 28, 2020

Uh oh!

gitgitgadget-git bot commented Sep 29, 2020

Uh oh!

gitgitgadget-git bot commented Jul 7, 2021

Uh oh!

gitgitgadget-git bot commented Jul 7, 2021

Uh oh!

gitgitgadget-git bot commented Jul 7, 2021

Uh oh!

gitgitgadget-git bot commented Jul 8, 2021

hanwen commented Sep 16, 2020 •

edited

Loading