Conversation
|
/submit |
|
Submitted as pull.847.git.git.1600283416.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This branch is now known as |
|
This patch series was integrated into seen via c00808e. |
|
This patch series was integrated into seen via 4f1956c. |
|
This patch series was integrated into seen via f9f83c1. |
|
This patch series was integrated into seen via 488ad6a. |
|
This patch series was integrated into seen via ab283b9. |
|
This patch series was integrated into seen via a0bd498. |
|
This patch series was integrated into seen via 0417227. |
|
This patch series was integrated into seen via e227917. |
| @@ -720,6 +720,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o | |||
| TEST_BUILTINS_OBJS += test-read-graph.o | |||
There was a problem hiding this comment.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static void reftable_log_record_key(const void *r, struct strbuf *dest)
> +{
> + const struct reftable_log_record *rec =
> + (const struct reftable_log_record *)r;
> + int len = strlen(rec->refname);
> + uint8_t i64[8];
> + uint64_t ts = 0;
> + strbuf_reset(dest);
> + strbuf_add(dest, (uint8_t *)rec->refname, len + 1);
> +
> + ts = (~ts) - rec->update_index;
> + put_be64(&i64[0], ts);
> + strbuf_add(dest, i64, sizeof(i64));
> +}
We seem to be getting
reftable/record.c: In function 'reftable_log_record_key':
reftable/record.c:578:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
put_be64(&i64[0], ts);
^
CC reftable/refname.o
when this series is merged to 'seen'.
cf. e.g. https://travis-ci.org/github/git/git/jobs/728655368
Thanks.
There was a problem hiding this comment.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Sun, Sep 20, 2020 at 3:00 AM Junio C Hamano <gitster@pobox.com> wrote:
> > +static void reftable_log_record_key(const void *r, struct strbuf *dest)
> > +{
> > + const struct reftable_log_record *rec =
> > + (const struct reftable_log_record *)r;
> > + int len = strlen(rec->refname);
> > + uint8_t i64[8];
> > + uint64_t ts = 0;
> > + strbuf_reset(dest);
> > + strbuf_add(dest, (uint8_t *)rec->refname, len + 1);
> > +
> > + ts = (~ts) - rec->update_index;
> > + put_be64(&i64[0], ts);
> > + strbuf_add(dest, i64, sizeof(i64));
> > +}
>
> We seem to be getting
>
> reftable/record.c: In function 'reftable_log_record_key':
> reftable/record.c:578:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
> put_be64(&i64[0], ts);
> ^
> CC reftable/refname.o
>
> when this series is merged to 'seen'.
Thanks for bringing this up. I did see this before, but I've been
unable to reproduce this locally, and I forgot about it.
The problem is actually triggered by the Git-provided put_be64()
(which appears unused in the Git source code). The definition
#define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0)
is trying to reinterpret a char* as a uint64_t* , which is illegal in
strict aliasing rules? I originally had
+void put_be64(uint8_t *out, uint64_t v)
+{
+ int i = sizeof(uint64_t);
+ while (i--) {
+ out[i] = (uint8_t)(v & 0xff);
+ v >>= 8;
+ }
+}
in my reftable library, which is portable. Is there a reason for the
magic with htonll and friends?
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
On the Git mailing list, Jeff King wrote (reply to this):
On Mon, Sep 21, 2020 at 03:13:05PM +0200, Han-Wen Nienhuys wrote:
> The problem is actually triggered by the Git-provided put_be64()
> (which appears unused in the Git source code). The definition
>
> #define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0)
>
> is trying to reinterpret a char* as a uint64_t* , which is illegal in
> strict aliasing rules?
Ah, thanks for finding that. I think this resolves the mystery I had
around aliasing warnings with put_be32() in f39ad38410 (fast-export: use
local array to store anonymized oid, 2020-06-25). I got confused because
when I looked at the implementation of put_be32(), I stupidly looked at
the NO_ALIGNED_LOADS fallback code, which indeed does not alias. But the
one we use on x86 does.
I thought this would be OK because C allows aliasing through "char *"
pointers. But I think it may only go the other way. I.e., you can
type-pun a uint64_t as a char, but not the other way around. But then
why doesn't basically every use of put_be32() and put_be64() complain?
> I originally had
>
> +void put_be64(uint8_t *out, uint64_t v)
> +{
> + int i = sizeof(uint64_t);
> + while (i--) {
> + out[i] = (uint8_t)(v & 0xff);
> + v >>= 8;
> + }
> +}
>
> in my reftable library, which is portable. Is there a reason for the
> magic with htonll and friends?
Presumably it was thought to be faster. This comes originally from the
block-sha1 code in 660231aa97 (block-sha1: support for architectures
with memory alignment restrictions, 2009-08-12). I don't know how it
compares in practice, and especially these days.
Our fallback routines are similar to an unrolled version of what you
wrote above.
-Peff
There was a problem hiding this comment.
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, Sep 24, 2020 at 03:21:51AM -0400, Jeff King wrote:
> > I originally had
> >
> > +void put_be64(uint8_t *out, uint64_t v)
> > +{
> > + int i = sizeof(uint64_t);
> > + while (i--) {
> > + out[i] = (uint8_t)(v & 0xff);
> > + v >>= 8;
> > + }
> > +}
> >
> > in my reftable library, which is portable. Is there a reason for the
> > magic with htonll and friends?
>
> Presumably it was thought to be faster. This comes originally from the
> block-sha1 code in 660231aa97 (block-sha1: support for architectures
> with memory alignment restrictions, 2009-08-12). I don't know how it
> compares in practice, and especially these days.
>
> Our fallback routines are similar to an unrolled version of what you
> wrote above.
We should be able to measure it pretty easily, since block-sha1 uses a
lot of get_be32/put_be32. I generated a 4GB random file, built with
BLK_SHA1=Yes and -O2, and timed:
t/helper/test-tool sha1 <foo.rand
Then I did the same, but building with -DNO_UNALIGNED_LOADS. The latter
actually ran faster, by a small margin. Here are the hyperfine results:
[stock]
Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s]
Range (min … max): 6.550 s … 6.841 s 10 runs
[-DNO_UNALIGNED_LOADS]
Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s]
Range (min … max): 6.394 s … 6.447 s 10 runs
For casual use as in reftables I doubt the difference is even
measurable. But this result implies that perhaps we ought to just be
using the fallback version all the time.
-Peff
There was a problem hiding this comment.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff King <peff@peff.net> writes:
> Then I did the same, but building with -DNO_UNALIGNED_LOADS. The latter
> actually ran faster, by a small margin. Here are the hyperfine results:
>
> [stock]
> Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s]
> Range (min … max): 6.550 s … 6.841 s 10 runs
>
> [-DNO_UNALIGNED_LOADS]
> Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s]
> Range (min … max): 6.394 s … 6.447 s 10 runs
>
> For casual use as in reftables I doubt the difference is even
> measurable. But this result implies that perhaps we ought to just be
> using the fallback version all the time.
I like that one. One less configurable knob that makes us execute
different codepaths is one less thing to be worried about.
|
This patch series was integrated into seen via 7b0ab69. |
|
User |
|
This patch series was integrated into seen via d1726fb. |
|
This patch series was integrated into seen via 0c5022d. |
|
This patch series was integrated into seen via 3b8fedc. |
|
This patch series was integrated into seen via f5586d7. |
|
This patch series was integrated into seen via 75d9f0d. |
|
User |
|
This patch series was integrated into seen via 4ec7939. |
|
This patch series was integrated into seen via 0875b0a. |
|
This patch series was integrated into seen via 8bb79dc. |
|
This patch series was integrated into seen via 503b919. |
|
This patch series was integrated into seen via 9312a58. |
|
This patch series was integrated into seen via 4d8ca35. |
|
This patch series was integrated into seen via 8cb3f1f. |
|
This patch series was integrated into seen via 98d584a. |
The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The reftable format includes support for an (OID => ref) map. This map can speed up visibility and reachability checks. In particular, various operations along the fetch/push path within Gerrit have ben sped up by using this structure. The map is constructed with help of a binary tree. Object IDs are hashes, so they are uniformly distributed. Hence, the tree does not attempt forced rebalancing. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This supports reading a single reftable file. The commit introduces an abstract iterator type, which captures the usecases both of reading individual refs, and iterating over a segment of the ref namespace. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
With support for reading and writing files in place, we can construct files (in memory) and attempt to read them back. Because some sections of the format are optional (eg. indices, log entries), we have to exercise this code using multiple sizes of input data Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This is needed to create a merged view multiple reftables Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This adds an abstract, read-only interface to the ref database. This primitive is used to construct the read view of the ref database (the read view is constructed by merging several *.ref files). It also provides the mechanism to provide a unified view of the refs in the main repository and the per-worktree refs. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The packed/loose format has restrictions on refnames: a and a/b cannot coexist. This limitation does not apply to reftable per se, but must be maintained for interoperability. This code adds validation routines to abort transactions that are trying to add invalid names. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
provide a command-line utility for inspecting individual tables, and inspecting a complete ref database Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
For background, see Documentation/technical/reftable.txt. This introduces the file refs/reftable-backend.c containing a reftable-powered ref storage backend. It can be activated by setting GIT_TEST_REFTABLE in the environment. When GIT_TEST_REFTABLE is set, the test prerequisite !REFFILES is set. There is no option to git-init for now, as the test suite still shows failures with GIT_TEST_REFTABLE=1. Example use: see t/t0031-reftable.sh Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Junio Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Co-authored-by: Jeff King <peff@peff.net>
In our git-prompt script we strive to use Bash builtins wherever possible, because fork()-ing subshells for command substitutions and fork()+exec()-ing Git commands are expensive on some platforms. We even read and parse '.git/HEAD' using Bash builtins to get the name of the current branch [1]. However, the upcoming reftable refs backend won't use '.git/HEAD' at all, but will write an invalid refname as placeholder for backwards compatibility instead, which will break our git-prompt script. Update the git-prompt script to recognize the placeholder '.git/HEAD' written by the reftable backend (its content is specified in the reftable specs), and then fall back to use 'git symbolic-ref' to get the name of the current branch. [1] 3a43c4b (bash prompt: use bash builtins to find out current branch, 2011-03-31) Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
* Reftable for now lacks detailed error messages for directory/file conflicts. Skip message comparisons. * Mark tests that muck with .git directly as REFFILES. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
|
There was a status update in the "Stalled" section about the branch The "reftable" backend for the refs API. |
|
This patch series was integrated into seen via 92fd260. |
|
This patch series was integrated into next via 0d6f79d. |
|
This patch series was integrated into seen via 171a6b6. |
|
There was a status update in the "Stalled" section about the branch The "reftable" backend for the refs API. |
|
This patch series was integrated into seen via c9780bb. |
|
This patch series was integrated into next via c9780bb. |
|
This patch series was integrated into master via c9780bb. |
|
Closed via c9780bb. |
This splits the giant commit from gitgitgadget#539 into a series of smaller commits, which build and have unittests.
Changes relative to last series: version 19 Apr 2021 (tip: b8729fe)
cc: Han-Wen Nienhuys hanwen@google.com
cc: Jeff King peff@peff.net
cc: Ramsay Jones ramsay@ramsayjones.plus.com
cc: Jonathan Nieder jrnieder@gmail.com
cc: Johannes Schindelin Johannes.Schindelin@gmx.de
cc: Jonathan Tan jonathantanmy@google.com
cc: Josh Steadmon steadmon@google.com
cc: Emily Shaffer emilyshaffer@google.com
cc: Patrick Steinhardt ps@pks.im
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Felipe Contreras felipe.contreras@gmail.com
cc: Derrick Stolee stolee@gmail.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Han-Wen Nienhuys hanwen@google.comg
cc: "brian m. carlson" sandals@crustytoothpaste.net