Skip to content

Add range-diff, a tbdiff lookalike#1

Closed
dscho wants to merge 21 commits intogitgitgadget:masterfrom
dscho:branch-diff
Closed

Add range-diff, a tbdiff lookalike#1
dscho wants to merge 21 commits intogitgitgadget:masterfrom
dscho:branch-diff

Conversation

@dscho
Copy link
Copy Markdown
Member

@dscho dscho commented Jun 5, 2018

The incredibly useful git-tbdiff tool to compare patch series (say, to see what changed between two iterations sent to the Git mailing list) is slightly less useful for this developer due to the fact that it requires the hungarian and numpy Python packages which are for some reason really hard to build in MSYS2. So hard that I even had to give up, because it was simply easier to re-implement the whole shebang as a builtin command.

The project at https://github.com/trast/tbdiff seems to be dormant, anyway. Funny (and true) story: I looked at the open Pull Requests to see how active that project is, only to find to my surprise that I had submitted one in August 2015, and that it was still unanswered let alone merged.

While at it, I forward-ported AEvar's patch to force --decorate=no because git -p tbdiff would fail otherwise.

Side note: I work on implementing range-diff not only to make life easier for reviewers who have to suffer through v2, v3, ... of my patch series, but also to verify my changes before submitting a new iteration. And also, maybe even more importantly, I plan to use it to verify my merging-rebases of Git for
Windows (for which I previously used to redirect the pre-rebase/post-rebase diffs vs upstream and then compare them using git diff --no-index). And of course any interested person can see what changes were necessary e.g. in the merging-rebase of Git for Windows onto v2.17.0 by running a command like:

        base=^{/Start.the.merging-rebase}
        tag=v2.17.0.windows.1
        pre=$tag$base^2
        git range-diff $pre$base..$pre $tag$base..$tag

The command uses what it calls the "dual color mode" (can be disabled via --no-dual-color) which helps identifying what actually changed: it prefixes lines with a - (and red background) that correspond to the first commit range, and with a + (and green background) that correspond to the second range. The rest of the lines will be colored according to the original diffs.

Changes since v4:

  • Fixed a typo in the commit message of "range-diff: add tests" that was introduced in v4.
  • White-space fixes.
  • Fixed the length of the first header underline in the man page.
  • Changed the preprocessor guard in linear-assignment.h to reflect the new name (instead of the old name, which was hungarian.h).
  • Likewise, changed the preprocessor guards in range-diff.h to hide the history of the thrice-renamed command.
  • Fixed indentation in the completion.
  • Instead of trying to paper over white-space error handling that does not apply to "diffs of diffs", dual color mode now simply disables all white-space warnings.
  • When showing the "single arg must be symmetric range" error message, git range-diff now also shows the usage.
  • Adjusted the commit message of "range-diff: adjust the output of the commit pairs" to avoid the surprise of the reviewer when onelines are printed all of a sudden, too.
  • "range-diff: adjust the output of the commit pairs" is now using a simpler way to print onelines.
  • We are now sandwiching the diff_opt_parse() loop between two parse_options(), to make sure that we caught all options, and that the -- separator is handled.
  • Adjusted the lookup_commit_reference() call to the newest master (it now takes a the_repository parameter).

Changes since v3:

  • The cover letter was adjusted to reflect the new reality (the command is called range-diff now, not branch-diff, and --dual-color is the default).
  • The documentation was adjusted a bit more in the patch that makes --dual-color the default.
  • Clarified the calculation of the cost matrix, as per Stefan Beller's request.
  • The man page now spells out that merge commits are ignored in the commit ranges (not merges per se).
  • The code in linear-assignment.c was adjusted to use the SWAP() macro.
  • The commit message of the patch introducing the first rudimentary implementation no longer talks about the "Hungarian" algorithm, but about the "linear assignment algorithm" instead.
  • A bogus indentation change was backed out from the patch introducing the first rudimentary implementation.
  • Instead of merely warning about missing .. in the 2-parameter invocation, we now exit with the error message.
  • The diff_opt_parse() function is allowed to return a value larger than 1, indicating that more than just one command-line parameter was parsed. We now advance by the indicated value instead of always advancing exactly 1 (which is still correct much of the time).
  • A lengthy if...else if...else if...else was simplified (from a logical point of view) by reordering it.
  • The unnecessarily static variable dashes was turned into a local variable of the caller.
  • The commit message talking about the new man page still referred to git branch --diff, which has been fixed.
  • A forgotten t7910 reference was changed to t3206.
  • An unbalanced double-tick was fixed in the man page.
  • Fixed grammar both of the commit message and the description of the --no-dual-color option.
  • To fix the build, a blank man page is now introduced together with the new range-diff command, even if it is populated for real only at a later patch (i.e. at the same time as before).
  • The headaches Junio fears would be incurred by that simple workaround to avoid bogus white-space error reporting are fended off: a more complex patch is now in place that adds (and uses) a new white-space flag. Sadly, as is all too common when Junio "encourages" me to replace a simple workaround by something "proper", it caused all kinds of headaches to get this right, so I am rather less certain that the "proper" fix will cause us less headaches than the simple workaround would have done. But whatever.
  • The dual color mode now also dims the changes that are exclusively in the first specified commit range, and uses bold face on the changes exclusively in the second one. This matches the intuition when using range-diff to compare an older iteration of a patch series to a newer one: the changes from the previous iteration that were replaced by new ones "fade", while the changes that replace them are "shiny new".

Changes since v2:

  • Right-aligned the patch numbers in the commit pairs.
  • Used ALLOC_ARRAY() in hungarian.c instead of xmalloc(sizeof()*size).
  • Changed compute_assignment()s return type from int to void, as it always succeeds.
  • Changed the Hungarian Algorithm to use an integer cost matrix.
  • Changed the --creation-weight option to --creation-factor where is an integer.
  • Retitled 1/19 and 2/19 to better conform with the current conventions, as pointed out (and suggested) by Junio.
  • Shut up Coverity, and at the same time avoided passing the unnecessary i and j parameters to output_pair_header().
  • Removed support for the --no-patches option: we inherit diff_options' support for -s already (and much more).
  • Removed the ugly _INV enum values, and introduced a beautiful GIT_COLOR_REVERSE instead. This way, whatever the user configured as color.diff.new (or .old) will be used in reverse in the dual color mode.
  • Instead of overriding the fragment header color, the dual color mode will now reverse the "outer" fragment headers, too.
  • Turned the stand-alone branch-diff command into the --diff option of git branch. Adjusted pretty much all commit messages to account for this. This change should no longer be visible: see below.
  • Pretty much re-wrote the completion, to support the new --diff mode of git-branch. See below: it was reverted for range-diff.
  • Renamed t7910 to t3206, to be closer to the git-branch tests.
  • Ensured that git_diff_ui_config() gets called, and therefore color.diff.* respected.
  • Avoided leaking four_spaces.
  • Fixed a declaration in a for (;;) statement (which Junio had as a fixup! that I almost missed).
  • Renamed branch --diff, which had been renamed from branch-diff (which was picked to avoid re-using tbdiff) to range-diff.
  • Renamed hungarian.c and its header to linear-assignment.c
  • Made --dual-color the default, and changed it to still auto-detect whether color should be used rather than forcing it

@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 5, 2018
@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Jun 7, 2018

An error occurred while submitting:

Error: Branch f8c4e30c63bbfa7efec44ff0f6d0404326723d35 is not rebased to upstream/master

@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
@gitgitgadget gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018
dscho pushed a commit that referenced this pull request Jun 20, 2018
Change "fetch" to treat "+" in refspecs (aka --force) to mean we
should clobber a local tag of the same name.

This changes the long-standing behavior of "fetch" added in
853a369 ("[PATCH] Multi-head fetch.", 2005-08-20), before this
change all tag fetches effectively had --force enabled. The original
rationale in that change was:

    > Tags need not be pointing at commits so there is no way to
    > guarantee "fast-forward" anyway.

That comment and the rest of the history of "fetch" shows that the
"+" (--force) part of refpecs was only conceived for branch updates,
while tags have accepted any changes from upstream unconditionally and
clobbered the local tag object. Changing this behavior has been
discussed as early as 2011[1].

I the current behavior doesn't make sense, it easily results in local
tags accidentally being clobbered. Ideally we'd namespace our tags
per-remote, but as with my 97716d2 ("fetch: add a --prune-tags
option and fetch.pruneTags config", 2018-02-09) it's easier to work
around the current implementation than to fix the root cause, so this
implements suggestion #1 from [1], "fetch" now only clobbers the tag
if either "+" is provided as part of the refspec, or if "--force" is
provided on the command-line.

This also makes it nicely symmetrical with how "tag" itself
works. We'll now refuse to clobber any existing tags unless "--force"
is supplied, whether that clobbering would happen by clobbering a
local tag with "tag", or by fetching it from the remote with "fetch".

It's still not at all nicely symmetrical with how "git push" works, as
discussed in the updated pull-fetch-param.txt documentation, but this
change brings them more into line with one another. I don't think
there's any reason "fetch" couldn't fully converge with the behavior
used by "push", but that's a topic for another change.

One of the tests added in 31b808a ("clone --single: limit the fetch
refspec to fetched branch", 2012-09-20) is being changed to use
--force where a clone would clobber a tag. This changes nothing about
the existing behavior of the test.

1. https://public-inbox.org/git/20111123221658.GA22313@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@dscho dscho force-pushed the branch-diff branch 2 times, most recently from 79ce337 to 4a68b95 Compare June 30, 2018 21:36
@dscho
Copy link
Copy Markdown
Member Author

dscho commented Jul 3, 2018

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Jul 3, 2018

An error occurred while submitting:

Error: git tag -F - -a pr-1/dscho/branch-diff-v3 4a68b95 failed: 128,
fatal: tag 'pr-1/dscho/branch-diff-v3' already exists

@dscho
Copy link
Copy Markdown
Member Author

dscho commented Jul 3, 2018

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Jul 3, 2018

Submitted as pull.1.v3.git.gitgitgadget@gmail.com

@dscho dscho force-pushed the branch-diff branch 3 times, most recently from ebf3fea to d4e27c6 Compare July 9, 2018 20:39
@dscho dscho force-pushed the branch-diff branch 4 times, most recently from 5a2cf0c to d8498fb Compare July 21, 2018 21:40
@dscho
Copy link
Copy Markdown
Member Author

dscho commented Jul 21, 2018

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Jul 21, 2018

Submitted as pull.1.v4.git.gitgitgadget@gmail.com

dscho pushed a commit that referenced this pull request Aug 17, 2018
When "git rebase -i" is told to squash two or more commits into
one, it labeled the log message for each commit with its number.
It correctly called the first one "1st commit", but the next one
was "commit #1", which was off-by-one.  This has been corrected.

* pw/rebase-i-squash-number-fix:
  rebase -i: fix numbering in squash message
dscho pushed a commit that referenced this pull request Aug 17, 2018
When "git rebase -i" is told to squash two or more commits into
one, it labeled the log message for each commit with its number.
It correctly called the first one "1st commit", but the next one
was "commit #1", which was off-by-one.  This has been corrected.

* pw/rebase-i-squash-number-fix:
  rebase -i: fix numbering in squash message
dscho pushed a commit that referenced this pull request Aug 20, 2018
When "git rebase -i" is told to squash two or more commits into
one, it labeled the log message for each commit with its number.
It correctly called the first one "1st commit", but the next one
was "commit #1", which was off-by-one.  This has been corrected.

* pw/rebase-i-squash-number-fix:
  rebase -i: fix numbering in squash message
dscho pushed a commit that referenced this pull request Aug 23, 2018
The verbose output of the test 'reword without issues functions as
intended' in 't3423-rebase-reword.sh', added in a9279c6 (sequencer:
do not squash 'reword' commits when we hit conflicts, 2018-06-19),
contains the following error output:

  sed: -e expression #1, char 2: extra characters after command

This error comes from within the 'fake-editor.sh' script created by
'lib-rebase.sh's set_fake_editor() function, and the root cause is the
FAKE_LINES="pick 1 reword 2" variable in the test in question, in
particular the "pick" word.  'fake-editor.sh' assumes 'pick' to be the
default rebase command and doesn't support an explicit 'pick' command
in FAKE_LINES.  As a result, 'pick' will be used instead of a line
number when assembling the following 'sed' script:

  sed -n picks/^pick/pick/p

which triggers the aforementioned error.

Luckily, this didn't affect the test's correctness: the erroring 'sed'
command doesn't write anything to the todo script, and processing the
rest of FAKE_LINES generates the desired todo script, as if that
'pick' command were not there at all.

The minimal fix would be to remove the 'pick' word from FAKE_LINES,
but that would leave us susceptible to similar issues in the future.

Instead, teach the fake-editor script to recognize an explicit 'pick'
command, which is still a fairly trivial change.

In the future we might want to consider reinforcing this fake editor
script with an &&-chain and stricter parsing of the FAKE_LINES
variable (e.g. to error out when encountering unknown rebase commands
or commands and line numbers in the wrong order).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Aug 31, 2018
Change "fetch" to treat "+" in refspecs (aka --force) to mean we
should clobber a local tag of the same name.

This changes the long-standing behavior of "fetch" added in
853a369 ("[PATCH] Multi-head fetch.", 2005-08-20). Before this
change, all tag fetches effectively had --force enabled. See the
git-fetch-script code in fast_forward_local() with the comment:

    > Tags need not be pointing at commits so there is no way to
    > guarantee "fast-forward" anyway.

That commit and the rest of the history of "fetch" shows that the
"+" (--force) part of refpecs was only conceived for branch updates,
while tags have accepted any changes from upstream unconditionally and
clobbered the local tag object. Changing this behavior has been
discussed as early as 2011[1].

The current behavior doesn't make sense to me, it easily results in
local tags accidentally being clobbered. We could namespace our tags
per-remote and not locally populate refs/tags/*, but as with my
97716d2 ("fetch: add a --prune-tags option and fetch.pruneTags
config", 2018-02-09) it's easier to work around the current
implementation than to fix the root cause.

So this change implements suggestion #1 from Jeff's 2011 E-Mail[1],
"fetch" now only clobbers the tag if either "+" is provided as part of
the refspec, or if "--force" is provided on the command-line.

This also makes it nicely symmetrical with how "tag" itself works when
creating tags. I.e. we refuse to clobber any existing tags unless
"--force" is supplied. Now we can refuse all such clobbering, whether
it would happen by clobbering a local tag with "tag", or by fetching
it from the remote with "fetch".

Ref updates outside refs/{tags,heads/* are still still not symmetrical
with how "git push" works, as discussed in the recently changed
pull-fetch-param.txt documentation. This change brings the two
divergent behaviors more into line with one another. I don't think
there's any reason "fetch" couldn't fully converge with the behavior
used by "push", but that's a topic for another change.

One of the tests added in 31b808a ("clone --single: limit the fetch
refspec to fetched branch", 2012-09-20) is being changed to use
--force where a clone would clobber a tag. This changes nothing about
the existing behavior of the test.

1. https://public-inbox.org/git/20111123221658.GA22313@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Sep 11, 2018
check_one_conflict() compares `i` to `active_nr` in two places to avoid
buffer overruns, but left out an important third location.

The code did used to have a check here comparing i to active_nr, back
before commit fb70a06 ("rerere: fix an off-by-one non-bug",
2015-06-28), however the code at the time used an 'if' rather than a
'while' meaning back then that this loop could not have read past the
end of the array, making the check unnecessary and it was removed.
Unfortunately, in commit 5eda906 ("rerere: handle conflicts with
multiple stage #1 entries", 2015-07-24), the 'if' was changed to a
'while' and the check comparing i and active_nr was not re-instated,
leading to this problem.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Sep 27, 2018
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:

  1. Split the index while adding a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly.  Even though the cache entry's
     stat data matches with the modifid file in the worktree,
     subsequent git commands will notice that the (split) index and
     the file have the same mtime, and then will go on to check the
     file's content and notice its dirtiness.

  2. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly.  After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet splitted index)
     contains a racily clean cache entry, i.e. an entry whose cached
     stat data matches with the corresponding file in the worktree and
     the cached mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly.  The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well.  This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0.  When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match with the
     file in the worktree, and then go on to check the file's content
     and notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.  After the second 'git
     update-index' the newly added file's cache entry is only stored
     in the split index.  If a cache entry is present in the split
     index (even if it is a replacement of an outdated entry in the
     shared index), then it will always be included in the new split
     index on subsequent split index updates (until the file is
     removed or a new shared index is written), independently from
     whether the entry is racily clean or not.  When do_write_index()
     writes the new split index, it notices the racily clean cache
     entry, and smudges its stat date.  Subsequent git commands
     reading the index will notice the smudged stat data and then go
     on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
     only in the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem.  In the
     second 'git update-index' prepare_to_write_split_index() decides,
     among other things, which cache entries stored only in the shared
     index should be replaced in the new split index.  Alas, this
     function never looks out for racily clean cache entries, and
     since the file's stat data in the worktree hasn't changed since
     the shared index was written, the entry won't be replaced in the
     new split index.  Consequently, do_write_index() doesn't even get
     this racily clean cache entry, and can't smudge its stat data.
     Subsequent git commands will then see that the index has more
     recent mtime than the file and that the (not smudged) cached stat
     data still matches with the file in the worktree, and,
     ultimately, will erroneously consider the file clean.

  6. Update the split index after unpack_trees() copied a racily clean
     cache entry from the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git read-tree -m HEAD

     This case fails due to the racy split index problem.  This
     basically fails for the same reason as case #5 above, but there
     is one important difference, which warrants the dedicated test.
     While that second 'git update-index' in case #5 updates
     index_state in place, in this case 'git read-tree -m' calls
     unpack_trees(), which throws out the entire index, and constructs
     a new one from the (potentially updated) copies of the original's
     cache entries.  Consequently, when prepare_to_write_split_index()
     gets to work on this reconstructed index, it takes a different
     code path than in case #5 when deciding which cache entries in
     the shared index should be replaced.  The result is the same,
     though: the racily clean cache entry goes unnoticed, it isn't
     added to the split index with smudged stat data, and subsequent
     git commands will then erroneously consider the file clean.

Note that in the last two 'test_expect_failure' cases I omitted the
'#' (as in nr. of trial) from the tests' name on purpose for now, as
it confuses 'prove' into thinking that those tests failed
unexpectedly.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Sep 28, 2018
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:

  1. Split the index while adding a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly.  Even though the cache entry's
     stat data matches with the modifid file in the worktree,
     subsequent git commands will notice that the (split) index and
     the file have the same mtime, and then will go on to check the
     file's content and notice its dirtiness.

  2. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly.  After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet splitted index)
     contains a racily clean cache entry, i.e. an entry whose cached
     stat data matches with the corresponding file in the worktree and
     the cached mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly.  The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well.  This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0.  When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match with the
     file in the worktree, and then go on to check the file's content
     and notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.  After the second 'git
     update-index' the newly added file's cache entry is only stored
     in the split index.  If a cache entry is present in the split
     index (even if it is a replacement of an outdated entry in the
     shared index), then it will always be included in the new split
     index on subsequent split index updates (until the file is
     removed or a new shared index is written), independently from
     whether the entry is racily clean or not.  When do_write_index()
     writes the new split index, it notices the racily clean cache
     entry, and smudges its stat date.  Subsequent git commands
     reading the index will notice the smudged stat data and then go
     on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
     only in the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem.  In the
     second 'git update-index' prepare_to_write_split_index() decides,
     among other things, which cache entries stored only in the shared
     index should be replaced in the new split index.  Alas, this
     function never looks out for racily clean cache entries, and
     since the file's stat data in the worktree hasn't changed since
     the shared index was written, the entry won't be replaced in the
     new split index.  Consequently, do_write_index() doesn't even get
     this racily clean cache entry, and can't smudge its stat data.
     Subsequent git commands will then see that the index has more
     recent mtime than the file and that the (not smudged) cached stat
     data still matches with the file in the worktree, and,
     ultimately, will erroneously consider the file clean.

  6. Update the split index after unpack_trees() copied a racily clean
     cache entry from the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git read-tree -m HEAD

     This case fails due to the racy split index problem.  This
     basically fails for the same reason as case #5 above, but there
     is one important difference, which warrants the dedicated test.
     While that second 'git update-index' in case #5 updates
     index_state in place, in this case 'git read-tree -m' calls
     unpack_trees(), which throws out the entire index, and constructs
     a new one from the (potentially updated) copies of the original's
     cache entries.  Consequently, when prepare_to_write_split_index()
     gets to work on this reconstructed index, it takes a different
     code path than in case #5 when deciding which cache entries in
     the shared index should be replaced.  The result is the same,
     though: the racily clean cache entry goes unnoticed, it isn't
     added to the split index with smudged stat data, and subsequent
     git commands will then erroneously consider the file clean.

Note that in the last two 'test_expect_failure' cases I omitted the
'#' (as in nr. of trial) from the tests' name on purpose for now, as
it confuses 'prove' into thinking that those tests failed
unexpectedly.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Oct 5, 2018
Reimplement oidset using khash.h in order to reduce its memory footprint
and make it faster.

Performance of a command that mainly checks for duplicate objects using
an oidset, with master and Clang 6.0.1:

  $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"

  $ /usr/bin/time $cmd >/dev/null
  0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
  0inputs+0outputs (0major+11204minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'

    Time (mean ± σ):     250.0 ms ±   6.0 ms    [User: 225.9 ms, System: 23.6 ms]

    Range (min … max):   242.0 ms … 261.1 ms

And with this patch:

  $ /usr/bin/time $cmd >/dev/null
  0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
  0inputs+0outputs (0major+8318minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'

    Time (mean ± σ):     151.9 ms ±   4.9 ms    [User: 130.5 ms, System: 21.2 ms]

    Range (min … max):   148.2 ms … 170.4 ms

Initial-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Oct 12, 2018
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:

  1. Split the index while adding a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly.  Even though the cache entry's
     stat data matches with the modifid file in the worktree,
     subsequent git commands will notice that the (split) index and
     the file have the same mtime, and then will go on to check the
     file's content and notice its dirtiness.

  2. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly.  After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet splitted index)
     contains a racily clean cache entry, i.e. an entry whose cached
     stat data matches with the corresponding file in the worktree and
     the cached mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly.  The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well.  This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0.  When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match with the
     file in the worktree, and then go on to check the file's content
     and notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.  After the second 'git
     update-index' the newly added file's cache entry is only stored
     in the split index.  If a cache entry is present in the split
     index (even if it is a replacement of an outdated entry in the
     shared index), then it will always be included in the new split
     index on subsequent split index updates (until the file is
     removed or a new shared index is written), independently from
     whether the entry is racily clean or not.  When do_write_index()
     writes the new split index, it notices the racily clean cache
     entry, and smudges its stat date.  Subsequent git commands
     reading the index will notice the smudged stat data and then go
     on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
     only in the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem.  In the
     second 'git update-index' prepare_to_write_split_index() decides,
     among other things, which cache entries stored only in the shared
     index should be replaced in the new split index.  Alas, this
     function never looks out for racily clean cache entries, and
     since the file's stat data in the worktree hasn't changed since
     the shared index was written, the entry won't be replaced in the
     new split index.  Consequently, do_write_index() doesn't even get
     this racily clean cache entry, and can't smudge its stat data.
     Subsequent git commands will then see that the index has more
     recent mtime than the file and that the (not smudged) cached stat
     data still matches with the file in the worktree, and,
     ultimately, will erroneously consider the file clean.

  6. Update the split index after unpack_trees() copied a racily clean
     cache entry from the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git read-tree -m HEAD

     This case fails due to the racy split index problem.  This
     basically fails for the same reason as case #5 above, but there
     is one important difference, which warrants the dedicated test.
     While that second 'git update-index' in case #5 updates
     index_state in place, in this case 'git read-tree -m' calls
     unpack_trees(), which throws out the entire index, and constructs
     a new one from the (potentially updated) copies of the original's
     cache entries.  Consequently, when prepare_to_write_split_index()
     gets to work on this reconstructed index, it takes a different
     code path than in case #5 when deciding which cache entries in
     the shared index should be replaced.  The result is the same,
     though: the racily clean cache entry goes unnoticed, it isn't
     added to the split index with smudged stat data, and subsequent
     git commands will then erroneously consider the file clean.

Note that in the last two 'test_expect_failure' cases I omitted the
'#' (as in nr. of trial) from the tests' description on purpose for
now, as it breakes the TAP output [2]; it will be added at the end of
the series, when those two tests will be flipped to
'test_expect_success'.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

[2] In the TAP output a '#' should separate the test's description
    from the TODO directive emitted by 'test_expect_failure'.  The
    additional '#' in "#$trial" interferes with this, the test harness
    won't recognize the TODO directive, and will report that those
    tests failed unexpectedly.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Copy link
Copy Markdown
Member Author

dscho commented Nov 1, 2018

This patch series was integrated into pu via git@81eab68.

Copy link
Copy Markdown
Member Author

dscho commented Nov 1, 2018

This patch series was integrated into next via git@81eab68.

Copy link
Copy Markdown
Member Author

dscho commented Nov 1, 2018

This patch series was integrated into master via git@81eab68.

Copy link
Copy Markdown
Member Author

dscho commented Nov 1, 2018

This patch series was integrated into maint via git@81eab68.

Copy link
Copy Markdown
Member Author

dscho commented Nov 1, 2018

Closed via 81eab68.

@dscho dscho deleted the branch-diff branch November 1, 2018 23:02
@@ -870,6 +870,7 @@ LIB_OBJS += gpg-interface.o
LIB_OBJS += graph.o
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +		else if (!line.buf[0] || starts_with(line.buf, "index "))
> +			/*
> +			 * A completely blank (not ' \n', which is context)
> +			 * line is not valid in a diff.  We skip it

I noticed this while wondering how somebody could teach range-diff
to honor --notes=amlog while preparing the patches to be compared
[*1*], but this assumption goes against what POSIX.1 says these
days.

    It is implementation-defined whether an empty unaffected line is
    written as an empty line or a line containing a single <space> character.

cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html#tag_20_34_10_07

We need to insert ", as we disable user's diff.suppressBlankEmpty
settings" before ".  We skip it" (and if we get affected by the
setting, we need to fix it; it is not ultra-urgent, though).

[Footnote]

*1* ... which I do not have a good answer to, yet.  As discussed
earlier, the diffopt passed into the show_range_diff() machinery is
primarily meant for the final output (i.e. how the matching patches
from the two iterations are compared) and not about how the patches
to be compared are generated.  Worse, --notes=amlog (and possibly
other useful options) are parsed by "git log" side of the machinery,
not "git diff" side that populates diffopt.

dscho pushed a commit that referenced this pull request Mar 18, 2019
One-way merge is supposed to take stat info from the index and
everything else from the given tree. This implies stage 0 because trees
can't have non-zero stages. The add_entry(.., old, ...) call however
will keep stage index from the index.

This is normally not a problem if the entry from the index is
normal (stage #0). But if there is a conflict, we'll get stage #1 entry
as "old" and it gets recorded in the final index. Fix it by clearing
stage mask.

This bug probably comes from b5b4250 (git-read-tree: make one-way
merge also honor the "update" flag, 2005-06-07). Before this commit, we
may create the final ("dst") index entry from the one in index, but we
do clear CE_STAGEMASK.

I briefly checked two- and three-way merge functions. I think we don't
have the same problem in those.

Reported-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Mar 19, 2019
Phillip found out that 'git checkout -f <branch>' does not restore
conflict/unmerged files correctly. All tracked files should be taken
from <branch> and all non-zero stages removed. Most of this is true,
except that the final file could be in stage one instead of zero.

"checkout -f" (among other commands) does this with one-way merge, which
is supposed to take stat info from the index and everything else from
the given tree. The add_entry(.., old, ...) call in oneway_merge()
though will keep stage index from the index.

This is normally not a problem if the entry from the index is
normal (stage #0). But if there is a conflict, stage #0 does not exist
and we'll get stage #1 entry as "old" variable, which gets recorded in
the final index. Fix it by clearing stage mask.

This bug probably comes from b5b4250 (git-read-tree: make one-way
merge also honor the "update" flag, 2005-06-07). Before this commit, we
may create the final ("dst") index entry from the one in index, but we
do clear CE_STAGEMASK.

I briefly checked two- and three-way merge functions. I think we don't
have the same problem in those.

Reported-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit that referenced this pull request Apr 10, 2019
Phillip found out that 'git checkout -f <branch>' does not restore
conflict/unmerged files correctly. All tracked files should be taken
from <branch> and all non-zero stages removed. Most of this is true,
except that the final file could be in stage one instead of zero.

"checkout -f" (among other commands) does this with one-way merge, which
is supposed to take stat info from the index and everything else from
the given tree. The add_entry(.., old, ...) call in oneway_merge()
though will keep stage index from the index.

This is normally not a problem if the entry from the index is
normal (stage #0). But if there is a conflict, stage #0 does not exist
and we'll get stage #1 entry as "old" variable, which gets recorded in
the final index. Fix it by clearing stage mask.

This bug probably comes from b5b4250 (git-read-tree: make one-way
merge also honor the "update" flag, 2005-06-07). Before this commit, we
may create the final ("dst") index entry from the one in index, but we
do clear CE_STAGEMASK.

I briefly checked two- and three-way merge functions. I think we don't
have the same problem in those.

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@derrickstolee derrickstolee mentioned this pull request Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants