
Commit fb63d7f

git-add: make the entry stat-clean after re-adding the same contents
Earlier in commit 0781b8a (add_file_to_index: skip rehashing if the cached stat already matches), add_file_to_index() was taught not to re-add the path if it already matches the index.

The change meant well, but was not executed quite right. It used ie_modified() to see if the file in the work tree is really different from the index, and skipped adding the contents if the function said "not modified".

This was wrong. There are three possible comparison results between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different. E.g. if the length or the owner in the cached stat information differs from what we just obtained from lstat(2), we can tell the file is modified without looking at the actual contents.

 - with lstat(2) we _know_ they are the same. The same length, the same owner, the same everything (but this has a twist, as described below).

 - we cannot tell from lstat(2) information alone and need to go to the filesystem to actually compare.

The last case arises from what we call the 'racy git' situation, which can be caused by this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem timestamp granularity as the first "echo", then the timestamp recorded by "git add" and the timestamp we get from lstat(2) will be the same, and we can mistakenly say the file is not modified. Such a path is called 'racily clean'. We need to reliably detect that racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark the index entries that have the same timestamp as the index file itself (that is, the time from the point of view of the filesystem) to tell any later code that does the lstat(2) comparison not to trust the cached stat info, and ie_modified() then actually goes to the filesystem to compare the contents for such a path.
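The three-way classification above can be sketched in C. This is a simplified model, not git's code: the `stat_info` struct, the `stat_verdict` enum, and `classify()` are hypothetical names, and only size and mtime are compared where git compares many more stat fields. The racy test mirrors the idea described above: an entry whose mtime is not earlier than the index file's own mtime cannot be trusted on stat data alone.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-in for cached stat data; git's real
 * index entry records many more fields (ctime, inode, owner, ...). */
struct stat_info {
    long size;
    time_t mtime;
};

enum stat_verdict {
    STAT_KNOWN_DIFFERENT, /* lstat(2) alone proves the file changed */
    STAT_KNOWN_SAME,      /* lstat(2) alone proves it is unchanged */
    STAT_MUST_CHECK_FS    /* racily clean: contents must be compared */
};

/* Classify a path by comparing the cached stat data with a fresh
 * lstat(2) result, given the mtime of the index file itself. */
static enum stat_verdict classify(const struct stat_info *cached,
                                  const struct stat_info *now,
                                  time_t index_mtime)
{
    assert(cached && now);
    if (cached->size != now->size || cached->mtime != now->mtime)
        return STAT_KNOWN_DIFFERENT;
    /* The stat data matches, but the entry was recorded no earlier than
     * the index file was written, so a same-length rewrite within the
     * same timestamp granularity would be invisible to lstat(2). */
    if (index_mtime && index_mtime <= cached->mtime)
        return STAT_MUST_CHECK_FS;
    return STAT_KNOWN_SAME;
}
```

In this model, only `STAT_MUST_CHECK_FS` would make an ie_modified()-style caller read the file and compare contents; the other two verdicts are settled by stat data alone.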
That's all good, but it should not be used for this "git add" optimization, as the goal of "git add" is to actually update the path in the index and make it stat-clean. With the false optimization, we did _not_ cause any data loss (after all, what we failed to do was only to update the cached stat information), but it made the following sequence leave the file stat-dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified(), which goes to the filesystem to see if the path is really clean, but instead to use ie_match_stat() with the "assume racily clean paths are dirty" option, to force re-adding of such a path.

There was another problem with "git add -u". That codepath shares the same issue when adding the paths that are found to be modified, but in addition, it asked run_diff_files() (the machinery behind "git diff-files") to list the paths that are modified. However, the "git diff-files" machinery uses the same logic as ie_modified(), so it does not report racily clean _and_ actually clean paths as modified, which is not what we want here.

The patch allows the callers of run_diff_files() to pass the same "assume racily clean paths are dirty" option, and makes the "git add -u" codepath use that option, to discover and re-add racily clean _and_ actually clean paths.

We could further optimize on top of this patch to differentiate the case where the path really needs re-adding (i.e. the content of the racily clean entry was indeed different) from the case where only the cached stat information needs to be refreshed (i.e. the racily clean entry was actually clean), but I do not think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
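The difference between the old and new skip decision in "git add" can be modelled with a toy example. Everything here is hypothetical: `file_state`, `old_needs_add()`, `new_needs_add()`, and `re_add()` are illustrative names, a string stands in for the blob that the entry's SHA-1 names, and the decision logic is only loosely modelled on ie_modified() and on ie_match_stat() with the "racy is dirty" option.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical toy model: the contents string stands in for the blob
 * that an index entry's SHA-1 names; git stores far more. */
struct file_state {
    char contents[32];
    long size;
    time_t mtime;
};

static bool stat_differs(const struct file_state *idx,
                         const struct file_state *wt)
{
    return idx->size != wt->size || idx->mtime != wt->mtime;
}

static bool is_racy(const struct file_state *idx, time_t index_mtime)
{
    return index_mtime && index_mtime <= idx->mtime;
}

/* Before the fix (modelled loosely on ie_modified()): whenever the stat
 * data cannot settle the question, compare contents, so "same contents,
 * different stat" is reported as unmodified and the add is skipped. */
static bool old_needs_add(const struct file_state *idx,
                          const struct file_state *wt, time_t index_mtime)
{
    if (!stat_differs(idx, wt) && !is_racy(idx, index_mtime))
        return false;
    return strcmp(idx->contents, wt->contents) != 0;
}

/* After the fix (modelled on ie_match_stat() with "racy is dirty"):
 * any stat difference, or a racily clean entry, forces a re-add. */
static bool new_needs_add(const struct file_state *idx,
                          const struct file_state *wt, time_t index_mtime)
{
    return stat_differs(idx, wt) || is_racy(idx, index_mtime);
}

/* Re-adding records the current contents and stat data in the index. */
static void re_add(struct file_state *idx, const struct file_state *wt)
{
    assert(idx && wt);
    *idx = *wt;
}
```

With invented timestamps (file added at t=100, index written at t=102, file rewritten with the same "hello" at t=110), `old_needs_add()` returns false, so the entry keeps mtime 100 while the file has mtime 110 and stays stat-dirty. `new_needs_add()` returns true, and after `re_add()` the stat data matches again. A racily clean entry with identical contents is likewise skipped by the old rule and re-added by the new one.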
1 parent 4bd5b7d


4 files changed, +8 -3 lines changed


builtin-add.c

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ static void update(int verbose, const char *prefix, const char **files)
 	rev.diffopt.format_callback_data = &verbose;
 	if (read_cache() < 0)
 		die("index file corrupt");
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
 }
 
 static void refresh(int verbose, const char **pathspec)

diff-lib.c

Lines changed: 3 additions & 1 deletion
@@ -338,6 +338,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
 	int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;
+	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
+			      ? CE_MATCH_RACY_IS_DIRTY : 0);
 
 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
@@ -443,7 +445,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				    ce->sha1, ce->name, NULL);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st, 0);
+		changed = ce_match_stat(ce, &st, ce_option);
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
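The new `ce_option` line translates a run_diff_files() option into a ce_match_stat() option instead of passing `option` straight through. The sketch below isolates that translation. The `DIFF_*` values are copied from diff.h in this commit; the `CE_MATCH_*` values are assumed here for illustration (cache.h is not part of this diff), and `diff_to_ce_option()` is a hypothetical helper name.

```c
#include <assert.h>

/* DIFF_* values as defined in diff.h by this commit. */
#define DIFF_SILENT_ON_REMOVED 01
#define DIFF_RACY_IS_MODIFIED  02
/* CE_MATCH_* values: assumed for this sketch, not shown in the diff. */
#define CE_MATCH_IGNORE_VALID  01
#define CE_MATCH_RACY_IS_DIRTY 02

/* Hypothetical helper mirroring the ce_option expression in diff-lib.c:
 * only the racy flag is mapped; every other DIFF_* bit is dropped. */
static unsigned diff_to_ce_option(unsigned option)
{
    unsigned ce_option = (option & DIFF_RACY_IS_MODIFIED)
        ? CE_MATCH_RACY_IS_DIRTY : 0;
    /* Under the assumed values, DIFF_SILENT_ON_REMOVED shares bit 01 with
     * CE_MATCH_IGNORE_VALID; a raw pass-through would silently turn
     * "be quiet about removals" into "ignore the assume-valid bit". */
    assert(!(ce_option & CE_MATCH_IGNORE_VALID));
    return ce_option;
}
```

An explicit mapping keeps the two flag namespaces independent, so new `DIFF_*` bits can be added without accidentally changing how index entries are compared. By contrast, add_file_to_index() in read-cache.c builds its ce_match_stat() options directly as `CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY`.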

diff.h

Lines changed: 2 additions & 0 deletions
@@ -226,6 +226,8 @@ extern const char *diff_unique_abbrev(const unsigned char *, int);
 
 /* do not report anything on removed paths */
 #define DIFF_SILENT_ON_REMOVED 01
+/* report racily-clean paths as modified */
+#define DIFF_RACY_IS_MODIFIED 02
 extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);

read-cache.c

Lines changed: 2 additions & 1 deletion
@@ -388,6 +388,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	int size, namelen, pos;
 	struct stat st;
 	struct cache_entry *ce;
+	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;
 
 	if (lstat(path, &st))
 		die("%s: unable to stat (%s)", path, strerror(errno));
@@ -422,7 +423,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, CE_MATCH_IGNORE_VALID)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, ce_option)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
