Commit 3443546
Linus Torvalds authored and Junio C Hamano committed
Use a *real* built-in diff generator
This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".

This has several huge advantages, for example:

Before:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real	0m24.818s
	user	0m13.332s
	sys	0m8.664s

After:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real	0m4.563s
	user	0m2.944s
	sys	0m1.580s

and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).

Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).

NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.

But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.

Now, in the interest of full disclosure, I should also point out a few
downsides:

 - the libxdiff algorithm is different, and I bet GNU diff has gotten a
   lot more testing. And the thing is, generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to "validate" the
   libxdiff output by just comparing it against GNU diff.

 - GNU diff does some nice eye-candy, like trying to figure out what the
   last function was, and adding that information to the "@@ .." line.
   libxdiff doesn't do that.

 - The libxdiff thing has some known deficiencies. In particular, it gets
   the "\No newline at end of file" case wrong. So this is currently for
   the experimental branch only. I hope Davide will help fix it.

That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.

Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.

That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.

Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do

	mmfile_t mf;

	buf = read_sha1_file(sha1, type, &size);
	mf->ptr = buf;
	mf->size = size;
	.. use "mf" directly ..

which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).

[ Btw, as any hawk-eye can see from the diff, this was actually generated
  with itself, so it is "self-hosting". That's about all the testing it
  has gotten, along with the above kernel diff, which eye-balls correctly,
  but shows the newline issue when you double-check it with "git-apply" ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
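As a rough illustration of the "eventually we _should_ just do" snippet in the message, here is a minimal sketch of filling the cut-down mmfile_t straight from memory, with no temp-file involved. The mmfile_t typedef is a local copy of the one this commit adds in xdiff/xdiff.h. `load_object()` and `fill_mmfile_from_memory()` are hypothetical names invented for this sketch; `load_object()` merely stands in for git's read_sha1_file(), whose real signature is only what the message shows. Note that with `mmfile_t mf;` declared as a struct rather than a pointer, the fields would be reached as `mf.ptr` and `mf.size`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the cut-down type from xdiff/xdiff.h in this commit. */
typedef struct s_mmfile {
	char *ptr;
	long size;
} mmfile_t;

/*
 * HYPOTHETICAL stand-in for git's read_sha1_file(): returns a malloc'd
 * copy of `text` and stores its length in *size. Not a git API.
 */
static void *load_object(const char *text, unsigned long *size)
{
	size_t len = strlen(text);
	char *buf = malloc(len ? len : 1);

	if (!buf)
		return NULL;
	memcpy(buf, text, len);
	*size = len;
	return buf;
}

/* Point an mmfile_t at an in-memory buffer; returns -1 if loading failed. */
static int fill_mmfile_from_memory(mmfile_t *mf, const char *text)
{
	unsigned long size = 0;
	void *buf;

	assert(mf != NULL);
	buf = load_object(text, &size);
	if (!buf) {
		mf->ptr = NULL;
		mf->size = 0;
		return -1;
	}
	mf->ptr = buf;		/* mf is a pointer here, hence "->" */
	mf->size = (long)size;
	return 0;
}
```

The caller owns the buffer and releases it with free(mf.ptr) once the diff has been emitted, mirroring what builtin_diff() does in this patch.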
1 parent c150462 commit 3443546

14 files changed, +1820 -8 lines changed

Makefile

Lines changed: 9 additions & 2 deletions
@@ -188,6 +188,7 @@ PYMODULES = \
 	gitMergeCommon.py
 
 LIB_FILE=libgit.a
+XDIFF_LIB=xdiff/lib.a
 
 LIB_H = \
 	blob.h cache.h commit.h count-delta.h csum-file.h delta.h \
@@ -209,7 +210,7 @@ LIB_OBJS = \
 	fetch-clone.o revision.o pager.o \
 	$(DIFF_OBJS)
 
-LIBS = $(LIB_FILE)
+LIBS = $(LIB_FILE) $(XDIFF_LIB)
 LIBS += -lz
 
 #
@@ -544,12 +545,18 @@ init-db.o: init-db.c
 		-DDEFAULT_GIT_TEMPLATE_DIR='"$(template_dir_SQ)"' $*.c
 
 $(LIB_OBJS): $(LIB_H)
-$(patsubst git-%$X,%.o,$(PROGRAMS)): $(LIB_H)
+$(patsubst git-%$X,%.o,$(PROGRAMS)): $(LIBS)
 $(DIFF_OBJS): diffcore.h
 
 $(LIB_FILE): $(LIB_OBJS)
 	$(AR) rcs $@ $(LIB_OBJS)
 
+XDIFF_OBJS=xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o
+
+$(XDIFF_LIB): $(XDIFF_OBJS)
+	$(AR) rcs $@ $(XDIFF_OBJS)
+
+
 doc:
 	$(MAKE) -C Documentation all
 

diff.c

Lines changed: 73 additions & 6 deletions
@@ -8,6 +8,7 @@
 #include "quote.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "xdiff/xdiff.h"
 
 static const char *diff_opts = "-pu";
 
@@ -178,6 +179,49 @@ static void emit_rewrite_diff(const char *name_a,
 	copy_file('+', temp[1].name);
 }
 
+static int fill_mmfile(mmfile_t *mf, const char *file)
+{
+	int fd = open(file, O_RDONLY);
+	struct stat st;
+	char *buf;
+	unsigned long size;
+
+	mf->ptr = NULL;
+	mf->size = 0;
+	if (fd < 0)
+		return 0;
+	fstat(fd, &st);
+	size = st.st_size;
+	buf = xmalloc(size);
+	mf->ptr = buf;
+	mf->size = size;
+	while (size) {
+		int retval = read(fd, buf, size);
+		if (retval < 0) {
+			if (errno == EINTR || errno == EAGAIN)
+				continue;
+			break;
+		}
+		if (!retval)
+			break;
+		buf += retval;
+		size -= retval;
+	}
+	mf->size -= size;
+	close(fd);
+	return 0;
+}
+
+static int fn_out(void *priv, mmbuffer_t *mb, int nbuf)
+{
+	int i;
+
+	for (i = 0; i < nbuf; i++)
+		if (!fwrite(mb[i].ptr, mb[i].size, 1, stdout))
+			return -1;
+	return 0;
+}
+
 static const char *builtin_diff(const char *name_a,
 			 const char *name_b,
 			 struct diff_tempfile *temp,
@@ -186,6 +230,7 @@ static const char *builtin_diff(const char *name_a,
 			 const char **args)
 {
 	int i, next_at, cmd_size;
+	mmfile_t mf1, mf2;
 	const char *const diff_cmd = "diff -L%s -L%s";
 	const char *const diff_arg = "-- %s %s||:"; /* "||:" is to return 0 */
 	const char *input_name_sq[2];
@@ -255,12 +300,34 @@ static const char *builtin_diff(const char *name_a,
 		}
 	}
 
-	/* This is disgusting */
-	*args++ = "sh";
-	*args++ = "-c";
-	*args++ = cmd;
-	*args = NULL;
-	return "/bin/sh";
+	/* Un-quote the paths */
+	if (label_path[0][0] != '/')
+		label_path[0] = quote_two("a/", name_a);
+	if (label_path[1][0] != '/')
+		label_path[1] = quote_two("b/", name_b);
+
+	printf("--- %s\n", label_path[0]);
+	printf("+++ %s\n", label_path[1]);
+
+	if (fill_mmfile(&mf1, temp[0].name) < 0 ||
+	    fill_mmfile(&mf2, temp[1].name) < 0)
+		die("unable to read files to diff");
+
+	/* Crazy xdl interfaces.. */
+	{
+		xpparam_t xpp;
+		xdemitconf_t xecfg;
+		xdemitcb_t ecb;
+
+		xpp.flags = XDF_NEED_MINIMAL;
+		xecfg.ctxlen = 3;
+		ecb.outf = fn_out;
+		xdl_diff(&mf1, &mf2, &xpp, &xecfg, &ecb);
+	}
+
+	free(mf1.ptr);
+	free(mf2.ptr);
+	return NULL;
 }
 
 struct diff_filespec *alloc_filespec(const char *path)
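
A reading note on fill_mmfile() as it appears in this diff: every path returns 0, so the `< 0` checks in builtin_diff() cannot trigger, and the result of fstat() is not checked. The following is a minimal sketch, not part of the commit, of the same read-until-done loop with errors reported to the caller. `read_whole_file()` is a hypothetical name, the mmfile_t typedef is a local copy of the one in xdiff/xdiff.h, and plain malloc() is used in place of git's xmalloc() so the sketch stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the cut-down type from xdiff/xdiff.h in this commit. */
typedef struct s_mmfile {
	char *ptr;
	long size;
} mmfile_t;

/*
 * HYPOTHETICAL variant of the patch's fill_mmfile(): reads all of `path`
 * into a malloc'd buffer. Returns 0 on success and -1 on any error.
 */
static int read_whole_file(mmfile_t *mf, const char *path)
{
	struct stat st;
	size_t want, got = 0;
	char *buf;
	int fd;

	assert(mf && path);
	mf->ptr = NULL;
	mf->size = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	want = (size_t)st.st_size;
	buf = malloc(want ? want : 1);	/* avoid the malloc(0) ambiguity */
	if (!buf) {
		close(fd);
		return -1;
	}
	while (got < want) {
		ssize_t n = read(fd, buf + got, want - got);
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;	/* transient, retry the read */
			free(buf);
			close(fd);
			return -1;
		}
		if (n == 0)
			break;			/* file is shorter than st_size */
		got += (size_t)n;
	}
	close(fd);
	mf->ptr = buf;
	mf->size = (long)got;			/* only what was actually read */
	return 0;
}
```

As in the patch, a short read is tolerated and the recorded size is trimmed to the bytes actually read; only open, stat, allocation and hard read errors are treated as failures.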

xdiff/xdiff.h

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+/*
+ *  LibXDiff by Davide Libenzi ( File Differential Library )
+ *  Copyright (C) 2003  Davide Libenzi
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *  Davide Libenzi <davidel@xmailserver.org>
+ *
+ */
+
+#if !defined(XDIFF_H)
+#define XDIFF_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* #ifdef __cplusplus */
+
+
+#define XDF_NEED_MINIMAL (1 << 1)
+
+#define XDL_PATCH_NORMAL '-'
+#define XDL_PATCH_REVERSE '+'
+#define XDL_PATCH_MODEMASK ((1 << 8) - 1)
+#define XDL_PATCH_IGNOREBSPACE (1 << 8)
+
+#define XDL_MMB_READONLY (1 << 0)
+
+#define XDL_MMF_ATOMIC (1 << 0)
+
+#define XDL_BDOP_INS 1
+#define XDL_BDOP_CPY 2
+#define XDL_BDOP_INSB 3
+
+
+typedef struct s_mmfile {
+	char *ptr;
+	long size;
+} mmfile_t;
+
+typedef struct s_mmbuffer {
+	char *ptr;
+	long size;
+} mmbuffer_t;
+
+typedef struct s_xpparam {
+	unsigned long flags;
+} xpparam_t;
+
+typedef struct s_xdemitcb {
+	void *priv;
+	int (*outf)(void *, mmbuffer_t *, int);
+} xdemitcb_t;
+
+typedef struct s_xdemitconf {
+	long ctxlen;
+} xdemitconf_t;
+
+typedef struct s_bdiffparam {
+	long bsize;
+} bdiffparam_t;
+
+
+#define xdl_malloc(x) malloc(x)
+#define xdl_free(ptr) free(ptr)
+#define xdl_realloc(ptr,x) realloc(ptr,x)
+
+void *xdl_mmfile_first(mmfile_t *mmf, long *size);
+void *xdl_mmfile_next(mmfile_t *mmf, long *size);
+long xdl_mmfile_size(mmfile_t *mmf);
+
+int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+	     xdemitconf_t const *xecfg, xdemitcb_t *ecb);
+
+#ifdef __cplusplus
+}
+#endif /* #ifdef __cplusplus */
+
+#endif /* #if !defined(XDIFF_H) */
+
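
The xdemitcb_t declared in this header pairs an output function with an opaque `priv` pointer that is handed back on every call. The patch's fn_out() ignores `priv` and writes to stdout; the sketch below, which is not part of the commit, shows how `priv` could carry the destination instead. The two typedefs are local copies of the header's declarations. `file_out()` and `emit_line()` are hypothetical names, and `emit_line()` only imitates a caller passing several buffers per call; it is not libxdiff's emitter. The sketch also skips empty buffers, because fwrite() returns 0 for a zero-sized item and a check like the one in fn_out() would then report failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the types declared in xdiff/xdiff.h in this commit. */
typedef struct s_mmbuffer {
	char *ptr;
	long size;
} mmbuffer_t;

typedef struct s_xdemitcb {
	void *priv;
	int (*outf)(void *, mmbuffer_t *, int);
} xdemitcb_t;

/* Output callback: writes each buffer to the FILE* passed via priv. */
static int file_out(void *priv, mmbuffer_t *mb, int nbuf)
{
	FILE *out = priv;
	int i;

	for (i = 0; i < nbuf; i++) {
		if (mb[i].size <= 0)
			continue;	/* fwrite() returns 0 for empty items */
		if (fwrite(mb[i].ptr, (size_t)mb[i].size, 1, out) != 1)
			return -1;
	}
	return 0;
}

/*
 * HYPOTHETICAL driver standing in for the library's emitter: hands one
 * prefixed line to the callback as two separate buffers.
 */
static int emit_line(xdemitcb_t *ecb, char *prefix, char *line)
{
	mmbuffer_t mb[2];

	assert(ecb && ecb->outf);
	mb[0].ptr = prefix;
	mb[0].size = (long)strlen(prefix);
	mb[1].ptr = line;
	mb[1].size = (long)strlen(line);
	return ecb->outf(ecb->priv, mb, 2);
}
```

A caller would zero the xdemitcb_t (for example with memset), set `priv` to the target stream and `outf` to file_out before handing it to the diff routine. Zeroing first also avoids leaving `priv` indeterminate, as it is in the stack-allocated `ecb` in builtin_diff().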
