Commit 126640a

dscho authored and J. Bruce Fields committed
Add a birdview-on-the-source-code section to the user manual
In http://thread.gmane.org/gmane.comp.version-control.git/42479,
a birdview on the source code was requested.

J. Bruce Fields suggested that my reply should be included in the
user manual, and there was nothing of an outcry, so here it is,
not 2 months later.

It includes modifications as suggested by J. Bruce Fields,
Karl Hasselström and Daniel Barkalow.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
1 parent 0ab564b commit 126640a

Documentation/user-manual.txt

Lines changed: 219 additions & 0 deletions
@@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
contrast, running "git prune" while somebody is actively changing the
repository is a *BAD* idea).

[[birdview-on-the-source-code]]
A bird's-eye view of Git's source code
--------------------------------------

While Git's source code is quite elegant, it is not always easy for
new developers to find their way through it.  A good starting point is
the contents of the initial commit:
_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).

Tip: you can see what files are in there with

----------------------------------------------------
$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
----------------------------------------------------

and look at those files with something like

-----------------------------------------------------------
$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
-----------------------------------------------------------

Be sure to read the README in that revision _after_ you are familiar with
the terminology (<<glossary>>), since the terminology has changed a little
since then.  For example, what that README describes as "changesets" we
now call "commits".

Actually, a lot of the current structure can be explained by that
initial commit.
3191+
3192+
For example, we do not call it "cache" any more, but "index", however, the
3193+
file is still called `cache.h`. Remark: Not much reason to change it now,
3194+
especially since there is no good single name for it anyway, because it is
3195+
basically _the_ header file which is included by _all_ of Git's C sources.
3196+
3197+
If you grasp the ideas in that initial commit (it is really small and you
3198+
can get into it really fast, and it will help you recognize things in the
3199+
much larger code base we have now), you should go on skimming `cache.h`,
3200+
`object.h` and `commit.h` in the current version.
3201+
3202+
In the early days, Git (in the tradition of UNIX) was a bunch of programs
3203+
which were extremely simple, and which you used in scripts, piping the
3204+
output of one into another. This turned out to be good for initial
3205+
development, since it was easier to test new things. However, recently
3206+
many of these parts have become builtins, and some of the core has been
3207+
"libified", i.e. put into libgit.a for performance, portability reasons,
3208+
and to avoid code duplication.
3209+
3210+
By now, you know what the index is (and find the corresponding data
3211+
structures in `cache.h`), and that there are just a couple of object types
3212+
(blobs, trees, commits and tags) which inherit their common structure from
3213+
`struct object`, which is their first member (and thus, you can cast e.g.
3214+
`(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
3215+
get at the object name and flags).
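
To make the first-member trick concrete, here is a minimal sketch.  The
two structs are simplified stand-ins invented for this illustration (the
real definitions in `object.h` and `commit.h` have more fields), and
`as_object()` is a hypothetical helper, not a Git function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not Git's real definitions. */
struct object {
	unsigned flags;
	unsigned char sha1[20];
};

struct commit {
	struct object object;	/* must stay the first member */
	unsigned long date;
};

/*
 * Hypothetical helper: both routes yield the same address, because C
 * guarantees that a pointer to a struct, suitably converted, points to
 * its first member.
 */
struct object *as_object(struct commit *commit)
{
	struct object *by_cast = (struct object *)commit;
	struct object *by_member = &commit->object;

	assert(by_cast == by_member);
	return by_cast;
}
```

The cast only works as long as `struct object` stays the very first
member; the `&commit->object` form keeps working even if it moves.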

Now is a good point to take a break and let this information sink in.

Next step: get familiar with the object naming.  Read <<naming-commits>>.
There are quite a few ways to name an object (and not only revisions!).
All of these are handled in `sha1_name.c`.  Just have a quick look at
the function `get_sha1()`.  A lot of the special handling is done by
functions like `get_sha1_basic()` and the like.

This is just to get you into the groove for the most libified part of Git:
the revision walker.

Basically, the initial version of `git log` was a shell script:

----------------------------------------------------------------
$ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
	LESS=-S ${PAGER:-less}
----------------------------------------------------------------

What does this mean?

`git-rev-list` is the original version of the revision walker, which
_always_ printed a list of revisions to stdout.  It is still functional,
and needs to be, since most new Git programs start out as scripts using
`git-rev-list`.

`git-rev-parse` is not as important any more; it was only used to filter out
options that were relevant for the different plumbing commands that were
called by the script.

Most of what `git-rev-list` did is contained in `revision.c` and
`revision.h`.  The code there wraps the options in a struct named
`rev_info`, which controls how and what revisions are walked, and more.

The original job of `git-rev-parse` is now taken by the function
`setup_revisions()`, which parses the revisions and the common command line
options for the revision walker.  This information is stored in the struct
`rev_info` for later consumption.  You can do your own command line option
parsing after calling `setup_revisions()`.  After that, you have to call
`prepare_revision_walk()` for initialization, and then you can get the
commits one by one with the function `get_revision()`.
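
The three-step calling pattern described above (set up, prepare, then
fetch commits one at a time) can be sketched with a toy model.  Everything
prefixed `toy_` below is hypothetical and exists only for this
illustration; the real functions in `revision.c` take different arguments
and do far more.  The toy history is a simple chain with one parent per
commit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the toy_* names are made up and merely mimic the
 * calling pattern; they are not Git's API. */
struct toy_commit {
	int id;
	struct toy_commit *parent;	/* single parent, for simplicity */
};

struct toy_rev_info {
	struct toy_commit *start;	/* filled in by setup */
	struct toy_commit *next;	/* filled in by prepare, advanced by get */
};

/* Step 1: record what to walk (stands in for option/revision parsing). */
void toy_setup_revisions(struct toy_rev_info *rev, struct toy_commit *tip)
{
	assert(rev != NULL);
	rev->start = tip;
	rev->next = NULL;
}

/* Step 2: initialize the walk; returns 0 on success, -1 on error. */
int toy_prepare_revision_walk(struct toy_rev_info *rev)
{
	if (!rev->start)
		return -1;
	rev->next = rev->start;
	return 0;
}

/* Step 3: hand out one commit per call, NULL once the walk is done. */
struct toy_commit *toy_get_revision(struct toy_rev_info *rev)
{
	struct toy_commit *c = rev->next;

	if (c)
		rev->next = c->parent;
	return c;
}

/* A typical caller: walk everything reachable from tip and count it. */
int toy_count_commits(struct toy_commit *tip)
{
	struct toy_rev_info rev;
	int n = 0;

	toy_setup_revisions(&rev, tip);
	if (toy_prepare_revision_walk(&rev))
		return -1;
	while (toy_get_revision(&rev))
		n++;
	return n;
}
```

The point to take away is the shape of the caller: all state lives in one
struct, and the loop simply asks for the next commit until it gets `NULL`.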

If you are interested in more details of the revision walking process,
just have a look at the first implementation of `cmd_log()`; call
`git-show v1.3.0~155^2~4` and scroll down to that function (note that you
no longer need to call `setup_pager()` directly).

Nowadays, `git log` is a builtin, which means that it is _contained_ in the
command `git`.  The source side of a builtin consists of

- a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
  and declared in `builtin.h`,

- an entry in the `commands[]` array in `git.c`, and

- an entry in `BUILTIN_OBJECTS` in the `Makefile`.

Sometimes, more than one builtin is contained in one source file.  For
example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
since they share quite a bit of code.  In that case, the commands which are
_not_ named like the `.c` file in which they live have to be listed in
`BUILT_INS` in the `Makefile`.
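
To picture how a `cmd_<bla>` function and its `commands[]` entry fit
together, here is a deliberately simplified sketch of a name-to-function
table.  The struct layout, the two-argument function signature and the
`find_builtin()` helper are assumptions made for this example; the real
table in `git.c` differs in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified sketch; Git's real table in git.c differs. */
typedef int (*cmd_fn)(int argc, const char **argv);

/* Two builtins living in one source file, as with builtin-log.c. */
int cmd_log(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

int cmd_whatchanged(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

struct cmd_struct {
	const char *cmd;	/* name as typed after "git" */
	cmd_fn fn;		/* function implementing it */
};

static const struct cmd_struct commands[] = {
	{ "log", cmd_log },
	{ "whatchanged", cmd_whatchanged },
};

/* Look up a builtin by name; NULL means "not a builtin". */
cmd_fn find_builtin(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
		if (!strcmp(commands[i].cmd, name))
			return commands[i].fn;
	return NULL;
}
```

Adding a builtin then amounts to writing the function and adding one line
to the table, plus the `Makefile` entries listed above.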

`git log` looks more complicated in C than it does in the original script,
but that allows for much greater flexibility and performance.

Here again is a good point to take a break.

Lesson three is: study the code.  Really, it is the best way to learn about
the organization of Git (after you know the basic concepts).

So, think about something which you are interested in, say, "how can I
access a blob just knowing the object name of it?".  The first step is to
find a Git command with which you can do it.  In this example, it is either
`git show` or `git cat-file`.

For the sake of clarity, let's stay with `git cat-file`, because it

- is plumbing, and

- was around even in the initial commit (it literally went only through
  some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
  when made a builtin, and then saw fewer than 10 revisions).

So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look at
what it does.

------------------------------------------------------------------
	git_config(git_default_config);
	if (argc != 3)
		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
	if (get_sha1(argv[2], sha1))
		die("Not a valid object name %s", argv[2]);
------------------------------------------------------------------

Let's skip over the obvious details; the only really interesting part
here is the call to `get_sha1()`.  It tries to interpret `argv[2]` as an
object name, and if it refers to an object which is present in the current
repository, it writes the resulting SHA-1 into the variable `sha1`.

Two things are interesting here:

- `get_sha1()` returns 0 on _success_.  This might surprise some new
  Git hackers, but there is a long tradition in UNIX to return different
  negative numbers in case of different errors, and 0 on success.

- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
  char *`, but is actually expected to be a pointer to `unsigned
  char[20]`.  This variable will contain the 160-bit SHA-1 of the given
  object.  Note that whenever a SHA-1 is passed as `unsigned char *`, it
  is the binary representation, as opposed to the ASCII representation in
  hex characters, which is passed as `char *`.
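
Both conventions can be seen together in a small sketch that turns the
40-character hex form into the 20-byte binary form.  The names
`hex_to_sha1()` and `hexval()` are hypothetical helpers written for this
example, not functions taken from Git's sources.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: convert 40 hex characters ("char *", ASCII) into
 * 20 raw bytes ("unsigned char *", binary).  Returns 0 on success and
 * -1 on malformed or too short input.
 */
int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	assert(hex != NULL && sha1 != NULL);
	for (i = 0; i < 20; i++) {
		int hi, lo;

		/* Check the first digit before touching the second, so a
		 * short string stops at its terminating NUL. */
		hi = hexval(hex[2 * i]);
		if (hi < 0)
			return -1;
		lo = hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

A caller would write `if (hex_to_sha1(arg, sha1)) die(...)`, which reads
naturally once you are used to 0 meaning success.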

You will see both of these things throughout the code.

Now, for the meat:

-----------------------------------------------------------------------------
	case 0:
		buf = read_object_with_reference(sha1, argv[1], &size, NULL);
-----------------------------------------------------------------------------

This is how you read a blob (actually, not only a blob, but any type of
object).  To know how the function `read_object_with_reference()` actually
works, find the source code for it (something like `git grep
read_object_with | grep ":[a-z]"` in the git repository), and read
the source.

To find out how the result can be used, just read on in `cmd_cat_file()`:

-----------------------------------
	write_or_die(1, buf, size);
-----------------------------------

Sometimes, you do not know where to look for a feature.  In many such cases,
it helps to search through the output of `git log`, and then `git show` the
corresponding commit.

Example: If you know that there was some test case for `git bundle`, but
do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
does not illustrate the point!):

------------------------
$ git log --no-merges t/
------------------------

In the pager (`less`), just search for "bundle", go a few lines back,
and see that it is in commit 18449ab0...  Now just copy this object name,
and paste it into the command line

-------------------
$ git show 18449ab0
-------------------

Voila.

Another example: Find out what to do in order to make some script a
builtin:

-------------------------------------------------
$ git log --no-merges --diff-filter=A builtin-*.c
-------------------------------------------------

You see, Git is actually the best tool to find out about the source of Git
itself!

[[glossary]]
include::glossary.txt[]
