@@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
3160 3160 contrast, running "git prune" while somebody is actively changing the
3161 3161 repository is a *BAD* idea).
3162 3162
3163+ [[birdview-on-the-source-code]]
3164+ A bird's-eye view of Git's source code
3165+ --------------------------------------
3166+
3167+ While Git's source code is quite elegant, it is not always easy for
3168+ new developers to find their way through it. A good idea is to look
3169+ at the contents of the initial commit:
3170+ _e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
3171+
3172+ Tip: you can see what files are in there with
3173+
3174+ ----------------------------------------------------
3175+ $ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
3176+ ----------------------------------------------------
3177+
3178+ and look at those files with something like
3179+
3180+ -----------------------------------------------------------
3181+ $ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
3182+ -----------------------------------------------------------
3183+
3184+ Be sure to read the README in that revision _after_ you are familiar with
3185+ the terminology (<<glossary>>), since the terminology has changed a little
3186+ since then. For example, what that README describes as "changesets" are
3187+ now called "commits".
3188+
3189+ In fact, much of the current structure can be explained by that
3190+ initial commit.
3191+
3192+ For example, we no longer call it the "cache" but the "index"; however, the
3193+ file is still called `cache.h`. Remark: there is not much reason to change
3194+ it now, especially since there is no good single name for it anyway, because
3195+ it is basically _the_ header file which is included by _all_ of Git's C sources.
3196+
3197+ Once you grasp the ideas in that initial commit (it is really small, so you
3198+ can get into it quickly, and it will help you recognize things in the
3199+ much larger code base we have now), you should go on to skim `cache.h`,
3200+ `object.h` and `commit.h` in the current version.
3201+
3202+ In the early days, Git (in the tradition of UNIX) was a bunch of programs
3203+ which were extremely simple, and which you used in scripts, piping the
3204+ output of one into another. This turned out to be good for initial
3205+ development, since it was easier to test new things. However, recently
3206+ many of these parts have become builtins, and some of the core has been
3207+ "libified", i.e. put into libgit.a for performance and portability reasons,
3208+ and to avoid code duplication.
3209+
3210+ By now, you know what the index is (and can find the corresponding data
3211+ structures in `cache.h`), and that there are just a couple of object types
3212+ (blobs, trees, commits and tags) which inherit their common structure from
3213+ `struct object`, which is their first member (and thus, you can cast e.g.
3214+ `(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
3215+ get at the object name and flags).
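
The first-member trick described above can be tried in isolation. The following is a minimal sketch, assuming simplified stand-in structs: the field layout and the helper `object_flags()` are invented for illustration and are not Git's actual definitions.

```c
/* Simplified stand-ins; Git's real structs have more (and different) fields. */
struct object {
    unsigned flags;
    unsigned char sha1[20];
};

struct commit {
    struct object object;   /* must stay the first member */
    unsigned long date;
};

/*
 * Hypothetical helper: works for any type that embeds struct object
 * as its first member, because a pointer to a struct, suitably
 * converted, points to that struct's first member.
 */
unsigned object_flags(const void *any_object)
{
    return ((const struct object *)any_object)->flags;
}
```

Given a `struct commit c`, both `object_flags(&c)` and `object_flags(&c.object)` read the same field. This only holds as long as `struct object` remains the first member.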
3216+
3217+ Now is a good point to take a break to let this information sink in.
3218+
3219+ Next step: get familiar with the object naming. Read <<naming-commits>>.
3220+ There are quite a few ways to name an object (and not only revisions!).
3221+ All of these are handled in `sha1_name.c`. Just have a quick look at
3222+ the function `get_sha1()`. A lot of the special handling is done by
3223+ functions like `get_sha1_basic()` or the like.
3224+
3225+ This is just to get you into the groove for the most libified part of Git:
3226+ the revision walker.
3227+
3228+ Basically, the initial version of `git log` was a shell script:
3229+
3230+ ----------------------------------------------------------------
3231+ $ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
3232+ LESS=-S ${PAGER:-less}
3233+ ----------------------------------------------------------------
3234+
3235+ What does this mean?
3236+
3237+ `git-rev-list` is the original version of the revision walker, which
3238+ _always_ printed a list of revisions to stdout. It is still functional,
3239+ and needs to be, since most new Git programs start out as scripts using
3240+ `git-rev-list`.
3241+
3242+ `git-rev-parse` is not as important any more; it was only used to filter out
3243+ options that were relevant for the different plumbing commands that were
3244+ called by the script.
3245+
3246+ Most of what `git-rev-list` did is contained in `revision.c` and
3247+ `revision.h`. It wraps the options in a struct named `rev_info`, which
3248+ controls how and what revisions are walked, and more.
3249+
3250+ The original job of `git-rev-parse` is now taken by the function
3251+ `setup_revisions()`, which parses the revisions and the common command line
3252+ options for the revision walker. This information is stored in the struct
3253+ `rev_info` for later consumption. You can do your own command line option
3254+ parsing after calling `setup_revisions()`. After that, you have to call
3255+ `prepare_revision_walk()` for initialization, and then you can get the
3256+ commits one by one with the function `get_revision()`.
3257+
3258+ If you are interested in more details of the revision walking process,
3259+ just have a look at the first implementation of `cmd_log()`; run
3260+ `git-show v1.3.0~155^2~4` and scroll down to that function (note that you
3261+ no longer need to call `setup_pager()` directly).
3262+
3263+ Nowadays, `git log` is a builtin, which means that it is _contained_ in the
3264+ command `git`. The source side of a builtin is
3265+
3266+ - a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
3267+ and declared in `builtin.h`,
3268+
3269+ - an entry in the `commands[]` array in `git.c`, and
3270+
3271+ - an entry in `BUILTIN_OBJECTS` in the `Makefile`.
3272+
3273+ Sometimes, more than one builtin is contained in one source file. For
3274+ example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
3275+ since they share quite a bit of code. In that case, the commands which are
3276+ _not_ named like the `.c` file in which they live have to be listed in
3277+ `BUILT_INS` in the `Makefile`.
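
To make the role of the `commands[]` array more concrete, here is a toy dispatch table. It is a sketch only: the two-field `struct cmd_struct`, the toy commands, and `toy_run_builtin()` are assumptions made for this example and do not reproduce what `git.c` actually contains.

```c
#include <stdio.h>
#include <string.h>

/* Toy builtins; only the cmd_<bla> naming follows the convention above. */
static int cmd_hello(int argc, const char **argv)
{
    printf("hello, %s\n", argc > 1 ? argv[1] : "world");
    return 0;
}

static int cmd_version(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    puts("toy version 0");
    return 0;
}

/* Hypothetical table entry: a command name and the function behind it. */
struct cmd_struct {
    const char *cmd;
    int (*fn)(int argc, const char **argv);
};

static const struct cmd_struct commands[] = {
    { "hello", cmd_hello },
    { "version", cmd_version },
};

/*
 * Look up argv[0] in the table and call the matching function.
 * Returns that function's result, or -1 if no entry matches.
 */
int toy_run_builtin(int argc, const char **argv)
{
    size_t i;

    if (argc < 1)
        return -1;
    for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
        if (!strcmp(argv[0], commands[i].cmd))
            return commands[i].fn(argc, argv);
    return -1;
}
```

In this sketch, adding a builtin means writing one `cmd_<bla>` function and adding one table entry, which is the same shape as the checklist above.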
3278+
3279+ `git log` looks more complicated in C than it does in the original script,
3280+ but that allows for much greater flexibility and performance.
3281+
3282+ Here again is a good point to take a break.
3283+
3284+ Lesson three is: study the code. Really, it is the best way to learn about
3285+ the organization of Git (after you know the basic concepts).
3286+
3287+ So, think about something which you are interested in, say, "how can I
3288+ access a blob knowing only its object name?". The first step is to
3289+ find a Git command with which you can do it. In this example, it is either
3290+ `git show` or `git cat-file`.
3291+
3292+ For the sake of clarity, let's stay with `git cat-file`, because it
3293+
3294+ - is plumbing, and
3295+
3296+ - was around even in the initial commit (it went through only
3297+ some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
3298+ when made a builtin, and then saw fewer than 10 versions).
3299+
3300+ So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look at
3301+ what it does.
3302+
3303+ ------------------------------------------------------------------
3304+ git_config(git_default_config);
3305+ if (argc != 3)
3306+         usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
3307+ if (get_sha1(argv[2], sha1))
3308+         die("Not a valid object name %s", argv[2]);
3309+ ------------------------------------------------------------------
3310+
3311+ Let's skip over the obvious details; the only really interesting part
3312+ here is the call to `get_sha1()`. It tries to interpret `argv[2]` as an
3313+ object name, and if it refers to an object which is present in the current
3314+ repository, it writes the resulting SHA-1 into the variable `sha1`.
3315+
3316+ Two things are interesting here:
3317+
3318+ - `get_sha1()` returns 0 on _success_. This might surprise some new
3319+ Git hackers, but there is a long tradition in UNIX to return different
3320+ negative numbers in case of different errors -- and 0 on success.
3321+
3322+ - the parameter `sha1` in the function signature of `get_sha1()` is declared
3323+ as `unsigned char *`, but is actually expected to be a pointer to `unsigned
3324+ char[20]`. It will contain the 160-bit SHA-1 of the given
3325+ object. Note that whenever a SHA-1 is passed as `unsigned char *`, it
3326+ is the binary representation, as opposed to the ASCII representation in
3327+ hex characters, which is passed as `char *`.
3328+
3329+ You will see both of these things throughout the code.
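
Both conventions can be seen together in a small converter from the hex form to the binary form. This is a sketch for illustration, not Git's code: `hexval()` and `toy_hex_to_sha1()` are hypothetical names, and the function looks only at the first 40 characters of its input.

```c
/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical stand-in: convert the first 40 hex characters of `hex`
 * (the ASCII form, a `char *`) into 20 bytes in `sha1` (the binary
 * form, an `unsigned char *` that must point to at least 20 bytes).
 * Returns 0 on success and -1 on failure.
 */
int toy_hex_to_sha1(const char *hex, unsigned char *sha1)
{
    int i;

    for (i = 0; i < 20; i++) {
        int hi, lo;

        hi = hexval(hex[2 * i]);
        if (hi < 0)
            return -1;  /* also stops at a premature NUL */
        lo = hexval(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        sha1[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```

As with `get_sha1()`, a caller tests for failure with `if (toy_hex_to_sha1(hex, sha1))`, which reads naturally once you are used to 0 meaning success.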
3330+
3331+ Now, for the meat:
3332+
3333+ -----------------------------------------------------------------------------
3334+ case 0:
3335+         buf = read_object_with_reference(sha1, argv[1], &size, NULL);
3336+ -----------------------------------------------------------------------------
3337+
3338+ This is how you read a blob (actually, not only a blob, but any type of
3339+ object). To know how the function `read_object_with_reference()` actually
3340+ works, find the source code for it (something like `git grep
3341+ read_object_with | grep ":[a-z]"` in the git repository), and read
3342+ the source.
3343+
3344+ To find out how the result can be used, just read on in `cmd_cat_file()`:
3345+
3346+ -----------------------------------
3347+ write_or_die(1, buf, size);
3348+ -----------------------------------
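
The name `write_or_die()` suggests a function that either writes everything or terminates the program. A minimal sketch of that idea follows, assuming a POSIX system. `toy_write_or_die()` is a hypothetical stand-in written for this example, so read the real function in Git's source to see what it actually does.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in, not Git's implementation: write all `count`
 * bytes of `buf` to `fd`, or print an error and exit.
 */
void toy_write_or_die(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t written = write(fd, p, count);

        if (written < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing; retry */
            perror("write");
            exit(1);
        }
        if (written == 0) {
            fputs("write: no progress\n", stderr);
            exit(1);
        }
        p += written;       /* short write: continue with the rest */
        count -= (size_t)written;
    }
}
```

The caller never has to check a return value, which is why a call such as the one in `cmd_cat_file()` can stand alone on one line.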
3349+
3350+ Sometimes, you do not know where to look for a feature. In many such cases,
3351+ it helps to search through the output of `git log`, and then `git show` the
3352+ corresponding commit.
3353+
3354+ Example: If you know that there was some test case for `git bundle`, but
3355+ do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
3356+ does not illustrate the point!):
3357+
3358+ ------------------------
3359+ $ git log --no-merges t/
3360+ ------------------------
3361+
3362+ In the pager (`less`), just search for "bundle", go a few lines back,
3363+ and see that it is in commit 18449ab0... Now just copy this object name,
3364+ and paste it into the command line
3365+
3366+ -------------------
3367+ $ git show 18449ab0
3368+ -------------------
3369+
3370+ Voilà.
3371+
3372+ Another example: Find out what to do in order to make some script a
3373+ builtin:
3374+
3375+ -------------------------------------------------
3376+ $ git log --no-merges --diff-filter=A builtin-*.c
3377+ -------------------------------------------------
3378+
3379+ You see, Git is actually the best tool to find out about the source of Git
3380+ itself!
3381+
3163 3382 [[glossary]]
3164 3383 include::glossary.txt[]
3165 3384