@@ -97,3 +97,86 @@ Languages: C, shell(bash)
9797
9898Possible mentors:
9999* Christian Couder `<christian.couder@gmail.com>`
100+
101+ ### Reachability bitmap improvements
102+
103+ [Reachability bitmaps][vmg-bitmaps] allow Git to quickly answer queries about
104+ which objects are reachable from a given commit. Instead of walking a commit's
105+ parents and its root tree recursively, we can use a precomputed set of objects
106+ encoded as a bit-string and stored in a `.bitmap` file to answer the query
107+ nearly instantaneously.
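
To make the idea concrete, here is a minimal sketch of the set operations a reachability query boils down to. It assumes a toy, uncompressed 64-bit bitmap in which bit `i` stands for object `i`; the `toy_bitmap` type and its helpers are hypothetical and are not Git's actual bitmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy bitmap: bit i is set if object i is reachable. */
typedef uint64_t toy_bitmap;

/* Mark the object at position pos as reachable. */
toy_bitmap toy_bitmap_set(toy_bitmap b, unsigned pos)
{
	assert(pos < 64);
	return b | ((toy_bitmap)1 << pos);
}

/* Objects reachable from either of two commits. */
toy_bitmap toy_bitmap_or(toy_bitmap a, toy_bitmap b)
{
	return a | b;
}

/* Objects reachable from "want" but not from "have". */
toy_bitmap toy_bitmap_and_not(toy_bitmap want, toy_bitmap have)
{
	return want & ~have;
}

/* Number of objects in the set. */
unsigned toy_bitmap_popcount(toy_bitmap b)
{
	unsigned n = 0;

	for (; b; b &= b - 1) /* clear the lowest set bit each round */
		n++;
	return n;
}
```

With these helpers, "which objects does a client that already has commit B need in order to get commit A" becomes a single `toy_bitmap_and_not()` call on the two precomputed bitmaps, with no graph traversal at all.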
108+
109+ There are a couple of areas where bitmap performance itself could be improved:
110+
111+ - Individual bitmaps are stored compressed (with [EWAH][ewah]), but we have
112+ some sense that it can be slow to decompress individual bitmaps (which we
113+ have to do in order to read them, but also to do things like negate them, OR
114+ and AND them together, etc.).
115+
116+ One possible project could be to explore using an alternative compression
117+ scheme (like the more modern [Roaring+Run][roaring-run] technique) to see if
118+ we can improve overall bitmap performance by reducing the amount of time it
119+ takes to read an individual bitmap.
120+
121+ This project would entail designing a suite of performance tests, making
122+ whatever changes to the `.bitmap` format are needed to accommodate the new
123+ compression scheme, and then running the performance tests to measure the
124+ resulting speed-up.
125+
126+ - Loading a `.bitmap` file can be slow for large bitmaps. This is because we
127+ have to read the file sequentially in order to discover the offset of each
128+ bitmap within the file.
129+
130+ It should be possible to shave off some time from this step by including a
131+ small "table of contents" that indicates which commits have bitmaps, and
132+ where to find them in the `.bitmap` file. In the past, [some efforts have
133+ been made][ttaylorr-commit-table] to do this, but we should undertake more
134+ performance testing to determine whether this is a good idea before
135+ submitting a patch series in this area.
136+
137+ - [Recent changes][ttaylorr-bitmaps] have made it possible to repack a
138+ repository's objects into a sequence of packs whose object counts form a
139+ geometric progression (e.g., if the first pack has `N` objects, the next
140+ pack will have at least `2N` objects, then `4N` objects, and so on).
141+
142+ But even when repacking a repository in this way, regenerating its bitmaps
143+ can still take a long time. One possible approach to this would be a new
144+ mode of generating bitmaps that is more "incremental" in nature. In other
145+ words, a mode which only adds new bitmaps for commits introduced between
146+ successive bitmap generations.
147+
148+ Because of how individual bitmaps are generated, this would require
149+ traversing only the objects between the new bitmap tips and the old ones,
150+ making bitmap generation faster overall.
151+
152+ Like the above, this project would involve designing a set of performance
153+ tests, implementing the changes required to introduce this new type of
154+ bitmap generation, and then running those tests against your new code.
155+
156+ - Other (larger, longer-term) ideas include: rethinking how we select which
157+ commits receive bitmaps (and/or having bitmaps represent multiple commits
158+ instead of just one to "summarize" small sets of commits), or improving how
159+ we handle queries that use a bitmap but do not have complete coverage.
160+ GSoC students should consider these projects more advanced, and thus they
161+ are not explained in as much detail here. Instead, this point serves to
162+ illustrate that there are opportunities to explore larger projects should we
163+ decide they are more interesting than the above or we have time to take them
164+ on.
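
As a rough illustration of the "table of contents" idea from the second item above, the sketch below looks up a commit's bitmap offset by binary search instead of scanning the file sequentially. The `toc_entry` layout, its field names, and `toc_lookup()` are assumptions made for this example only; they do not describe the actual `.bitmap` on-disk format or any existing patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the real .bitmap on-disk layout. */
struct toc_entry {
	uint32_t commit_pos; /* position identifying the commit */
	uint64_t offset;     /* byte offset of its bitmap in the file */
};

/*
 * Binary search over a table sorted by commit_pos.
 * Returns the matching entry, or NULL if the commit has no bitmap.
 */
const struct toc_entry *toc_lookup(const struct toc_entry *toc, size_t nr,
				   uint32_t commit_pos)
{
	size_t lo = 0, hi = nr;

	assert(toc || !nr);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (toc[mid].commit_pos == commit_pos)
			return &toc[mid];
		if (toc[mid].commit_pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

A lookup like this touches O(log n) table entries, whereas discovering offsets by reading the file front to back touches every bitmap. Whether that saving matters in practice is exactly what the performance testing mentioned above would need to establish.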
165+
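
The geometric progression described in the third item can be stated as a simple check. The sketch below assumes pack object counts are listed from smallest to largest and that "geometric" means each pack holds at least `factor` times as many objects as the previous one; the function name `is_geometric()` is hypothetical and is not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if every pack has at least `factor` times as many objects
 * as the pack before it, and 0 otherwise. `counts` is assumed to be
 * ordered from the smallest pack to the largest.
 */
int is_geometric(const uint64_t *counts, size_t nr, uint64_t factor)
{
	size_t i;

	assert(factor > 0);
	for (i = 1; i < nr; i++) {
		/* Dividing instead of multiplying avoids integer overflow. */
		if (counts[i] / factor < counts[i - 1])
			return 0;
	}
	return 1;
}
```

For example, with a factor of 2, packs of 100, 200, and 450 objects satisfy the check, while packs of 100, 150, and 400 objects do not, because the second pack is smaller than twice the first.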
166+ This project will give GSoC students a broad overview of reachability bitmaps,
167+ with the goal of improving their performance in some way or another. Students
168+ can expect hands-on mentorship, but will have the agency to pick one or more of
169+ the above sub-projects (or create their own!) that interest them most.
170+
171+ Project Size: Medium
172+
173+ Languages: C, shell
174+
175+ Possible mentors:
176+ * Taylor Blau `<me@ttaylorr.com>`
177+
178+ [vmg-bitmaps]: https://github.blog/2015-09-22-counting-objects/
179+ [ewah]: https://arxiv.org/abs/0901.3751
180+ [roaring-run]: https://roaringbitmap.org/about/
181+ [ttaylorr-commit-table]: https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/
182+ [ttaylorr-bitmaps]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/