Commit f62312e

ttaylorr and peff authored and committed
packfile: introduce 'find_kept_pack_entry()'
Future callers will want a function to fill a 'struct pack_entry' for a given object id but _only_ from its position in any kept pack(s).

In particular, a new 'git repack' mode which ensures the resulting packs form a geometric progression by object count will mark packs that it does not want to repack as "kept in-core", and it will want to halt a reachability traversal as soon as it visits an object in any of the kept packs. But it does not want to halt the traversal at non-kept or .keep packs.

The obvious alternative is 'find_pack_entry()', but this doesn't quite suffice since it only returns the first pack it finds, which may or may not be kept (and the mru cache makes it unpredictable which one you'll get if there are options).

Short of that, you could walk over all packs looking for the object in each one, but it scales with the number of packs, which may be prohibitive.

Introduce 'find_kept_pack_entry()', a function which is like 'find_pack_entry()', but only fills in objects in the kept packs.

Handle packs which have .keep files, as well as in-core kept packs separately, since certain callers will want to distinguish one from the other. (Though on-disk and in-core kept packs share the adjective "kept", it is best to think of the two sets as independent.)

There is a gotcha when looking up objects that are duplicated in kept and non-kept packs, particularly when the MIDX stores the non-kept version and the caller asked for kept objects only. This could be resolved by teaching the MIDX to resolve duplicates by always favoring the kept pack (if one exists), but this breaks an assumption in existing MIDXs, and so it would require a format change.

The benefit to changing the MIDX in this way is marginal, so we instead have a more thorough check here which is explained with a comment.

Callers will be added in subsequent patches.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent 2283e0e commit f62312e
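
The commit message describes a traversal that should stop as soon as it reaches an object in an in-core kept pack. The callers are not part of this commit, so the following is only a rough sketch of that idea on a toy object graph. Every name in it (`model_object`, `model_walk`, `MODEL_MAX_OBJECTS`) is hypothetical and none of it is Git's API.

```c
#include <assert.h>

/* Hypothetical toy model, not Git's object graph. */
#define MODEL_MAX_OBJECTS 8

struct model_object {
	int parents[2];   /* indices of reachable objects, -1 if unused */
	int in_core_kept; /* nonzero if stored in an in-core kept pack */
};

/*
 * Walk from 'start', refusing to enter any in-core kept object.
 * Sets seen[i] for every object that was visited and returns how
 * many objects were visited.
 */
static int model_walk(const struct model_object *objs, int nr,
		      int start, int *seen)
{
	int stack[MODEL_MAX_OBJECTS];
	int top = 0, count = 0, i;

	assert(nr >= 0 && nr <= MODEL_MAX_OBJECTS);
	assert(start >= 0 && start < nr);

	for (i = 0; i < nr; i++)
		seen[i] = 0;

	if (objs[start].in_core_kept)
		return 0;

	/* Mark at push time so each object enters the stack at most once. */
	seen[start] = 1;
	stack[top++] = start;

	while (top > 0) {
		int cur = stack[--top];
		count++;
		for (i = 0; i < 2; i++) {
			int parent = objs[cur].parents[i];
			if (parent < 0 || seen[parent])
				continue;
			if (objs[parent].in_core_kept)
				continue; /* halt: do not descend past kept objects */
			seen[parent] = 1;
			stack[top++] = parent;
		}
	}
	return count;
}
```

In a real caller, the `in_core_kept` test would presumably be a lookup restricted to in-core kept packs rather than a stored flag. That is an assumption about later patches, not something this commit shows.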

2 files changed, +64 -5 lines changed

packfile.c

Lines changed: 59 additions & 5 deletions
@@ -2042,7 +2042,10 @@ static int fill_pack_entry(const struct object_id *oid,
 	return 1;
 }
 
-int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
+static int find_one_pack_entry(struct repository *r,
+			       const struct object_id *oid,
+			       struct pack_entry *e,
+			       int kept_only)
 {
 	struct list_head *pos;
 	struct multi_pack_index *m;
@@ -2052,26 +2055,77 @@ int find_pack_entry(struct repository *r, const struct object_id *oid, struct pa
 		return 0;
 
 	for (m = r->objects->multi_pack_index; m; m = m->next) {
-		if (fill_midx_entry(r, oid, e, m))
+		if (!fill_midx_entry(r, oid, e, m))
+			continue;
+
+		if (!kept_only)
+			return 1;
+
+		if (((kept_only & ON_DISK_KEEP_PACKS) && e->p->pack_keep) ||
+		    ((kept_only & IN_CORE_KEEP_PACKS) && e->p->pack_keep_in_core))
 			return 1;
 	}
 
 	list_for_each(pos, &r->objects->packed_git_mru) {
 		struct packed_git *p = list_entry(pos, struct packed_git, mru);
-		if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) {
-			list_move(&p->mru, &r->objects->packed_git_mru);
-			return 1;
+		if (p->multi_pack_index && !kept_only) {
+			/*
+			 * If this pack is covered by the MIDX, we'd have found
+			 * the object already in the loop above if it was here,
+			 * so don't bother looking.
+			 *
+			 * The exception is if we are looking only at kept
+			 * packs. An object can be present in two packs covered
+			 * by the MIDX, one kept and one not-kept. And as the
+			 * MIDX points to only one copy of each object, it might
+			 * have returned only the non-kept version above. We
+			 * have to check again to be thorough.
+			 */
+			continue;
+		}
+		if (!kept_only ||
+		    (((kept_only & ON_DISK_KEEP_PACKS) && p->pack_keep) ||
+		     ((kept_only & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core))) {
+			if (fill_pack_entry(oid, e, p)) {
+				list_move(&p->mru, &r->objects->packed_git_mru);
+				return 1;
+			}
 		}
 	}
 	return 0;
 }
 
+int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
+{
+	return find_one_pack_entry(r, oid, e, 0);
+}
+
+int find_kept_pack_entry(struct repository *r,
+			 const struct object_id *oid,
+			 unsigned flags,
+			 struct pack_entry *e)
+{
+	/*
+	 * Load all packs, including midx packs, since our "kept" strategy
+	 * relies on that. We're relying on the side effect of it setting up
+	 * r->objects->packed_git, which is a little ugly.
+	 */
+	get_all_packs(r);
+	return find_one_pack_entry(r, oid, e, flags);
+}
+
 int has_object_pack(const struct object_id *oid)
 {
 	struct pack_entry e;
 	return find_pack_entry(the_repository, oid, &e);
 }
 
+int has_object_kept_pack(const struct object_id *oid, unsigned flags)
+{
+	struct pack_entry e;
+	return find_kept_pack_entry(the_repository, oid, flags, &e);
+}
+
 int has_pack_index(const unsigned char *sha1)
 {
 	struct stat st;
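
The duplicate-object gotcha handled in 'find_one_pack_entry()' can be reproduced on a toy model. The sketch below is not Git code: `demo_pack`, `demo_midx_entry`, `demo_pack_has` and `demo_find` are hypothetical names, object ids are plain integers, and a single `kept` flag stands in for both kinds of kept pack. It only tries to mirror the two-pass shape of the change: ask the MIDX first, then rescan MIDX-covered packs when the caller wants kept packs only.

```c
#include <assert.h>

#define DEMO_MAX_OBJECTS 4

/* Hypothetical pack: flags plus the integer object ids it contains. */
struct demo_pack {
	int kept;    /* nonzero if the pack counts as "kept" */
	int in_midx; /* nonzero if the pack is covered by the MIDX */
	int nr;
	int objects[DEMO_MAX_OBJECTS];
};

/* Hypothetical MIDX entry: exactly one pack recorded per object id. */
struct demo_midx_entry {
	int object;
	int pack; /* index into the pack array */
};

static int demo_pack_has(const struct demo_pack *p, int object)
{
	int i;
	for (i = 0; i < p->nr; i++)
		if (p->objects[i] == object)
			return 1;
	return 0;
}

/* Return the index of a pack holding 'object', or -1 if none qualifies. */
static int demo_find(const struct demo_pack *packs, int nr_packs,
		     const struct demo_midx_entry *midx, int nr_midx,
		     int object, int kept_only)
{
	int i;

	assert(packs && midx);

	/* First pass: accept the MIDX's single answer only if it qualifies. */
	for (i = 0; i < nr_midx; i++) {
		if (midx[i].object != object)
			continue;
		if (!kept_only || packs[midx[i].pack].kept)
			return midx[i].pack;
	}

	/* Second pass: per-pack scan. */
	for (i = 0; i < nr_packs; i++) {
		/* MIDX-covered packs were already answered above, unless kept_only. */
		if (packs[i].in_midx && !kept_only)
			continue;
		if (kept_only && !packs[i].kept)
			continue;
		if (demo_pack_has(&packs[i], object))
			return i;
	}
	return -1;
}
```

With object 42 stored in both a non-kept pack (index 0) and a kept pack (index 1), and the MIDX entry pointing at pack 0, a kept-only lookup in this model falls through the first pass and finds pack 1 in the second, while an unrestricted lookup returns pack 0 straight from the MIDX.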

packfile.h

Lines changed: 5 additions & 0 deletions
@@ -162,13 +162,18 @@ int packed_object_info(struct repository *r,
 void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
 const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
 
+#define ON_DISK_KEEP_PACKS 1
+#define IN_CORE_KEEP_PACKS 2
+
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
  * return true and store its location to e.
  */
 int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e);
+int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsigned flags, struct pack_entry *e);
 
 int has_object_pack(const struct object_id *oid);
+int has_object_kept_pack(const struct object_id *oid, unsigned flags);
 
 int has_pack_index(const unsigned char *sha1);
174179
