Add support for cloning SHA256 repositories #1825
Conversation
Pull request overview
This PR progresses SHA256 support by enabling cloning of SHA256 repositories. It introduces ObjectFormatGetter and ObjectFormatSetter interfaces to allow storage backends to report and configure their hash algorithm. The implementation updates transport protocols, packfile handling, and index file encoding to support both SHA1 and SHA256.
Changes:
- Added `ObjectFormatGetter` and `ObjectFormatSetter` interfaces, implemented by the in-tree storage backends (memory and filesystem)
- Updated the transport layer to detect and propagate the object format from capabilities or hash sizes
- Modified the packfile parser/scanner to handle SHA256 hashes and updated the idxfile writer to skip unmaterialized delta objects
- Removed the `ErrSHA256NotSupported` error, as SHA256 is now supported
Reviewed changes
Copilot reviewed 49 out of 49 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| storage/storer.go | Defines new ObjectFormatGetter/Setter interfaces |
| storage/memory/storage.go | Implements object format interfaces for memory storage |
| storage/filesystem/storage.go | Implements object format interfaces for filesystem storage |
| plumbing/transport/*.go | Updates transport to detect and pass object format to packfile parser |
| plumbing/format/packfile/*.go | Updates packfile parsing to support SHA256 hashes |
| plumbing/format/idxfile/*.go | Refactors encoder and adds defensive checks for unmaterialized objects |
| plumbing/protocol/packp/*.go | Updates protocol parsing to handle variable hash sizes |
| repository.go | Removes ErrSHA256NotSupported (breaking change) |
| repository_test.go | Adds comprehensive tests for SHA256 cloning |
| worktree_commit_test.go | Converts assertions to require for better test reliability |
| internal/server/* | Adds internal HTTP server infrastructure for testing |
storage/memory/storage.go
Outdated
```go
s.config.Extensions.ObjectFormat = of
s.config.Core.RepositoryFormatVersion = formatcfg.Version1
```
Direct access to s.config on lines 88-89 could cause a nil pointer dereference if the config has not been initialized yet. While Config() initializes it lazily, SetObjectFormat doesn't call Config() first. It should either call s.Config() to ensure initialization, or handle the nil case explicitly.
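A minimal sketch of the defensive variant this comment suggests, assuming the config is lazily initialised by Config(); the receiver and field names mirror the reviewed snippet and may differ from the final code:

```go
// Sketch only: go through Config() so the lazily-initialised config is
// guaranteed to exist before we mutate it, avoiding the nil pointer dereference.
func (s *ConfigStorage) SetObjectFormat(of formatcfg.ObjectFormat) error {
	cfg, err := s.Config()
	if err != nil {
		return err
	}
	cfg.Extensions.ObjectFormat = of
	cfg.Core.RepositoryFormatVersion = formatcfg.Version1
	return s.SetConfig(cfg)
}
```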
storage/storer.go
Outdated
```go
// ObjectFormatGetter is implemented by storage backends that can report
// their object format (hash algorithm), such as SHA-1 or SHA-256.
type ObjectFormatGetter interface {
	// ObjectFormat returns the object format (hash algorithm) used by this storage.
	ObjectFormat() formatcfg.ObjectFormat
}

// ObjectFormatSetter is implemented by storage backends that support
// configuring the object format (hash algorithm) used for the repository.
// This is typically called during repository initialization or when the
// first packfile is fetched.
type ObjectFormatSetter interface {
	// SetObjectFormat configures the object format (hash algorithm) for this storage.
	SetObjectFormat(formatcfg.ObjectFormat) error
}
```
Why do we want a custom interface for this? It makes the repository object format look like a special object in Git. Instead, I would reuse the config.ConfigStorer interface to get/set the object format and come up with something like storage.ReadObjectFormat(config.ConfigStorer) and storage.SetObjectFormat(config.ConfigStorer, formatcfg.ObjectFormat), or similar.
This was a compromise, the reasoning being:
- When we init a new repository, we don't know what `ObjectFormat` it will have. That information is acquired by talking to the server, and the new format is then set using `UpdateObjectStorage`. Alternatively, we could hold the repository initialisation until after the fetching process, but that would be a larger change.
- A key motivation here was backwards compatibility with off-tree storage implementations. This approach means they continue to compile and work when interacting with SHA1 repositories, but fail safely when interacting with a SHA256 repo.
You need to specify the object format for new repositories; otherwise, it defaults to SHA1. That's how git works by default. Migrating a repository from one format to another requires creating a new repository with the new format and importing/exporting the history from the old format to the new one via git fast-export and git fast-import.
I think there should be an option to specify the object format for new repositories. UpdateObjectStorage and others should always use the repository format. The transport layer should validate that new requests match the storage format.
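A minimal sketch of the transport-side validation suggested above; this is not the PR's actual code, and the helper name is hypothetical, while the UnsetObjectFormat sentinel is taken from the suggestion later in this thread:

```go
// validateObjectFormat rejects a request whose advertised format does not
// match the format already configured in the local storage.
func validateObjectFormat(local, remote formatcfg.ObjectFormat) error {
	if local != formatcfg.UnsetObjectFormat && local != remote {
		return fmt.Errorf("object format mismatch: storage uses %q, remote advertises %q", local, remote)
	}
	return nil
}
```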
Empty repos are kind of a funny place where you'd think the hash function is undefined, but it is actually defined and stored (implicitly or explicitly) as part of the config. The main reason is what happens when you have an empty repository and then you clone it?
This is a valid operation that must work predictably. An empty repository is initialized by git forges like Gitea, then you clone it, add commits and push them. Waiting until you create the first objects to define the repository ObjectFormat does not work.
The data sent when you transfer a repo includes this information, even if there are no objects, to make sure that the ObjectFormat is unambiguous.
Thank you both for the feedback in such a timely manner. This approach was taken to lean on existing primitives while cloning new repositories, as at that point in time you don't know what ObjectFormat your server has. As per upstream:
> We have a chicken-and-egg situation between initializing the refdb and spawning transport helpers:
>
> - Initializing the refdb requires us to know about the object format. We thus have to spawn the transport helper to learn about it.
>
> - The transport helper may want to access the Git repository. But because the refdb has not been initialized, we don't have "HEAD" or "refs/". Thus, the helper cannot find the Git repository.
Upstream worked around this by introducing a lazy refdb init: instead of initialising a valid refdb, it creates an invalid one, only finishing the init process once the target object format is agreed with the remote. Let me take that approach into consideration and update this PR accordingly.
Following-up from this discussion, I made a few changes to better align with upstream:
- Added the partial init and moved the setting of the object format to the pack negotiation logic. That only takes place on the first negotiation during a clone operation; subsequent negotiations won't trigger an object format change in the storage.
- Tests were added to ensure that the object format can't be changed on initialised repositories, regardless of whether it was set explicitly or not.
- The go-git Storage abstraction for filesystem ignores any `ObjectFormat` passed in to `NewStorageWithOptions` when the underlying fs contains an initialised repository. In such cases, the repository config is taken as the single source of truth.
Are there any additional scenarios we may need to cater for on the test cases?
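To make the flow above concrete, here is a rough sketch of the clone sequence described; doPartialInit, negotiate and finishInit are hypothetical names, not the PR's actual API:

```go
// Sketch of the clone flow: partial init, capability negotiation, and only
// then committing the object format to the storage.
func cloneSketch(ctx context.Context, st storage.Storer, url string) error {
	// 1. Partially initialise the repository; no object format is committed yet.
	if err := doPartialInit(st); err != nil {
		return err
	}

	// 2. Negotiate capabilities with the remote; only now do we learn whether
	//    the server repository uses SHA1 or SHA256.
	serverFormat, err := negotiate(ctx, url)
	if err != nil {
		return err
	}

	// 3. Complete the init with the negotiated format. Later negotiations must
	//    not change the format of an already initialised repository.
	return finishInit(st, serverFormat)
}
```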
plumbing/format/packfile/common.go
Outdated
```go
if setter, ok := s.(storage.ObjectFormatSetter); ok {
	err := setter.SetObjectFormat(of)
	if err != nil {
		return err
	}
}
```
What happens if the repository has a different object format from the incoming packfile? I think in that case this should fail with an error. I think we should always use the storage object format and pass that down to the parser
This PR makes both the memory and filesystem implementations fail if they are not empty when a change of object format is requested.
This is a fail-safe mechanism to enable single object format repositories to work, while erroring when a repo is trying to handle multiple formats. This should be reviewed as part of the translation table implementation. I'm copying this thread to the issue so that this discussion can be discoverable.
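A minimal sketch of that fail-safe; the method body and the isEmpty helper are illustrative rather than the PR's exact implementation:

```go
// Illustrative only: the in-tree storages accept an object format change
// while they hold no objects, and error otherwise.
func (s *Storage) SetObjectFormat(of formatcfg.ObjectFormat) error {
	if s.ObjectFormat() == of {
		return nil // no-op when the format is unchanged
	}
	if !s.isEmpty() {
		return fmt.Errorf("cannot change the object format of a non-empty repository")
	}
	s.objectFormat = of
	return nil
}
```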
```go
	}
}

func WithObjectFormat(of config.ObjectFormat) ParserOption {
```
Shouldn't we infer the object format from the repository storage?
Overall, I don't think we can:
- A new repository likely won't have the `ObjectFormat` set. Even if it does, it may not align with the target pack being fetched. For example, a new repo (defaulting to SHA1) then fetches a SHA256 packfile.
- An existing repository may be set with a different `ObjectFormat` than the one of the packfile being downloaded.

These changes would set the repository object format to the target packfile format, provided this is a new/empty repository. This feels like a safe approach until we have multi-`ObjectFormat` support. What do you think?
For reference, in the filesystem storage the Parser will be called here with the ObjectFormat coming either from the repository config, when there is an existing underlying repo, or from the storage options otherwise.
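A small sketch of that precedence, under assumed names: an existing repository's configured format wins over the value passed through the storage options.

```go
// resolveObjectFormat is a hypothetical helper illustrating the precedence
// described above.
func resolveObjectFormat(fromRepoConfig, fromOptions formatcfg.ObjectFormat) formatcfg.ObjectFormat {
	if fromRepoConfig != formatcfg.UnsetObjectFormat {
		return fromRepoConfig // the underlying repository already defines its format
	}
	return fromOptions // fall back to the value given to NewStorageWithOptions
}
```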
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
This error was created when go-git needed to be built for sha256 use, which is no longer the case. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The fetching of memory objects will now ensure the correct hash format. The same applies to the pack hash size, which previously returned a SHA1 hash for SHA256 packfiles. Additionally, the dual hash (both SHA1 and SHA256) was removed. This is largely for practical reasons, as it is likely that this part of the logic may not be required in the future. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The idxfile.encoder lacked defensive programming and was quite light on testing. The refactor aligns the code structure with the recent implementation of rev.Encoder. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Force-pushed from 42687fd to d959bb5
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
…res' A bug in other areas of plumbing/format was causing the revfile package to generate invalid files at times. These additional checks improve the test harness to avoid future regressions. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The SHA256 enablement requires the ability to assess whether the object format is the default because it is unset, or whether it was explicitly picked. Another motivation for the change from int to string is that both the Git protocol and the Git config store this information as a string, so the change better aligns with upstream. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
There are two interfaces that represent git Storers: storer.Storer and storage.Storer. The former is a light-weight version focusing only on encoded objects and references; the latter is a full representation of a git repository. The latter provides access to the ObjectFormat via storer.ConfigStorer. Although the interface is implemented by the in-tree storages, it now lives together with the former so that it becomes clearer that it is only needed when handling storer.Storer. Alternatively, ObjectFormatGetter could be removed in favour of casting storer.Storer into storage.Storer; the problem with this approach is that any off-tree implementation would have a greater burden in order to implement sha256 support. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Fully initialised repositories must not have their object format changed. However, when cloning a repository we first init the repository and then negotiate capabilities with the remote. Only at that point are we aware of what object format the local storage must be created with. To support the clone workflow, while also blocking the changing of the object format for empty but fully initialised repositories, we:
- Implement partial inits which are validated at pack negotiation time.
- Keep the logic private so that it is only used internally for clones.
- Add tests to avoid future regression on empty repositories.

Additional info upstream:
- https://github.com/git/git/blob/ab380cb80b0727f7f2d7f6b17592ae6783e9820c/builtin/clone.c#L1216C60-L1216C68

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Ensures that existing repositories always take precedence, and that whatever ObjectFormat is passed in to NewStorageWithOptions is ignored. Signed-off-by: Paulo Gomes <pjbgf@linux.com>
aymanbagabas
left a comment
Looking great! I wonder if we should keep the ObjectFormatGetter 🤔
```go
type ObjectFormatGetter interface {
	// ObjectFormat returns the object format (hash algorithm) used by the Storer.
	ObjectFormat() formatcfg.ObjectFormat
}
```
Do we need this interface still? If we're storing the object format in config and config is the single source of truth, this seems redundant. On this note, I always found the storer.Storer and storage.Storer interfaces confusing, and we should deprecate one of them.
Long-term we may not need this interface - I will add some comments to that effect. I also agree that storer.Storer and storage.Storer could be reviewed, so I created an issue to handle both subjects at once - tracked against the v7 milestone.
For now, I'd rather we keep ObjectFormatGetter so that we don't break the interface for existing external implementations, but also have a safe way to detect the lack of SHA256 support on off-tree storage implementations. As the lack of this awareness could result in data corruption, I will add a test to highlight this point.
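A short sketch of the safety check described above (the helper name is illustrative; the interface names are the ones introduced by this PR): before ingesting SHA256 data, the core logic can probe the Storer and fail early when an off-tree implementation predates the new interfaces.

```go
// ensureSHA256Support shows how the interface doubles as a capability probe,
// so off-tree storages fail early rather than corrupt their data.
func ensureSHA256Support(st storer.Storer, of formatcfg.ObjectFormat) error {
	if of != formatcfg.SHA256 {
		return nil // SHA1 repositories keep working with unchanged storages
	}
	if _, ok := st.(storer.ObjectFormatGetter); !ok {
		return fmt.Errorf("storage %T does not support SHA256 repositories", st)
	}
	return nil
}
```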
```go
if setter, ok := st.(storage.ObjectFormatSetter); ok {
	err := setter.SetObjectFormat(serverFormat)
	if err != nil {
		return nil, fmt.Errorf("unable to set object format: %w", err)
	}

	clientFormat = serverFormat
}
```
Suggested change:

```diff
-if setter, ok := st.(storage.ObjectFormatSetter); ok {
-	err := setter.SetObjectFormat(serverFormat)
-	if err != nil {
-		return nil, fmt.Errorf("unable to set object format: %w", err)
-	}
-	clientFormat = serverFormat
-}
+clientFormat = serverFormat
+cfg.Extensions.ObjectFormat = serverFormat
+st.SetConfig(cfg)
```
We can't just change the config, as the storage is already initialised at this point, so we need to reinitialise the (object) hashers with the target format.
If we were to add this additional logic onto SetConfig, that would mean we'd need to verify on each call whether it is OK to change the object format and whether that should trigger a reinitialisation of the (object) hashers. As this would be storage.Storer specific, it could lead to a corrupted state on off-tree implementations, as the core logic would be unaware of such support.
```go
	// SetObjectFormat configures the object format (hash algorithm) for this storage.
	SetObjectFormat(formatcfg.ObjectFormat) error
}
```
Suggested change:

```go
// ObjectFormat finds the object format for the given [config.ConfigStorer].
// If the config storer doesn't define an object format, it will return
// [config.UnsetObjectFormat].
func ObjectFormat(st config.ConfigStorer) config.ObjectFormat {
	cfg, err := st.Config()
	if err == nil {
		return cfg.Extensions.ObjectFormat
	}
	return config.UnsetObjectFormat
}

// SetObjectFormat sets the object format for the given [config.ConfigStorer].
func SetObjectFormat(st config.ConfigStorer, f config.ObjectFormat) error {
	cfg, err := st.Config()
	if err != nil {
		return err
	}
	cfg.Extensions.ObjectFormat = f
	return st.SetConfig(cfg)
}
```
This PR progresses the SHA256 support journey by:

- Handling the `object-format` capability from the server side when it is supported, for both SHA1 and SHA256.
- Adding the `storer.ObjectFormatGetter` and `storage.ObjectFormatSetter` interfaces and implementing them on the in-tree storage options. External storage will continue to work, however it won't be able to support SHA256 until it implements the new interfaces.

Running the `clone` example against a sha256 repository:

During the implementation I stumbled onto other issues which I fixed as part of this PR, as they were either blocking the way or the change was small enough to make sense to do so:

- `format/idxfile` lacked defensive programming to remove (or error on) objects that were not fully materialised, which could lead to invalid index files. This resulted in `object not found` errors, which may be related to existing reports of that symptom still happening on v6 (e.g. "Pull returns object not found" #207, "go-git sometimes returns object not found when querying for the commit object for the HEAD" #827).
- `format/idxfile` was growing the backing arrays for `names`, `offset` and `crc32` as it iterated through objects. This is rather inefficient and has a performance impact when processing very large pack files.
- Tests were added for the `filesystem` and `memory` storage, to avoid future regressions when handling `ObjectFormat`.

Relates to #706.
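To illustrate the clone scenario mentioned above, here is a minimal, self-contained sketch of cloning a SHA256 repository with go-git after this change; the URL is a placeholder and the v6 Clone signature is assumed to match the existing API:

```go
package main

import (
	"fmt"

	git "github.com/go-git/go-git/v6"
	"github.com/go-git/go-git/v6/storage/memory"
)

func main() {
	// Clone into in-memory storage; the object format (SHA1 or SHA256) is
	// negotiated with the server and applied to the storage during the clone.
	r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
		URL: "https://example.com/sha256-repo.git", // placeholder URL
	})
	if err != nil {
		panic(err)
	}

	head, err := r.Head()
	if err != nil {
		panic(err)
	}
	fmt.Println(head.Hash()) // 64 hex characters for a SHA256 repository
}
```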
ℹ️ Some of this work was done as part of openSUSE's Hack Week 25. Thanks SUSE for sponsoring the go-git project. 🙌