Skip to content

feat: unstable SHA256 support#1206

Merged
ehuss merged 12 commits into
rust-lang:mainfrom
weihanglo:sha256-git2
May 18, 2026
Merged

feat: unstable SHA256 support#1206
ehuss merged 12 commits into
rust-lang:mainfrom
weihanglo:sha256-git2

Conversation

@weihanglo
Copy link
Copy Markdown
Member

@weihanglo weihanglo commented Jan 12, 2026

This adds an unstable-sha256 Cargo feature, as a follow-up of #1201.
Also adds smoke tests for affected operations/types.

Part of #1090

Design

Introduce ObjectFormat accessors and *_ext API variants unconditionally,
The actual SHA256 wiring are gated behind the unstable-sha256 Cargo feature
in order to keep public API additive.

The *_ext suffix follows libgit2's convention for extended function variants.
At next major version, the original methods may absorb the format parameter.
The actual change will leave for future.

Insta-stable

  • NEW ObjectFormat enum (#[non_exhaustive] when unstable-sha256 is not enabled)
  • NEW Oid::ZERO_SHA1 const
  • NEW Oid::object_format() to access underlying OID type
  • NEW Oid::raw_bytes() to return the full underlying byte buffer
  • NEW Repository::object_format() to get hash algo on a repo
  • NEW Remote::object_format() to get hash algo on a remote
  • NEW RepositoryInitOptions::object_format() setter
  • NEW Oid::from_str_ext / Oid::hash_object_ext / Oid::hash_file_ext
  • NEW Diff::from_buffer_ext
  • NEW Index::new_ext / Index::open_ext
  • NEW Indexer::new_ext
  • NEW Odb::new_ext
  • DEPRECATED Oid::zero() in favor of Oid::ZERO_SHA1

Behind unstable-sha256

  • NEW ObjectFormat::Sha256 enum variant
  • NEW Oid::ZERO_SHA256 const
  • CHANGED All _ext methods dispatch to SHA256-aware FFI paths
  • CHANGED Oid::as_bytes returns logical bytes (SHA1 -> 20; SHA256 -> 32)
  • CHANGED Oid::from_bytes accepts 20 or 32 bytes

Comment thread src/oid.rs Outdated
@lorenzleutgeb
Copy link
Copy Markdown

lorenzleutgeb commented Apr 25, 2026

I believe to have found something that you missed in this PR. Note that git2::util::zeroed_raw_oid does not take any arguments:

git2-rs/src/util.rs

Lines 271 to 275 in e6919b7

/// Creates a zeroed git_oid structure.
#[inline]
pub(crate) fn zeroed_raw_oid() -> raw::git_oid {
unsafe { std::mem::zeroed() }
}

I believe that this means that the kind: c_uchar (where c_uchar is defined as u8) member of the returned libgit2_sys::git_oid will have the value 0u8.

The function git2::util::zeroed_raw_oid is called by git2::Oid::zero, which also takes no arguments. If the function git2::Oid::object_format is called on the result of git2::Oid::zero, the call

unsafe { Binding::from_raw(self.raw.kind as raw::git_oid_t) }

will hit the panic! at

_ => panic!("Unknown git oid type"),

This is because 0u8 as u32 (which we got from std::mem::zeroed) is neither GIT_OID_SHA1 = 1, nor GIT_OID_SHA256 = 2.

I discovered this experimenting in code that is downstream of git2, building against this PR. In one of my tests I did essentially git2::Oid::zero().object_format().

Then I started to think about how much unsafe really is necessary here. Maybe zeroed_raw_oid could look like

#[inline]
pub(crate) fn zeroed_raw_oid(format: ObjectFormat) -> raw::git_oid {
  raw::git_oid {
    kind: format.raw(),
    id: [0; raw::GIT_OID_MAX_SIZE],
  }
}

In fact one might even ask why not to have

impl Oid {
  pub const ZERO_SHA1: Self = Self {
    raw: raw::git_oid {
      kind: raw::GIT_OID_SHA1,
      id: [0; raw::GIT_OID_MAX_SIZE],
    }
  };
  
  pub const ZERO_SHA256: Self = Self {
    raw: raw::git_oid {
      kind: raw::GIT_OID_SHA256,
      id: [0; raw::GIT_OID_MAX_SIZE],
    }
  };
  
  // NOTE: Maybe call this `zero_with_object_format` to not break `zero`.
  pub fn zero(format: ObjectFormat) -> Self {
    // TODO: Match on `format` and return `Oid::ZERO_SHA1` or `Oid::ZERO_SHA256`.
  }
}

This way, users that know at compile time which kind of zero they want can just use the constant, while those that decide at run time would call zero. And there is no unsafe involved.

If you do not want git2::Oid::zero to take an argument for backwards compatibility, then you could have it return a SHA-1 object identifier always, and mark it deprecated in favor of these constants and a new zero_with_object_format (if I have not missed a reason why the constants cannot be used).

For oid.rs I also had the gut feeling that some unsafe could be removed, but I do not have any constructive comments here, sorry.

@lorenzleutgeb
Copy link
Copy Markdown

lorenzleutgeb commented Apr 25, 2026

Another comment regarding FromStr: I understand you opted to remove it to "avoid misuse", but I would argue that having FromStr actually is quite convenient. Instead of removing it, you could consider implementing FromStr in a way that it checks the length of the argument and then calls into the more specialized function that takes a string and an object format. The caller can then still inspect the object format of the result. You leave all options open for your users, and keep backwards compatibility.

Copy link
Copy Markdown
Contributor

@DanielEScherzer DanielEScherzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some thoughts, not going to pretend that I understand this fully

View changes since this review

Comment thread src/index.rs Outdated
Comment thread src/odb.rs Outdated
Comment thread src/oid.rs Outdated
Comment thread src/oid.rs Outdated
#[cfg(not(feature = "unstable-sha256"))]
{
if bytes.len() != raw::GIT_OID_SHA1_SIZE {
return Err(Error::from_str("raw byte array must be 20 bytes"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a clearer error for trying to accidentally use 32 bytes when the feature isn't enabled?

Also, why have different ways of creating the SHA1 oid depending on if the SHA256 feature is available?

let oid_type = match bytes.len() {
    raw::GIT_OID_SHA1_SIZE => raw::GIT_OID_SHA1,
    #[cfg(feature = "unstable-sha256")]
    raw::GIT_OID_SHA256_SIZE => raw::GIT_OID_SHA256,
    #[cfg(not(feature = "unstable-sha256"))]
    raw::GIT_OID_SHA256_SIZE => {
        return Err(Error::from_str(
            "raw byte array of 32 bytes requires unstable-sha256 feature to be enabled",
        ))
    }
    _ => {
        return Err(Error::from_str(
            "raw byte array must be 20 bytes (SHA1) or 32 bytes (SHA256, if enabled)",
        ))
    }
};
unsafe {
    try_call!(raw::git_oid_from_raw(&mut raw, bytes.as_ptr(), oid_type));
}
Ok(Oid { raw })

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, though I am not sure if here we should talk about unstable-sha256 feature, downstream consumer of git2-rs may just propagate error message to end-users, and that might be weird to see that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error message is propagated to end users, presumably the end users had some kind of say in the input value

Comment thread src/oid.rs Outdated
Comment thread src/remote.rs Outdated
}
#[cfg(not(feature = "unstable-sha256"))]
{
let _ = oid_type;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, even if the git2-rs bindings don't have SHA256 enabled, if the oid_type corresponds to SHA256 we should say that and probably return an error, rather than just claiming that the repository is formatted with SHA1?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, without unstable-sha256, the underlying libgit2-sys can't return a SHA256 oid or even open a SHA256 repo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the underlying library was built with sha256 enabled, but the git2 crate wasn't?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK none of the mainstream packagers distribute libgit2.so with SHA256 support. The reason being it was broken before libgit2 1.9.3 (released a week ago). Also when sha256 is enabled, libgit2 built a shared libray with -experimental suffix. Unless packager intentionally patch that build script to remove the suffix, it is pretty distinguishable. However, it is unlikely someone does that, since the two are ABI incompatible and doing that breaks other programs dynamically linking to it.

I also raised this concern in rust-lang/cargo#16939 (comment) but from the other angle – Cargo may need to build from source always as nobody ships libgit2-experimental.so.

@rustbot

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-author Status: Waiting on PR author label May 5, 2026
@weihanglo
Copy link
Copy Markdown
Member Author

weihanglo commented May 7, 2026

Another comment regarding FromStr: I understand you opted to remove it to "avoid misuse", but I would argue that having FromStr actually is quite convenient. Instead of removing it, you could consider implementing FromStr in a way that it checks the length of the argument and then calls into the more specialized function that takes a string and an object format. The caller can then still inspect the object format of the result. You leave all options open for your users, and keep backwards compatibility.

This is still ambiguous for those not 41-64 char hex strings. I'd still consider deprecating it and eventually remove.
Alternatively, we just keep it there and have a compatibility note.

@weihanglo weihanglo force-pushed the sha256-git2 branch 2 times, most recently from 9b1f12d to 5778238 Compare May 8, 2026 00:03
Copy link
Copy Markdown
Member Author

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New revision is up.

The major difference is that we stop follow libgit2@main but 1.9.3. The other is API design, which is stated in the PR description:

Introduce ObjectFormat accessors and *_ext API variants unconditionally,
The actual SHA256 wiring are gated behind the unstable-sha256 Cargo feature
in order to keep public API additive.

The *_ext suffix follows libgit2's convention for extended function variants.
At next major version, the original methods may absorb the format parameter.
The actual change will leave for future.

View changes since this review

@weihanglo weihanglo marked this pull request as ready for review May 8, 2026 00:27
@weihanglo
Copy link
Copy Markdown
Member Author

@DanielEScherzer most of your review comments are addressed. Thanks!

@rustbot

This comment has been minimized.

Comment thread src/oid.rs Outdated
}
}

// Sanity-check libgit2's OID size constants to keep the docs and the
Copy link
Copy Markdown
Contributor

@DanielEScherzer DanielEScherzer May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use a bit more inclusive language here?

Suggested change
// Sanity-check libgit2's OID size constants to keep the docs and the
// Double-check libgit2's OID size constants to keep the docs and the

View changes since the review

Comment thread src/oid.rs Outdated

#[test]
fn conversions_ext_sha1() {
use crate::ObjectFormat;
Copy link
Copy Markdown
Contributor

@DanielEScherzer DanielEScherzer May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding these uses to some tests, but using crate::ObjectFormat::* in others (object_format_always_sha1), maybe just add it to the overall mod tests?

View changes since the review

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented May 17, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

weihanglo added 12 commits May 18, 2026 16:29
This is a preparation of SHA256 support.

* Added `Oid::object_format()`
* Added `Oid::raw_bytes()`
* Changed `Oid::from_bytes()` to use `GIT_OID_SHA1_SIZE`
* Added `Repository::object_format()`
* Added `Remote::object_format()`
* Added `RepositoryInitOptions::object_format()`
Add `*_ext` method variants that accept an `ObjectFormat` param.
The short-hand methods now delegate to `_ext` variants with Sha1.

* Added `Oid::from_str_ext`
* Added `Oid::hash_object_ext`
* Added `Oid::hash_file_ext`
* Added `Diff::from_buffer_ext`
* Added `Index::new_ext`
* Added `Index::open_ext`
* Added `Indexer::new_ext`
* Added `Odb::new_ext`
This makes these call sites automatically handle
the correct object format once SHA256 support is wired up.
Declare the `unstable-sha256` Cargo feature,
reflecting upstream libgit2's `GIT_EXPERIMENTAL_SHA256`.
This is an ABI-breaking experimental feature.

Implement the SHA256 support gated behind this feature:

Things gated:

* `ObjectFormat::Sha256` and non_exhaustive when `unstable-sha256` is off
* all FFI call sites for the overloaded libgit2 function signatures
  * git_oid_fromstrn
  * git_oid_fromraw,
  * git_odb_hash
  * git_odb_hashfile
  * git_diff_from_buffer,
  * git_index_new
  * git_index_open
  * git_indexer_new
  * git_odb_new
* `Repository::object_format` is format-aware
* `Remote::object_format` is format-aware
* `Oid::{as_bytes,object_format,from_bytes}` are format-aware
`Oid::zero().object_format()` panics with `unstable-sha256`
because the raw `kind` field was left as zero.

The zeroed struct is generally use for output param.
Not really meaning to switch a `zero_ext` variant.

Replace it with const values for both kinds.
Construct `git_oid` with a struct literal instead of `mem::zeroed`.

Under `unstable-sha256`,
default `kind` to `GIT_OID_SHA1` so the value is a valid `git_oid_t`
even before libgit2 overwrites it as an output buffer.
So that we have a more thorough test coverage
Copy link
Copy Markdown
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehuss ehuss added this pull request to the merge queue May 18, 2026
Merged via the queue into rust-lang:main with commit 8c6dc4a May 18, 2026
7 checks passed
@rustbot rustbot removed the S-waiting-on-review Status: Waiting on review label May 18, 2026
@weihanglo weihanglo linked an issue May 18, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SHA-256

7 participants