Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 18, 2025

Description

TarReader throws InvalidDataException: "Unable to parse number." when reading tar files that use GNU binary encoding (0x80 prefix) for the checksum field. The checksum parsing used ParseOctal which only handles ASCII octal digits, but some tar tools write all numeric fields including checksum using the GNU binary format.

Changed checksum parsing from ParseOctal<uint> to ParseNumeric<int>, which already handles both octal and binary-encoded values (added in PR #101172 for large file support).

Customer Impact

Customers cannot read valid tar archives created by tools that use GNU binary encoding for all numeric fields. These archives work fine with GNU tar, 7-Zip, and other standard tools.

Regression

No. This is extending support for a tar format variant not previously handled.

Testing

  • Added Read_BinaryEncodedChecksum and Read_BinaryEncodedChecksum_Async tests that create headers with 0x80-prefixed checksums
  • All 5685 existing tests pass

Risk

Low. ParseNumeric is already used for all other numeric fields (size, uid, gid, mtime, etc.). The checksum field uses the same encoding rules. Checksum values are small positive integers (max ~130K for 512-byte header) that fit comfortably in int.

Package authoring no longer needed in .NET 9

IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.

Original prompt

This section details on the original issue you should resolve

<issue_title>TarReader throws System.IO.InvalidDataException: 'Unable to parse number.'</issue_title>
<issue_description>### Description

System.IO.InvalidDataException
HResult=0x80131501
Message=Unable to parse number.
Source=System.Formats.Tar
StackTrace:
at System.Formats.Tar.TarHelpers.ThrowInvalidNumber()
at System.Formats.Tar.TarHelpers.ParseOctal[T](ReadOnlySpan1 buffer) at System.Formats.Tar.TarHeader.TryReadCommonAttributes(ReadOnlySpan1 buffer, TarEntryFormat initialFormat)
at System.Formats.Tar.TarHeader.TryReadAttributes(TarEntryFormat initialFormat, ReadOnlySpan1 buffer, Stream archiveStream) at System.Formats.Tar.TarHeader.<TryGetNextHeaderAsync>d__49.MoveNext() at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable1.ConfiguredValueTaskAwaiter.GetResult()
at System.Formats.Tar.TarReader.d__15.MoveNext()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable1.ConfiguredValueTaskAwaiter.GetResult() at System.Formats.Tar.TarReader.<GetNextEntryInternalAsync>d__13.MoveNext() at System.Runtime.CompilerServices.ValueTaskAwaiter1.GetResult()
at ZipArchiveExtensions.d__0.MoveNext() in D:\git\collaboration\apis.core2\src\Siemens.Collaboration.Net.CoreExtensions\Zip_global\ZipArchiveExtensions.cs:line 36
at ZipArchiveExtensions.d__0.MoveNext() in D:\git\collaboration\apis.core2\src\Siemens.Collaboration.Net.CoreExtensions\Zip_global\ZipArchiveExtensions.cs:line 36

This exception was originally thrown at this call stack:
System.Formats.Tar.TarHelpers.ThrowInvalidNumber()
System.Formats.Tar.TarHelpers.ParseOctal(System.ReadOnlySpan)
System.Formats.Tar.TarHeader.TryReadCommonAttributes(System.ReadOnlySpan, System.Formats.Tar.TarEntryFormat)
System.Formats.Tar.TarHeader.TryReadAttributes(System.Formats.Tar.TarEntryFormat, System.ReadOnlySpan, System.IO.Stream)
System.Formats.Tar.TarHeader.TryGetNextHeaderAsync(System.IO.Stream, bool, System.Formats.Tar.TarEntryFormat, bool, System.Threading.CancellationToken)
System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable.ConfiguredValueTaskAwaiter.GetResult()
System.Formats.Tar.TarReader.TryGetNextEntryHeaderAsync(bool, System.Threading.CancellationToken)
System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable.ConfiguredValueTaskAwaiter.GetResult()
System.Formats.Tar.TarReader.GetNextEntryInternalAsync(bool, System.Threading.CancellationToken)
System.Runtime.CompilerServices.ValueTaskAwaiter.GetResult()
...
[Call Stack Truncated]

Reproduction Steps

I try to untar the following file with dotnet 10:
https://github.com/qdrant/qdrant/releases/download/v1.16.2/qdrant-x86_64-unknown-linux-gnu.tar.gz

public static async Task UntarAsync(this TarArchive tarArchive, DirectoryInfo outputDirectory, bool overwrite = false, CancellationToken cancellation = default)
{
    await using var tarStream = tarArchive.ReadStreamFactory();
    await using var tarReader = new System.Formats.Tar.TarReader(tarStream);
    System.Formats.Tar.TarEntry entry;
    while ((entry = await tarReader.GetNextEntryAsync()) != null)
    {
        if (entry.EntryType is System.Formats.Tar.TarEntryType.SymbolicLink or System.Formats.Tar.TarEntryType.HardLink or System.Formats.Tar.TarEntryType.GlobalExtendedAttributes)
        {
            continue;
        }

        await entry.ExtractToFileAsync(outputDirectory.Combine(entry.Name), overwrite, cancellation);
    }

}
Image

Expected behavior

The code runs succesful

Actual behavior

It fails with the exception mentioned above

Regression?

There was already a fix merged for dotnet 9. However it still seem to fail in dotnet 10. #93763

Known Workarounds

No response

Configuration

No response

Other information

No response</issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits December 18, 2025 11:40
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…cation

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix TarReader to handle invalid data exception Fix TarReader to handle binary-encoded checksum fields Dec 18, 2025
Copilot AI requested a review from stephentoub December 18, 2025 11:49
@stephentoub stephentoub marked this pull request as ready for review December 18, 2025 14:36
Copilot AI review requested due to automatic review settings December 18, 2025 14:36
@stephentoub
Copy link
Member

cc: @tmds

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes TarReader to correctly handle tar archives that use GNU binary encoding (0x80 prefix) for the checksum field. Previously, the checksum parser only handled octal ASCII digits, causing InvalidDataException when reading valid tar files created by tools that use binary encoding for all numeric fields.

Key changes:

  • Changed checksum parsing from ParseOctal<uint> to ParseNumeric<int> to support both octal and binary-encoded values
  • Added comprehensive tests for binary-encoded checksum handling in both sync and async code paths

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs Changed checksum parsing to use ParseNumeric<int> instead of ParseOctal<uint> to handle binary-encoded checksums
src/libraries/System.Formats.Tar/tests/TarTestsBase.cs Added helper method CreateTarHeaderWithBinaryEncodedChecksum() to generate test headers with binary-encoded checksums
src/libraries/System.Formats.Tar/tests/TarReader/TarReader.GetNextEntry.Tests.cs Added synchronous test Read_BinaryEncodedChecksum to verify binary checksum parsing
src/libraries/System.Formats.Tar/tests/TarReader/TarReader.GetNextEntryAsync.Tests.cs Added asynchronous test Read_BinaryEncodedChecksum_Async to verify binary checksum parsing

@tmds
Copy link
Member

tmds commented Dec 18, 2025

when reading tar files that use GNU binary encoding (0x80 prefix) for the checksum field.

Do we know what tool/library outputs binary encoding for the checksum field?

These archives work fine with GNU tar, 7-Zip, and other standard tools.

Did someone check this? I don't think GNU tar accepts binary encoded checksums.

using System.Diagnostics;

string tempPath = Path.GetTempFileName();

File.WriteAllBytes(tempPath, CreateTarHeaderWithBinaryEncodedChecksum());
Process.Start("tar", [ "xf", tempPath ]).WaitForExit();

static byte[] CreateTarHeaderWithBinaryEncodedChecksum()
{
    byte[] header = new byte[512];

    // Name: "test" (null-terminated)
    byte[] name = "test\0"u8.ToArray();
    name.CopyTo(header, 0);

    // Mode: "0000644\0" at offset 100
    byte[] mode = "0000644\0"u8.ToArray();
    mode.CopyTo(header, 100);

    // Uid: "0000000\0" at offset 108
    byte[] uid = "0000000\0"u8.ToArray();
    uid.CopyTo(header, 108);

    // Gid: "0000000\0" at offset 116
    byte[] gid = "0000000\0"u8.ToArray();
    gid.CopyTo(header, 116);

    // Size: "00000000000\0" at offset 124
    byte[] size = "00000000000\0"u8.ToArray();
    size.CopyTo(header, 124);

    // Mtime: "00000000000\0" at offset 136
    byte[] mtime = "00000000000\0"u8.ToArray();
    mtime.CopyTo(header, 136);

    // TypeFlag: '0' (regular file) at offset 156
    header[156] = (byte)'0';

    // Magic: "ustar\0" at offset 257
    byte[] magic = "ustar\0"u8.ToArray();
    magic.CopyTo(header, 257);

    // Version: "00" at offset 263
    byte[] version = "00"u8.ToArray();
    version.CopyTo(header, 263);

    // Calculate the correct checksum value.
    // During checksum calculation, the checksum field (offset 148-155) is treated as all spaces.
    int calculatedChecksum = 0;
    for (int i = 0; i < 512; i++)
    {
        if (i >= 148 && i < 156)
        {
            calculatedChecksum += (byte)' ';
        }
        else
        {
            calculatedChecksum += header[i];
        }
    }

    // Write checksum as binary-encoded (0x80 prefix) at offset 148.
    // The checksum field is 8 bytes. With 0x80 prefix, remaining 7 bytes store the value in big-endian.
    header[148] = 0x80;
    header[149] = 0;
    header[150] = 0;
    header[151] = 0;
    header[152] = (byte)((calculatedChecksum >> 24) & 0xFF);
    header[153] = (byte)((calculatedChecksum >> 16) & 0xFF);
    header[154] = (byte)((calculatedChecksum >> 8) & 0xFF);
    header[155] = (byte)(calculatedChecksum & 0xFF);

    return header;
}

outputs:

$ dotnet run
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

@stephentoub
Copy link
Member

@TFTomSun,can you please share more details on the tar that's problematic, how it was created, etc? Can you share it?

@tmds
Copy link
Member

tmds commented Dec 18, 2025

I found the ticket: #122635. (I overlooked it in the first comment apparently...)

The code is probably missing gzip decompression.

This works:

using System.IO.Compression;

string outputDirectory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(outputDirectory);
Console.WriteLine($"Outputting to {outputDirectory}");

HttpClient client = new HttpClient();
using Stream tarGzStream = await client.GetStreamAsync("https://github.com/qdrant/qdrant/releases/download/v1.16.2/qdrant-x86_64-unknown-linux-gnu.tar.gz");
await using var tarStream =  new GZipStream(tarGzStream, CompressionMode.Decompress);
await using var tarReader = new System.Formats.Tar.TarReader(tarStream);
System.Formats.Tar.TarEntry entry;
while ((entry = await tarReader.GetNextEntryAsync()) != null)
{
    if (entry.EntryType is System.Formats.Tar.TarEntryType.SymbolicLink or System.Formats.Tar.TarEntryType.HardLink or System.Formats.Tar.TarEntryType.GlobalExtendedAttributes)
    {
        continue;
    }

    await entry.ExtractToFileAsync(Path.Combine(outputDirectory, entry.Name), overwrite: true);
}
Console.WriteLine("Done");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TarReader throws System.IO.InvalidDataException: 'Unable to parse number.'

3 participants