Skip to content

Switch to LTRData.DiscUtils fork#145

Merged
gfs merged 14 commits intomicrosoft:mainfrom
Erik-White:switch-to-ltrdata-discutils
Jan 30, 2024
Merged

Switch to LTRData.DiscUtils fork#145
gfs merged 14 commits intomicrosoft:mainfrom
Erik-White:switch-to-ltrdata-discutils

Conversation

@Erik-White
Copy link
Contributor

@Erik-White Erik-White commented Jan 20, 2024

Closes #144, fixes #93, fixes #59

The current fork of DiscUtils is almost abandoned at this point. The LTRData fork is much further ahead and has made some great performance improvements and bug fixes. Moving to the newer fork would help resolve some existing issues, as well as make future fixes much easier to resolve.

@Erik-White
Copy link
Contributor Author

@microsoft-github-policy-service agree

@Erik-White Erik-White marked this pull request as ready for review January 24, 2024 21:46
@Erik-White
Copy link
Contributor Author

@gfs I think this is ready to go now. All the tests are passing, it just needs checking over to see if you think the LTRData fork of DiscUtils is ready for use.

@gfs
Copy link
Contributor

gfs commented Jan 24, 2024

@Erik-White awesome. I'm running pipeline tests now and I'll aim to give this a thorough review tomorrow.

@gfs
Copy link
Contributor

gfs commented Jan 25, 2024

/azp run

@azure-pipelines
Copy link

Azure Pipelines failed to run 1 pipeline(s).

@gfs
Copy link
Contributor

gfs commented Jan 25, 2024

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@gfs gfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @Erik-White for the contribution!

Copy link
Contributor

@gfs gfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer inspection it looks like this does not resolve #59. Need to add DMG type detection to MiniMagic and to add a DMG extractor.

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

Implemented DMG extractor and added compatible DMG file.

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

Test failure is related to different invalid path character definitions between net4.8 and net core. See dotnet/runtime#63383.

Will aim to determine which character is the issue and develop a new sample tomorrow.

@Erik-White
Copy link
Contributor Author

Erik-White commented Jan 26, 2024

The problem with the DMG test is due to a file entry that has an invalid character in the name: ".HFS+ Private Directory Data\r"
I can't see a difference in the invalid character list for .NET4.8 vs other versions, but there is a difference in the path handling:
https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,1197
vs
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L685C3-L685C103

We end up calling Path.Combine from DiscUtils and I wonder if that helper method should maybe have a conditional that excludes .NET framework i.e. #if NETSTANDARD || NETCOREAPP.

The other option is to change the file entry name in the test data, but I'm not sure what's best. Is this a scenario that needs to be tested?

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

The problem with the DMG test is due to a file entry that has an invalid character in the name: ".HFS+ Private Directory Data\r" I can't see a difference in the invalid character list for .NET4.8 vs other versions, but there is a difference in the path handling: https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,1197 vs https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L685C3-L685C103

We end up calling Path.Combine from DiscUtils and I wonder if that helper method should maybe have a conditional that excludes .NET framework i.e. #if NETSTANDARD || NETCOREAPP.

The other option is to change the file entry name in the test data, but I'm not sure what's best. Is this a scenario that needs to be tested?

This is an issue I've had to resolve at other times in Recursive Extractor in order to be able to use Path.Combine -

public string GetSanitizedPath(string replacement = "_") => SanitizePath(FullPath, replacement);
.

I think I see a couple options here:

  1. A patch to DiscUtils to sanitize file names for the platform before calling Path.Combine
  2. Exclude DMG support from Recursive Extractor on netstandard 2.1/net4.8
  3. Create a sample file which doesn't contain the .HFS+ Private Directory Data (I'm not positive this is possible - or desirable - I created this sample file with pretty standard hdiutil arguments on OS X from a folder containing two text files.)

Either way its likely prudent to add some extra exception handling as this exception bubbling up to the user does violate one of the design principles I try to follow for this project - that in general unparseable/malformed archives should not throw exceptions. https://github.com/microsoft/RecursiveExtractor/#exceptions

As for if this case is needed - we need a test case for DMG parsing, but this file in particular is not special. However, this file does represent a relatively naiive DMG that I imagine someone may create, so it would be nice if we could support it, and it would be nice to keep feature parity as much as possible between different runtimes.

I think the ideal solution is probably option 1, perhaps using a mechanism like the one I linked above in FileEntry.cs, however, if the combined path is needed as a key to access additional data from the image in later calls, and not just as a string used to create files on disc/as a file identifier for the user, this would not be feasible. I can look further into where/why disc utils is combining these paths today.

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

@Erik-White, got a chance to look at the discutils code you linked. I think I agree the simplest fix may be to change

#if NET461_OR_GREATER || NETSTANDARD || NETCOREAPP
        return Path.Combine(a, b);
#else

to

#if NETSTANDARD || NETCOREAPP
        return Path.Combine(a, b);
#else

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

I found the two locations with the differing lists of invalid path chars.

Net 4.8: https://referencesource.microsoft.com/#mscorlib/system/io/pathinternal.cs,32
NetCore (Windows): https://github.com/dotnet/runtime/blob/82bea3fe096185a337bc7a89e4649421215a10d3/src/libraries/System.Private.CoreLib/src/System/IO/Path.Windows.cs#L24

At a minimum it looks like at least ", > and < were removed between framework and core.

@Erik-White
Copy link
Contributor Author

At a minimum it looks like at least ", > and < were removed between framework and core.

Sorry, I should have been clearer, I meant characters that affected the failing test 😃

Changing to #if NETSTANDARD || NETCOREAPP would be nice and simple. Just need to check two things I think:

  • Does it have any unintended effects elsewhere in DiscUtils (or RecursiveExtractor)?
  • What is the implementation for NETSTANDARD? Is it the same as .NET Framework or Core?

@gfs
Copy link
Contributor

gfs commented Jan 26, 2024

At a minimum it looks like at least ", > and < were removed between framework and core.

Sorry, I should have been clearer, I meant characters that affected the failing test 😃

Changing to #if NETSTANDARD || NETCOREAPP would be nice and simple. Just need to check two things I think:

* Does it have any unintended effects elsewhere in DiscUtils (or RecursiveExtractor)?

* What is the implementation for NETSTANDARD? Is it the same as .NET Framework or Core?

Ah okay, yes, sorry missed what you meant. It also doesn't appear to me either that we have any of those characters here (perhaps the \r at the end of what you clipped above?).

I opened a PR with the change here, and validated locally by referencing the net code from recursive extractor that it seems to resolve the test issue with net4.8: LTRData/DiscUtils#4 . all the DiscUtils tests still pass without modification - but it would be hard for me to speak if it had some other unintended effect not covered by the tests. I dont think it has any negative impact for other recursive extractor paths as this should only effect net4.8 and testing indicates this resolves the issue.

I'm not totally sure on if netstandard should be included in the directive or not, I think there's a reasonable argument to only use Path.Combine on netcoreapp, but I think this is the minimal change which resolves the issue we see here.

@Erik-White
Copy link
Contributor Author

Nice! We just need DiscUtils with the latest fix and then hopefully it's all good to go

@Erik-White
Copy link
Contributor Author

Your fix is now in with the latest DiscUtils and the tests are green! Everything looks good from my side, nice work with the DmgExtractor 🥇

@gfs
Copy link
Contributor

gfs commented Jan 29, 2024

Nice. I'll give it one more look over and likely merge today.

@gfs
Copy link
Contributor

gfs commented Jan 29, 2024

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Erik-White Erik-White closed this Jan 29, 2024
@Erik-White Erik-White reopened this Jan 29, 2024
@gfs gfs merged commit 38ad5d6 into microsoft:main Jan 30, 2024
@gfs
Copy link
Contributor

gfs commented Jan 30, 2024

Thanks again @Erik-White for your contribution!

@Erik-White
Copy link
Contributor Author

Fantastic, thanks for all your work on this project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate to LTRData/DiscUtils GC thread exception when extracting corrupt wim file MultiExtractor DMG parsing

3 participants