Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 33 additions & 26 deletions src/System.Management.Automation/engine/Modules/AnalysisCache.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1071,38 +1071,45 @@ static AnalysisCacheData()
}

string cacheFileName = "ModuleAnalysisCache";
if (ExperimentalFeature.EnabledExperimentalFeatureNames.Count > 0)
// When multiple copies of pwsh are on the system, they should use their own copy of the cache.
// Append hash of `$PSHOME` to cacheFileName.
byte[] hashBytes;
using (var sha1 = SHA1.Create())
{
// If any experimental features are enabled, we cannot use the default cache file because those
// features may expose commands that are not available in a regular powershell session, and we
// should not cache those commands in the default cache file because that will result in wrong
// auto-completion suggestions when the default cache file is used in another powershell session.
//
// Here we will generate a cache file name that represent the combination of enabled feature names.
// We first convert enabled feature names to lower case, then we sort the feature names, and then
// compute an SHA1 hash from the sorted feature names. We will use a short SHA name (first 8 chars)
// to generate the cache file name.
int index = 0;
string[] featureNames = new string[ExperimentalFeature.EnabledExperimentalFeatureNames.Count];
foreach (string featureName in ExperimentalFeature.EnabledExperimentalFeatureNames)
{
featureNames[index++] = featureName.ToLowerInvariant();
}
hashBytes = sha1.ComputeHash(Encoding.UTF8.GetBytes(Utils.DefaultPowerShellAppBase));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a cryptographic hash at all? Why not use .GetHashCode()?

Copy link
Collaborator

@iSazonov iSazonov Nov 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a stable result but the GetHashCode could be randomized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe GetHashCode() in .NET Core is now randomized as @iSazonov stated and we need something stable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRC and MD5 are stable and both take less computation than SHA1. I don't believe this is a security issue, if it is, we shouldn't use SHA1. CRC is relatively simple to calculate we can implement it ourselves as I don't see anything to calculate it in DotNet Core.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/dotnet/corefx/search?utf8=%E2%9C%93&q=crc32&type=

but I don't understand how much we will win.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume that this has to be calculated at most every startup. Switching to a less intensive hash algorithm will reduce our startup time which is a big factor in how people perceive performance.

cc @daxian-dbw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TravisEz13 and I discussed this offline yesterday and I agree that we should go with CRC. The main gain is to potentially avoid loading System.Security.Cryptography.Algorithms.dll at powershell startup time, plus there is also gain in CPU time when running the hash algorithm. I will do this change as part of my startup improvement work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got this chart from StackOverflow about the collision rate of CRC32:
image

In our scenario, hashing would be done on each powershell core installation on a machine, and in case of any experimental features is enabled, the combination of feature names. I think 100 different hash inputs should be a fair large number for a local machine environment in our scenario, and thus the collision rate would be less than 1 in a million, which should be good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use SSE from System.Runtime.Intrinsics.Experimental. We get this in runtime after .Net Core 3.0 release.


Array.Sort(featureNames);
string allNames = string.Join(Environment.NewLine, featureNames);
string hashString = BitConverter.ToString(hashBytes, startIndex: 0, length: 4).Replace("-", string.Empty);
cacheFileName = string.Format(CultureInfo.InvariantCulture, "{0}-{1}", cacheFileName, hashString);

// Use SHA1 because it's faster.
// It's very unlikely to get collision from hashing the combinations of enabled features names.
byte[] hashBytes;
using (var sha1 = SHA1.Create())
if (ExperimentalFeature.EnabledExperimentalFeatureNames.Count > 0)
{
// If any experimental features are enabled, we cannot use the default cache file because those
// features may expose commands that are not available in a regular powershell session, and we
// should not cache those commands in the default cache file because that will result in wrong
// auto-completion suggestions when the default cache file is used in another powershell session.
//
// Here we will generate a cache file name that represent the combination of enabled feature names.
// We first convert enabled feature names to lower case, then we sort the feature names, and then
// compute an SHA1 hash from the sorted feature names. We will use a short SHA name (first 8 chars)
// to generate the cache file name.
int index = 0;
string[] featureNames = new string[ExperimentalFeature.EnabledExperimentalFeatureNames.Count];
foreach (string featureName in ExperimentalFeature.EnabledExperimentalFeatureNames)
{
featureNames[index++] = featureName.ToLowerInvariant();
}

Array.Sort(featureNames);
string allNames = string.Join(Environment.NewLine, featureNames);

// Use SHA1 because it's faster.
// It's very unlikely to get collision from hashing the combinations of enabled features names.
hashBytes = sha1.ComputeHash(Encoding.UTF8.GetBytes(allNames));
}

// Use the first 8 characters of the hash string for a short SHA.
string stringVal = BitConverter.ToString(hashBytes, startIndex: 0, length: 4).Replace("-", String.Empty);
cacheFileName = String.Format(CultureInfo.InvariantCulture, "{0}-{1}", cacheFileName, stringVal);
// Use the first 8 characters of the hash string for a short SHA.
hashString = BitConverter.ToString(hashBytes, startIndex: 0, length: 4).Replace("-", string.Empty);
cacheFileName = string.Format(CultureInfo.InvariantCulture, "{0}-{1}", cacheFileName, hashString);
}
}

#if UNIX
Expand Down