Skip to content

Conversation

@daxian-dbw
Copy link
Member

@daxian-dbw daxian-dbw commented Feb 21, 2019

PR Summary

Make xUnit test truly run sequentially to rule out race conditions around powershell.config.json.
Fix #8784 #8715

PR Context

We keep getting xUnit test failures due to the race condition around manipulating powershell.config.json file, even though we have been trying to run xUnit tests sequentially by having dotnet test -p:ParallelizeTestCollections=false. It turns out -p:ParallelizeTestCollections=false doesn't work, and we have to use xunit.runner.json to disable parallel running.

Failure example: https://powershell.visualstudio.com/PowerShell/_build/results?buildId=13830
From the exception below, we can see the all-user scope config file was there when we check for file existence in ReadValueFromFile, but was deleted when we came to creating a FileStream out of it.
This is because the all-user scope config file as created and deleted as part of the PSConfiguration tests. This is an evidence that tests are still running in parallel with dotnet test -p:ParallelizeTestCollections=false.

[xUnit.net 00:00:03.20]     PSTests.Sequential.RunspaceTests.TestRunspaceWithPowerShellAndInitialSessionState [FAIL]
Failed   PSTests.Sequential.RunspaceTests.TestRunspaceWithPowerShellAndInitialSessionState
Error Message:
 System.IO.FileNotFoundException : Could not find file 'C:\Users\VssAdministrator\Documents\PowerShell\powershell.config.json'.
Stack Trace:
   at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
   at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.Management.Automation.Configuration.PowerShellConfig.WaitForFile(String fullPath, FileMode mode, FileAccess access, FileShare share) in D:\a\1\s\src\System.Management.Automation\engine\PSConfiguration.cs:line 426
   at System.Management.Automation.Configuration.PowerShellConfig.ReadValueFromFile[T](ConfigScope scope, String key, T defaultValue, Func`4 readImpl) in D:\a\1\s\src\System.Management.Automation\engine\PSConfiguration.cs:line 397
   at System.Management.Automation.Utils.GetPolicySettingFromConfigFile[T](ConfigScope[] preferenceOrder) in D:\a\1\s\src\System.Management.Automation\engine\Utils.cs:line 523
   at System.Management.Automation.Utils.GetPolicySetting[T](ConfigScope[] preferenceOrder) in D:\a\1\s\src\System.Management.Automation\engine\Utils.cs:line 509
   at Microsoft.PowerShell.Commands.ModuleCmdletBase.GetModuleLoggingInformation(IEnumerable`1& moduleNames) in D:\a\1\s\src\System.Management.Automation\engine\Modules\ModuleCmdletBase.cs:line 4507
   at System.Management.Automation.PSSnapInReader.ReadEnginePSSnapIns() in D:\a\1\s\src\System.Management.Automation\singleshell\config\MshSnapinInfo.cs:line 1068
   at System.Management.Automation.Runspaces.InitialSessionState.CreateDefault() in D:\a\1\s\src\System.Management.Automation\engine\InitialSessionState.cs:line 1526
   at PSTests.Sequential.RunspaceTests.TestRunspaceWithPowerShellAndInitialSessionState() in D:\a\1\s\test\xUnit\csharp\test_Runspace.cs:line 72
Results File: D:\a\1\s\ParallelXUnitTestResults.xml

PR Checklist

@daxian-dbw daxian-dbw requested a review from iSazonov February 21, 2019 19:52
@daxian-dbw daxian-dbw added the CL-Test Indicates that a PR should be marked as a test change in the Change Log label Feb 21, 2019
@adityapatwardhan
Copy link
Member

@daxian-dbw Can you tag all the issues open for xunit tests in this PR?

@daxian-dbw
Copy link
Member Author

@adityapatwardhan I tagged #8784 #8715. Please let me know if there are others.

- powershell: |
Import-Module .\tools\ci.psm1
$ParallelXUnitTestResultsFile = "$(System.ArtifactsDirectory)\xunit\ParallelXUnitTestResults.xml"
$xUnitTestResultsFile = "$(System.ArtifactsDirectory)\xunit\xUnitTestResults.xml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something smarter here. Like:

  1. Get all items in the xunit folder
  2. run Test-XUnitTestResults on each item found
  3. or throw an exception if no items were found

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, this is not changed frequently. So, this is not blocking this PR.

@iSazonov
Copy link
Collaborator

I very wonder why we mask BUGs which are revealed by the tests?
Are we going to release 6.2 version that is subject to DoS attack by unprivileged user?
Are we going to release 6.2 version in which hosting applications will fall unpredictably in 9% of cases?

I expect that we fast fix the bugs:

@daxian-dbw
Copy link
Member Author

daxian-dbw commented Feb 22, 2019

I very wonder why we mask BUGs which are revealed by the tests?

This is not hiding bugs, but to fix our test script. Blindly running all xUnit tests in parallel is just wrong. Please take a look at the original script in Start-PSxUnit before your changes in #8356, the tests are categorized into PSTests.Sequential and PSTests.Parallel namespaces and the original script ran the PSTests.Sequential tests sequentially, and run PSTests.Parallel tests in parallel.

The categorization was for a reason -- config files get created and deleted in test_PSConfiguration.cs, and that will definitely cause unusually frequent race conditions with tests in test_Runspace.cs. That's why they are categorized into PSTests.Sequential namespace. Using default values when failed to access the file as you suggested won't help here, because that will still make the setting values unpredictable and cause the tests to fail randomly -- a test might expect a specific value of a setting but it gets a default value because of the race condition.

I could make the test script behave like before -- running PSTests.Sequential namespace tests in sequence and PSTests.Parallel namespace tests in parallel, but I chose to not do that because:

  1. we have to use the xunit config file to disable parallel, which make it cumbersome to run part of the tests sequentially and other part in parallel.
  2. it makes writing xunit test harder by requiring the author to know which namespace to use for his/her tests.

Be noted that, fixing the test script doesn't mean we don't want to fix the race condition issue regard to processing the config file.

Are we going to release 6.2 version that is subject to DoS attack by unprivileged user?

@TravisEz13 and I chatted about this offline and we believe this is not a release blocker because it's already an existing issue.

Are we going to release 6.2 version in which hosting applications will fall unpredictably in 9% of cases?

The 9% failure was because of the frequent creation/deletion of the config files within test_PSConfiguration.cs, which would be really really rare to happen in real scenarios.

@iSazonov
Copy link
Collaborator

Be noted that, fixing the test script doesn't mean we don't want to fix the race condition issue regard to processing the config file.

I am 😄 that the team will keep it under control. Thanks!

One extra note that if I hadn’t made these tests parallel, we wouldn’t start fixing these race conditions. I do not know how now we can identify such problems (?)

@daxian-dbw
Copy link
Member Author

daxian-dbw commented Feb 22, 2019

One extra note that if I hadn’t made these tests parallel, we wouldn’t start fixing these race conditions. I do not know how now we can identify such problems (?)

The flag -p:ParallelizeTestCollections=false stopped working starting from some point, so luckily we will run into the same failures even without your changes.
Besides, the intra-process synchronization problem was a known issue. Please see the PR description in #5809 (I just updated it to add a bullet point about the case that config file is not accessible).

@iSazonov
Copy link
Collaborator

the intra-process synchronization problem was a known issue.

We have so many open issues that it is easy to lost/forget important ones. It is difficult to prioritize work. And it seems GiHub doesn't help well. :-(

@iSazonov
Copy link
Collaborator

The flag -p:ParallelizeTestCollections=false stopped working starting from some point, so luckily we will run into the same failures even without your changes.

This problem is too rarely raised and running the tests in parallel is useless locally. Can we do it on CI (maybe optional) to track the race conditions?

@daxian-dbw
Copy link
Member Author

daxian-dbw commented Feb 22, 2019

Can we do it on CI (maybe optional) to track the race conditions?

Like I said previously, fixing the race condition won't help to make our test reliable if they are running in parallel, because the setting values returned are not predictable. A test may expect a specific value for a setting but instead get back the default value due to race condition and thus would fail. It should be tracked by an issue.

This problem is too rarely raised and running the tests in parallel is useless locally.

It's rarely raised because it rarely happens in real scenarios, and that's likely why the this issue has not been treated with high priority.

@iSazonov
Copy link
Collaborator

It should be tracked by an issue.

In long term yes. For now my suggestion was to create optional CI job (like CodeFactor or Codacy) so that we can track the state of affairs until we find better solution.

@iSazonov
Copy link
Collaborator

It's rarely raised because it rarely happens in real scenarios, and that's likely why the this issue has not been treated with high priority.

Rather, because PowerShell is rarely used. If we imagine that it would be used in high-load application such as Microsoft SCOR we would catch a bunch of unpredictable crashes. Or in multi-user system like Misrosoft Remote Desktop any regular user would block all system scripts by locking the config file!

@daxian-dbw
Copy link
Member Author

daxian-dbw commented Feb 22, 2019

any regular user would block all system scripts by locking the config file!

You could do the same to any applications.
For example, hold the pwsh.exe by opening the file with [System.IO.FileShare]::None, and you won't be able to start the pwsh.exe anymore:

image

So, yes, it's technically a DoS attack, but if a user is able to do it (has the permission), then there's always a way for him/her to do it, no matter we fix the race condition regarding config.json or not.

@daxian-dbw daxian-dbw merged commit 5d54f1a into PowerShell:master Feb 22, 2019
@daxian-dbw daxian-dbw deleted the xunit branch February 22, 2019 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-Test Indicates that a PR should be marked as a test change in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The TestRunspaceWithPowerShellAndInitialSessionState intermittently fails (about 9%)

5 participants