Conversation

@rjmholt
Collaborator

@rjmholt rjmholt commented Feb 18, 2017

This PR adds a new entry point for defining DynamicKeywords in PowerShell. Whereas currently these can only be added at runtime in PowerShell, this change uses CIL metadata in an assembly as a specification of the syntax requirements, and performs some validation of that specification.

I've also added the concept of nested keywords -- these are keywords that will only be parsed and invoked as keywords when they occur in the scope of another keyword. This seemed like a common feature in the Domain-Specific Languages we've been looking at.

I still need to do some work on this PR, but I want to get the ball rolling because it's a substantial addition.

Work still to do includes:

  • Making the tests work on Linux. I didn't want to submit before this was working, but the errors are in module loading (module paths) and Add-Type ("Could not load type..."), so I thought I would ask for advice on this
  • Adding a Types.ps1xml-like test case
  • Adding more tests for UseMode checking, keyword scope, and other DynamicKeyword features
  • Adding completions/intellisense for keywords

I've been doing work on adding DynamicKeyword runtime execution using C# code (so a domain-specific language for PowerShell can be defined entirely in a C#/dotnet module/assembly -- see here for a very rough prototype), and this PR is essentially the first step toward that. I intend to submit an issue describing the work and a roadmap soon.

Finally, in this PR, there are some points I think that need special scrutiny:

  • The UseMode concept (restricting how many times a keyword may appear in a block) -- it may be unnecessary or unwanted, and is where the metadata reader starts to verge on being too opinionated. I included it because it was part of the RFC I submitted.
  • The metadata reader type providers -- one uses System.String and the other System.Type, but they could both be consolidated to use a custom type description class. I've seen dotnet core/corefx use a class called TypeDesc, but that might be a bit heavyweight. The type providers are also only partially implemented because some methods are never called -- but it would be nice to fix that.
  • Error messages -- I'm not sure what the standard way to report errors to the user is

@msftclas

Hi @rjmholt, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!


It looks like you're a Microsoft contributor. If you're full-time or an intern, we DON'T require a Contribution License Agreement. If you are a vendor, please DO sign the electronic Contribution License Agreement. It will take 2 minutes and there's no faxing! https://cla.microsoft.com.

TTYL, MSBOT;

@PowerShellTeam PowerShellTeam added the "Review - Needed" (the PR is being reviewed) label Feb 18, 2017
@rjmholt
Collaborator Author

rjmholt commented Feb 18, 2017

I have a rough idea of what needs fixing for the Unix support, but does AppVeyor use something other than project.json to resolve dependencies? I added System.Reflection.Metadata there, but clearly it needs to go somewhere else.

@lzybkr
Contributor

lzybkr commented Feb 18, 2017

project.json is it. You are likely missing a reference to System.Collections.Immutable.

You can try to repro locally by clearing your NuGet cache, unless it was resolving that assembly from the GAC.

@rjmholt
Collaborator Author

rjmholt commented Feb 19, 2017

The System.Reflection.Metadata package already depends on System.Collections.Immutable. Should I add System.Collections.Immutable anyway? I also noticed the error message shows it failing to resolve types from System.Reflection.Metadata. I can only replicate the problem when building with -FullCLR, which gives me a pretty perfect repro, but the error still comes up when I add System.Collections.Immutable by hand. Where does FullCLR look for dependencies?

@lzybkr
Contributor

lzybkr commented Feb 19, 2017

Ah, yeah, that's the failure in AppVeyor - the target is net451 - which I've stopped doing in #3066 - we just need to get that approved.

@daxian-dbw daxian-dbw self-assigned this Feb 20, 2017
@PowerShell PowerShell deleted a comment from msftclas Sep 27, 2017
@bergmeister
Contributor

bergmeister commented Feb 18, 2018

@rjmholt Is this PR still active or do you have a rough timeline for this? If so, can you please resolve the conflicts in the meantime? This fix would be very helpful to resolve this parsing issue in PSScriptAnalyzer.

@lzybkr lzybkr mentioned this pull request Feb 20, 2018
@SteveL-MSFT
Member

SteveL-MSFT commented Feb 21, 2018

@bergmeister @rjmholt was an intern on my team at the time this PR was submitted and went back to finish school shortly after. He's actually joining Microsoft as a full time employee on my team the first week of March. @rjmholt should work with @daxian-dbw to determine if we should continue with this PR as-is.

@bergmeister
Contributor

bergmeister commented Feb 21, 2018

@SteveL-MSFT Thanks for the update. Assuming this gets fixed: if you then published new PowerShell reference assemblies, would tools such as PSSA that use the AST parser receive this fix and be able to parse types imported by using statements? I don't need a precise answer, but I'd like to know whether this is technically feasible or whether the scenario would require more. Knowing this would help us decide whether we need to work around this issue in PSSA or can wait for the parser improvements.

@SteveL-MSFT
Member

@bergmeister PSSA still needs to support older versions of PowerShell though?

@bergmeister
Contributor

@SteveL-MSFT Yes, but if I understand correctly, the reference assemblies (containing the parser) are compiled into PSSA. Hence my hope is that updated reference assemblies would solve it. WDYT?

@SteveL-MSFT
Member

@bergmeister I see, if it's only the reference assemblies and not the runtime assemblies (don't know enough about the internals of PSSA), it may work. Since this work is not committed, my recommendation is to not build a dependency on it.

@lzybkr
Contributor

lzybkr commented Feb 22, 2018

@bergmeister - the reference assemblies contain no code - they are like a header file in C/C++ or an interface definition.

It does make sense to produce nuget packages for some of PowerShell like the parser, but it's a ton of work to factor things in a reasonable way to be useful.

@rjmholt
Collaborator Author

rjmholt commented Feb 23, 2018

I've been meaning to close this PR. I'll revisit it later with @daxian-dbw, but the code is stale now and, if we proceed with it, it's probably worth rewriting a fair amount of it.

There seems to be some interest in the feature though (#6194, #4016), so I would like to spend some more time on it.

@bergmeister this PR is intended more to extend the DynamicKeyword feature than to fix anything. I read through the PSSA issue you linked, but I'm not sure I fully understand the fix. It's possible a smaller fix could be implemented based on this PR (and also serve as an incremental step toward DSL support?).

@rjmholt rjmholt closed this Feb 23, 2018
@rjmholt
Collaborator Author

rjmholt commented Feb 23, 2018

Ah, the other thing to note is that having tests for this feature is very painful so long as we require unloading the assemblies that describe the DynamicKeywords. See corefx #552.

@SteveL-MSFT
Member

@rjmholt I think a workaround for the loaded-assembly problem is to start new pwsh processes
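A minimal sketch of that workaround, assuming a hypothetical keyword assembly at ./MyKeywordModule.dll -- each test body runs in its own child pwsh process, so the assembly load dies with the child:

```powershell
# Sketch: run an assembly-loading test in a throwaway child pwsh process.
# './MyKeywordModule.dll' is a hypothetical path; substitute the real assembly.
$testBody = @'
# Import-Module ./MyKeywordModule.dll   # the load stays in this process
2 + 2                                   # stand-in for the real test body
'@
$result = pwsh -NoProfile -Command $testBody
$result  # the parent process never loaded the assembly
```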

@iSazonov
Collaborator

It seems AppDomains come back in .NET Core 2.1?

@daxian-dbw
Member

I have the changes in this PR rebased and moved to my fork at https://github.com/daxian-dbw/PowerShell/tree/dsl. It was rebased a while ago, but at least it was rebased to the point after we moved to msbuild. It won't be too much trouble to rebase it again.

I have a separate branch, my-dsl, that contains some further improvement work I did on top of this PR; it's at https://github.com/daxian-dbw/PowerShell/tree/my-dsl. The work was not finished.

@bergmeister
Contributor

bergmeister commented Mar 3, 2018

@lzybkr It turns out that PSScriptAnalyzer is not using the reference assemblies of PowerShell, but has a self-contained copy of System.Management.Automation.dll and its dependencies, and therefore the parser. I just made a proof of concept where I can parse the AST using System.Management.Automation.Language.Parser.ParseInput(...) on an Ubuntu VM in a .NET Core console app that does not have PowerShell installed, only the .NET Core runtime. The reason this works is that we use an unofficial NuGet package of 6.0.0-alpha13 (because the official ones on NuGet are full .NET only and not compatible with netcore2.0 or netstandard1.6) to get System.Management.Automation.dll and its dependencies like Microsoft.Management.Infrastructure.
Using either the System.Management.Automation or Microsoft.PowerShell.SDK packages from the https://powershell.myget.org/F/powershell-core/api/v3/index.json feed, I could get the prototype to work as well, but I need to do a PoC for PSSA to see which of the two is required. What I am not sure about is whether we would need to do a dotnet publish to be sure that it really uses the NuGet package and not the parser of PowerShell. It would be interesting if you could comment on the PR below.

@rjmholt
Collaborator Author

rjmholt commented Mar 14, 2018

I've discussed this PR with @daxian-dbw, and it seems simpler to approach DSL support without metadata-parsing for now (e.g. using PowerShell classes as schemas). Implementation is not a priority currently though.

@bergmeister: I know there's a use-case for PSSA, but the implementation here was designed only to read DSL specifications and error on anything else. Possibly the parse-time assembly load meant that PSSA could discover types in using statements, but this is not desired behavior.

The purpose of using the metadata reader was to prevent parse-time execution of static .NET code blocks (and to save us having to deal with the lack of AppDomains and assembly unloading in .NET Core, as well as a performance hit on module loading). Because of those properties in .NET Core, the general recommendation is to rely on the module manifest to provide the information.

However, talking with @JamesWTruher, I can imagine there is a use case for PSSA to do its own metadata analysis to provide richer script analysis, and that could also pay dividends for things like completions. But I'm not sure there are good reasons for metadata parsing to be in PowerShell's engine itself.
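For illustration only, here is a rough sketch of the metadata-only inspection being discussed, using System.Reflection.Metadata from PowerShell itself. It enumerates a DLL's type definitions without ever loading the DLL for execution; the attribute matching the PR layers on top of this is omitted:

```powershell
# Sketch: read an assembly's type definitions via metadata only;
# no code from the target DLL is ever executed.
Add-Type -AssemblyName System.Reflection.Metadata

$dllPath = [System.Management.Automation.PSObject].Assembly.Location  # any managed DLL
$stream  = [System.IO.File]::OpenRead($dllPath)
$pe      = [System.Reflection.PortableExecutable.PEReader]::new($stream)
try {
    $md = [System.Reflection.Metadata.PEReaderExtensions]::GetMetadataReader($pe)
    $typeNames = foreach ($handle in $md.TypeDefinitions) {
        $t = $md.GetTypeDefinition($handle)
        '{0}.{1}' -f $md.GetString($t.Namespace), $md.GetString($t.Name)
    }
    $typeNames.Count  # number of type definitions found, with zero assembly loads
}
finally {
    $pe.Dispose()
    $stream.Dispose()
}
```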

@bergmeister
Contributor

bergmeister commented Mar 14, 2018

@rjmholt Ok, I'll have to accept for now that this feature will not be implemented in PowerShell's parser. But what does the PowerShell team suggest to do for PSSA? PSSA depends on PowerShell's parser (it does not use the one being installed but has its own version of it as part of a self-contained PSCore version of System.Management.Automation). Should PSSA then maintain its own customized fork of PowerShell in order to build and consume a custom parser? Catching the ParseException for TargetObject TypeNotFound and trying to remove the type name in memory might be a temporary workaround, but it is going to be hairy when using commands such as Invoke-Formatter or the -Fix switch, which actually manipulate the script content.

@rjmholt
Collaborator Author

rjmholt commented Mar 15, 2018

@bergmeister We might be speaking at cross-purposes a little here -- essentially, the fact that this prototype feature loaded modules at parse time was something we don't want, because we don't want arbitrary code execution when we're just parsing PowerShell. It's one of several reasons this PR was rejected. In fact, for PSSA to avoid executing malicious code by just parsing PowerShell, it will need to avoid module loading too.

I'm not familiar at all with PSSA's codebase though, so I didn't quite understand where you talked about a ParseException and Invoke-Formatter. Could you give the links to where those occur in the PSSA code so I can take a look?

In terms of the parser, I think the System.Management.Automation.dll parser is definitely the one to use (even though it won't and shouldn't ever load a module in at parse time). But, the way I imagine it, PSSA could use the parser to get an AST as normal, and then extract any module-loading statements from that (maybe in AnalyzeSyntaxTree), and use that information to go and look at that module. If it's a PowerShell module, then PSSA just runs the parser on it. If it's a DLL, then maybe a module manifest or possibly use a metadata reader to read the attributes.
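A sketch of what that extraction might look like ("AzureRm" here is just a placeholder module name):

```powershell
# Sketch: parse a script, then collect using statements and Import-Module
# calls from the AST -- no module is loaded and nothing executes.
$tokens = $null; $errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(@'
using namespace System.Collections.Generic
Import-Module AzureRm
'@, [ref]$tokens, [ref]$errors)

# 'using' statements are surfaced directly on the ScriptBlockAst
$usings = @($ast.UsingStatements | ForEach-Object { $_.Name.Value })

# Import-Module invocations are ordinary CommandAsts in the tree
$imports = @($ast.FindAll({ param($node)
            $node -is [System.Management.Automation.Language.CommandAst] -and
            $node.GetCommandName() -eq 'Import-Module'
        }, $true) | ForEach-Object { $_.CommandElements[1].Extent.Text })

$usings   # -> System.Collections.Generic
$imports  # -> AzureRm
```

From there, a PowerShell module would just be parsed in turn, while a DLL would go through a manifest or metadata path as described above.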

@bergmeister
Contributor

bergmeister commented Mar 29, 2018

@rjmholt Sorry for coming back to you this late; I forgot about it. I understand and respect the decision that the PowerShell parser won't get those module-loading features. In short, PSSA relies on the parser to analyse ASTs and calls scriptAst = Parser.ParseFile(filePath, out scriptTokens, out errors); here or Parser.ParseInput(scriptDefinition, out scriptTokens, out errors); here initially to get the ASTs, and then statically analyses them. If parsing fails, then PSSA cannot do its job for the whole file.
Since a custom fork of the parser seems to be undesired for security reasons (although I think having a -Force switch to allow loading of DLLs would be reasonable), I was thinking of catching the parser errors and replacing the 'naughty' type definitions in the in-memory representation of the script (assuming it is possible to extract the exact error in a locale-independent way) to make it parseable. There are some special features like Invoke-Formatter or Invoke-ScriptAnalyzer -Fix that take either a string of the script or a path to the script file and output it with some rules auto-fixed (this is how formatting of selected code via Ctrl+K+F works in VSCode, btw). So you can see that a workaround of removing the type strings introduces the complexity of having to add them back in when outputting the script.
Maybe I am missing something, but it would be great if the PowerShell team could provide some suggestions on how to better handle scripts that are not parseable.

@rjmholt
Collaborator Author

rjmholt commented Mar 29, 2018

Hmmm, do you know what error the parser throws when a type is used that isn't loaded? Philosophically, the parser shouldn't be looking up types at all. You should be able to use a string that is totally meaningless, as long as it is syntactically representative of a type ([Foo.Quux]::Banana()), and get an AST. If there are type lookups or semantic checks happening earlier than I would expect and causing an error, then our path forward could be complicated. /cc @daxian-dbw

@bergmeister
Contributor

bergmeister commented Mar 29, 2018

@rjmholt An example is

using namespace Microsoft.Azure.Commands.ResourceManager.Cmdlets.SdkModels
using namespace Microsoft.Azure.Commands.Common.Authentication.Abstractions
Import-Module "AzureRm"
class MyClass { [IStorageContext]$StorageContext }

We get the parse error Unable to find type [IStorageContext] at line 4 column 18. If the parser supported unknown types like [Foo.Banana] (maybe via a different mode/overload), this would solve the issue for PSSA.

@rjmholt
Collaborator Author

rjmholt commented Mar 31, 2018

Immediately after writing that last comment I went and tracked down the issue the code above comes from, because I remembered it being the problematic code in question. Which then brought me to this issue, and then this issue which really documents what's going on. And then I spent a fair amount of time thinking about this.

For whatever reason I didn't understand that it was PowerShell that was producing the error here, just because I didn't think it would be trying to resolve types at parse time. I've heard before that it's to do with being able to integrate with .NET types (although I'm not sure why Add-Type works in that case then). But long story short, I owe you an apology on that one @bergmeister.

But to summarise, here are some dot points (most of which I imagine are already well known to @bergmeister , but I document here anyway):

  • Type resolution within PowerShell classes and their members occurs at parse time, meaning that runtime type loading that occurs in Add-Type and Import-Module won't work for classes; even though the statements might come before the classes that use the types, they can't execute until after the class is parsed and the check fails.
  • This means that PSSA is quite correct when it says there's an error there -- those scripts won't run unless the type is loaded at parse time. Which means users should be loading them before running PSSA too.
  • @bergmeister is quite right that there is an unfinished feature in the using assembly statement, where we should offer parse-time type loading for classes. However, loading an assembly the normal way here would give an entry point for arbitrary code execution at parse time (when we'd like to keep script analysis and AST generation pure and safe operations).
  • As far as I know, the only alternative is to do metadata parsing by hand, which is a very sizeable implementation task in terms of handling arbitrary .NET types. To be clear, the work in this PR is only looking for specific attributes on types and then ignores everything else. I think there could be a lot more work required to do something like build an arbitrary TypeInfo or something like that.
  • But changing any of the parse-time class type resolution behaviour would also be an infeasible change for PowerShell.

The idea that a PowerShell script's AST/parse could be valid or invalid based on the environment it's parsed in doesn't really sit well with me, but I'm sure there were good reasons for going that way. It looks like maybe we should implement the metadata reading, but the performance cost could be considerable, especially for complex webs of inter-referenced assemblies.

Anyway, PSSA is correctly reporting that PowerShell won't run the script under normal circumstances. And in this case, I think you still get a fully-formed AST (you just also get an error). You can try this:

$file = "$PWD\ex.ps1"

$problem = @'
using namespace Microsoft.Azure.Commands.ResourceManager.Cmdlets.SdkModels
using namespace Microsoft.Azure.Commands.Common.Authentication.Abstractions
Import-Module "AzureRm"
class MyClass { [IStorageContext]$StorageContext }
'@

Set-Content -Path $file -Value $problem

$tokens = @()
$errors = @()

$ast = [System.Management.Automation.Language.Parser]::ParseFile($file, [ref]$tokens, [ref]$errors)

$ast.EndBlock.Statements[1].Members[0].PropertyType

A complete, well-formed AST is still given back, it's just that the error array is not empty.

@bergmeister
Contributor

@rjmholt Thank you very much for putting so much thought into it. I did not know that the parser can still continue and return an AST despite parse errors. Because PSSA throws on any parse error, I made the wrong assumption that parsing cannot continue at all after a parse error. This will help PSSA very much, because it can then simply ignore/remove TypeNotFound parse errors and continue. I had no clue it could be that simple -- thank you very much. :-)
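That filtering approach can be sketched as follows; the 'TypeNotFound' ErrorId is taken from this discussion and should be treated as an assumption to verify against the engine:

```powershell
# Sketch: parse, then treat TypeNotFound as ignorable and everything else
# as fatal, so analysis can proceed on the (still complete) AST.
$tokens = $null; $errors = $null
$scriptText = 'class MyClass { [IStorageContext]$StorageContext }'
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    $scriptText, [ref]$tokens, [ref]$errors)

$fatal = @($errors | Where-Object { $_.ErrorId -ne 'TypeNotFound' })
if ($fatal.Count -gt 0) {
    throw "Unrecoverable parse errors: $($fatal.Message -join '; ')"
}

# The AST is complete despite the TypeNotFound diagnostic
$ast.EndBlock.Statements[0].Members[0].Name  # -> StorageContext
```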

@rjmholt
Collaborator Author

rjmholt commented Mar 31, 2018

Well the parser is designed (to my knowledge) not to throw at all, since it's trying to be helpful about errors. So it just collects errors and keeps parsing. If the parser actually throws, it's because something very bad has happened that it can't recover from. But otherwise it will try and make the most sense that it can and keep parsing. Even in cases where there's some malformed syntax, you will still get an AST, but it will contain error ASTs where bad script was, up until the point where the parser can understand what's going on again.
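A tiny demonstration of that recovery behaviour:

```powershell
# Malformed input: the 'if' is never completed, yet ParseInput doesn't throw.
$tokens = $null; $errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    'Get-Date; if (', [ref]$tokens, [ref]$errors)

$errors.Count -gt 0                      # True: the bad region became errors
$null -ne $ast                           # True: an AST still came back
$ast.EndBlock.Statements[0].Extent.Text  # -> Get-Date (parsed fine)
```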

@iSazonov
Collaborator

I can add that the parser even tries to perform some recovery to continue correct parsing after an error. If the parser gets that wrong, then that is a reason to open an issue.
