Conversation

@rjmholt
Collaborator

@rjmholt rjmholt commented Feb 18, 2017

This PR adds a new entry point for defining DynamicKeywords in PowerShell. Whereas currently these can only be added at runtime in PowerShell, this change uses CIL metadata in an assembly as a specification of the syntax requirements, and performs some validation of that specification.

I've also added the concept of nested keywords -- these are keywords that will only be parsed and invoked as keywords when they occur in the scope of another keyword. This seemed like a common feature in the Domain-Specific Languages we've been looking at.

I still need to do some work on this PR, but I want to get the ball rolling because it's a substantial addition.

Work still to do includes:

  • Making the tests work on Linux. I didn't want to submit before this was working, but the errors are in module loading (module paths) and Add-Type ("Could not load type..."), so I thought I would ask for advice on this
  • Adding a Types.ps1xml-like test case
  • Adding more tests for UseMode checking, keyword scope, and other DynamicKeyword features
  • Adding completions/intellisense for keywords

I've been doing work on adding DynamicKeyword runtime execution using C# code (so a domain-specific language for PowerShell can be defined entirely in a C#/dotnet module/assembly -- see here for a very rough prototype), and this PR is essentially the first step toward that. I intend to submit an issue describing the work and a roadmap soon.

Finally, in this PR, there are some points I think that need special scrutiny:

  • The UseMode concept (restricting how many times a keyword may appear in a block) -- it may be unnecessary or unwanted, and is where the metadata reader starts to verge on being too opinionated. I included it because it was part of the RFC I submitted.
  • The metadata reader type providers -- one uses System.String and the other System.Type, but they could both be consolidated to use a custom type description class. I've seen dotnet core/corefx use a class called TypeDesc, but that might be a bit heavyweight. The type providers are also only partially implemented because some methods are never called -- but it would be nice to fix that.
  • Error messages -- I'm not sure what the standard way to report errors to the user is

@msftclas

Hi @rjmholt, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!


It looks like you're a Microsoft contributor. If you're full-time or an intern, we DON'T require a Contribution License Agreement. If you are a vendor, please DO sign the electronic Contribution License Agreement. It will take 2 minutes and there's no faxing! https://cla.microsoft.com.

TTYL, MSBOT;

@PowerShellTeam PowerShellTeam added the "Review - Needed" (the PR is being reviewed) label Feb 18, 2017
@rjmholt
Collaborator Author

rjmholt commented Feb 18, 2017

I have a rough idea of what needs fixing for the Unix support, but does AppVeyor use something other than project.json to resolve dependencies? I added System.Reflection.Metadata there, but clearly it needs to go somewhere else.

@lzybkr
Contributor

lzybkr commented Feb 18, 2017

project.json is it. You are likely missing a reference to System.Collections.Immutable.

You can try to repro locally by clearing your NuGet cache, unless it was resolving that assembly from the GAC.

@rjmholt
Collaborator Author

rjmholt commented Feb 19, 2017

The System.Reflection.Metadata package already depends on System.Collections.Immutable. Should I add System.Collections.Immutable anyway? I also noticed the error message shows it failing to resolve types from System.Reflection.Metadata. I can only replicate the problem when building with -FullCLR, which gives me a pretty perfect repro, but the error still comes up when I add System.Collections.Immutable by hand. Where does FullCLR look for dependencies?

@lzybkr
Contributor

lzybkr commented Feb 19, 2017

Ah, yeah, that's the failure in AppVeyor - the target is net451 - which I've stopped doing in #3066 - we just need to get that approved.

@daxian-dbw daxian-dbw self-assigned this Feb 20, 2017
@PowerShell PowerShell deleted a comment from msftclas Sep 27, 2017
@bergmeister
Contributor

bergmeister commented Feb 18, 2018

@rjmholt Is this PR still active or do you have a rough timeline for this? If so, can you please resolve the conflicts in the meantime? This fix would be very helpful to resolve this parsing issue in PSScriptAnalyzer.

@lzybkr lzybkr mentioned this pull request Feb 20, 2018
@SteveL-MSFT
Member

SteveL-MSFT commented Feb 21, 2018

@bergmeister @rjmholt was an intern on my team at the time this PR was submitted and went back to finish school shortly after. He's actually joining Microsoft as a full time employee on my team the first week of March. @rjmholt should work with @daxian-dbw to determine if we should continue with this PR as-is.

@bergmeister
Contributor

bergmeister commented Feb 21, 2018

@SteveL-MSFT Thanks for the update. Assuming this gets fixed: if you then published new PowerShell reference assemblies, would tools such as PSSA that use the AST parser receive this fix and be able to parse types imported by using statements? I don't need a precise answer, but I'd like to know whether this is technically feasible or whether the scenario would require more. Knowing this would help us decide whether we need to work around this issue in PSSA or can wait for the parser improvements.

@SteveL-MSFT
Member

@bergmeister PSSA still needs to support older versions of PowerShell though?

@bergmeister
Contributor

@SteveL-MSFT Yes, but if I understand correctly, the reference assemblies (containing the parser) are compiled into PSSA. Hence my hope is that updated reference assemblies would solve it. WDYT?

@SteveL-MSFT
Member

@bergmeister I see, if it's only the reference assemblies and not the runtime assemblies (don't know enough about the internals of PSSA), it may work. Since this work is not committed, my recommendation is to not build a dependency on it.

@lzybkr
Contributor

lzybkr commented Feb 22, 2018

@bergmeister - the reference assemblies contain no code - they are like a header file in C/C++ or an interface definition.

It does make sense to produce nuget packages for some of PowerShell like the parser, but it's a ton of work to factor things in a reasonable way to be useful.

@rjmholt
Collaborator Author

rjmholt commented Feb 23, 2018

I've been meaning to close this PR. I'll revisit it later with @daxian-dbw, but the code is stale now and, if we proceed with it, it's probably worth rewriting a fair amount of it.

There seems to be some interest in the feature though (#6194, #4016), so I would like to spend some more time on it.

@bergmeister this PR is intended more to extend the DynamicKeyword feature than to fix anything. I read through the PSSA issue you linked, but I'm not sure I fully understand the fix. It's possible a smaller fix could be implemented based on this PR (and also serve as an incremental step toward DSL support?).

@rjmholt rjmholt closed this Feb 23, 2018
@rjmholt
Collaborator Author

rjmholt commented Feb 23, 2018

Ah, the other thing to note is that having tests for this feature is very painful so long as we require unloading the assemblies that describe the DynamicKeywords. See corefx #552.

@SteveL-MSFT
Member

@rjmholt I think a workaround for the loaded-assembly problem is to start new pwsh processes
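A minimal sketch of that workaround, assuming a hypothetical keyword assembly at ./MyKeywordModule.dll -- each test body runs in its own child pwsh process, so the assembly load dies with the child:

```powershell
# Sketch: run an assembly-loading test in a throwaway child pwsh process.
# './MyKeywordModule.dll' is a hypothetical path; substitute the real assembly.
$testBody = @'
# Import-Module ./MyKeywordModule.dll   # the load stays in this process
2 + 2                                   # stand-in for the real test body
'@
$result = pwsh -NoProfile -Command $testBody
$result  # the parent process never loaded the assembly
```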

@iSazonov
Collaborator

It seems AppDomains come back in .NET Core 2.1?

@daxian-dbw
Member

I have the changes in this PR rebased and moved to my fork at https://github.com/daxian-dbw/PowerShell/tree/dsl. It was rebased a while ago, but at least it was rebased to the point after we moved to msbuild. It won't be too much trouble to rebase it again.

I have a separate branch, my-dsl, that contains some further improvement work I did on top of this PR; it's at https://github.com/daxian-dbw/PowerShell/tree/my-dsl. The work was not finished.

@bergmeister
Contributor

bergmeister commented Mar 3, 2018

@lzybkr It turns out that PSScriptAnalyzer is not using the reference assemblies of PowerShell, but has a self-contained copy of System.Management.Automation.dll and its dependencies, and therefore the parser. I just made a proof of concept where I can parse the AST using System.Management.Automation.Language.Parser.ParseInput(...) on an Ubuntu VM in a .NET Core console app that does not have PowerShell installed, only the .NET Core runtime. The reason this works is that we use an unofficial NuGet package of 6.0.0-alpha13 (because the official ones on NuGet are full .NET only and not compatible with netcore2.0 or netstandard1.6) to get System.Management.Automation.dll and its dependencies like Microsoft.Management.Infrastructure.
Using either the System.Management.Automation or Microsoft.PowerShell.SDK packages from the https://powershell.myget.org/F/powershell-core/api/v3/index.json feed, I could get the prototype to work as well, but I need to do a PoC for PSSA to see which of the two is required. What I am not sure about is whether we would need to do a dotnet publish to be sure that it really uses the NuGet package and not the parser of PowerShell. It would be interesting if you could comment on the PR below.

@rjmholt
Collaborator Author

rjmholt commented Mar 14, 2018

I've discussed this PR with @daxian-dbw, and it seems simpler to approach DSL support without metadata-parsing for now (e.g. using PowerShell classes as schemas). Implementation is not a priority currently though.

@bergmeister: I know there's a use-case for PSSA, but the implementation here was designed only to read DSL specifications and error on anything else. Possibly the parse-time assembly load meant that PSSA could discover types in using statements, but this is not desired behavior.

The purpose of using the metadata reader was to prevent parse-time execution of static .NET code blocks (and to save us having to deal with the lack of AppDomains and assembly unloading in .NET Core, as well as a performance hit on module loading). Because of those properties in .NET Core, the general recommendation is to rely on the module manifest to provide the information.

However, talking with @JamesWTruher, I can imagine there is a use case for PSSA to do its own metadata analysis to provide richer script analysis, and that could also pay dividends for things like completions. But I'm not sure there are good reasons for metadata parsing to be in PowerShell's engine itself.
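For illustration only, here is a rough sketch of the metadata-only inspection being discussed, using System.Reflection.Metadata from PowerShell itself. It enumerates a DLL's type definitions without ever loading the DLL for execution; the attribute matching the PR layers on top of this is omitted:

```powershell
# Sketch: read an assembly's type definitions via metadata only;
# no code from the target DLL is ever executed.
Add-Type -AssemblyName System.Reflection.Metadata

$dllPath = [System.Management.Automation.PSObject].Assembly.Location  # any managed DLL
$stream  = [System.IO.File]::OpenRead($dllPath)
$pe      = [System.Reflection.PortableExecutable.PEReader]::new($stream)
try {
    $md = [System.Reflection.Metadata.PEReaderExtensions]::GetMetadataReader($pe)
    $typeNames = foreach ($handle in $md.TypeDefinitions) {
        $t = $md.GetTypeDefinition($handle)
        '{0}.{1}' -f $md.GetString($t.Namespace), $md.GetString($t.Name)
    }
    $typeNames.Count  # number of type definitions found, with zero assembly loads
}
finally {
    $pe.Dispose()
    $stream.Dispose()
}
```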

@bergmeister
Contributor

bergmeister commented Mar 14, 2018

@rjmholt Ok, I'll have to accept for now that this feature will not be implemented in PowerShell's parser. But what does the PowerShell team suggest to do for PSSA? PSSA depends on PowerShell's parser (it does not use the one being installed but has its own version of it as part of a self-contained PSCore version of System.Management.Automation). Should PSSA then maintain its own customized fork of PowerShell in order to build and consume a custom parser? Catching the ParseException for TargetObject TypeNotFound and trying to remove the type name in memory might be a temporary workaround, but it is going to be hairy when using commands such as Invoke-Formatter or the -Fix switch, which actually manipulate the script content.

@rjmholt
Collaborator Author

rjmholt commented Mar 15, 2018

@bergmeister We might be speaking at cross-purposes a little here -- essentially, the fact that this prototype feature loaded modules at parse time was something we don't want, because we don't want arbitrary code execution when we're just parsing PowerShell. It's one of several reasons this PR was rejected. In fact, for PSSA to avoid executing malicious code by just parsing PowerShell, it will need to avoid module loading too.

I'm not familiar at all with PSSA's codebase though, so I didn't quite understand where you talked about a ParseException and Invoke-Formatter. Could you give the links to where those occur in the PSSA code so I can take a look?

In terms of the parser, I think the System.Management.Automation.dll parser is definitely the one to use (even though it won't and shouldn't ever load a module in at parse time). But, the way I imagine it, PSSA could use the parser to get an AST as normal, and then extract any module-loading statements from that (maybe in AnalyzeSyntaxTree), and use that information to go and look at that module. If it's a PowerShell module, then PSSA just runs the parser on it. If it's a DLL, then maybe a module manifest or possibly use a metadata reader to read the attributes.
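A sketch of what that extraction might look like ("AzureRm" here is just a placeholder module name):

```powershell
# Sketch: parse a script, then collect using statements and Import-Module
# calls from the AST -- no module is loaded and nothing executes.
$tokens = $null; $errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(@'
using namespace System.Collections.Generic
Import-Module AzureRm
'@, [ref]$tokens, [ref]$errors)

# 'using' statements are surfaced directly on the ScriptBlockAst
$usings = @($ast.UsingStatements | ForEach-Object { $_.Name.Value })

# Import-Module invocations are ordinary CommandAsts in the tree
$imports = @($ast.FindAll({ param($node)
            $node -is [System.Management.Automation.Language.CommandAst] -and
            $node.GetCommandName() -eq 'Import-Module'
        }, $true) | ForEach-Object { $_.CommandElements[1].Extent.Text })

$usings   # -> System.Collections.Generic
$imports  # -> AzureRm
```

From there, a PowerShell module would just be parsed in turn, while a DLL would go through a manifest or metadata path as described above.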

@bergmeister
Contributor

bergmeister commented Mar 29, 2018

@rjmholt Sorry for coming back to you this late; I forgot about it. I understand and respect the decision that the PowerShell parser won't get those module-loading features. In short, PSSA relies on the parser to analyse ASTs and calls scriptAst = Parser.ParseFile(filePath, out scriptTokens, out errors); here or Parser.ParseInput(scriptDefinition, out scriptTokens, out errors); here initially to get the ASTs, and then statically analyses them. If parsing fails, then PSSA cannot do its job for the whole file.
Since a custom fork of the parser seems to be undesired for security reasons (although I think having a -Force switch to allow loading of DLLs would be reasonable), I was thinking of catching the parser errors and replacing the 'naughty' type definitions in the in-memory representation of the script (assuming it is possible to extract the exact error in a locale-independent way) to make it parseable. There are some special features like Invoke-Formatter or Invoke-ScriptAnalyzer -Fix that take either a string of the script or a path to the script file and output it with some rules auto-fixed (this is how formatting of selected code via Ctrl+K+F works in VSCode, btw). So you can see that a workaround of removing the type strings introduces the complexity of having to add them back in when outputting the script.
Maybe I am missing something, but it would be great if the PowerShell team could provide some suggestions on how to better handle scripts that are not parseable.

@rjmholt
Collaborator Author

rjmholt commented Mar 29, 2018

Hmmm, do you know what error the parser throws when a type is used that isn't loaded? Philosophically, the parser shouldn't be looking up types at all. You should be able to use a string that is totally meaningless, as long as it is syntactically representative of a type ([Foo.Quux]::Banana()), and get an AST. If there are type lookups or semantic checks happening earlier than I would expect and causing an error, then our path forward could be complicated. /cc @daxian-dbw

@bergmeister
Contributor

bergmeister commented Mar 29, 2018

@rjmholt An example is

using namespace Microsoft.Azure.Commands.ResourceManager.Cmdlets.SdkModels
using namespace Microsoft.Azure.Commands.Common.Authentication.Abstractions
Import-Module "AzureRm"
class MyClass { [IStorageContext]$StorageContext }

We get the parse error Unable to find type [IStorageContext] at line 4 column 18. If the parser supported unknown types like [Foo.Banana] (maybe via a different mode/overload), this would solve the issue for PSSA.

@rjmholt
Collaborator Author

rjmholt commented Mar 31, 2018

Immediately after writing that last comment I went and tracked down the issue the code above comes from, because I remembered it being the problematic code in question. Which then brought me to this issue, and then this issue which really documents what's going on. And then I spent a fair amount of time thinking about this.

For whatever reason I didn't understand that it was PowerShell that was producing the error here, just because I didn't think it would be trying to resolve types at parse time. I've heard before that it's to do with being able to integrate with .NET types (although I'm not sure why Add-Type works in that case then). But long story short, I owe you an apology on that one @bergmeister.

But to summarise, here are some dot points (most of which I imagine are already well known to @bergmeister , but I document here anyway):

  • Type resolution within PowerShell classes and their members occurs at parse time, meaning that runtime type loading that occurs in Add-Type and Import-Module won't work for classes; even though the statements might come before the classes that use the types, they can't execute until after the class is parsed and the check fails.
  • This means that PSSA is quite correct when it says there's an error there -- those scripts won't run unless the type is loaded at parse time. Which means users should be loading them before running PSSA too.
  • @bergmeister is quite right that there is an unfinished feature in the using assembly statement, where we should offer parse-time type loading for classes. However, loading an assembly the normal way here would give an entry point for arbitrary code execution at parse time (when we'd like to keep script analysis and AST generation pure and safe operations).
  • As far as I know, the only alternative is to do metadata parsing by hand, which is a very sizeable implementation task in terms of handling arbitrary .NET types. To be clear, the work in this PR is only looking for specific attributes on types and then ignores everything else. I think there could be a lot more work required to do something like build an arbitrary TypeInfo or something like that.
  • But changing any of the parse-time class type resolution behaviour would also be an infeasible change for PowerShell.

The idea that a PowerShell script's AST/parse could be valid or invalid based on the environment it's parsed in doesn't really sit well with me, but I'm sure there were good reasons for going that way. It looks like maybe we should implement the metadata reading, but the performance cost could be considerable, especially for complex webs of inter-referenced assemblies.

Anyway, PSSA is correctly reporting that PowerShell won't run the script under normal circumstances. And in this case, I think you still get a fully-formed AST (you just also get an error). You can try this:

$file = "$PWD\ex.ps1"

$problem = @'
using namespace Microsoft.Azure.Commands.ResourceManager.Cmdlets.SdkModels
using namespace Microsoft.Azure.Commands.Common.Authentication.Abstractions
Import-Module "AzureRm"
class MyClass { [IStorageContext]$StorageContext }
'@

Set-Content -Path $file -Value $problem

$tokens = @()
$errors = @()

$ast = [System.Management.Automation.Language.Parser]::ParseFile($file, [ref]$tokens, [ref]$errors)

$ast.EndBlock.Statements[1].Members[0].PropertyType

A complete, well-formed AST is still given back, it's just that the error array is not empty.

@bergmeister
Contributor

@rjmholt Thank you very much for putting so much thought into it. I did not know that the parser can still continue and return an AST despite parse errors. Because PSSA throws on any parse error, I made the wrong assumption that parsing cannot continue at all after a parse error. This will help PSSA very much, because it can then simply ignore/remove TypeNotFound parse errors and continue. I had no clue it could be that simple -- thank you very much. :-)
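That filtering approach can be sketched as follows; the 'TypeNotFound' ErrorId is taken from this discussion and should be treated as an assumption to verify against the engine:

```powershell
# Sketch: parse, then treat TypeNotFound as ignorable and everything else
# as fatal, so analysis can proceed on the (still complete) AST.
$tokens = $null; $errors = $null
$scriptText = 'class MyClass { [IStorageContext]$StorageContext }'
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    $scriptText, [ref]$tokens, [ref]$errors)

$fatal = @($errors | Where-Object { $_.ErrorId -ne 'TypeNotFound' })
if ($fatal.Count -gt 0) {
    throw "Unrecoverable parse errors: $($fatal.Message -join '; ')"
}

# The AST is complete despite the TypeNotFound diagnostic
$ast.EndBlock.Statements[0].Members[0].Name  # -> StorageContext
```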

@rjmholt
Collaborator Author

rjmholt commented Mar 31, 2018

Well the parser is designed (to my knowledge) not to throw at all, since it's trying to be helpful about errors. So it just collects errors and keeps parsing. If the parser actually throws, it's because something very bad has happened that it can't recover from. But otherwise it will try and make the most sense that it can and keep parsing. Even in cases where there's some malformed syntax, you will still get an AST, but it will contain error ASTs where bad script was, up until the point where the parser can understand what's going on again.
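A tiny demonstration of that recovery behaviour:

```powershell
# Malformed input: the 'if' is never completed, yet ParseInput doesn't throw.
$tokens = $null; $errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    'Get-Date; if (', [ref]$tokens, [ref]$errors)

$errors.Count -gt 0                      # True: the bad region became errors
$null -ne $ast                           # True: an AST still came back
$ast.EndBlock.Statements[0].Extent.Text  # -> Get-Date (parsed fine)
```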

@iSazonov
Collaborator

I can add that the parser even tries to perform some recovery to continue correct parsing after an error. If the parser gets that wrong, then that is a reason to open an issue.
