Early feedback: PowerShell profiler #13589
Conversation
A method based on AST visitors is more accurate, powerful, and reliable. And slower, of course. We could add some enhancements to the PowerShell engine to make it faster. Why do we need to set breakpoints on every line? I hope we can avoid this.
@iSazonov When you talk about AST being more accurate, powerful, and reliable, what are you comparing it to? The approach proposed in this PR gives us a way better baseline, and is able to profile any script without having to resort to the AST. This is a huge limitation of PSProfiler / Measure-Script: you can analyze only a single scriptblock, and any referenced scripts are not instrumented. And you have to have an AST visitor that understands every branch of your code and instruments it. This PR shows a way to do this without any of that. As for Code Coverage, I am sure it will help Pester a lot. Some additional analysis will need to be done on the returned data to identify lines that can't be hit (e.g. empty line, return, else, etc.), but that is pretty easy; we already have an AST analyzer that does that, and imho we could write a simple regex that gets us 95% of the way there. Or do you see some obvious gap that I am missing? All in all, I am really excited about this change, and would love to get it in one form or another. 🙂
I was not precise enough: both options (AST visitors and sampling) have pros and cons. I believe we don't want to add a profiler implementation to the PowerShell engine, but we could enhance the engine to simplify third-party profiler implementations. If the PowerShell Debugger class is the only source from which we can get the needed information, my suggestion is:
Sounds good, except for the last point:
Could this all be done directly by just providing a callback of some sort, e.g.
I have big doubts that a callback is applicable here. A common practice is to emit events so that an external tool can collect and analyze them.
Added @PaulHigin for debugger expertise and @daxian-dbw for performance insights. This seems like a really valuable addition. The questions in my mind are:
The Debugger class is created per ExecutionContext.
The Debugger class is already injected into the execution process, and if we could use this class it would greatly simplify the implementation.
@PaulHigin @daxian-dbw Tagging you here one more time, in case you see some fundamental problems with this proposal. Thanks!
One more thought.
This is always a classic tradeoff. Do we allow arbitrary callbacks so we can reuse this later, even though they could become a doorway for all kinds of issues, or do we specialise the API? Or do we make the callbacks into asynchronous events? For a trace API, I don't think it needs to be synchronous (i.e. it can simply emit events with timestamps to subscriber threads). In turn that makes the API a bit more cohesive; it's not for arbitrary callbacks, but for trace events saying when and where code was hit. The only problem with not being synchronous is that a subscriber needs a way to know that a session has ended.
Another possible problem with it being async is that there are well known complications that can tend to happen for any PS-focused tooling that might want to make use of such an API. That could perhaps be alleviated by packing it into an in-box cmdlet, though. |
We use the standard trace API, and it is already injected throughout the PowerShell code base. Writing one more event will not make anything worse. (If this really is a problem, then we would have to rewrite all the tracing code, which looks unnecessary.)
I tried implementing it via the TraceSource, and it is pretty annoying.
But here is another proposal that does the same thing without most of the loose ends:
This has great performance, and the data are immediately available. It also requires minimal changes. The API is pretty natural to use, and doesn't require you to implement any C# code to consume it (unlike using an event, where the handler could not be a PowerShell action, because that would cause a stack overflow):

```powershell
Set-PSDebug -Trace 1 -TraceVariable "trace"
# <Run your code>
Set-PSDebug -Off
# <Fiddle with the $trace as you like.>
```

I played with it a bit, and if the line struct looks like this, it seems to consume little memory and has no noticeable perf impact on execution. I've run some Pester tests 10 times in a row and they still run in the same amount of time, and ran fast:

```csharp
public struct PSTraceLine
{
    public IScriptExtent Extent { get; set; }
    public long Timestamp { get; set; }
}
```

Additionally, by being able to provide the IList instance ourselves, we can implement minimal filtering directly in the collection if we wish. What do you think?
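To illustrate how the collected $trace could be consumed, here is a sketch only: it assumes the hypothetical -TraceVariable parameter above, entries shaped like PSTraceLine, and approximates each entry's duration as the gap to the next timestamp.

```powershell
# Sketch: aggregate the proposed $trace list into per-line hit counts and durations.
# Assumes entries expose Extent (an IScriptExtent) and Timestamp (ticks).
$perLine = @{}
for ($i = 0; $i -lt $trace.Count - 1; $i++) {
    $entry = $trace[$i]
    $key = '{0}:{1}' -f $entry.Extent.File, $entry.Extent.StartLineNumber
    if (-not $perLine.ContainsKey($key)) {
        $perLine[$key] = [pscustomobject]@{ Location = $key; Hits = 0; Ticks = 0L }
    }
    $perLine[$key].Hits++
    # Approximate duration: time until the next traced instruction started.
    $perLine[$key].Ticks += $trace[$i + 1].Timestamp - $entry.Timestamp
}
$perLine.Values | Sort-Object Ticks -Descending | Select-Object -First 10
```

The last entry in the list is skipped because it has no successor to subtract from; a real consumer would also want to handle entries whose extent has no file (e.g. interactive input).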
Code for that is here: master...nohwnd:pwsh-profiler-take2
Why is it worse than …? Also, for better UX we could implement Enable-Profiler/Disable-Profiler/Set-Profiler.
It is easy to implement such an in-memory listener. Also, alternative scenarios could prefer file output. The output should be binary (like ETW does) for performance. (If we converted timestamps to strings or DateTime we would lose performance, by 100x or more.)
As I already said, this is not the right place for the profiler feature. It is better to use the *-Trace cmdlets or create new *-Profile cmdlets.
If we want to have a profiler feature in the engine, it must work for both scripts and binaries too. I'd expect users will want to implement alternative profilers (in different IDEs).
It is right to have Timestamp as binary, but we still lose performance on copying Extent. We could emit events for file start and file end; between them, events could contain Timestamp and Line number as binary.
I dunno. I mean, I get that we do already have tracing cmdlets, but they trace events more so than lines of a script. I'm of the opinion that since the debugger is built from the beginning to be cognizant of how a script is structured, more so than the engine is, the debugger is a better place to put this.
I don't get this. The cmdlet already has a -Trace parameter; it combines debugging with tracing, which is exactly what I do. I just write to a collection instead of writing to the Debug stream. I don't do any profiling in the code, I just write a trace. Profiling would be additional functionality provided by an external module that consumes the data. This would be just a very simple way to get the data. The data are not very useful by themselves, but this allows me to keep the change to PowerShell minimal.
Trace-Command does not enable debugging by default. I would have to enable tracing of my trace event, and then somehow enable debugging, but not via Set-PSDebug -Trace 1, because that writes to the Debug stream. And then I need to run the code.
Yes, maybe, but why bother with adding another trace listener? We would also need to make it work for all the possible traces, because the current API of Trace-Command is not well suited to using different tracers with different sources of events.
Yes, but it would enable it before running the tests file, because we don't control how users import their dependencies. So we need a way to tell the profiler which files we are interested in, otherwise it captures everything from everywhere, putting a lot of noise in the trace and storing unnecessary data.
By binaries you mean binary modules? That would require a totally different approach, and the current tracing that writes to Debug does not do that either. Profiling your own C# code is not hard; you have VS and all the awesome tooling at your disposal. I don't see why we should complicate this with that. I also don't understand how implementing different profilers depends on being able to trace binary code. And I doubt there will be more than a handful of profilers; there is only so much you can do with the trace.
Is it not just copying the struct that has a reference to an IScriptExtent that already exists? I don't see any impact on the execution time, and it is also running in debug mode, so we expect some perf degradation to be there. Having an event for the start and end of a file further complicates the problem.
It is a huge amount of work to refactor all the trace code. But that does not block additions, which we can implement in the right way.
No, I mean hosting applications. I am thinking about a profiler in VS Code/PSES. You are trying to find the minimal and fastest solution for Pester (and maybe script scenarios (calling a profiler from a script)), but I am sure that the PowerShell engine must expose a more powerful API, so that users can profile scripts from scripts but also create other kinds of profilers and analyzers.
Why are you forcing me to implement this in the most complicated way, then? Trace-Command does not enable debugging, and I need debugging. Trace-Command does not have an in-memory trace, and I need an in-memory tracer. On the other hand, Set-PSDebug enables debugging and already has a -Trace parameter; I just want a way to write the data to a different stream that I can consume, not just to Debug.
You are mostly right, I am trying to find a minimal solution so I can finally profile my scripts without jumping through so many hoops. All I ask for is being able to write the trace in a way that I can consume it, instead of it being written into the Debug stream. That is it. You are trying to make this into an effort to allow profilers to be written in any IDE, but do you have any asks for it? If someone wants to integrate this into VS Code, they can call the same cmdlets as I do, or they can download the module that we will write around this and use it. Or replicate that code in their language of choice. Look at Pester: it is also an external module and it integrates with VS Code. I am slowly giving up here. So how about the events that @rjmholt mentioned: is there an example of such an event already in the engine, and where would I surface it?
Here it is, implemented with an event, much like before: master...nohwnd:pwsh-trace-with-sync-event. I make sure not to invoke the action when already in the action. This way you can hook it up from PowerShell by simply providing a scriptblock. You can also debug the provided scriptblock. With my basic example there is again no significant perf degradation, even when running a PowerShell scriptblock as the callback. I don't think that making it async is worth it, because it is obvious that the more work you do in the callback, the slower the code will be, and it would be difficult to leverage async from PowerShell. In C# we can simply push the data into a queue and process it on another thread if we need to. But seeing that even a scriptblock callback causes no visible perf degradation, that probably won't be needed. The naming and location could be improved. Do you have any suggestions?
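For illustration only, hooking a scriptblock up to such a per-statement trace callback might look like the sketch below. The -TraceAction parameter name is hypothetical (invented here, not taken from the linked branch); the point is that the handler receives the extent of each executed statement and should stay cheap to limit overhead.

```powershell
# Hypothetical sketch: subscribe a scriptblock to the proposed trace callback.
$log = [System.Collections.Generic.List[string]]::new()
$action = {
    param($extent)
    # Record "file:line" for each hit; a real handler might also capture a timestamp.
    $log.Add('{0}:{1}' -f $extent.File, $extent.StartLineNumber)
}
Set-PSDebug -Trace 1 -TraceAction $action   # -TraceAction is a hypothetical parameter
# <Run your code>
Set-PSDebug -Off
$log | Group-Object | Sort-Object Count -Descending | Select-Object -First 10
```

Because the engine suppresses re-entrant invocation (the "not invoke the action when already in the action" guard above), the handler's own statements are not traced, which is what makes a PowerShell scriptblock viable here.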
I cannot force you. :-) Since we intend to make a new public API, we need to weigh all the options.
This was always a big gap in PowerShell. We can find on the Internet some profilers implemented on top of tracing. Also, the MSFT team has an interest: #7857 (comment)
If we consider only "in-session" profiling, this looks like overhead, and the previous prototype is better for the scenario where you directly collect the full info in a list.
It seems @daxian-dbw is busy; maybe @lzybkr will share some thoughts.
I honestly never considered … I do prefer an EventSource (or ETW on Windows) based approach. I actually started on this approach a long time back, here. A huge benefit of ETW is the ability to enable a trace from outside the process. It's more work to produce useful reports, but it's useful when you need to profile:
That said, in-process is important too, and EventSource supports that. I don't think EventSource is a huge effort to support in-process, but my WIP is incomplete because my approach introduces some extra complexity to support ETW/out-of-process reporting. The basic idea is to add as little impact as possible while measuring, collecting minimal information (a GUID + sequence number while tracing), and then post-process. In-process is a bit easier because the data structures are still in memory; ETW would require additional events to enable mapping the GUID + sequence number back to something meaningful. These events would be written after stopping the trace, similar to the .NET rundown events.
@lzybkr Many thanks! We could continue this work, if @daxian-dbw hasn't intercepted it already. Do you have time to mentor?
@iSazonov - I could mentor if someone wants to pick up my approach.
@lzybkr Thanks! I'm now working with your code and I was able to get the first results. Now I'm stuck on the "rundown events", because there we need to send an array of SequencePoints and I still haven't found how to implement this.
@iSazonov - You would have 2 event sources:
I was thinking the
This way, a tool need not depend on PowerShell at all and could generate a nice report.
This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment.
PR Summary
Trying to get early feedback on a feature proposal.
This PR enhances `Set-PSDebug -Trace` to write down timestamps for every PowerShell instruction instead of just writing to the Debug stream. This information allows us to write a very simple but very effective profiler, which does not require complicated AST rewriting to instrument the code. Additionally, this profiler is able to profile any code that runs in the session, in any ps1 file, not just a single scriptblock. It would also greatly simplify Code Coverage in Pester, where we currently rely on AST analysis and setting breakpoints on every hittable command, which makes code coverage extremely slow. This approach, on the other hand, seems to have very little overhead. The behavior provided in this PR is similar to https://www.powershellgallery.com/packages/PSProfiler/1.0.2.0 and its predecessors, but does not use any of its code.
A running example is provided in profiler-demo/demo.ps1; it profiles the two attached files and shows output like this:
The implementation is very simple. Trace level 3 is introduced to `Set-PSDebug`. When `Set-PSDebug -Trace 3` is used, every PowerShell instruction adds a hit entry into the `[System.Management.Automation.Profiler]::Hits` collection. The hit entry records in which file, on which line, and when the hit happened. It also checks whether there is a previous hit and updates when that hit ended, to get the total duration of that instruction. The data are then exposed via a static property and processed by the `Get-Profile` function (attached in demo.ps1). That function simply adds up all the traced entries for the files and returns them in an object. This gives us, at minimum, an overview of how many times each line was hit and how long each line took overall. With more careful analysis we could also see who called the current function the most, how long a given function or branch took overall, what the hotspots are, etc.
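As a rough sketch of the `Get-Profile` aggregation described above (the `Duration` property name on the hit entries is an assumption, not necessarily the PR's exact shape):

```powershell
# Sketch only: sum hit entries into per-line counts and total durations.
function Get-Profile {
    $hits = [System.Management.Automation.Profiler]::Hits  # the collection this PR adds
    $hits |
        Group-Object { '{0}:{1}' -f $_.Extent.File, $_.Extent.StartLineNumber } |
        ForEach-Object {
            [pscustomobject]@{
                Location  = $_.Name
                HitCount  = $_.Count
                TotalTime = ($_.Group | Measure-Object Duration -Sum).Sum
            }
        } |
        Sort-Object TotalTime -Descending
}
```

Grouping by `file:line` is the simplest roll-up; grouping by the extent of the enclosing function instead would give the per-function view mentioned above.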
The implementation should mainly show a proof of concept, and start a discussion of how this data should be made available and how much of the profiling functionality should be implemented in PowerShell itself. I am purposefully limiting the profiling to just files, to avoid handling some of the edge cases. I also butchered the Debug output in some cases.
I would love answers to the following questions:
a `-like` wildcard, to allow profiling many files at the same time, but not absolutely everything that gets executed. In the end we might have something like this:
Or maybe some
`Measure-Script -ScriptBlock { }` type of syntax. Ideally, a basic way would be built in, with the ability to get the raw data and process them yourself.
PR Context
Profiling code and collecting Code Coverage are extremely difficult in PowerShell, but with a little support from the runtime we can profile and collect code coverage easily and in a performant way.
PR Checklist
- `.h`, `.cpp`, `.cs`, `.ps1` and `.psm1` files have the correct copyright header
- If the PR is a work in progress, add `WIP:` or `[ WIP ]` to the beginning of the title (the `WIP` bot will keep its status check at `Pending` while the prefix is present) and remove the prefix when the PR is ready.