
@iSazonov
Collaborator

@iSazonov iSazonov commented Nov 28, 2020

PR Summary

At startup we send telemetry, and that delays the startup.
The delay is 10 to 100 ms on my notebook, but under other conditions it may be even worse (issues with DNS, the network, or the target telemetry service).

  • TrackEvent() is synchronous and triggers initialization of the ApplicationInsightsTelemetry static class.
  • GetMetric().TrackValue() delays sending telemetry by 1 minute, but it initializes ApplicationInsightsTelemetry too.

The fix is to send startup telemetry on a background thread with a 500 ms delay.

The open question is how long we want to postpone the send. The current value is 500 ms, which corresponds to the startup time of pwsh. If we reduce this time, it may slow down the launch; if we increase it, we may lose telemetry for short-lived processes.
Additionally, we could wait for this task to complete in the PowerShell exit code to ensure that the telemetry is transmitted.
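A minimal sketch of the approach described above (hypothetical names; the real send call, `SendPSCoreStartupTelemetry`, is stubbed out here with a flag):

```csharp
using System;
using System.Threading.Tasks;

static class StartupTelemetry
{
    private static Task s_telemetryTask;
    public static bool Sent;   // stand-in for the real SendPSCoreStartupTelemetry() call

    // Queue the telemetry work on a background task after a short delay,
    // keeping the startup-critical path clear.
    public static void Queue(int delayMs = 500)
    {
        s_telemetryTask = Task.Run(async () =>
        {
            await Task.Delay(delayMs);
            Sent = true;   // real code would send the startup telemetry here
        });
    }

    // Called from the exit path: wait briefly so the telemetry actually goes out.
    public static void WaitOnExit()
    {
        s_telemetryTask?.Wait(TimeSpan.FromSeconds(5));
    }
}
```

pwsh would call `Queue()` early in startup and `WaitOnExit()` just before the process terminates.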

Perf win on my notebook:

  • 35 ms: ApplicationInsightsTelemetry..cctor()
  • 10 to 100 ms: the telemetry send itself

Screenshots show the delays we removed:
[screenshots: profiler traces showing the removed delays]

PR Context

Tracking issue #14268

PR Checklist

@iSazonov iSazonov added the CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log label Nov 28, 2020
@iSazonov iSazonov requested a review from anmenaga as a code owner November 28, 2020 18:03
@iSazonov iSazonov changed the title Sens telemetry lazy at startup Send telemetry lazy at startup Nov 28, 2020
@daxian-dbw
Member

I didn't know you were doing something similar :)

The fix is to send startup telemetry in a background thread with 500 ms delay.

No, we cannot delay the 'sending telemetry work' by 500 ms. This will likely cause telemetry to not be sent at all when a user executes a command directly using pwsh -noprofile -c run-a-command. Startup telemetry should be fired right away, and the task needs to be waited on before pwsh exits, so we can make sure the SendPSCoreStartupTelemetry call actually runs. Please see #14304 for details about how I approached it.

TrackEvent() is sync

What do you mean by "is sync"? Is it sending out packets to the server synchronously (I doubt it ...)?

@iSazonov
Collaborator Author

iSazonov commented Dec 3, 2020

I didn't know you were doing something similar :)

I am collecting all in #14268 issue to sync the work with others.

TrackEvent() is sync

What do you mean by "is sync"? Is it sending out packets to the server synchronously (I doubt it ...)?

Yes, I think it is sending synchronously to the server. I read the docs and best practices and found the recommendation to put it in the background. Traces show the same for me.
I am not sure whether it makes sense to ask the owners of the package to improve performance/add async variants, or maybe MSFT could consider https://github.com/open-telemetry/opentelemetry-dotnet.

No, we cannot delay the 'sending telemetry work' by 500 ms. This will likely cause telemetry to not be sent at all when a user executes a command directly using pwsh -noprofile -c run-a-command. Startup telemetry should be fired right away, and the task needs to be waited on before pwsh exits, so we can make sure the SendPSCoreStartupTelemetry call actually runs. Please see #14304 for details about how I approached it.

I see how you approached it. But my thinking is again about low-power hardware:
I suppose it is important to do the useful work (the short script) as quickly as possible and postpone sending telemetry, but make it mandatory. For the latter, the solution could be to add (1) a delay and then (2) a wait for the background task in the PowerShell exit code.


@daxian-dbw
Member

Yes, I think it is sending synchronously to the server. I read the docs and best practices and found the recommendation to put it in the background.

Where do you see that? Can you share the links? I will also check with @JamesWTruher on this.

But my thoughts is again about low-power hardware ...

On low-power hardware, it's even less deterministic whether a task/worker thread can even be scheduled to run, and thus less deterministic whether what we do here actually makes a difference -- it may even be a penalty on such hardware due to context switching or contention.

For #14304, I think I will leverage a worker thread only if Environment.ProcessorCount >= 4.
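The gating idea could be sketched like this (`sendStartupTelemetry` stands in for whatever work #14304 offloads; a sketch, not the actual PR code):

```csharp
using System;
using System.Threading.Tasks;

static class TelemetryScheduler
{
    // Offload to a worker thread only when enough cores are available;
    // on low-core machines run inline, avoiding scheduling/contention overhead.
    public static Task Fire(Action sendStartupTelemetry)
    {
        if (Environment.ProcessorCount >= 4)
        {
            return Task.Run(sendStartupTelemetry);
        }

        sendStartupTelemetry();
        return Task.CompletedTask;
    }
}
```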

@iSazonov
Collaborator Author

iSazonov commented Dec 3, 2020

Where do you see that? Can you share the links? I will also check with @JamesWTruher on this.

https://github.com/degant/application-insights-best-practices
https://docs.microsoft.com/en-us/dotnet/api/microsoft.applicationinsights.telemetryclient.trackmetric?view=azure-dotnet

This method is not the preferred method for sending metrics. Metrics should always be pre-aggregated across a time period before being sent.
Use one of the GetMetric(..) overloads to get a metric object for accessing SDK pre-aggregation capabilities.
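For reference, the API shapes being compared here (these are real Application Insights calls, but the snippet requires the Microsoft.ApplicationInsights package and a configured instrumentation key, so it is illustrative only):

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

var client = new TelemetryClient(TelemetryConfiguration.CreateDefault());

// TrackMetric: one telemetry item per call -- the pattern the docs discourage.
client.TrackMetric("StartupTimeMs", 123);

// GetMetric().TrackValue(): the SDK pre-aggregates locally and flushes the
// aggregate roughly once a minute -- the 1-minute delay mentioned above.
client.GetMetric("StartupTimeMs").TrackValue(123);

// TrackEvent: for non-metric event data, which is what startup telemetry uses.
client.TrackEvent("PSCoreStartup");
```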

If you could confirm it is always async, that would be nice.

Another question: how do we ensure the telemetry has been sent before PowerShell exits? I see TelemetryClient.Flush() is async too.

On low-power hardware, it's even less deterministic whether a task/worker thread can even be scheduled to run, and thus less deterministic whether what we do here actually makes a difference -- it may even be a penalty on such hardware due to context switching or contention.

That is why I am convinced that we should use fewer threads - there are too many of them. We need to reconsider this. Regarding ApplicationInsights, unfortunately we will have to move it to a background thread.
And we need to take Experimental Features into account - you don't seem to be doing this, although while people are testing them in real scenarios they should work as fast as the release too.

In the last commits I have tried to make the process more deterministic.
(On Linux the behavior is strange - s_telemetryClient is null, but we never assign null, so there is no initialization! I guess there is a tricky bug.)

Update:

After some thought, I suppose that we'd better wait and first do all the other optimizations that we can, as well as get and evaluate all the benefits of future versions of .NET 6.0 - if we get performance comparable to Windows PowerShell, then it generally makes no sense to complicate the code for the sake of ApplicationInsights. It's much better to ask the ApplicationInsights owners to improve this scenario. We have several months to wait; if they do it, we won't have to complicate the code at all.
Also, if the hardware is fast we don't get a big perf win from a background thread, and if the hardware is slow we don't get a noticeable perf win either.

For #14304, I think I will leverage worker thread only if Environment.ProcessorCount >= 4.

In that case we should make a more general improvement and think about all threads - perhaps this helps with scenarios like Wasm (and maybe PSES). Doing this for ApplicationInsights only looks redundant.


@iSazonov iSazonov force-pushed the perf-applicationinsights branch from 95b6799 to 9fb85c2 Compare December 3, 2020 10:04
@daxian-dbw
Member

I checked with @JamesWTruher and he confirmed that ApplicationInsights does not send telemetry packets to the endpoint (server) in a synchronous way. It maintains an in-memory buffer and uses thread pool threads for sending telemetry packets.

That's how a tracing library should work -- never send packets on the wire synchronously. Log4Net is bad for that exact reason.

Pre-aggregation is more about processing the data before sending it on the wire, "reducing the volume of data sent from the SDK to the Application Insights collection endpoint". According to Jim, TrackEvent is used for the startup telemetry because we are sending non-metric data: "Events are amorphous blobs of data, metrics are more akin to performance counters".

Another question: how do we ensure the telemetry has been sent before PowerShell exits? I see TelemetryClient.Flush() is async too.

It's a best-effort thing. It's possible that a process may exit before the packets are sent out, even today. But that also means that if a task is used for this job, we shouldn't enforce a delay on the task. Also, we should probably wait for that task to finish before showing the initial prompt to the user (or something similar in the pwsh -c xxx scenario).
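The usual best-effort pattern in console apps, assuming an existing TelemetryClient instance named `client` (Flush only hands the buffer to the background sender, hence the commonly documented short sleep):

```csharp
using System;
using System.Threading;

// Flush moves buffered items to the transmitter, but the actual send still
// happens on background threads, so give it a moment before exiting.
client.Flush();
Thread.Sleep(TimeSpan.FromSeconds(1));   // best effort; packets may still be lost
```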

That is why I am convinced that we should use fewer threads - there are too many of them. We need to reconsider this.

I want to remind you again of the philosophy here: leveraging parallel processing doesn't contradict the work that improves specific/individual code paths. When handled correctly, they amplify the final result.

As for your concern about the background thread used by ApplicationInsights, I think you overestimated its impact. The expensive part that I see in the startup telemetry collection is the loading of the ApplicationInsights assembly and the initialization of types from it.

I suppose that we'd better wait and first do all the other optimizations that we can, as well as get and evaluate all the benefits of future versions of .NET 6.0 ...

I agree. I don't mean to rush my PR into the code base :) I'm not happy with the changes yet, and it also needs more testing to fine-tune decisions like how many tasks to use and what workload should be done in tasks. But we also cannot wait until near the end of the release cycle for a PR like #14304.

About the .NET 6 runtime initialization improvement, can you share links about it?

@iSazonov
Collaborator Author

iSazonov commented Dec 4, 2020

About the .NET 6 runtime initialization improvement, can you share links about it?

I collect all the references in #14268.
The main ones are dotnet/runtime#44545 and dotnet/runtime#44598.
If you can communicate and work with Stephen Toub directly, you have a chance to improve PowerShell startup by up to 30%, I hope. :-)

Many thanks for the clarifications about ApplicationInsights!

As for your concern about the background thread used by ApplicationInsights, I think you overestimated its impact. The expensive part that I see in the startup telemetry collection is the loading of the ApplicationInsights assembly and the initialization of types from it.

My worries are only about the initialization. It would be great if you could ask the ApplicationInsights owners to improve this.
And you can see other cases in the trace. One with ToLower() we already merged; there are others in #14268.

Also, we should probably wait for that task to finish before showing the initial prompt to the user (or something similar in the pwsh -c xxx scenario).

This is again a question for the ApplicationInsights owners. I hope they can do more and better than us.

I want to remind you again of the philosophy here: leveraging parallel processing doesn't contradict the work that improves specific/individual code paths. When handled correctly, they amplify the final result.

I completely trust your experience (many thanks for the discussions with me), but I have absolutely no idea how this is implemented in depth (after Stephen Toub's articles, I only have the understanding that everything is complicated) or how we could reliably check this in tests, especially for a startup script.
So I'm still of the opinion that it is better to make a couple of threads fast than to hope that ten will be better. This is me speaking about myself; maybe you know how to make these checks reliable.

I agree. I don't mean to rush my PR to the code base :)

:-)))) So it seems we have a chance to see the open PR counter rapidly decrease from 150 to 0 - half of them you and your team could merge in a day, since they are very simple.

I updated #14268 with our latest thoughts - feel free to add more there. Thanks!

@daxian-dbw
Member

It would be great if you could ask the ApplicationInsights owners to improve this.

I'm afraid the answer would be "go ahead, take a look at our code base; any perf improvement contributions are welcome!" 😄
To be fair, the type initialization for many PowerShell types is also pretty expensive; for example, System.Management.Automation.Utils -- it's just too large.

Thanks for the compilation in #14268, and I'm glad that we are on the same page now.

@iSazonov
Collaborator Author

iSazonov commented Dec 4, 2020

I'm afraid the answer would be "go ahead, take a look at our code base; any perf improvement contributions are welcome!" 😄

Threaten them with migrating to OpenTelemetry 😄

To be fair, the type initialization for many PowerShell types is also pretty expensive; for example, System.Management.Automation.Utils -- it's just too large.

You could open a new tracking issue for this and share where we could investigate - I could get involved, maybe prepare fixes and open issues in .NET if needed. Now is the right time.

Also, we could look at Source Generators. I made an experiment for TypeGen. It works, but the .NET feature is not ready for production (I already sent some feedback) and the .NET team will actively improve it and use it in .NET 6.0.
We could benefit from Source Generators in TypeGen, the types.ps1xml converter, and maybe elsewhere.

@daxian-dbw
Member

We could benefit from Source Generators in TypeGen, the types.ps1xml converter, and maybe elsewhere.

TypeGen - you mean the TypeCatalogGen project? It's a build-time-only project, not included in the PowerShell package, so I don't see a need for any change to it.

@iSazonov
Collaborator Author

iSazonov commented Dec 4, 2020

We could benefit from Source Generators in TypeGen, the types.ps1xml converter, and maybe elsewhere.

TypeGen - you mean the TypeCatalogGen project? It's a build-time-only project, not included in the PowerShell package, so I don't see a need for any change to it.

Like ResGen, TypeCatalogGen complicates the life of Linux distribution maintainers. Migrating to the MSBuild process would help them.

@ghost ghost added the Review - Needed The PR is being reviewed label Dec 12, 2020
@ghost

ghost commented Dec 12, 2020

This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author.

@daxian-dbw
Member

@iSazonov I went through our comments and it looks we were on the same page that this PR is not needed. I will close it now.

@daxian-dbw daxian-dbw closed this Nov 2, 2021
@ghost ghost removed the Review - Needed The PR is being reviewed label Nov 2, 2021
@iSazonov iSazonov deleted the perf-applicationinsights branch November 2, 2021 11:42