Fix tracing on windows for the runner, and when using multiple self-hosted actions runners #167

robertbrignull · 2020-09-02T16:01:06Z

This is a draft attempt at getting Windows tracing working outside of Actions, and at making it more robust when using self-hosted Actions runners.

I'm in the process of testing this, and I'll push any changes that are necessary.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Confirm the readme has been updated if necessary.

robertbrignull · 2020-09-07T10:47:34Z

Not sure who is best placed to review this. Possibly @aibaars if you can. I also asked questions to @nickrolfe and @dbartol in case either of you can see anything wrong here.

aibaars · 2020-09-07T11:16:17Z

I think the assumption that the grandparent of the process is the one that persists between build steps, but not between subsequent jobs, is reasonable. The process itself will be nodejs, its parent is likely some shell, and so the parent of that is likely the closest ancestor that isn't the wrong choice ;-)

Of course the assumption might be wrong. The process tree could be slightly deeper for some CI system. I'd recommend adding an "advanced" flag to allow a user to specify some N (default 2) that indicates that the Nth parent should be taken.

It's also possible that the depth is variable, although that is less likely. You could also add an additional flag that allows specifying the name of the "worker" process.

These two flags generalise the actions case where we search by name for Runner.Worker and the other case where we blindly select the second parent. I would disallow specifying both flags for now.
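A minimal TypeScript sketch of the two proposed flags, assuming the ancestor chain has already been collected into an array with the direct parent first. The ProcessInfo type, the chooseTraceTarget function and the sample process names are hypothetical illustrations, not the action's actual code; the default of 2 follows the suggestion above.

```typescript
// Hypothetical shape of one process in the ancestor chain.
interface ProcessInfo {
  pid: number;
  name: string;
}

// Hypothetical options mirroring the two proposed flags.
interface TraceTargetOptions {
  level?: number; // take the Nth parent
  name?: string; // take the nearest ancestor with this name
}

// `ancestors[0]` is the direct parent, `ancestors[1]` the grandparent, and so on.
function chooseTraceTarget(
  ancestors: ProcessInfo[],
  opts: TraceTargetOptions,
  defaultLevel: number = 2,
): ProcessInfo {
  // Disallow specifying both flags, as suggested.
  if (opts.level !== undefined && opts.name !== undefined) {
    throw new Error("Specify a process level or a process name, not both");
  }
  if (opts.name !== undefined) {
    // Search by name, e.g. for the Actions worker process.
    const match = ancestors.find((p) => p.name === opts.name);
    if (match === undefined) {
      throw new Error(`No ancestor process is named ${opts.name}`);
    }
    return match;
  }
  const level = opts.level ?? defaultLevel;
  if (!Number.isInteger(level) || level < 1 || level > ancestors.length) {
    throw new Error(`No ancestor at level ${level}`);
  }
  // Level N is the Nth parent, which lives at index N - 1.
  return ancestors[level - 1];
}

// Illustrative chain as seen from the nodejs process (names are examples only).
const sampleAncestors: ProcessInfo[] = [
  { pid: 300, name: "bash" },
  { pid: 200, name: "Runner.Worker" },
  { pid: 100, name: "Runner.Listener" },
];

console.log(chooseTraceTarget(sampleAncestors, {}).pid); // 200
console.log(chooseTraceTarget(sampleAncestors, { name: "Runner.Worker" }).pid); // 200
console.log(chooseTraceTarget(sampleAncestors, { level: 3 }).pid); // 100
```

Rejecting the combination up front keeps the behaviour unambiguous; if both flags were ever allowed, a rule for which one wins would have to be chosen and documented.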

robertbrignull · 2020-09-07T11:26:33Z

> I think the assumption that the grandparent of the process is the one that persists between build steps, but not between subsequent jobs, is reasonable. The process itself will be nodejs, its parent is likely some shell, and so the parent of that is likely the closest ancestor that isn't the wrong choice ;-)

I think you're off by one relative to my assumptions. I think the process itself will be powershell, its parent is nodejs, and the grandparent is likely some shell or the CI system itself; in the shell case, the parent of that is the CI system process that sticks around.

So this sounds like we should increase the default by one level. Do you agree?

> Of course the assumption might be wrong. The process tree could be slightly deeper for some CI system. I'd recommend adding an "advanced" flag to allow a user to specify some N (default 2) that indicates that the Nth parent should be taken.
>
> It's also possible that the depth is variable, although that is less likely. You could also add an additional flag that allows specifying the name of the "worker" process.
>
> These two flags generalise the actions case where we search by name for Runner.Worker and the other case where we blindly select the second parent. I would disallow specifying both flags for now.

I agree with you here. There's no way we can make a guess that will be right in all cases, so inputs like the two you mention should give us enough power to make it work in almost any situation, provided we manually work out what to do.

This could be annoying for the docs team (cc. @felicitymay), as we're adding new arguments that are quite hard to explain. Since these are meant mainly for manual intervention by a field team member, perhaps we should make them hidden options that don't show up if you run -h. Would that be acceptable?

aibaars · 2020-09-07T11:32:39Z

> I think you're off by one relative to my assumptions. I think the process itself will be powershell, its parent is nodejs, and the grandparent is likely some shell or the CI system itself; in the shell case, the parent of that is the CI system process that sticks around.

> So this sounds like we should increase the default by one level. Do you agree?

Yes, indeed. I forgot about powershell. It would be interesting to print the process trees on Actions, Jenkins, and ideally some other CI systems (TeamCity, Bamboo) for Windows workers. That should give you some idea of whether there is a value that works for most of them, which you could then use as the default.

robertbrignull · 2020-09-07T12:00:42Z

> It would be interesting to print the process trees on Actions, Jenkins, and ideally some other CI systems (TeamCity, Bamboo) for Windows workers.

I did this on Actions, at least, and the process immediately above the nodejs process is the one that stays around for the length of the job. If you go one level higher, you get the process that persists between jobs, so we don't want to touch that one, as doing so would affect future jobs.

robertbrignull · 2020-09-07T12:43:12Z

I've pushed code to add --trace-process-name and --trace-process-level as hidden options. In fact, we just parse them straight out of process.argv, since unfortunately the library we're using to generate the command-line interface doesn't support hidden options. If you have suggestions for better names, I'm open to them.
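A minimal sketch of reading the two hidden options directly from an argv-style array, so they never need to be declared with the CLI library. The helper names are hypothetical, and accepting both the "--flag value" and "--flag=value" forms is an assumption; the comment above does not say which forms are supported.

```typescript
// Hypothetical result of reading the two hidden options.
interface TraceProcessOptions {
  name?: string;
  level?: number;
}

// Returns the value for `flag`, accepting both "--flag value" and
// "--flag=value" (an assumption, not confirmed by the PR).
function readFlag(argv: string[], flag: string): string | undefined {
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === flag) {
      if (i + 1 >= argv.length) {
        throw new Error(`${flag} requires a value`);
      }
      return argv[i + 1];
    }
    if (arg.startsWith(`${flag}=`)) {
      return arg.slice(flag.length + 1);
    }
  }
  return undefined;
}

function parseTraceProcessOptions(argv: string[]): TraceProcessOptions {
  const opts: TraceProcessOptions = {};
  const name = readFlag(argv, "--trace-process-name");
  if (name !== undefined) {
    opts.name = name;
  }
  const rawLevel = readFlag(argv, "--trace-process-level");
  if (rawLevel !== undefined) {
    const level = Number(rawLevel);
    // Levels count parents, so only positive integers make sense.
    if (!Number.isInteger(level) || level < 1) {
      throw new Error(`Invalid --trace-process-level: ${rawLevel}`);
    }
    opts.level = level;
  }
  return opts;
}

// In a Node.js CLI this would be called with process.argv; a literal array is
// used here so the example runs on its own.
const parsed = parseTraceProcessOptions(["node", "runner.js", "init", "--trace-process-level", "3"]);
console.log(parsed.level); // 3
```

Validating the level eagerly means a typo fails loudly at startup rather than silently falling back to the default process.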

With these options I think we should be flexible enough. It would be nice if the default were still a good choice in most cases. I think it'll now work out of the box anywhere there's a script in between the CI master process and the runner process. For example, on Jenkins or Actions you specify a script to run, and that script then spawns the runner process, so I think a default of the 3rd parent will work in that case. If the runner is spawned without this intermediate script, the default might go one level too high, but that can now be corrected using the new hidden options.
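To make the level arithmetic concrete, here is a small worked example. It assumes the process chains described in this thread, counted from the powershell process doing the lookup; the labels and the nthParent helper are hypothetical and do not correspond to real executable names.

```typescript
// Illustrative process chains, listed from the process doing the lookup
// (powershell) up towards the root. Labels are hypothetical.
const withScript: string[] = [
  "powershell", // the process doing the lookup
  "node (runner)", // 1st parent
  "cmd (build script)", // 2nd parent
  "CI job process", // 3rd parent: lives for the whole job
  "CI agent", // 4th parent: persists between jobs
];
const withoutScript: string[] = [
  "powershell",
  "node (runner)",
  "CI job process",
  "CI agent",
];

// chain[0] is the starting process, so its Nth parent is chain[n].
function nthParent(chain: string[], n: number): string {
  if (!Number.isInteger(n) || n < 1 || n >= chain.length) {
    throw new Error(`Chain has no parent at level ${n}`);
  }
  return chain[n];
}

console.log(nthParent(withScript, 3)); // CI job process (the intended target)
console.log(nthParent(withoutScript, 3)); // CI agent (one level too high)
console.log(nthParent(withoutScript, 2)); // CI job process (after overriding the level)
```

The last line corresponds to a user passing a lower level through the hidden option when no intermediate script exists.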

robertbrignull · 2020-09-07T12:44:22Z

Unfortunately, we ideally want to get this done so we can release tomorrow, or very soon after, to match GHES. Sorry this is happening so close to the deadline and with comparatively little testing. I hope we can get it out in time, then make improvements and release frequently.

robertbrignull · 2020-09-07T15:24:46Z

I've tested this on Actions, by which I mean calling the runner from a powershell script on Actions, and it works: I'm able to build and analyze a csharp project. I'll try the same thing on Jenkins.

nickrolfe · 2020-09-07T15:48:27Z

I mostly looked at the PowerShell scripts you generate, and they seem reasonable to me.

robertbrignull · 2020-09-07T16:58:15Z

I had an off-by-one error in my loop, but I've fixed that now. I've tested this on Jenkins and Actions by running the init command twice and observing that it picks the same process to inject into each time. That means it's finding the process that persists between individual script invocations.
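A minimal sketch of walking up exactly N levels, and of the check described above: two separate invocations should resolve to the same persistent process. The pid-to-parent map, the PIDs and ancestorAtLevel are hypothetical; on a real Windows machine the parent PIDs would have to be queried from the OS, which this sketch does not attempt.

```typescript
// Hypothetical process table: pid -> parent pid. Two separate build steps each
// start their own script shell, runner (node) and powershell, under a single
// job process (pid 2) that lives for the whole job.
const parentOf = new Map<number, number>([
  [2, 1], // job process -> agent that persists between jobs
  [10, 2], // step 1: script shell
  [11, 10], // step 1: runner
  [12, 11], // step 1: powershell
  [20, 2], // step 2: script shell
  [21, 20], // step 2: runner
  [22, 21], // step 2: powershell
]);

// Walk exactly `level` steps up the tree. Counting i from 0 while i < level
// makes level 1 mean the direct parent; writing i <= level instead would be
// an off-by-one that silently picks the process one level too high.
function ancestorAtLevel(pid: number, level: number, parents: Map<number, number>): number {
  let current = pid;
  for (let i = 0; i < level; i++) {
    const parent = parents.get(current);
    if (parent === undefined) {
      throw new Error(`Process ${current} has no known parent (step ${i + 1} of ${level})`);
    }
    current = parent;
  }
  return current;
}

console.log(ancestorAtLevel(12, 3, parentOf)); // 2
console.log(ancestorAtLevel(22, 3, parentOf)); // 2: both steps agree
```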

aibaars

Code changes look fine to me. Make sure to test as much as you can before the release.

robertbrignull · 2020-09-08T10:28:38Z

I think we're best off merging this as it's definitely better than the current situation of being broken.

I'll do more testing this afternoon, including adding more actions workflows that exercise the runner.

robertbrignull added 3 commits September 2, 2020 15:28

Fix tracing when there are multiple self-hosted runners

2dbd7e8

set -ExecutionPolicy Bypass

5c0bd22

add alternative script for in runner mode

48df013

robertbrignull assigned aibaars Sep 7, 2020

Merge branch 'main' into windows_tracing

789059e

add options to specify process name or level to trace

694fa2d

robertbrignull added 2 commits September 7, 2020 17:08

Merge branch 'main' into windows_tracing

212f448

Print final process we choose

7d9c81f

aibaars approved these changes Sep 7, 2020


robertbrignull merged commit 506e641 into main Sep 8, 2020

robertbrignull deleted the windows_tracing branch September 8, 2020 10:28

github-actions bot mentioned this pull request Sep 14, 2020

Merge main into v1 #184

Merged

aliscco mentioned this pull request Apr 25, 2023

[Snyk] Fix for 13 vulnerabilities aliscco/codeql-action#142

Open

4 participants
