Skip to content

Conversation

@robertbrignull
Copy link
Contributor

@robertbrignull robertbrignull commented Sep 2, 2020

This is a draft at getting windows tracing working outside of actions, and make it more robust when using self-hosted actions runners.

I'm in the process of testing this, and I'll push any changes that are necessary.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.

@robertbrignull
Copy link
Contributor Author

Not sure who is best placed to review this. Possibly @aibaars if you can. I also asked questions to @nickrolfe and @dbartol in case either of you can see anything wrong here.

@aibaars
Copy link
Contributor

aibaars commented Sep 7, 2020

It think the assumption that the grand-parent of the process is the one that persists between build steps but not between subsequent jobs is reasonable. The process itself will be nodejs the parent is likely some shell, the parent of that process is likely the closest one that's not wrong ;-)

Of course the assumption might be wrong. The process tree could be slightly deeper for some CI system. I'd recommend adding an "advanced" flag to allow a user to specify some N (default 2) that indicates that the Nth parent should be taken.

It's also possible that the depth is variable, although that is less likely. You could also add an additional flag that allows specifying the name of the "worker" process.

These two flags generalise the actions case where we search by name for Runner.Worker and the other case where we blindly select the second parent. I would disallow specifying both flags for now.

@robertbrignull
Copy link
Contributor Author

robertbrignull commented Sep 7, 2020

It think the assumption that the grand-parent of the process is the one that persists between build steps but not between subsequent jobs is reasonable. The process itself will be nodejs the parent is likely some shell, the parent of that process is likely the closest one that's not wrong ;-)

I think you're off by one from my assumptions. I think the process itself will be powershell, and then the parent is nodejs, and the grandparent is likely some shell or the CI system, or the parent of that is the CI system process that sticks around.

So this sounds like we should increase the default by one level. Do you agree?

Of course the assumption might be wrong. The process tree could be slightly deeper for some CI system. I'd recommend adding an "advanced" flag to allow a user to specify some N (default 2) that indicates that the Nth parent should be taken.
It's also possible that the depth is variable, although that is less likely. You could also add an additional flag that allows specifying the name of the "worker" process.
These two flags generalise the actions case where we search by name for Runner.Worker and the other case where we blindly select the second parent. I would disallow specifying both flags for now.

I agree with you here. There's no way we can make a guess that will be right in all cases. So giving inputs like the two you mention sould give enough power that we can make it work in almost any situation if we manually work out what to do.

This could be annoying to the docs team (cc. @felicitymay) as we're adding new arguments that are quite hard to explain. Since these are meant mainly for manual intervention by a field team member, perhaps we should make them hidden option that don't show up if you run -h. Would that be acceptable?

@aibaars
Copy link
Contributor

aibaars commented Sep 7, 2020

I think you're off by one from my assumptions. I think the process itself will be powershell, and then the parent is nodejs, and the grandparent is likely some shell or the CI system, or the parent of that is the CI system process that sticks around.

So this sounds like we should increase the default by one level. Do you agree?

Yes, indeed. I forgot about powershell. It would be interesting to print the process trees on Actions, Jenkins, and ideally some other CI systems (TeamCity, Bamboo) for windows workers. This should give you some idea on whether there is a value that would work for most and use that as default.

@robertbrignull
Copy link
Contributor Author

It would be interesting to print the process trees on Actions, Jenkins, and ideally some other CI systems (TeamCity, Bamboo) for windows workers.

I did this on actions at least and the process immediately above the nodejs process is the one that stays around for the length of the job. If you go one level higher you get the process that persists between jobs, so we don't want to touch that one as it would lead to affecting future jobs.

@robertbrignull
Copy link
Contributor Author

I've pushed code to add --trace-process-name and --trace-process-level as hidden options. In fact we just parse them straigh out of process.argv as unfortunately the library we're using for generating the command line interface stuff doesn't support hidden options. If you have suggestions for better names I'm open.

With these options I think we should be flexible enough. It would be nice if the default were still a good choice in most cases. I think it'll now work out of the box on anywhere that there's a script in between the CI master process and the runner process. For example on jenkins or actions you specify to run a script, and then that script spawns the runner process. So I think a default of the 3rd parent will work in this case. If the runner is spawned without this intermediate script, then it might go a level too high, but that can now be corrected using the new hidden option.

@robertbrignull
Copy link
Contributor Author

Unfortunately we ideally want to get this done so we can release tomorrow, or very soon after, to match GHES. Sorry this is happening so close to the deadline and with comparatively little testing work. I hope we can get it out there in time and then make improvements over time and release frequently.

@robertbrignull
Copy link
Contributor Author

I've tested this on actions, by which I mean calling the runner from a powershell script on actions, and it works there and I'm able to build and analyze a csharp project. I'll try the same thing on jenkins.

@nickrolfe
Copy link
Contributor

I mostly looked at the PowerShell scripts you generate, and they seem reasonable to me.

@robertbrignull
Copy link
Contributor Author

I had an off-by-one error in my loop, but I've fixed that now. I've tested this on jenkins and actions by running the init command twice and observing that it determines the same process to inject into each time. That means it's finding the process that persists between individual calls to scripts.

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look fine to me. Make sure to test as much as you can before the release.

@robertbrignull
Copy link
Contributor Author

I think we're best off merging this as it's definitely better than the current situation of being broken.

I'll do more testing this afternoon, including adding more actions workflows that exercise the runner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants