Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

Conversation

@sauyon
Copy link
Contributor

@sauyon sauyon commented Jun 11, 2020

A dist-compare shows 60 new results on kubernetes due to the new way that files are picked up.

The results that I looked at look like false positives, but are reasonable, so I'm going to open issues for them and not worry about them for this PR.

Also, evaluation has slowed significantly for both cadence and kubernetes.

Matt established that the OS X issues I've been having are an issue with the tracer, and he's working on that now.

@max-schaefer
Copy link
Contributor

Many thanks for your work on this; glad to see it finally made it into a PR! Before I start reviewing this, can you quickly outline how one enables build tracing?

@sauyon
Copy link
Contributor Author

sauyon commented Jun 11, 2020

I've enabled the changes required to make it work globally and not added a flag, since I thought it might be superior in most cases:

  • Input packages (i.e. the packages that are passed to the extractor explicitly) are always extracted
  • If a dependency of that package is in the same module as the input package, or if it is contained in the source directory of an input package, it is also extracted.

@max-schaefer
Copy link
Contributor

I've enabled the changes required to make it work globally and not added a flag

As discussed previously, I would strongly prefer build tracing to be opt-in, since it is a new feature. Also, doesn't it currently only work for module-based code?

@sauyon
Copy link
Contributor Author

sauyon commented Jun 12, 2020

As discussed previously, I would strongly prefer build tracing to be opt-in, since it is a new feature.

Build tracing itself is opt-in, because the autobuilder is run by default, right? The behavior for extraction via autobuilder hasn't changed too much, the only change is some more dependencies may be extracted, but I think they should all be part of the project.

I wouldn't mind adding a flag to make this behavior opt-in, though.

Also, doesn't it currently only work for module-based code?

It falls back on package directory, which is basically what the old code did except in the case that there's a package nested under a package passed to the extractor explicitly.

Actually, before review I'm going to run a dist-compare with a query that selects files; does the below query seem sensible?

import go

from PackageName n
select n.getFile()

I'll mark this as a draft in the interim.

@sauyon sauyon marked this pull request as draft June 12, 2020 06:50
@max-schaefer
Copy link
Contributor

I'm afraid you have lost me. Perhaps I should just look at the code before asking any more questions.

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial questions; I'll need a few more passes to digest this.

@max-schaefer
Copy link
Contributor

As discussed offline, I would favour breaking this PR up into two parts, one being the general improvements to package handling, and one being the build-tracing specific bits.

@sauyon sauyon changed the title Add support for build tracing Extract more dependency ASTs Jun 12, 2020
@sauyon
Copy link
Contributor Author

sauyon commented Jun 12, 2020

I've subverted this PR for extracting more dependencies because most of the comments were related to it.

@sauyon
Copy link
Contributor Author

sauyon commented Jun 12, 2020

Comments addressed.

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and questions.

I think it could be made a little clearer in the change note and perhaps the code that we now extract the (AST for the) entire module, even if the extractor was explicitly invoked on only one package inside that module.

@sauyon sauyon force-pushed the tracing branch 2 times, most recently from 2220897 to b081d13 Compare June 16, 2020 08:14
Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't have a good intuitive feeling for how this will affect results, so I would suggest we

  • do a dist-compare with --no-cleanup and sanity check the results;
  • run an ad-hoc query on the databases produced by the dist-compare to get the list of files and compare those before and after the change;
  • run a Go-Differences job to validate extractor performance.

@sauyon
Copy link
Contributor Author

sauyon commented Jun 17, 2020

* [ ]  do a dist-compare with `--no-cleanup` and sanity check the results;

Will do. Should I just run a few of the standard queries and a few checks that AST has actually been extracted?

* [ ]  run an ad-hoc query on the databases produced by the dist-compare to get the list of files and compare those before and after the change;

This should now theoretically be in progress, though I've run it wrong about eight times now...

* [ ]  run a Go-Differences job to validate extractor performance.

I'm not entirely sure how to do this; do I fill out the gitCommits parameter or the gitBranch parameter?

@sauyon
Copy link
Contributor Author

sauyon commented Jun 17, 2020

Also, the test failures seem to me to be Actions failures, so I've rerun them (the tests pass on my local machine).

@max-schaefer
Copy link
Contributor

Will do. Should I just run a few of the standard queries and a few checks that AST has actually been extracted?

If the changes aren't too voluminous, a standard dist-compare should do the trick, I think.

I'm not entirely sure how to do this; do I fill out the gitCommits parameter or the gitBranch parameter?

Both. gitBranch is where the scripts are taken from; it doesn't matter too much, but I generally use the base commit. Note that all commits have to be commits on git.semmle.com.

@max-schaefer
Copy link
Contributor

How is the evaluation of this PR coming along?

@sauyon
Copy link
Contributor Author

sauyon commented Jun 24, 2020

Ah, I'd forgotten to go through and post this.

Dist-compare now shows no results changes and a rather large slowdown in cadence evaluation times, which lines up with this report that says that 22k files were added to the cadence snapshot.

The last Go-Differences job I ran seems to have failed, so I've started a new one.

@sauyon sauyon marked this pull request as ready for review June 25, 2020 04:29
@sauyon
Copy link
Contributor Author

sauyon commented Jun 25, 2020

The last Go-Differences job I ran seems to have failed, so I've started a new one.

That job's failed because it apparently tries to checkout the sub-sha in the ql directory instead of the Go one, so I'm not entirely sure how I should be doing this.

@max-schaefer
Copy link
Contributor

Yes, you cannot use a sub-sha with the Go-Differences job, you have to make a branch in the main repo that points the submodule to the right sha.

We now extract packages that have the same module root as the specified packages, as determined by
the `go list` command.
smowton
smowton previously approved these changes Jun 26, 2020
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, one or two of which appeared to already be done in a later commit.

Two high-level suggestions:

  • Is it normal in this project/org to keep "review comments" commits or to squash them?
  • Do we get feedback on the command-line indicating what it decided to extract and what it didn't?

dirList := strings.Split(relDir, string(filepath.Separator))
if len(dirList) == 0 || dirList[0] != ".." {
// if dirList is empty, root is the same as the source dir
// if dirList starts with `".."`, it is not inside the root dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about paths that escape the rootdir like x/../..?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you satisfied with how this was addressed later on?

@max-schaefer
Copy link
Contributor

  • Is it normal in this project/org to keep "review comments" commits or to squash them?

The latter; see the newly added internal Wiki article on the subject.

  • Do we get feedback on the command-line indicating what it decided to extract and what it didn't?

The extractor prints out the path of each file it extracts. (This is suppressed by codeql test run, but during normal database builds it ends up in the logs.)

@max-schaefer
Copy link
Contributor

@sauyon, any news on the evaluation?

@sauyon
Copy link
Contributor Author

sauyon commented Jun 30, 2020

Sorry, I'd forgotten to review the job. Go-Differences seems to suggest no time differences.

@sauyon
Copy link
Contributor Author

sauyon commented Jul 1, 2020

I was looking at the analysis time numbers, whoops. I would have expected cockroach to take more time as it now contains significantly more (generated) files. I'll take a look at concourse too, but I suspect that's a similar story.

@sauyon
Copy link
Contributor Author

sauyon commented Jul 1, 2020

I'm not able to reproduce the difference in concourse build time locally. Also, the actual extraction portion for concourse seems to take something like 20s for me, so I think it's probably something to do with the dependency installation or build, which shouldn't have been changed by this PR.

@max-schaefer
Copy link
Contributor

OK, thanks for checking. In that case I'll merge.

@max-schaefer max-schaefer merged commit f74a94e into github:master Jul 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants