Use smaller Platform-Language packages when analyzing only one language #172

marcogario · 2020-09-07T14:20:18Z

When the action (or runner) analyzes only a single language, we download a smaller codeql bundle.

The main change here is that we now need to detect the language first and then set up CodeQL. Language detection used to be part of the initConfig step; we move it out and perform it before initCodeQL.

The changes to setupCodeQL are meant to cache the tooling correctly. That is, if we already have a full bundle in the cache, we should not try to download a platform-language bundle. Similarly, if we did download a platform-language bundle, we should not cache it as a full bundle.

We extend this logic to download only the platform-specific bundle, so that Linux runners do not need to download a package that also contains the Windows and macOS binaries.
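
To make the naming concrete, here is a minimal sketch of how an asset name could be derived from the platform and an optional language. Only `linux64` and `codeql-linux64-cpp.tar.gz` appear in this PR; the helper names, the `win64` and `osx64` strings, and the platform-only asset name are assumptions for illustration.

```typescript
// Hypothetical platform identifiers. Only "linux64" is confirmed by this PR;
// "win64" and "osx64" are assumed names.
type BundlePlatform = "linux64" | "win64" | "osx64";

// Hypothetical helper: maps a Node.js platform string (process.platform)
// to the platform part of the bundle name.
function bundlePlatform(nodePlatform: string): BundlePlatform {
  switch (nodePlatform) {
    case "linux":
      return "linux64";
    case "win32":
      return "win64";
    case "darwin":
      return "osx64";
    default:
      throw new Error(`Unsupported platform: ${nodePlatform}`);
  }
}

// Hypothetical helper: builds the release asset name.
// With a language this yields e.g. "codeql-linux64-cpp.tar.gz";
// the platform-only form ("codeql-linux64.tar.gz") is an assumption.
function bundleAssetName(platform: BundlePlatform, language?: string): string {
  const plVersion = language ? `${platform}-${language}` : platform;
  return `codeql-${plVersion}.tar.gz`;
}
```

For example, `bundleAssetName(bundlePlatform(process.platform), "cpp")` on a Linux runner would give the name used in the comment below.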

I added this comment in setupCodeQL to make the flow a bit clearer:

  // The URL identifies the release version, e.g. codeql-20200901.
  // The plVersion identifies the platform-language combination of the package
  // within the release, e.g. `linux64-cpp` in `codeql-linux64-cpp.tar.gz`.
  // We expect the codeqlURL (when given) to always point to the main bundle,
  // `codeql-bundle.tar.gz`.
  //
  // The logic is as follows:
  //  - Always use the toolcache if available.
  //    - If we would like a platform-language package, but have the
  //      full bundle in the cache, use that.
  //  - If codeqlURL is specified, use that.
  //  - If a single language is being analyzed, try to download the platform-language package.
  //    - If it is not available in the release assets, fall back to the full bundle.
  //  - If multiple languages are being analyzed, use the full bundle.
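
The decision order in that comment can be sketched as a pure function. This is not the code in the PR: `planSetup`, `SetupInputs`, and `Source` are hypothetical names, and the cache is modelled as a plain list of version strings rather than the real toolcache API.

```typescript
const FULL_BUNDLE = "codeql-bundle.tar.gz";

// Hypothetical result type: either reuse a cached version, or try the
// listed downloads in order.
type Source =
  | { kind: "toolcache"; version: string }
  | { kind: "download"; candidates: string[] };

// Hypothetical inputs; the field names are assumptions for illustration.
interface SetupInputs {
  cachedVersions: string[]; // versions already present in the toolcache
  releaseVersion: string; // e.g. "0.0.0-20200826"
  platform: string; // e.g. "linux64"
  languages: string[]; // languages being analyzed
  codeqlURL?: string; // explicit URL, if the user gave one
}

function planSetup(inp: SetupInputs): Source {
  // A platform-language package only makes sense for a single language.
  const plVersion =
    inp.languages.length === 1
      ? `${inp.platform}-${inp.languages[0]}`
      : undefined;

  // 1. Always prefer the toolcache. A cached full bundle also satisfies
  //    a request for a platform-language package.
  const wanted = plVersion
    ? [`${inp.releaseVersion}-${plVersion}`, inp.releaseVersion]
    : [inp.releaseVersion];
  for (const version of wanted) {
    if (inp.cachedVersions.indexOf(version) !== -1) {
      return { kind: "toolcache", version };
    }
  }

  // 2. An explicit URL wins over any automatic choice.
  if (inp.codeqlURL) {
    return { kind: "download", candidates: [inp.codeqlURL] };
  }

  // 3. Single language: try the small package, then fall back to the full bundle.
  if (plVersion) {
    return {
      kind: "download",
      candidates: [`codeql-${plVersion}.tar.gz`, FULL_BUNDLE],
    };
  }

  // 4. Multiple languages: use the full bundle.
  return { kind: "download", candidates: [FULL_BUNDLE] };
}
```

Returning an ordered candidate list keeps the fallback explicit: the caller tries each entry until one is found among the release assets.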

I would like to test and merge this code before merging the code that splits the bundles, to ensure that it works when the platform-language packages have not been created (it should, and we have tests, but...).

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Confirm the readme has been updated if necessary.

* Add `bundleName` argument to `getCodeQLBundleDownloadURL`
* Add `languages` argument to `setupCodeQL`. The logic now tries to find the platform-language pkg before defaulting to the full bundle. We keep the toolcache clean by adding the pl version to the tool version.

# This is the 1st commit message:
Add logic to download codeql platform-language pkg
* Add `bundleName` argument to `getCodeQLBundleDownloadURL`
* Add `languages` argument to `setupCodeQL`. The logic now tries to find the platform-language pkg before defaulting to the full bundle. We keep the toolcache clean by adding the pl version to the tool version.
# The commit message #2 will be skipped:
# Add simple fallback logic for download
# The commit message #3 will be skipped:
# wip linter
# The commit message #4 will be skipped:
# linter

# This is the 1st commit message:
Add logic to download codeql platform-language pkg
* Add `bundleName` argument to `getCodeQLBundleDownloadURL`
* Add `languages` argument to `setupCodeQL`. The logic now tries to find the platform-language pkg before defaulting to the full bundle. We keep the toolcache clean by adding the pl version to the tool version.
# The commit message #2 will be skipped:
# add test
# The commit message #3 will be skipped:
# cleanup
# The commit message #4 will be skipped:
# linter

marcogario · 2020-09-10T09:34:03Z

I ran an experiment comparing zip0 and tar.gz decompression time. There is basically no difference between zip and tar.gz. I would keep the tar.gz for consistency.

robertbrignull

A couple of comments, but I admit I haven't read through it all fully. I'd like to see a demo if possible, as that would be the best way to show it works. I'm a bit worried about our use of the toolcache, as we're not really using it as intended with regard to the version strings.

Overall though, the approach this is taking looks like it'll be good.

I'm sure you already know this, but probably worth noting we're also trying to get the bundle pre-populated into the actions runners. It's not clear exactly when this will happen but it's looking like it will. So that would mean that splitting up the bundle would be less useful on dotcom, but this is still potentially very useful for GHES or self-hosted runners on dotcom.

src/codeql.ts

src/config-utils.ts

robertbrignull · 2020-09-10T13:46:59Z

Would https://github.com/github/codeql-action-sync-tool need to be changed as part of this?

marcogario · 2020-09-10T15:34:13Z

Thanks @robertbrignull, I will go through your comments in detail tomorrow and also update the code to match main.

> I'm a bit worried about our use of the toolcache, as we're not really using it as intended with regard to the version strings.

I think we need to dig deeper here. My thought was that this would improve all setups: clearly self-hosted runners, but also dotcom, as I understood there is a window between our releasing a new bundle and the base image being rebuilt. However, given your comment in that discussion, I might be misunderstanding how the version information is used in the toolcache. Do you have more information you can point me to, or shall we try to sync?

marcogario · 2020-09-11T12:31:46Z

This is an example of how we fall back when searching for the tool:

  ##[debug]Bundle version 20200826 is not in SemVer format. Will treat it as pre-release 0.0.0-20200826.
  ##[debug]PL Version linux64-javascript
  ##[debug]isExplicit: 0.0.0-20200826-linux64-javascript
  ##[debug]explicit? true
  ##[debug]checking cache: /opt/hostedtoolcache/CodeQL/0.0.0-20200826-linux64-javascript/x64
  ##[debug]not found
  ##[debug]isExplicit: 0.0.0-20200826
  ##[debug]explicit? true
  ##[debug]checking cache: /opt/hostedtoolcache/CodeQL/0.0.0-20200826/x64
  ##[debug]not found
  ##[debug]CodeQL not found in cache
  ##[debug]RUNNER_TEMP=/home/runner/work/_temp
  ##[debug]Using CodeQL URL: https://github.com/github/codeql-action/releases/download/codeql-bundle-20200826/codeql-bundle.tar.gz
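
The lookup order visible in this log can be sketched as follows. The function names are hypothetical and the SemVer check is deliberately rough; the `0.0.0-` prefix and the two version strings are taken from the log above.

```typescript
// Hypothetical helper: a bundle version that is not in SemVer format is
// treated as a pre-release of 0.0.0, as the first debug line reports.
function toToolcacheVersion(bundleVersion: string): string {
  // Rough check for a leading MAJOR.MINOR.PATCH; not a full SemVer parser.
  const looksLikeSemVer = /^\d+\.\d+\.\d+/.test(bundleVersion);
  return looksLikeSemVer ? bundleVersion : `0.0.0-${bundleVersion}`;
}

// Hypothetical helper: returns the versions to look up in the toolcache,
// most specific first. The platform-language version is tried before the
// full bundle version.
function toolcacheLookupOrder(
  bundleVersion: string,
  plVersion?: string
): string[] {
  const base = toToolcacheVersion(bundleVersion);
  return plVersion ? [`${base}-${plVersion}`, base] : [base];
}
```

Calling `toolcacheLookupOrder("20200826", "linux64-javascript")` gives the same two versions, in the same order, as the `isExplicit` lines in the log.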

marcogario · 2020-09-11T15:27:29Z

I was doing some more testing and I ran into the following issue.

initCodeQL takes a codeqlURL that can be a string or undefined. I thought this was because the tools argument could be left unspecified, and thus we have logic to detect which bundle to use. This is the case for the runner, but not for init-action.ts, where it is always set to a string:

codeql-action/src/init-action.ts

Line 62 in 57ef26c

core.getInput('tools'),

What could be the best way to deal with this? Can we detect that the value was not specified in the configuration?

robertbrignull · 2020-09-11T15:37:01Z

> initCodeQL takes a codeqlURL that can be a string or undefined. I thought this was because the tools argument could be left unspecified, and thus we have logic to detect which bundle to use. This is the case for the runner, but not for init-action.ts, where it is always set to a string:

Yes, I agree this is a pain. Your analysis is right that for the runner it might be undefined, but for the action it's always a string. If it's not defined in the workflow then it gets set to the empty string. So this means the best way to check it may be to convert it to a boolean to see if it's truthy, but I agree this is not ideal.

I think this is a problem we should deal with, though it doesn't have to be in this PR. Maybe we should convert all empty strings to undefined when we read the inputs, but there are other options. I'll open an issue as I think we should track this and make sure it gets improved.
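
One way the conversion could look, as a minimal sketch. `optionalInput` is a hypothetical name, not something in the codebase; it relies only on the behaviour described above, that an input missing from the workflow is read as the empty string.

```typescript
// Hypothetical helper: treats an input that was not set in the workflow
// (and is therefore read as "") as undefined, so that the action and the
// runner can share the same "string | undefined" handling.
function optionalInput(raw: string): string | undefined {
  return raw.length > 0 ? raw : undefined;
}

// Intended use in the action would be along the lines of
//   optionalInput(core.getInput('tools'))
// so that initCodeQL sees undefined when no URL was given.
```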

marcogario · 2020-09-15T20:16:13Z

This is ready for review. I am available to walk through the code. The only change I expect is to rebase on #211, as I currently included the changes from #208.

The macos tests are failing due to some sort of rate limiting. I will try to retrigger them tomorrow.

robertbrignull

I'm re-reviewing but I'm having to step out now so I'm posting what I have so far. I'll continue reviewing the rest later today.

robertbrignull · 2020-09-16T10:51:50Z

.github/workflows/integration-testing.yml

+        language: ["none", "cpp", "csharp", "go", "java", "javascript", "python"]
+    runs-on: ${{ matrix.os }}
+    steps:
+      # Translate OS


Could you explain what this step is doing? I can't tell just looking at it here.

Ah, this is a left-over. I might use it now, if I want to test the toolcache (as you suggest below). I am adding a comment, but in general, it is setting an env variable called OS with the correct string value depending on the platform.

.github/workflows/integration-testing.yml

robertbrignull · 2020-09-16T10:57:17Z

.github/workflows/integration-testing.yml

      env:
        TEST_MODE: true

+  single-language-bundles:


Does this need some check after running the actions to make sure it actually downloaded the right bundles? Currently it could download the full thing each time and still pass.
You should be able to look inside the toolcache directory and see what it has downloaded. You'll find it under /opt/hostedtoolcache (at least on Linux), or you can use $AGENT_TOOLSDIRECTORY (which might work on all operating systems).

The idea was not to require this, although I can see how it would be good to test this in a non-failing way.
I did not want to make the bundle-release process more complex, so these tests should pass with only the main bundle (fallback behavior). I see how outputting a message that we used the smaller bundle would be useful for validation.
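
A non-failing check of this kind could be sketched as below. The function names are hypothetical, and the directory layout `<root>/CodeQL/<version>/x64` is taken from the debug log earlier in this conversation rather than from any documented contract.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper: lists the CodeQL versions present under a toolcache
// root, assuming the layout <root>/CodeQL/<version>/...
function cachedCodeQLVersions(toolcacheRoot: string): string[] {
  const toolDir = path.join(toolcacheRoot, "CodeQL");
  if (!fs.existsSync(toolDir)) {
    return [];
  }
  return fs
    .readdirSync(toolDir)
    .filter((name) => fs.statSync(path.join(toolDir, name)).isDirectory())
    .sort();
}

// Hypothetical helper: reports whether some cached version carries the
// given platform-language suffix, e.g. "linux64-cpp".
function usedSmallBundle(toolcacheRoot: string, plVersion: string): boolean {
  const suffix = `-${plVersion}`;
  return cachedCodeQLVersions(toolcacheRoot).some(
    (version) => version.slice(-suffix.length) === suffix
  );
}
```

In a workflow step this could be called with `process.env.AGENT_TOOLSDIRECTORY` as the root and only print the result, so the test still passes when just the main bundle is available.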

src/codeql.test.ts

robertbrignull

Ok, I've gone through everything now.

Only one comment and I can't see anything else wrong, though I'm still a bit uneasy about this feature as a whole because it's just quite complicated and I feel I'm not giving it the brain power / testing time it requires.

Also, #211 has been merged now, so this could be rebased onto that to simplify things.

robertbrignull · 2020-09-17T09:17:44Z

src/codeql.ts

      apiURL === util.GITHUB_DOTCOM_URL &&
-      repository === CODEQL_DEFAULT_ACTION_REPOSITORY
+      repository === CODEQL_DEFAULT_ACTION_REPOSITORY &&
+      bundleNames[0] === CODEQL_BUNDLE_NAME


I'm not convinced this will ever be true. In the only place this is called from non-test code, the 0th element is either the platform bundle or the plVersion bundle. The raw full CODEQL_BUNDLE_NAME is never the 0th element. I can see why you're checking this here, because the idea is to not default to using the full bundle unless necessary, but I don't think the logic is quite right.

Since we own the release on github.com/github/codeql-action could you do the same thing of just assuming the release artifacts will be there for all platforms and languages? I realise they aren't there currently so that will delay merging this PR until they are there. The alternative would be to stop treating github.com as special and make the request to discover what assets are there. I don't have a strong favourite between those options as they both have positive and negative aspects.

marcogario · 2020-09-17T11:23:08Z

Thanks for the review. I think there is a conceptual point to address. I did this work trying to be compatible with the current bundle release process. This has a few visible impacts on the implementation, as you noted in the review:

* Integration tests are currently designed to pass if we do NOT have the language-specific bundle.
* The code checking for available bundles gets more complicated and expensive, as we need to query the API and cannot shortcut effectively.

Since I automated the creation of the smaller bundles, it would seem that changing the release process has some benefits, with few drawbacks.

The only "problem" I see is that this would require postponing the merge of this PR and first:

1. Changing the release process and merging the workflow to generate the language bundles
2. Updating the sync tool
3. Finally, adjusting this PR

@robertbrignull WDYT?

marcogario · 2020-09-17T15:58:27Z

I went ahead and prepared the changes to the process as discussed above. I think that is the most reasonable solution.

* We need to merge "Workflow to split the bundle into components" (#216) first, before the tests for this PR can succeed (as we no longer fail gracefully).
* I tested the codeql-action-sync-tool, and it does indeed download all assets.
* I created a PR for the changes to the bundle release process.

marcogario · 2020-09-21T09:59:02Z

As discussed with @robertbrignull, this will be closed in favor of a simpler version that only accounts for platforms.

marcogario added 6 commits September 9, 2020 13:50

Add CI for testing small bundles

e4c39c9

Add test for bundle with platform and language

408e376

marcogario force-pushed the platform_lang_pkg branch from f00b280 to 57b0b7f Compare September 9, 2020 11:54

marcogario marked this pull request as ready for review September 9, 2020 19:08

marcogario requested review from Daverlo and robertbrignull September 9, 2020 19:09

robertbrignull self-assigned this Sep 10, 2020

robertbrignull reviewed Sep 10, 2020

Merge remote-tracking branch 'origin/main' into platform_lang_pkg

588a28d

marcogario force-pushed the platform_lang_pkg branch from d343ab6 to 588a28d Compare September 11, 2020 09:48

Move language functions in dedicated package

7b29d2e

marcogario force-pushed the platform_lang_pkg branch from 38e85d2 to 7b29d2e Compare September 11, 2020 10:02

Remove old workflow

d093fc0

marcogario force-pushed the platform_lang_pkg branch 5 times, most recently from a43213e to 1e41045 Compare September 11, 2020 10:33

CI: Add test for small-bundles

0ca7a34

marcogario force-pushed the platform_lang_pkg branch from 1e41045 to 0ca7a34 Compare September 11, 2020 10:34

marcogario mentioned this pull request Sep 11, 2020

CI: Reduce unnecessary jobs #182

Closed

marcogario added 2 commits September 15, 2020 15:30

Merge remote-tracking branch 'origin/main' into platform_lang_pkg

b9e8fa6

fix linter

4d836f4

marcogario force-pushed the platform_lang_pkg branch from db1d997 to 2a73b54 Compare September 15, 2020 19:18

robertbrignull reviewed Sep 16, 2020

robertbrignull reviewed Sep 17, 2020

marcogario added 2 commits September 17, 2020 13:58

Fix early termination

b9f08d9

Merge branch 'main' into platform_lang_pkg

7f2fa9c

marcogario force-pushed the platform_lang_pkg branch from 2a73b54 to 7f2fa9c Compare September 17, 2020 12:02

marcogario added 2 commits September 17, 2020 14:27

Improve integration test

c25320c

Assume small bundles are available in the official repo

1157017

Merge remote-tracking branch 'origin/main' into platform_lang_pkg

f8e9262

marcogario force-pushed the platform_lang_pkg branch from 1bb59f4 to 77a4910 Compare September 21, 2020 07:22

Disable dependency in CI

8547a75

marcogario force-pushed the platform_lang_pkg branch from 77a4910 to 8547a75 Compare September 21, 2020 07:35

Fix test for toolcache

11e79a3

marcogario closed this Sep 21, 2020
