Skip to content

Conversation

@edoardopirovano
Copy link
Contributor

@edoardopirovano edoardopirovano commented Aug 5, 2022

This PR introduces everything that needs to happen on the Action's side for TRAP caching. In particular:

  1. We run codeql resolve languages --format=betterjson to see the extractor options.
  2. If we see an extractor that supports TRAP caching then we:
    • If we're running on the default branch: Start from a fresh cache, tell the extractor to write up to 1GB to the cache (tbc: is this a sensible amount for size of the Actions cache and the disk space of the default runners?). Upload it near the end of the run (after we've sent results to Code Scanning and uploaded the DB to MRVA, since both of those are higher priority). (sample run with some now-removed debugging output)
    • If we're running on a PR branch: Download the cache for the base commit if we have one, or any other recent commit failing that at the init step. Then we use that cache in read-only mode during extraction. (sample run with some now-removed debugging output)

All of this is gated behind a feature flag, although the trap-caching parameter to the init step can be used to force this to be on or off irrespective of the feature flag. I expect a few users will want to disable this if they're already using their Actions cache for something else so we're not competing for space.

We should document this new feature, and the option to disable it, before we turn this on for any external users. But for now we're only going to turn on the feature flag for some internal repositories, so I think it is fine to not have it in the public facing documentation just yet.

Before we enabled this on more than one or two internal repos (which we can monitor by hand), I would also like to implement some telemetry so we can have some visibility over whether this is causing any issues. Again, I propose leaving this for a later PR.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@edoardopirovano edoardopirovano requested a review from a team as a code owner August 5, 2022 15:32
Copy link
Contributor

@henrymercer henrymercer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell the extractor to write up to 1GB to the cache (tbc: is this a sensible amount for size of the Actions cache and the disk space of the default runners?).

1 GB per language sounds reasonable to me as a default. I can think of cases where we might want to use more or less space however — perhaps we should make the TRAP cache size configurable? Question: Should the total size or the size per language be configurable?

We should document this new feature, and the option to disable it, before we turn this on for any external users. But for now we're only going to turn on the feature flag for some internal repositories, so I think it is fine to not have it in the public facing documentation just yet.

Sounds reasonable.

Before we enabled this on more than one or two internal repos (which we can monitor by hand), I would also like to implement some telemetry so we can have some visibility over whether this is causing any issues. Again, I propose leaving this for a later PR.

Also sounds reasonable 👍

@edoardopirovano
Copy link
Contributor Author

perhaps we should make the TRAP cache size configurable? Question: Should the total size or the size per language be configurable?

Indeed, we may eventually want configuration options to fine-tune this. I don't think adding this should block getting this prototype out of the door, though. In the telemetry I will be adding, I will include a field that records the size of the cache so we can use that to get a feel for what the default should be and how configurable we need this to be before we properly declare this GA.

@edoardopirovano edoardopirovano merged commit 07720c7 into main Aug 9, 2022
@edoardopirovano edoardopirovano deleted the edoardo/trap-caching branch August 9, 2022 18:18
@asottile-sentry
Copy link

are there docs explaining what this is or why one would want to turn it on or off? I didn't see it mentioned in the docs linked from the README and this is chewing up our entire GHA cache since it seems to put many MB per commit

@adityasharad
Copy link
Contributor

@asottile-sentry I'm sorry to hear this feature has caused trouble for you. The changelog note here has some explanation of this feature and how to turn it off. The intended purpose is to speed up the "extraction" phase of CodeQL analysis (which processes your source code into a local database format so that it can be analysed), however this is completely optional and you are safe to turn it off in the workflow since it's using up your Actions cache quota. We have only rolled this out in a limited fashion for one language at the moment, so it is not in the public docs beyond the changelog.

To help us understand and investigate the problem better, would you able to point us to your repo, and let us know if you're using the Actions cache for any other purposes?

@asottile-sentry
Copy link

https://github.com/getsentry/sentry/actions/caches -- yes we use the cache for other things (I suspect this to be the common case -- so perhaps enabling this by default in codeql is the wrong default?)

@sayhiben
Copy link

In case the additional context helps, trap caching is also causing my organization's monorepo to exhaust our cache very rapidly. I had to turn off the feature, even though I want to use it. We use the GHA cache for other purposes as well and can't afford to let this churn our cache. I've opened another issue to hopefully move the cache storage location: #2030

vszakats added a commit to vszakats/curl that referenced this pull request Sep 19, 2025
The `cpp` CodeQL job is adding a cache entry for each run on the master
branch. One for Linux, another for Windows. Size: 68MB + 180MB = 248MB.
In one week we got 50+ such entries, almost filling the available cache
space. The cache entries don't seem to be used again.

Following the recommendation in an open issue thread, this patch tries
to disable this cache. Since it only affects master, the effect can only
be verified after merging.

Bug: curl#18528 (comment)
Ref: github/codeql-action#1172
Ref: github/codeql-action#2030
Ref: github/codeql-action#2885 (comment)
vszakats added a commit to curl/curl that referenced this pull request Sep 19, 2025
The `cpp` CodeQL job is adding a cache entry for each run on the master
branch. One for Linux, another for Windows. Size: 68MB + 180MB = 248MB.
In one week we got 50+ such entries, almost filling the available cache
space.

Following the recommendation in an open issue thread, this patch tries
to disable this cache. Since it only affects master, the effect can only
be verified after merging.

The latest cache is picked up in PRs. The performance impact is also to
be seen after merge.

Bug: #18528 (comment)
Ref: github/codeql-action#1172
Ref: github/codeql-action#2030
Ref: github/codeql-action#2885 (comment)

Follow-up to cc50f05 #18528

Closes #18613
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants