OAuth: re-enable sync active user repos task #12443
Conversation
readthedocs/oauth/tasks.py
```python
# Triggering a sync per-user, will re-sync the same installation
# multiple times.
sync_remote_repositories(
    user.pk,
```
We could probably have a wrapper around this function so that it accepts a user object, instead of re-fetching the object from the DB each time.
Pull Request Overview
This PR re-enables the daily sync task for active user repositories, with optimizations to prevent performance issues. The task was previously disabled because excessive re-syncing caused Celery instances to freeze up.
Key changes include:
- Reducing the active user window from 90 days to 30 days to limit scope
- Adding logic to skip GitHub App services during sync to avoid redundant processing
- Using iterator() for more memory-efficient user processing
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| readthedocs/settings/base.py | Re-enables the commented-out daily repository sync task |
| readthedocs/oauth/tasks.py | Updates sync logic to skip GitHub App services and reduces active user window to 30 days |
readthedocs/oauth/tasks.py
```diff
+one_month_ago = timezone.now() - datetime.timedelta(days=30)
 users = User.objects.annotate(weekday=ExtractIsoWeekDay("last_login")).filter(
-    last_login__gt=three_months_ago,
+    last_login__gt=one_month_ago,
```
It feels like we shouldn't consider the user inactive until after the longest possible user session duration, plus a buffer of some days. last_login is updated on log in, and so won't update again until the user's session is invalidated -- likely the session cookie aging out after 30 days. After that period, last_login doesn't really signal an inactive user until maybe 2 additional weeks have passed (44 days total) and they haven't logged in again.
So 45 days is probably the minimum, and 90 days does seem like a lot.
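The window arithmetic in the comment above (a 30-day session lifetime plus roughly two weeks of buffer) can be checked directly. The constant names here are illustrative, not from the codebase, and the 30-day session lifetime is the value assumed in the review, not Django's default:

```python
import datetime

# Assumed values from the discussion: sessions live ~30 days, plus a
# ~2-week buffer before last_login really signals inactivity.
SESSION_LIFETIME_DAYS = 30
BUFFER_DAYS = 14

inactive_cutoff = datetime.timedelta(days=SESSION_LIFETIME_DAYS + BUFFER_DAYS)
print(inactive_cutoff.days)  # → 44; the PR rounds this up to 45
```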
With 90 days the task takes 1h 40min to complete (2.4K users), and with 45 days it takes 44min (1.3K users).
Changed to 45 days.
In another comment you said it took ~3m. I'm confused. What's the correct value here? 42 minutes is a lot! 😔
This is the time it takes to sync all active users. If you are referring to my message from Slack, that was about the time of syncing the repos of a single user (from 12 minutes to 12 seconds).
This seems like a good step forward. Do we think the GH app is what caused this task to massively increase in size?
In general, we should probably try and find a way to lock these tasks so they don't pile up. We have this logic in other places, but we should probably apply it to all daily tasks:
```python
def memcache_lock(lock_id, lock_expire, app_identifier):
```
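The pattern behind a helper like `memcache_lock` looks roughly like the sketch below. This is a hedged reconstruction, not the actual Read the Docs implementation: the real helper presumably uses Django's cache backend, so a simple in-memory dict with `add`-style semantics stands in for it here.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for a memcached client; the real implementation
# would use Django's cache backend (an assumption for this sketch).
_store = {}


def _cache_add(key, value, expire):
    """Add a key only if it is absent or expired; return True on success.

    memcached's add() is atomic across workers; this dict version only
    illustrates the semantics within a single process.
    """
    now = time.monotonic()
    entry = _store.get(key)
    if entry is None or entry[1] <= now:
        _store[key] = (value, now + expire)
        return True
    return False


def _cache_delete(key):
    _store.pop(key, None)


@contextmanager
def memcache_lock(lock_id, lock_expire, app_identifier):
    """Yield True if this caller won the lock, False otherwise.

    The lock always expires after lock_expire seconds, so a crashed
    worker cannot hold it forever.
    """
    key = f"lock-{lock_id}"
    acquired = _cache_add(key, app_identifier, lock_expire)
    try:
        yield acquired
    finally:
        # Only the winner releases the lock; losers must not delete it.
        if acquired:
            _cache_delete(key)
```

A daily task would wrap its body in `with memcache_lock(...) as acquired:` and, when `acquired` is False, skip the run and log a warning to Sentry instead, as suggested later in this thread.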
I think it was more about the users from organizations with lots of repositories that recently logged in.
I think we should just make sure the task isn't retried if it times out or gets killed. Since this task runs daily, it would have to run for more than 24 hours before a second instance could start, so the lock wouldn't be useful.
I played around with this locally and wasn't able to replicate a retry after the task times out. My other guess would be the task killing the server (OOM), but that I can't test locally...
Yea, but the symptom of all these issues is that the tasks try to run 2x, so if we can easily lock it and log a warning in Sentry, it will help us catch situations like this with an explicit guarantee.
This task now completes in 42 minutes.
Closes #12406