OAuth: re-enable sync active user repos task #12443
Conversation
readthedocs/oauth/tasks.py
```python
# Triggering a sync per-user, will re-sync the same installation
# multiple times.
sync_remote_repositories(
    user.pk,
```
We could probably have a wrapper around this function so that it accepts a user object, instead of re-fetching the object from the DB each time.
Pull Request Overview
This PR re-enables the daily sync task for active user repositories, with optimizations to prevent performance issues. The task was previously disabled because excessive re-syncing caused Celery instances to freeze up.
Key changes include:
- Reducing the active user window from 90 days to 30 days to limit scope
- Adding logic to skip GitHub App services during sync to avoid redundant processing
- Using iterator() for more memory-efficient user processing
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| readthedocs/settings/base.py | Re-enables the commented-out daily repository sync task |
| readthedocs/oauth/tasks.py | Updates sync logic to skip GitHub App services and reduces active user window to 30 days |
readthedocs/oauth/tasks.py
```diff
+one_month_ago = timezone.now() - datetime.timedelta(days=30)
 users = User.objects.annotate(weekday=ExtractIsoWeekDay("last_login")).filter(
-    last_login__gt=three_months_ago,
+    last_login__gt=one_month_ago,
```
It feels like we shouldn't consider the user inactive until after the longest possible user session duration, plus a buffer of some days. last_login is updated on log in, and so won't update again until the user's session is invalidated -- likely the session cookie aging out after 30 days. After that period, last_login doesn't really signal an inactive user until maybe 2 additional weeks have passed (44 days total) and they haven't logged in again.
So 45 days is probably the minimum, and 90 days does seem like a lot.
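The window arithmetic in the comment above (a 30-day session lifetime plus roughly two weeks of buffer) can be checked directly. The constant names here are illustrative, not from the codebase, and the 30-day session lifetime is the value assumed in the review, not Django's default:

```python
import datetime

# Assumed values from the discussion: sessions live ~30 days, plus a
# ~2-week buffer before last_login really signals inactivity.
SESSION_LIFETIME_DAYS = 30
BUFFER_DAYS = 14

inactive_cutoff = datetime.timedelta(days=SESSION_LIFETIME_DAYS + BUFFER_DAYS)
print(inactive_cutoff.days)  # → 44; the PR rounds this up to 45
```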
With 90 days the task takes 1h 40min to complete (2.4K users), and with 45 days it takes 44min (1.3K users).
Changed to 45 days.
In another comment you said it took ~3m. I'm confused. What's the correct value here? 42 minutes is a lot! 😔
This is the time it takes to sync all active users. If you are referring to my message from Slack, that was about the time of syncing the repos of a single user (from 12 minutes to 12 seconds).
This seems like a good step forward. Do we think the GH app is what caused this task to massively increase in size?
In general, we should probably try and find a way to lock these tasks so they don't pile up. We have this logic in other places, but we should probably apply it to all daily tasks:
```python
def memcache_lock(lock_id, lock_expire, app_identifier):
```
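The pattern behind a helper like `memcache_lock` looks roughly like the sketch below. This is a hedged reconstruction, not the actual Read the Docs implementation: the real helper presumably uses Django's cache backend, so a simple in-memory dict with `add`-style semantics stands in for it here.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for a memcached client; the real implementation
# would use Django's cache backend (an assumption for this sketch).
_store = {}


def _cache_add(key, value, expire):
    """Add a key only if it is absent or expired; return True on success.

    memcached's add() is atomic across workers; this dict version only
    illustrates the semantics within a single process.
    """
    now = time.monotonic()
    entry = _store.get(key)
    if entry is None or entry[1] <= now:
        _store[key] = (value, now + expire)
        return True
    return False


def _cache_delete(key):
    _store.pop(key, None)


@contextmanager
def memcache_lock(lock_id, lock_expire, app_identifier):
    """Yield True if this caller won the lock, False otherwise.

    The lock always expires after lock_expire seconds, so a crashed
    worker cannot hold it forever.
    """
    key = f"lock-{lock_id}"
    acquired = _cache_add(key, app_identifier, lock_expire)
    try:
        yield acquired
    finally:
        # Only the winner releases the lock; losers must not delete it.
        if acquired:
            _cache_delete(key)
```

A daily task would wrap its body in `with memcache_lock(...) as acquired:` and, when `acquired` is False, skip the run and log a warning to Sentry instead, as suggested later in this thread.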
I think it was more about the users from organizations with lots of repositories that recently logged in.
I think we should just make sure the task isn't retried if it times out or gets killed. Since this task runs daily, it would have to run for more than 24 hours before a second instance could start, so the lock wouldn't be useful.
I played around with this locally and wasn't able to replicate a retry after the task times out. My other guess would be the task killing the server (OOM), but that I can't test locally...
Yea, but the symptom of all these issues is that the tasks try to run 2x, so if we can easily lock it and log a warning in Sentry, it will help us catch situations like this with an explicit guarantee.
This task now completes in 42 minutes.
Closes #12406