-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Build: Denormalize latest build for projects #12392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great approach, it should help with query time a lot on the listing view.
It seems like this could be a one way relationship, but I don't think it's necessarily an issue that it's two way either.
Having a ballpark estimate on the data migration in prod would be helpful. If it's far too long, we might weigh other options for updating the project field.
readthedocs/projects/models.py
Outdated
| latest_build = models.OneToOneField( | ||
| "builds.Build", | ||
| verbose_name=_("Latest build"), | ||
| related_name="latest_build_for_project", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the related relationship at all here? Isn't it always going to be build.project === build.latest_build_for_project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but don't think you can omit the related field, django will create one if not given. But I didn't test what would happen if the default was used (it would have collided with Build.project is my guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it always going to be build.project === build.latest_build_for_project?
No. There is going to be only one build where this is true. Most of the builds will have build.latest_build_for_project === None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't like too much the related name latest_build_for_project because is not semantically correct since it will return a Project object and not a Build object.
I prefer if we disable the related relationship. Django has an option for this using +:
If you’d prefer Django not to create a backwards relation, set related_name to '+' or end it with '+'
(from https://docs.djangoproject.com/en/5.2/ref/models/fields/#django.db.models.ForeignKey.related_name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why the name ends with _for_project, but I'm fine with disabling the related name
| latest_build = project.builds.order_by("-date").first() | ||
| if latest_build: | ||
| project.latest_build = latest_build | ||
| project.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm also guessing this might take a while to migrate. What does this query time look like without updating anything in the database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be a good metric to see how long this would take, yeah, will do that first thing tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to do the model change first, so we can run the migration without blocking the deploy, and then do the code changes using the model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Eric's idea makes sense. It will require a few extra PRs, and multiple deploys tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like doing the actual filtering after the migration is done? I'm fine doing a Hotfix after the migration is done of it takes too long to run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took like 10 min in .org without saving, maybe with saving will take 15 min? But fine to just hotfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great and simple implementation. I was wondering about just doing a latest_build_date, but having a 1to1 to the Build object is likely better so we can actually pull the metadata off it. I really like this pattern, and imagine we could be using it a few other places to really speed up slow queries!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me with some adjustments.
| latest_build = project.builds.order_by("-date").first() | ||
| if latest_build: | ||
| project.latest_build = latest_build | ||
| project.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Eric's idea makes sense. It will require a few extra PRs, and multiple deploys tho.
readthedocs/projects/models.py
Outdated
| latest_build = models.OneToOneField( | ||
| "builds.Build", | ||
| verbose_name=_("Latest build"), | ||
| related_name="latest_build_for_project", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it always going to be build.project === build.latest_build_for_project?
No. There is going to be only one build where this is true. Most of the builds will have build.latest_build_for_project === None
readthedocs/projects/models.py
Outdated
| latest_build = models.OneToOneField( | ||
| "builds.Build", | ||
| verbose_name=_("Latest build"), | ||
| related_name="latest_build_for_project", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't like too much the related name latest_build_for_project because is not semantically correct since it will return a Project object and not a Build object.
I prefer if we disable the related relationship. Django has an option for this using +:
If you’d prefer Django not to create a backwards relation, set related_name to '+' or end it with '+'
(from https://docs.djangoproject.com/en/5.2/ref/models/fields/#django.db.models.ForeignKey.related_name)
Sorting by the last build project is expensive, as the query is done for each project (even if paginated, since the order applies to all projects). For comparison, here are some numbers with data from production
The migration is going to take some time to run, I imagine... but should be worth it for a snappier dashboard.