Skip to content

Conversation

@stsewd
Copy link
Member

@stsewd stsewd commented Aug 11, 2025

Sorting by the last build project is expensive, as the query is done for each project (even if paginated, since the order applies to all projects). For comparison, here are some numbers with data from production

# Ordering by name
145 ms ± 446 μs per loop (mean ± std. dev. of 5 runs, 5 loops each)

# Ordering by max(builds_date)
4.37 s ± 254 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

The migration is going to take some time to run, I imagine... but should be worth it for a snappier dashboard.

@stsewd stsewd requested a review from a team as a code owner August 11, 2025 22:50
@stsewd stsewd requested a review from agjohnson August 11, 2025 22:50
@stsewd stsewd requested a review from ericholscher August 11, 2025 22:51
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great approach, it should help with query time a lot on the listing view.

It seems like this could be a one way relationship, but I don't think it's necessarily an issue that it's two way either.

Having a ballpark estimate on the data migration in prod would be helpful. If it's far too long, we might weigh other options for updating the project field.

latest_build = models.OneToOneField(
"builds.Build",
verbose_name=_("Latest build"),
related_name="latest_build_for_project",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the related relationship at all here? Isn't it always going to be build.project === build.latest_build_for_project?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but don't think you can omit the related field, django will create one if not given. But I didn't test what would happen if the default was used (it would have collided with Build.project is my guess).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it always going to be build.project === build.latest_build_for_project?

No. There is going to be only one build where this is true. Most of the builds will have build.latest_build_for_project === None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't like too much the related name latest_build_for_project because is not semantically correct since it will return a Project object and not a Build object.

I prefer if we disable the related relationship. Django has an option for this using +:

If you’d prefer Django not to create a backwards relation, set related_name to '+' or end it with '+'

(from https://docs.djangoproject.com/en/5.2/ref/models/fields/#django.db.models.ForeignKey.related_name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why the name ends with _for_project, but I'm fine with disabling the related name

latest_build = project.builds.order_by("-date").first()
if latest_build:
project.latest_build = latest_build
project.save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm also guessing this might take a while to migrate. What does this query time look like without updating anything in the database?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be a good metric to see how long this would take, yeah, will do that first thing tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to do the model change first, so we can run the migration without blocking the deploy, and then do the code changes using the model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Eric's idea makes sense. It will require a few extra PRs, and multiple deploys tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like doing the actual filtering after the migration is done? I'm fine doing a Hotfix after the migration is done of it takes too long to run

Copy link
Member Author

@stsewd stsewd Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took like 10 min in .org without saving, maybe with saving will take 15 min? But fine to just hotfix.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great and simple implementation. I was wondering about just doing a latest_build_date, but having a 1to1 to the Build object is likely better so we can actually pull the metadata off it. I really like this pattern, and imagine we could be using it a few other places to really speed up slow queries!

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with some adjustments.

latest_build = project.builds.order_by("-date").first()
if latest_build:
project.latest_build = latest_build
project.save()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Eric's idea makes sense. It will require a few extra PRs, and multiple deploys tho.

latest_build = models.OneToOneField(
"builds.Build",
verbose_name=_("Latest build"),
related_name="latest_build_for_project",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it always going to be build.project === build.latest_build_for_project?

No. There is going to be only one build where this is true. Most of the builds will have build.latest_build_for_project === None

latest_build = models.OneToOneField(
"builds.Build",
verbose_name=_("Latest build"),
related_name="latest_build_for_project",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't like too much the related name latest_build_for_project because is not semantically correct since it will return a Project object and not a Build object.

I prefer if we disable the related relationship. Django has an option for this using +:

If you’d prefer Django not to create a backwards relation, set related_name to '+' or end it with '+'

(from https://docs.djangoproject.com/en/5.2/ref/models/fields/#django.db.models.ForeignKey.related_name)

@stsewd stsewd merged commit 6697275 into main Aug 12, 2025
2 of 4 checks passed
@stsewd stsewd deleted the denormalize-latest-build branch August 12, 2025 14:19
stsewd added a commit that referenced this pull request Aug 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants