-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Build: auto-disable ACKS_LATE for long builds
#12393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#visibility-timeout | ||
| BROKER_TRANSPORT_OPTIONS = { | ||
| 'visibility_timeout': 18000, # 5 hours | ||
| 'visibility_timeout': BUILD_TIME_LIMIT * 1.15, # 15% more than the build time limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know this value does nothing either way, but for builds that are using acks_late, this would worsen the effect -- builds running later than 15/30 minutes + 15% would restart on another builder. I don't think we want this value low while we are using acks late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+15/+30 minutes build should always use acks_late=False via Feature.BUILD_NO_ACKS_LATE to disable the visibility timeout issue we had, actually. In other words, projects with custom container_time_limit should always use Feature.BUILD_NO_ACKS_LATE; which is what this PR does to avoid these issues.
Builds running the default time limit + 15% should be fine with this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that sounds fine, but right now these projects are using acks late, and the condition for not using acks late in this PR only matches for projects with a time limit > 1 hour. I think you want to make any custom value trigger no acks late instead, but this is a bug currently in production though, because acks late is not being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I realized that and updated the code to check for custom time limit 👍
|
Not sure what happened with the diff here. The only new commit is I will try to rebase this branch. |
If the project has a custom `Project.container_time_limit` we make the task to be `acks_late=False`.
a178baa to
1e5f3f0
Compare
|
The feedback you gave me, may be still relevant for #12369, tho |
|
With that PR already merged, we should address these issues before we go too much further. I think we probably just introduced the visibility timeout retry issue for any build longer than the default 15/30 minutes with that PR. |
|
I replied all the feedback with explanations. Let me know if I'm explaining myself here. |
df7f97f to
bdb34f6
Compare
Documentation build overviewFiles changed
Show files (2) | 2 modified | 0 added | 0 deleted
|
We are noticing all builds restarting at 17.25m (15m * 115%), reverting this back to 5hours for now. In theory, tasks are being executed with ack_late=False so they should immediately acknowledge the task and avoid the visibility timeout entirely. But in practice this doesn't seem to happen and the build task acts like ack_late=True with a 17.25min visibility timeout, duplicating the build task on another builder. - Related #12369 - We discussed the potential for low timeout bugs at #12393 - This bug is identical to #12317 except the timeout is 17m now instead of 1h
When a customer/user requests longer builds, we set
container_time_limitat project level. However, if the time is longer than 1h we also need to addBUILD_NO_ACKS_LATEto that project. This PR checks for that limit and if it's longer than 1h addsacks_late=Falseto the build task.Related to #12369