Monitoring Celery Tasks with Sentry

Sentry is a great tool for monitoring celery tasks and alerting when they fail or don’t run on time, but it takes a bit of work to set up properly. Below is some sample code for setting up sentry monitoring of periodic tasks, followed by an explanation.

import math

import sentry_sdk
from celery import signals
from sentry_sdk import monitor
from sentry_sdk.integrations.celery import CeleryIntegration


@signals.beat_init.connect  # if you use celery beat
@signals.celeryd_init.connect
def init_sentry(**kwargs):
    sentry_sdk.init(
        dsn=...,
        integrations=[
            # automatic beat monitoring is off; tasks are wrapped manually below
            CeleryIntegration(monitor_beat_tasks=False)
        ]
    )


@signals.worker_shutdown.connect
@signals.task_postrun.connect
def flush_sentry(**kwargs):
    # send any buffered events, waiting at most 5 seconds
    sentry_sdk.flush(timeout=5)


def add_periodic_task(celery, schedule, task):
    # schedule is in seconds; sentry monitor settings are in minutes
    max_runtime = math.ceil(schedule * 4 / 60)
    monitor_config = {
        "recovery_threshold": 1,
        "failure_issue_threshold": 10,
        "checkin_margin": max_runtime,
        "max_runtime": max_runtime,
        "schedule": {
            "type": "interval",
            "value": math.ceil(schedule / 60.0),
            "unit": "minute"
        }
    }
    name = task.__name__
    # wrap the function so every run reports a check-in under this slug
    task = monitor(monitor_slug=name, monitor_config=monitor_config)(task)
    celery.add_periodic_task(schedule, celery.task(task).s(), name=name)

Initialize Sentry

The init_sentry function must be called before any tasks start executing. The sentry docs for celery recommend using the celeryd_init signal. If you use celery beat for periodic task execution, you also need to initialize on the beat_init signal.

Monitoring Beat Tasks

In this example, I’m setting monitor_beat_tasks=False to show how you can do manual monitoring. monitor_beat_tasks=True is much simpler, and doesn’t require any code like add_periodic_task. But in my experience, it’s not reliable with async celery functions. The automatic beat monitoring relies on celery signals that likely don’t fire correctly under async conditions. Manual monitoring isn’t that hard with a function wrapper, as shown above.
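To make the function wrapper idea concrete, here is a minimal sketch of the check-in pattern on its own, without sentry: report "in_progress" before the function runs, then "ok" or "error" afterward, for both normal and async functions. The names with_checkins, report and record are hypothetical stand-ins I made up for illustration; in the sample code above, sentry’s monitor decorator plays this role, and this sketch is not its actual implementation.

```python
import asyncio
import functools
import inspect


def with_checkins(slug, report):
    """Wrap a sync or async function so it calls report(slug, status) around each run."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                report(slug, "in_progress")
                try:
                    # Await inside the try block so failures in the coroutine are seen.
                    result = await func(*args, **kwargs)
                except Exception:
                    report(slug, "error")
                    raise
                report(slug, "ok")
                return result

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            report(slug, "in_progress")
            try:
                result = func(*args, **kwargs)
            except Exception:
                report(slug, "error")
                raise
            report(slug, "ok")
            return result

        return sync_wrapper

    return decorator


# Hypothetical reporter: collect check-ins in a list instead of sending them anywhere.
checkins = []


def record(slug, status):
    checkins.append((slug, status))


@with_checkins("cleanup", record)
async def cleanup():
    await asyncio.sleep(0)
    return "done"


@with_checkins("failing", record)
def failing():
    raise ValueError("boom")


print(asyncio.run(cleanup()))  # done
print(checkins)  # [('cleanup', 'in_progress'), ('cleanup', 'ok')]
```

The important detail for async functions is that the wrapper awaits the coroutine itself. A wrapper that only calls the function would report success as soon as the coroutine object is created, before any of the work has run.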

Adding a Periodic Task

The add_periodic_task function takes a Celery instance, a periodic interval in seconds, and a function to execute. This function can be normal or async. It then does the following:

  1. Calculates a max_runtime in minutes (four times the schedule interval, rounded up), so that sentry knows when a task has gone over time. This is also used for checkin_margin, giving the task plenty of buffer time before an issue is created. You should adjust these according to your needs.
  2. Creates a monitor_config for sentry, specifying the following:
    • schedule in minutes (rounded up, because sentry doesn’t handle schedules in seconds)
    • the number of failures allowed before creating an issue (I put 10, but you should adjust as needed)
    • how many successful checkins are required before the issue is marked as resolved (1 is the default, but adjust as needed)
  3. Wraps the function in the sentry monitor decorator, using the function’s name as the monitor_slug. With default beats monitoring, the slug is set to the full package.module.function path, which can be quite long and becomes hard to scan when you have many tasks.
  4. Schedules the task in celery.
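As a worked example of the arithmetic in steps 1 and 2, the sketch below pulls the config calculation into a standalone helper so you can check what sentry will be told for a given interval. The name build_monitor_config is my own for this illustration and is not part of sentry or celery; the values mirror the sample code above.

```python
import math


def build_monitor_config(schedule_seconds):
    """Build the same monitor_config dict as add_periodic_task, for inspection."""
    # Allow a run to take up to 4x the interval, expressed in whole minutes.
    max_runtime = math.ceil(schedule_seconds * 4 / 60)
    return {
        "recovery_threshold": 1,
        "failure_issue_threshold": 10,
        "checkin_margin": max_runtime,
        "max_runtime": max_runtime,
        "schedule": {
            "type": "interval",
            # Round up to whole minutes: a 90 second interval becomes 2 minutes.
            "value": math.ceil(schedule_seconds / 60.0),
            "unit": "minute",
        },
    }


config = build_monitor_config(90)
print(config["schedule"]["value"])  # 2
print(config["max_runtime"])        # 6
```

One consequence of rounding up is that any interval under 60 seconds is reported to sentry as a 1 minute schedule, so the monitor will look less frequent than the task really is.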

Sentry Flush

While this may not be strictly necessary, calling sentry_sdk.flush on the worker_shutdown and task_postrun signals ensures that events are sent to sentry when a celery task completes.

Monitoring your crons

Once this is all set up and running, you should be able to go to Insights > Crons in your sentry web UI, and see all your celery tasks. Double-check your monitor settings to make sure they’re correct, then sit back and relax, while sentry keeps track of how your tasks are running.

Async Python Functions with Celery

Celery is a great tool for scheduled function execution in python. You can also use it for running functions in the background asynchronously from your main process. However, it does not support python asyncio. This is a big limitation, because async functions are usually much more I/O efficient, and there are many libraries that provide great async support. Parallel data processing with asyncio.gather also becomes impossible in celery without async support.
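For context, this is the kind of concurrent I/O that asyncio.gather makes easy in plain python. The sketch below uses asyncio.sleep as a stand-in for a network call, and fetch and fetch_all are hypothetical names for illustration only.

```python
import asyncio


async def fetch(item, delay=0.1):
    # Stand-in for an I/O bound call such as an HTTP request or database query.
    await asyncio.sleep(delay)
    return item * 2


async def fetch_all(items):
    # Start all fetches concurrently; results come back in input order.
    return await asyncio.gather(*(fetch(item) for item in items))


results = asyncio.run(fetch_all([1, 2, 3]))
print(results)  # [2, 4, 6]
```

The three waits overlap rather than running one after another, which is where the I/O efficiency comes from.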

Celery Async Issues

Unfortunately, based on the current Open status of these issues, celery will not support async functions anytime soon.

But luckily there are two projects that provide async celery support.

AIO Celery

This project is an alternative independent asyncio implementation of Celery

aio-celery “does not depend on the celery codebase”. Instead, it provides a new implementation of the Celery Message Protocol that enables asyncio tasks and workers.

It is written completely from scratch as a thin wrapper around aio-pika (which is an asynchronous RabbitMQ python driver) and it has no other dependencies

It is actively developed, and seems like a great celery alternative. But there are some downsides:

  1. “Only RabbitMQ as a message broker” means you cannot use any other broker such as Redis
  2. “Only Redis as a result backend” means you can’t store results in any other database
  3. “Complete feature parity with upstream Celery project is not the goal”, so there may be features from celery you want that are not present in aio-celery

Celery AIO Pool

celery-aio-pool provides a custom worker pool implementation that works with celery 5.3+. Unlike aio-celery, you can keep using your existing celery implementation. All you have to do to get async task support in celery is:

  1. Start your celery worker with this environment variable: CELERY_CUSTOM_WORKER_POOL='celery_aio_pool.pool:AsyncIOPool'
  2. Run the celery worker process with --pool=custom

So your worker command will look like

CELERY_CUSTOM_WORKER_POOL='celery_aio_pool.pool:AsyncIOPool' celery worker --pool=custom

plus whatever other arguments or environment variables you need. Once you have this in place, you can start using async functions as celery tasks.

While celery-aio-pool is not as actively developed, it works, and has the following benefits:

  • Simple to install and configure with Celery >= 5.3
  • Works with any celery-supported message broker or result backend
  • Works with your existing celery setup without requiring any other changes