title: Task Queues
category: page
slug: task-queues
sort-order: 0703
choice1url: /logging.html
choice1icon: fa-align-left fa-inverse
choice1text: How do I monitor my app and its task queues with logging?
choice2url: /web-analytics.html
choice2icon: fa-dashboard
choice2text: How can I learn more about the users of my application?
choice3url: /monitoring.html
choice3icon: fa-bar-chart-o fa-inverse
choice3text: What tools exist for monitoring a live web application?
choice4url:
choice4icon:
choice4text:


# Task queues
Task queues manage background work that must be executed outside the usual
HTTP request-response cycle.


## Why are task queues necessary?
Tasks are handled asynchronously either because they are not initiated by
an HTTP request or because they are long-running jobs that would dramatically
slow down an HTTP response.
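
The sketch below shows that split in a single process, using only Python's standard library. A request handler puts a job on a `queue.Queue` and returns immediately, and a background worker thread does the slow work. The names `handle_request` and `slow_report` are hypothetical. A real deployment would run workers as separate processes fed by a message broker, as Celery and RQ do, rather than as a thread inside the web process.

```python
import queue
import threading
import time

# Pending jobs; a real task queue keeps these in a broker such as Redis or RabbitMQ.
jobs = queue.Queue()
# Finished results keyed by job id; a real system would use a result backend.
results = {}


def slow_report(job_id, n):
    """Hypothetical long-running job: sum the squares of 0..n-1."""
    time.sleep(0.1)  # stand-in for slow I/O or heavy computation
    results[job_id] = sum(i * i for i in range(n))


def worker():
    """Take jobs off the queue and run them outside the request-response cycle."""
    while True:
        job_id, n = jobs.get()
        try:
            slow_report(job_id, n)
        except Exception as exc:  # keep the worker alive after a failed job
            print(f"job {job_id} failed: {exc!r}")
        finally:
            jobs.task_done()  # always mark the job as handled


def handle_request(job_id, n):
    """Hypothetical HTTP handler: enqueue the work and respond immediately."""
    jobs.put((job_id, n))
    return f"202 Accepted: job {job_id} queued"


# One background worker; a daemon thread exits when the main program does.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    print(handle_request(1, 10))  # prints right away, before the job finishes
    jobs.join()  # block until the queue is drained (only for this demo)
    print(results[1])  # 285
```

The handler's response time no longer depends on how long `slow_report` takes. That decoupling is the core service a task queue provides.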

For example, a web application could poll the GitHub API every 10 minutes to
collect the names of the top 100 starred repositories. A task queue would
handle invoking code to call the GitHub API, process the results and store them
in a persistent database for later use.
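
The job in this example comes down to three steps: fetch, process, and store. The sketch below assumes the fetch step returns a list of repository dicts with hypothetical `name` and `stars` keys. The real GitHub API response is shaped differently, so `fake_fetch` stands in for it to keep the example offline. Results go into an in-memory SQLite database. A task queue's scheduler would call the hypothetical `refresh_top_repos` every 10 minutes.

```python
import sqlite3
from typing import Callable, Iterable


def refresh_top_repos(
    fetch: Callable[[], Iterable[dict]],
    conn: sqlite3.Connection,
    limit: int = 100,
) -> int:
    """Hypothetical periodic job: fetch repos, keep the most-starred, store them."""
    repos = sorted(fetch(), key=lambda r: r["stars"], reverse=True)[:limit]
    with conn:  # commit the whole replacement as one transaction
        conn.execute("CREATE TABLE IF NOT EXISTS top_repos (rank INTEGER, name TEXT)")
        conn.execute("DELETE FROM top_repos")
        conn.executemany(
            "INSERT INTO top_repos (rank, name) VALUES (?, ?)",
            [(rank, r["name"]) for rank, r in enumerate(repos, start=1)],
        )
    return len(repos)


def fake_fetch() -> list:
    """Stand-in for the GitHub API call so the sketch runs offline."""
    return [
        {"name": "alpha", "stars": 50},
        {"name": "beta", "stars": 900},
        {"name": "gamma", "stars": 300},
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    refresh_top_repos(fake_fetch, conn, limit=2)
    print(conn.execute("SELECT rank, name FROM top_repos ORDER BY rank").fetchall())
    # [(1, 'beta'), (2, 'gamma')]
```

Passing the fetch function in as a parameter keeps the processing and storage logic testable without network access.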

Another example is when a database query would take too long during the HTTP
request-response cycle. The query could be performed in the background on a
fixed interval with the results stored in the database. When an HTTP request
comes in that needs those results, a query would simply fetch the
precalculated result instead of re-executing the longer query.
This precalculation scenario is a form of [caching](/caching.html) enabled
by task queues.

Other types of jobs for task queues include:

* spreading out large numbers of independent database inserts over time
  instead of inserting everything at once

* aggregating collected data values on a fixed interval, such as every
  15 minutes

* scheduling periodic jobs such as batch processes
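
For the aggregation case, the periodic job usually groups raw readings into fixed time windows. The minimal sketch below assumes each reading is a hypothetical `(datetime, value)` pair. It assigns each value to the 15-minute window it falls into and averages each window. A periodic task would run something like this over the readings collected since its last run.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 15  # must divide 60 evenly for this rounding to work


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute window."""
    minute = ts.minute - ts.minute % WINDOW_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def aggregate(readings):
    """Average the values in each window, returned in window order."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[window_start(ts)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

For example, readings at 12:03 and 12:14 both fall in the 12:00 window and are averaged together, while a reading at exactly 12:15 starts the next window.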


## Task queue projects
The de facto standard Python task queue is Celery. Most other task queue
projects arise from the perspective that Celery is overly complicated for
simple use cases. My recommendation is to invest the effort in Celery's
reasonable learning curve, because the time it takes to understand how to use
the project is worth it.

* The [Celery](http://www.celeryproject.org/) distributed task queue is the
  most commonly used Python library for handling asynchronous tasks and
  scheduling.

* [RQ (Redis Queue)](http://python-rq.org/) is a simple Python
  library for queueing jobs and processing them in the background with workers.
  RQ is backed by Redis and is designed to have a low barrier to entry.
  The [intro post](http://nvie.com/posts/introducing-rq/) contains information
  on design decisions and how to use RQ.

* [Taskmaster](https://github.com/dcramer/taskmaster) is a lightweight, simple
  distributed queue for handling large volumes of one-off tasks.


## Hosted message and task queue services
Third-party task queue services aim to solve the complexity issues that arise
when scaling out a large deployment of distributed task queues.

* [Iron.io](http://www.iron.io/) is a distributed messaging service platform
  that works with many types of task queues, such as Celery. It is also built
  to work with IaaS and PaaS environments such as Amazon Web Services
  and Heroku.

* [Amazon Simple Queue Service (SQS)](http://aws.amazon.com/sqs/) is a
  set of five APIs for creating, sending, receiving, modifying and deleting
  messages.

* [CloudAMQP](http://www.cloudamqp.com/) is, at its core, a service of managed
  servers with RabbitMQ installed and configured. This service is an option if
  you are using RabbitMQ and do not want to maintain RabbitMQ installations on
  your own servers.
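
Receiving and deleting are separate operations in a hosted queue like SQS. A received message is not removed. Instead, it is hidden for a visibility timeout and disappears only when the consumer deletes it after finishing the work. If a worker crashes, its message therefore becomes visible again and gets retried. The toy in-memory model below sketches that receive/delete contract. It is not the SQS API (use the AWS SDK for that). The `ToyQueue` class and its method names are hypothetical, and it deletes by message id where SQS uses a receipt handle.

```python
import itertools
import time


class ToyQueue:
    """In-memory queue with receive/delete semantics similar to a hosted queue.

    A received message is hidden for `visibility_timeout` seconds. If it is not
    deleted within that time it becomes receivable again.
    """

    def __init__(self, visibility_timeout: float = 30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock  # injectable so tests can control time
        self._ids = itertools.count(1)
        self._messages = {}  # message id -> (body, invisible_until)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, float("-inf"))  # visible immediately
        return msg_id

    def receive(self):
        """Return (message id, body) for one visible message, or None."""
        now = self.clock()
        for msg_id, (body, invisible_until) in self._messages.items():
            if invisible_until <= now:
                # Hide the message instead of removing it.
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge a processed message so it is never delivered again."""
        self._messages.pop(msg_id, None)
```

Because a message is deleted only after the work completes, the same task can be delivered more than once. This is why task code for such queues is usually written to be idempotent.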


## Task queue resources
* [Getting Started Scheduling Tasks with Celery](http://www.caktusgroup.com/blog/2014/06/23/scheduling-tasks-celery/)
  is a detailed walkthrough for setting up Celery with Django (although
  Celery also works fine with other frameworks).

* [Distributing work without Celery](http://justcramer.com/2012/05/04/distributing-work-without-celery/)
  provides a scenario in which Celery and RabbitMQ are not the right tool
  for scheduling asynchronous jobs.

* [Evaluating persistent, replicated message queues](http://www.warski.org/blog/2014/07/evaluating-persistent-replicated-message-queues/)
  is a detailed comparison of Amazon SQS, MongoDB, RabbitMQ, HornetQ and
  Kafka's designs and performance.

* [Queues.io](http://queues.io/) is a collection of task queue systems with
  short summaries for each one. Not all of the task queues are compatible
  with Python, but the ones that are have been tagged with the "Python" keyword.

* [Why Task Queues](http://www.slideshare.net/bryanhelmig/task-queues-comorichweb-12962619)
  is a presentation on what task queues are and why they are needed.

* [How to use Celery with RabbitMQ](https://www.digitalocean.com/community/articles/how-to-use-celery-with-rabbitmq-to-queue-tasks-on-an-ubuntu-vps)
  is a detailed walkthrough for using these tools on an Ubuntu VPS.

* Heroku has a clear walkthrough for using
  [RQ for background tasks](https://devcenter.heroku.com/articles/python-rq).

* [Introducing Celery for Python+Django](http://www.linuxforu.com/2013/12/introducing-celery-pythondjango/)
  provides an introduction to the Celery task queue.

* [Celery - Best Practices](https://denibertovic.com/posts/celery-best-practices/)
  explains things you should not do with Celery and shows some underused
  features for making task queues easier to work with.

* The "Django in Production" series by
  [Rob Golding](https://twitter.com/robgolding63) contains a post
  specifically on [Background Tasks](http://www.robgolding.com/blog/2011/11/27/django-in-production-part-2---background-tasks/).

* [Asynchronous Processing in Web Applications Part One](http://blog.thecodepath.com/2012/11/15/asynchronous-processing-in-web-applications-part-1-a-database-is-not-a-queue/)
  and [Part Two](http://blog.thecodepath.com/2013/01/06/asynchronous-processing-in-web-applications-part-2-developers-need-to-understand-message-queues/)
  are great reads for understanding the difference between a task queue and
  a database, and why you shouldn't use your database as a queue.

* [A 4 Minute Intro to Celery](https://www.youtube.com/watch?v=68QWZU_gCDA) is
  a short introductory task queue screencast.


## Task queue learning checklist
<i class="fa fa-check-square-o"></i>
Pick a slow function in your project that is called during an HTTP request.

<i class="fa fa-check-square-o"></i>
Determine if you can precompute the results on a fixed interval instead of
during the HTTP request. If so, create a separate function you can call
from elsewhere, then store the precomputed value in the database.

<i class="fa fa-check-square-o"></i>
Read the Celery documentation and the links in the resources section above
to understand how the project works.

<i class="fa fa-check-square-o"></i>
Install a message broker such as RabbitMQ or Redis and then add Celery to your
project. Configure Celery to work with the installed message broker.

<i class="fa fa-check-square-o"></i>
Use Celery to invoke the precomputation function from step two on a regular
basis.

<i class="fa fa-check-square-o"></i>
Have the HTTP request function use the precomputed value instead of the
slow-running code it originally relied upon.
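
In Celery, the "regular basis" part of this checklist is handled by its periodic scheduler, celery beat. To show what that scheduling amounts to without installing a broker, here is a standard-library sketch. It calls a task, such as a hypothetical `precompute` function like the one the checklist has you write, every `interval` seconds on a background thread until stopped. The `run_periodically` helper is hypothetical and only illustrative. It runs inside a single process and loses its schedule on restart, so it is not a substitute for Celery.

```python
import threading
import time
from typing import Callable


def run_periodically(task: Callable[[], None], interval: float) -> threading.Event:
    """Call `task` every `interval` seconds on a daemon thread.

    Returns an Event; set it to stop the loop. Exceptions raised by `task`
    are caught so one failed run does not end the schedule.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval):
            try:
                task()
            except Exception as exc:  # keep the schedule alive after failures
                print(f"periodic task failed: {exc!r}")

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    cache = {}

    def precompute() -> None:
        """Hypothetical precomputation step: store a value for requests to read."""
        cache["total"] = sum(range(1_000_000))

    stop = run_periodically(precompute, interval=0.05)
    time.sleep(0.2)
    stop.set()
    print(cache.get("total"))
```

Using `Event.wait` as the sleep lets the loop stop promptly when signalled, instead of finishing a full `time.sleep` first.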


### What's next after task queues?