Celery gotchas


I am a developer/code-reviewer/debugger/bug-fixer/architect/teacher/builder from Dubai, UAE

Async Reliability with Celery

Task Loss Prevention Between Web Process and Broker

  • Enable broker confirmation: With RabbitMQ, set broker_transport_options = {"confirm_publish": True} so that delay() returns only after the broker has confirmed it received the task

  • Pass data references, not data values: Use S3 URLs or database IDs as task arguments instead of large Python objects, which produce oversized messages that can crash workers

  • Implement database-sourced task recovery: Use Celery Beat with periodic tasks that check the database and re-queue missed tasks (e.g., verification emails) for automatic recovery
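
A minimal sketch of the first two points, assuming a local RabbitMQ broker. The settings are written as plain module-level names in the style of a celeryconfig.py file; the broker URL, the build_email_task_kwargs helper, and the bucket name are hypothetical.

```python
import json

# celeryconfig.py-style settings (assumed: RabbitMQ running locally).
broker_url = "amqp://guest:guest@localhost:5672//"
# Ask the broker to confirm each published message before delay() returns.
broker_transport_options = {"confirm_publish": True}


def build_email_task_kwargs(user_id, attachment_url):
    """Hypothetical helper: build task arguments out of references only."""
    kwargs = {"user_id": user_id, "attachment_url": attachment_url}
    # Fail early if something that is not JSON-serializable slipped in.
    json.dumps(kwargs)
    return kwargs


# The task receives an ID and a URL, then loads the real data itself.
task_kwargs = build_email_task_kwargs(42, "s3://example-bucket/report.pdf")
```

The task then fetches the user row and the file on the worker, so the message on the broker stays a few dozen bytes regardless of how large the underlying data is.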

Task Loss Prevention Between Broker and Worker

  • Set task_acks_late = True: The worker acknowledges a task only after it finishes rather than when it starts, so the broker can redeliver the task if the worker crashes mid-run

  • Use transaction.on_commit(): Only queue tasks after database transactions commit to avoid race conditions where tasks execute before data is saved

  • Make tasks idempotent: Use ORM methods like get_or_create() and update_or_create() so tasks can be safely retried multiple times

  • Wrap task bodies in transaction.atomic(): A task that is interrupted partway through then rolls back instead of leaving partial database changes behind
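
To illustrate idempotency and atomic writes without a Django project, the sketch below swaps in the standard-library sqlite3 module for the ORM; it is not Django's get_or_create(). The table and function names are hypothetical. Running the handler twice, as a redelivered task would, leaves a single row.

```python
import sqlite3


def mark_email_sent(conn, user_id):
    """Hypothetical task body: safe to run more than once for the same user."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO sent_emails (user_id) VALUES (?)",
            (user_id,),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sent_emails (user_id INTEGER PRIMARY KEY)")

mark_email_sent(conn, 42)
mark_email_sent(conn, 42)  # simulated redelivery of the same task

row_count = conn.execute("SELECT COUNT(*) FROM sent_emails").fetchone()[0]
```

In a Django task the same shape would be get_or_create() or update_or_create() called inside a transaction.atomic() block.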

Worker Reliability Configuration

  • Set task_reject_on_worker_lost = True: Together with task_acks_late, this requeues a task when the worker process running it dies abruptly, for example from an out-of-memory kill or SIGKILL

  • Handle all exceptions properly: Treat task exceptions with the same care as 500 errors in web views, using Celery's retry functionality for intermittent failures

  • Use RabbitMQ over Redis/SQS: RabbitMQ redelivers unacknowledged tasks as soon as the worker's connection drops, whereas Redis and SQS rely on a visibility timeout expiring first
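
The redelivery settings above can be sketched as celeryconfig.py-style assignments. For retries, Celery's own task options (for example autoretry_for and retry_backoff) handle the scheduling; the backoff_delays function below is a hypothetical illustration of a capped doubling schedule, not Celery's implementation, which can also add jitter.

```python
# celeryconfig.py-style settings for redelivery after a worker failure.
task_acks_late = True
task_reject_on_worker_lost = True


def backoff_delays(max_retries, base=1.0, cap=600.0):
    """Hypothetical helper: seconds to wait before each retry, doubling up to a cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays(5)
```

Doubling the wait between attempts gives an intermittently failing dependency time to recover instead of hammering it with immediate retries.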

Deployment Safety

  • Empty queues before changing task signatures: Ensure no old tasks remain when modifying function parameters

  • Avoid ETA/countdown tasks beyond a few seconds: Workers hold these in memory until they are due, which complicates deployments

  • Use graceful shutdown (SIGTERM): SIGTERM lets workers finish their current tasks before exiting, while SIGKILL during deploys can drop tasks unintentionally
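
The signature problem can be reproduced without a broker. In this sketch, send_welcome_email is a hypothetical task body that gained a required locale parameter, while a message queued by the old code still carries only user_id.

```python
def send_welcome_email(user_id, locale):
    """New version of a hypothetical task body: `locale` was added."""
    return f"welcome user {user_id} in {locale}"


# Arguments serialized by the old code, still waiting in the queue.
queued_args = (42,)

try:
    send_welcome_email(*queued_args)
    error = None
except TypeError as exc:
    # The old message no longer matches the new signature.
    error = str(exc)
```

Every old message left in the queue fails this way once the new code is deployed, which is why the queue should be empty before the signature changes.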

Alternative Approaches

  • Use dedicated workflow tools for complex orchestration: Consider Prefect, Temporal, or Airflow instead of Celery Canvas for complex workflows

  • Implement proper monitoring and alerting: Set up observability tools specifically for Celery task execution

  • Configure task time limits and expiration: Prevent clogged queues and outdated notifications
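
Time limits can be sketched as celeryconfig.py-style settings. The numbers are placeholders, not recommendations; expiration is typically set per call through the expires argument of apply_async().

```python
# celeryconfig.py-style settings; the values are placeholders.
# Soft limit: SoftTimeLimitExceeded is raised inside the task so it can clean up.
task_soft_time_limit = 60  # seconds
# Hard limit: the process running the task is killed and replaced.
task_time_limit = 90  # seconds
```

Keeping the soft limit below the hard limit gives the task a window to release resources before it is terminated.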

Celery Canvas Best Practices

Workflow Patterns Demonstrated

  • Single Sequential Task (all_in_one): Processes all work in one task iterating through items sequentially - simplest but not parallel

  • Parallel Tasks with Join: Queues multiple tasks at once and then waits for them from inside another task, which demonstrates why you should never call result.get() within a task: Celery raises a RuntimeError

  • Chord Pattern: Uses chord to run parallel tasks and execute a callback after all complete - recommended for fan-out/fan-in workflows

  • Starmap for Parameter Mapping: Uses starmap to efficiently map function calls over parameter tuples

  • Fine-grained Parallelism: Shows how to break work into smaller parallel tasks for maximum throughput
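
The fan-out/fan-in and starmap shapes can be sketched with the standard library alone. This is a stand-in using concurrent.futures and itertools, not Celery's API and not the repository's code; in Celery the corresponding forms are chord(header)(callback) and task.starmap(...). The register_voter and summarize functions and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import starmap


def register_voter(name, district):
    """Hypothetical unit of work; stands in for a Celery task body."""
    return f"{name}@{district}"


def summarize(results):
    """Hypothetical callback; runs once after every unit has finished."""
    return len(results)


voters = [("ana", 1), ("bo", 2), ("cy", 1)]

# Fan-out: submit every unit in parallel.
# Fan-in: call the callback once with all results, in submission order.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(register_voter, *args) for args in voters]
    parallel_results = [future.result() for future in futures]
total = summarize(parallel_results)

# starmap: apply the function to each argument tuple, one after another.
sequential_results = list(starmap(register_voter, voters))
```

Blocking on future.result() is harmless in this local thread pool, but the equivalent result.get() inside a Celery task ties up a worker slot while it waits. A chord avoids that by letting Celery schedule the callback once the parallel tasks are done.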

Performance and Scalability Insights

  • Database-intensive work: Parallel task execution isn't always better - for DB-heavy operations, optimizing the queries often gains more than adding concurrent tasks

  • Task granularity matters: Breaking work into smaller tasks enables better parallelism but creates more overhead

  • Concurrency configuration: Use appropriate worker concurrency settings (example shows -c 8) based on your workload
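
One way to tune granularity is to batch records so that each task handles a chunk rather than a single item. The chunked helper and the batch size of 500 below are hypothetical; Celery also offers a chunks primitive on task signatures for a similar purpose.

```python
def chunked(items, size):
    """Hypothetical helper: split a list into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


record_ids = list(range(10_000))
# 20 tasks of 500 records each instead of 10,000 single-record tasks.
batches = chunked(record_ids, 500)
```

Smaller batches spread work across more workers, while larger batches reduce per-task messaging overhead; the right size is something to measure rather than guess.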

Canvas Primitives Usage

  • Chord: Best for fan-out/fan-in patterns where you need to collect results after parallel execution

  • Starmap: Efficient for mapping a function over multiple parameter sets

  • Group: For simple parallel execution when no callback needs to run on the combined results

  • Chain: For sequential task dependencies

Setup and Configuration Best Practices

  • Use RabbitMQ as broker: The examples use RabbitMQ via Docker for reliable message delivery

  • Environment isolation: Uses direnv for clean Python virtual environment management

  • Mac-specific configuration: Sets OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES so that forked worker processes do not crash on macOS

  • Proper worker scaling: Configure concurrency based on workload characteristics

Performance Testing Approach

  • Comparative benchmarking: The repository demonstrates multiple approaches to the same problem for performance comparison

  • Real-world simulation: Uses a voter registration scenario with 10,000 records to test scalability

  • Timing analysis: Encourages comparing timestamps to measure actual performance differences
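
For local experiments, a small timing wrapper is an alternative to reading log timestamps. This is a sketch, not the repository's method; timed and process_all are hypothetical names, and the summation is a placeholder for real work.

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def process_all(records):
    """Placeholder for one of the approaches being compared."""
    return sum(records)


result, seconds = timed(process_all, range(10_000))
```

Wrapping each approach the same way and running it several times gives comparable numbers for the same input.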

Key Takeaway

The repository emphasizes that Canvas primitives should be used judiciously - parallel execution isn't always faster, especially for database-intensive operations. The choice between sequential, parallel, or Canvas-based approaches should be based on the specific nature of your workload and performance testing results.
