# Celery gotchas

## Async Reliability with Celery

### Task Loss Prevention Between Web Process and Broker

- **Enable broker confirmation**: Configure `confirm_publish` on RabbitMQ so tasks are actually committed to the broker before the `delay()` call returns.
- **Pass data references, not data values**: Use S3 URLs or database IDs instead of large Python objects as task arguments, to avoid oversized messages that can crash workers.
- **Implement database-sourced task recovery**: Use Celery Beat with periodic tasks that check the database and re-queue missed work (e.g., verification emails) for automatic recovery.
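As a sketch, publisher confirms can be enabled through kombu's broker transport options (the broker URL below is a placeholder):

```python
# celeryconfig.py -- illustrative sketch for a RabbitMQ (py-amqp) broker
broker_url = "amqp://guest:guest@localhost:5672//"

# Ask RabbitMQ for publisher confirms so .delay()/.apply_async() only
# return after the broker has accepted the message.
broker_transport_options = {"confirm_publish": True}
```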
### Task Loss Prevention Between Broker and Worker

- **Set `task_acks_late = True`**: Tasks remain on the broker until the worker acknowledges completion, enabling redelivery if a worker crashes.
- **Use `transaction.on_commit()`**: Only queue tasks after the database transaction commits, to avoid race conditions where a task runs before its data is saved.
- **Make tasks idempotent**: Use ORM methods like `get_or_create()` and `update_or_create()` so tasks can be safely retried multiple times.
- **Wrap tasks in `transaction.atomic()`**: Ensure database changes are rolled back if a task is interrupted.
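The `on_commit` and idempotency advice combine into a pattern like this sketch (it assumes a Django project; the `Signup` model and `send_verification_email` task are hypothetical names, not from the original code):

```python
# Sketch only: assumes Django; Signup and send_verification_email
# are illustrative names.
from django.db import transaction
from myapp.models import Signup
from myapp.tasks import send_verification_email

def register(email):
    with transaction.atomic():
        # Idempotent write: safe if this code path is retried.
        signup, _ = Signup.objects.get_or_create(email=email)
        # Queue only after COMMIT, so the worker can always see the row.
        transaction.on_commit(lambda: send_verification_email.delay(signup.pk))
```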
### Worker Reliability Configuration

- **Set `task_reject_on_worker_lost = True`**: Enable task redelivery even when workers die from memory errors or SIGKILL.
- **Handle all exceptions properly**: Treat task exceptions with the same care as 500 errors in web views, using Celery's retry functionality for intermittent failures.
- **Use RabbitMQ over Redis/SQS**: RabbitMQ's connection-based redelivery is more reliable than visibility-timeout mechanisms.
### Deployment Safety

- **Empty queues before changing task signatures**: Ensure no old tasks remain when modifying function parameters.
- **Avoid ETA/countdown tasks beyond a few seconds**: These live in worker memory and complicate deployments.
- **Use graceful shutdown (SIGTERM)**: Avoid SIGKILL during deploys to prevent tasks being dropped.
### Alternative Approaches

- **Use dedicated workflow tools for complex orchestration**: Consider Prefect, Temporal, or Airflow instead of Celery Canvas for complex workflows.
- **Implement proper monitoring and alerting**: Set up observability tools specifically for Celery task execution.
- **Configure task time limits and expiration**: Prevent clogged queues and outdated notifications.
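One way to express both limits, as a sketch (the values, the `app` instance, and the `send_notification` task are illustrative):

```python
# Illustrative values; tune to your workload.
app.conf.task_soft_time_limit = 90   # raises SoftTimeLimitExceeded in the task
app.conf.task_time_limit = 120       # hard kill after two minutes

# Drop a queued notification if no worker picks it up within 5 minutes,
# so users never receive stale alerts.
send_notification.apply_async(args=[user_id], expires=300)
```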
## Celery Canvas Best Practices

### Workflow Patterns Demonstrated

- **Single sequential task (`all_in_one`)**: Processes all work in one task, iterating through items sequentially; simplest, but not parallel.
- **Parallel tasks with join**: Queues multiple tasks simultaneously, but demonstrates why you should never call `result.get()` within a task (it raises a `RuntimeError`).
- **Chord pattern**: Uses `chord` to run parallel tasks and execute a callback after all complete; recommended for fan-out/fan-in workflows.
- **Starmap for parameter mapping**: Uses `starmap` to efficiently map function calls over parameter tuples.
- **Fine-grained parallelism**: Shows how to break work into smaller parallel tasks for maximum throughput.
### Performance and Scalability Insights

- **Database-intensive work**: Parallel task execution isn't always better; database query optimization often outperforms adding more concurrent tasks for DB-heavy operations.
- **Task granularity matters**: Breaking work into smaller tasks enables better parallelism but creates more overhead.
- **Concurrency configuration**: Use appropriate worker concurrency settings (the examples show `-c 8`) based on your workload.
### Canvas Primitives Usage

- **Chord**: Best for fan-out/fan-in patterns where you need to collect results after parallel execution.
- **Starmap**: Efficient for mapping a function over multiple parameter sets.
- **Group**: For simple parallel execution without result collection.
- **Chain**: For sequential task dependencies.
### Setup and Configuration Best Practices

- **Use RabbitMQ as broker**: The examples use RabbitMQ via Docker for reliable message delivery.
- **Environment isolation**: Uses `direnv` for clean Python virtual environment management.
- **Mac-specific configuration**: Sets `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` for fork-safety compatibility on macOS.
- **Proper worker scaling**: Configure concurrency based on workload characteristics.
### Performance Testing Approach

- **Comparative benchmarking**: The repository demonstrates multiple approaches to the same problem for performance comparison.
- **Real-world simulation**: Uses a voter registration scenario with 10,000 records to test scalability.
- **Timing analysis**: Encourages comparing timestamps to measure actual performance differences.
## Key Takeaway

The repository emphasizes that Canvas primitives should be used judiciously: parallel execution isn't always faster, especially for database-intensive operations. Choose between sequential, parallel, and Canvas-based approaches based on the specific nature of your workload and on performance testing results.




