5 Event-Driven Architecture Pitfalls to Avoid

I needed to understand the gotchas in event-driven architecture before migrating more services. Wix's engineering team documented 5 painful lessons learned from moving over 2300 microservices from request-reply to event-driven patterns.

The Pitfalls

1. Non-atomic database writes and event publishing

Writing to a database and firing an event isn't atomic. If either the DB or message broker fails, you get data inconsistency.

Solutions:

Resilient producer that retries until the event reaches Kafka
CDC (Change Data Capture) with Debezium - captures DB changes via binlog and produces them as events automatically

2. Using Event Sourcing everywhere

Event sourcing stores events instead of entity state. You reconstruct current state by replaying events. Sounds cool but adds serious complexity:

Need snapshots to avoid performance degradation
Harder to create generic libraries (unlike CRUD ORMs)
Only eventual consistency

Better approach: CRUD + CDC. Simple reads from the database, with CDC publishing changes for downstream materialized views. Get the benefits without the complexity.

3. No context propagation

Debugging distributed event flows is hard. Unlike HTTP chains, events scatter across topics and services.

# Add request context to event headers
event_headers = {
    'requestId': request_id,
    'userId': user_id
}

Automatically propagate these IDs through your event chain. Makes filtering logs and events trivial during incident investigation.

4. Large event payloads

Events over 5MB kill broker performance. Three remedies:

Compression - Kafka and Pulsar support lz4, snappy. Broker-level compression beats application-level
Chunking - Split payloads into chunks with metadata for reassembly
Object store reference - Store payload in S3, pass URL in event

5. Duplicate event processing

Most brokers guarantee at-least-once delivery. Events can be processed multiple times.

# Use optimistic locking with revisionId
def process_event(event):
    revision_id = event['revisionId']
    # Read current version
    entity = db.get(event['entityId'])
    if entity.revision_id != revision_id:
        return  # Already processed

    # Update with new revision
    entity.update(revision_id=revision_id + 1)

For Kafka, use topic-partition-offset as unique transaction ID.

The Migration Strategy

Migrate gradually. Mix HTTP/RPC with event-driven patterns as needed. CDC is the sweet spot - ensures consistency without full event-sourcing complexity.

Context propagation (pitfall #3) is critical for operations. Fix that early.

Compression and transaction IDs are good defaults even if you don't hit pitfalls #4 and #5 yet.

Happy hackin'!

5 Event-Driven Architecture Pitfalls to Avoid

The Pitfalls