Event Sourcing

Event sourcing stores every state change as an immutable event in Kafka rather than overwriting current state in a database, enabling full history replay, audit trails, and state reconstruction at any point in time.

Key Facts

  • Kafka's immutable append-only log is a natural fit for event sourcing
  • All events recorded as immutable facts; state is derived by replaying events
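  • State at any point in time can be reconstructed by replaying events from the start of the log (or from a snapshot) up to the chosen offset
  • Kafka's log, when retention is configured to keep events long enough, preserves the complete event history for audits and compliance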
  • Multiple [[consumer-groups]] can read the same event stream independently to build different materialized views
  • Solves the "how did we get to this state?" problem - with only the current state in a DB, you cannot trace history
  • Writing all events to a traditional RDBMS bottlenecks at roughly 10K writes/sec as a rule of thumb (actual limits depend on hardware and schema); a Kafka cluster can handle millions
  • Use specialized read stores (ElasticSearch, Redis, Cassandra) for materialized views
  • For indefinite event storage, a dedicated event store (e.g. EventStoreDB, or a database such as MongoDB used as one) is often more appropriate than Kafka, whose default retention deletes old log segments after 7 days
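
To make "state is derived by replaying events" concrete, here is a minimal sketch in plain Python with no Kafka client. The Event type, the "deposited"/"withdrawn" event kinds, and the in-memory list standing in for a topic partition are all illustrative assumptions, not part of any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: events are immutable facts
class Event:
    offset: int   # position in the log, as in a Kafka partition
    kind: str     # hypothetical event kinds: "deposited" / "withdrawn"
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Pure function: previous state + one event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    return balance  # unknown event kinds are ignored


def replay(log, upto_offset=None, initial=0):
    """Rebuild state by folding events up to and including upto_offset."""
    state = initial
    for event in log:
        if upto_offset is not None and event.offset > upto_offset:
            break
        state = apply_event(state, event)
    return state


# In-memory stand-in for one partition of an event topic.
log = [
    Event(0, "deposited", 100),
    Event(1, "withdrawn", 30),
    Event(2, "deposited", 50),
]

current = replay(log)                        # state after all events
as_of_offset_1 = replay(log, upto_offset=1)  # state as it was after offset 1
```

Because apply_event is a pure function, the same log always yields the same state, which is what makes audits and point-in-time reconstruction possible.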

Patterns

Event Sourcing Architecture

Events -> Kafka Topic (append-only log, source of truth)
    -> Materializer A -> Read Store A (current state for API)
    -> Materializer B -> Read Store B (analytics, full-text search)
    -> Materializer C -> Read Store C (notifications, ML fraud detection)
    -> Technical Support -> Full event history for debugging
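
The fan-out above can be sketched without a broker. In this illustrative example each Materializer tracks its own offset into a shared list, loosely mimicking how separate consumer groups read the same topic independently. The class, the projection functions, and the order events are hypothetical names chosen for the sketch.

```python
class Materializer:
    """Reads a shared log at its own pace, like a consumer group with its own offsets."""

    def __init__(self, name, project):
        self.name = name
        self.project = project  # function(read_store, event) -> None
        self.offset = 0         # next log position this materializer will read
        self.read_store = {}    # stand-in for Redis, Elasticsearch, etc.

    def poll(self, log, max_events=None):
        """Apply up to max_events unread events and advance the offset."""
        end = len(log) if max_events is None else min(len(log), self.offset + max_events)
        for event in log[self.offset:end]:
            self.project(self.read_store, event)
        self.offset = end


def current_status(store, event):
    """Projection A: latest status per order (for an API)."""
    store[event["order_id"]] = event["status"]


def status_counts(store, event):
    """Projection B: how many times each status occurred (for analytics)."""
    store[event["status"]] = store.get(event["status"], 0) + 1


# Shared append-only log standing in for the Kafka topic.
log = [
    {"order_id": "o1", "status": "created"},
    {"order_id": "o2", "status": "created"},
    {"order_id": "o1", "status": "shipped"},
]

api_view = Materializer("api", current_status)
analytics_view = Materializer("analytics", status_counts)

api_view.poll(log)                      # fully caught up
analytics_view.poll(log, max_events=2)  # lags behind without affecting api_view
```

Neither materializer mutates the log, so a new read store can be added later and built from offset 0 without touching the existing ones.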

Event Replay

POST /api/replay-events
1. Command API reads all events from Event Store
2. Sends marker event "replay started"
3. Query API clears its read model
4. Query API re-applies all events to rebuild from scratch

Use cases: schema migration, bug fixes in event handlers, adding new projections, disaster recovery. During replay, incoming commands are blocked until the rebuild completes.
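
The replay steps can be sketched as follows. This is a single-process illustration, not the actual service code: CommandApi, QueryApi, the stock events, and the list used as the Event Store are assumptions made for the example, and the "replaying" flag only hints at how commands might be blocked in a real concurrent system.

```python
class QueryApi:
    """Query side: owns the read model and rebuilds it from events."""

    def __init__(self):
        self.read_model = {}  # sku -> quantity on hand

    def handle(self, event):
        if event["type"] == "replay_started":
            self.read_model.clear()  # step 3: drop the old projection
        elif event["type"] == "stock_added":
            sku = event["sku"]
            self.read_model[sku] = self.read_model.get(sku, 0) + event["qty"]
        elif event["type"] == "stock_removed":
            sku = event["sku"]
            self.read_model[sku] = self.read_model.get(sku, 0) - event["qty"]


class CommandApi:
    """Command side: appends events and drives replay."""

    def __init__(self, event_store, query_api):
        self.event_store = event_store  # append-only list standing in for the Event Store
        self.query_api = query_api
        self.replaying = False

    def handle_command(self, event):
        if self.replaying:
            raise RuntimeError("replay in progress: commands are blocked")
        self.event_store.append(event)
        self.query_api.handle(event)

    def replay_events(self):
        """Rough equivalent of POST /api/replay-events."""
        self.replaying = True
        try:
            events = list(self.event_store)                    # step 1: read all events
            self.query_api.handle({"type": "replay_started"})  # step 2: marker event
            for event in events:                               # step 4: re-apply everything
                self.query_api.handle(event)
        finally:
            self.replaying = False
        return len(events)


query_api = QueryApi()
command_api = CommandApi([], query_api)
command_api.handle_command({"type": "stock_added", "sku": "A1", "qty": 5})
command_api.handle_command({"type": "stock_removed", "sku": "A1", "qty": 2})

query_api.read_model["A1"] = 999        # simulate a corrupted projection
replayed = command_api.replay_events()  # rebuild from the event history
```

The marker event travels through the same channel as ordinary events, so the query side clears its read model at a well-defined point in the stream rather than via a side channel.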

Kafka as Actor System

Each consumer processing a partition acts as an actor:

  • Receives messages sequentially
  • Maintains state
  • Can produce messages to other topics
  • Partitions naturally enforce sequential processing
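
A rough sketch of the actor analogy, again without a broker. PartitionActor, the "alerts" topic, and the threshold of 100 are hypothetical. The crc32-based partition_for function is only a stable stand-in for a key partitioner (Kafka's Java client hashes keys with murmur2 by default).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key: str) -> int:
    """Map a key to a partition; crc32 is a demo stand-in for Kafka's key hashing."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class PartitionActor:
    """One consumer bound to one partition: sequential input, private state, output messages."""

    def __init__(self, partition):
        self.partition = partition
        self.totals = defaultdict(int)  # private state, never shared with other actors
        self.outbox = []                # (topic, message) pairs it would produce

    def receive(self, message):
        key, amount = message
        self.totals[key] += amount
        if self.totals[key] > 100:
            # Emit a message to another (hypothetical) topic.
            self.outbox.append(("alerts", (key, self.totals[key])))


actors = [PartitionActor(p) for p in range(NUM_PARTITIONS)]

# Messages with the same key always reach the same actor, in order.
messages = [("alice", 60), ("bob", 10), ("alice", 50), ("bob", 20)]
for key, amount in messages:
    actors[partition_for(key)].receive((key, amount))

alice_actor = actors[partition_for("alice")]
```

Since every message for a given key lands on the same actor, per-key state needs no locks, which is the property the actor comparison relies on.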

Gotchas

  • Kafka eventually deletes events by default (7-day retention) - for true indefinite event storage, use a dedicated Event Store alongside Kafka or configure infinite retention on the topic (retention.ms=-1)
  • Event schema evolution is critical - incompatible changes (such as removing a field that handlers rely on, or adding a required field with no default) break replay of older events; use [[schema-registry]] with FULL compatibility
  • Replay can be expensive - millions of events take time to replay; consider snapshotting: periodically save current state, replay only from last snapshot
  • Ordering only within partition - related events must share a partition key; cross-partition ordering requires additional coordination
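
The snapshotting idea from the replay gotcha can be illustrated with a small sketch. The event shape, the rebuild helper, and the snapshot represented as a (state, next_offset) tuple are assumptions for this example. A real system would persist snapshots durably alongside the offset they cover.

```python
def apply_event(state, event):
    """Return a new state with the event's delta added to its key."""
    new_state = dict(state)
    new_state[event["key"]] = new_state.get(event["key"], 0) + event["delta"]
    return new_state


def rebuild(log, snapshot=None):
    """Replay from a snapshot if given, else from offset 0.

    snapshot is (state, next_offset): the state after applying log[:next_offset].
    Returns the rebuilt state and the number of events actually replayed.
    """
    state, start = snapshot if snapshot is not None else ({}, 0)
    applied = 0
    for event in log[start:]:
        state = apply_event(state, event)
        applied += 1
    return state, applied


# 1000 events across two keys, standing in for a long event history.
log = [{"key": "a" if i % 2 == 0 else "b", "delta": 1} for i in range(1000)]

# Take a snapshot covering the first 900 events.
snapshot_state, _ = rebuild(log[:900])
snapshot = (snapshot_state, 900)

full_state, full_count = rebuild(log)            # replays everything
fast_state, fast_count = rebuild(log, snapshot)  # replays only the tail
```

Both paths arrive at the same state, but the snapshot path applies only the events recorded after the snapshot was taken.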

See Also

  • [[cqrs-pattern]] - Command Query Responsibility Segregation, natural companion to Event Sourcing
  • [[kafka-streams]] - stream processing for building materialized views
  • [[topics-and-partitions]] - log compaction for "latest state per key" tables
  • [[delivery-semantics]] - exactly-once for event processing