
The dual write problem

Adam Mikulasev edited this page Apr 27, 2025 · 2 revisions

🐞 The Dual Write Problem: What It Is and How It Breaks Your System

When building distributed or event-driven systems, it's common to perform two actions in sequence:

  1. Insert a record into the database (e.g., creating an event).
  2. Queue a background job or publish a message to handle the event asynchronously (e.g., using Sidekiq, Kafka, etc.).

This sequence, however, introduces a critical bug known as the dual write problem.


💥 The Bug

The database write and the enqueue are two separate operations, so together they are not atomic. If the process crashes between them, or the enqueue fails after the database commit, you're left in an inconsistent state.

Example in Ruby

# Step 1: Write to DB
event = Event.create!(...)

# 💀 Process crashes here

# Step 2: Write to queue
EventCreatedJob.perform_async(event.id)

❌ What Goes Wrong

  • The Event is written to the database, so your system believes the action occurred.
  • But the job is never enqueued, due to the crash or failure.
  • Result: inconsistent state. Your app thinks something happened, but downstream systems never find out.
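
The failure can be reproduced without Rails or Sidekiq. The sketch below is a simulation only: plain arrays stand in for the database table and the job queue, and `simulate_dual_write` and `SimulatedCrash` are hypothetical names introduced for illustration.

```ruby
# Hypothetical error used to simulate the process dying mid-request.
class SimulatedCrash < StandardError; end

# Performs the two writes in sequence against in-memory stand-ins
# and reports what each store contains afterwards.
def simulate_dual_write(crash_between_writes:)
  db = []     # stand-in for the events table
  queue = []  # stand-in for the job queue

  begin
    event = { id: 1, name: "order.created" }
    db << event                                   # Step 1: write to the "database"
    raise SimulatedCrash if crash_between_writes  # the process dies here
    queue << event[:id]                           # Step 2: enqueue the "job"
  rescue SimulatedCrash
    # The first write is already committed and is not undone.
  end

  # Events that exist in the database but have no corresponding job.
  orphaned = db.reject { |row| queue.include?(row[:id]) }
  { db: db, queue: queue, orphaned: orphaned }
end

result = simulate_dual_write(crash_between_writes: true)
puts "events: #{result[:db].size}, jobs: #{result[:queue].size}, orphaned: #{result[:orphaned].size}"
# prints "events: 1, jobs: 0, orphaned: 1"
```

The orphaned event is exactly the record that downstream systems never hear about.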

🧨 Consequences

  • Webhooks aren't fired
  • Emails aren't sent
  • Integrations fall out of sync
  • Data becomes stale or incorrect in other systems

✅ The Solution: Transactional Outbox Pattern

Instead of enqueuing the job directly, record the intent to publish a message, and write that record in the same database transaction as the event.

ActiveRecord::Base.transaction do
  event = Event.create!(...)
  Outboxer::Message.create!(messageable: event, status: "queued")
end

Then, a background publisher process reads from the outbox_messages table and reliably publishes messages to Redis/Kafka/etc.


🔄 How It Works

  1. Begin Transaction
  2. Insert the event
  3. Insert an Outboxer::Message that references the event
  4. Commit Transaction
  5. A background publisher later polls the outbox and publishes any queued messages
  6. Once a message has been published, it is either marked as published or deleted
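
The steps above can be sketched end to end with in-memory stand-ins. This is an illustration under stated assumptions, not Outboxer's implementation: `InMemoryStore`, `create_event`, and `publish_pending` are hypothetical names, the "transaction" is a crude snapshot-and-restore, and the broker is a plain array.

```ruby
# Hypothetical in-memory store with an events table and an outbox table.
class InMemoryStore
  attr_reader :events, :outbox

  def initialize
    @events = []
    @outbox = []
  end

  # Crude transaction: take a shallow snapshot of both tables and restore
  # it if the block raises. Sufficient here because the block only appends.
  def transaction
    snapshot = [@events.dup, @outbox.dup]
    yield
  rescue StandardError
    @events, @outbox = snapshot
    raise
  end
end

# Steps 1-4: insert the event and its outbox message atomically.
def create_event(store, name, fail_before_commit: false)
  store.transaction do
    event = { id: store.events.size + 1, name: name }
    store.events << event
    store.outbox << { event_id: event[:id], status: "queued" }
    raise "simulated crash" if fail_before_commit
    event
  end
end

# Steps 5-6: publish every queued message, then mark it as published.
# Marking after publishing means a crash in between causes a re-publish
# on the next poll rather than a lost message.
def publish_pending(store, broker)
  store.outbox.select { |message| message[:status] == "queued" }.each do |message|
    broker << message[:event_id]
    message[:status] = "published"
  end
end

store = InMemoryStore.new
broker = []

create_event(store, "order.created")
begin
  create_event(store, "order.cancelled", fail_before_commit: true)
rescue RuntimeError
  # Both inserts were rolled back together.
end

publish_pending(store, broker)
puts "events: #{store.events.size}, outbox: #{store.outbox.size}, published: #{broker.inspect}"
# prints "events: 1, outbox: 1, published: [1]"
```

The failed call leaves neither an event nor an outbox message behind, so the two tables never disagree.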

✅ Benefits

  • Atomicity: Both writes succeed or fail together
  • Durability: Messages persist even if the process crashes
  • Observability: You can monitor unprocessed messages
  • Scalability: Works across multiple services or systems
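
As one illustration of the observability point, a check for messages that have been waiting too long might look like the sketch below. The `created_at` field, the 60-second threshold, and the `stale_outbox_messages` helper are assumptions made for this example, and the outbox is again a plain array rather than a real table.

```ruby
# Returns queued messages that have been waiting longer than the threshold.
def stale_outbox_messages(outbox, now:, older_than_seconds:)
  outbox.select do |message|
    message[:status] == "queued" && (now - message[:created_at]) > older_than_seconds
  end
end

now = Time.now
outbox = [
  { event_id: 1, status: "published", created_at: now - 600 },
  { event_id: 2, status: "queued",    created_at: now - 600 },
  { event_id: 3, status: "queued",    created_at: now - 5 }
]

stale = stale_outbox_messages(outbox, now: now, older_than_seconds: 60)
puts "stale event ids: #{stale.map { |message| message[:event_id] }.inspect}"
# prints "stale event ids: [2]"
```

A growing list of stale messages suggests the publisher has stopped or cannot reach the broker.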

The dual write bug causes data loss or inconsistency when a system crash happens between a DB commit and a queue write.

The transactional outbox pattern prevents this by capturing the intent to publish inside the same transaction.


📚 Further Reading