Skip to content
rhauch edited this page Dec 19, 2014 · 7 revisions

Debezium is composed of several software components. The lightweight driver implements the public API and is instantiated in applications that are directly exposed and used by mobile apps. The driver's public methods are asynchronous, and largely involve creating messages, registering the caller's response handler (if appropriate), writing them to the appropriate stream, and monitoring the partial-responses stream for responses. All other functionality is implemented by services described on this page.

Unless otherwise specified, all services are single-threaded. Multiple services can be deployed so that each service is consuming a distinct subset of the partitions of the consumed streams. The maximum number of service instances corresponds to the number of partitions in the consumed stream. That means that any given partition will be consumed by only one service instance, and a service instance only consumes one message at a time.

Also, pay particular attention to the way the consumed streams are partitioned. Many services use a local database to store information, often keyed by the partition key of the consumed stream. Because of the way streams are partitioned, any message with a given partition key will always be stored in the same stream partition, and the same service instance will always process all such messages. In this way, the services can actually persist information in a local database, resulting in a shared-nothing architecture that maximizes scalability, availability, and performance. In the prototypes, the databases are also backed by its own stream, so all the information can be recovered even if the local database fails or is lost.

The services are:

Device service

The device service consumes the connections stream, maintains a database of all device tokens for each username, and (proposed) writes out summary messages every time a device is added or removed.

Entity batch service

Entity storage service

Schema storage service

Schema learning partitioning service

Schema learning service

Response accumulator service

Changes in zone service

The changes in zone service processes entity updates, converts them to summaries of what changed, and writes them out to a stream partitioned by zone ID. It also monitors any changed $zoneSubscription entities, and writes them out to the output stream partitioned by the zone ID of the zone that is the subject of the subscription (rather than the zone in $zoneSubscription entity is stored).

Consumes: entity-updates, partitioned by entity ID

Outputs: zone-changes, partitioned by zone ID

Processing logic:

  1. Consume message from entity-updates, partitioned by entity ID
  2. Extract the zone ID of the changed entity
  3. If the changed entity is from the built-in $zoneSubscription collection:
  4. Read the ID of the zone that is the subject of the subscription
  5. Write the entity update to the zone-changes, partitioned by subject zone ID
  6. Create a summary message of the changed entity (including zone subscription entities), determining whether the entity was created, updated, or deleted.
  7. Write the summary message to the zone-changes, partitioned by the changed entity's zone ID
  8. Mark that the message was consumed

Failure tolerance: This service is tolerant of crashes and will not lose or corrupt any data. After restarting the service following a crash, some inputs messages that were only partially processed when the crash occurred will be reprocessed, potentially resulting in some duplicated output messages. Therefore, downstream services must be tolerant of duplicate messages (which is likely not to be a problem, since it will at most imply an entity changed twice when it only changed once).

This service stores no information locally, and merely consumes the incoming stream.

Zone watch service

The zone watch service is responsible for tracking zone subscriptions and using them to identify to which devices each entity change should be forwarded.

Consumes: zone-changes, partitioned by zone ID

Outputs: changes-by-device, partitioned by device ID

Processing logic:

  1. Consume message from zone-changes, partitioned by zone ID
  2. If the message is a summary of an entity change, then:
  3. Generate a unique notification identifier for this change
  4. Create a notification summary message for the entity change
  5. Find in local cache/store the list of all devices that have subscriptions to this zone
  6. For each device: 1. write out the notification summary message to changes-by-device, using the device ID as the key and partition key
  7. If the message is a change to a subscription for a specific device, then
  8. Read from the cache/store the zone's list of subscriptions
  9. Remove or update the subscription in the zone's list of subscriptions
  10. Write list of subscriptions for the zone back to the cache/store
  11. Mark that the message was consumed

Failure tolerance: This service is tolerant of crashes and will not lose or corrupt any data. After restarting the service following a crash, some input messages that were only partially processed when the crash occurred will be reprocessed, potentially resulting in some duplicated output messages. Therefore, downstream services must be tolerant of duplicate messages.

The only information stored locally is the list of subscriptions for each zone (seen by the service instance). This information is stored in a local database, but it is also backed by a dedicated, replicated, and partitioned stream. Therefore, should the local database become lost or corrupt, the information can be recovered from the dedicated stream.

Coalescing service

The coalescing service is responsible for coalescing all of the notifications sent to a specific device.

Consumes: changes-by-device, partitioned by device ID

Outputs: device-notifications, partitioned by device ID

Processing logic:

  1. Consume message from changes-by-device, partitioned by device ID
  2. Find in local cache/store the list of all notifications for the device; if none is found create an empty document
  3. Add the notification to the list, ensuring that the notification does not appear more than once
  4. Write the list back to the cache/store
  5. Build an output message with the total number of notifications in the list, the device, the user, etc.
  6. Write the output message to device-notifications, partitioned by device ID
  7. Mark that the message was consumed

Failure tolerance: This service is tolerant of crashes and will not lose or corrupt any data. After restarting the service following a crash, some input messages that were only partially processed when the crash occurred will be reprocessed, potentially resulting in some duplicated output messages. Therefore, downstream services must be tolerant of duplicate messages.

The only information stored locally is the list of subscriptions for each zone (seen by the service instance). This information is stored in a local database, but it is also backed by a dedicated, replicated, and partitioned stream. Therefore, should the local database become lost or corrupt, the information can be recovered from the dedicated stream.

Clone this wiki locally