
Change Propagation Approach

wanghaisheng edited this page Dec 9, 2015 · 3 revisions

Aesop uses the log mining approach to detecting data changes, as described by the LinkedIn Databus project. It also uses the infrastructure components of Databus, mostly for serving change events. The Databus concepts of Event Producer, Relay, Event Buffer, Bootstrap, Event Consumer and System Change Number (SCN) are appealing and are used mostly as-is in Aesop.

Aesop extends change detection to the HBase data store by implementing an Event Producer based on NGDATA hbase-sep. For MySQL, Aesop builds on the producer implementation available in Databus.

The log mining producer implementations leverage the master-slave replication support available in databases. Such producers may be called "Push" producers, because changes are pushed from the master to a slave, and the Aesop Event Producer plays the role of that slave.
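A minimal sketch of the "Push" idea, assuming a replication stream that calls a listener once per log entry. The names `ChangeEvent`, `PushProducer` and `on_log_entry` are hypothetical and are not part of the Aesop or Databus APIs. The sketch also assumes that the stream may redeliver entries, so it drops anything at or below the last SCN it has seen.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ChangeEvent:
    scn: int        # position of the change in the source's replication log
    source: str     # hypothetical table name
    key: str
    op: str         # "UPSERT" or "DELETE"
    payload: dict


class PushProducer:
    """Hypothetical listener that the replication stream calls into."""

    def __init__(self, sink: Callable[[ChangeEvent], None], start_scn: int = 0):
        self._sink = sink
        self._last_scn = start_scn

    def on_log_entry(self, scn: int, source: str, key: str, op: str, payload: dict) -> bool:
        # Assumption: entries may be redelivered (e.g. after a reconnect).
        # Skipping anything at or below the last seen SCN keeps the output ordered.
        if scn <= self._last_scn:
            return False
        self._sink(ChangeEvent(scn, source, key, op, payload))
        self._last_scn = scn
        return True


# A plain list stands in for the Relay's event buffer.
relay_buffer: List[ChangeEvent] = []
producer = PushProducer(relay_buffer.append)
producer.on_log_entry(101, "orders", "o-1", "UPSERT", {"status": "NEW"})
producer.on_log_entry(102, "orders", "o-1", "UPSERT", {"status": "PAID"})
producer.on_log_entry(102, "orders", "o-1", "UPSERT", {"status": "PAID"})  # redelivery, ignored
```

The producer never queries the database; it only reacts to what the master sends, which is why each event maps to one row-level log entry.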

The log mining approach has limitations when data is distributed across database tables and the updates are not part of a single transaction: it is then hard to correlate multiple updates to a single logical change. It also has limitations when data is distributed across different types of data stores, e.g. between an RDBMS and a document database. Aesop addresses this problem by implementing a "Pull" producer that uses an application-provided "Iterator" API to periodically scan the entire datastore and detect changes between scan cycles. This implementation is based on Netflix Zeno.
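The scan-and-compare idea behind a "Pull" producer can be sketched as follows. This is an illustration only and does not use the Zeno or Aesop APIs: `snapshot` and `diff_snapshots` are hypothetical helpers, and the application-provided iterator is assumed to yield `(key, value)` pairs in which each value is already the composite entity assembled from whatever tables or stores hold its parts.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, dict]                    # (key, composite value) from the iterator
Change = Tuple[str, str, Optional[dict]]     # (operation, key, new value)


def snapshot(records: Iterable[Record]) -> Dict[str, dict]:
    """Materialise one full scan cycle into a key -> value map."""
    return {key: dict(value) for key, value in records}


def diff_snapshots(previous: Dict[str, dict], current: Dict[str, dict]) -> List[Change]:
    """Compare two scan cycles and report what changed between them."""
    changes: List[Change] = []
    for key, value in current.items():
        # New keys and keys whose value differs are both reported as upserts.
        if key not in previous or previous[key] != value:
            changes.append(("UPSERT", key, value))
    for key in previous:
        # Keys seen in the last cycle but missing now were deleted.
        if key not in current:
            changes.append(("DELETE", key, None))
    return changes


cycle_1 = [("u1", {"name": "Ann", "city": "Pune"}), ("u2", {"name": "Bo", "city": "Goa"})]
cycle_2 = [("u1", {"name": "Ann", "city": "Delhi"}), ("u3", {"name": "Cy", "city": "Agra"})]

changes = diff_snapshots(snapshot(cycle_1), snapshot(cycle_2))
for change in changes:
    print(change)
```

Because the comparison works on whole entities, several underlying updates between two cycles surface as one change per key, at the cost of scanning the full datastore each cycle.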

Change propagation employing both "Push" and "Pull" producers:

Pull Producer                        Streaming Client 1       Slow/Catchup client 1
(Zeno based)    \                   /                        /
                 \_____ Relay _____/___ Bootstrap __________/
                 /    (Databus)    \    (Databus)           \
                /                   \                        \
Push Producer                        Streaming Client 2       Slow/Catchup client 2  
(e.g. HBase WAL edits listener,
 e.g. MySQL Replication listener)
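The split between streaming and slow/catch-up clients in the diagram can be sketched as below. This is a simplified model under stated assumptions, not the Databus implementation: the `Relay` is modelled as a bounded in-memory buffer, the `Bootstrap` as a store that keeps the latest event per key, and `fetch` is a hypothetical helper that picks between them based on the client's last consumed SCN.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Event = Tuple[int, str, str]  # (scn, key, value)


class Relay:
    """Bounded buffer: the oldest events are evicted once capacity is reached."""

    def __init__(self, capacity: int):
        self._buffer: Deque[Event] = deque(maxlen=capacity)

    def append(self, event: Event) -> None:
        self._buffer.append(event)

    def oldest_scn(self) -> Optional[int]:
        return self._buffer[0][0] if self._buffer else None

    def events_since(self, scn: int) -> List[Event]:
        return [e for e in self._buffer if e[0] > scn]


class Bootstrap:
    """Long-term store; this sketch keeps only the latest event per key."""

    def __init__(self) -> None:
        self._latest: Dict[str, Event] = {}

    def append(self, event: Event) -> None:
        self._latest[event[1]] = event

    def events_since(self, scn: int) -> List[Event]:
        return sorted(e for e in self._latest.values() if e[0] > scn)


def fetch(relay: Relay, bootstrap: Bootstrap, last_scn: int) -> Tuple[str, List[Event]]:
    oldest = relay.oldest_scn()
    # The relay is safe to use only if the client has already seen its oldest
    # buffered event; otherwise events the client needs may have been evicted.
    if oldest is None or last_scn >= oldest:
        return "relay", relay.events_since(last_scn)
    return "bootstrap", bootstrap.events_since(last_scn)


relay, bootstrap = Relay(capacity=3), Bootstrap()
for event in [(1, "a", "v1"), (2, "b", "v1"), (3, "a", "v2"), (4, "c", "v1"), (5, "b", "v2")]:
    relay.append(event)
    bootstrap.append(event)

streaming = fetch(relay, bootstrap, last_scn=4)   # client that keeps up
catch_up = fetch(relay, bootstrap, last_scn=0)    # client that fell behind
```

In this model the catch-up client receives one consolidated event per key rather than the full history, after which it can switch back to the relay from the highest SCN it received.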
