Change Data Capture and Embedded Databases – Embedded Database Systems

Change Data Capture (CDC) is a process to determine what data in a database has been changed, and somehow preserve that data change for subsequent processing. The purpose of that subsequent processing could be e.g. for database recovery, replication, and/or event handling. This is not intended to be an exhaustive list of the purposes, merely a few representative examples.

How Change Data Capture is affected in embedded database systems is as varied as embedded database systems themselves. On this site alone, 21 open source embedded databases and 20 commercial embedded databases are listed (with some overlap).

The most common implementation of Change Data Capture is transaction logging, which is most often used for recovery of a database after an abnormal termination. Transaction logging, itself, can be implemented in different ways with the two most common being UNDO and REDO logging. UNDO logging is probably not ideal for any Change Data Capture purpose other than database recovery because it is the before-image of changed data that is logged while the most recent version of data is usually what is of interest. For instance, a telecom system might store Quality of Service (QoS) parameters in the database. If changed, another part of the system needs to know about it so that the protocol stack can be modified accordingly. While an UNDO log could be used to pick up the change, it would require another operation to pull the updated data from the database. Conversely, a REDO log would contain the latest data and avoid the need for another database query.

Using transaction logs for other than database recovery can be problematic because most database vendors do not document the structure of the transaction log or provide an API to access a transaction log. In the commercial embedded database universe, McObject’s eXtremeDB is one exception, and developers using any open source embedded database can discover the log file format.

Another common way to affect Change Data Capture in SQL/relational embedded databases is with triggers. A trigger is an action that is invoked when an INSERT, UPDATE or DELETE is executed. As with iterating over a transaction log, for all but the simplest of triggers, it is going to be vendor-specific because different SQL embedded databases support different trigger languages.

Change Data Capture is useful for another twist on triggers: complex event processing (CEP). Actually, this goes far beyond triggers, and is worthy of an entire article on its own merits. There’s a good read about Kafka here.

Change Data Capture is central to database replication. I’ll use the term replication both in a very general sense, and with a more specific intent. An embedded database system needs to know what to replicate; this is affected through Change Data Capture. There are as many approaches to how to replicate as there are embedded database systems that offer replication. Generally, they fall into three categories: mirroring, log shipping and transactional replication. Mirroring replicates only committed transactions. Log shipping replicates uncommitted and committed transactions. Transactional replication also only replicates committed transactions, and only synchronous transactional replication can support 2-safe high availability. Asynchronous transactional replication can only provide 1-safe. Generally speaking, all these forms of replication are provided by the embedded database vendor, and conveniently replicate between two (or more) instances of their database. Replication between different database systems falls under the general heading of Enterprise Application Integration (EAI), but EAI has a broader scope, e.g. it is a superset of replication.

In summary, Change Data Capture is central to embedded database recovery, event handling, and replication. Each database system can offer no CDC, one type of CDC or several.