The Change Data Capture (CDC) is very critical in today’s data-driven ecosystem. It ensures that changed or modified data is secured in a format that does not impact data history. In the past too, efforts have been made in this regard and the result was solutions like triggers, data audits, timestamps, and complex queries. However, none of them provided fail-safe systems.
An answer to this issue was first provided by Microsoft when it launched its SQL Server Change Data Capture product in 2005. Its advanced technology had “after update”, “after insert”, and “after delete” features but was too complex for DBAs to work with. In 2008, a newer version was introduced that met user requirements of capturing and archiving historical data and changes without help from additional drivers or programs.
The Working of the SQL Server Change Data Capture Technology
The SQL Server Change Data Capturefeature through the SQL Server enables changes like update, insert, and delete which are available to users in a simple relational format. Any input needed for capturing changes to an intended target like metadata or column information is present in the modified and changed rows. These changes are kept in tables that mirror images of the columns of the tracked stored tables. Specific table-valued functions control all access to the modified data.
SQL Server Change Data Captureis ahead of the others in the CDC niche because of several factors. With others, changes made at the source tables in a data warehouse are replicated at the intended target location but only after continuous refreshing of the data, a time-consuming activity. SQL CDC does it differently. Here change data flows smoothly and automatically to the target platforms and databases.
In addition to SQL Server Change Data Capturetrackingupdate, insert, and delete transactions, changes are recorded in mirror tables with the same column structure as the source tables. Separate columns in the following format also track changes made at the source database.
- For every Insert statement, SQL Server writes a record that shows the inserted value
- For every Delete statement, SQL Server writes a record that shows the deleted value
- For every Update statement, SQL Server writes two records. One shows the data before the change and one after the change is made.
Apart from the additional columns, SQL Server Change Data Capturealso has the following data.
- __$start_lsn and __$end_lsn showing the commit log sequence number (LSN) allotted by the SQL Server Engine to the change that is recorded.
- __$seqval showing the type of that change in comparison to other changes in the same transaction and __$operation showing the type of changewhere 1 is delete, 2 is insert, 3 is update (before change), and 4 is update (after change)
- __$update_mask that is a bitmask for defining each captured column and identifying the updating columns
The Operation of the SQL Server Change Data Capture
CDC tracks all modifications and changes made to tables that are then stored in relational tables for quick access and retrieval of data with T-SQL. When Change Data Capture is applied to a database table a mirrored image is created of the tracked table. The structure of the columns of the replicated tables identifies all changes made to the database rows as these tables have additional metadata columns.
The source and the target tables are similar in every respect. Once the SQL Server Change Data Captureprocess is completed, the new audit tables track the logged tables and monitor all activities that have finished.
The origin of the changes is shown by the transaction log of the SQL Server CDC. When any changes like update, insert or delete are found in the source tables being tracked, all their details are entered in the log and these become a component of the Change Data Capture. Detailed descriptions of the changes are contained in the log that can be read and connected to the change table part of the original table.
Forms of SQL Server Change Data Capture
Users can use the SQL Server Change Data Capturein two ways.
Log-based CDC
Here, the system analyzes the transaction log and file of the database and users can find out about any changes made at the source. These changes are then replicated to the target. The advantage here is that the method is very reliable without any possibility of missing out on any changes made. Further, as there is no need for schemas to be changed or new tables to be added there is no effect on the production database system. The only downside is that this method works only with databases that support log-based CDC.
Trigger-based CDC
In this method, triggers are placed in databases that are automatically set off when any changes occur in the source database, thereby reducing to a great extent the cost of extracting the changes. However, there is an increase in system operating costs as the run time of the source system increases because the database has to be refreshed every time a change is made.
Triggered-based SQL Server Change Data Capturehas several benefits. These include faster implementation of changes, finding details of every transaction in the shadow tables, and direct support for specific databases in the SQL API. The downside here is that triggers are often disabled during operations, thereby adversely affecting the performance of the databases. This usually happens when rewrites become necessary when changes are being made to the rows.
In short, SQL CDC is a big help for data-driven organizations.