Introduction to Change Data Capture

The Change Data Capture feature is a new feature in SQL Server 2008 that allows processes to
quickly and easily identify changes to a table in SQL Server 2008. It primarily works by watching the SQL server transaction logs, and running processes that pull changes from the logs and moving them into tables.

Change Data Capture Setup

Enabling Change Data Capture is a two step process. First, you must enable it at the database level, then you must enable it for each table for which changes must be recorded.

The first step in setting up Change Data Capture is to enable the option at the Database Level. The easiest way to do this is to use the system stored procedure sys.sp_cdc_enable_db.
Simply execute this stored procedure from the desired database, and it will enable change data capture for the database. This stored procedure requires sysadmin privilege in order to execute it.

(click to enlarge)














Once it is enabled at the database level, the Capture Change Data option must be enabled for each table for which changes should be captured. Use the sys.sp_cdc_enable_table stored procedure to enable the feature, passing the table name as an argument in order to enable the feature. By default, all of the columns in the source table are identified as captured columns. If only a subset of columns need to be tracked, such as for privacy or performance reasons, use the @captured_column_list parameter to specify the subset of columns.

(click to enlarge)




















Once Change Data Capture (CDC) is enabled for a table, several things occur. First, an associated capture instance is created to support the collection of the change data in the source table. The capture instance consists of a change table and up to two query functions.

All inserts, updates, and deletes to the source table are stored in the change table. The two query functions are specific to the source table, and can be used to fetch changes from the change table.

The first five columns of a change data capture change table are metadata columns. These provide additional information that is relevant to the recorded change. The remaining columns mirror the identified captured columns from the source table in name. These columns hold the captured column data that is gathered from the source table.

Each insert or delete operation that is applied to a source table appears as a single row within the change table. The data columns of the row that results from an insert operation contain the column values after the insert. The data columns of the row that results from a delete operation contain the column values before the delete. An update operation requires one row entry to identify the column values before the update, and a second row entry to identify the column values after the update.

In additon to the Change instance, two SQL Agent Jobs are created: one that is used to populate the database change tables, and one that is responsible for change table cleanup. Both jobs consist of a single step that runs a Transact-SQL command. The Transact-SQL command that is invoked is a change data capture defined stored procedure that implements the logic of the job.

(click to enlarge)






















The jobs are created when the first table of the database is enabled for change data capture. The Cleanup Job is always created. The capture job will only be created if there are no existing transactional publications for the database.

Using Change Capture Data


Change data is made available to change data capture prcoesses via the two functions that are created when the change instance is created. Both functions require Log Sequence Numbers (LSNs) to help define the date range of change data to return in the resultset.


The function cdc.fn_cdc_get_all_changes_ returns all changes that occurred for the specified interval. Entries are always returned sorted, first by the transaction commit LSN of the change, and then by a value that sequences the change within its transaction.

The function cdc.fn_cdc_get_net_changes_ returns one change per modified source table row. If more than one change is logged for the row during the specified time period, the column values will reflect the final contents of the row.

Several functions are provided to help determine appropriate LSN values for use in querying the change data capture functions. For example, the functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time are available to help place LSN values on a conventional timeline.

To illustrate, insert a single record into sample table created earlier, then update the columns.




































Now we can use the cdc.fn_cdc_get_all_changes to set how the change table is populated:

















Note that the resulting change table contains four records, with the source table columns populated with the table data surrrounding the change. The first record reflects the initial insert. The 2nd and 3rd records reflect the update (a delete followed by new insert), while the fourth record reflects the final delete.
Note that running the cdc.fn_cdc_get_net_changes returns only a single row, which represents the newest version of any modified row. This procedure will only return one change table record per source record changed, regardless of how many changes were made to the source record.



























0 comments: