Problem





Solution





There might be more than one Availability Group in a Always On setup and each group might have primary role on different nodes.

When enabling CDC on a database, jobs are only created on the currently active node. This means that if AG fails to another node we won't have the capture and cleanup jobs.

When failing back to a node that already had the cdc enabled databases in the past, the jobs will be there but in a failed state and they have to be started again.

Capture job runs continuously and has to be started every time a failover happens.

Cleanup job runs on schedule, we just have to make sure it is created.

Keeping all the above in mind, I've created a stored procedure that needs to be scheduled on each node that is a part of our Always On setup:





CREATE PROCEDURE sp_cdc_hadr_watchdog AS BEGIN SET NOCOUNT ON DECLARE @database_name NVARCHAR ( 255 ) DECLARE @ SQL NVARCHAR (MAX) DECLARE cdc_watchdog_dbs CURSOR FAST_FORWARD READ_ONLY FOR SELECT D.name FROM sys.dm_hadr_availability_replica_states AS A JOIN sys.availability_replicas AS B ON B.replica_id = A.replica_id JOIN sys.availability_groups AS AG ON AG.group_id = A.group_id JOIN sys.availability_databases_cluster AS ADC ON ADC.group_id = A.group_id JOIN sys. databases AS D ON D.name = ADC.database_name AND D.is_cdc_enabled = 1 WHERE B.replica_server_name = @@SERVERNAME AND A.role_desc = 'PRIMARY' OPEN cdc_watchdog_dbs FETCH NEXT FROM cdc_watchdog_dbs INTO @database_name WHILE @@FETCH_STATUS = 0 BEGIN SET @ SQL = ' USE [' +@database_name+ ']; IF OBJECT_ID(''tempdb..#xp_results'') IS NOT NULL EXEC (''DROP TABLE #xp_results'') IF OBJECT_ID(''tempdb..#CDCJobs'') IS NOT NULL EXEC (''DROP TABLE #CDCJobs'') CREATE TABLE #CDCJobs(job_id UNIQUEIDENTIFIER , job_type CHAR(8) , job_name NVARCHAR(255) , maxtrans SMALLINT , maxscans SMALLINT , continuous BIT , pollinginterval SMALLINT , retention SMALLINT , threshold SMALLINT) INSERT INTO #CDCJobs EXEC sys.sp_cdc_help_jobs DECLARE @job_id_capture UNIQUEIDENTIFIER DECLARE @job_owner_capture sysname DECLARE @job_id_cleanup UNIQUEIDENTIFIER DECLARE @is_running INT SELECT @job_owner_capture = SP.name, @job_id_capture = S.job_id FROM #CDCJobs AS CJ JOIN msdb.dbo.sysjobs AS S ON s.name = CJ.job_name JOIN sys.server_principals AS SP ON SP.sid = S.owner_sid WHERE CJ.job_type = ''capture'' SELECT @job_id_cleanup = CJ.job_id FROM #CDCJobs AS CJ WHERE CJ.job_type = ''cleanup'' IF @job_id_capture IS NULL EXEC sys.sp_cdc_add_job ''capture'' IF @job_id_cleanup IS NULL EXEC sys.sp_cdc_add_job ''cleanup'' IF @job_id_capture IS NOT NULL BEGIN CREATE TABLE #xp_results (job_id UNIQUEIDENTIFIER NOT NULL, last_run_date INT NOT NULL, last_run_time INT NOT NULL, next_run_date INT NOT NULL, next_run_time INT NOT NULL, next_run_schedule_id INT NOT NULL, requested_to_run INT NOT NULL, request_source INT NOT NULL, request_source_id sysname COLLATE database_default NULL, running INT NOT NULL, current_step INT NOT NULL, current_retry_attempt INT NOT NULL, job_state INT NOT NULL) INSERT INTO #xp_results EXEC master.dbo.xp_sqlagent_enum_jobs 0, @job_owner_capture, @job_id_capture SELECT @is_running = XR.running FROM #xp_results AS XR IF @is_running = 1 RETURN ELSE EXEC msdb.dbo.sp_start_job @job_id = @job_id_capture END' EXEC (@ SQL ) FETCH NEXT FROM cdc_watchdog_dbs INTO @database_name END CLOSE cdc_watchdog_dbs DEALLOCATE cdc_watchdog_dbs



It loops through all the databases that have cdc enabled and are a part of Availability Group that has "PRIAMRY" role. For each of those databases it gets the job information from sys.sp_cdc_help_jobs procedure to find out if the jobs is already created. If neither of jobs exist, they will be created by sys. sp_cdc_add_job procedure. Once made sure that capture job exists, procedure needs to check it's current state, to do that I've used dbo.xp_sqlagent_enum_jobs stored procedure - keep in mind that this extended procedure is undocumented and unsupported but was the best way I know to find the current state of a job. If the capture job was found to be running we simply do a RETURN and nothing happens, however if it was not dbo.sp_start_job is used to start it.





Having Change Data Capture enabled on databases that are part of an Availability Group, each time a failover happens (automatically or manually), we have to remember to make sure that the CDC capture and cleanup jobs are created and the capture job is started accordingly on the primary replica. Let's be honest, we are not going to remember (nor we want to) that after doing a system update and doing multiple failovers or after automatic failover we have some manual work to do. If we do forget about it, we most likely going to end up wtih some data loss in our CDC tables and a gap in anything that might rely on that data.Lets first point out few things that we have to keep in mind while dealing with the problem that we are presented: