By: Tim Smith | Updated: 2016-02-02 | Comments (6) | Related: More > Replication

Problem

This tip provides some quick queries to get started when a replication issue arises that may help minimize the amount of troubleshooting time. Consider that to reduce the noise, with some of these queries you will want to apply a time-based filter (depends) while with others, you may just want the status at any given moment (doesn't matter).

Solution

I prefer scripts over interfaces because I'm seldom at the mercy of the speed of an interface (slow) and I can aggregate the output I receive from scripts, making it easy to identify if there's an issue by collecting messages from different sources and comparing them. In these queries, I may have commented out the *, but you can include all columns as there may be other columns worth considering:

Replication Status Queries

The below query provides me with an overview of what the replication monitor provides and I've updated the status column to reflect what Microsoft provides in their documentation as the meaning for each of the status' numbers. With this query, I am paying close attention to the StatedStatus and the last_distsync columns. This does not mean there is not an error, however.

SELECT publisher , publisher_db , publication , agent_name , last_distsync , CASE WHEN status = 1 THEN 'Started' WHEN status = 2 THEN 'Succeeded' WHEN status = 3 THEN 'In progress' WHEN status = 4 THEN 'Idle' WHEN status = 5 THEN 'Retrying' WHEN status = 6 THEN 'Failed' END AS StatedStates --, * FROM distribution..MSreplication_monitordata

Query to Find Replication Errors

One of the most important tables involved in replication issues and errors is the below table. I've ordered the results of the table here by the most recent time, and you can apply a time filter here if you are only concerned with the last few days, hours, etc. We may find a wide variety of errors, such as "Query Timeout Error", "The row was not found at the subscriber", etc. The trouble here is that replication's status may appear to be fine, while you see here that, for an example, a particular publication is skipping data. I once worked with a trader who told me, "I don't care about having no data, but I do care about having bad data." There is nothing worse than thinking something is functioning when it's actually not functioning. The key with this table is looking at the time and the error_text.

SELECT error_text , [time] ,--* FROM distribution..MSrepl_errors ORDER BY [time] DESC

Query to Read the Error Log for Replication Errors

We can also double verify with the error log about replication; be careful using the below query, as my filter might not apply in your environment, and you may want to avoid using these filter strings:

-- parameter one: 0 means obtain the latest error log -- parameter two: 1 tells it to include the agent -- parameter three: find a string with ... -- parameter four: find a string with what's in parameter three and here EXEC sys.sp_readerrorlog 0,1,'Replication','-'

Status Information from Replication Distribution Agents

While the above queries provide useful information for errors, I also want to be able to determine what's happening, especially if something appears wrong. The below query is helpful in providing information regarding the distribution agents:

SELECT a.name PublicationName , a.publication Publication , ditosu.comments AS MessageText , ditosu.[time] CommandDate , ditosu.xact_seqno xact_seqno FROM MSdistribution_agents a INNER JOIN MSpublications p ON a.publisher_db = p.publisher_db AND a.publication = p.publication INNER JOIN MSdistribution_history ditosu ON ditosu.agent_id = a.id -- Apply a filter here can minimize the noise ORDER BY ditosu.[time] DESC

The key that I look for here are the MessageText, which communicate the historic information for the distribution agents. Depending on the amount of publications, you may also see a lot of messages indicating that nothing is available for replication. A word of caution on this table is that Microsoft notes that this may be subject to change in future releases. For quickly identifying a possible problem, I cannot emphasize good naming convention here enough: with good naming convention, you'll be able to easily apply filters, but poor naming convention will result in noise that may translate into some confusion when something is failing; this will also depend on how many subscribers exist per publication.

PowerShell Script to Compare Article Row Counts

Finally, if I'm interested in a particular publication that may seem delayed, the below PowerShell script will quickly ping the information by reading from the RowCount properties provided in the management objects library:

Function QuickCheck-Replication { Param( [ValidateLength(4,30)][string]$source_server , [ValidateLength(4,30)][string]$source_db , [ValidateLength(4,30)][string]$destination_server , [ValidateLength(4,30)][string]$destination_db , [ValidateLength(4,30)][string]$article ) Process { Function Loop-ReplicationTables { Param( [ValidateScript({Test-Path $_})][string]$smolibrary , [ValidateLength(4,30)][string]$articleserver , [ValidateLength(4,30)][string]$articledatabase , [ValidateLength(5,50)][string]$article ) Process { $nl = [Environment]::NewLine Add-Type -Path $smolibrary $srv = New-Object Microsoft.SqlServer.Management.SMO.Server($articleserver) [int]$reptab_rwcnt = $srv.Databases["$articledatabase"].Tables["$article"].RowCount if ($reptab_rwcnt -ne $null) { return $reptab_rwcnt } } } Function LoopSave-TransReplication { Param( [ValidateLength(4,30)][string]$source_server , [ValidateLength(4,30)][string]$source_db , [ValidateLength(4,30)][string]$destination_server , [ValidateLength(4,30)][string]$destination_db , [ValidateLength(4,30)][string]$article ) Process { ### Note that I've hard-coded the SMO library here for a 2014 Instance ### 2014: 120 ### 2012: 110 ### 2008R2: 100 ### 2008: 100 ### 2005: Upgrade (90) $smo = "C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" [int]$src = Loop-ReplicationTables -smolibrary $smo -articleserver $source_server -articledatabase $source_db -article $article [int]$dest = Loop-ReplicationTables -smolibrary $smo -articleserver $destination_server -articledatabase $destination_db -article $article [int]$smosaveddiff = ($src - $dest) $smo_srcob = "$source_server.$article" $smo_dstob = "$destination_db.$article" Write-Host $smosaveddiff } } LoopSave-TransReplication -source_server $source_server -source_db $source_db -destination_server $destination_server -destination_db $destination_db -article $article } } QuickCheck-Replication -$source_server "OURSRCSRV\IN" -$source_db "OurSrcDb" -$destination_server "OURDBSRV\IN" -$destination_db "OurDestDb" -$article "OurArticle"

Two cautions here: this script assumes that the source and destination article matches - it may not. It also has the SMO library built in for the SQL Server version 2014 (120). If you're using another version, like 2012, then switch the 120 to 110. This script can quickly help me check if the data are flowing, without interrupting the flow. You can find many internet debates about using the properties as opposed to a direct query, however, in testing this approach I find .NET's library to be fast and effective while generally providing an accurate estimate. Provided that someone understand this property versus a direct count, it is helpful.

Summary

These queries provide some useful tools depending on the particular issue, though, unfortunately, these queries cannot solve:

Distribution architecture that is not configured for an OLTP load. Processes that have heavy interference with replication, and do little for end users, or that have superior alternatives that don't interfere with replication.

Sometimes the above two items are the fundamental cause and as the old saying goes, "No one raindrop thinks it caused the flood."

Next Steps

Test the above queries in your replication environment.

With some of these queries, become familiar with the messages, as it can appear noisy at first glance (the fourth one especially).

Check out these other Replication tips

About the author