SCCM Replication Issue – SQL Replication Troubleshooting Guide

SCCM database replication issues are common when you have an SCCM hierarchy with CAS, primary, or secondary servers. Let’s look at the troubleshooting steps an SCCM admin can perform. SCCM replication issues are not easy to troubleshoot via forums or offline. The SCCM Replication Issue troubleshooting video above will help you fix some of the common SCCM replication issues.

SCCM SQL Replication?

SCCM SQL replication? Or is it SCCM SQL-based replication? Yes, Umair mentioned in his post that it’s NOT SQL replication; it’s SQL-based replication. I agree: it’s not SQL replication, it’s the Data Replication Service (DRS) introduced in SCCM 2012. So why am I still calling it SQL replication? Because most SCCM admins don’t care whether it is SQL replication or SQL-based replication. For them (including me) it’s just SQL replication 🙂

I can’t write or talk about SQL Service Broker (SSB), Change Tracking, or the bulk copy program (BCP) because I don’t know anything about those SQL technologies. Let’s dive into some scenarios of SCCM database replication issues. I would recommend reading Umair’s blog (mentioned above) to get more details about SCCM SQL replication.

SCCM Replication Issue?

An SCCM replication issue is critical most of the time because it can put the SCCM infrastructure into read-only mode. Think about a scenario where you can view the objects in the console but can’t take any action. What will you do?

When you hit an SCCM replication issue, you will see this kind of scenario on the child primary server. The SCCM CAS server will be in maintenance mode, waiting for the primary server to send data. This type of SCCM SQL replication issue can occur during SCCM in-place upgrade scenarios as well.

SCCM Replication Groups

You can get more details about SCCM replication groups from my previous post, “List of all Replication Groups and Article Names.” I have another post that covers SCCM SQL-based replication in detail. I would recommend reading it to learn more about replication groups and article names, with examples – SCCM SQL Based Replication Guide.

What is SCCM Replication Link Analyzer (RLA)

The SCCM Replication Link Analyzer (RLA) is the first tool you should try when you need to fix an SCCM replication issue. You can launch and use the Replication Link Analyzer as follows:

  1. Open the Monitoring workspace in the SCCM console
  2. Click the Database Replication node
  3. Right-click the link that is having a problem
  4. Select Replication Link Analyzer
  5. It may ask for a username and password to connect to the destination server if your user doesn’t have access to the destination server database
  6. Replication Link Analyzer runs all of its pre-configured checks and reports whether each one passed
  7. Replication Link Analyzer will try to resolve the SCCM replication issue by itself and provide you with a handy report
  8. If Replication Link Analyzer doesn’t resolve the issue, proceed with the next step in troubleshooting

SQL Management Studio – SCCM Replication Troubleshooting

You can use SQL Server Management Studio to perform the next level of troubleshooting. It helps you understand SQL backlog issues and shows the SCCM replication status of the servers in your environment, for example:

  1. CAS is in maintenance mode
  2. CAS is in Active mode
  3. Primary is in maintenance mode
  4. Primary is in Active mode

You can run “spDiagDRS” from SQL Server Management Studio to get more details about the SCCM replication issue. I have covered SCCM SQL backlog issues in the post “Troubleshoot SCCM SQL Backlog Issue,” and I would recommend reading it for more details. That post applies to SCCM CB versions (1802, 1806, or 1810) as well.
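If you prefer the command line to SQL Server Management Studio, the same stored procedure can be run through sqlcmd. The Python sketch below only illustrates that idea: the server name and site code are hypothetical, and the CM_<SiteCode> database name is an assumption based on the default naming, so substitute the values from your own environment. It builds the command separately from running it so you can review the command first.

```python
import subprocess


def build_spdiagdrs_command(sql_server: str, site_code: str) -> list[str]:
    """Return a sqlcmd argument list that runs spDiagDRS on the site database."""
    database = f"CM_{site_code}"  # assumption: default site database naming
    return [
        "sqlcmd",
        "-S", sql_server,         # SQL Server instance that hosts the site database
        "-d", database,           # site database to run the procedure in
        "-E",                     # Windows (trusted) authentication
        "-Q", "EXEC spDiagDRS",   # run the statement, then exit
    ]


def run_spdiagdrs(sql_server: str, site_code: str) -> str:
    """Run spDiagDRS through sqlcmd and return the raw text output."""
    result = subprocess.run(
        build_spdiagdrs_command(sql_server, site_code),
        capture_output=True,
        text=True,
        check=True,  # raise if sqlcmd exits with an error code
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical server name and site code, for illustration only.
    command = build_spdiagdrs_command("CAS-SQL01", "CAS")
    print(subprocess.list2cmdline(command))
```

Calling run_spdiagdrs requires sqlcmd to be installed and an account with access to the site database; the output is the same result set you would see in Management Studio, just as plain text.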

Monitoring Workspace – Database Replication Node

You can get more details about the replication groups or data that are failing to replicate from the Database Replication node in the SCCM console. The following views in the console give more information about the flow for each replication group.

  1. Detailed view of the “Initialization Detail” in the console
  2. Detailed view of “Replication Detail” in the console

SCCM Replication Issue Troubleshooting with Logs

The following logs are useful when troubleshooting an SCCM replication issue.

  1. Rcmctrl.log – Records the activities of SQL database replication between SCCM sites in the hierarchy
  2. Sender.log – Records the activities in case of manual sync of replication groups. This manual replication can be done as part of troubleshooting with .PUB files.
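Rcmctrl.log can grow large, so a small script can help pull out the lines worth reading first. The sketch below is a minimal Python example that filters log lines by keyword. The keyword list and the file location are assumptions for illustration; the script treats the log as plain text and does not parse individual fields.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: illustrative keywords only; adjust them to what your own log shows.
KEYWORDS = ("error", "failed", "maintenance", "replicationactive")


def interesting_lines(
    lines: Iterable[str], keywords: Iterable[str] = KEYWORDS
) -> Iterator[str]:
    """Yield the lines that contain any keyword, compared case-insensitively."""
    lowered = [keyword.lower() for keyword in keywords]
    for line in lines:
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            yield line.rstrip("\n")


def scan_log(path: Path) -> list[str]:
    """Read a log file as text and return the matching lines."""
    # errors="replace" keeps the scan going if the file contains odd bytes.
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return list(interesting_lines(handle))


if __name__ == "__main__":
    # Hypothetical location: a copy of the log in the current folder.
    log_path = Path("rcmctrl.log")
    if log_path.exists():
        for hit in scan_log(log_path):
            print(hit)
```

Working on a copy of the log rather than the live file avoids any contention with the site server while it is writing.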

Back in SCCM 2007, admins who needed to perform a manual sync between sites in the hierarchy used .SHA files. I would recommend reading Sudheesh’s blog about manual SQL-based replication with SCCM .PUB files. More details –


Microsoft Documentation on SCCM DRS Troubleshooting –

SCCM DRS Process documentation  – Database Replication Links

More Details about SSB – SQL Team Article

3 thoughts on “SCCM Replication Issue – SQL Replication Troubleshooting Guide”

  1. Hi Anoop
    As you mentioned above, SCCM can go into read-only mode. Second, the SCCM CAS server will be in maintenance mode, waiting for the primary server to send data. Please help me with both cases; I am facing these issues. My request is that you write a blog post specifically on these issues.

    1. Thanks for your reply. I have gone through your blog related to this issue. But what I am asking is: if there’s a link failed issue, why does the site go into read-only mode? I want to know the root cause of this particular issue and what the solution could be. There are only 3 link statuses: link active, degraded, and failed. But I have seen the link status as unknown.
      This blog also gives some reasons behind this issue, but nothing specific. And most importantly, there are no particular solutions for this issue…

      [Looking at the ReplicationLinkAnalysis.XML file placed on the desktop we have more info about the testing done by the replication link analyzer including where it failed, namely during the DoesBrokerConfigurationExist on P01.

      Now we know our issue is related to the SQL Broker Service, so we should check out the error log for SQL to see what it says, that error log is located at D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\ErrorLog and that revealed SQL error 3961 which is:

      Snapshot isolation transaction failed in database ‘%.*ls’ because the object accessed by the statement has been modified by a DDL statement in another concurrent transaction since the start of this transaction. It is disallowed because the metadata is not versioned.

      Ok at this point it looks like the SQL database on P01 is confused, probably due to server load at the time of booting up the server, and due to the fact that the SQL server itself was sitting on the same PID since the 4th of July along with more than two weeks of the virtual machine being in a saved state. I could wait some more or try restarting the SMS Executive service or a reboot of the server. As this is my LAB I had enough of the troubleshooting and did a reboot.

      After the reboot (of P01) I ran the Replication Link Analyzer again (on CAS) and this time the SQL Broker Service on P01 was working however the site was still replicating, a few minutes later in rcmctrl.log I could see that the Current Site Status had finally changed to ReplicationActive.]
      Rebooting may be a part of every solution, but there should be more ways to solve this.
