Discussing Replication Lag in MongoDB

In the previous article we read about deploying replica sets in MongoDB. The deployment process involves some complications that we did not discuss there. In this article we will get acquainted with those complications and try to make them less complicated.
So, in this article we are going to talk about replication lag. As the name suggests, this is the delay that occurs within a replica set. Let us get into the topic:

What is replication lag in MongoDB?

As per mongodb.org, replication lag in MongoDB is the delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments.
In a perfectly running replica set, all secondaries closely follow changes on the primary, fetching each group of operations from its oplog and replaying them approximately as fast as they occur. However, secondaries may fall behind. Sometimes elevated replication lag is transient and will remedy itself without intervention. Other times, replication lag remains high or continues to rise, indicating a systemic problem that needs to be addressed.
Now we know what replication lag is. But why is it so problematic that we have to deal with it to get good performance?

Why does replication lag pose problems?

  • If the replica set fails over to a secondary that is significantly behind the primary, a lot of un-replicated data may remain on the original primary and will need to be manually reconciled.
  • If the failed primary cannot be recovered quickly, we may be forced to run on a node whose data is not up to date, or forced to take down the database altogether until the primary can be recovered.
  • If there is only one secondary, and it falls farther behind than the earliest history retained in the primary’s oplog, the secondary will require a full resynchronization from the primary.
  • During the resync, the cluster will lack the redundancy of a valid secondary; it will not return to high availability until the entire data set has been copied.
  • If backups are only taken from your secondary, backups must be suspended for the duration of the resync.
  • Replication lag makes it more likely that the results of read operations distributed across secondaries will be inconsistent.
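
The resync risk in the list above comes down to comparing the lag with the time span covered by the primary's oplog. The following is a minimal sketch of that comparison, not an official check. The function name catchUpStatus, the status strings and the 0.5 warning ratio are assumptions made for illustration; in the mongo shell, the oplog window in seconds can be read from db.getReplicationInfo().timeDiff.

```javascript
// Classify a secondary by comparing its lag with the primary's oplog window.
// lagSecs:         how far the secondary is behind the primary, in seconds.
// oplogWindowSecs: time span covered by the primary's oplog, in seconds
//                  (in the shell: db.getReplicationInfo().timeDiff).
// warnRatio:       hypothetical threshold (fraction of the window) at which
//                  we start to worry; 0.5 is an arbitrary illustrative default.
function catchUpStatus(lagSecs, oplogWindowSecs, warnRatio) {
  if (warnRatio === undefined) {
    warnRatio = 0.5;
  }
  if (lagSecs >= oplogWindowSecs) {
    // The operations the secondary still needs have already been
    // overwritten in the oplog, so only a full resync can help.
    return "resync-required";
  }
  if (lagSecs >= oplogWindowSecs * warnRatio) {
    return "at-risk";
  }
  return "ok";
}
```

A secondary that is 10 seconds behind a one-hour oplog is fine, while one that is more than an hour behind can no longer catch up from the oplog alone.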

So, now we know that replication lag can pose very serious problems, most of which are recoverable, although in some cases the data may not be recoverable at all. Let us look at the causes of this phenomenon.

Possible causes for replication lag:

Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.
Disk Throughput
If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state.
Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, so that writes do not return while replication cannot keep up with the write load. Use the database profiler to see if there are slow queries or long-running operations that coincide with the periods of lag.
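As a rough illustration of the profiler check described above, the sketch below filters profiler documents down to slow operations that fall inside a known lag window. It assumes the documents come from db.system.profile (for example via db.system.profile.find().toArray()) and carry the ts and millis fields; the function name slowOpsDuringLag and its parameters are made up for this example.

```javascript
// Return the profiled operations that took at least minMillis and whose
// timestamp falls inside the observed lag window [lagStart, lagEnd].
// profileDocs: array of documents shaped like db.system.profile entries,
//              each with a Date in `ts` and a duration in `millis`.
function slowOpsDuringLag(profileDocs, lagStart, lagEnd, minMillis) {
  return profileDocs.filter(function (doc) {
    var when = doc.ts.getTime();
    return doc.millis >= minMillis &&
           when >= lagStart.getTime() &&
           when <= lagEnd.getTime();
  });
}
```

Any operation this returns is only a candidate cause; it still has to be checked against what the secondaries were doing at that time.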
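Write Concern
A large data ingestion or bulk load operation requires a large number of writes to the primary. If those writes do not wait for acknowledgement from the secondaries (for example, when an unacknowledged write concern is used), the secondaries may not be able to read the oplog fast enough to keep up with the changes.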
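One way to keep a bulk load from outrunning the secondaries is to split it into batches and let each batch wait for replication before the next one starts. The sketch below shows only the batching logic. The name loadInBatches and the insertBatch callback are hypothetical; in the mongo shell the callback could wrap db.coll.insertMany(batch, { writeConcern: { w: 2, wtimeout: 5000 } }), where the values 2 and 5000 are illustrative, not recommendations.

```javascript
// Feed `docs` to `insertBatch` in slices of at most `batchSize` documents.
// insertBatch is a caller-supplied function; it is expected to block until
// the batch has been acknowledged, for example:
//   function (batch) {
//     db.coll.insertMany(batch, { writeConcern: { w: 2, wtimeout: 5000 } });
//   }
// Returns the number of batches that were sent.
function loadInBatches(docs, batchSize, insertBatch) {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  var batches = 0;
  for (var i = 0; i < docs.length; i += batchSize) {
    insertBatch(docs.slice(i, i + batchSize));
    batches += 1;
  }
  return batches;
}
```

Because each call waits for its acknowledgement, the load proceeds no faster than the slowest member that the write concern requires.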
Map/reduce output


db.coll.mapReduce( ... { out: other_coll ... })

From the point of view of the oplog, the entire output collection materializes at once. From that point on, replication to the secondary plays out just as in the bulk load case above.
Index
In older MongoDB releases (before 2.6), an index built in the background on the primary is built in the foreground on each secondary. Therefore, whenever a secondary builds such an index, it blocks all other operations, including replication, for the duration. If the index builds quickly, this may not be a problem, but long-running index builds can end up creating replication lag.
Secondary is locked for backup
One of the suggested methods for backing up data in a replica set involves explicitly locking a secondary against changes while the backup is taken. Assuming the primary is still conducting business as usual, replication lag will climb until the backup is complete and the lock is released.
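Since lag keeps growing for as long as the lock is held, it helps to make sure the lock is always released, even when the backup fails. The sketch below wraps the shell helpers db.fsyncLock() and db.fsyncUnlock() in a try/finally. The wrapper name backupWithLock and the runBackup callback are assumptions for illustration; what the callback actually does (a file system snapshot, a file copy) is up to the deployment.

```javascript
// Lock the secondary, run the backup, and always unlock afterwards.
// db:        a shell database handle (or any object exposing
//            fsyncLock() and fsyncUnlock()) connected to the secondary.
// runBackup: caller-supplied function that performs the actual backup.
function backupWithLock(db, runBackup) {
  db.fsyncLock();          // flush pending writes and block new ones
  try {
    return runBackup();
  } finally {
    db.fsyncUnlock();      // let replication resume, even after an error
  }
}
```

In the shell this would be called as backupWithLock(db, myBackupFunction) while connected to the secondary being backed up.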
Secondary is offline
If the secondary is not running, it cannot make progress against the replication backlog. When it rejoins the replica set, the replication lag will naturally reflect the time spent away.
We now know the causes of replication lag, but we still don’t have a perfect solution. There is, however, a way to measure the current replication lag, which is the starting point for any further troubleshooting.

How to check the replication lag?

  • In a mongo shell connected to the primary, call the rs.printSlaveReplicationInfo() method (named rs.printSecondaryReplicationInfo() in newer releases). It returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary. A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the slaveDelay value.
  • We can also read the lag from the replication lag graph in the MongoDB Management Service (MMS).
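
The same figure can also be derived by hand from the output of rs.status(), which lists every member with its state and the time of its last applied operation. The sketch below is one possible way to do that and assumes each member document carries the name, stateStr and optimeDate fields; the function name replicationLagSecs is made up for this example.

```javascript
// Compute the lag of every secondary, in seconds, from an rs.status()-style
// document. Returns an object keyed by member name, or null when the set
// currently has no primary to compare against.
function replicationLagSecs(status) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    return null;
  }
  var lag = {};
  status.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      // Subtracting two Dates yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  });
  return lag;
}
```

In the shell, replicationLagSecs(rs.status()) would give a quick per-member view. As with the built-in helper, the number only says how far apart the last applied operations are, so an idle primary can make a delayed member look fully caught up.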

Although we have discussed replication lag above, all we really have is a means to measure it and to investigate the queries and operations that contribute to it. Because of the laws of physics, we cannot remove replication lag entirely. We can only look for new, efficient methods to minimize it.

Related Links:

1> How to Get Started with MongoDB Database?
2> How to Get Started with MongoDB?
3> How to Import and Export Through Mongodb?
4> How to Use Projection in MongoDB?
5> Using sort method in mongodb
6> Map-Reduce in MongoDB
7> Introduction to Replication in MongoDB
8> Deploying a Replica Set in MongoDB
9> Replica Set Members in Mongodb
10> Working with Sharding in MongoDB
11> Working with Index in MongoDB
12> Working with Aggregation in MongoDB
13> How to Work with Aggregation Framework in MongoDB?
14> Working with Pipeline Concept in MongoDB
15> Discussing about Pipeline Expression in MongoDB
