
January 20, 2020

Restoring Mongo Data Using the Oplog

by Mike Ryan

Everyone who works with data worries about accidental deletion and should have a good understanding of recovery strategies, ideally before an incident occurs. If you're using Mongo, you can restore data to an arbitrary point in time using the oplog, provided you are using replication and have a recent backup.

This blog post will walk you through a simple data deletion and recovery scenario using this method, and should be accessible even if you have limited experience with Mongo.

Background

When using replication, the primary Mongo instance receives all write commands. It updates its data and logs the changes to the oplog. Secondary instances read from the oplog and perform the same operations against their own data to bring it to the same state as the primary. Mongo's utility for restoring data from a backup, mongorestore, can replay the oplog to bring the backed-up data to its precise state just before a specific time, a process sometimes referred to as 'point in time recovery'. It can do this so long as the oplog covers the entire window between the creation of the backup and the time of the desired state.
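
In practice, that recovery boils down to two mongorestore invocations, run from the directories holding the data dump and the oplog dump respectively. Here is their shape, with placeholders the tutorial below will fill in:

$ mongorestore
$ mongorestore --oplogReplay --oplogLimit=SECONDS:ORDINAL

The --oplogLimit flag replays operations up to, but not including, the given timestamp.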

Tutorial

For this tutorial, we'll first get an instance of Mongo set up that records an oplog. We'll then add and delete some data, taking backups of the data and the oplog along the way. We'll then restore the backup to a fresh Mongo instance and replay the oplog. We'll see that the oplog replay did everything it was supposed to do and nothing it wasn't supposed to do, and that the resulting data is in the same state as the original data right before the deletion.

I've tried to explain key concepts for each step so the work you do here either builds or reinforces your understanding of Mongo.

Engineers who didn't have much experience with Mongo and went through the installation process reported that this tutorial took around an hour.

Strictly speaking, you don't need to replay the oplog against a backup restored to a new Mongo instance: you could skip steps 4, 10 and 11 below and replay the oplog against the original instance. In a production scenario, though, there are good reasons to restore to a temporary, non-public instance rather than the live one first, so I thought it better to build the tutorial around a restored backup.

This tutorial is written for people who don't yet have Mongo installed. If you already have Mongo installed, you may see a way to adapt the tutorial for the specifics of your setup. You should also be able to go through it by setting up a second Mongo instance. This guide explains how to get two instances running simultaneously. If you do that, you will either need to pass port parameters to mongodump and mongorestore if you're running the new instance on a nonstandard port, or stop your existing instance and run the new instance on the standard port (27017).
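
For example, if your second instance were listening on port 27018, the dump and restore commands used later in this tutorial would each need the port flag, along these lines:

$ mongodump --port=27018
$ mongorestore --port=27018

Both tools assume the standard port 27017 when the flag is omitted.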

1. Set up Mongo

The first step is to get a Mongo instance running that records an oplog. I used this reference to install and start the macOS version. The page also contains instructions for installations on Linux and Windows.

The oplog serves as a reference for replicas, and without replication enabled Mongo will not record it. Replication is not enabled by default, so we need to configure the instance we are running to use it. You will need to modify your mongod.conf file, whose location depends on your installation. (On my Mac, the location was /usr/local/etc/mongod.conf.) Add these lines to the end:

replication:
  replSetName: rs0

This configures our instance to run in a replica set named 'rs0'. (See here for Mongo's documentation on this configuration option.)

Next, start mongod, ensuring that it uses the mongod.conf file. The specifics of this will depend on your installation, but on my Mac the command was:

$ mongod --config /usr/local/etc/mongod.conf --fork

The presence of replication configuration causes our instance to start in an unknown state. Many commands require the instance to have an established state, so attempts to run show dbs, or to read or write data, will fail until the replica set has been initiated.

To do this, open the Mongo shell by typing 'mongo' and run this command:

rs.initiate({ _id: 'rs0', members: [{ _id: 1, host: '127.0.0.1:27017' }] })

This initiates our replica set with our instance as its only member. (See here for Mongo's documentation on rs.initiate). Being the only member, it will be the primary and be able to perform all operations on data.

(If you see a prompt like rs0:SECONDARY> in your shell after running this command, that makes it seem like the instance is a secondary. This should only be momentary, and if you hit 'enter' you should see a rs0:PRIMARY> prompt confirming it is the primary.)
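
For a more direct confirmation than the shell prompt, rs.status() reports the state of each member of the replica set; for our single-member set, this should print 'PRIMARY' once the election completes:

rs.status().members[0].stateStr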

Our instance should now be fully operational, and should be recording an oplog. To verify this, type these commands from the Mongo shell:

use local

show collections

You should see an oplog.rs collection.

2. Add and update some data

Our next step will be to perform some data operations so that our backup is of something substantial.

Run these commands from the Mongo shell:

use pets

db.cats.insertMany([
  { "name": "Moonshine", "toys": [] },
  { "name": "Marbles", "toys": [] },
  { "name": "Tibby", "toys": [] },
  { "name": "Dusty", "toys": [] }
])

db.cats.updateOne({ name: 'Marbles' }, { $addToSet: { toys: 'string' } })
db.cats.updateOne({ name: 'Tibby' }, { $addToSet: { toys: 'tin foil' } })

To verify the data, run this from the Mongo shell:

db.cats.find().pretty()

3. Look at the oplog

To take a look at the oplog, run these commands from the Mongo shell:

use local

db.oplog.rs.find().pretty()

This will show all the operations that have occurred on the data. You should see some entries that clearly correspond to insertions and updates to the data, as well as a lot of operations performed by the system.

The operations on our data set will appear at or near the very end of the oplog, and will have a ns (namespace) property of 'pets.cats'. You can use the query db.oplog.rs.find({ ns: 'pets.cats' }).pretty() to limit the results to operations that just affected our data, or db.oplog.rs.find({ ns: { $ne: 'pets.cats' } }).pretty() to find the system operations.

This was my insertion record for Marbles:

{
	"ts" : Timestamp(1578003542, 3),
	"t" : NumberLong(1),
	"h" : NumberLong(0),
	"v" : 2,
	"op" : "i",
	"ns" : "pets.cats",
	"ui" : UUID("dc517859-1fe2-4e69-9726-173cbf50dd50"),
	"wall" : ISODate("2020-01-02T22:19:02.555Z"),
	"o" : {
		"_id" : ObjectId("5e0e6c5697a8f9714e80ff7f"),
		"name" : "Marbles",
		"toys" : [ ]
	}
}

This was my update operation:

{
	"ts" : Timestamp(1578003550, 1),
	"t" : NumberLong(1),
	"h" : NumberLong(0),
	"v" : 2,
	"op" : "u",
	"ns" : "pets.cats",
	"ui" : UUID("dc517859-1fe2-4e69-9726-173cbf50dd50"),
	"o2" : {
		"_id" : ObjectId("5e0e6c5697a8f9714e80ff7f")
	},
	"wall" : ISODate("2020-01-02T22:19:10.750Z"),
	"o" : {
		"$v" : 1,
		"$set" : {
			"toys" : [
				"string"
			]
		}
	}
}

Note that o._id of the insertion entry is the same ObjectId found in o2._id of the update entry. This gives some insight into the way documents are referenced in the oplog and the different structures of oplog entries for different operations. For more information on the oplog, see this post on our blog by Charlie Harris.
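
This shared _id also means you can trace the history of a single document through the oplog. A minimal sketch, run while still on the local database, assuming Marbles exists in pets.cats (inserts carry the document's _id under o._id, updates under o2._id):

var marblesId = db.getSiblingDB('pets').cats.findOne({ name: 'Marbles' })._id
db.oplog.rs.find({ ns: 'pets.cats', $or: [{ 'o._id': marblesId }, { 'o2._id': marblesId }] }).pretty()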

4. Back up the data

We will use mongodump to back up data. We will do this twice: once to back up all the data in its current state and again to back up the oplog in a future state. There will also be two mongorestores: one to bring the data back to its state at the time of the backup, and one that replays the oplog up to the point of the deletion.

I recommend using a separate directory for each mongodump; if the dumps share a directory, a later mongorestore can pick up data from the non-corresponding dump in unexpected ways.

To back up all the data run the following commands from a system prompt in an appropriate directory. You can do this in a new terminal window, or type 'exit' to exit the Mongo shell.

$ mkdir backup
$ cd backup
$ mongodump

This should have created a dump directory. The dump directory contains a subdirectory for each database that was backed up, and each of these contains a subdirectory for each of the database's collections.
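
On my machine, the part of the dump covering our data looked like this (you may also see other databases, such as admin, depending on what exists on your instance):

dump/
  pets/
    cats.bson
    cats.metadata.json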

I recommend navigating out of the backup directory here.

5. Update some more data

These changes will prove that the oplog replay includes changes to the data that occurred after the backup was taken, because they will appear in our restored data.

To add these changes, run these commands from the Mongo shell:

use pets

db.cats.updateOne({ name: 'Moonshine' }, { $addToSet: { toys: 'ball of yarn' } })
db.cats.updateOne({ name: 'Dusty' }, { $addToSet: { toys: 'catnip mouse' } })

Verify that we have four cats, each with one toy, by running this from the shell:

db.cats.find().pretty()

We want to recover the data as it is right before the deletion we do in the next step, so we will be restoring the data to the state it is in right now.

6. Delete some data

From the Mongo shell, run this command:

db.cats.deleteMany({ name: { $in: ['Tibby', 'Marbles'] } })

Verify that we only have two cats, Moonshine and Dusty, each with one toy, by running this command:

db.cats.find().pretty()

7. Update some more data

These changes will prove that the oplog replay does not include any operations occurring after the timestamp we set as the limit, because they will not appear in our restored data.

Run these commands from the Mongo shell:

db.cats.updateOne({ name: 'Moonshine' }, { $addToSet: { toys: 'plastic spring' } })
db.cats.updateOne({ name: 'Dusty' }, { $addToSet: { toys: 'plastic potato' } })
db.cats.updateOne({ name: 'Moonshine' }, { $addToSet: { toys: 'feather bird teaser' } })
db.cats.updateOne({ name: 'Dusty' }, { $addToSet: { toys: 'laser pointer' } })

Run this command to verify that you have just two cats, Moonshine and Dusty, and that each has three toys:

db.cats.find().pretty()

8. Find the critical timestamp

This will be the timestamp of the first deletion operation. From the Mongo shell, run these commands:

use local

db.oplog.rs.find({ op: 'd', ns: 'pets.cats' }).sort({ ts: 1 }).pretty()

In this query, op: 'd' specifies the deletion operation, and ns: 'pets.cats' specifies the 'pets.cats' namespace, which is the collection in which the deletions occurred. In a production scenario, the Mongo instance is likely to have multiple collections, so it would probably be necessary to match on ns to get just the deletions we were interested in. If there were a mix of intentional and unintentional deletion operations on the data in the collection, we could also match on wall (a date field) to narrow things down to the unintentional ones we wanted to undo with the oplog replay.
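
A sketch of such a query, with a hypothetical cutoff date standing in for whenever the incident began:

db.oplog.rs.find({ op: 'd', ns: 'pets.cats', wall: { $gte: ISODate('2020-01-02T22:00:00Z') } }).pretty()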

Back in our tutorial, the query at the start of this step finds the two deletion entries, which illustrate the way the ts field is recorded. I had these values: Timestamp(1578003846, 1) and Timestamp(1578003846, 2).

The first number in a Timestamp gives a Unix time in seconds, and the second number gives the order of the operation within that second. We performed a single command to delete the two documents which was recorded as two different operations in the oplog that occurred in quick succession, within the same second of Unix time. The 1 and 2 indicate which operation occurred first and second.
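
To sanity-check a Timestamp's seconds value against wall-clock time, you can convert it in the Mongo shell; for my first value, this printed an ISODate a few minutes after the update entry we saw in step 3:

new Date(1578003846 * 1000)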

Our critical timestamp is the first timestamp, Timestamp(1578003846, 1). We will need to provide both the 1578003846 and the 1 when replaying the oplog, so it's a good idea to note them down somewhere.

The oplog replay will include all operations occurring up to the critical timestamp, but will not include the operation with the critical timestamp.

9. Back up the oplog

Run these commands from a system prompt, double checking that you are not in the backup directory we created earlier:

$ mkdir oplogBackup
$ cd oplogBackup
$ mongodump -d=local -c=oplog.rs

To break this down, the mongodump command dumps only the oplog.rs collection of the local database, because we need only the oplog from the Mongo instance in its current state to restore our data.
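
The same command with the long-form flags, which you may find more readable:

$ mongodump --db=local --collection=oplog.rs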

This should create a dump directory, containing a local subdirectory that contains a file called oplog.rs.bson and a metadata file.

I recommend backing out of the oplogBackup directory at this point.

10. Get a fresh Mongo instance

This is an admittedly hacky way of getting a fresh Mongo instance, but it's quick and will be fine unless you have Mongo data unrelated to this tutorial that you want to keep.

First, delete and recreate the directory specified by the storage.dbPath property in your mongod.conf file. For me, this was the /usr/local/var/mongodb directory. From the containing directory:

$ rm -rf mongodb
$ mkdir mongodb

This will stop the mongod process, and any Mongo shells you have open will no longer be able to execute commands.
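
If you want to confirm that mongod has indeed stopped before restarting it, pgrep should print nothing (assuming pgrep is available on your system):

$ pgrep mongod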

Next, start mongod again the same way you did before. On my Mac, this was:

$ mongod --config /usr/local/etc/mongod.conf --fork

Finally, initiate the replica set from the Mongo shell as before:

rs.initiate({ _id: 'rs0', members: [{ _id: 1, host: '127.0.0.1:27017' }] })

If you run show dbs from the Mongo shell, you will see that we don't have a pets database and that we are indeed starting fresh.

11. Restore the backup

From the top level of the backup directory we created earlier, run mongorestore. You should see confirmation that our 4 documents were restored successfully.

You can confirm that our data is in the expected state by running these commands from the Mongo shell:

use pets

db.cats.find().pretty()

You should see all 4 documents, and that only Marbles and Tibby have toys because we took the backup before adding toys for Moonshine and Dusty.

12. Replay the oplog

Navigate to the top level of the oplogBackup directory. Earlier, you noted the critical timestamp for the oplog replay. You will need to run a command like this:

$ mongorestore --oplogReplay --oplogLimit=PART1:PART2

where 'PART1' and 'PART2' correspond to the two parts of the critical timestamp.

The command I ran based on my critical timestamp was:

$ mongorestore --oplogReplay --oplogLimit=1578003846:1

From the Mongo shell, run this command to verify that we have four cats, each with one toy, exactly as we did at the end of step 5 above, right before the deletion occurred:

db.cats.find().pretty()

Conclusion

Hopefully you now have a solid understanding of how to use mongorestore to perform a point in time recovery of data by replaying the oplog. We've used this method to recover live data, and I personally felt much more confident with this technique after going through a minimal example like this where the data was simple enough to easily verify that things were working as expected at each step.

Please leave a comment if you see something incorrect or have an idea about how the tutorial could be simplified. Also, if you didn't follow the link back in step 3, now might be a good time to check out Charlie Harris's blog entry on debugging with the oplog.
