By Thomas Rueckstiess, Kernel Program Manager at MongoDB

NOTE: Mtools is a popular tool developed by MongoDB engineers, but, as noted on its GitHub repository, it is not a product of MongoDB or supported as such. Bug reports, feature requests and all other questions can be posted in the Issues section of the mtools GitHub repository.

mtools is a collection of helper scripts, implemented in Python, to parse and filter MongoDB log files (both for mongod and mongos), to visualize information from log files and to quickly set up complex MongoDB test environments on a local machine.

I started working on mtools a year ago, when I realized I would automate and script most of my daily tasks as an Engineer at MongoDB. Since then, mtools has grown to a suite of flexible, useful command line tools that are being used by many of our Engineers internally, as well as MongoDB customers and users, to diagnose the root cause of system issues.

If you find yourself looking at MongoDB log files to identify system and performance issues, then I encourage you to try mtools as well.

What’s in the box?

mtools in its current version 1.1.4 consists of 5 individual scripts: mloginfo, mlogfilter, mplotqueries, mlogvis and mlaunch.

mloginfo should be your first stop on the log file analysis. This script will parse the file quickly and output general information about its contents, including start and end date and time, line numbers, version and whether the file came from a mongos or mongod (if available in the file). In addition, you can request certain “sections” of additional information; currently those are “queries”, “connections”, “restarts” and “distinct”.

mlogfilter helps to narrow down the search in log files. The script lets you filter on attributes of log messages, like their namespace (database and collection names), their type of operation (queries, inserts, updates, commands, etc.) or by individual connection. You can also search for slow operations by setting a threshold, identify collection scans (those are the queries not using an index) and other properties. Additional features include slicing the log files by time (with flexible date/time parsing), merging files, shifting them to different time zones or converting timestamp formats, and exporting them to JSON. The key property of mlogfilter is that the output format always remains the same (log lines), so you can pipe the output to another instance of mlogfilter, to the grep command or to other scripts like mplotqueries.

mplotqueries takes a log file (mlogfiltered or not) and presents the information visually in various ways. There are a number of options for graph types, such as scatter plots (showing all operations over time vs. their duration), histograms, event and range plots, and other more specialized graphs like connection churn or replica set changes. Independent of graph type, you can assign a specific color to different class categories.

mlogvis is mplotqueries’ little brother, it is very similar in its functionality, but provides a web-based alternative using the d3.js javascript visualization library. This is particularly useful if the dependencies required by mplotqueries are not installed/available, or if you want to create a self-contained interactive graph that can be shared with others, such as customers or colleagues. mlogvis will create a single .html file that can be shared, since it loads the d3.js library dynamically.

mlaunch is a little different from the other scripts, and actually has nothing to do with log file parsing. mlaunch spins up any number of mongodb nodes on your local machine, either as a stand-alone, as replica sets or sharded clusters. This is useful if you want to do testing or reproduce issues locally. Rather than setting this up manually, mlaunch will start the processes and connect the replica sets or shards together. Within a few seconds, you can have a complex environment running, like a 5 shard cluster, each shard consisting of a replica set, authentication enabled, and any kinds of individual flags you want to pass onto the processes. mlaunch also has options to start and stop individual instances or groups, and to view which ones are running in the current environment and which ones are down.

How does it work?

Rather than going through all the features of each of the scripts, I’d just like to demonstrate two basic use cases. For a full list of features you can visit the mtools wiki, which contains the manual and many usage examples.

Use Case 1: Profiling your Queries with mloginfo

You have a number of slow queries running against MongoDB that are affecting the performance of the database. To get an idea of where MongoDB is slowing down, as a first step take a look at the “queries” section of mloginfo. Here is an example output, created with the following command:

mloginfo mongod.log --queries <p>QUERIES</p> <p>namespace pattern count min (ms) max (ms) mean (ms) 95%-ile (ms) sum (ms)</p> <p>serverside.scrum_master {"datetime_used": {"$ne": 1}} 20 15753 17083 16434 17039.3 328692 serverside.django_session {"_id": 1} 562 101 1512 317 842.6 178168 serverside.user {"_types": 1, "emails.email": 1} 804 101 1262 201 684.85 162311 local.slaves {"_id": 1, "host": 1, "ns": 1} 131 101 1048 310 819.5 40738 serverside.email_alerts {"_types": 1, "email": 1, "pp_user_id": 1} 13 153 11639 2465 8865.2 32053 serverside.sign_up {"_id": 1} 77 103 843 269 551.0 20761 serverside.user_credits {"_id": 1} 6 204 900 369 763.75 2218 serverside.counters {"_id": 1, "_types": 1} 8 121 500 263 470.6 2111 serverside.auth_sessions {"session_key": 1} 7 111 684 277 645.6 1940 serverside.credit_card {"_id": 1} 5 145 764 368 705.0 1840 serverside.email_alerts {"_types": 1, "request_code": 1} 6 143 459 277 415.0 1663

Each line (from left to right) shows the namespace, the query pattern, and various statistics of this particular namespace/pattern combination. The rows are sorted by the “sum” column, descending. Sorting by sum is a good way to see where the database spent most of its time overall. In this example, we see that around half the total time is spent on $ne -type queries in the serverside.scrum_master collection. $ne queries are known to be inefficient since these queries cannot use an index, resulting in a high number of documents scanned. In fact, all of the queries took at least 15 seconds (“min” column). The “count” column also shows that only 20 of the queries were issued, yet these queries contributed to a large amount of the total time spent, more than double the time of the 804 email queries on serverside.user .

When optimizing queries and indexes, starting from the top of this list is a good idea as these optimizations will result in the highest gains in terms of performance.

Use Case 2: Visualizing Log Files with mplotqueries

Another way of looking at query performance and other operations is to visualize them graphically. mplotqueries’ scatter plot (the default) shows the duration of any operation (y-axis) over time (x-axis) and makes it easy to spot long-running operations. The following plot is generated with

mplotqueries mongod.log

and then press L for “logarithmic” y-axis view:

While most of the operations are sub-second (below the 10^3 ms mark), the blue dots immediately stand out, reaching up to the hundreds and thousands of seconds. Clicking on one of the blue dots prints out the relevant log line to stdout:

Sat Apr 5 22:29:54 [conn99] command serverside.$cmd command: { getlasterror: 1, w: "majority" } ntoreturn:1 keyUpdates:0 reslen:112 1006370ms

The getlasterror command is used for write concern. In this case, it blocked until the write was replicated to a majority of nodes in the replica set, which took 16 minutes. That is of course an issue, and because this is a command and not a query (or the query part of an update), it didn’t show up in the previous use case with mloginfo --queries .

To investigate this further, we can overlay the current plot with an “rsstate” plot, that shows replica set status changes over time. The following two commands create an overlay of the two plots:

grep "majority" mongod.log | mplotqueries --overlay mplotqueries mongod.log --type rsstate

This shows that for each of the blocking “majority” getlasterrors , replica set members are unavailable. The red vertical lines represent a node being DOWN, preceding the yellow lines for a node being in SECONDARY state again, at which point the getlasterror commands finally succeed.

From here, the next step would be to look at all the log files of the replica set at one of the incidents and investigate why the secondaries became unavailable:

mlogfilter mongod.log mongod-sec.log mongod-arb.log --from Apr 5 20:15 --to +5min

This last command merges the log files of the three replica set members by time, each line prefixed with the filename, slices out a 5-minute window at the first instance of the issue and prints the lines back to stdout.

What’s Next?

This should give you a sense of how to use mtools for diagnosing and debugging issues affecting your MongoDB system. You can organize and visualize data in a number of ways, form a hypothesis, filter out noise and dig deeper into issues affecting your deployment, all from MongoDB log files.

mtools contains many more useful features that our Support team uses daily in working through customer cases. The best way to learn how you can leverage these scripts is to download and install mtools and follow some of the examples on the mtools wiki page. mtools is open source and available for download on github. It is also in the PyPI package index and can be installed via pip . If you have any questions, bug reports or feature requests, simply go to the mtools github issues page and open an issue.

My colleague Asya Kamsky (from askasya.com) will show some more examples on how mtools can be useful for diagnosing and troubleshooting in her talk Diagnostics and Debugging at MongoDB World. I’ll be in the “Ask the Experts” sessions, so if you have any questions you can come ask in the Ask the Experts room. You can use my discount code “25ThomasRueckstiess” for 25% off tickets.