MongoDB is ideal for structured logging

By now, the benefits of structured logging (also known as semantic logging) are well known and accepted:

  • It’s flexible
    Logs change over time: data is added and removed. If new data is added in mid-July, I’d still like to be able to query the whole year’s data, January to December, without jumping through formatting hoops. If my log changes regularly, I’d really prefer not to write or update a custom parser every time a change is made.

  • It’s machine and human readable
    Most ‘standard’ logs are simple comma- or tab-delimited rows of data, primarily intended to be human readable. Many tools for reading these logs are flaky and difficult to use, especially when the formatting changes. Ideally, a log should be both human readable, to allow quick browsing, and machine readable, for ad-hoc queries using a standard tool.

JSON to the rescue

Unsurprisingly, the de facto standard format for structured logging is JSON. As a near-universal data format, it is supported in almost every actively used programming language, and many standalone tools exist for reading and searching JSON-formatted logs.
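To give a feel for how little code an ad-hoc query over a JSON log needs, here is a minimal sketch in plain JavaScript. It assumes one JSON object per line; the sample entries and the field names `type` and `browser` are made up for illustration. Note that the first entry lacks the `browser` field and nothing needs to change to cope with that.

```javascript
// Three hypothetical log lines, one JSON object per line. The later
// entries carry an extra `browser` field that the first one lacks.
const logText = [
  '{"date_and_time":"2013/01/22 09:15:23","type":"login"}',
  '{"date_and_time":"2013/07/14 10:02:40","type":"login","browser":"IE"}',
  '{"date_and_time":"2013/07/14 10:30:05","type":"logout","browser":"IE"}',
].join("\n");

// Parse each non-empty line into an ordinary object.
const entries = logText
  .split("\n")
  .filter((line) => line.trim() !== "")
  .map((line) => JSON.parse(line));

// Ad-hoc query: every login, whether or not it has the newer field.
const logins = entries.filter((entry) => entry.type === "login");

console.log(logins.length);
console.log(logins.map((entry) => entry.browser ?? "(not recorded)"));
```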

Consider two simple log file entries that each consist of a date and time, a type and an IP address:

2013/01/22 09:15:23,"login",""
2013/01/22 09:19:11,"logout",""

Expressed as JSON this might be:

{
  "date_and_time":"2013/01/22 09:15:23",
  "type":"login",
  "ip_address":""
}
{
  "date_and_time":"2013/01/22 09:19:11",
  "type":"logout",
  "ip_address":""
}

Note: Each of the JSON log entries would usually take only a single line; I have shown them this way purely for readability.

The general idea with structured logging is that we preserve the rich data fields as we pass them to the logging tool rather than converting them to a simple string. Put simply, structured loggers log objects, not strings. This improves readability and makes querying the data significantly easier and more powerful.
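As a rough illustration of “objects, not strings”, the sketch below logs the same event both ways. The helper names `logAsString` and `logAsJson`, the field names and the IP address are all assumptions of mine for the example; only `JSON.stringify` and `JSON.parse` are standard.

```javascript
// Hypothetical helpers for comparison; not part of any logging library.
function logAsString(event) {
  // Flattening to a delimited string throws the field names away.
  return `${event.date_and_time},"${event.type}","${event.ip_address}"`;
}

function logAsJson(event) {
  // One JSON document per line keeps each field name with its value.
  return JSON.stringify(event);
}

const loginEvent = {
  date_and_time: "2013/01/22 09:15:23",
  type: "login",
  ip_address: "203.0.113.7", // made-up address from a documentation range
};

const line = logAsJson(loginEvent);

// Reading the entry back needs no custom parser.
const parsed = JSON.parse(line);

console.log(logAsString(loginEvent));
console.log(line);
console.log(parsed.type);
```

With the string version, a reader has to know that the second column is the type; with the JSON version the entry describes itself.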

The benefits of logging to MongoDB

Assuming you’re ready to take advantage of structured logging, a great option is to log directly to a MongoDB collection.
The benefits are significant:

  • Simplified centralisation
    Multiple apps can easily share a MongoDB database and write to one or more collections. The database may be local or remote and, as it’s a regular MongoDB datasource, it can easily and dynamically be replicated or even horizontally scaled (sharded).

  • MongoDB speaks JSON and stores it in a compact binary form
    One disadvantage of structured logs is that the metadata saved with every entry inevitably makes JSON-formatted logs larger than plain text logs. MongoDB won’t remove this overhead (if it is a problem for you), but it does store documents efficiently. Documents in a collection are held in Binary JSON (BSON) format, a binary-encoded serialisation of JSON-like documents. BSON is not compressed and is not always smaller than the equivalent JSON text, but it is quick to scan and supports extra types such as dates and binary data. A simple insert statement is all it takes to log a complex object (including arrays, embedded documents and much more). For example, our previous log entry could be inserted by simply wrapping the JSON in an insert statement:

    "date_and_time":"2013/01/22 09:19:11",

    The MongoDB console provides a simple, JavaScript-based interface for querying this data. For example, to retrieve the most recent 10 ‘logout’ entries:


  • MongoDB is schema-less
    MongoDB collections are schema-less; that is, they don’t have a predefined structure in the way a relational database table does. This makes it very simple to alter what we log at a later date without breaking existing queries. For example, after many months of logging we realise that we are not capturing everything we need and want to add two new fields: ‘browser’ and ‘browser_version’. All we need to do is change our insert to include the new fields:

    "date_and_time":"2013/01/22 09:19:11",

    and to retrieve all log entries with browser set to “IE”:


    which, in this case, would return only the one document that matched – earlier documents that don’t contain the browser property are ignored.
    No need to change the reporting scripts to cope with missing data.

  • MongoDB has a rich query language that understands JSON
    As we have seen, the MongoDB console has a rich query language that lets us query our logs with a JSON-like syntax. Support exists for filtering (including on embedded arrays and documents), sorting, complex aggregation and much more. Logs can finally become rich sources of information without having to write bespoke code.

  • Writes are fast and may be asynchronous
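    Writes to collections can be asynchronous: with an unacknowledged write concern, the client does not wait for the server to confirm each write (fire and forget). This means that, in many circumstances, inserting entries into the log can be almost as fast as writing directly to disk, with the trade-off that a failed write may go unnoticed.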

  • Capped collections are limited by space and, optionally, by number of documents
    A capped collection is a fixed-size collection of documents: it is limited to a maximum size (specified in bytes) and, optionally, to a maximum number of documents.
    When a new document would exceed either limit, the oldest documents are removed from the collection to make way for the newest. This makes it easy to, for example, store only the last few thousand log entries, or set the maximum size to 5 MB and never worry about runaway code filling up a production hard drive.
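The shell operations mentioned in the list above might be sketched roughly as follows. This is an illustration rather than a definitive setup: the collection names `logs` and `recent_logs`, the field names `type` and `ip_address`, and the function names are all assumptions. Each function takes the shell’s `db` object as a parameter so the sketch can be pasted into the mongo shell and called as, say, `recentLogouts(db)`. I have used `insertOne`, the current shell helper; older shells offered `insert` for the same job.

```javascript
// Insert one log entry. The document is an ordinary JavaScript object,
// e.g. { date_and_time: "2013/01/22 09:19:11", type: "logout" }.
function logEvent(db, entry) {
  return db.logs.insertOne(entry);
}

// The same insert with an unacknowledged write concern (w: 0): the
// client does not wait for the server to confirm the write.
function logEventUnacknowledged(db, entry) {
  return db.logs.insertOne(entry, { writeConcern: { w: 0 } });
}

// The 10 most recent 'logout' entries, newest first. Sorting on the
// date string works here only because the format is zero-padded and
// year-first, so text order matches time order.
function recentLogouts(db) {
  return db.logs
    .find({ type: "logout" })
    .sort({ date_and_time: -1 })
    .limit(10);
}

// Entries whose browser is "IE". Documents that have no `browser`
// field simply do not match, so older entries are skipped.
function entriesFromIE(db) {
  return db.logs.find({ browser: "IE" });
}

// A capped collection limited to 5 MB and at most 5000 documents.
// `size` is required for capped collections; `max` is optional.
function createCappedLog(db) {
  return db.createCollection("recent_logs", {
    capped: true,
    size: 5 * 1024 * 1024, // bytes
    max: 5000, // documents
  });
}
```

Adding ‘browser’ and ‘browser_version’ later needs no new function: `logEvent` accepts whatever fields the entry object carries.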

In short, MongoDB is a great solution for storing and retrieving structured logs.

In a future post I’ll cover in more detail how to take advantage of capped collections and maximise throughput when logging to MongoDB.

