MongoDB is ideal for structured logging

 
By now, the benefits of structured logging (also known as semantic logging) are well known and accepted:

  • It’s flexible
    Logs change over time – data is added and removed. If new data is added in mid-July I’d still like to be able to query the whole year’s data, January to December, without having to jump through formatting hoops – if my log changes regularly then I’d really prefer not to have to write or update a custom parser to query it every time a change is made.

  • It’s machine and human readable
    Most ‘standard’ logs are simple comma- or tab-delimited rows of data, primarily intended to be human readable. Many tools to read these logs are flakey and difficult to use, especially when the formatting changes. Ideally, a log should be both human readable to allow quick browsing and machine readable for ad-hoc queries using a standard tool.

JSON to the rescue

Unsurprisingly, the de facto standard format for structured logging is JSON. As a universal file format it is supported in almost every active programming language and many standalone tools exist for reading and searching JSON-formatted logs.

Compare simple log file entries that consist of a date and time, a type and an IP address:

2013/01/22 09:15:23,"login","50.61.75.3"
2013/01/22 09:19:11,"logout","50.61.75.3"

Expressed as JSON this might be:

{
"date_and_time":"2013/01/22 09:15:23",
"type":"login",
"ip":"50.61.75.3"
}
{
"date_and_time":"2013/01/22 09:19:11",
"type":"logout",
"ip":"50.61.75.3"
}

Note: Each of the JSON log entries would usually only take a single line; I have only shown them this way for readability.

The general idea with structured logging is that we preserve the rich data fields as we pass them to the logging tool rather than converting them to a simple string. Put simply, structured loggers log objects, not strings. This improves readability and makes querying the data significantly easier and more powerful.
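To make "log objects, not strings" concrete, here is a minimal sketch of a structured logger in plain JavaScript. The names createLogger and write are hypothetical and only for illustration; a real logging library will offer more, but the principle is the same: the caller hands over an object and the logger serialises it as one JSON document per line.

```javascript
// Hypothetical helper: returns a log function that accepts an object
// and passes one line of JSON to the supplied write callback.
function createLogger(write) {
  return function log(fields) {
    write(JSON.stringify(fields) + "\n");
  };
}

// For the example, collect the output in an array instead of a file.
const lines = [];
const log = createLogger((line) => lines.push(line));

// The entry keeps its named fields; nothing is flattened into a string by hand.
log({ date_and_time: "2013/01/22 09:15:23", type: "login", ip: "50.61.75.3" });

// Reading it back needs no custom parser.
const entry = JSON.parse(lines[0]);
console.log(entry.type, entry.ip);
```

Because every line is valid JSON, any standard tool can read the fields back by name, even if new fields are added later.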

The benefits of logging to MongoDB

Assuming you’re ready to take advantage of structured logging, a great option is to log directly to a MongoDB collection.
The benefits are significant:

  • Simplified centralisation
    Multiple apps can easily share a MongoDB database and write to one or more collections. The database may be local or remote and, as it’s a regular MongoDB datasource, it can easily and dynamically be replicated or even horizontally scaled (sharded).

  • MongoDB speaks JSON and stores it as binary
    One disadvantage of creating structured logs is that the metadata saved with them inevitably means JSON formatted logs are larger than plain text logs. MongoDB won’t resolve this problem (if it is a problem for you) but it can reduce the impact. Documents stored in a collection are held in Binary JSON (BSON) format, a binary encoding of JSON. BSON is not itself compressed and is not always smaller than the equivalent JSON text, but the WiredTiger storage engine (the default in current MongoDB releases) compresses collection data on disk. A simple insert statement is all that it takes to log a complex object (including arrays, embedded documents and much more). For example, our previous log entry could be inserted by simply wrapping the JSON in an insert statement:

    db.mylog.insert({
    "date_and_time":"2013/01/22 09:19:11",
    "type":"logout",
    "ip":"50.61.75.3"
    })

    The MongoDB console provides a simple, JavaScript-based interface for querying this data. For example, to retrieve the most recent 10 ‘logout’ entries:

    db.mylog.find({"type":"logout"}).sort({"date_and_time":-1}).limit(10)

  • MongoDB is schema-less
    MongoDB collections are schema-less; that is, they don’t have a predefined structure in the way a relational database table does. This means that it is very simple to alter what we log at a later date without causing problems to existing queries. For example, after many months of logging we realise that we are not capturing everything we need – we want to add two new fields: ‘browser’ and ‘browser_version’. All we need do is change our insert to include the new fields:

    db.mylog.insert({
    "date_and_time":"2013/01/22 09:19:11",
    "type":"logout",
    "ip":"50.61.75.3",
    "browser":"IE",
    "browser_version":"10.0.1"
    })

    and to retrieve all log entries with browser set to “IE”:

    db.mylog.find({"browser":"IE"})

    which, in this case, would return only the one document that matched – earlier documents that don’t contain the browser property are ignored.
    No need to change the reporting scripts to cope with missing data.

  • MongoDB has a rich query language that understands JSON
    As we have seen, the MongoDB console has a rich query language that lets us query our logs with a JSON-like syntax. Support exists for filtering (including embedded arrays and documents), sorting, complex aggregation and much more. Logs can finally become rich sources of information without having to write bespoke code.

  • Writes are fast and may be asynchronous
    Writes to collections can be asynchronous – just fire and forget (an unacknowledged write, at the cost of not being told if a write fails). This means that, in many circumstances, inserting entries into the log can be almost as fast as writing directly to disk.

  • Capped collections may be limited by space and, optionally, by number of documents
    A capped collection is a collection of documents that is limited to a specific size (specified in bytes) and, optionally, to a maximum number of documents.
    When the size or document count is going to be exceeded the oldest documents are removed from the collection to make way for the newest. This makes it easy to, for example, only store the last few thousand log entries or set the maximum size to 5 MB and never worry about runaway code filling up a production hard drive.
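As a rough sketch of the capped collection described above, the options object passed to db.createCollection() might be built like this. The helper name cappedLogOptions and the limit of 5000 documents are assumptions for illustration; the option names capped, size and max are the ones db.createCollection() accepts, and size is always required for a capped collection.

```javascript
// 5 MB expressed in bytes, matching the example limit in the text.
const FIVE_MB = 5 * 1024 * 1024;

// Hypothetical helper: builds the options for a capped collection.
// size (bytes) is mandatory; max (document count) is optional.
function cappedLogOptions(sizeInBytes, maxDocuments) {
  const options = { capped: true, size: sizeInBytes };
  if (maxDocuments !== undefined) {
    options.max = maxDocuments;
  }
  return options;
}

// Assumed limit of 5000 entries, standing in for "the last few thousand".
const options = cappedLogOptions(FIVE_MB, 5000);
console.log(JSON.stringify(options));

// In the mongo shell (collection name "mylog" as in the earlier examples):
//   db.createCollection("mylog", options)
```

Whichever limit is reached first applies, so the collection never grows beyond the byte size even if the document count has not been hit.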

In short, MongoDB is a great solution for storing and retrieving structured logs.

In a future post I’ll cover in more detail how to take advantage of capped collections and maximise throughput when logging to MongoDB.



DalekJS: a cross-platform, JavaScript-based alternative to Selenium

 

DalekJS is a free and open source user interface (UI) testing tool written in JavaScript and running in NodeJS though, like most Node tools, this is transparent once the installation is complete (which itself takes only a few minutes). Created by Sebastian Golasch as a response to the tortuous install process and maintenance nightmare that is Selenium-based testing, this is a tool written by developers to solve developers’ real-world web testing problems.

The current release is version 0.0.1 and described by Sebastian as “buggy as hell & not ready for production yet” though it is already feature-rich and definitely usable. Simple JavaScript-based test scripts can easily check page properties such as title, dimensions, etc. as well as perform actions such as clicking links and buttons and filling forms.

DalekJS might just be the UI testing tool that web developers have been waiting for. Watch the following 15-minute video for a quick overview of just what Dalek can do and how easily it can do it.



Ditching Lorem Ipsum

 
If you’ve ever created wireframes or mockups for a web app it’s likely you’ve encountered the problem of what to show in place of text content that hasn’t been written yet. The most common solution is to use a 2,000-year-old piece of Latin text known as “Lorem Ipsum” (or one of the many hundreds of minor variations on it). The benefits of this text are that it has a more or less normal distribution of letters (that is, it’s reasonably similar to English) so it won’t distort the appearance of your app in the same way that simply pasting in repeated dummy text such as “This is test text. This is test text.” will. In addition, for most people, it isn’t readable. As such, it reduces the distraction of having irrelevant, but understandable, text on the page.

That’s the intention anyway. Despite this, there are still those who will be distracted by it and focus more on what it signifies (or doesn’t) than on your design. This leads to an obvious conclusion: rather than using “Lorem Ipsum”, why not display our wireframe’s dummy text in a font that has the shape of regular English sentences but is genuinely unreadable and, as such, never confused for real copy?

That is the role of Blokk. Available free in both TTF and web font versions, Blokk is a font made up of dashes. Cut and paste a few paragraphs of text from a Gutenberg book or an article from CNN, specify the font as Blokk, and you’re done – a simple, effective, modern and distraction-free alternative to “Lorem Ipsum”.

The first paragraph of this post, rendered in Blokk Neue.
A sample paragraph of text, rendered in Blokk Neue; the font was coloured grey to further reduce the visual impact.




Why your jsFiddle no longer works (and how to fix it)

 
If you’re unfamiliar with jsFiddle I recommend you check it out – it’s a great, free playground and an essential tool for web developers. So essential that, over the last few years, it has become almost mandatory for JavaScript developers to create a jsFiddle to collaborate on problems with remote colleagues, to offer a library with working, interactive examples, to test proof-of-concepts or to troubleshoot errant code. I’ve lost count of the number of requests for help on Stack Overflow whose first reply is “jsFiddle or it didn’t happen”.

Combining jsFiddle’s interactivity with GitHub’s hosting and version control can greatly increase uptake of your public project. By linking your jsFiddle to GitHub’s raw output you can offer a no cost, no download, no fuss intro to your code. Instead of asking fellow developers to download and set up your GitHub hosted library you can provide working code examples that instantly show how your code should be used and whether it does what they expect it to in the way they expect it to do it. Or rather, that used to be the case. Recent browser changes, particularly to Google Chrome, mean that many existing jsFiddles no longer work. There’s no obvious error but no output appears in the Result.


The problem

The problem is simple enough – the external JavaScript libraries we are loading from GitHub are no longer being executed. To see why, we need to take a step back. Let’s assume we have a library in GitHub that we want to use in jsFiddle. To do so, we add the URL of the raw GitHub source in the External Resources section of jsFiddle. This has been common practice for years and used by pretty much everyone because it was easy, convenient and, best of all, it worked.

Adding an external resource to a jsFiddle. Here, the jlist-min.js library is being loaded directly from raw.github.com.

 

Unfortunately, it worked despite the fact that it shouldn’t have. Files served from raw.github.com are served with a content-type of text/plain. This didn’t use to matter but browsers such as Google Chrome have tightened up security and will now refuse to execute a script served as text/plain when strict MIME type checking is in effect (which it is for raw.github.com, because its responses carry an X-Content-Type-Options: nosniff header); they expect a JavaScript content-type such as application/javascript. If we look at the console in Chrome Dev Tools while in a jsFiddle that links to raw.github.com we can see the problem:

The error that appears in the Google Chrome console when we link to raw.github.com.

What was once acceptable (and very common) practice no longer works.

The solution

Luckily the solution is both simple and less wordy than the problem.
There is a 3rd-party service which will proxy the file you need from raw.github.com and change the content-type before your browser receives it.
To make use of this service, all you need do is remove the full stop (period) from between raw and github in the offending URL.
That is

http://raw.github.com/rest_of_the_path

becomes

http://rawgithub.com/rest_of_the_path

 

and your jsFiddle is working again.
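If you have a lot of fiddles or links to update, the same substitution can be scripted. The function below is a minimal sketch (the name toProxiedUrl is hypothetical) that only rewrites the host as described above; it assumes the rest of the path can be passed through unchanged and leaves any URL that doesn’t point at raw.github.com alone.

```javascript
// Hypothetical helper: swaps the raw.github.com host for rawgithub.com,
// keeping the scheme and the rest of the path exactly as they were.
function toProxiedUrl(url) {
  return url.replace(/^(https?:\/\/)raw\.github\.com\//, "$1rawgithub.com/");
}

console.log(toProxiedUrl("http://raw.github.com/rest_of_the_path"));
console.log(toProxiedUrl("http://example.com/rest_of_the_path"));
```

Anchoring the pattern at the start of the URL avoids accidentally rewriting a raw.github.com string that appears later in a path or query string.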

 
