Website Tracking using the ELK Stack

The ELK stack (Elasticsearch, Logstash and Kibana) is all about capturing data and reporting on it in a meaningful way. Tracking website visitors and traffic comes down to exactly the same thing, so you can easily build a simple website analytics engine on top of it.

I stayed away from libraries as much as possible to make the code easy to replicate and understand. The code is self-explanatory, but if you have any questions, let me know in the comments!

The Setup

First, a simple webpage with our tracking code added:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Analytics with Logstash</title>
</head>
<body>
  <h1>Analytics with Logstash</h1>
  <a href="#top">Click a link!</a>
  <script src="tracking.js"></script>
  <script src="doc_cookies.js"></script>
</body>
</html>

Nothing fancy (or functional!) there. Then the actual tracking code:

function logVisit() {
  var eventData = data('visit');
  logEvent(eventData);
}

function logClick() {
  // `this` is the clicked link, because the handler is attached with addEventListener
  var eventData = data('click');
  eventData.destination = this.getAttribute('href');
  logEvent(eventData);
}

// Send an AJAX payload to log the event
function logEvent(eventData) {
  var request = new XMLHttpRequest();
  request.open('POST', 'http://my.server.com/analytics.php', true);
  request.setRequestHeader('Content-Type', 'application/json; charset=UTF-8');
  request.send(JSON.stringify(eventData));
}

// Get the data you want to log
function data(type) {
  return {
    type: type,
    location: window.location.href,
    referer: document.referrer,
    language: window.navigator.userLanguage || window.navigator.language,
    width: screen.width,
    height: screen.height,
    local_time: new Date(),
    returning: returning()
  };
}

// Utility function to see if it's a returning visitor
function returning() {
  if (docCookies.getItem("EagerELKTracking")) {
    return 1;
  }
  docCookies.setItem("EagerELKTracking", 1, Infinity);
  // First visit: return 0 explicitly, otherwise the field is dropped from the JSON
  return 0;
}

// Kick off the process
document.addEventListener("DOMContentLoaded", logVisit);

// Set up other events
var elements = document.getElementsByTagName("a");
for (var i = 0; i < elements.length; i++) {
  elements[i].addEventListener("click", logClick);
}

The only tricky thing about this is gathering the relevant information. I added some simple properties, but there’s a lot more information that you can add. I’ve also added tracking of link clicks for extra credit.
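As a rough sketch of what that extra information could look like, the helper below collects a few more properties and merges them into an event. The names `extraData` and `mergeData` and the field names are assumptions made for illustration, not part of the tracker above; the browser properties they read (`document.title`, `window.innerWidth`, `window.innerHeight`, `screen.colorDepth` and `Date.prototype.getTimezoneOffset`) are standard.

```javascript
// Hypothetical helper: gathers a few extra properties for the event payload.
// The window and document objects are passed in so the function is easy to test.
function extraData(win, doc) {
  return {
    title: doc.title,
    viewport_width: win.innerWidth,
    viewport_height: win.innerHeight,
    color_depth: win.screen.colorDepth,
    timezone_offset: new Date().getTimezoneOffset()
  };
}

// Hypothetical helper: copies the extra properties onto an existing event object.
function mergeData(eventData, extra) {
  for (var key in extra) {
    if (Object.prototype.hasOwnProperty.call(extra, key)) {
      eventData[key] = extra[key];
    }
  }
  return eventData;
}

// In the tracker this could be used as:
//   var eventData = mergeData(data('visit'), extraData(window, document));
```

Every property you add here simply shows up as another field on the event in Logstash, so there is nothing to change on the server side.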

The following cookie helper (doc_cookies.js) comes from MDN, but anything that can manipulate cookies will work.

// From https://developer.mozilla.org/en-US/docs/Web/API/document.cookie
var docCookies = {
  getItem: function (sKey) {
    if (!sKey) { return null; }
    return decodeURIComponent(document.cookie.replace(new RegExp("(?:(?:^|.*;)\\s*" + encodeURIComponent(sKey).replace(/[\-\.\+\*]/g, "\\$&") + "\\s*\\=\\s*([^;]*).*$)|^.*$"), "$1")) || null;
  },
  setItem: function (sKey, sValue, vEnd, sPath, sDomain, bSecure) {
    if (!sKey || /^(?:expires|max\-age|path|domain|secure)$/i.test(sKey)) { return false; }
    var sExpires = "";
    if (vEnd) {
      switch (vEnd.constructor) {
        case Number:
          sExpires = vEnd === Infinity ? "; expires=Fri, 31 Dec 9999 23:59:59 GMT" : "; max-age=" + vEnd;
          break;
        case String:
          sExpires = "; expires=" + vEnd;
          break;
        case Date:
          sExpires = "; expires=" + vEnd.toUTCString();
          break;
      }
    }
    document.cookie = encodeURIComponent(sKey) + "=" + encodeURIComponent(sValue) + sExpires + (sDomain ? "; domain=" + sDomain : "") + (sPath ? "; path=" + sPath : "") + (bSecure ? "; secure" : "");
    return true;
  },
  removeItem: function (sKey, sPath, sDomain) {
    if (!this.hasItem(sKey)) { return false; }
    document.cookie = encodeURIComponent(sKey) + "=; expires=Thu, 01 Jan 1970 00:00:00 GMT" + (sDomain ? "; domain=" + sDomain : "") + (sPath ? "; path=" + sPath : "");
    return true;
  },
  hasItem: function (sKey) {
    if (!sKey) { return false; }
    return (new RegExp("(?:^|;\\s*)" + encodeURIComponent(sKey).replace(/[\-\.\+\*]/g, "\\$&") + "\\s*\\=")).test(document.cookie);
  },
  keys: function () {
    var aKeys = document.cookie.replace(/((?:^|\s*;)[^\=]+)(?=;|$)|^\s*|\s*(?:\=[^;]*)?(?:\1|$)/g, "").split(/\s*(?:\=[^;]*)?;\s*/);
    for (var nLen = aKeys.length, nIdx = 0; nIdx < nLen; nIdx++) {
      aKeys[nIdx] = decodeURIComponent(aKeys[nIdx]);
    }
    return aKeys;
  }
};

And last but not least, a simple PHP script (analytics.php) to pipe the information to Logstash:

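<?php
// Read the JSON payload sent by the tracking script
$fp = fopen('php://input', 'r');
$input = stream_get_contents($fp);
$message = json_decode($input);

// Reject anything that is not a JSON object
if (!is_object($message)) {
  http_response_code(400);
  die('Invalid payload');
}

// Add the details that are easier to get on the server
$message->ip = array_key_exists('REMOTE_ADDR', $_SERVER) ? $_SERVER['REMOTE_ADDR'] : 'Unknown';
$message->browser = array_key_exists('HTTP_USER_AGENT', $_SERVER) ? $_SERVER['HTTP_USER_AGENT'] : 'Unknown';
$message = json_encode($message);

// Forward the event to the Logstash TCP input
$socket = fsockopen('localhost', 8080);
if ($socket) {
  $bytes = fwrite($socket, $message . PHP_EOL);
  fflush($socket);
  fclose($socket);
  die('Logged ' . $message . ' (' . $bytes . ')');
} else {
  // Logstash is unreachable, so report a server-side failure
  http_response_code(500);
  die('Could not log ' . $message);
}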

It’s easier to get the IP and user agent in PHP, so we’ll get them there.

Getting the Data

Our Logstash setup is a simple TCP input, with an Elasticsearch output to capture the data and a stdout output for debugging:

input {
  tcp {
    mode => 'server'
    port => 8080
    type => 'analytics'
    codec => 'json'
  }
}

output {
  stdout {
    codec => 'rubydebug'
  }
  elasticsearch {
    host => "myanalytics.server.com"
    protocol => "http"
  }
}
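To check the Logstash input without going through the browser and PHP, a small Node.js script can send a hand-made event straight to the TCP port. This is only a sketch: it assumes Node.js is available and that Logstash listens on localhost:8080 as configured above, and the names `frameEvent` and `sendTestEvent` are made up for this example. Like the PHP script, it sends one JSON document followed by a newline.

```javascript
var net = require('net');

// Serialize an event the same way the PHP script does: one JSON document plus a newline.
function frameEvent(event) {
  return JSON.stringify(event) + '\n';
}

// Hypothetical helper: open a TCP connection, write one event, then close the connection.
function sendTestEvent(host, port, event, done) {
  var socket = net.createConnection(port, host, function () {
    // end() writes the remaining data and then closes the socket
    socket.end(frameEvent(event));
  });
  socket.on('error', done);
  socket.on('close', function (hadError) {
    if (!hadError) { done(null); }
  });
}

// Example call (assumes Logstash is running locally on port 8080):
//   sendTestEvent('localhost', 8080, { type: 'visit', location: 'http://my.server.com/test' }, function (err) {
//     console.log(err ? 'Could not log: ' + err.message : 'Sent');
//   });
```

If the event arrives, it should be printed by the stdout output, which makes this a quick way to tell a Logstash problem apart from a problem in the web page or the PHP script.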

Now, when we run the tracker and visit the website, we’ll see events like these:

{
  "type" => "visit",
  "location" => "http://my.server.com/index.html",
  "referer" => "",
  "language" => "en-US",
  "width" => 1366,
  "height" => 768,
  "local_time" => "2014-09-30T20:32:17.792Z",
  "returning" => 1,
  "ip" => "127.0.0.1",
  "browser" => "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:32.0) Gecko/20100101 Firefox/32.0",
  "@version" => "1",
  "@timestamp" => "2014-09-30T20:32:17.927Z",
  "host" => "10.0.2.2:59378"
}
{
  "type" => "click",
  "location" => "http://my.server.com/index.html",
  "referer" => "",
  "language" => "en-US",
  "width" => 1366,
  "height" => 768,
  "local_time" => "2014-09-30T20:32:42.986Z",
  "returning" => 1,
  "destination" => "#top",
  "ip" => "127.0.0.1",
  "browser" => "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:32.0) Gecko/20100101 Firefox/32.0",
  "@version" => "1",
  "@timestamp" => "2014-09-30T20:32:43.149Z",
  "host" => "10.0.2.2:59381"
}

Viewing the Information

Now, on to the data captured in Elasticsearch as viewed in Kibana:

Information about your visitors at your fingertips, ready for mining!
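As one small example of mining the data outside Kibana, the sketch below builds an Elasticsearch search body that counts events per `type` (visits versus clicks) with a terms aggregation. The helper name `countByField` is hypothetical, and the request target mentioned in the comment is an assumption about your setup. Depending on your Elasticsearch version and mapping, you may need a non-analyzed variant of the field, such as `type.raw` or `type.keyword`.

```javascript
// Hypothetical helper: build a search body that counts documents per value of a field.
function countByField(field) {
  return {
    size: 0, // return only the aggregation, not the matching documents
    aggs: {
      counts: {
        terms: { field: field }
      }
    }
  };
}

// The body would be sent as JSON to the search endpoint, for example
// POST http://myanalytics.server.com:9200/logstash-*/_search
// (the port and the index pattern are assumptions, adjust them to your setup).
var body = JSON.stringify(countByField('type'));
```

The response should contain one bucket per event type under `aggregations.counts.buckets`, each with a `doc_count`.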