Working out how quickly services are running, or why they are not, can get very complex when dozens of disparate services are spread over many systems. To solve this problem, Twitter created its own distributed tracing system called "Zipkin" and has now made it available as open source. According to its developers, Zipkin is "closely modelled" after a Google research paper from 2010 about Dapper, a large-scale distributed systems tracing infrastructure.

The micro-blogging company says that it currently uses Zipkin to gather timing data for all of its services. Twitter has created instrumented libraries that collect tracing information, which is passed to a Collector process and then on to a database. Developers and system administrators can then analyse the collected data through a web frontend. A typical use of the system would be finding out why user requests are timing out: traced requests allow the developers to pinpoint where in the system a bottleneck lies.
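As a rough illustration of the idea, and not of Zipkin's actual API, the following Python sketch shows how an instrumented library might time nested operations, hand the results to a collector, and let a developer find the slowest step of a request. All names here (`Span`, `Tracer`, `InMemoryCollector`, `bottleneck`) are hypothetical, and the in-memory list merely stands in for the Collector process and its database.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Span:
    """One timed operation within a traced request (hypothetical structure)."""
    trace_id: str             # shared by every span of the same request
    span_id: str              # unique to this operation
    parent_id: Optional[str]  # None for the root span of a trace
    name: str
    start: float = 0.0
    duration: float = 0.0


class InMemoryCollector:
    """Stand-in for a collector process and the database behind it."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def submit(self, span: Span) -> None:
        self.spans.append(span)


class Tracer:
    """Minimal instrumentation helper: times a block and reports it."""

    def __init__(self, collector: InMemoryCollector,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.collector = collector
        self.clock = clock
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            name=name,
            start=self.clock(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            self._stack.pop()
            current.duration = self.clock() - current.start
            self.collector.submit(current)


def bottleneck(spans: List[Span], trace_id: str) -> Span:
    """Return the span with the most time not accounted for by its children."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    child_time = {}
    for s in in_trace:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration
    return max(in_trace, key=lambda s: s.duration - child_time.get(s.span_id, 0.0))


if __name__ == "__main__":
    collector = InMemoryCollector()
    tracer = Tracer(collector)
    with tracer.span("user_request") as root:
        with tracer.span("auth_service"):
            time.sleep(0.01)
        with tracer.span("timeline_db"):
            time.sleep(0.05)
    slowest = bottleneck(collector.spans, root.trace_id)
    print(f"slowest step: {slowest.name} ({slowest.duration * 1000:.1f} ms)")
```

Subtracting the time spent in child spans is a design choice of this sketch: it keeps the outermost span, which always lasts longest, from being reported as the bottleneck when the real delay sits in one of the services it calls.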



[Figure: Zipkin's architecture]

Zipkin uses Apache Cassandra – a scalable, column-oriented, distributed "NoSQL" database – for storage, Apache ZooKeeper – the Hadoop configuration management software – for coordination, and Facebook's Scribe data aggregation system as the logging framework to transport the trace data.

Further information about Zipkin (which is the Turkish word for a harpoon), including download links and installation instructions, can be found on the project's GitHub page. Zipkin is made available under the Apache Licence 2.0. For updates on the project, users can follow @zipkinproject on Twitter.

Twitter engineers use a number of open source software tools, but they also release a number of their own tools as open source. Examples of these include recent MySQL enhancements, the Bootstrap 2.0 web framework, TextSecure and the Storm stream processing framework.

(crve)