Everywhere you look, change is afoot in computer networking. As data centers grow in size and complexity, traditional tools are proving too slow or too cumbersome to handle that expansion.

Dinesh Dutt is Chief Scientist at Cumulus Networks. Cumulus has been working to change the way we think about networks altogether by dispensing with the usual software/hardware lockstep, and instead using Linux as the operating system on network hardware.

In this week's New Tech Forum, Dinesh details the reasons and the means by which we may see Linux take over yet another aspect of computing: the network itself. -- Paul Venezia

Using Linux as your network operating system

There are many differences between the modern data center and the traditional enterprise networks that preceded it. Data center applications are more Layer-3 savvy -- for example, they use L3 methodologies such as name service and service discovery instead of assuming their peers are all in the same subnet. They are built to work around network failures, and more of their traffic flows east-west than north-south. But one point dwarfs everything else: the monster scale of the modern data center.

Thanks to server virtualization, there are tens of thousands to millions of endpoints inside a data center compared to the few thousand of yesteryear. The number of virtual networks is also far greater than the number of traditional VLANs; not only that, these virtual networks are spun up and down at speeds orders of magnitude faster than their VLAN counterparts. Plus, the sheer number of servers means there is a lot more networking gear to manage than in traditional enterprise networks.

Thus, the scale and agility of modern data centers put data center networking at odds with the existing network models. Some problems, such as the number of virtual networks, required the development of new technologies such as VXLAN, while others have required a redesign of the network architecture deployed in the data center. But the problem of managing the network is rooted not in any failure of networking, but rather in the design of the network OS.

How we got here

Not so long ago, although a lifetime in Internet years, routers and switches went from being Unix servers with a few NICs to black boxes that were optimized for packet switching and for running the protocols essential to their functioning as a router. In the beginning, the OS that ran on this black box was hardly an OS. It was more like a giant select loop executing various tasks based on what woke it up.

In time, the deficiencies stemming from the lack of processes or memory management became apparent as routers became more complex. The router OS evolved to the next stage of having an embedded OS such as QNX or VxWorks for process and memory management. From there, it was a small step to substitute the embedded OS with BSD or Linux.

Despite these changes to the underlying OS, the router remained a black box or an embedded system in its personality. Protocols were the de jure mode of communication and access. For example, SNMP was developed as a way to manage the boxes, but it ended up being used only for monitoring. Even at that it did a poor job, because MIBs had to be developed and additional code written to access the counters. Since most testing and use was via the command line, SNMP quickly devolved into a "by the way" mechanism. That it was cumbersome and slow didn't exactly add to its charm.

Configuring the boxes required screen-scraping the command-line output. The CLI itself was modal in nature. To configure a feature, you had to traipse down multiple levels: "configure terminal," "interface g1/1," "router OSPF," and so on before you got to work on the features of interest. This meant that scripting the configuration was not unlike pulling teeth. It did not help that the embedded nature of the box also meant the CLI varied quite a bit across (and many times even within) vendors. Thus, not only was screen-scraping for configuration painful, it could also change so much that developing common tools was difficult. As a consequence, the network administrator's toolkit remained woefully inadequate and antediluvian.

We then see that the reason networks are so hard to manage is rooted in the model of the OS used to run the networking gear. In other words, the solution lies in addressing the limitations of the router operating system rather than in networking per se.

Two roads diverged in the wood...

We come to a fork in the road. One road leads us to NETCONF and talk of northbound and southbound APIs and such. The other leads to Linux and its well-understood, open, mature, and vibrant ecosystem. In other words, one way is to attempt to bolt an improved management and a programmatic API to existing router OSes, and the other is to switch to an OS that already has all of those characteristics.

Linux not only has an excellent, well-understood API and management toolkit, but it also has the networking model and features essential to a modern data center. For example, Linux clearly separates the control plane (in user space) from the packet forwarding plane (in the kernel). Furthermore, it explicitly models the RIB/FIB separation too, leaving the FIB (Forwarding Information Base) in the kernel and allowing the RIB (Routing Information Base) to be managed by a user process such as a routing protocol daemon. Linux supports all the typical features you'd expect of networking gear: IPv4/v6 support, bridging, routing, link aggregation, and so on, as well as advanced features such as BGP policy-based accounting and BGP TTL security.
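To make the kernel-resident FIB concrete, here is a minimal Python sketch that decodes the IPv4 routing table the kernel exposes as text in /proc/net/route. The function names and the sample table are illustrative assumptions, not part of any particular product, and the byte-order handling assumes a little-endian host such as x86.

```python
import socket
import struct


def hex_to_ipv4(hex_field):
    """Convert a hex field from /proc/net/route to dotted-quad notation.

    Assumption: the host is little-endian, so the least significant
    byte of the hex value is the first octet of the address.
    """
    return socket.inet_ntoa(struct.pack("<L", int(hex_field, 16)))


def parse_proc_net_route(text):
    """Return a list of routes parsed from /proc/net/route content."""
    routes = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        prefix_len = bin(int(mask, 16)).count("1")
        routes.append({
            "iface": iface,
            "destination": "%s/%d" % (hex_to_ipv4(dest), prefix_len),
            "gateway": hex_to_ipv4(gateway),
        })
    return routes


# Illustrative sample in the layout of /proc/net/route (values are made up).
SAMPLE_ROUTE_TABLE = """\
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 100 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 100 00FFFFFF 0 0 0
"""

if __name__ == "__main__":
    for route in parse_proc_net_route(SAMPLE_ROUTE_TABLE):
        print(route)
```

On a real Linux system the same parser could be fed the contents of /proc/net/route directly; a routing daemon in user space would be the one adding and removing those entries.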

Let me define what I mean by Linux as the router (or switch) OS. Specifically, I mean I do not want to modify the abstractions and API offered by Linux to the applications. Getting even more specific, Linux exposes mechanisms to access interfaces, the routing table, the ARP cache, the L2 forwarding table, and more; it provides a mechanism (netlink) to receive notifications on changes to these tables; it provides mechanisms to examine the counters associated with these entries. When I speak of using Linux as a router OS, I want these to be unmodified. In short, I propose that the Linux kernel is where all the state resides.
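As a rough illustration of the netlink notification mechanism, the Python sketch below splits a buffer received from a netlink socket into individual messages using the standard 16-byte nlmsghdr layout. The helper names are assumptions made for this example. The watch_kernel_events function only works on Linux and is shown for context; it is not called here.

```python
import socket
import struct

# struct nlmsghdr: length, type, flags, sequence number, sender port id
NLMSG_HDR = struct.Struct("=IHHII")

# rtnetlink message types and multicast groups from <linux/rtnetlink.h>
RTM_NEWLINK, RTM_DELLINK = 16, 17
RTM_NEWROUTE, RTM_DELROUTE = 24, 25
RTMGRP_LINK = 0x1
RTMGRP_IPV4_ROUTE = 0x40

EVENT_NAMES = {
    RTM_NEWLINK: "link added/changed",
    RTM_DELLINK: "link removed",
    RTM_NEWROUTE: "route added",
    RTM_DELROUTE: "route removed",
}


def iter_netlink_messages(data):
    """Yield (message_type, payload) for each message in a receive buffer."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed message
        yield msg_type, data[offset + NLMSG_HDR.size:offset + length]
        offset += (length + 3) & ~3  # messages are aligned to 4 bytes


def watch_kernel_events():
    """Print link and IPv4 route changes as the kernel announces them (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK | RTMGRP_IPV4_ROUTE))
    while True:
        for msg_type, payload in iter_netlink_messages(sock.recv(65535)):
            print(EVENT_NAMES.get(msg_type, "type %d" % msg_type), len(payload), "bytes")
```

Decoding the payload (interface names, prefixes, next hops) requires parsing the per-type structures and attributes, which this sketch deliberately leaves out.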

Furthermore, instead of customized routing shells such as those of JunOS, NX-OS, or EOS, let's use a native Linux shell such as bash. Unlike its routing cousins, bash is not modal in nature. Each command is stand-alone and is explicitly designed to work in tandem with a chain of commands to accomplish a task. This means that scripting on a Linux OS is native. The Linux shell is designed to be a programmatic interface.

This accomplishes two things. First of all, the system is far more transparent because the kernel data structures can be examined using native Linux tools. Next, it is now possible to treat the router OS as nothing more than a megaserver -- that is, a server with 64 or 128 NICs. The circle is complete. The router is back to where it started: a Unix server with multiple NICs.

The consequences of choosing Linux

An easily noticeable effect is that the server management toolkit is now available to manage networking gear as well. Data center admins are used to managing tens of thousands to hundreds of thousands of servers. They use tools such as Ansible, Chef, and Puppet (or even in-house management toolkits) to do so. They can now use the very same tools to manage networking gear too. Unifying the management toolkit inside the data center comes with an immediate reduction in opex because admins have to be trained in only one set of tools. Furthermore, scripts written in a shell or Python or Ruby or any such language are also usable on networking gear.

Not only configuration, but even monitoring falls under this umbrella. Customers are free to standardize on the model and tools such as CollectD, Graphite, and Ganglia to gather statistics for all their data center equipment. All of these tools have active communities that are developing and extending these tools continuously. Also, they can use tools such as Monit to monitor the processes and take actions based on events such as restarting the daemon if a protocol daemon dies.
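As an example of the kind of data such collectors can gather from a Linux-based switch, the following Python sketch reads per-interface counters in the format of /proc/net/dev. It is only an illustration under stated assumptions: the function names are invented for this example, the interface name swp1 and the sample values are made up, and real collectors have their own plugins for this job.

```python
def parse_proc_net_dev(text):
    """Return {interface: counters} parsed from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        nums = [int(v) for v in values.split()]
        if len(nums) < 16:
            continue
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        counters[name.strip()] = {
            "rx_bytes": nums[0],
            "rx_packets": nums[1],
            "rx_errs": nums[2],
            "tx_bytes": nums[8],
            "tx_packets": nums[9],
            "tx_errs": nums[10],
        }
    return counters


def read_interface_counters(path="/proc/net/dev"):
    """Read live counters on a Linux host."""
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())


# Illustrative sample in the layout of /proc/net/dev (values are made up).
SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  swp1: 5000 50 1 0 0 0 0 0 7000 70 0 2 0 0 0 0
"""

if __name__ == "__main__":
    for iface, stats in parse_proc_net_dev(SAMPLE_NET_DEV).items():
        print(iface, stats)
```

Because the counters come from a standard kernel interface, the same script would run unchanged on a server or on a switch that follows the model described in this article.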

Second, using the Linux API unmodified means that customers can pick and choose which protocol suites they want to run in their network. For example, two popular routing protocol suites are Quagga and BIRD. Since they both rely on the well-established netlink API to modify the routing table and understand when interfaces go up or down, they can run unmodified on the networking gear. Even customer-specific protocols or protocol suites can run as long as they adhere to the standard API. The same applies to other protocols such as LLDP (Link-Layer Discovery Protocol) and STP (Spanning-Tree Protocol).

Next, problems such as defining an API to program a box, deciding which tables to program, settling on the model of the box, and so on are all rendered irrelevant because Linux has an open, well-established model and API. An active community is also invested in ensuring the integrity of this API.

Moreover, the open nature of Linux means customers can be involved in developing their own tools to solve their specific problems. For example, instead of relying on the networking vendor to develop features such as link state tracking or flex links, customers can download and use nifty utilities such as Netplug or ifplugd. Tools such as Netcat can be used to pull info from multiple devices in the network.

Another benefit for customers is that the open nature of this model provides unprecedented (for a router/switch) transparency. For example, they can verify there are no backdoors in the code.

Community development

This openness also means a vibrant community is actively involved in innovating solutions. For example, in the server world, Linux containers have emerged as a lightweight alternative to server virtualization, and a whole new set of tools is being developed to take advantage of them. The networking world can now be a part of this innovation rather than something that is innovated around. For example, instead of using real boxes, customers can test their network deployment using VMs or containers and push the configuration to real networks only after those tests pass.

Am I talking of using x86-based servers as routers? No, I'm talking about a model where Linux can function as described above, but the network data path is hardware accelerated. In other words, the switching silicon is added to the list of hardware that Linux already manages, such as processors and memory devices.

Essentially, we can write the equivalent of a device driver to synchronize the kernel state of these data structures with the hardware. Silicon switching ports can be made to appear like NICs to the OS. Thanks to Linux's Netlink model, a device driver can sit by the side and listen to everything that's going on with the kernel state -- interface up/down, routing entries added/deleted either by user or routing protocols, netfilter entries added or deleted -- and synchronize that state with the hardware. Furthermore, the driver can sync the state of counters from the hardware with the kernel state allowing native Linux tools such as ethtool, iptables, or /proc/net/dev to display the correct information, completely unaware that these values are coming from the hardware. Cumulus Networks has developed the first such solution, but others with a similar model may not be far away.
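The synchronization idea can be sketched as a simple reconciliation step: compare the routes the kernel holds with the routes currently programmed in the switching silicon, and work out what to install and what to remove. This Python sketch is a hypothetical illustration, not Cumulus Networks' implementation; the function name and the dictionary representation of both tables are assumptions.

```python
def plan_hardware_sync(kernel_routes, hardware_routes):
    """Compute the changes needed to make the hardware match the kernel.

    Both arguments map a prefix (e.g. "10.0.0.0/24") to a next hop.
    The kernel table is treated as the source of truth.
    Returns (to_program, to_remove):
      to_program: routes missing from hardware or with a stale next hop
      to_remove:  prefixes present in hardware but no longer in the kernel
    """
    to_program = {
        prefix: next_hop
        for prefix, next_hop in kernel_routes.items()
        if hardware_routes.get(prefix) != next_hop
    }
    to_remove = sorted(p for p in hardware_routes if p not in kernel_routes)
    return to_program, to_remove


if __name__ == "__main__":
    kernel = {"10.0.0.0/24": "192.168.1.1", "10.0.1.0/24": "192.168.1.2"}
    hardware = {
        "10.0.0.0/24": "192.168.1.1",   # already in sync
        "10.0.1.0/24": "192.168.1.9",   # stale next hop
        "10.0.9.0/24": "192.168.1.3",   # deleted from the kernel
    }
    print(plan_hardware_sync(kernel, hardware))
```

A production driver would more likely react to individual netlink events than diff full tables, but a full comparison like this is a reasonable way to recover after a restart.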

Looking toward an open future

By the end of the last century, a number of network protocols had gone out of fashion: DECnet, SNA, IPX. I distinctly remember the time when, while working on Cisco's Catalyst 6500 ASICs, we decided to stop supporting IPX. It was the last of the non-IP protocols that we had been supporting in enterprise networks.

The open model of TCP/IP meant that many more people worked on improving it and using it in various applications compared to the closed nature of most other protocols. The development-by-committee model of OSI was no match for the "rough consensus, working code" model of the IETF and TCP/IP.

Similarly, Ethernet soon became the de facto physical media of choice. Starting from humble roots, the simple model of Ethernet wiped out pretty much all the competition from FDDI and Token Ring to SONET and ATM at the other end of the spectrum. Customers can now focus on TCP/IP with Ethernet as the common networking stack on which they can build their applications.

With the rest of the data center ecosystem open, the network OS is the last bastion of the closed model. With Linux as the network OS, users can now evolve networks and applications together, so that the whole is more than just the sum of its parts.

Linux as the network OS is an idea whose time has come.

This article, "Your next network operating system is Linux," was originally published at InfoWorld.com.