As anticipated in a previous blog post, I attended Cisco Live Europe in Berlin from 20th to 24th February. While there, I also had the pleasure of being invited as a delegate to the Tech Field Day at CLEUR event, where I heard some interesting news from Cisco on several topics and environments.

During the first TFD day of sessions, we learned how Cisco is developing and leveraging its DNA (Digital Network Architecture) to simplify how campus networks are managed. This post will be focused on new features around programmability and automation on enterprise switches.

Programmable Interfaces in Enterprise Switches

SDN, NFV, Automation, Softwarization, Orchestration. When software meets networks, the confusion starts with the naming. There are so many new tools, names and acronyms in this field that people can get lost even before starting to dig deeper into it. That’s why our presenter, Fabrizio Maccioni, decided to give us a very focused and practical view of what Cisco is doing with programmable interfaces in campus switches.

(In)Consistency

One of the biggest difficulties in automating multi-vendor networks is that vendors (when they offer anything at all!) expose totally different APIs and protocols to “automate” their devices. This means that in order to execute a simple operation, like getting the current running configuration, we may have to use a REST API for vendor X, plain CLI for vendor Y and some proprietary protocol for vendor Z. Depending on the vendor, the operation may return structured or unstructured data. This is clearly sub-optimal and doesn’t help engineers start automating their networks.

This is even more frustrating when it happens with different devices from the same vendor! Today, depending on which Cisco equipment you have, you may find yourself at one of the following stages:

Catalyst 4K: no APIs offered. Simple SSH/screen scraping of unstructured data

Catalyst 3K: NETCONF protocol with YANG models

Nexus 7K: NX-API leveraging CLI

Nexus 9K: NX-API leveraging REST

The good news is that this is going to change soon. Cisco is working on driving consistency across several platforms so that we will be able to manage our Cisco networks automatically in the same way, regardless of the device platform and OS. In particular, Cisco will bring this kind of consistency across IOS-XR, IOS-XE and NX-OS.

This is an important commitment from Cisco as it represents the desire to lower the barrier to start using APIs instead of CLI on their enterprise equipment.

Data Models

How is Cisco going to do it?

The plan is to leverage YANG data models over device features, using NETCONF, REST or gRPC to configure/get those features.

YANG (Yet Another Next Generation) is a data modeling language used to describe how data is represented and accessed. YANG data models are represented by definition hierarchies called schema trees whose instances are encoded in XML.

As an example, the following block represents the data model for ACL statistical data.

    module cisco-acl-oper {
      yang-version 1;
      namespace "urn:cisco:params:xml:ns:yang:cisco-acl-oper";
      prefix cisco-access-control-list-oper;

      import ned {
        prefix ned;
      }
      import ietf-yang-types {
        prefix "yang";
      }

      organization "Cisco Systems, Inc.";

      contact
        "Cisco Systems, Inc.
         Customer Service
         Postal: 170 W Tasman Drive
         San Jose, CA 95134
         Tel: +1 1800 553-NETS
         E-mail: cs-yang@cisco.com";

      description
        "This module contains a collection of YANG definitions for ACL statistical data."+
        "Copyright (c) 2016 by Cisco Systems, Inc."+
        "All rights reserved.";

      reference "TODO";

      revision 2016-03-30 {
        description "Update description with copyright notice.";
      }
      revision 2015-08-10 {
        description "Model for Network Access Control List (ACL) operational data.";
        reference "RFC XXXX: Network Access Control List (ACL) YANG Data Model";
      }

      augment /ned:native {
        container access-lists {
          config false;
          description
            "This is top level container for Access Control Lists.
             It can have one or more Access Control List.";
          list access-list {
            key access-control-list-name;
            description
              "An access list (acl) is an ordered list of access list entries (ACE).
               Each access control entries has a list of match criteria, and a list
               of actions. Since there are several kinds of access control lists
               implemented with different attributes for each and different for each
               vendor, this model accommodates customizing access control lists for
               each kind and for each vendor.";
            leaf access-control-list-name {
              type string;
              description
                "The name of access-list.
                 A device MAY restrict the length and value of this name, possibly
                 space and special characters are not allowed.";
            }
            container access-list-entries {
              description
                "The access-list-entries container contains a list of access-list-entry(ACE).";
              list access-list-entry {
                key rule-name;
                ordered-by user;
                description "List of access list entries(ACE)";
                leaf rule-name {
                  type uint32;
                  description "Entry number.";
                }
                container access-list-entries-oper-data {
                  description "Per access list entries operational data";
                  leaf match-counter {
                    type yang:counter64;
                    description "Number of matches for an access list entry";
                  }
                }
              }
            }
          }
        }
      }
    }
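To make the link between the model and actual data more concrete, here is a minimal sketch that parses an XML instance shaped like the schema tree above and extracts the per-entry match counters. The sample document, the ACL name and the counter values are invented for illustration, and the namespace used for the native container is a placeholder, since the imported ned module is not shown in this post.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the cisco-acl-oper module above.
ACL_NS = "urn:cisco:params:xml:ns:yang:cisco-acl-oper"
# Placeholder: the real namespace of the imported "ned" module is not shown here.
NED_NS = "urn:example:ned"

# Hypothetical instance data: the ACL name and counters are made up.
SAMPLE = f"""
<native xmlns="{NED_NS}">
  <access-lists xmlns="{ACL_NS}">
    <access-list>
      <access-control-list-name>BLOCK-TELNET</access-control-list-name>
      <access-list-entries>
        <access-list-entry>
          <rule-name>10</rule-name>
          <access-list-entries-oper-data>
            <match-counter>42</match-counter>
          </access-list-entries-oper-data>
        </access-list-entry>
        <access-list-entry>
          <rule-name>20</rule-name>
          <access-list-entries-oper-data>
            <match-counter>7</match-counter>
          </access-list-entries-oper-data>
        </access-list-entry>
      </access-list-entries>
    </access-list>
  </access-lists>
</native>
"""


def match_counters(xml_text):
    """Return {acl_name: {rule_number: match_count}} from an instance document."""
    ns = {"acl": ACL_NS}
    root = ET.fromstring(xml_text.strip())
    result = {}
    # Walk the tree the same way the model nests it: list -> entries -> entry.
    for acl in root.iterfind(".//acl:access-list", ns):
        name = acl.findtext("acl:access-control-list-name", namespaces=ns)
        entries = {}
        for ace in acl.iterfind("acl:access-list-entries/acl:access-list-entry", ns):
            rule = int(ace.findtext("acl:rule-name", namespaces=ns))
            count = int(ace.findtext(
                "acl:access-list-entries-oper-data/acl:match-counter",
                namespaces=ns))
            entries[rule] = count
        result[name] = entries
    return result


if __name__ == "__main__":
    print(match_counters(SAMPLE))
```

Every element name in the instance comes straight from a container, list or leaf in the model, which is what lets a script navigate the data without guessing at its layout.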

YANG models are open and available on GitHub, where we can find a sub-directory reserved for Cisco models for IOS-XE, IOS-XR and NX-OS.

These models can either be open or native:

open models are vendor-independent, designed by standardization organizations like IETF but also by other entities like OpenConfig

native models are designed by Cisco itself for its own equipment

Cisco devices will support both types of model, with native models being a super-set of open ones. The reason for this is clear: (1) standardization organizations are generally slow, as (2) they have to find a trade-off between several parties to reach the solution that best fits the whole industry. As a result, platform-specific features will be left out of the equation. In order to avoid partial feature coverage (which would be the worst!), Cisco has developed its native models, which will be used to offer complete support for all features. Anyway, both families will always be supported and the user will be the one who chooses which family to use. So if you want to use only IETF models across your multi-vendor environment, which includes Cisco devices, go for it!

This also has another implication: native models may differ across platforms, meaning that the same feature may be represented by different models on NXOS and XE, since NXOS can have some specific attributes not present on XE. This totally makes sense, but I still hope deviations will be minimal. We’ll know soon!

An important part of Fabrizio’s session focused on demos, showing us a few use cases where these new features can be useful. Obviously, I couldn’t stand still without testing something myself 😉

Demo time

An example is worth a thousand words, so we’ll use two simple scripts to compare sending CLI commands over SSH with using an open programmable interface like REST, highlighting the key benefits that come with the second approach.

Here I’ll use a Cisco CSR1000V device running Cisco IOS XE Software, Version 16.03.01. Jason Edelman has highlighted how this platform already supports RESTCONF, even if it still appears as a hidden feature.

Operational commands

First, I want to compare sending operational commands. I’ll use Netmiko as the SSH library. The following simple script gets the show ip interface brief output and prints it out.

    from netmiko import ConnectHandler


    def main():
        # Open an SSH session to the device
        device = ConnectHandler(host='csr1', username='test', password='test',
                                device_type='cisco_ios')
        # Run the command and print the raw text the CLI returns
        output = device.send_command('show ip interface brief')
        print(output)


    if __name__ == "__main__":
        main()

The output will look as follows:

    csr1#show ip interface brief
    Interface              IP-Address      OK? Method Status                Protocol
    GigabitEthernet1       10.0.0.51       YES NVRAM  up                    up
    GigabitEthernet2       unassigned      YES NVRAM  up                    up
    GigabitEthernet3       unassigned      YES NVRAM  up                    up
    GigabitEthernet4       10.10.10.1      YES NVRAM  up                    up
    Loopback10             unassigned      YES unset  up                    up

Let’s now do the same using the REST API.

    import requests
    from requests.auth import HTTPBasicAuth


    def main():
        auth = HTTPBasicAuth('test', 'test')
        # Ask the device for JSON-encoded YANG data
        headers = {
            'Accept': 'application/vnd.yang.data+json',
            'Content-Type': 'application/vnd.yang.data+json'
        }
        url = 'http://csr1/restconf/api/config/native/interface?deep'
        response = requests.get(url, headers=headers, auth=auth)
        print(response.text)


    if __name__ == "__main__":
        main()

The produced output will look like this (too verbose).

Let’s compare them:

The first approach returned unstructured data, which is easy for engineers to read but hard for software to process. Also, it took 8.8s to establish an SSH session, execute the command and retrieve the output. SSH is stateful, meaning a session needs to be created before any command can be sent.

The second approach returned A LOT of structured data, which is really easy to parse and manage. Also, it took only 3.2s to get a huge amount of data because REST is a stateless service, meaning no session has to be established first.
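To illustrate why structured data is easier for software to handle, here is a small sketch that extracts the configured IP addresses in both ways. The CLI text is the output shown earlier. The JSON document is a hypothetical, heavily simplified stand-in, since the real RESTCONF payload is much larger and its layout follows the device’s YANG model.

```python
import json
import re

CLI_OUTPUT = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.0.0.51       YES NVRAM  up                    up
GigabitEthernet2       unassigned      YES NVRAM  up                    up
GigabitEthernet3       unassigned      YES NVRAM  up                    up
GigabitEthernet4       10.10.10.1      YES NVRAM  up                    up
Loopback10             unassigned      YES unset  up                    up
"""

# Hypothetical, simplified payload: NOT the real layout returned by the device.
JSON_OUTPUT = """
{"interfaces": [
  {"name": "GigabitEthernet1", "ip": "10.0.0.51"},
  {"name": "GigabitEthernet4", "ip": "10.10.10.1"}
]}
"""

# Screen scraping: the regex has to mirror the column layout of the command
# and breaks as soon as the layout changes.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\S+)\s+(?:YES|NO)\s+\S+\s+"
    r"(?P<status>.+?)\s+(?P<protocol>\S+)$"
)


def addresses_from_cli(text):
    """Return {interface: ip} for interfaces with an address, parsed from CLI text."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group("ip") != "unassigned":
            result[match.group("name")] = match.group("ip")
    return result


def addresses_from_json(text):
    """Return {interface: ip} from the structured document: no regex required."""
    return {item["name"]: item["ip"] for item in json.loads(text)["interfaces"]}


if __name__ == "__main__":
    print(addresses_from_cli(CLI_OUTPUT))
    print(addresses_from_json(JSON_OUTPUT))
```

The CLI parser needs a hand-written pattern for this specific command, while the structured version is a plain dictionary lookup.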

Configuration commands

Let’s now see how SSH and REST behave differently with configuration commands. I want to send a set of configuration commands to configure an interface with the following properties:

name: Loopback10

ip: 100.10.10.10

mask: 255.255.255.255

description: Configured with RESTCONF

To make things trickier, let’s insert a typo in the mask: 255.255.255.355.

I’ve written two other simple scripts executing configuration commands via SSH and REST. In the first one, since it uses simple SSH, each command is sent individually. Let’s run it:

    $ python test_ssh_command.py
    $

Let’s now check what happened on the device:

    csr1#show run interface loopback10
    Building configuration...

    Current configuration : 76 bytes
    !
    interface Loopback10
     description Configured via REST
     no ip address
    end

As we can see, no IP/mask is configured because of the typo in the mask value, but the interface has been created and the description has been correctly configured. This means this kind of operation is not transactional/atomic: in a transactional operation, either all the changes are applied successfully or none of them is applied at all.

Now, let’s run the REST script, in which we can send multiple configuration commands at once (I’ve removed the loopback interface before doing it and included a print to show the request’s response).

    $ python test_rest_command.py
    {
      "errors": {
        "error": [
          {
            "error-message": "invalid value for: mask in /ios:native/ios:interface/ios:Loopback[ios:name='10']/ios:ip/ios:address/ios:primary/ios:mask: \"255.255.255.355\" is not a valid value.",
            "error-urlpath": "/api/config/native/interface/Loopback",
            "error-tag": "malformed-message"
          }
        ]
      }
    }

Here we have a clear description of what went wrong: invalid value for: mask "255.255.255.355" is not a valid value.

Let’s get back to the device and see what happened.

    csr1#show run interface loopback10
                            ^
    % Invalid input detected at '^' marker.

The loopback interface has not been created! How come?

The RESTCONF interface pushes the configuration to the NETCONF datastore first. Then, if the configuration is valid, meaning all the values comply with the underlying data models, it’s committed to the running configuration. Otherwise, ALL the configuration operations contained in the RESTCONF/NETCONF call are rolled back and never applied!
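The difference between the two behaviours can be sketched with a toy in-memory model. This is not how IOS XE implements its datastores. It only mimics the outcome seen in the two tests, and the function names and the dictionary used as a "running config" are invented for the illustration.

```python
import ipaddress


def is_valid_mask(mask):
    """True if mask is a contiguous dotted-quad IPv4 netmask."""
    try:
        value = int(ipaddress.IPv4Address(mask))
    except ValueError:
        return False
    inverted = value ^ 0xFFFFFFFF
    # A contiguous mask inverts to 0...01...1, so inverted + 1 shares no bits with it.
    return (inverted & (inverted + 1)) == 0


def apply_line_by_line(running, name, change):
    """Mimic a CLI session: each line is applied on its own, bad lines are rejected."""
    interface = running.setdefault(name, {})          # "interface Loopback10"
    interface["description"] = change["description"]  # "description ..."
    if is_valid_mask(change["mask"]):                 # "ip address <ip> <mask>"
        interface["ip"] = change["ip"]
        interface["mask"] = change["mask"]


def apply_atomically(running, name, change):
    """Validate the whole change first, then commit all of it or none of it."""
    errors = []
    if not is_valid_mask(change["mask"]):
        errors.append('"%s" is not a valid value for mask' % change["mask"])
    if not errors:
        running[name] = dict(change)
    return errors


# The change used in this post, including the typo in the mask.
CHANGE = {
    "description": "Configured with RESTCONF",
    "ip": "100.10.10.10",
    "mask": "255.255.255.355",
}

if __name__ == "__main__":
    cli_state, api_state = {}, {}
    apply_line_by_line(cli_state, "Loopback10", CHANGE)
    print("line by line:", cli_state)
    print("atomic errors:", apply_atomically(api_state, "Loopback10", CHANGE))
    print("atomic state:", api_state)
```

In the line-by-line case the interface ends up half-configured, while in the atomic case the state is left untouched and the caller gets the list of errors back.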

Again, let’s compare:

Single operation in SSH, multiple operations at once in RESTCONF/NETCONF

Open interfaces are transactional, SSH is not

Better error handling in RESTCONF/NETCONF. We can use Python’s try/except construct to check for errors and remediate them
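As a rough sketch of that last point, the error document returned above can be turned into a Python exception and handled with try/except. The ConfigRejected class and the check_response helper are hypothetical names of mine, and the 400 status code is an assumption, since the script output only shows the response body.

```python
import json


class ConfigRejected(Exception):
    """Raised when the device answers with a RESTCONF-style error document."""

    def __init__(self, messages):
        super().__init__("; ".join(messages))
        self.messages = messages


def check_response(status_code, body):
    """Return the parsed JSON body on success, raise ConfigRejected on an error reply."""
    data = json.loads(body) if body else {}
    if status_code >= 400:
        errors = data.get("errors", {}).get("error", [])
        raise ConfigRejected(
            [err.get("error-message", "unknown error") for err in errors])
    return data


# Error body as printed by the REST script in this post.
ERROR_BODY = r"""
{ "errors": { "error": [ {
  "error-message": "invalid value for: mask in /ios:native/ios:interface/ios:Loopback[ios:name='10']/ios:ip/ios:address/ios:primary/ios:mask: \"255.255.255.355\" is not a valid value.",
  "error-urlpath": "/api/config/native/interface/Loopback",
  "error-tag": "malformed-message"
} ] } }
"""

if __name__ == "__main__":
    try:
        # Assumed status code: the actual one was not shown in the output.
        check_response(400, ERROR_BODY)
    except ConfigRejected as exc:
        # This is where a script could log the problem, fix the value and retry.
        print("Device rejected the change:", exc)
```

With plain SSH, the equivalent check would mean scanning the returned text for strings like "% Invalid input", which is far more fragile.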

On-box Python

I’ve already written about on-box Python support on the Nexus 9000 platform. This may be particularly useful in some use cases, and Cisco is now adding this support to its enterprise switches as well!

Fabrizio showed us an interesting demo where he combined the Embedded Event Manager (EEM) and Python. He configured EEM (1) to monitor log messages for an IF-DOWN message and run a Python script that reactivates the interface via a “no shut” command, and (2) to create a backup config every time a change occurs.
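The script used in the demo was not shared, so what follows is only a minimal sketch of the first use case under my own assumptions. It assumes the trigger is a standard IOS link-state syslog line, and it takes the function that pushes configuration as a parameter. On the device that would presumably be the configure helper from Cisco’s on-box cli Python module, but I have not verified that here.

```python
import re

# Assumed log format: the exact message matched in the demo was not shown.
LINK_DOWN = re.compile(
    r"%LINK-\d-(?:UPDOWN|CHANGED): Interface (?P<name>\S+), "
    r"changed state to (?:administratively )?down"
)


def recovery_commands(syslog_line):
    """Return the config lines that re-enable the interface named in the log, or []."""
    match = LINK_DOWN.search(syslog_line)
    if not match:
        return []
    return ["interface " + match.group("name"), "no shutdown"]


def handle_event(syslog_line, configure):
    """Push the recovery commands through `configure`, a callable supplied by the caller."""
    commands = recovery_commands(syslog_line)
    if commands:
        configure(commands)
    return commands


if __name__ == "__main__":
    line = ("%LINK-5-CHANGED: Interface GigabitEthernet2, "
            "changed state to administratively down")
    # Off-box dry run: print the commands instead of applying them.
    handle_event(line, print)
```

Keeping the parsing separate from the function that applies the commands means the logic can be tested on a laptop before it is wired to an EEM applet.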

Someone may ask: why should we consume CPU for something we can do with an external system? Here are a few points:

Python runs in a secure Linux instance on the device, so the actual OS is separated from the Linux environment. This enabled Cisco to cap the Linux container’s CPU usage at 1% of the total, so Python scripts will never overload your device.

Why not use something that is already there, instead of adding an external system that may bring additional complexity, overhead, etc.? 🙂

What if you accidentally lose access to the device due to some misconfiguration on your management interface? This can be used to automatically restore the right configuration!

Conclusions

Unlike other buzzwords, Automation is a reality today, and it’s good to see how hard Cisco is working to offer a better experience to those who want to start automating their networks, not only in data centers but in campus environments as well.

In particular, I’m really happy about Cisco’s effort to bring some level of consistency across (almost) the full range of OSs. This is something the industry has to lean toward if we want automation to be the norm when it comes to operating networks: consistency across APIs, enabling us to access devices the same way, and consistency of data through open models, enabling us to get the same kind of data across different devices.

That’s the future and I’m excited about it 🙂