Mesosphere’s DCOS lets you use a data center as if it were a single machine. It has a graphical user interface, so people can install frameworks and manage data center resources from a web browser. But what about a user interface for the Mesos frameworks themselves?

Three Areas of Responsibility

If we step back and take a higher level look at a running Mesos cluster, we can see there are three main areas of responsibility:

Resource management: This is the responsibility of Mesos itself. It makes sure that, at every point in time, applications running on the cluster get adequate processors and memory assigned from the pool of CPUs and memory, which are provided by the Mesos slaves in the cluster.

Applications: Our applications are web servers, databases, batch jobs, data analytics, etc. We’ve probably been running these before we introduced Mesos to our setup — they are the bread and butter of the business.

Frameworks: Mesos frameworks are the layer between Mesos and the applications. They are responsible for launching and scaling applications, performing health checks and responding to various conditions affecting their operation. A framework can be understood as an integration layer between the distributed systems kernel (a sort of a virtualized data center) and an application.

Please note that aside from these three areas, Mesos clusters sometimes contain other components that overlap or interconnect frameworks and applications, e.g., distributed service discovery such as ZooKeeper or Consul, or configuration stores such as etcd.

Why Mesos Frameworks Need Graphical User Interfaces

Mesos itself comes with a GUI, which communicates a list of current and finished tasks, resource offers and a list of installed frameworks. It also gives users access to logs from their browser. The Mesos GUI is fairly spartan (monochrome display users, you don’t miss anything here), but it does fulfill its purpose. It does not provide any information about frameworks’ health, performance or features. It only displays the amount of resources provided to different frameworks. And that’s correct, because frameworks represent another area of responsibility in the system.

However, as a user of a Mesos framework, I want to be able to quickly scan the status of the framework, with the important metrics highlighted, because it makes it easier for me to decide what steps to take in order to achieve smooth operation and higher uptime. I also want to be able to see how the framework has been performing recently; trigger scaling of the application; or quickly find hostnames, IP addresses, ports, or endpoints of applications running on top of the framework. I might want to visualize the data, in order to see trends and patterns I wouldn’t be able to recognize in logs or overview tables.

We often focus on good interfaces for end users, which is why old and young people alike can work the inflight video systems on long-haul flights. However, maybe because system operators know their way around the command line, we neglect the user experience of systems software like Mesos. Yet if good interfaces enable rapid decision making for end users, it follows that they enable rapid decisions for operators, too. In a world of increasing complexity, as we move to distributed, decentralized and more narrowly focused services, the ability to quickly get an overview of the system becomes not just a nice-to-have feature, but a mission-critical one.

GUI Application Within a Framework

Each Mesos framework consists of two parts: the scheduler and the executor. The scheduler is the brain of the framework; it communicates with Mesos and implements most of the framework’s logic. The executor is the agent; distributed to the Mesos slaves, it enables communication between the framework and the application it serves. Because all information passes through the scheduler at some point, existing Mesos frameworks typically implement their GUI, which is essentially a web application, as part of the scheduler.

For example, in the Elasticsearch Mesos framework, built in collaboration with Ken Owens’ team at Cisco Cloud Infrastructure Services, the GUI application, based on Spring Boot, is launched as part of the scheduler’s main method:

public void run(String[] args) {
    checkEnv();

    try {
        parseCommandlineOptions(args);
    } catch (ParseException | IllegalArgumentException e) {
        printUsageAndExit();
        return;
    }

    final ElasticsearchScheduler scheduler = new ElasticsearchScheduler(configuration, new TaskInfoFactory());

    HashMap<String, Object> properties = new HashMap<>();
    properties.put("server.port", String.valueOf(configuration.getManagementApiPort()));
    new SpringApplicationBuilder(WebApplication.class)
            .properties(properties)
            .initializers(applicationContext -> applicationContext.getBeanFactory().registerSingleton("scheduler", scheduler))
            .initializers(applicationContext -> applicationContext.getBeanFactory().registerSingleton("configuration", configuration))
            .showBanner(false)
            .run(args);

    scheduler.run();
}

Here the Spring Boot web component is responsible for serving both the web application, based on AngularJS, and the RESTful API. As you can see, the only parameter we pass is the TCP port number.

Start With a Well Designed API

Many Mesos framework GUIs are developed as in-browser applications written in JavaScript that talk to a (typically RESTful) framework API. The advantage of this approach is loose coupling between the GUI and the framework’s functionality. In practice, that means the GUI can be replaced with alternatives or reused with other frameworks (as long as they follow some standard in API design, which they eventually should), and, last but not least, the API can be used by the framework’s CLI tooling.
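To make the loose coupling concrete, here is a minimal sketch of an API that is served independently of any GUI code. It uses the JDK's built-in com.sun.net.httpserver.HttpServer instead of Spring Boot so that it stays self-contained. The TaskApiSketch class, its Task type and the sample data are hypothetical; only the /v1/tasks path and the field names are borrowed from the Elasticsearch framework's API, and this is not that framework's implementation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a framework's REST API. */
final class TaskApiSketch {

    /** Minimal task view; the fields mirror a subset of a /v1/tasks entry. */
    static final class Task {
        final String name;
        final String httpAddress;
        final String state;

        Task(String name, String httpAddress, String state) {
            this.name = name;
            this.httpAddress = httpAddress;
            this.state = state;
        }

        String toJson() {
            // Assumption: values contain no characters that need JSON escaping.
            return String.format("{\"name\":\"%s\",\"http_address\":\"%s\",\"state\":\"%s\"}",
                    name, httpAddress, state);
        }
    }

    /** Renders the task list as a JSON array. */
    static String tasksJson(List<Task> tasks) {
        return tasks.stream().map(Task::toJson).collect(Collectors.joining(",", "[", "]"));
    }

    /** Starts a server that answers GET /v1/tasks; port 0 lets the OS pick a free port. */
    static HttpServer start(int port, List<Task> tasks) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/v1/tasks", exchange -> {
            if (!"GET".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            byte[] body = tasksJson(tasks).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        List<Task> tasks = List.of(new Task("example4", "172.18.58.139:9203", "TASK_RUNNING"));
        HttpServer server = start(0, tasks);
        System.out.println("Serving /v1/tasks on port " + server.getAddress().getPort());
    }
}
```

Because the server only speaks JSON over HTTP, a browser GUI, a CLI or a monitoring script can all consume it without knowing anything about each other.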

As an example, the Elasticsearch Mesos framework REST API currently has these two endpoints:

GET /v1/tasks

returning an array of all Elasticsearch tasks known to the cluster:

[
  {
    "hostname": "example4.nodes.cluster",
    "http_address": "172.18.58.139:9203",
    "id": "hJLXmY_NTrCytiIMbX4_1g",
    "name": "example4",
    "started_at": "2014-11-11T08:40:51.620Z",
    "state": "TASK_RUNNING",
    "transport_address": "172.18.58.139:9303",
    "version": "0.1.1"
  }
]

and the second API endpoint GET /v1/cluster, returning cluster name and configuration:

{
  "name": "my_elastic_cluster",
  "configuration": {
    "Cpus": "0.2",
    "Disk": "250",
    "DiskUnits": "GB",
    "FailoverTimeout": "2592000",
    "FrameworkId": "20150707-102016-1006637484-5050-1-0000",
    "FrameworkName": "elasticsearch",
    "ManagementApiPort": "0",
    "Mem": "512",
    "MemUnits": "MB",
    "NumberOfHwNodes": "3",
    "State": "org.apache.mesos.elasticsearch.scheduler.State@5afac135",
    "TaskName": "esdemo",
    "Version": "0.1.1",
    "ZookeeperUrl": "zk://zookeeper:2181/mesos"
  }
}

This means we can easily connect the GUI to the Elasticsearch framework, as shown above, or connect our CLI to the framework.
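As a rough illustration of the CLI side, the sketch below queries either endpoint with the JDK's java.net.http.HttpClient (Java 11 or later) and prints the raw response. The FrameworkCliSketch class, its argument layout and the five-second timeout are assumptions made for this example, not part of the framework's tooling; the base URL must point at whatever host and management API port your scheduler actually uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical command-line client for the framework's REST API. */
final class FrameworkCliSketch {

    /** Builds a GET request for an API path such as /v1/tasks or /v1/cluster. */
    static HttpRequest get(String baseUrl, String path) {
        // Drop a trailing slash so "http://host:port/" and "http://host:port" behave alike.
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + path))
                .timeout(Duration.ofSeconds(5)) // assumed timeout, tune as needed
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: FrameworkCliSketch <base-url> <tasks|cluster>");
            System.exit(2);
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(get(args[0], "/v1/" + args[1]), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

With a scheduler reachable at a placeholder address, it could be run as `java FrameworkCliSketch.java http://scheduler.example:8080 tasks`.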

Using WebSockets for Real-Time Data Flows

During GUI development, you soon find out that a great deal of the information you want to display is dynamic in nature. Nodes are created and killed, performance metrics change constantly, rolling upgrades come in, etc. A great alternative to polling (periodically requesting updates from the server) is to use WebSockets and push events from the server to the browser as they occur.
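The push model can be sketched without committing to a particular WebSocket library. In the hypothetical TaskEventBus below, each subscriber stands in for one connected browser session, and the scheduler would call publish whenever a task changes state; the class, the method names and the message format are all assumptions for illustration, and the actual WebSocket transport is deliberately left out.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process event bus; a WebSocket endpoint would register one subscriber per session. */
final class TaskEventBus {

    // CopyOnWriteArrayList lets sessions come and go while events are being published.
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    /** Registers a subscriber and returns a handle that removes it again. */
    Runnable subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
        return () -> subscribers.remove(subscriber);
    }

    /** Pushes a state change to every current subscriber as a small JSON message. */
    void publish(String taskName, String state) {
        // Assumption: values contain no characters that need JSON escaping.
        String message = String.format("{\"name\":\"%s\",\"state\":\"%s\"}", taskName, state);
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        TaskEventBus bus = new TaskEventBus();
        Runnable unsubscribe = bus.subscribe(System.out::println);
        bus.publish("example4", "TASK_RUNNING");
        unsubscribe.run();
        bus.publish("example4", "TASK_KILLED"); // nobody is listening any more
    }
}
```

Compared with polling, the browser receives a message only when something changes, and the server does no work for clients while the cluster is idle.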

Conclusion

Mesos frameworks are now being developed by many developers, who focus on functionality, security and stability first. In this article I have tried to explain why a quality graphical user interface is critical for the usability and success of frameworks. We’ve also seen how relatively easy it is to hook a web application into the scheduler component of a Mesos framework. At Container Solutions, we believe a well-designed interface leads to better adoption of Mesos frameworks, and we expect to refine our ideas around this subject as time passes and we gain more experience developing frameworks.

Jaroslav is with Container Solutions, the company that developed the Mesos framework for Elasticsearch with Cisco. It’s worth noting that Jaroslav will be at Software Circus, a different kind of conference Sept. 10-11 in Amsterdam. The conference theme is programmable infrastructure. But it’s not exactly just a software conference. It’s also a music festival. There’ll be circus acts. It’s not going to be in a hotel’s basement conference room. We’re going to Software Circus and so is Jaroslav, who says he will gladly recount his stories of building user interfaces for distributed systems around the bonfire over beer.

Cisco is a sponsor of The New Stack.

Feature image: “On the framework” by Micolo J is licensed under CC BY 2.0.