
For the longest time Jenkins has, in my opinion, been the only real game in town when it comes to Continuous Integration (CI) and, to some extent, Continuous Deployment (CD). However, after several years of using, and fighting, Jenkins, I think it may be time to move on. Like so many other programs, Jenkins has become a victim of its own success.

Jenkins began life as Hudson back in 2005. It gained recognition in 2008 as the winner of the Duke's Choice award at JavaOne. When Oracle bought Sun, the open source community forked the project as Jenkins in order to keep it a truly free and open source endeavor. It has lived on as such ever since.

I’ve been with Jenkins through much of my career: first as a consumer of Jenkins artifacts when it was still Hudson, and now as a DevOps admin, creating and configuring thousands of jobs across a vast team.

Jenkins has grown and matured a great deal in those years. But all is not well. I find myself fighting aspects of Jenkins on a daily basis. While strides have been made to address some of my concerns, others are intractable problems that go to the core of how Jenkins works.


Build Configuration

Jenkins' core components are Jobs: the build configurations that tell Jenkins what to do and how to do it.

Web UI

At first everything was done through the UI. All projects were configured through easy-to-use web forms. This was great, at first. But it soon became apparent that this approach was difficult to back up and maintain. It also fit poorly with the growing desire for configuration as code. Configuring large numbers of jobs was tedious and error prone, and maintaining those configurations was even harder.

Job DSL

Next came Jenkins Job DSL. This was a welcome step toward configuration-driven development. Job DSL allows you to use Groovy to code your jobs. You can create reusable classes and methods to generate tens, hundreds, or thousands of jobs with little work as an admin. As an added benefit, the job configurations can be kept under source control. A job definition looks something like this:

def gitUrl = 'git://github.com/jenkinsci/job-dsl-plugin.git'

job('PROJ-unit-tests') {
    scm {
        git(gitUrl)
    }
    triggers {
        scm('*/15 * * * *')
    }
    steps {
        maven('-e clean test')
        maven('-B release:prepare release:perform')
        shell('cleanup.sh')
    }
}
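That single job is handy, but the real power of Job DSL is generating whole families of jobs from one script. A minimal sketch of what that looks like (the repository names and URLs here are hypothetical):

// Stamp out one unit-test job per repository from a single DSL script.
def repos = ['service-a', 'service-b', 'service-c'] // hypothetical repositories

repos.each { repo ->
    job("${repo}-unit-tests") {
        scm {
            git("git://github.com/example/${repo}.git") // hypothetical URL
        }
        triggers {
            scm('*/15 * * * *')
        }
        steps {
            maven('-e clean test')
        }
    }
}

Run from a seed job, a script like this creates one job per repository, and rerunning the seed job updates them all in one shot.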

But all was not well. Job DSL had (and has) several key flaws.

It wasn’t always easy to start from scratch. The seed job pattern, where one bootstrap job runs the DSL scripts that generate all the others, was difficult to pull off.

Testing build scripts was nearly impossible. This led sysadmins to test configuration in production. Experimenting with new features was also very difficult. What if you wanted one branch to build differently from the others? You could do it, but it wasn’t very clean.

While the Job DSL was centrally maintained, which the sysadmins loved, it left developers feeling like second-class citizens when it came to configuring their jobs. Not all jobs fell into the exact same pattern, which often led to very complex DSL scripts.

Jenkins Pipeline

Recently the move has been to Jenkins Pipeline, or as many refer to it, the Jenkinsfile, since the build is driven by a special file named Jenkinsfile in the repository. The Jenkinsfile puts control back in the developers’ hands. The build configuration is stored with the code, which allows different branches to behave differently depending on the Jenkinsfile in each branch. Sysadmins can still maintain a set of core functionality that developers import into their scripts and customize as needed. A declarative pipeline looks like this:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                //
            }
        }
        stage('Test') {
            steps {
                //
            }
        }
        stage('Deploy') {
            steps {
                //
            }
        }
    }
}
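The centrally maintained core functionality usually takes the shape of a shared library that each Jenkinsfile imports. A minimal sketch, where the library name build-core and the custom step standardMavenBuild are hypothetical:

// Import a centrally maintained shared library (the name is hypothetical).
@Library('build-core') _

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // standardMavenBuild would be a custom step defined in the
                // library's vars/ directory; developers call it and tweak its
                // parameters instead of rewriting the build logic themselves.
                standardMavenBuild()
            }
        }
    }
}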

While this does provide the ability to experiment in a more isolated environment, it still has several problems. Although a Jenkinsfile is technically Groovy, tight security restrictions mean you can’t do many Groovy things you would normally expect to work. Often this shows up as weird errors, like not being able to iterate over a list. I can’t tell you how many times I’ve been coding in a Jenkinsfile only to have something seemingly benign blow up due to script security permissions. Someone then has to log in to Jenkins, approve the call, and run the job again. Rinse and repeat until, hopefully, the job completes. On top of all that, the whitelist can’t be configured and stored externally. Therefore, when you start a new Jenkins instance with your library, you have to go through approving all those same scripts all over again. And you can’t approve a script until after it fails. So, good luck making sure everything is correctly approved.
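To make the problem concrete, here is the kind of innocent-looking Groovy that can trip the sandbox. Whether these exact calls are rejected depends on your Jenkins and Script Security plugin versions, so treat this as an illustration rather than a guaranteed repro:

// A scripted pipeline using everyday Groovy collection idioms. Depending on
// what the script-security whitelist already contains, calls like collect or
// join may fail with a RejectedAccessException until an admin approves them.
def modules = ['core', 'api', 'web']
def flags = modules.collect { "-pl ${it}" }.join(' ')

node {
    sh "mvn ${flags} clean install"
}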

A Jenkinsfile is split into stages. This allows for nice isolation of build concepts like building, testing, and releasing. Previous versions of Jenkins didn’t allow resuming a pipeline, but I am happy to report that this seems to have been resolved in the April 2018 release.


Plugins

On its own, the core of Jenkins is fairly small. But no one runs Jenkins this way. You need, and want, plugins. Plugins are awesome! They allow third parties to write new functionality into Jenkins. We get pages with all kinds of content merged together. For example, I can have a build page that displays code coverage, unit test history, static code analysis trends, and more. Wonderful, right? On the surface, yes.

But plugins have a dark side.

Because of plugins, and plugin versions, maintaining Jenkins is a nightmare. Plugins are deployed INTO Jenkins, as part of the single Jenkins master web container. This means they all share a common classpath.

How do I know if I’m going to break something?

Upgrading Jenkins often changes the underlying configuration files as well. If you use Jenkins Job DSL, then you must ensure it stays in sync with your version of Jenkins AND all the installed plugins. This forces you to have multiple Jenkins instances. But you can’t easily do traditional canary deployment patterns with Jenkins.

This ends up stalling new features and delaying upgrades.


File Data

All the data that Jenkins generates and uses to render those lovely build result pages is stored as files. Lots, and lots, and lots of files. This makes Jenkins performance incredibly IO bound. It is not unusual for large pipeline views to take several seconds to render. These large files also make Jenkins very memory intensive: Jenkins often parses them as entire DOMs, loading the whole contents of a file into memory. It is not unusual for Jenkins to use several gigabytes of heap. In fact, our production Jenkins instance has a 32 GB heap just to keep things running.

Memory

Speaking of memory: because all the plugins are code loaded into Jenkins itself, it is quite common to see large memory leaks. This isn’t core Jenkins’ fault, but it will still take down the server. We now schedule regular restarts of our Jenkins instance to alleviate the leaks.

System Groovy Scripts

If you are ever tempted to use System Groovy scripts, DON’T! You can (and will) mess up Jenkins all too easily. The script is not a forked process; it runs IN Jenkins. This is by design, since a system Groovy script wants access to Jenkins itself and its actual running classes. But that means you also have direct access to the JVM and the process running Jenkins. You can mess with the classpath, the heap, and even take the whole thing down (System.exit, anyone?).
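To see why, consider how little code it takes to do real damage. A sketch (please don't run this against a live master):

// A system Groovy script executes inside the Jenkins master JVM with full
// access to the live object model.
import jenkins.model.Jenkins

// One line quietly stops the master from running builds on its own executors...
Jenkins.instance.setNumExecutors(0)

// ...and nothing prevents you from killing the entire master outright:
// System.exit(0)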

Jenkins Configuration

While Job DSL and the Jenkinsfile help with the configuration of Jobs, they do very little when it comes to configuring Jenkins itself. You WILL need to configure Jenkins to connect to your static slaves or Mesos clusters, set up security, configure environment variables, register build tools, and so on. Job DSL has some libraries you can use to do this, but they are hard to understand and impossible to test.
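In practice this configuration ends up in init scripts that mutate the live object model, which is exactly as testable as it sounds. A minimal sketch of one such script, dropped into $JENKINS_HOME/init.groovy.d, that sets a global environment variable (the variable name and value are hypothetical):

// Init script run at startup from $JENKINS_HOME/init.groovy.d.
// Adds a global environment variable by mutating the live Jenkins instance.
import jenkins.model.Jenkins
import hudson.slaves.EnvironmentVariablesNodeProperty

def instance = Jenkins.instance
def envProp = new EnvironmentVariablesNodeProperty(
    new EnvironmentVariablesNodeProperty.Entry('DEPLOY_ENV', 'staging')) // hypothetical
instance.globalNodeProperties.add(envProp)
instance.save()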


Impossible To Test

I’ve said this once, twice, a thousand times: Jenkins is IMPOSSIBLE to test. Who would use a programming language that you couldn’t test except in production? No one! Some of you may be saying that of course you can test Jenkins. Believe me, I’ve tried. How do you truly test it? We’ve tried starting up local instances of Jenkins, but this quickly breaks down when it comes to actually running the builds. What about the actual configuration? You need to configure access to the cluster and mock critical infrastructure like Artifactory and DTR. It’s just really hard. And on top of all that, you want your tests not to affect the actual system. Several times our “tests” have caused production issues due to the lack of isolation.

Upgrades

I mentioned this briefly under plugins, but upgrades are a real problem. When you upgrade a plugin, or even Jenkins itself, it’s ALL or nothing. If you have one job that needs an older version of a plugin and another job that wants a newer version, you are stuck; you will need multiple Jenkins instances. Migrating to new versions isn’t always possible either, since upgrading a plugin can change the underlying XML as well. Rollbacks, therefore, can be nearly impossible.

Jenkins Web UI

The actual web UI runs on a single web container known affectionately as the Jenkins master. You can have only ONE of these. It does not support any kind of clustering or failover. So if you have a very large team of developers all hitting Jenkins, that one instance needs to be very beefy and constantly monitored.


Modularization

The core problem with Jenkins is that it’s a monolith. Everything lives together: the plugins, the configuration, the web UI, Jenkins core, all in one large web application. It’s time we started applying the lessons we’ve learned on our own production systems to Jenkins itself. Jenkins needs to be a cloud-native, modular system.

Jenkins X

Jenkins X is an effort to bring Jenkins to the cloud-native environment of Kubernetes. Although my experience with Jenkins X is limited, I feel it still doesn’t address many of my concerns about modular, disconnected components and easy testing.


My Dream

My ideal CI/CD system needs to:

Support some sort of unit testing of jobs

Be completely disconnected

Be horizontally scalable

Be modular

Be extensible

Thankfully, there appears to be something that just might fit the bill.

Drone.io

Drone.io is a CI/CD system built around Docker containers.

How is this different from Jenkins on Docker?

Each step is a different container focused on one thing. The job configuration file wires a series of these containers together. For example:

kind: pipeline
name: default

steps:
- name: test
  image: maven:3-jdk-10
  commands:
  - mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V
  - mvn test -B
Why is this important?

Now I can run a job on my local machine in the EXACT same environment the build system would use. Each step is isolated in a small, versioned container. This allows me to upgrade and add new features without affecting existing jobs.

The same could be accomplished with a Jenkinsfile. However, there is no easy local interpreter for Jenkinsfile configurations. Also, a Jenkinsfile runs in Jenkins, so even if you could run the script locally you wouldn’t exactly replicate what Jenkins would do. With Drone I can very easily interpret the configuration and run all the steps locally (the Drone CLI ships an exec command for exactly this), since all the work happens inside the containers.

Build Result Meta-Data

What about the artifacts and build metadata? And all those nice web pages?

It is the responsibility of the containers to upload any artifacts to other storage services, something like S3 or Artifactory. These were built to handle large numbers of requests and to distribute load easily. Build metadata should likewise be pushed out to external metrics services, like Cassandra.

Build web pages?

Well, that does pose a problem. As of now there is no real solution. You could create Maven sites for each build, which would display the build results in a somewhat nice format for developers to consume. But these would only show the results of single runs. Metrics and dashboards would still need to be developed externally.


What now?

I see a void forming in the CI/CD space. Several different frameworks are coming out, but most require you to commit to a platform, like Google Cloud Build or AWS CodePipeline. Jenkins as it stands today will not work. I can only hope Jenkins X can learn from the lessons of the past and become the true, modern CI/CD framework we all desire.

To be clear: I like Jenkins. I continue to use Jenkins. I want to keep using Jenkins in the future, and I want it to succeed. I just feel that its current architecture needs to be updated for the cloud world. Hopefully, that will be Jenkins X.