I came across an interesting problem on my journey from legacy land into the promised land of Kubernetes heaven.

There’s a CI infrastructure in the old environment with a git repository, a Jenkins build server and a Nexus artifact repository.

And there’s one environment in the cloud with a nice Kubernetes cluster, with again a git repository holding the deployment descriptors, a registry with our Docker images and another Jenkins instance responsible for the continuous deployment (which has of course some manual steps).

The problem is that, for some very important reason (and in each and every enterprise there's always some very important reason blocking whatever would make your life easier), the source code cannot be pushed into the cloud. This means that the artifacts are built in the legacy environment and uploaded to the Nexus servers there.

When someone decides it is time, the artifact is downloaded to a developer's machine, committed into the second version control system, and thereby uploaded into the cloud environment. There the Jenkins server spins up a pod (docker, kubectl) which creates a Docker image from the jar file and pushes it to the registry, from where Kubernetes can access and deploy it.

You have to pay extra for network traffic, so there's no way you can download the artifacts every 10 minutes; besides, it would be slow as well. The same is true for the code: we cannot pull and build at every commit because it's expensive.

For me this is very ugly and painful, especially committing binaries manually into git and version controlling them, so I wanted to change this.

Summed up: the goal is to stop checking jar files into git manually while still saving network traffic.

Sha1 to the rescue

The trick that the creators of the pipelines above are not aware of is that when you create Maven artifacts and deploy them, you get some extra metadata alongside the binaries. Have a look:

    bnemeth@mash:~/.m2/repository/com/edudoo/sample/1.0-SNAPSHOT$ ls -la
    total 104
    drwxrwxr-x 2 bnemeth bnemeth 4096 szept 14 15:08 .
    drwxrwxr-x 3 bnemeth bnemeth 4096 aug 27 12:46 ..
    -rw-rw-r-- 1 bnemeth bnemeth 14345 szept 3 10:51 sample-1.0-20180903.072937-26.jar
    -rw-rw-r-- 1 bnemeth bnemeth 40 szept 3 10:51 sample-1.0-20180903.072937-26.jar.sha1
    -rw-rw-r-- 1 bnemeth bnemeth 1564 szept 3 10:51 sample-1.0-20180903.072937-26.pom
    -rw-rw-r-- 1 bnemeth bnemeth 40 szept 3 10:51 sample-1.0-20180903.072937-26.pom.sha1
    -rw-rw-r-- 1 bnemeth bnemeth 14399 szept 14 14:01 sample-1.0-20180914.083257-30.jar
    -rw-rw-r-- 1 bnemeth bnemeth 40 szept 14 14:01 sample-1.0-20180914.083257-30.jar.sha1
    -rw-rw-r-- 1 bnemeth bnemeth 1157 szept 14 14:01 sample-1.0-20180914.083257-30.pom
    -rw-rw-r-- 1 bnemeth bnemeth 40 szept 14 14:01 sample-1.0-20180914.083257-30.pom.sha1
    -rw-rw-r-- 1 bnemeth bnemeth 14387 szept 20 14:16 sample-1.0-SNAPSHOT.jar
    -rw-rw-r-- 1 bnemeth bnemeth 1157 szept 13 10:23 sample-1.0-SNAPSHOT.pom
    -rw-rw-r-- 1 bnemeth bnemeth 720 szept 20 14:16 maven-metadata-local.xml
    -rw-rw-r-- 1 bnemeth bnemeth 787 szept 14 15:08 maven-metadata-ply-snapshots.xml
    -rw-rw-r-- 1 bnemeth bnemeth 40 szept 14 15:08 maven-metadata-ply-snapshots.xml.sha1
    -rw-rw-r-- 1 bnemeth bnemeth 424 szept 20 14:16 _remote.repositories
    -rw-rw-r-- 1 bnemeth bnemeth 193 szept 14 15:08 resolver-status.properties

This sha1 hash changes every time the jar file changes.

It’s a checksum.
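
To make the checksum idea concrete, here is a minimal sketch in Python (chosen only because it runs outside Jenkins) that recomputes the SHA-1 of a jar and compares it with the sidecar file next to it. The function names are made up for illustration, and it assumes the .sha1 file starts with the 40-character hex digest, which is what the 40-byte files in the listing suggest.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(jar_path):
    """Compare a jar with its '<jar>.sha1' sidecar file (hypothetical helper)."""
    sidecar = Path(str(jar_path) + ".sha1")
    # Assumption: the digest is the first whitespace-separated token in the file.
    expected = sidecar.read_text().split()[0].lower()
    return sha1_of_file(jar_path) == expected
```

If a single byte of the jar changes, the recomputed digest no longer matches the sidecar, which is exactly the property the rest of this post relies on.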

And there is a very simple way to download the latest artifact from Nexus (format: groovy in Jenkinsfile):

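    def downloadJar(String groupId, String artifactId, String version, String folder) {
        // Query parameters: r = repository, c = classifier
        def jarUrl = "http://nexus.edudoo.com/service/local/artifact/maven/content?g=${groupId}&a=${artifactId}&v=${version}&r=snapshots&c=exec"
        // -f makes curl fail on HTTP errors instead of saving an error page as the jar
        sh "curl -f -s -o \"${folder}/${artifactId}-exec.jar\" \"${jarUrl}\""
    }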

By adding the extension parameter (e=jar.sha1), you can obtain the hash file as well:

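    def downloadHash(String groupId, String artifactId, String version) {
        // e = extension; jar.sha1 requests the checksum file instead of the jar
        def shaUrl = "http://nexus.edudoo.com/service/local/artifact/maven/content?g=${groupId}&a=${artifactId}&v=${version}&r=snapshots&c=exec&e=jar.sha1"
        // trim() strips surrounding whitespace so the value can be used as a tag
        def hash = sh(returnStdout: true, script: "curl -f -s \"${shaUrl}\"").trim()
        return hash
    }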
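
The two URLs differ only in the e parameter, which is easier to see when the query string is built from a dictionary. This is a small Python sketch of that idea, not part of the pipeline; the function name is hypothetical, while the base URL and parameter names are copied from the Groovy functions, and the group and artifact IDs are read off the directory listing.

```python
from urllib.parse import urlencode

# Base URL copied from the Groovy functions.
NEXUS_CONTENT_URL = "http://nexus.edudoo.com/service/local/artifact/maven/content"


def artifact_url(group_id, artifact_id, version, extension=None,
                 repository="snapshots", classifier="exec"):
    """Build the content URL for an artifact or one of its checksum files."""
    params = {
        "g": group_id,
        "a": artifact_id,
        "v": version,
        "r": repository,
        "c": classifier,
    }
    # Only send "e" when asked for, mirroring the two Groovy functions.
    if extension is not None:
        params["e"] = extension
    return NEXUS_CONTENT_URL + "?" + urlencode(params)


jar_url = artifact_url("com.edudoo", "sample", "1.0-SNAPSHOT")
sha_url = artifact_url("com.edudoo", "sample", "1.0-SNAPSHOT", extension="jar.sha1")
```

Using urlencode also takes care of escaping, should a version or classifier ever contain a character that is not URL-safe.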

Now, instead of downloading the jar file every time and comparing it with what we had before, it's enough to download the hash and compare that!

(Yes, this is the same way dependency management tools determine whether to download snapshot artifacts or not.)

Caching the Checksum

We could, as before, commit the sha1 file into git and always compare the latest sha1 with that one, but this still feels very suspicious. Remember: our Jenkins node is a Kubernetes pod and therefore stateless, and we don't really want to hack around with volumes.

If only we could save the metadata somewhere in the second environment…

Indeed there is a way. Instead of only tagging our Docker images with something like the version number (which is a very outdated way of keeping track of your deployed artifacts anyhow if you do CD; think about it, and then a little longer) and the classic latest (SNAPSHOT!), why don't we simply tag them with the hash itself? This way we would always know whether a Docker image has already been built from the jar with that sha1.
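
A SHA-1 hex digest is 40 characters of 0-9 and a-f, which fits comfortably within what Docker accepts as a tag (letters, digits, underscores, periods and dashes, up to 128 characters). Here is a small Python sketch that validates the hash before turning it into an image reference; the function name, the registry host and the image name in the usage line are hypothetical.

```python
import re

# A SHA-1 digest in hex: exactly 40 lowercase hex characters.
SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def image_reference(registry_host, image, sha1):
    """Return 'host/image:sha1', refusing anything that is not a SHA-1 hex digest."""
    # Strip whitespace such as the trailing newline left by shell output.
    sha1 = sha1.strip().lower()
    if not SHA1_HEX.fullmatch(sha1):
        raise ValueError("not a SHA-1 hex digest: %r" % sha1)
    return "%s/%s:%s" % (registry_host, image, sha1)


# Hypothetical host and image name, for illustration only.
reference = image_reference("registry.example.com", "edudoo/sample", "0" * 40)
```

Rejecting anything that is not a digest guards against tagging an image with, say, an HTML error page that curl happened to return.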

To check this, simply do:

    def imageExists(String registryHost, String image, String tag) {
        echo "Checking ${image} in remote repository ${registryHost} for existing ${tag} tag..."
        def exists = false
        withCredentials([[$class: 'UsernamePasswordMultiBinding', credentialsId: 'ARTIFACTORY_DOCKER_NDC', usernameVariable: 'USER', passwordVariable: 'PASSWD']]) {
            // \$USER and \$PASSWD are expanded by the shell, so the secrets are not interpolated into the Groovy string
            def registryResponse = sh(returnStdout: true, script: "curl --insecure -s -u \"\$USER:\$PASSWD\" \"https://${registryHost}/v2/${image}/tags/list\"")
            // Simplified substring check on the raw JSON response
            exists = registryResponse.contains(tag)
            echo "Image exists: ${exists}"
        }
        return exists
    }

There's some authentication going on, but the point is that a Docker registry also has an endpoint you can target with curl which returns all the tags of an image. Note that registryResponse.contains(tag) is a simplified and in some cases incorrect way to check whether a JSON object contains a specific tag (it is a plain substring match on the whole response, so it would also match a longer tag or an image name containing the same characters), but I'll leave this implementation task to you (hint: use jq or some Python lib).

(Tagging with the commit ID of the artifact would probably be the best, but we don't have access to the original git, just the artifact repository.)
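
For the curious, here is one way the exact check could look in Python. It is a sketch, not what runs in my pipeline: the function name is made up, and it assumes the registry answers with the usual tags list document, a JSON object with a name and a tags array (which can be null when the image has no tags yet).

```python
import json


def tag_exists(registry_response, tag):
    """Return True only if `tag` is exactly one of the tags in the response."""
    try:
        payload = json.loads(registry_response)
    except json.JSONDecodeError:
        # Empty or non-JSON output, e.g. when curl failed silently.
        return False
    if not isinstance(payload, dict):
        return False
    # "tags" may be missing (error responses) or null (no tags yet).
    tags = payload.get("tags") or []
    return tag in tags
```

Unlike the substring check, this returns False for a prefix of an existing tag and for a string that only appears in the image name.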

So now the steps are:

1. Download hash from Nexus
2. Check if an image exists with that tag
3. If yes, do nothing
4. If no, download the jar to a temporary directory, build image, tag image with the hash, push image with all the tags
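
The control flow of these steps can be sketched in a few lines of Python. All four callables are placeholders for the real work (curl against Nexus, the registry check, docker build and push), and the function name is hypothetical; the point is only the order of operations and the early exit.

```python
import tempfile


def sync_image(fetch_hash, image_exists, download_jar, build_and_push):
    """Build and push an image only if none is tagged with the current hash.

    Returns the hash when an image was built, or None when nothing was done.
    """
    # Cheap step: a 40-byte download.
    sha1 = fetch_hash().strip()
    if image_exists(sha1):
        # An image for this exact jar is already in the registry.
        return None
    # Expensive steps happen only when the jar has actually changed.
    with tempfile.TemporaryDirectory() as work_dir:
        jar_path = download_jar(work_dir)
        build_and_push(jar_path, sha1)
    return sha1
```

Because the expensive callables are only reached after the registry check, running this as often as you like costs one small request to Nexus and one to the registry.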

Now we can always run this job: it will never download a jar unless it is absolutely necessary!