
About a month ago the Kubernetes project released a security fix that unfortunately broke our OpenShift cluster. The issue manifested itself as an inability to create volume subpaths for container volume mounts, which eventually brought our pipelines to a halt.

We were affected just after provisioning new nodes into our 3.7.22 cluster, something that had worked seamlessly in the past. This time, however, either the new minor packages were broken or the bug had made its way into all OpenShift 3.x rpm packages.

Luckily for us, we didn't lose any of our primary nodes, which have kept us running throughout this issue so far. Fortunately, a fix for the bug has recently been released by Red Hat (3.7.44), but this time we are putting in the effort to ensure we can control the rpm package versions applied to the cluster's nodes going forward.

Forcing a specific OpenShift package version

According to the documentation, the installer uses the following Ansible inventory facts to force a specific package version install:
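The variables in question include openshift_pkg_version (for RPM-based installs) and openshift_image_tag (for containerized installs). For reference, in the [OSEv3:vars] section of the inventory they look like this; the version strings below are examples matching the 3.7.44 release we targeted:

```ini
[OSEv3:vars]
# RPM-based install: note the leading dash, appended to the package name
openshift_pkg_version=-3.7.44
# Containerized install: note the leading "v"
openshift_image_tag=v3.7.44
```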

However, there is an issue in the Ansible installer playbook whereby the pre-install checks are performed against the package versions available in the repo instead of the installed ones. In other words, the cluster install won't run if the rhel-7-server-ose-3.7-rpms repo contains packages newer than the version specified in openshift_pkg_version.

To overcome this, we simply exclude the higher package versions through the /etc/yum.conf file. This setting has to be present on every node before installing:

[main]
# One entry for every higher version (*openshift*3.7.47 *openshift*3.7.48)
exclude= *openshift*3.7.46
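yum's exclude= directive uses shell-style glob patterns, so its effect can be sketched with a plain shell case statement; the package names below are illustrative:

```shell
# Mimic yum's glob-based exclude= matching with a shell case statement
for pkg in atomic-openshift-3.7.44 atomic-openshift-3.7.46; do
  case "$pkg" in
    *openshift*3.7.46) echo "excluded: $pkg" ;;  # matches the exclude glob
    *)                 echo "kept: $pkg" ;;      # falls through, stays installable
  esac
done
# prints:
#   kept: atomic-openshift-3.7.44
#   excluded: atomic-openshift-3.7.46
```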

Lastly, we must remove the following line from the aos_version.py file in the Ansible openshift_health_checker module, on the host from which we are running the installer; otherwise the installer will ignore the exclusion setting we just configured in the previous step.

# /usr/share/ansible/openshift-ansible/roles/openshift_health_checker/library/aos_version.py
# Remove this line:
yb.conf.disable_excludes = ['all']
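If editing the file by hand feels error-prone, the line can be commented out with sed instead of deleted. The snippet below only demonstrates the substitution on a sample line fed through stdin; to apply it for real, run the same expression with sed -i.bak against aos_version.py and keep the backup:

```shell
# Demonstrate the substitution on a sample line (the real line lives in aos_version.py);
# "&" in the replacement re-inserts the matched text after the "# " prefix
echo "            yb.conf.disable_excludes = ['all']" \
  | sed "s/yb\.conf\.disable_excludes/# &/"
# prints: "            # yb.conf.disable_excludes = ['all']"
```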

Once again, the reasoning behind this questionable logic is unclear to me, but the end result is that removing the line allows the installer to finish its job.

Once the installer completes, running yum list | grep openshift across the nodes confirms that the packages were installed at the right version.

$ ansible all -m shell -a 'yum list | grep openshift'
10.90.66.117 | SUCCESS | rc=0 >>
atomic-openshift.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-clients.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-clients-redistributable.x86_64
atomic-openshift-cluster-capacity.x86_64
atomic-openshift-descheduler.x86_64
atomic-openshift-docker-excluder.noarch
atomic-openshift-dockerregistry.x86_64
atomic-openshift-excluder.noarch
atomic-openshift-federation-services.x86_64
atomic-openshift-master.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-node.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-node-problem-detector.x86_64
atomic-openshift-pod.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-sdn-ovs.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-service-catalog.x86_64
atomic-openshift-template-service-broker.x86_64
atomic-openshift-tests.x86_64 3.7.44-2.git.0.6b061d4.el7
atomic-openshift-utils.noarch 3.7.44-1.git.9.684c638.el7
golang-github-openshift-oauth-proxy.x86_64
golang-github-openshift-prometheus-alert-buffer.x86_64
hawkular-openshift-agent.x86_64 1.2.2-1.el7 rhel-7-server-ose-3.7-rpms
jenkins-plugin-openshift-client.x86_64
jenkins-plugin-openshift-login.x86_64
jenkins-plugin-openshift-pipeline.x86_64
jenkins-plugin-openshift-sync.x86_64
nodejs-openshift-auth-proxy.noarch
openshift-ansible.noarch 3.7.44-1.git.9.684c638.el7
openshift-ansible-callback-plugins.noarch
openshift-ansible-docs.noarch 3.7.44-1.git.9.684c638.el7
openshift-ansible-filter-plugins.noarch
openshift-ansible-lookup-plugins.noarch
openshift-ansible-playbooks.noarch
openshift-ansible-roles.noarch 3.7.44-1.git.9.684c638.el7
openshift-elasticsearch-plugin.noarch
openshift-eventrouter.x86_64 0.1-1.git5bd9251.el7 rhel-7-server-ose-3.7-rpms
openshift-external-storage-efs-provisioner.x86_64
openshift-external-storage-local-provisioner.x86_64
openshift-external-storage-snapshot-controller.x86_64
openshift-external-storage-snapshot-provisioner.x86_64
python2-openshift.noarch 1:0.3.4-2.el7 rhel-7-server-ose-3.7-rpms
tuned-profiles-atomic-openshift-node.x86_64
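Rather than eyeballing that output, the check can be scripted. A minimal sketch, with two sample lines hard-coded and assuming the second column carries the version as in the output above:

```shell
# Flag any package line whose version column does not start with the pinned version
pinned="3.7.44"
printf '%s\n' \
  "atomic-openshift.x86_64 3.7.44-2.git.0.6b061d4.el7" \
  "atomic-openshift-node.x86_64 3.7.44-2.git.0.6b061d4.el7" \
  | awk -v v="$pinned" '$2 !~ "^"v { bad = 1; print "wrong version: " $0 } END { exit bad }' \
  && echo "all pinned"
# prints: all pinned
```

In practice you would pipe the real yum list output (filtered to lines that have a version column) through the same awk expression.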

Wrapping up

I think we can consider ourselves lucky to have survived this incident without any major complications. That said, it was a wake-up call reminding us that we shouldn't neglect proper housekeeping practices when hosting these platforms.

Our final goal is to dig deeper into the OpenShift installer playbooks and figure out a way to build golden AMIs, so that we can provision truly identical nodes that we can trust with our eyes closed.

Until then, back to work.

Cheers