Major Remote SSH Security Issue in CoreOS Linux Alpha, Subset of Users Affected

By Alex Crawford

Update 2 (May 19): Read the post-mortem blog post dissecting this vulnerability and the CoreOS response

Update 1 (May 16 04:28 PDT): 99% of affected hosts have been updated

A misconfiguration in the PAM subsystem in CoreOS Linux Alpha 1045.0.0 and 1047.0.0 allowed unauthorized users to gain access to accounts without a password or any other authentication token. This vulnerability affects a subset of machines running CoreOS Linux Alpha. Machines running CoreOS Linux Beta or Stable releases are unaffected. The Alpha channel was subsequently reverted to the previous, unaffected version (1032.1.0), and hosts configured to receive updates have been patched. The issue was reported on May 15 at 20:21 PDT, and a fix was available about six hours later, at 02:29 PDT on May 16.

Identifying an Affected System

To determine whether a CoreOS Linux Alpha system is running, or has previously run, an affected version, run the following as root:

$ TMP=$(mktemp --directory) && for p in $(cgpt find -t coreos-usr 2> /dev/null); do blkid $p -t TYPE=ext4 > /dev/null || continue && mount -o ro $p $TMP && grep 'DISTRIB_RELEASE=\(1045.0.0\|1047.0.0\)' ${TMP}/share/coreos/lsb-release && echo AFFECTED INSTALL $p $(sudo cgpt show $p | grep -Eow -e 'priority=.*') ; umount $TMP; done; rmdir $TMP; unset TMP

If the system has an affected version installed, the command will print something like:

AFFECTED INSTALL /dev/sda3 priority=1 tries=0 successful=1

This means that the /dev/sda3 device has a copy of CoreOS Linux Alpha with this vulnerability.

To determine the next course of action, look at the successful= field.

If successful=1 or greater, this partition was successfully booted. This machine is affected by the vulnerability and may have been compromised. See the instructions below for the course of action.

If successful=0, there have been no attempts at booting this partition. You can clear the update and install a fixed version with two commands:

$ update_engine_client -reset_status

Note: The above command will print out instructions for updating the partition table. Ignore this information.

Next, force an update by running:

$ update_engine_client -update
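The decision above can be sketched as a small shell function. This is only an illustration: next_action is a hypothetical helper, not something shipped with CoreOS, and the labels it prints are invented for this example. It takes one AFFECTED INSTALL line as printed by the check and looks only at the successful= field.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoreOS): classify one "AFFECTED INSTALL ..."
# line by its successful= field and print a label for the next step.
next_action() {
    line=$1
    # Extract the digits that follow "successful=" (empty if the field is absent).
    successful=$(printf '%s\n' "$line" | sed -n 's/.*successful=\([0-9][0-9]*\).*/\1/p')
    case $successful in
        '') echo "unknown" ;;                      # field not found in the line
        0)  echo "reset-and-update" ;;             # never booted: clear and update
        *)  echo "booted-treat-as-compromised" ;;  # booted: see the next section
    esac
}

# Example with the sample output line shown earlier.
next_action "AFFECTED INSTALL /dev/sda3 priority=1 tries=0 successful=1"
```

A label of reset-and-update corresponds to the two update_engine_client commands above; booted-treat-as-compromised corresponds to the instructions in the next section.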

Fixing Systems that Booted an Affected Version

If your host booted an affected version, you should immediately restrict network access to its SSH port and then update CoreOS Linux. An update to a fixed version can be forced by running:

$ update_engine_client -update

However, if a system was compromised while running version 1045.0.0 or 1047.0.0, it may still be in an insecure state after an upgrade if the SSH port (TCP 22) was exposed to the internet or another untrusted network. Reinstalling the system from scratch is the recommended course of action.

Basic Forensics

The users with valid login shells most commonly found on a CoreOS machine are "operator" and "core". You can list successful and unsuccessful login attempts by running last and lastb, respectively. For a more complete log, use journalctl _EXE=/usr/sbin/sshd.
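As a rough aid for reading the sshd journal, the sketch below pulls out accepted logins only. It is an assumption-laden example: accepted_logins is a hypothetical helper name, and it assumes OpenSSH's usual message shape, "Accepted <method> for <user> from <address> port <n>". Lines that do not match that shape are ignored, so treat the result as a starting point and not as proof that a host is clean.

```shell
#!/bin/sh
# Hypothetical helper: read sshd journal text on stdin and print
# "<user> <address> <method>" for each accepted login.
# Assumes the usual OpenSSH message:
#   Accepted <method> for <user> from <address> port <n> ...
accepted_logins() {
    awk '{
        for (i = 1; i <= NF - 5; i++) {
            if ($i == "Accepted" && $(i + 2) == "for" && $(i + 4) == "from") {
                print $(i + 3), $(i + 5), $(i + 1)
                break
            }
        }
    }'
}

# On a host under investigation (as root), something like:
#   journalctl _EXE=/usr/sbin/sshd --no-pager | accepted_logins | sort | uniq -c
```

Counting the unique user and address pairs makes unexpected sources stand out, especially logins for "core" or "operator" from addresses you do not recognize.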

Total Impact of the Issue

Once we became aware of the issue, we immediately ceased further distribution of the affected version and removed the vulnerable images from all distribution locations. Based on log data from the CoreOS Linux Update Service, roughly 3% of online, auto-upgrading hosts were affected.

This issue demonstrates a hole in our test coverage for new releases. We will perform a comprehensive review of our processes in order to avoid similar issues in future and share these improvements in a future post.

CoreOS is designed to make updating to the latest version as painless as possible. We believe that frequent, reliable updates are critical to good security. To do this, we use an "over the air" update system that provides a continuous stream of patches. New vulnerabilities will keep appearing for as long as software is developed, so we believe that an organization's ability to remediate them quickly is the key to ongoing security. To deliver updates quickly but safely, we take advantage of a variety of techniques:

Automated testing before reaching an update channel: CoreOS is subject to an automated test suite that performs validation and testing in all major supported CoreOS environments.

Gradual rollout: Updates are slowly rolled out to the CoreOS Linux population to prevent propagation of bad updates.

Alpha, beta, and stable channels: Updates go first to the alpha channel, where they sit for four weeks before being promoted to beta. The same process continues until beta is promoted to stable, which typically happens eight weeks after alpha.

Distributed systems: In a distributed environment, the system will keep running even in the event of a failure. This, in turn, makes it safer to aggressively patch and update your software in general.

Offline signing: All CoreOS updates are signed via an offline, air-gapped signing process to reduce the risk of bad updates being produced by malicious parties.

The majority of security vulnerabilities are introduced by human error, and this issue is no exception. It demonstrates both the weaknesses and the advantages of automatic upgrades. In this case, channeled upgrades and gradual rollout caught the issue before wide propagation. All software is subject to vulnerabilities, and an organization's ability to react and move quickly is key to ongoing security.