Shellshock

This paper covers the basics of the Shellshock bash vulnerability, a discussion on ways to detect or prevent future Shellshock-like vulnerabilities, a timeline of what happened when, and some information about the specific CVEs (vulnerability identifiers). It ends with a few conclusions. This paper is part of the essay suite Learning from Disaster.

Shellshock basics

Shellshock is a vulnerability in the widely-used bash command shell, and it had a huge impact. Here I will discuss the initial disclosure, the realization that there was a bigger problem, how to detect the Shellshock vulnerability, naming Shellshock, and the general aftermath.

Impact

The bash shell is widely used in many Unix-like systems, including Linux-based systems (such as Red Hat Enterprise Linux, Fedora, CentOS, Debian, and Ubuntu), *BSDs (such as FreeBSD and NetBSD), Apple MacOS X, and Cygwin (which runs on Windows). Thus, there were many systems that were potentially exploitable.

Shells are widely used on these systems to process commands, so there were many ways to potentially exploit Shellshock. A system is exploitable if an attacker can find a sequence of events in which the attacker controls the content of an environment variable and that variable is then processed by a bash shell with the Shellshock vulnerability. Such situations included many systems running CGI web applications that were invoked via bash or invoked bash subshells, sshd using ForceCommand (to limit access to specific actions), and DHCP clients connecting to subverted DHCP servers.

Whether or not a system was actually exploitable depended on many low-level details. Many programs are invoked indirectly via shell scripts, and those scripts may invoke the default non-interactive shell (e.g., using #!/bin/sh ) or explicitly require bash (e.g., using #!/bin/bash ). Many low-level routines (such as system() , popen() , execlp() , execvp() , and execvpe() ) will always or under certain conditions invoke the default non-interactive shell (which in some cases is bash). Similarly, many languages have built-in routines that always or sometimes invoke the default non-interactive shell.

Systems often invoke the default non-interactive shell, so systems with bash as the default non-interactive shell were more likely to be exploitable than those that used another shell. For example, systems using Debian and Ubuntu were less likely to be exploitable, because their default non-interactive shell is dash (which was not vulnerable) instead of bash, but there were still cases where Debian and Ubuntu systems could be exploited. One point of confusion about Debian and Ubuntu is that their default interactive shell is bash, while their default non-interactive shell is dash, and it is primarily the non-interactive shell (aka /bin/sh ) that increases the exploitability of the Shellshock vulnerability. Similarly, Apple MacOS X does not use bash in many circumstances, but there were cases where it could be exploited. Android systems use Linux but normally use the MirBSD (mksh) shell, which was not vulnerable. Some other systems, like Red Hat Enterprise Linux, CentOS, and Fedora, do use bash as the default non-interactive shell ( /bin/sh ), and that increased the likelihood that they could be exploited.
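You can check what /bin/sh actually is on a given system with standard tools (the answer varies: dash on Debian and Ubuntu, bash on Red Hat-family systems):

```shell
# Sketch: checking what /bin/sh actually resolves to on this system.
ls -l /bin/sh          # usually a symlink; shows its immediate target
readlink -f /bin/sh    # resolves the full symlink chain, if any
```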

Attackers quickly used Shellshock to build a botnet, dubbed wopbot, that actively scanned and attacked systems at Akamai and the U.S. Department of Defense.

It is considered maximally severe on systems where it applies, e.g., under the CVSS scoring system it has a maximum score because it is network exploitable, has low access complexity, does not require authentication to exploit, and can allow complete control of a vulnerable system (leading to unauthorized information disclosure, unauthorized modification, and/or disruption of service). In addition, bash is very widely deployed, so this maximally-severe problem applied to a large number of systems. Finally, the original fix did not work, leading to a hurried worldwide effort to develop true fixes and then deploy them.

Initial disclosure

Shellshock was discovered by Stéphane Chazelas, reported to its developer and a few others, and assigned the CVE identifier CVE-2014-6271. The lead developer of bash, Chet Ramey, developed a fix which was rolled out by major distributors as part of a routine coordinated disclosure.

A quick aside on terminology is appropriate here. Some people use the term “responsible disclosure” instead, but that is a misleading and pejorative term. In 2011 Microsoft switched terminology from responsible disclosure to coordinated disclosure. I recommend that others do the same.

Anyway, the post CVE-2014-6271: remote code execution through bash by Florian Weimer (2014-09-24 17:03:19 +0200) was one of the first public disclosures of the problem as it was then understood. It explained that (and I am quoting here):

“Bash supports exporting not just shell variables, but also shell functions to other bash instances, via the process environment to (indirect) child processes. Current bash versions use an environment variable named by the function name, and a function definition starting with “() {” in the variable value to propagate function definitions through the environment. The vulnerability occurs because bash does not stop after processing the function definition; it continues to parse and execute shell commands following the function definition. For example, an environment variable setting of VAR=() { ignored; }; /bin/id will execute /bin/id when the environment is imported into the bash process... So far, HTTP requests to CGI scripts have been identified as the major attack vector. A typical HTTP request looks like this: GET /path?query-param-name=query-param-value HTTP/1.1 Host: www.example.com Custom: custom-header-value The CGI specification maps all parts to environment variables. With Apache httpd, the magic string “() {” can appear in these places: Host (“www.example.com”, as REMOTE_HOST)

Header value (“custom-header-value”, as HTTP_CUSTOM in this example)

Server protocol (“HTTP/1.1”, as SERVER_PROTOCOL) ... The other vector is OpenSSH, either through AcceptEnv variables, TERM or SSH_ORIGINAL_COMMAND. Other vectors involving different environment variable set by additional programs are expected.”
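The mechanism quoted above can be sketched harmlessly today. The variable name HTTP_CUSTOM follows the CGI convention described in the quote; on a patched bash the value is treated as ordinary data and the trailing command never runs (on a vulnerable bash, it would have executed before the requested command):

```shell
# Sketch: how a malicious CGI header value would have reached bash.
# On a patched bash, HTTP_CUSTOM is just an ordinary variable and
# only the requested command runs:
env 'HTTP_CUSTOM=() { :; }; echo would-have-run' bash -c 'echo bash started safely'
```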

Whether or not a system is exploitable depends on many complex factors; not every use of CGI made a system exploitable. See the earlier discussion on its impact. Still, it was a common enough circumstance that many systems were exploitable.

Realization of a bigger problem

The original understanding, however, turned out to be incomplete. The repair created by the bash developers stopped parsing at the closing “}” of a function definition, but the bash shell was still parsing every environment variable that began with the magic sequence “ () { ”. People noticed and discussed the ramifications in locations such as:

oss-security mailing list (a widely-archived mailing list, also called oss-sec, for public discussion of security flaws, concepts, and practices in the Open Source community)

bug-bash mailing list (a mailing list used to report and discuss bash bugs)

full disclosure mailing list (a public, vendor-neutral forum for detailed discussion of vulnerabilities and exploitation techniques)

Security researchers quickly realized that as long as bash continued to parse untrusted data, any error in the bash parser (which processed such variables) could lead to a dangerous security exploit. Yet the bash parser was never intended to be security-relevant! Researchers immediately looked for bash parser errors. Later that day, Tavis Ormandy reported and then tweeted an example of a bug in the bash parser that could lead to an exploit. This was assigned CVE-2014-7169, and soon afterwards four more bash parser errors were found (each of which had a CVE identifier assigned). What had been intended to be a coordinated disclosure had turned into a full disclosure process instead.

At this point one of the advantages of open source software (OSS) kicked in: people other than the original software developers can examine the software, propose a solution, or implement that solution. Florian Weimer (Red Hat) quickly posted a patch for bash that counters the attack in a full and general way. In this patch, environment variables are only examined for shell functions if the variable name begins with the prefix “ BASH_FUNC_ ” and ends with the suffix “ () ”. Adding prefixes had previously been suggested by Michal Zalewski; suffixes were an addition previously suggested by Eric Blake. Distributions rapidly deployed broad defenses that eliminated the problem, since they did not need to wait for the lead bash developer to determine how to fix it upstream. Red Hat, CentOS, Fedora, Oracle Linux, Debian, and Ubuntu adopted Florian Weimer’s prefix/suffix approach. Apple’s later OS X bash update 1.0 included Florian Weimer’s approach, with slightly different prefixes and suffixes (prefix “ __BASH_FUNC <” and suffix “ >() ”). In contrast, NetBSD and FreeBSD disabled automatic imports (so imports must be specifically requested) per a patch proposed by Christos Zoulas; this also completely eliminates the vulnerability. Akamai developed their own emergency patch, which they deployed internally and made available to others. Antti Louko and later Solar Designer (Alexander Peslyak) posted approaches for binary patching (handy for patching otherwise unmaintained systems).

On 2014-09-27 22:50:07 -0400, Chet Ramey posted bash-4.3 official patch 27 aka “bash43-027” (along with related patches), formally accepting into mainline (upstream) bash Florian Weimer’s prefix/suffix approach. This eliminated the Shellshock problem in the upstream program (bash) used by everyone else as their baseline. The official version in bash used the same prefix “ BASH_FUNC_ ” that Florian originally proposed, but changed the suffix from “ () ” to “ %% ”. I think this official bash change is a mild improvement over Florian’s original patch; the sequence “ %% ” has no shell metacharacters, reducing the risk that it will trigger other problems.
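You can observe the post-fix export encoding on a current bash. Exporting a function and then dumping the environment shows the namespaced variable name (the exact prefix/suffix may vary by vendor; as noted above, Apple chose different markers):

```shell
# Sketch: observing how a post-fix bash encodes an exported function
# in the environment of its child processes:
bash -c 'greet() { echo hi; }; export -f greet; env | grep BASH_FUNC'
```

On a mainline bash this prints a variable named BASH_FUNC_greet%% whose value begins with “() {”, demonstrating that only this reserved namespace, not arbitrary variable names, is ever parsed for functions.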

Detecting the Shellshock vulnerability

To determine if your version of bash is vulnerable to Shellshock, run the following refined test on a Unix-like system command line (this should work on any Bourne or C shell):

env foo='() { echo not patched; }' bash -c foo

This will reply “ bash: foo: command not found ” on a repaired version of bash, while a vulnerable bash will typically reply “ not patched ” instead. The initial “ env ” can be omitted when typing the command into a POSIX/Bourne shell (including bash, dash, and ash). If you want to test a bash that is not first in your path, change bash to the full path to the program to test.

This refined test determines whether bash automatically parses function imports in normal environment variables at all; a correctly-patched bash must not. The root cause of the Shellshock vulnerability is this inappropriate parsing of normal environment variables, which must not occur because some (though not all) normal environment variables include data provided by attackers. The refined test correctly reports that bash is not vulnerable across many correct solutions, including any variation of the prefix/suffix change proposed by Florian Weimer, the elimination of automatic function imports as proposed by Christos Zoulas, or the elimination of function imports through binary patches as proposed by Antti Louko and Solar Designer. It also correctly detects that your version of bash is vulnerable if it is unpatched or if it only applied the original patch (which did not fully solve the problem). Credits: This refined test was originally posted by Michal Zalewski and refined by Paul Vixie. This test has been posted in other places too, e.g., the HOST project’s “Are you Open to Being Shell-Shocked?”

One reader asked me why a slightly different test does not reliably determine if bash is vulnerable. He noticed that the following test produces “not patched” on bash 4.3 with all patches up through patch 30:

env BASH_FUNC_foo%%='() { echo not patched; }' bash -c foo # bad test

This is true, but not a security vulnerability. The reason is that on most fixed bash implementations, only environment variables beginning with a specific prefix (typically “BASH_FUNC_”) and suffix (typically “%%”) are checked for functions to import. This is not a security vulnerability because attackers cannot choose the names of the environment variables in correctly-working programs. If an attacker could set an arbitrary environment variable to an arbitrary value then they control the program anyway (e.g., via PATH or LD_PRELOAD ). Programs that cross security boundaries, such as setuid or setgid programs, already must extract and erase environment variables. Since attacker data is only set in specific environment variables (such as the ones used in the CGI interface), and none of these have this particular prefix and suffix, they cannot cause any problems.

Hanno Böck developed a more detailed script to determine whether or not a given bash implementation is vulnerable, including specific tests for each known CVE. Perhaps most importantly, it first tests to see if the complete countermeasures are in place; it then tests for specific cases. However, this is more detailed information than many people need; this is primarily useful for security researchers and distribution vendors. Similarly, shellshocker.net posted a long sequence of tests for various specific exploits, yet their test for exploit 6 (CVE-2014-6278) as of 2014-10-07 16:00 -0400 is functionally identical to this test and is all that is needed.

Unfortunately, early scripts for detecting the Shellshock vulnerability only detected the original problem as it was originally understood and reported as CVE-2014-6271. For example, many people used this script to detect the vulnerability (as it was originally reported):

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"

However, this version does not detect the full Shellshock vulnerability. This test only determines whether code is run after the closing curly brace, but bash implementations that pass this test might still let an attacker force the bash parser to directly parse attacker-provided data. Since the bash parser was not designed to do this, this creates a security risk. Some later scripts tested more, but sometimes unnecessarily worried people because they identified parser errors without first determining whether they were exploitable.

Naming Shellshock

Stéphane Chazelas proposed calling the vulnerability “bashdoor”, but this name did not catch on.

Instead, as part of discussion on what to name the vulnerability, Andreas Lindh proposed “Shell Schock” on 2014-09-24 16:42:21 +0000. Mark Stanislav (who had started the discussion) quickly commented that this name was a “good one”. Once spelled more conventionally, this name quickly caught on. Robert Graham in “Bash bug as big as Heartbleed” added an update, saying “I think people are calling this the ‘shellshock’ bug. Still looking for official logo”. Andreas Lindh then tweeted a proposed logo in response to Robert Graham’s challenge. This is confirmed by a tweet where Rob Graham credits Andreas Lindh for coining the name Shellshock. (My thanks to Larry W. Cashdollar who pointed me to this information.)

Aftermath

This was an extremely bad vulnerability. Unlike Heartbleed, this attack was easy to exploit, and it granted attackers immediate control of vulnerable systems when successful. What is worse, the initial understanding of the problem was faulty, so the carefully-crafted response developed at first did not fully fix the problem.

The good news was the rapid response from the various security researchers and distributors, who quickly realized that the original fix was not enough. Instead of pretending there was no problem, they quickly identified solutions and distributed them to their users.

The biggest issue, as always, is those systems which are not rapidly updated. Indeed, many systems have no reasonable update process at all! A big question is how many embedded systems use bash. Historically embedded systems had limited resources, and thus would typically include a less-featureful shell like ash or dash. For example, busybox users would typically use its ash shell, not bash. However, modern embedded systems often have so much storage space that they could use a richly-featured shell like bash. Since embedded systems are rarely updated, or may not even have an update process, this could be a real problem. How many embedded systems have bash? I have no idea... but I think we’re going to find out.

One intriguing thing about this vulnerability is that journalists were more adept than in past incidents at examining social media, including mailing lists and Twitter, to track and report what was going on.

Many people are looking for similar problems. For example, FioraAeterna tweeted on 2014-10-02 about an odd behavior of the Microsoft Windows command shell. In the Microsoft command shell, the commands:

set foo=bar^&ping -n 1 localhost
echo %foo%

cause the embedded ping command to run when %foo% is later expanded (the caret “^” is the escape character in the Windows command shells cmd.exe and command.exe).

It is known that Windows scripts that are not properly quoted are vulnerable if an attacker can create malicious filenames. Some seem to be calling this a “new” attack, but I think that’s misleading; it’s just an old mistake. The need to properly quote shell scripts, especially for filenames, has been known for a long time, especially in the Unix-like world. See my paper “Filenames and Pathnames in Shell: How to do it Correctly” for how to properly handle filenames in Unix-like shells, and “Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems” for my proposal on how to reduce the likelihood of security vulnerabilities stemming from filenames on Unix-like systems.
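The quoting problem has a direct Unix-side analogue. This small sketch shows the safe pattern: quoting expansions keeps an awkward (or attacker-chosen) filename intact as a single word:

```shell
# Sketch: why quoting matters when shell scripts meet awkward filenames.
dir=$(mktemp -d)            # private scratch directory
touch "$dir/a b"            # a filename containing a space
for f in "$dir"/*; do
  printf '%s\n' "$f"        # "$f" stays one word because it is quoted
done
rm -r "$dir"
```

Had the loop body used an unquoted $f, the name would split into two words at the space, which is exactly the class of mistake that malicious filenames exploit.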

There are many online resources with more information, including the Shellshock page on Wikipedia and HOST project page on Shellshock. You can also see the materials below, particularly the timeline and information about the specific CVEs.

How can we detect or prevent Shellshock-like vulnerabilities ahead-of-time?

I think it is critical to examine vulnerabilities to determine how we can detect or prevent similar problems ahead-of-time, and then to do those things. We should look for things like guidelines, rules of thumb, and detection techniques that would have cost-effectively helped as early as possible. For example, my paper How to Prevent the next Heartbleed identifies a set of techniques that people can apply to detect (and prevent) future problems like Heartbleed. My papers on the POODLE attack against SSLv3 and the Apple goto fail vulnerability take similar approaches. I am especially interested in approaches that software developers or other technologists can apply before software gets delivered to users.

This is not easy. Crispin Cowan noted to me on 2014-10-03 that “detecting command injection is *much* harder than detecting memory corruption.” Similarly, Michal Zalewski noted that pondering “why bugs happen and how we fix it” is a good thing to ponder, but it is “certainly one where it’s difficult to come up with fresh ideas :-(”. I agree that it’s harder... but I still think it is a goal worth pursuing.

This is not to mock the good work done by Brian Fox and Chet Ramey. As noted in “The Internet Is Broken, and Shellshock Is Just the Start of Our Woes” by Robert McMillan, Wired, 2014-09-29:

Brian Fox is still proud of the project he once drove across the country. “It’s been 27 years of that software being out there before a bug [causing a vulnerability] was found... That’s a pretty impressive ratio of usage to bugs found.”

Here are the approaches I have identified (directly or with the help of others). I first go over a few general points. I then discuss some specific measures: document the external interface precisely and review it for security, create namespaces where practicable, separate data and code, minimize component functionality and/or enable replacement, sunset rarely-used functions, avoid using components (like shells) unnecessarily, require explicit import of data and code, and that there may be an approach involving taint tracking. Finally, I point out that perfection is unlikely, so system developers and operators must plan for vulnerabilities (including least privilege and timely updates).

A few general points

Preventing software vulnerabilities requires countermeasures throughout the software development process, including requirements, design, implementation, and test. This is true regardless of what process you use (be it iterative, agile, or whatever). You should use general measures such as educating developers (so they know what to do), identifying the program’s security requirements, examining attack surfaces (to limit what software is exposed to malicious input), limiting privileges (including via sandboxes), reducing complexity, implementing the software in a way that avoids common mistakes, using version control (e.g., using git), performing peer review of changes (including analyzing it for security), and evaluating software for vulnerabilities before release (using both static and dynamic analysis approaches). This is not a complete list, of course. I have a freely-available book focusing on the design and implementation of secure software, which has a lot of information on how to avoid common mistakes. That said, I want to focus on specific approaches that would have probably prevented or detected Shellshock ahead-of-time, so that we can apply them.

There were multiple stages in Shellshock:

Realizing that a malicious attacker can add commands “after” a function definition, and that this is exploitable. That was the original (and faulty) understanding of the problem.

Realizing that merely having the shell parse data in normal environment variables AT ALL is exploitable, and needed to stop.

One thing that made Shellshock hard to identify as a vulnerability is the expectation that if an attacker can control arbitrary environment variables, the system is already subverted. There are many environment variables, such as PATH and LD_PRELOAD , whose control would give an attacker control of the program. Whenever you pass a security boundary (e.g., with setuid/setgid programs), you must extract and erase environment variables, as I point out in my book. On the other hand, there are specific environment variables that are expected to be controlled by an attacker, because they are used to pass data to a program that will check it. For example, the CGI interface uses environment variables to pass data. The Shellshock problem was that the bash parser itself was directly parsing and responding to all environment variables, not just the ones documented to affect bash (such as IFS and the environment variables that affect the loader).
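The boundary-crossing precaution mentioned above can be sketched with the standard env utility: env -i starts the child with an empty environment, so only explicitly whitelisted variables get through:

```shell
# Sketch: scrubbing the environment when crossing a trust boundary.
# Only the explicitly whitelisted PATH survives into the child:
env -i PATH=/usr/bin:/bin sh -c 'env'
```

Everything else (HOME, LD_PRELOAD, attacker-supplied CGI variables, and so on) is simply absent in the child's environment listing.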

As Michal Zalewski put it, “the bash bug was fairly unique and almost hilariously bad - but also a bit intractable. It dates back to the 80s, [and] cropped up in a place where I certainly wouldn’t think to look... Before this finding, it genuinely wouldn’t have occurred to most people that auditing bash is a good use of their time and money, not any more than it’s a good use of your time to audit /bin/uname... This is actually probably a lot more significant for libraries that don’t perform security tasks, but may be exposed in even more profound ways (e.g., how much money goes to libpng, ffmpeg, imagemagick?)”.

Once people realized that the bash parser was generally exposed to attack, many people quickly found vulnerabilities. Michal Zalewski reported how he quickly found CVE-2014-6277 and CVE-2014-6278 using the fuzzing tool american fuzzy lop. In short, once people realized that the bash parser was an attack surface, standard techniques quickly found problems.

The biggest problem was realizing that arbitrary environment variables were being processed by the bash shell, and that this was a serious problem all by itself. In my mind, we should focus on how to detect similar situations.

Document the external interface precisely and review it for security

The fact that bash could export and import functions was well-documented. However, how it exported and imported functions was not.

One possible countermeasure would have been simply to document “all the ways that environment variables are processed when the shell starts”. Had it been noted that any environment variable beginning with “() {” was specially processed, I think the problem would probably have been identified much sooner. Even the process of documenting it today might alert someone.

In any case, once the interface is documented more precisely, it can be reviewed more easily to determine whether it is appropriate from a security point of view.

The design interface can be documented without specifying it as an unchangeable interface. Some people worry that by writing down the details of an interface, they are guaranteeing that the interface cannot change or be improved, but there is no reason to assume that. For example, this information could be documented in a “design” section describing how the current implementation works, perhaps with an explicit statement that it might change in the future. Another approach is to extend the ENVIRONMENT section of the manual; it is common to list all environment variables affecting a program in such a section. The bash man page does not list the specific environment variables that affect it in its ENVIRONMENT section, but it could do so in a future version.

Michal Zalewski states that the shell function import “feature was clearly added with no basic consideration for the possibility of ever seeing untrusted data in the value of an environmental variable. This lack of a threat model seems to be the core issue... The detailed documentation part is perhaps easier to tackle. The security properties of shells are generally under-documented and counterintuitive... Decent security-centric docs, authored or even merely just reviewed by the maintainers, would have helped highlight the risk.”

Dewey remarked on Bruce Schneier’s blog, when looking at the documentation, “The string ‘() {’ doesn’t appear once. There are vague references to ‘importing function definitions from the shell environment’ and the like, but I can’t find any description of how it works. There’s certainly no huge warning like ‘bash will look at literally every environment variable and import anything starting with ‘() {‘ as a function’. ‘Everyone knows’ that the environment is just a series of null-terminated key=value strings which, in general, are not interpreted except as documented in man pages for libc and each program that reads specific keys... if it had been properly documented, someone would have probably figured out earlier that this was a terrible idea... But we have to infer from statements like ‘The export and declare -x commands allow parameters and functions to be added to and deleted from the environment’, and that’s not nearly good enough for such a dangerous behavior.”

Create namespaces where practicable

The eventual solution for Shellshock used by most implementations was to add a prefix and suffix for specially-processed environment variables. This created a special namespace for importing shell functions. But you do not need to know of a vulnerability to use solutions like this; simple heuristics could have gotten there.

In general, whenever using any shared resource (such as environment variables or a filesystem), consider creating a separate namespace for just your program’s use. For example, when using environment variables, consider adding a prefix or suffix in the name. You should especially create a special namespace when interpreting a large set of values instead of just a particular one (such as “any function name” instead of “PATH variable”). Dewey remarked on Bruce Schneier’s blog, “If the vaguely-defined ‘importing function definitions’ feature had used ‘BASH_’-prefixed names, or names with any prefix, these things would never have passed a whitelist.” This has other advantages, e.g., it reduces the likelihood of future conflicts with other programs or standards which may otherwise use the same name.
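As a sketch of the namespace principle, consider a hypothetical program (the MYAPP_ prefix and variable names here are invented for illustration) that only ever interprets environment variables inside its own prefix, ignoring everything else:

```shell
# Sketch: a hypothetical program that interprets only environment
# variables in its own MYAPP_ namespace; OTHER is simply ignored:
env MYAPP_MODE=verbose OTHER=ignored sh -c '
  for v in MODE LIMIT; do
    eval "val=\${MYAPP_$v-}"
    if [ -n "$val" ]; then echo "MYAPP_$v=$val"; fi
  done'
```

Because the program consults a fixed whitelist of prefixed names, an attacker who controls some other variable (the analogue of a CGI header) never reaches the program's interpretation logic.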

The same approach applies to the filesystem. A specially-designated file or directory, specifically set aside for just your program, can eliminate the problems caused by shared resources. If you plan to use a large number of files, for example, put them all inside a directory that operates as the isolated namespace. It is then easier to isolate that information from everything else. This is already understood when dealing with temporary files and the /tmp directory, but this is a broader principle that can be reused elsewhere.
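The filesystem version of the idea can be sketched with mktemp, which already embodies it for temporary files: give the program its own directory as an isolated namespace, and remove the whole namespace at once when done:

```shell
# Sketch: an isolated per-program directory namespace for scratch files.
workdir=$(mktemp -d)            # private, unpredictable directory
echo "data" > "$workdir/state"  # all program files live under $workdir
cat "$workdir/state"
rm -r "$workdir"                # clean up the whole namespace at once
```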

I should note that Stéphane Chazelas recommended a slightly different namespace approach for function imports, using a single environment variable to pass the functions to be exported. There are some advantages to this approach, e.g., there is less memory overhead when there are many bash functions. However, this would have required more analysis and work to implement; when there was a rush to fix the problem permanently, approaches that took longer were less welcome. If the developers had used a namespace from the beginning, they would have had time to consider the trade-offs before implementing something. (It’s not clear to me that they would have done anything differently, but it’s better to have the time to choose because there is no vulnerability.)

Separate data and code

A related point is that data and code should be separated to the extent possible, and not conflated. Some would say that this is basically the same as separating namespaces, but for emphasis I will list this as a separate related point. This is not always practical, but it is worth considering as a starting point (and giving up only as necessary).

This point is somewhat subtle. In a broader sense, all code is data. Data and code are typically stored on the same storage media and processed in the same memory. Compilers (including Just-in-Time compilers) would not work if data and code were always rigorously separated. Yet maintaining a distinction can significantly improve security.

I prefer a functional kind of definition for distinguishing data and code: is there a way an attacker could send data that would be interpreted as code? If so, then you have not separated data and code. Where possible, you should instead separate data and code (so this cannot happen). If you cannot reasonably separate data and code, then you need to prevent harm, but it is much harder to prevent harm if attackers can send code. Tools such as limited functionality, sandboxes, and whitelisted safe patterns require a lot more analysis work to verify! For security, the usual goal is to maximize the separation so that attackers cannot provide code that the system will later execute. The better the separation, the less risk later.
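This functional definition can be illustrated in the shell itself. The same bytes are handled twice below: once as data (safe) and once re-parsed as code via eval (dangerous, shown here with harmless text):

```shell
# Sketch: the same string treated as data vs. as code.
data='$(echo injected)'
echo "$data"        # data stays data: prints the literal text
eval "echo $data"   # data becomes code: the embedded command runs
```

If an attacker controls $data, the first line is merely displayed text, while the second line hands the attacker execution, which is exactly the failure of data/code separation described above.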

Note that using a statically-compiled language does not eliminate the need for this. You can write a language interpreter in any Turing-complete language. For example, you may be writing code in Haskell, but it is easy to create a Lisp interpreter in Haskell. If an attacker can provide code that your interpreter executes, you have not separated data and code.

This is a point raised by Crispin Cowan (on 2014-10-03 on a private mailing list); he has previously spoken about the value of a “software Harvard Architecture” where you systematically separate the data from the code. This logically moves away from the traditional Von Neumann architecture, and thus eliminates the conflation of data and code. This point was also raised by Timothy D. Morgan (Tim) on oss-security; some of the points below are from him.

Mixing data and code is at the root of many security problems. As Tim notes, “Any time you design a system to accept executable code as well as data in the same format/context/whatever, you invite a huge number of possible attacks.” Examples include Microsoft Office files with embedded macros, HTML with Javascript embedded in the same file, and OGNL expressions in Apache Struts URL parameters.

JavaScript was originally developed so that JavaScript code could be embedded in HTML data. However, in many cases it is difficult to write JavaScript-based systems without vulnerabilities like cross-site scripting (XSS), because the data and code are fundamentally mixed. An emerging solution is Content Security Policy (CSP), a W3C Candidate Recommendation, which separates data and code. CSP defines a new “Content-Security-Policy” HTTP header. When this header is used, it creates a whitelist of sources of trusted content for that webpage; compliant browsers will then only execute or render items from those sources. This is supported in Chrome 16+, Safari 6+, and Firefox 4+ (unfortunately, IE 10 has only very limited support, but this still means that many users are protected through it). Twitter and Facebook have deployed CSP, and have generally had success. Typically you must modify your website design to fully use CSP (e.g., you must move the JavaScript into separate files; otherwise the receiving browser cannot distinguish between whitelisted and malicious JavaScript). Also, this only works when users use compliant browsers (if JavaScript had been designed from the beginning to be in a separate file, this would not have been a problem). More information about CSP is available at HTML5rocks and Twitter. This is a useful example of how to move an existing system that mixes data and code into something that separates them.
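
As an illustrative sketch (the CDN hostname here is hypothetical), a CSP header that allows scripts only from the page's own origin and one trusted host might look like the following, emitted from the shell for clarity:

```shell
# A sketch of a CSP response header: scripts may load only from the page's
# own origin and a (hypothetical) trusted CDN; inline <script> is excluded.
printf 'Content-Security-Policy: %s\n' \
  "default-src 'self'; script-src 'self' https://cdn.example.com"
```

A web server or application framework would send this as a real HTTP response header rather than printing it, but the policy string is the same.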

A related issue is the dangerous way that some JavaScript programs, especially early ones, processed data in JavaScript Object Notation (JSON) format by using eval . JSON is a widely-used data format derived from JavaScript, and since its syntax is (mostly) a subset of JavaScript, it is often possible to parse JSON data by using JavaScript’s eval function. This is unsafe and can lead to a security vulnerability, because the eval function simply executes whatever it is passed, yet JSON is normally only data. Instead, developers should use a JSON parser designed to read (or write) JSON. Since 2010 web browsers have included native support for parsing JSON, eliminating the temptation to use eval in this dangerous way. Again, instead of mixing code with data, use techniques that separate them, or at least process them in a way that does not lead to ordinary data being treated as code.

In contrast, Tim describes an example where people are having trouble due to the mixture of data and code: “In Apache Struts [at the moment], OGNL is used to parse the entire POST body, variable names and values. However, OGNL expressions are executable code, which breaks the whole assumption that POST variables are data. So the Struts team is now playing whack-a-mole with blacklist blocking of specific attack vectors... In the case of Shellshock, the ‘mixing’ of data and code came about because environment variables, normally used to carry data, were overloaded and used to carry code. This is very similar to the Struts case.”

Crispin Cowan argues that instead of supporting an eval (evaluation) function over arbitrary data, systems should require an explicit “Clarke-Wilson ceremony” to convert untrusted data into trusted code (if the source is trusted) or harmless markup text (if the source is not trusted). For example, ToStaticHTML removes all dynamic HTML elements and attributes from an HTML fragment.

Florian Weimer noted that calling eval (or its equivalent) on untrusted input is a relatively common issue - and is a bug. Before evaluating anything, check to ensure there is no way that untrusted input can get there (or, if it can, ensure that it is always filtered and escaped in a safe manner). I think this is another variant of the same point. I should note that upstream bash, and most deployed versions, still automatically import function definitions from environment variables whose names carry special prefixes and suffixes. This is currently considered acceptable by many people, because there is no known mechanism for untrusted input to get there. That said, this is not as safe as you might like; I have previously argued that the safest course from a security point-of-view would be to both apply the prefix/suffix namespace (separating data and code) and to require that function imports be specifically requested (as implemented by Christos Zoulas). I talk more below about requiring explicit import of data. This trade-off of potential risk (on the one hand) versus functionality and backwards compatibility is a constant issue.

This separation is especially important if you are using an existing construct for storing data (such as environments). As Timothy D. Morgan notes, “When an existing construct in a system is widely expected to be used for storing data, avoid overloading it for use of storing code.”

Separating data and code can be accomplished in many ways. Separate namespaces are one mechanism for creating this separation if you must use the same underlying system. Separate files or URLs for data and code, as done in the Content Security Policy (CSP), are a related approach. SQL injections are countered through prepared statements (when used correctly), and prepared statements also separate data and code. The point is to develop and use simple mechanisms that perform this separation, and enforce them where practical.

Obviously there are places where this can only be partly achieved. For example, the whole purpose of a shell (like bash) is to accept data and execute it. Still, shells should not execute arbitrary code provided by adversaries without being asked to do so, and this was the problem with Shellshock.

In some cases full isolation between data and code is not practical, and this can happen in a surprising number of places. For example, fonts and document formats like PDF include code. In those situations you can try to isolate and separate to some extent. In many cases it is possible to create a limited virtual machine that cannot reach outside to other resources; where this is not possible, assume you may be executing malicious code and audit the implementation carefully. However, this is hard to do correctly, and many vulnerabilities have been found in systems that take this approach. Even if you do it correctly, it takes a lot of additional time to ensure that it stays secure. Thus, mix code with data only when you absolutely need to, and be prepared for extra effort if you must do it.

Require explicit import of data and code

Shellshock turned an odd capability into a vulnerability because bash automatically imported functions when environment variables had values of a particular format. Yet importing data and code does not need to be automatic.
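
For context, here is the legitimate feature that Shellshock abused, as a small sketch (assuming bash is installed): export -f lets a parent bash pass a function definition to a child bash through the environment, and the child imports it automatically.

```shell
# A parent bash exports a function; a child bash automatically imports it
# from the environment and can call it. Shellshock abused this import step:
# pre-patch bash would also execute trailing commands after any environment
# value shaped like '() { ...; }', whatever the variable's name.
bash -c 'greet() { echo "hello from an imported function"; }
         export -f greet
         bash -c greet'
```

On any bash, this prints `hello from an imported function`; the danger was never the feature itself, but that the import happened automatically for attacker-controlled environment values.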

In the case of Shellshock, Christos Zoulas created a patch that supported function import but required users to specifically request it. This eliminated practically all avenues of exploitation, because the functionality was rarely used, and it could be enabled in the few cases where it was needed.

There is precedent for this in the PHP language. By default, in PHP versions 4.1.0 and lower, all environment variables and values sent to PHP over the web were automatically loaded into the same namespace (global variables) that normal variables are loaded into. This automatic setting of variables (essentially an import) probably seemed convenient to the original PHP developers. However, it meant that attackers could set arbitrary variables to arbitrary values, which kept their values unless explicitly reset by a PHP program. In addition, PHP automatically creates variables with a default value when they are first requested, so it is common for PHP programs not to initialize variables. This made it extremely difficult to write secure programs in PHP at the time. PHP version 4.2.0 (which is now old) changed this, so that by default external variables (from the environment, the HTTP request, cookies, or the web server) are no longer registered in the global scope. The preferred method of accessing these external variables became the new Superglobal arrays, a mechanism for explicit imports (instead of implicit ones).

There are other reasons to do this besides security, too. Requiring a programmer to explicitly state what they are importing can make it easier for later developers to see what the program depends on, and thus can make it easier to maintain.

However, there are downsides as well. Shells need to be easy to use; requiring explicit importing creates some additional work, especially if the import mechanism is poorly thought out. In the case of bash, it also creates a backwards-incompatibility; explicit imports are much easier to design into a language from the beginning instead of changing to them later.

Minimize component functionality and/or enable replacement

When choosing security-relevant components, choose components or configurations that provide only the smallest amount of functionality needed. Similarly, when implementing a component, consider whether each piece of functionality is really needed, especially if it might increase security risks. This is not because smaller or less-functional components are magically more secure; it is that smaller and simpler components are easier to thoroughly review, so a review is less likely to miss a security issue. Finally, consider making the component easy to replace (e.g., by following standards so that you can easily swap in another implementation).

Debian and Ubuntu were much less vulnerable to Shellshock than some other Linux-based systems such as Red Hat Enterprise Linux, CentOS, Fedora, and Oracle Linux. That is because years ago Debian (starting with Squeeze) and Ubuntu (starting with version 6.10) changed their default non-interactive shell (aka /bin/sh ) from bash to dash. This change was primarily made for efficiency; because bash is very full-featured, it is rather large and slow to start up and operate compared with dash. Bash continues to be the default shell for interactive use on both Debian and Ubuntu, because many interactive users prefer bash’s richer feature set. Also, a number of shell scripts require bash, because those additional features are often very useful.

It is easy to show that dash is smaller than bash by measuring source lines of code (SLOC). SLOC is an imperfect but useful measure for estimating development effort, size, and to some extent functionality. I used my tool SLOCCount version 2.26 to measure SLOC; it measures SLOC as non-comment non-blank physical lines after automatically detecting what language is used in each file. Using SLOCCount, I find that dash version 0.5.8 has 14,208 physical SLOC (13,040 in C). In contrast, bash version 4.3 (unpatched) has 115,715 physical SLOC (99,988 in C). (I verified that the bash numbers do not include code that is generated from flex or bison.) In short, bash is over 8 times larger than dash by this measure.

In theory this change should have been painless, because they were switching from one POSIX-compatible shell to another. In practice, it was not painless, but it was doable. Ubuntu’s DashAsBinSh lists the many ways that people accidentally depend on bash extensions. A tool called checkbashisms was developed to find these extensions, so that people could quickly find them and change them. If people determined that the changes were too difficult, they could change their scripts to make the dependency on bash explicit (e.g., by making script headers #!/bin/bash and by modifying Makefiles to say SHELL=/bin/bash ).
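
As a small sketch of the kind of “bashism” that checkbashisms flags: the [[ test with pattern matching is a bash extension that dash lacks, while the POSIX case construct is portable to any /bin/sh.

```shell
# A bash extension: '[[' with glob pattern matching (not in POSIX sh/dash)
bash -c '[[ "abc" == a* ]] && echo bash-extension-ok'

# The portable POSIX equivalent, runnable by /bin/sh whether it is dash or bash
sh -c 'case "abc" in a*) echo posix-ok ;; esac'
```

Scripts that stick to the second form keep working after /bin/sh is switched to dash; scripts using the first form must either be rewritten or explicitly declare #!/bin/bash.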

This change was made much easier because there is a well-known standard (POSIX) that is publicly available at no charge and is widely adhered to. Modern POSIX systems are built from a large number of different components that can be swapped out for other components, and this includes the shell. (This design approach, using modular components with standardized interfaces, is sometimes called a “modular open systems approach”.) What is more, the work by Debian and Ubuntu to ensure that shell programs were portable should make it easier for other systems to consider doing the same. I have anecdotal reports that some people temporarily switched their individual systems’ /bin/sh from bash to another shell (like dash); this switch was only possible because there were ready alternatives.

Their experience does illustrate some of the problems of this approach. In some situations it is rather difficult or time-consuming to develop software without the additional functionality (for example, bash adds support for arrays, and some programs are hard to create in shell without them). Still, this is an approach worth considering. Tools can be developed to help with the transition, or the more-limited component could be slightly extended just enough to ease the transition. The Debian/Ubuntu experience also shows that you do not have to always give up functionality; instead, you can choose a component with more restricted functionality in most cases, and use a more full-featured component only when it is needed.

And of course, there are some controversies based on whether or not dash is really countering attacks as well as it could be. As Tavis Ormandy noted in 2013, both bash and ksh include an interesting hardening technique that he says “is surprisingly effective at mitigating some common vulnerability classes and misconfigurations.” He notes that bash will “drop privileges very early if uid != euid” (i.e., is called indirectly through a setuid program) in various cases (e.g., when called as bash without privilege mode). The pdksh shell version 5.0.5 or later also includes this technique. However, this hardening technique has not been incorporated into dash. My point is not whether or not dash should implement this; my point is that switching is not always as straightforward an answer as you might think.

It is worth noting that ksh was not vulnerable to Shellshock because its developers intentionally decided not to implement function imports.

Avoid using components (like shells) unnecessarily

Many programs need to invoke other programs. An easy way to do this is through a shell, so many languages have built-in functionality that makes it easy to invoke one. C and C++, for example, have system() and popen() . Other languages that provide easy-to-use methods to invoke the shell include Java, Perl, and Python. However, in many cases there is no real reason to ask the shell to execute a program if all you are doing is executing another program. Instead, simply execute the program directly. This is more efficient, and it eliminates the risk of shell vulnerabilities for that invocation (since the shell is not involved).
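
The risk can be sketched in shell terms (the filename here is a hypothetical attacker-supplied string): system() effectively runs sh -c on a flat command string, so metacharacters in the data become code, while a direct invocation passes the data as a single argument that is never parsed.

```shell
# What system("ls " + input) effectively does: the shell re-parses the
# whole string, so the ';' inside the data starts a second, injected command.
input='nosuchfile.txt; echo INJECTED'
sh -c "ls $input" 2>/dev/null

# Direct invocation: the whole string is one argv element; it is never
# parsed as shell code, so nothing is injected.
ls -- "$input" 2>/dev/null || echo 'no such file; input stayed data'
```

The first command prints INJECTED; the second merely fails to find a file with that odd name. Languages that offer argument-vector forms (e.g., C's execvp() , or list-form process APIs) get the second behavior.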

In short, go ahead and (carefully) use the shell or any other component if you need to in a program that might be attacked. However, only use a component if it really provides an easier or better way to do a job! This would not have eliminated Shellshock, but it would have reduced its effects.

Sunset rarely-used functions

A more controversial approach would be to aggressively sunset (remove) features that do not catch on, a possible approach mentioned by Michal Zalewski. Bash has had function imports for 25 years, but many users were unaware of it. I personally use bash daily and didn’t know about it, and many others (including Michal Zalewski) said the same thing. Both Shellshock and Heartbleed were fundamentally exploits involving rarely-used functionality; removing rarely-used functionality could eliminate them as attack vectors.

That said, this is an idea that is hard to apply in practice. Bash is widely used in part because its developers try to not harm backwards compatibility. (Sometimes backwards compatibility is broken, but my point is that the developers usually try to remain backwards-compatible. Here is a list of known backwards incompatibilities.) Weimer found a number of uses of bash function export/import, especially in test harnesses, in a search through Debian’s code repository. So while I think this is an idea worth pursuing in general, it is less likely that it would have countered Shellshock.

Taint tracking

I and other people suspect that taint tracking could be used in some way to detect these kinds of problems. Micro-tainting, for example, could track at a finer grain where data came from, and an automated detection system could perhaps be created this way. A variant of this approach is to use some sort of advanced type system (which in some ways is a kind of taint tracking). At this point this is a vague idea, not a specific process; suggestions welcome.

Plan for vulnerabilities (including least privilege and timely updates)

Security vulnerabilities should be detected and eliminated before deployment where possible, but clearly that does not always happen. Unfortunately, many systems are not ready for the inevitable vulnerability disclosure, be it Shellshock or anything else. You should plan for the occasional vulnerability discovery, including using least privilege and timely updates, so I will now discuss those.

First, systems should be designed and implemented so that they have least privilege. In short, break components down and only give them the access they need (through file privileges, SQL grants, sandboxes, and so on). Consider designing with “mutually suspicious components” - then, if one component is broken into, that does not mean the whole system is owned. One widely-available mechanism for constraining damage on Linux-based systems is SELinux. As Dan Walsh noted, “SELinux does not block the exploit but it would prevent escalation of confined domains... SELinux would probably have protected a lot/most of your valuable data on your machine. It would buy you time for you to patch your system.” Colin Powers has some pictures showing how SELinux constrains damage.

Also, you and your systems need to be prepared for timely updates. The big concern now with Shellshock is the systems that did not quickly apply the patch or other countermeasures (such as traffic filtering). Full patches for bash were available quickly, but that does not matter if system developers and administrators do not deploy them. This is especially an issue for embedded systems, including cheap routers and the components used by mobile phones, because they are often not prepared for updates.

People need to be either prepared to update their systems quickly, or have it done automatically for them. Increasingly people are deploying embedded systems that are not prepared to be rapidly or automatically updated; indeed, they often cannot be practically updated at all. This lack of preparedness puts us all at risk. Organizations should be required to either provide timely updates, or provide users with the means necessary so that they can update their systems (e.g., the ability to update the software), at least for a reasonable length of time (say, a decade). In many cases, updates should be automatic unless the user specifically says otherwise; there are simply too many devices to keep track of otherwise. Automated updates are a risk, but automated regression test suites, data standards, digitally-signed updates, and rollback functionality can greatly reduce that risk.

Dan Geer, in his Black Hat USA 2014 keynote talk “Cybersecurity as Realpolitik”, makes a related point. He proposed that “embedded systems cannot be immortal if they have no remote management interface”. He noted that, “what is sold at Best Buy or the like is remarkably cheap and remarkably old... [since] the average [codebase age] is 4-5 years, then ... the CVE catalog lists numerous methods of attacking those operating systems and device drivers remotely.”

These are not preventative measures, and preventative measures are in many ways the best. That said, we need to be prepared when the preventative measures inevitably fail.

Timeline

Below is a timeline of Shellshock, including citations to justify it. My sincere thanks to those who helped, including Stéphane Chazelas (e.g., for vulnerability insertion dates and report times) and Eric Blake (e.g., for bash patch dates). Remember that people do not necessarily represent the organizations they work for.

In this timeline I use primary sources where possible, since they are the most reliable sources of information. In most cases the primary sources are either verifiable logs with date-time stamps (in ChangeLog files or git logs) or public postings by the actual participants (on a mailing list or Twitter). For example, it was widely reported that the Shellshock vulnerability was introduced in 1992, but this is incorrect. As explained below, primary sources prove that the Shellshock vulnerability was introduced into bash on 1989-08-05 08:32:05 -0700 (timezone estimated) and later released as part of bash version 1.03.

Timezone information is often vital because people rapidly responded from around the world. The timezone “Z” represents Coordinated Universal Time (UTC). You can convert all other times to UTC by subtracting the timezone offset (recall that subtracting a negative is the same as adding a positive). Twitter shows date and time, but does not normally show the timezone for a tweet. However, using “view source” reveals data-time values; StackExchange confirms that these are Unix time values, so I used Unixtimestamp to convert these values to UTC and then determined their timezone. Github does not visibly show exact times, but git clone downloads the metadata including exact times. The seconds value in the old times may be slightly off, but modern systems typically have excellent time accuracy due to capabilities such as the network time protocol (NTP).
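
The Unix time conversion mentioned above can also be done locally; a small sketch using GNU date (BSD and macOS date use date -u -r TIMESTAMP instead of the -d @ form):

```shell
# Convert a Unix timestamp to UTC. 1411516800 is 2014-09-24 00:00:00 UTC,
# the date the Shellshock vulnerability was publicly disclosed.
date -u -d @1411516800 '+%Y-%m-%d %H:%M:%SZ'
```

This avoids depending on a web service, and is handy when cross-checking the date-time stamps in git logs or tweet metadata.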

CVEs

There are six CVEs assigned to Shellshock, and there are no CVEs assigned specifically for the general hardening of bash as implemented by various distributions and bash patch bash43-027. This is rather confusing, but this is due to the sequence of events when the vulnerability was found. Also, CVEs track vulnerabilities, not solutions (because better solutions might be found later). See the previous section on detecting the Shellshock vulnerability if that is what you want to do.

The following list of CVEs, and the bash patches that addressed them, are based on information from Michal Zalewski’s summary on 2014-10-02, Eric Blake, and Chet Ramey’s summary on 2014-10-03:

CVE-2014-6271 - original report. Fixed by bash43-025 (etc.) on 2014-09-24.

CVE-2014-7169 - file creation / token consumption bug found by Tavis. Fixed by bash43-026 (etc.) on 2014-09-26.

CVE-2014-7186 - 10+ here-doc crash found by Florian and Todd. Fixed by bash43-028 (etc.) on 2014-10-01.

CVE-2014-7187 - off-by-one parsing error found by Florian. Fixed by bash43-028 (etc.) on 2014-10-01.

CVE-2014-6277 - uninitialized memory issue found by Michal Zalewski. Fixed by bash43-029 (etc.) on 2014-10-02.

CVE-2014-6278 - command injection remote command execution (RCE) found by Michal Zalewski. Fixed by bash43-030 (etc.) on 2014-10-05.

Again, all of these are mitigated by Florian Weimer’s patch, a variant of which was accepted upstream by the bash developers on 2014-09-27 via bash43-027 and related patches. They are also all countered by Christos Zoulas’s patch that only imports environment variables by request (this was the approach used in FreeBSD), and by the various binary patches that disable function import entirely. As Zalewski notes, “If you have that patch [by Weimer], there’s no point in obsessing about the status of individual bugs, because they should no longer pose a security risk.”
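
You can check whether your own bash uses the upstream prefix/suffix scheme, as a small sketch (assuming a bash with the bash43-027-era fix): an exported function now appears in the environment under a mangled name such as BASH_FUNC_name%% rather than a plain variable name, so ordinary environment variables can no longer masquerade as function definitions.

```shell
# Export a function, then inspect how the child process's environment
# names it. A patched bash emits a prefixed/suffixed entry like:
#   BASH_FUNC_probe%%=() {  echo ok
bash -c 'probe() { echo ok; }; export -f probe; env' | grep '^BASH_FUNC_probe'
```

If this prints nothing, the bash in question predates the namespace fix (or disables function export entirely) and deserves closer inspection.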

Conclusions

This paper covered the basics of the Shellshock bash vulnerability (including how to detect it), a discussion on ways to detect or prevent future Shellshock-like vulnerabilities, a timeline of what happened when, and some information about the specific CVEs (vulnerability identifiers).

It is much more difficult to detect Shellshock-like vulnerabilities than other kinds of vulnerabilities - which is a major reason that it took so long to find! That said, there are clearly ways to find them, so I hope that other key programs and libraries that people depend on will be examined for similar problems.

If you enjoyed this paper, you might also enjoy the entire suite of related papers in my essay suite Learning from Disaster, which also includes How to Prevent the next Heartbleed and The POODLE attack against SSLv3. My essays Filenames and Pathnames in Shell: How to do it Correctly and Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems discuss filename issues with shells that can lead to security issues. You might also want to look at my book on how to develop secure programs.

Feel free to see my home page at https://dwheeler.com. You may also want to look at my paper Why OSS/FS? Look at the Numbers! and my book on how to develop secure programs.

(C) Copyright 2014 David A. Wheeler.