OWASP Top 10: XML External Entities (XXE) Security Vulnerability Practical Overview

XML External Entities (XXE or XML injection) is #4 in the 2017 OWASP Top Ten Most Critical Web Application Security Risks.

In December 2017, the research team at Check Point Software Technologies uncovered multiple vulnerabilities in APKTool's XML parser. These flaws would allow a maliciously modified ‘AndroidManifest.xml’ file to retrieve any file on the victim's computer and send it to the attacker's server. The researchers went on to find that the vulnerable parser, DocumentBuilderFactory, was also present in the three most popular Android IDEs (tools for app development). Potentially, anyone who used an app made with these IDEs was vulnerable to this XML threat. When an XML parser processes entity definitions that point to an outside source, the flaw is called an XML External Entity (XXE) vulnerability. XXE threats [CWE-611] are ranked A4 on OWASP's 2017 list of top 10 web application security risks.

Want an in-depth understanding of all modern aspects of the XML External Entities (XXE) security vulnerability? Read this article carefully and bookmark it to come back later; we update this page regularly.

What is the XML External Entities (XXE) risk?

XXE is a newcomer to the OWASP top 10, not having been present in the previous 2013 list. XML, or Extensible Markup Language, is a flexible tool for transmitting, storing and editing data. XML files can be accessed by a variety of software or web-apps, so it's an effective tool for allowing different businesses or applications to access common data. According to Gartner's IT glossary, “it has become the standard for business-to-business transactions, electronic-data interchanges and Web services.”

Part of what makes XML so flexible is the ability to define its own building blocks or ‘entities’, as well as define what counts as valid syntax. These definitions are made inline or in a separate file with Document Type Definitions, or DTDs. If multiple organizations agree on a standard DTD, it allows their applications to view and interpret data that basic XML wouldn't be able to parse. W3Schools provides a detailed rundown of how DTDs interact with XML documents.

A DTD entry defining an entity would look like this:

<!ENTITY identity "Definition Value" >

Here, anything referenced in the code as “&identity;” would return “Definition Value” in the interpreter application. This becomes a risk when attackers can introduce their own definitions into an XML document; the ‘External Entity’ of XXE.
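The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard library's xml.etree.ElementTree parser; the ‘note’ element and the expand() helper are hypothetical names chosen for the example, not part of any real application.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the inline DTD defines one internal entity,
# and the body references it as &identity;
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY identity "Definition Value">
]>
<note>&identity;</note>"""


def expand(document: str) -> str:
    """Parse the document and return the root element's text after entity expansion."""
    root = ET.fromstring(document)
    return root.text


if __name__ == "__main__":
    print(expand(DOCUMENT))
```

The parser replaces the reference with the entity's value before the application ever sees the text, which is why whoever controls the definitions controls what the application reads.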

Any situation where attackers can introduce their own code to a system is bad, but XML's flexibility in integrating with other applications only makes this worse.

The scope of the problem

The biggest risk with XXE is the huge variety of ways in which it can be exploited. Whether the attack is simple or complex, if an external piece of code can make its way into an XML document, the system has been compromised. XML's ubiquity means that applications making use of XML are likely to intersect with a lot of sensitive data.

The most widely known form of XXE attack is the ‘Billion Laughs’ attack, or ‘XML Bomb’. This is a simple but effective denial of service attack used to overload and shut down a target server. By defining an entity (usually something small and nonsensical, like ‘lol’ or ‘haha’) and then defining further entities as nested strings of it, an attacker can quickly exhaust a system's resources. For example:

<!ENTITY haha "haha" >
<!ENTITY haha2 "&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;" >

This can be repeated with further lines of code defining “haha3” as 10 instances of “haha2”, and so on, increasing the ‘laughs’ tenfold with each line. By the time the parser expands ‘haha10’, a single reference generates a billion ‘hahas’ from about a dozen lines of code, overloading and potentially crashing the parser.
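The tenfold growth is easy to check with a little arithmetic. The sketch below assumes the numbering used in the example, where ‘haha’ counts as level 1 and each later level holds ten references to the one before it; the function names are hypothetical.

```python
def expanded_copies(level: int, fanout: int = 10) -> int:
    """Copies of the base string produced by one reference to entity number `level`.

    Level 1 is the base entity itself ("haha"); every later level
    contains `fanout` references to the previous level.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    return fanout ** (level - 1)


def expanded_bytes(level: int, base: str = "haha", fanout: int = 10) -> int:
    """Size in bytes of the fully expanded text for one reference."""
    return len(base.encode("utf-8")) * expanded_copies(level, fanout)


if __name__ == "__main__":
    for level in (2, 5, 10):
        print(level, expanded_copies(level), expanded_bytes(level))
```

Under these assumptions, level 10 yields 10^9 copies, or roughly 4 GB of text for the four-character string ‘haha’, from a document well under a kilobyte in size.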

OWASP demonstrates how the basic syntax of an XXE exploit can be turned to a variety of malicious uses with only minor alterations. The first example shows an attempt to retrieve a file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>

By simply changing the “!ENTITY” line, an attacker can use the same process to probe the server's private network:

<!ENTITY xxe SYSTEM "https://192.168.1.1/private" > ]>

Or initiate a Denial of Service attack by pointing the entity at a potentially endless file:

<!ENTITY xxe SYSTEM "file:///dev/random" > ]>
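All three variants share one trait: an entity declared with a SYSTEM identifier. As a rough sketch of how such declarations might be spotted without fetching anything, the code below uses Python's xml.parsers.expat module, which reports entity declarations to a handler and does not retrieve external entities unless the application installs its own handler to do so. The external_entities() helper is a hypothetical name, and this is an illustration rather than a complete scanner.

```python
import xml.parsers.expat

# The file-retrieval payload from above, used here only as test input.
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>"""


def external_entities(data: bytes) -> dict:
    """Return {entity name: system identifier} for each external entity declared."""
    found = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            found[name] = system_id

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return found


if __name__ == "__main__":
    print(external_entities(PAYLOAD))
```

A check like this only reveals what a document asks for. Whether the request is honoured depends on how the parser that finally processes the document is configured.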

As the Check Point research into XXE vulnerabilities shows, there is virtually no limit to the damage an XML breach can cause. The research demonstrates how any document on the server could have been retrieved by the attacker. This would mean a successful XXE breach could lead to exposure of personally identifiable information (PII), sensitive internal corporate data or intellectual property (IP), and user credentials or banking information. A sophisticated attack could even remotely seize control of the app's functions.

Solutions

Proper prevention of XML vulnerabilities begins at the development level. App developers must have a good knowledge of XML and how to configure the parsers for best security. Good configuration will mitigate many of the threats associated with XXEs. For example, switching off or limiting entity expansion will neutralize the threat of a Billion Laughs attack. It's also worth considering at an early stage whether XML is the right choice for the application at all.

When possible, OWASP recommends using simpler formats for handling data, such as JSON. JSON is a newer and more lightweight syntax, and tends to be less exploitable than XML. Even in 2011, as this Gartner blog discusses, JSON was beginning to be seen as preferable to XML. Alternatively, disabling DTDs in the XML parser will block external entities completely. Some older XML applications may still depend on DTDs and be unable to disable them, but the good news is that newer applications can make use of ‘DTDLess’ XML and still be functional.
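What disabling DTDs looks like depends on the parser. As one hedged illustration, the Python sketch below uses the standard xml.parsers.expat module and refuses any document that contains a DOCTYPE declaration. The DoctypeForbidden exception and parse_without_dtd() function are hypothetical names for this example; a production application would normally rely on its parser's documented settings instead.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data: bytes) -> str:
    """Parse `data`, returning the root element name, or raise if a DTD is present."""
    parser = xml.parsers.expat.ParserCreate()
    elements = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop at the first DOCTYPE: no DTD means no entity definitions at all.
        raise DoctypeForbidden(f"DOCTYPE declaration {name!r} is not allowed")

    def on_start_element(name, attributes):
        elements.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(data, True)
    return elements[0]


if __name__ == "__main__":
    print(parse_without_dtd(b"<order><id>42</id></order>"))
```

Because the check fires before any entity is declared, it blocks both external entities and nested-entity bombs, at the cost of rejecting every document that legitimately needs a DTD.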

If XML and DTDs are the only way forward, there are still reliable steps to make the app more secure. Many of these depend on the specific XML parser being used, so providing general guidelines is difficult. OWASP has provided a 'cheat sheet' for specific parsers and how to configure them against XXE.

Especially for larger applications, OWASP suggests that manual code review to detect and fix XXE vulnerabilities is the best choice. However, a good SAST solution would go a long way in assisting with this. High Tech Bridge's ImmuniWeb products integrate both SAST and DAST, and can discover security vulnerabilities in more than just XML.

How to Protect Your Web Applications from XXE Attacks provides a detailed discussion on fixing XXE vulnerabilities.