Intro to XXE Vulnerabilities: AppSec Simplified

Protect your XML parsers against malicious XML documents!

Vickie Li

Hey! And welcome to the first installment of AppSec Simplified. Today, we are going to explore a fascinating class of vulnerabilities called XML External Entity vulnerabilities, or XXEs!

To understand XXEs, we need to first talk about “DTDs” in XML documents.

XML documents can contain a Document Type Definition, or a “DTD”. DTDs are used to define the structure of an XML document and the data it contains. They are declared within the document using a “DOCTYPE” tag, like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ...INSERT DTD HERE...>

Within DTDs, you can declare “XML entities”. XML entities work a lot like variables in programming languages.

For example, this DTD declares an XML entity called “greeting” with the value of “Hello World!”. In the XML document, you can reference this entity using &greeting; and the XML parser will load “Hello World!” in its place.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY greeting "Hello World!" >
]>
<example>&greeting;</example>
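To see entity substitution in action, here is a minimal Python sketch. It assumes the standard library's xml.etree.ElementTree parser, which expands entities declared in a document's own DTD. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# A document whose DTD declares the "greeting" entity
# and whose body references it as &greeting;
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY greeting "Hello World!" >
]>
<example>&greeting;</example>"""

root = ET.fromstring(DOC)
print(root.text)  # the parser has replaced &greeting; with the entity's value
```

The application code never sees the entity reference. By the time you read root.text, the parser has already swapped in the value.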

XML external entities

There is a special type of XML entity called an “external entity”. External entities are used to access local or remote content with a URL. They are declared using the “SYSTEM” keyword. For example, this DTD declares an external entity named “file” that points to file:///secrets.txt on the local file system. The XML parser will replace any &file; reference in the document with the contents of file:///secrets.txt.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY file SYSTEM "file:///secrets.txt" >
]>
<example>&file;</example>

What is the problem?

So what is the problem with this functionality? Imagine if your application parses user-supplied XML documents and displays the results on your site.

If users can declare arbitrary XML entities in their uploads, they can declare an external entity that points to any location on your machine. For example, this XML file contains an external entity that points to file:///etc/shadow on your server.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY file SYSTEM "file:///etc/shadow" >
]>
<example>&file;</example>

The “/etc/shadow” file stores user account names and their hashed passwords on Unix systems. When the parsed XML document is displayed back to the user, the contents of file:///etc/shadow will also be included.

By exploiting the XML parser, a malicious user can now read arbitrary files on your server. They might be able to retrieve user information, configuration files, or other sensitive information like AWS credentials. Attackers can also launch a denial-of-service attack by declaring entities that reference other entities many times over, so that the parser expands a tiny document into an enormous amount of text. This is called a “billion laughs attack”. Talk about a catastrophic vulnerability!
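Here is a rough Python sketch of why nested entities get out of hand so quickly. The helper nested_entities_doc and the entity names e0, e1, and so on are made up for this illustration, and the numbers are kept deliberately tiny so the example is harmless to run.

```python
import xml.etree.ElementTree as ET


def nested_entities_doc(levels: int, fanout: int) -> str:
    """Build a document whose top entity expands to fanout**levels copies of "lol"."""
    decls = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "".join(decls)
    return f"<!DOCTYPE example [{dtd}]><example>&e{levels};</example>"


# Deliberately small: 3 levels of 3 references each gives 3**3 = 27 copies.
# The classic attack uses nine levels of ten references, about a billion copies.
doc = nested_entities_doc(levels=3, fanout=3)
expanded = ET.fromstring(doc).text
print(len(doc), "characters of XML ->", expanded.count("lol"), "copies of 'lol'")
```

Every extra level multiplies the output by the fanout, so the document grows by a few bytes while the expanded text grows exponentially.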

So how do you prevent XXEs from happening? The best way to prevent XXEs is to limit the capabilities of your XML parsers.

Since DTD processing is a requirement for XXE attacks, developers should disable DTD processing on their XML parsers. If it is impossible to disable DTDs completely, then external entities, parameter entities, and inline DTDs should be disabled. You can also disable the expansion of XML entities entirely.

How you configure an XML parser depends on the parser you use. For example, if you are using the default PHP XML parser, you call libxml_disable_entity_loader(true) to disable the loading of external entities. (The function is deprecated as of PHP 8.0, because libxml 2.9.0 and later disable external entity loading by default.) For more information on how to do it for your parser, consult the OWASP XML External Entity Prevention Cheat Sheet.
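As one illustration of “disable DTDs completely”, here is a minimal Python sketch that refuses any document containing a DOCTYPE before handing it to the real parser. It assumes the standard library's expat bindings and ElementTree. The names parse_without_dtd and DTDForbidden are hypothetical, and a maintained library such as defusedxml is likely a better choice in production.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse XML, refusing any document that declares a DTD."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees <!DOCTYPE ...>
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)

    # Second pass: no DTD was found, so no entities can have been declared
    return ET.fromstring(data)


print(parse_without_dtd(b"<example>ok</example>").text)
```

Parsing the input twice is wasteful, but it keeps the sketch short. With no DTD allowed, there is nowhere for an attacker to declare an external entity in the first place.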

Finally, you should routinely audit your applications to catch XXEs that might already be written into your code.

How would you go about detecting XXEs in your application? One approach is to go through the parts of your application that process XML documents and test them with malicious XML input. For example, you can submit this XML document and see if the contents of file:///etc/hostname get sent back to you.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY test SYSTEM "file:///etc/hostname" >
]>
<example>&test;</example>

But obviously, using a black-box approach is risky because it does not guarantee that you will find all instances of XXE in your system. Since XXE is a vulnerability with a clear and definable signature, analyzing your source code is a much better approach.

An XML parser is vulnerable to XXEs when it processes user-supplied XML files, or XML files whose DTD is polluted by user input, and it is configured to evaluate DTDs and external entities. We are essentially looking for two things:

  • First, we are looking for XML parsers that receive user-supplied XML files or DTDs.
  • Then, we are checking if that XML parser evaluates DTDs or external entities.
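The second check can be sketched as a toy scanner. This example assumes a Python codebase that uses lxml, whose XMLParser accepts the keyword arguments resolve_entities, load_dtd, and no_network. The function find_risky_parsers and the sample source are invented for illustration. It only flags explicit risky settings and does not track whether user input reaches the parser.

```python
import ast

# lxml XMLParser keyword arguments, paired with the value that loosens
# DTD or entity handling
RISKY_KWARGS = {"resolve_entities": True, "load_dtd": True, "no_network": False}


def find_risky_parsers(source: str) -> list:
    """Return sorted (line, keyword) pairs for XMLParser(...) calls with risky settings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both etree.XMLParser(...) and a bare XMLParser(...)
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "XMLParser":
            continue
        for kw in node.keywords:
            if (kw.arg in RISKY_KWARGS
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is RISKY_KWARGS[kw.arg]):
                findings.append((node.lineno, kw.arg))
    return sorted(findings)


SAMPLE = '''
from lxml import etree
parser = etree.XMLParser(resolve_entities=True, no_network=False)
safe = etree.XMLParser(resolve_entities=False)
'''

for line, keyword in find_risky_parsers(SAMPLE):
    print(f"line {line}: XMLParser called with risky {keyword}")
```

A scanner like this covers only the configuration half of the signature. Finding out whether user-supplied XML actually flows into the flagged parser takes data flow analysis.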

You can manually audit your source code to look for these signatures or employ a static analysis security testing tool to stare at the code for you. Most static analysis tools can detect if your XML parsers are evaluating DTDs and external entities. But only static analysis tools that can work with data flows like ShiftLeft’s NG-SAST can detect if that parser is reachable by user input and thus automatically detect both of these conditions.

Finally, application dependencies cause many XXEs, so you should also monitor and upgrade all XML processors and libraries in use by your application or the underlying operating system.

Later in AppSec Simplified, we’ll target an open-source application and see how we can find XXE vulnerabilities using code analysis. Stay tuned!