Nick Daugherty is WordPress.com VIP Lead Engineer. Here he shares some important information about escaping in code and how that can increase security in WordPress sites anywhere in the world.

If there’s one issue we flag more often than all others in code reviews…it’s escaping.

For starters, we should all agree that escaping (fundamentally, sanitizing input and escaping output) is a critical aspect of web application security. What may be less universally agreed upon is where to escape. On that point, we require “late escaping“- escaping as close as possible to the point of output – and further, we now require it everywhere, always.

You may now be thinking:

“Do I really need to “late escape” everything? Always? Even core WordPress functions?”

We hear you. And, here’s why this is important to us:

In addition to some automated scanning, we manually review every line of code our VIP customers commit to the VIP platform. And, while the original author of a particular piece of code may know exactly where they’ve already escaped their output and/or it’s convenient to trust a WordPress core function’s escaping, it’s much, much faster and more reliable for our reviewers to check for “late escaping”. This way a reviewer can be 100% positive that output has been escaped properly by simply looking at the point of output.

We acknowledge this standard requires a bit more effort from developers writing code for the VIP platform. But, we see the benefit as three fold:

“late escaping” makes VIP reviewers more efficient, which means customer code is reviewed and deployed faster, a consistent practice of “late escaping” makes missed escaping obvious, thereby reducing the chances that unescaped output makes it into production, a consistently applied escaping standard- and we’ve chosen “late escaping” as ours- allows automated tools to better augment our human reviewers…further improving on #1 and #2 above.

To illustrate the importance of escaping everything, let’s look at a pattern where escaping is commonly omitted: Widget form elements.

A Widget form may look like this:

<label for="<?php echo $this->get_field_id( 'title' ); ?>"><?php _e( 'Title:' ); ?></label> <input type="text" id="<?php echo $this->get_field_id( 'title' ); ?>" title="<?php echo $this->get_field_id( 'title' ); ?>" name="<?php echo $this->get_field_name( 'title' ); ?>" value="<?php echo esc_attr( $title ); ?>"/>

Those get_field_id( 'title' ); ?> and get_field_name( 'title' ); ?> calls should be safe right, since they are core WordPress functions?

Let’s see what happens when we drop this bit of code anywhere in our codebase:

add_action( 'widget_form_callback', function( $instance, $widget ){ $widget->id_base = '"><img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-wp-preserve="%3Cscript%3Ealert(%22Greetings!%20You%20have%20been%20hacked.%22)%3B%3C%2Fscript%3E" data-mce-resize="false" data-mce-placeholder="1" class="mce-object" width="20" height="20" alt="<script>" title="<script>" />"<'; return $instance; }, -999, 2);

Oh no! Javascript has been injected where it shouldn’t be.

Here is a more real world case illustrating how easy it is to get to a point where we’re outputting values of indeterminate origin:

add_action( 'widget_form_callback', function( $instance, $widget ){ My_Widget_Controller::setup_widget_form( $instance, $widget ); return $instance; }, 10, 2); // ... class My_Widget_Controller { static function setup_widget_form( $instance, $widget ) { $widget->id_base .= self::get_widget_id_base( $instance, $widget ); $widget->name .= self::get_widget_name( $instance, $widget ); } static function get_widget_id_base( $instance, $widget ) { global $my_config_object; return get_option( 'my_widget_id_base_prefix' ) . '_' . $my_config_object['current_site']['widgets']['id_base']; } static function get_widget_name( $instance, $widget ) { $name = ''; // ... arbitrary processing to arrive at a $name return $name; } }

Now we’re down a rabbit hole, and it’s not so clear that get_field_id( 'title' ) will give us safe values.

Even values that are ‘100% safe and there is no way this could ever be abused’ need to be escaped, because future refactorings can introduce hard-to-detect vulnerabilities if there is unescaped code hanging around:

$class = ( 'featured' == $category ) ? 'home-featured' : 'standard'; ?> <div class="<?php echo $class; ?>">...

Seems harmless enough – $class can ever only have two values. Great, we’re safe!

Until 6 months from now, when a new business need refactors this to:

function my_get_post_class( $post ) { // ... arbitrary processing to determine a post class. Maybe we pull it from meta now? return get_post_meta( $post->ID, 'custom_post_class' ); } // ... $class = my_get_post_class( $post ); ?> <div class="<?php echo $class; ?>">...

Hmmm, now we’re outputting meta values directly, and there is no way to know that without following a potentially complex program flow – a recipe for an exploitable site.

What about constants? Those are the foolproof, never changing pillars of security, right? Consider the following:

// let's say this is for setting a class name, depending on the site we're on $my_setting = get_option( 'safe_data' ); // ... elsewhere define( 'MY_SAFE_CONSTANT', $my_setting ); // ... <div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

later down the line, our option gets updated (somehow):

update_option( 'safe_data', '"><img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-wp-preserve="%3Cscript%3Ealert(%22hax0rd%22)%3B%3C%2Fscript%3E" data-mce-resize="false" data-mce-placeholder="1" class="mce-object" width="20" height="20" alt="<script>" title="<script>" />' );

Another example of how constants can be exploited is conditional constants:

if ( ! defined( 'MY_SAFE_CONSTANT' ) ) { define( 'MY_SAFE_CONSTANT', 'safe-value' ); } // ... elsewhere <div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

As a hacker, all I need to do to inject anything I like into the page is to add this somewhere before the previous code:

define( 'MY_SAFE_CONSTANT', 'unsafe value' );

What About Core Functions?

This concept applies to nearly all code in a theme, including many core functions that return a value. Some core functions that output, such as bloginfo() , have output escaping applied automatically – we recommend using the equivalent ‘return’ function and manually escaping

Example: bloginfo( 'name' ); could be rewritten as esc_html( get_bloginfo( 'name' ) ); . This approach ensures everything is properly escaped and removes ambiguity.

A post on the merits of escaping would be incomplete without addressing the fact that most esc_*() functions in WordPress apply a filter before returning. While true, the simple answer is: Filters on the escaping functions simply are not allowed on WP.com, and would be quickly caught during code review. Your site is always much safer when escaping all output.

The Bottom Line

If it’s not escaped on output, it’s potentially exploitable. Never underestimate the abilities of an attacker – they’re experts at finding the way to make the ‘this should never, ever, be possible‘ things happen :). For maximum security, we must escape all the things.