Why You Should Always Use HTTPS

This first appeared on Mashable in May 2011. Five years later, the SSL Pulse project notes only 76% of the top 200K web sites fully support TLS 1.2, with a quarter of them still supporting the egregiously insecure SSLv3. While Let’s Encrypt makes TLS certs more attainable, administrators must also maintain their sites’ TLS configuration to use the best protocols and ciphers available. Check out www.ssllabs.com to test your site.


The next time you visit a cafe to sip coffee and surf on some free Wi-Fi, try an experiment: Log in to some of your usual sites. Then, with a smile, hand the keyboard over to a stranger. Now walk away for 20 minutes. Remember to pick up your laptop before you leave.

While the scenario may seem silly, it essentially happens each time you visit a website that doesn’t bother to encrypt the traffic to your browser — in other words, sites using HTTP instead of HTTPS.

The encryption within HTTPS is intended to provide benefits like confidentiality, integrity and identity. Your information remains confidential from prying eyes because only your browser and the server can decrypt the traffic. Integrity protects the data from being modified without your knowledge. We’ll address identity in a bit.
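All three benefits rest on the TLS handshake that happens before any HTTP is exchanged. Here’s a minimal Python sketch of that handshake from the client’s side (the hostname is an arbitrary example; create_default_context() performs the certificate and hostname verification that establishes identity):

import socket
import ssl

# The default context verifies the certificate chain and the hostname.
ctx = ssl.create_default_context()

with socket.create_connection(("www.ssllabs.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="www.ssllabs.com") as tls:
        # The handshake succeeds only if the server proves its identity
        # with a certificate valid for this hostname.
        print(tls.version())                 # e.g. TLSv1.2 or TLSv1.3
        print(tls.getpeercert()["subject"])  # whom the cert identifies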

There’s an important distinction between tweeting to the world or sharing thoughts on Facebook and having your browsing activity go over unencrypted HTTP. You intentionally share tweets, likes, pics and thoughts. The lack of encryption means you’re unintentionally exposing the controls necessary to share such things. It’s the difference between someone viewing your profile and taking control of your keyboard.

The Spy Who Sniffed Me

We most often hear about hackers attacking websites, but it’s just as easy and lucrative to attack your browser. One approach is to deliver malware; another is to lull someone into visiting a spoofed site (phishing). Those techniques don’t require targeting a specific victim. They can be launched scattershot from anywhere on the web, regardless of the attacker’s geographic or network relationship to the victim. Another kind of attack, sniffing, requires proximity to the victim but is no less potent or worrisome.

Sniffing attacks watch the traffic to and from the victim’s web browser. (In fact, all of the computer’s traffic is visible, but we’re only worried about websites for now.) The only catch is that the attacker needs to be able to see the communication channel. The easiest way for an attacker to do this is to sit next to one of the end points, either the web server or the web browser. Unencrypted wireless networks — think of cafes, libraries, and airports — make it easy to find the browser’s end point because the traffic is visible to anyone who can obtain that network’s signal.
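To get a sense of just how visible that traffic is, consider this sketch using the Scapy packet library. It prints the raw bytes of any plain-HTTP packet it sees. The interface name is a placeholder, and capturing traffic on a network you don’t own is illegal in most places:

from scapy.all import Raw, sniff

def show_http(pkt):
    # Plain HTTP crosses the network as readable text: URLs, cookies,
    # and passwords are all visible to anyone in radio range.
    if pkt.haslayer(Raw):
        print(pkt[Raw].load[:120])

# "wlan0" is a placeholder; use the interface joined to the open network.
sniff(iface="wlan0", filter="tcp port 80", prn=show_http)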

Encryption defeats sniffing attacks by concealing the traffic’s meaning from all except those who know the secret to decrypting it. The traffic remains visible to the sniffer, but it appears as streams of random bytes rather than HTML, links, cookies and passwords. The trick is understanding where to apply encryption in order to protect your data. For example, wireless networks can be encrypted, but the history of wireless security is laden with egregious mistakes. And it’s not necessarily the correct solution.

The first wireless encryption scheme was called WEP. It was the security equivalent of pig latin. It seems secret at first. Then the novelty wears off once you realize everyone knows what ixnay on the ottenray means, even if they don’t know the movie reference. WEP required a password to join the network, but the protocol’s poor encryption exposed enough hints about the password that someone with a wireless sniffer could reverse engineer it. This was a fatal flaw, because the time required to crack the password was a fraction of that needed to blindly guess the password with a brute force attack: a matter of hours (or less) instead of weeks.

Security improvements were attempted for Wi-Fi, but many turned out to be failures since they just metaphorically replaced pig latin with an obfuscation more along the lines of Klingon (or Quenya, depending on your fandom leanings). The problem was finding an encryption scheme that protected the password well enough that attackers would be forced to fall back to the inefficient brute force attack. The security goal is a Tower of Babel, with languages that only your computer and the wireless access point could understand — and which don’t drop hints for attackers. Protocols like WPA2 accomplish this far better than WEP ever did.

Whereas you’ll find it easy to set up WPA2 on your home network, you’ll find it sadly missing on the ubiquitous public Wi-Fi services of cafes and airplanes. They usually avoid encryption altogether. Even so, encrypted networks that use a single password for access merely reduce the pool of attackers from everyone to everyone who knows the password (which may be a larger number than you expect).

We’ve been paying attention to public spaces, but the problem spans all kinds of networks. In fact, sniffing attacks are just as feasible in corporate environments. They only differ in terms of the type of threat, and who might be carrying out the sniffing attack. Fundamentally, HTTPS is required to protect your data.

S For Secure

Sites that don’t use HTTPS judiciously are crippling the privacy controls you thought were protecting your data. Websites’ adoption of opt-in sharing and straightforward privacy settings are laudable. Those measures restrict the amount of information about you that leaks from websites (at least they’re supposed to). Yet they have no bearing on sniffing attacks if the site doesn’t encrypt traffic. This is why sites like Facebook and Twitter recently made HTTPS always available to users who care to turn it on — it’s off by default.

If my linguistic metaphors have left you with no sense of the technical steps behind a sniffing attack, rest assured that readily-available tools make such attacks easy to execute. A recent one is a plugin you can add to your Firefox browser. The plugin, called Firesheep, enables mouse-click hacking for sites like Amazon, Facebook, Twitter and others. The creation of the plugin demonstrates that technical attacks can be put into the hands of anyone who wishes to be mischievous, unethical, or malicious.

To be clear, sniffing attacks don’t need to grab your password in order to impersonate you. Web apps that use HTTPS for authentication protect your password. If they use regular HTTP after you log in, they’re not protecting your privacy or your temporary identity.

We need to take an existential diversion here to distinguish between “you” as the person visiting a website and the “you” that the website knows. Websites speak to browsers. They don’t (yet?) reach beyond the screen to know that you are in fact who you say you are. The username and password you supply for the login page are supposed to prove your identity because you are ostensibly the only one who knows them. So that you need only authenticate once, the website assigns a cookie to your browser. From then on, that cookie is your identity: a handful of bits.

These identifying cookies need to be a shared secret — a value known to no one but your browser and the website. Otherwise, someone else could use your cookie value to impersonate you.
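At a minimum, such a cookie should carry the Secure attribute, which tells the browser never to send it over plain HTTP, and HttpOnly, which hides it from scripts. A sketch using Python’s standard library (the cookie name and value are placeholders):

from http import cookies

c = cookies.SimpleCookie()
c["sid"] = "0123456789abcdef"  # placeholder session identifier
c["sid"]["secure"] = True      # send only over HTTPS, never plain HTTP
c["sid"]["httponly"] = True    # keep it away from document.cookie
print(c.output())
# Set-Cookie: sid=0123456789abcdef; HttpOnly; Secure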

Mobile apps are ignoring the improvements that web browsers have made in protecting our privacy and security. Some of the fault lies with the HTML and HTTP that underlie the web. HTTP becomes creaky once you try to implement strong authentication mechanisms on top of it, mostly because of our friend the cookie. Some fault lies with app developers. For example, Twitter provides a setting to ensure you always access the web site with HTTPS. However, third-party apps that use Twitter’s APIs might not be so diligent. While your password might still be protected with HTTPS, the app might fall back to HTTP for all other traffic — including the cookie that identifies you.

Google suffered embarrassment recently when 99% of its Android-based phones were shown to be vulnerable to impersonation attacks. The problem is compounded by the sheer number of phones and the difficulty of patching them. Furthermore, the identifying cookies (authTokens) were used for syncing, which means they’d traverse the network automatically regardless of the user’s activity. This is exactly the problem that comes with lack of encryption, cookies, and users who want to be connected anywhere they go.

Notice that there’s been no mention of money or credit cards being sniffed. Who cares about those when you can compromise someone’s email account? Email is almost universally used as a password reset mechanism. If you can read someone’s email, then you can obtain the password for just about any website they use, from gaming to banking to corporate environments. Most of this information has value.

S For Sometimes

Sadly, it seems that money and corporate embarrassment motivate protective measures far more often than privacy concerns do. Some websites have started to implement a more rigorous enforcement of HTTPS connections called HTTP Strict Transport Security (HSTS). PayPal, whose users have long been victims of money-draining phishing attacks, was one of the first sites to use HSTS to prevent malicious sites from fooling browsers into switching to HTTP or spoofing pages. Like any good security measure, HSTS is transparent to the user. All you need is a browser that supports it (most do) and a website to require it (most don’t).
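Deploying HSTS amounts to a single response header. A minimal sketch, assuming a Flask-based site and the common choice of a one-year max-age:

from flask import Flask

app = Flask(__name__)

@app.after_request
def add_hsts(response):
    # Tell the browser to use HTTPS for this site (and its subdomains)
    # for the next year before sending another plain-HTTP request.
    response.headers["Strict-Transport-Security"] = (
        "max-age=31536000; includeSubDomains"
    )
    return response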

Improvements like HSTS should be encouraged. HTTPS is inarguably an important protection. However, the protocol has its share of weaknesses and determined attackers. Plus, HTTPS only protects against certain types of attacks; it has no bearing on cross-site scripting, SQL injection, or a myriad of other vulnerabilities. The security community is neither ignorant of these problems nor lacking in solutions. Yet the rollout of better protocols like DNSSEC has been glacial. Nevertheless, HTTPS helps as much today as it will tomorrow. The lock icon on your browser that indicates a site uses HTTPS may be minuscule, but the protection it affords is significant.


Many of the attacks and privacy references noted in the article may seem dated, but they are no less relevant.

DNSSEC has indeed been glacial. It took Google until January 2013 to support it. Cloudflare reported in October 2014 that wide adoption remained “in its infancy.”

And perhaps a final irony? At the time this article first appeared, WordPress didn’t support HTTPS for custom domain names, e.g. deadliestwebattacks.com. To this day, Mashable still won’t serve the article over HTTPS without a hostname mismatch error.

Codex Securum, Obiter Dictum

In the past, you have come here for truth. I now give you law.

Science fiction author Arthur C. Clarke succinctly described the wondrous nature of technology in what has come to be known as Clarke’s Third Law (from a letter published in Science in January 1968):

Any sufficiently advanced technology is indistinguishable from magic.

The sentiment of that law can be found in an earlier short story by Leigh Brackett, “The Sorcerer of Rhiannon,” published in Astounding Science-Fiction Magazine in February 1942:

Witchcraft to the ignorant . . . Simple science to the learned.

With those formulations as our departure point, we can now turn towards crypto, browser technologies, and privacy.

The Latinate Lex Cryptobellum:

Any sufficiently advanced cryptographic escrow system is indistinguishable from ROT13.

Or in Leigh Brackett’s formulation:

Cryptographic escrow to the ignorant . . . Simple plaintext to the learned.

A few Laws of Browser Plugins (somewhat like the fonts of dis-knowledge):

Any sufficiently patched Flash is indistinguishable from a critical update.

Any sufficiently patched Java is indistinguishable from Flash.

A few Laws of Browsers:

Any insufficiently patched browser is indistinguishable from malware.

Any sufficiently patched browser remains distinguishable from a privacy-enhancing one.

For what are browsers but thralls to Laws of Ads:

Any sufficiently targeted ad is indistinguishable from chance.

Any sufficiently distinguishable person’s browser has tracking cookies.

Any insufficiently distinguishable person has privacy.

Mike’s law of writing on schedule:

Any sufficiently delivered manuscript is indistinguishable from overdue.

Which leads us to the foundational Zeroth Law of Deadliest Web Attacks:

Any sufficiently popular post is indistinguishable from truth.

Please share!

p.s. I highly recommend Gary Westfahl’s Science Fiction Quotations: From the Inner Mind to the Outer Limits should you wish to discover more authors and their books to explore.

RSA APJ 2014, CDS-W07 Slides

Here are the slides for my presentation, Building and Breaking Privacy Barriers, at this year’s RSA Asia Pacific and Japan conference in Singapore.

The slides convey more theory than practical examples, but the ideas should come across without too much confusion. I expect to revisit the idea of a Rot network (a play on Tor) and toy with an implementation. Instead of blocking tracking bugs, the concept is to reduce their utility by sharing them across unrelated browsers — essentially polluting the data.

In any case, with this presentation over and out of the way, it’s time to start working on more articles!

RSA USA 2014, DSP-R04A Slides

Here are the slides for my presentation, DSP-R04A Is Your Browser a User Agent or a Double Agent?, at this year’s RSA USA conference in San Francisco.

This departed from a security focus into the realm of privacy, noting how browsers struggle (or not) against tracking mechanisms and how various organizations build views of web site visitors.

Fonts of Dis-Knowledge

The oracles of ancient Greece claimed to have the power of precognition, derived from the gods themselves. In the 17th century, John Locke wrote of more experiential sources for ideas, where sensation and reflection were two fountains of knowledge.

But none of these philosophical considerations are necessary to predict the effect of plugins on browser security. In the course of putting a presentation together, I’ve annotated two particular items.

Java. The Comic Sans of browser plugins. Inexplicably, some people think it’s a clever design choice. But it really conveys an uninformed decision, is ultimately useless (especially for modern browsers), and is a sign of incompetence.

Flash. Wingdings. At first it looks pretty. But its overuse quickly leads to annoyance. There’s no reason for a Flash plugin other than to look at legacy cat videos and suffer agitating ad banners.

They are nothing more than fonts of malware. When was the last time you installed a non-critical update for either one? If you haven’t disabled these plugins, do so now.

I’ll have new content soon. And, with my own knowledge of the future, here’s a peek at what those topics might be:

– Ruminations on privacy.
– Identity, passwords, personas.
– More examples of HTML injection.

And that doesn’t include putting up content for the newly released Anti-Hacker Tool Kit. Alas, my fountain of writing is a mere trickle!

…And They Have a Plan

No notes are so disjointed as the ones skulking about my brain as I was preparing slides for last week’s BlackHat presentation. I’ve now wrangled them into a mostly coherent write-up.

This won’t be the last post on this topic. I’ll be doing two things over the next few weeks: throwing a doc into github to track changes/recommendations/etc., responding to more questions, working on a different presentation, and trying to stick to the original plan (i.e. two things). Oh, and getting better at MarkDown.

So, turn up some Jimi Hendrix, play some BSG in the background, and read on.

== The Problem ==

Cross-Site Request Forgery (CSRF) abuses the normal ability of browsers to make cross-origin requests: a resource on one origin causes a victim’s browser to make a request to another origin, using the victim’s security context associated with that target origin.

The attacker creates and places a malicious resource on an origin unrelated to the target origin to which the victim’s browser will make a request. The malicious resource contains content that causes a browser to make a request to the unrelated target origin. That request contains parameters selected by the attacker to affect the victim’s security context with regard to the target origin.

The attacker does not need to violate the browser’s Same Origin Policy to generate the cross origin request. Nor does the attack require reading the response from the target origin. The victim’s browser automatically includes cookies associated with the target origin for which the forged request is being made. Thus, the attacker creates an action, the browser requests the action and the target web application performs the action under the context of the cookies it receives — the victim’s security context.

An effective CSRF attack means the request modifies the victim’s context with regard to the web application in a way that’s favorable to the attacker. For example, a CSRF attack may change the victim’s password for the web application.

CSRF takes advantage of web applications that fail to enforce strong authorization of actions during a user’s session. The attack relies on the normal, expected behavior of web browsers to make cross-origin requests from resources they load on unrelated origins.

The browser’s Same Origin Policy prevents a resource in one origin from reading the response from an unrelated origin. However, the attack only depends on the forged request being submitted to the target web app under the victim’s security context — it does not depend on receiving or seeing the target app’s response.

== The Proposed Solution ==

SOS is proposed as an additional policy type of Content Security Policy. Its behavior also includes a pre-flight step like that used by the Cross-Origin Resource Sharing (CORS) spec.

SOS isn’t just intended as a catchy acronym. The name is intended to evoke the SOS of Morse code, which is both easy to transmit and easy to understand. If it is required to explain what SOS stands for, then “Session Origin Security” would be preferred. (However, “Simple Origin Security”, “Some Other Security”, and even “Save Our Site” are acceptable. “Same Old Stuff” is discouraged. More options are left to the reader.)

An SOS policy may be applied to one or more cookies for a web application on a per-cookie or collective basis. The policy controls whether the browser includes those cookies during cross-origin requests. (A cross-origin resource cannot access a cookie from another origin, but it may generate a request that causes the cookie to be included.)

== Format ==

A web application sets a policy by including a Content-Security-Policy response header. This header may accompany the response that includes the Set-Cookie header for the cookie to be covered, or it may be set on a separate resource.

A policy for a single cookie would be set as follows, with the cookieName of the cookie and a directive of 'any', 'self', or 'isolate'. (Those directives will be defined shortly.)

Content-Security-Policy: sos-apply=cookieName 'policy'

A response may include multiple CSP headers, such as:

Content-Security-Policy: sos-apply=cookieOne 'policy'
Content-Security-Policy: sos-apply=cookieTwo 'policy'

A policy may be applied to all cookies by using a wildcard:

Content-Security-Policy: sos-apply=* 'policy'
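Emitting the header could be a one-liner in any web framework. A hypothetical Flask sketch that sets a cookie and covers it with a policy (keep in mind that SOS is a proposal; no browser or framework implements these directives):

from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def landing():
    resp = make_response("welcome")
    resp.set_cookie("sid", "0123456789abcdef", secure=True, httponly=True)
    # Proposed SOS directive: include the sid cookie on cross-origin
    # requests by default ('any'), subject to pre-flight exceptions.
    resp.headers["Content-Security-Policy"] = "sos-apply=sid 'any'"
    return resp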

== Policies ==

One of three directives may be assigned to a policy. The directives affect the browser’s default handling of cookies for cross-origin requests to a cookie’s destination origin. The pre-flight concept will be described in the next section; it provides a mechanism for making exceptions to a policy on a per-resource basis.

Policies are only invoked for cross-origin requests. Same origin requests are unaffected.

'any' — include the cookie. This represents how browsers currently work. Make a pre-flight request to the resource on the destination origin to check for an exception response.

'self' — do not include the cookie. Make a pre-flight request to the resource on the destination origin to check for an exception response.

'isolate' — never include the cookie. Do not make a pre-flight request to the resource because no exceptions are allowed.

== Pre-Flight ==

A browser that is going to make a cross-origin request that includes a cookie covered by a policy of 'any' or 'self' must make a pre-flight check to the destination resource before conducting the request. (A policy of 'isolate' instructs the browser to never include the cookie during a cross-origin request.)

The purpose of a pre-flight request is to allow the destination origin to modify a policy on a per-resource basis. Thus, certain resources of a web app may allow or deny cookies from cross-origin requests despite the default policy.

The pre-flight request works identically to that for Cross Origin Resource Sharing, with the addition of an Access-Control-SOS header. This header includes a space-delimited list of cookies that the browser might otherwise include for a cross-origin request, as follows:

Access-Control-SOS: cookieOne cookieTwo

A pre-flight request might look like the following; note that the Origin header is expected to be present as well:

OPTIONS https://web.site/resource HTTP/1.1
Host: web.site
Origin: http://other.origin
Access-Control-SOS: sid
Connection: keep-alive
Content-Length: 0

The destination origin may respond with an Access-Control-SOS-Reply header that instructs the browser whether to include the cookie(s). The response will either be 'allow' or 'deny'.

The response header may also include an expiration in seconds. The expiration allows the browser to remember this response and forego subsequent pre-flight checks for the duration of the value.

The following example would allow the browser to include a cookie with a cross-origin request to the destination origin even if the cookie’s policy had been 'self'. (In the absence of a reply header, the browser would not include the cookie.)

Access-Control-SOS-Reply: 'allow' expires=600

The following example would forbid the browser from including a cookie with a cross-origin request to the destination origin even if the cookie’s policy had been 'any'. (In the absence of a reply header, the browser would include the cookie.)

Access-Control-SOS-Reply: 'deny' expires=0

The browser would be expected to track policies and policy exceptions based on destination origins. It would not be expected to track pairs of origins (e.g. different cross-origins to the destination) since such a mapping could easily become cumbersome, inefficient, and more prone to abuse or mistakes.

As described in this section, the pre-flight is an all-or-nothing affair. If multiple cookies are listed in the Access-Control-SOS header, then the response applies to all of them. This might not provide enough flexibility. On the other hand, simplicity tends to encourage security.

== Benefits ==

Note that a policy can be applied on a per-cookie basis. If a policy-covered cookie is disallowed, any non-covered cookies for the destination origin may still be included. Think of a non-covered cookie as an unadorned or “naked” cookie — its behavior and that of the browser match the web of today.

The intention of a policy is to control cookies associated with a user’s security context for the destination origin. For example, it would be a good idea to apply 'self' to a cookie used for authorization (and identification, depending on how tightly coupled those concepts are by the app’s reliance on the cookie).

Imagine a WordPress installation hosted at https://web.site/. The site’s owner wishes to allow anyone to visit, especially when linked-in from search engines, social media, and other sites of different origins. In this case, they may define a policy of 'any' set by the landing page:

Content-Security-Policy: sos-apply=sid 'any'

However, the /wp-admin/ directory represents sensitive functions that should only be accessed by intention of the user. WordPress provides a robust nonce-based anti-CSRF token. Unfortunately, many plugins forget to include these nonces and therefore become vulnerable to attack. Since the site owner has set a policy for the sid cookie (which represents the session ID), they could respond to any pre-flight request to the /wp-admin/ directory as follows:

Access-Control-SOS-Reply: 'deny' expires=86400

Thus, the /wp-admin/ directory would be protected from CSRF exploits because a browser would not include the sid cookie with a forged request.
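A hypothetical pre-flight handler for that rule might look like the following Flask sketch (again, the Access-Control-SOS headers exist only in this proposal):

from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/wp-admin/", methods=["OPTIONS"])
def sos_preflight():
    resp = Response(status=200)
    requested = request.headers.get("Access-Control-SOS", "")
    if "sid" in requested.split():
        # Refuse the session cookie for cross-origin requests to the
        # admin area; the browser may cache this answer for a day.
        resp.headers["Access-Control-SOS-Reply"] = "'deny' expires=86400"
    return resp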

The use case for the 'isolate' policy is straightforward: the site does not expect any cross-origin requests to include cookies related to authentication or authorization. A bank or web-based email might desire this behavior. The intention of 'isolate' is to avoid the requirement for a pre-flight request and to forbid exceptions to the policy.

== Notes ==

This is a draft. The following thoughts represent some areas that require more consideration or that convey some of the motivations behind this proposal.

This is intended to affect cross-origin requests made by a browser.

It is not intended to counter same-origin attacks such as HTML injection (XSS) or intermediation attacks such as sniffing. Attempting to solve multiple problems with this policy leads to folly.

CSRF evokes two senses of the word “forgery”: creation and counterfeiting. This approach doesn’t inhibit the creation of cross-origin requests (although something like “non-simple” XHR requests and CORS would). Nor does it inhibit the counterfeiting of requests, such as making it difficult for an attacker to guess values. It defeats CSRF by blocking a cookie that represents the user’s security context from being included in a cross-origin request the user likely didn’t intend to make.

There may be a reason to remove a policy from a cookie, in which case a CSP header could use something like an sos-remove instruction:

Content-Security-Policy: sos-remove=cookieName

Cryptographic constructs are avoided on purpose. Even if designed well, they are prone to implementation error. They must also be tracked and verified by the app, which exposes more chances for error and induces more overhead. Relying on nonces increases the difficulty of forging (as in counterfeiting) requests, whereas this proposed policy defines a clear binary of inclusion/exclusion for a cookie. A cookie will or will not be included vs. a nonce might or might not be predicted.

PRNG values are avoided on purpose, for the same reasons as cryptographic nonces. It’s worth noting that misunderstanding the difference between a random value and a cryptographically secure PRNG (which a CSRF token should favor) is another point against a PRNG-based control.

A CSP header was chosen in favor of decorating the cookie with new attributes because cookies are already ugly, clunky, and (somewhat) broken enough. Plus, the underlying goal is to protect a session or security context associated with a user. As such, there might be reason to extend this concept to the instantiation of Web Storage objects, e.g. forbid them in mixed-origin resources. However, this hasn’t really been thought through and probably adds more complexity without solving an actual problem.

The pre-flight request/response shouldn’t be a source of information leakage about cookies used by the app. At least, it shouldn’t provide more information than might be trivially obtained through other techniques.

It’s not clear what an ideal design pattern would be for deploying SOS headers. A policy could accompany each Set-Cookie header. Or the site could use a redirect or similar bottleneck to set policies from a single resource.

It would be much easier to retrofit these headers onto a legacy app by using a Web App Firewall than it would be to modify code to include nonces everywhere.

It would (possibly) be easier to audit a site’s protection when the headers are implemented via mod_rewrite tricks or WAF rules that apply to whole groups of resources than it would be to audit the code of each form and action.

The language here tilts (too much) towards formality, but the terms and usage haven’t been vetted yet to adhere to those in HTML, CSP and CORS. The goal right now is clarity of explanation; pedantry can wait.

== Cautions ==

In addition to the previous notes, these are highlighted as particular concerns.

Conflicting policies would cause confusion. For example, two different resources separately define an 'any' and 'self' for the same cookie. It would be necessary to determine which receives priority.

Cookies have the unfortunate property that they can belong to multiple origins (i.e. sub-domains). Hence, some apps might incur the additional overhead of pre-flight requests, or complexity in trying to distinguish cross-origin requests from unrelated domains versus their own sub-domains.

Apps that rely on “Return To” URL parameters might not be fixed if the return URL has the CSRF exploit and the browser is now redirecting from the same origin. Maybe. This needs some investigation.

There’s no migration for old browsers: You’re secure (using a supporting browser and an adopted site) or you’re not. On the other hand, an old browser is an insecure browser anyway — browser exploits are more threatening than CSRF in many, many cases.

There’s something else I forgot to mention that I’m sure I’ll remember tomorrow.

=====

You’re still here? I’ll leave you with this quote from the last episode of BSG. (It’s a bit late to be apologizing for spoilers…) Thanks for reading!

Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?

BlackHat US 2013: Dissecting CSRF…

Here are the slides for my presentation at this year’s BlackHat US conference, Dissecting CSRF Attacks & Countermeasures. Thanks to everyone who came and to those who hung around afterwards to ask questions and discuss the content.

The major goal of this presentation was to propose a new way to leverage the concepts of Content Security Policy and Cross-Origin Resource Sharing to counter CSRF attacks. Essentially, we proposed a header that web apps could set to inform browsers when to include that app’s cookies during cross-origin requests. As always, slides alone don’t convey the nuances of the presentation. Stay tuned for a more thorough explanation of the concept.

Plugins Stand Out

A minor theme in my recent B-Sides SF presentation was the stagnancy of innovation since HTML4 was finalized in December 1999. New programming patterns emerged over that time, only to be hobbled by the outmoded spec. To help recall that era, I scoured archive.org for ancient curiosities of the last millennium. (Like Geocities’ announcement of 2MB of free hosting space.) One item I came across was a Netscape advisory regarding a Java bytecode vulnerability — in March 1996.

March 1996 Java Bytecode Vulnerability

Almost twenty years later, Java still plagues browsers with continuous critical patches released month after month after month, including March 2013.

Java: Write none, uninstall everywhere.

The primary complaint against browser plugins is not their legacy of security problems (the list of which is exhausting to read). Nor is Java the only plugin to pick on: Flash has its own history of releasing nothing but critical updates. The greater issue is that even a secure plugin lives outside the browser’s Same Origin Policy (SOP).

When plugins exist outside the security and privacy controls enforced by browsers, they weaken the browsing experience. It’s true that plugins aren’t completely independent of these controls; their instantiation and usage with regard to the DOM still fall under the purview of SOP. However, the ways that plugins extend a browser (such as network and file access) are rife with security and privacy pitfalls.

For one example, Flash’s Local Shared Object (LSO) was easily abused as an “evercookie” because it was unaffected by clearing browser cookies and ignored the browser’s cookie-acceptance settings. Yes, it’s still possible to abuse HTTP and HTML to establish evercookies. Even the lauded HTML5 Local Storage API could be abused in a similar manner. It’s for reasons like these that we should be as diligent about demanding “privacy patches” as we are about demanding security fixes.

Unlike Flash, an HTML5 API like Local Storage is an open standard created by groups who review and balance the usability, security, and privacy implications of features designed to improve the browsing experience. Establishing a feature like Local Storage in the HTML spec and aligning it with similar concepts like cookies and security controls like SOP (or HTML5 features like CORS, CSP, etc.) makes them a superior implementation in terms of integrating with users’ expectations and browser behavior. Instead of one vendor providing a means to extend a browser, browser vendors (the number of which is admittedly dwindling) are competing with each other to implement a uniform standard.

Sure, HTML5 brings new risks and preserves old vulnerabilities in new and interesting ways, but a large responsibility for those weaknesses lies with developers who would misuse an HTML5 feature in the same way they might have misused XHR and JSONP in the past. Maybe we’ll start finding plaintext passwords in Local Storage objects, or more sophisticated XSS exploits using Web Workers and WebSockets to scour data from a compromised browser. Security ignorance takes a long time to fix. And even experienced developers are challenged by maintaining the security of complex web applications.

HTML5 promises to obviate plugins altogether. We’ll have markup to handle video, drawing, sound, more events, and more features to create engaging games and apps. Browsers will compete on the implementation and security of these features rather than be crippled by the presence of plugins out of their control.

Getting rid of plugins makes our browsers more secure, but adopting HTML5 doesn’t imply browsers and web sites become secure. There are still vulnerabilities that we can’t fix by simple application design choices like including X-Frame-Options or adopting Content Security Policy headers.

Exterminate!

It’ll be a long time before everyone’s comfortable with the Dirty Harry test. Would you click on an unknown link — better yet, scan an inscrutable QR code — with your current browser? Would you still do it with multiple tabs open to your email, bank, and social networking accounts?

Who cares if “the network is the computer” or an application lives in the “cloud” or it’s offered via something as a service? It’s your browser that’s the door to web apps, and when it’s not secure, an open window to your data.

Implicit HTML, Explicit Injection

When designing security filters against HTML injection you need to outsmart the attacker, not the browser. HTML’s syntax is more forgiving of mis-nested tags, unterminated elements, and entity-encoding compared to formats like XML. This is a good thing, because it ensures a User-Agent renders a best-effort layout for a web page rather than bailing on errors or typos that would leave visitors staring at blank pages or incomprehensible error messages.

It’s also a bad thing, because User-Agents have to make educated guesses about a page author’s intent when it encounters unexpected markup. This is the kind of situation that leads to browser quirks and inconsistent behavior.

One of HTML5’s improvements is a codified algorithm for parsing content. In the past, browsers not only had quirks, but developers would write content specifically to take advantage of those idiosyncrasies — giving us a world where sites worked well with one and only one version of Internet Explorer (or Mozilla, etc.). A great deal of blame lies at the feet of site developers who refused to consider good HTML design patterns in favor of the principle of Code Relying on Advanced Persistent Stubbornness.

Parsing Disharmony

Untidy markup is a security hazard. It makes HTML injection vulnerabilities more difficult to detect and block, especially for regex-based countermeasures.

Regular expressions have irregular success as security mechanisms for HTML. While regexes excel at pattern-matching, they fare miserably in semantic parsing. Once you start building a state mechanism for element start characters, token delimiters, attribute names, and so on, anything other than a narrowly-focused regex becomes unwieldy at best.

First, let’s take a look at some simple elements with uncommon syntax. Regular readers will recognize a favorite XSS payload of mine, the img tag:

<img/alt=""src="."onerror=alert(9)>

Spaces aren’t required to delimit attribute name/value pairs when the value is marked by quotes. Also, the element name may be separated from its attributes with whitespace or the forward slash. We’re entering strange parsing territory. For some sites, this will be a trip to the undiscovered country.
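A quick sketch shows why this markup ruins a regex-based filter’s day. Here’s a hypothetical naive filter in Python that hunts for an onerror attribute but assumes whitespace follows the element name:

import re

# Hypothetical naive filter: an img tag whose attribute list is
# separated from the element name by whitespace.
naive_filter = re.compile(r"<img\s+[^>]*onerror", re.IGNORECASE)

payload = '<img/alt=""src="."onerror=alert(9)>'

# No match: the forward slash stands in for the expected whitespace,
# so the regex misses the payload. Browsers still parse it as an
# <img> element and fire the onerror handler.
print(bool(naive_filter.search(payload)))  # False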

Delimiters are fun to play with. Here’s a case where empty quotes separate the element name from an attribute. Note the difference in value delineation. The id attribute has an unquoted value, so we separate it from the subsequent attribute with a space. The href has an empty value delimited with quotes. The parser doesn’t need whitespace after a quoted value, so we put onclick immediately after.

<a""id=a href=""onclick=alert(9)>foo</a>

User-Agents try their best to make sites work. As a consequence, they’ll interpret markup in surprising ways. Here’s an example that mixes start and end tag tokens in order to deliver an XSS payload:

<script/<a>alert(9)</script> 

We can adjust the end tag if there’s a filter watching for </script>. Note there is a space between the last </script and </a>.

<script/<a>alert(9)</script </a>

Successful HTML injection thrives on bad mark-up to bypass filters and take advantage of browser quirks. Here’s another case where the browser accepts an incorrectly terminated tag. If the site turns the following payload’s %0d%0a into \r\n (carriage return, line feed) when it places the payload into HTML, then the browser might execute the alert function.

<script%0d%0aalert(9)</script>

Or you might be able to separate the lack of closing > character from the alert function with an intermediate HTML comment:

<script%20<!--%20-->alert(9)</script>

The way browsers deal with whitespace is a notorious source of security problems. The Samy worm exploited IE’s tolerance for splitting a javascript: scheme with a line feed.

<div id=mycode style="BACKGROUND: url('java 
script:eval(document.all.mycode.expr)')" expr="alert(9)"></div>

Or we can throw an entity into the attribute list. The following is bad markup. But if it’s bad markup that bypasses a filter, then it’s a good injection.

<a href=""&amp;/onclick=alert(9)>foo</a>

HTML entities have a special place within parsing and injection attacks. They’re most often used to bypass string-matching. For example, the following three JavaScript schemes use an entity for the “s” character:

java&#115;cript:alert(9)
java&#x73;cript:alert(9)
java&#x0073;cript:alert(9)

The danger with entities and parsing is that you must keep track of the context in which you decode them. But you also need to keep track of the order in which you resolve entities (or otherwise normalize data) and when you apply security checks. In the previous example, if you had checked for “javascript” in the scheme before resolving the entity, then your filter would have failed. Think of it as a time-of-check to time-of-use (TOCTOU) problem that’s affected by data transformation rather than the more commonly thought-of race condition.
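Here’s that failure mode in miniature. The naive_check function is a hypothetical filter that inspects the value before entities are resolved, which is exactly the wrong order:

import html

def naive_check(url):
    # Hypothetical filter that runs BEFORE entity decoding:
    # a time-of-check to time-of-use mistake.
    return "javascript:" not in url.lower()

url = "java&#115;cript:alert(9)"

print(naive_check(url))    # True: the filter waves it through
print(html.unescape(url))  # javascript:alert(9) is what the browser sees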

Security

User Agents are often forced to second-guess the intended layout of error-ridden pages. HTML5 brings more sanity to parsing markup. But we still don’t have a mechanism to help browsers distinguish between typos, intended behavior, and HTML injection attacks. There’s no equivalent to prepared statements for SQL.

  • Fix the vulnerability, not the exploit.
    It’s not uncommon for developers to blacklist a string like alert or javascript under the assumption that doing so prevents attacks. That sort of thinking mistakes the payload for the underlying problem. The problem is placing user-supplied data into HTML without taking steps to ensure the browser renders the data as text rather than markup.
  • Test with multiple browsers.
    A payload that takes advantage of a rendering quirk for browser A isn’t going to exhibit security problems if you’re testing with browser B.
  • Prefer parsing to regex patterns.
    Regexes may be as effective as they are complex, but you pay a price for complexity. Trying to read someone else’s regex, or even maintaining your own, becomes more error-prone as the pattern becomes longer.
  • Encode characters.
    You’ll be more successful at blocking HTML injection attacks if you consistently apply encoding rules for characters like < and > and prevent quotes from breaking attribute values (see the sketch after this list).
  • Enforce rules strictly.
    Ambiguity for browsers enables them to recover from errors gracefully. Ambiguity for security weakens the system.
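Here’s the encoding sketch promised above, using Python’s standard html.escape to neutralize markup characters and quotes:

from html import escape

user_input = '"><script>alert(9)</script>'

# Encode &, <, >, and quotes so the browser renders the data as text
# rather than markup, even inside a quoted attribute value.
safe = escape(user_input, quote=True)
print('<input value="{}">'.format(safe))
# <input value="&quot;&gt;&lt;script&gt;alert(9)&lt;/script&gt;">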

HTML injection attacks try to bypass filters in order to deliver a payload that a browser will render. Security filters should be strict, but not so myopic that they miss “improper” HTML constructs that a browser will happily render.

User Agent. Secret Agent. Double Agent.

We hope our browsers are secure in light of the sites we choose to visit. What we often forget is whether we are secure in light of the sites our browsers choose to visit. Sometimes it’s hard to even figure out whose side our browsers are on.

Browsers act on our behalf, hence the term User Agent. They load HTML from the link we type in the navbar, then retrieve the resources defined in the HTML in order to fully render the site. The resources may be obvious, like images, or behind-the-scenes, like CSS that style the page’s layout or JSON messages sent by XmlHttpRequest objects.

Then there are times when our browsers work on behalf of others, working as a Secret Agent to betray our data. They carry out orders delivered by Cross-Site Request Forgery (CSRF) exploits enabled by the very nature of HTML.

Part of HTML’s success is its capability to aggregate resources from different Origins into a single page. Check out the following HTML. It loads a CSS file, JavaScript functions, and two images from different hosts, all but one over HTTPS. None of it violates the Same Origin Policy. Nor is there an issue with loading different Origins with different SSL connections.

<!doctype html>
<html>
<head>
<link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet" media="all" type="text/css" />
<script src="https://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.9.0.min.js"></script>
<script>
$(document).ready(function() {
  $("#main").text("Come together...");
});
</script>
</head>
<body>
<img alt="www.baidu.com" src="http://www.baidu.com/img/shouye_b5486898c692066bd2cbaeda86d74448.gif" />
<img alt="www.twitter.com" src="https://twitter.com/images/resources/twitter-bird-blue-on-white.png" />
<div id="main" style="font-family: 'Open Sans';"></div>
</body>
</html>

CSRF attacks rely on being able to include resources from unrelated Origins in a single page. They’re not concerned with the Same Origin Policy since they are neither restricted by it nor need to break it (they don’t need to read or write across Origins). CSRF is concerned with a user’s context in relation to a web app — the data and state transitions associated with a user’s session, security, or privacy.

To get a sense of user context with regard to a site, let’s look at the Bing search engine. Click on the Preferences gear in the upper right corner to review your Search History. You’ll see a list of search terms like the following example:

Bing Search History

Bing’s Search box is an <input> field with parameter name “q”. Searching for a term — and therefore populating the Search History — is done when the form is submitted. Doing so creates a request for a link like this:

http://www.bing.com/search?q=lilith%27s+brood

For a CSRF exploit to work, it’s important to be able to recreate a user’s request. In the case of Bing, an attacker need only craft a GET request to the /search page and populate the q parameter.

Forge a Request

We’ll use a CSRF attack to populate the user’s Search History without their knowledge. This requires luring the victim to a web page that’s able to forge (as in craft) a search request. If successful, the forged (as in fake) request will affect the user’s context (i.e. the Search History). One way to forge an automatic request is via the src attribute of an img tag. The following HTML would be hosted on some Origin unrelated to Bing, e.g. http://web.site/page.

<!doctype html>
<html>
<body>
<img src="http://www.bing.com/search?q=deadliest%20web%20attacks" style="visibility: hidden;" alt="" />
</body>
</html>

The victim has to visit the attacker’s web page or come across the img tag in a discussion forum or social media site. The user does not need to have Bing open in a different browser tab or otherwise be using Bing at the same time they come across the CSRF exploit. Once their browser encounters the booby-trapped page, the request updates their Search History even though they never typed “deadliest web attacks” into the search box.

Bing Search History CSRF

As a thought experiment, extend this from a search history “infection” to a social media status update, or changing an account’s email address (to the attacker’s), or changing a password (to something the attacker knows), or any other action that affects the user’s security or privacy.

The trick is that CSRF requires full knowledge of the request’s parameters in order to successfully forge it. That kind of forgery (as in faking a legitimate request) deserves another article to explore fully. For example, if you had to supply the old password in order to update a new password, then you wouldn’t need a CSRF attack — just log in with the known password. Or another example: imagine Bing randomly assigned a letter to users’ search requests. One user’s request might use a “q” parameter, whereas another user’s request relies instead on an “s” parameter. If the parameter name didn’t match the one assigned to the user, then Bing would reject the search request. The attacker would have to predict the parameter name. Or, if the sample space were small, fake each possible combination — which would be only 26 letters in this imagined scenario.

Crafty Crafting

We’ll end on the easier aspect of forgery (as in crafting). Browsers automatically load resources from the src attributes of elements like img, iframe, and script (or the href attribute of a link). If an action can be faked by a GET request, that’s the easiest way to go.

HTML5 gives us another nifty way to forge requests using Content Security Policy directives. We’ll invert the expected usage of CSP by intentionally creating an element that violates a restriction. The following HTML defines a CSP rule that forbids src attributes (default-src 'none') and a URL to report rule violations. The victim’s browser must be lured to this page, either through social engineering or by placing it on a commonly-visited site that permits user-uploaded HTML.

<!doctype html>
<html>
<head>
<meta http-equiv="X-WebKit-CSP" content="default-src 'none'; report-uri http://www.bing.com/search?q=deadliest%20web%20attacks%20CSP" />
</head>
<body>
<img alt="" src="/" />
</body>
</html>

The report-uri creates a POST request to the link. Being able to generate a POST is highly attractive for CSRF attacks. However, the usefulness of this technique is tempered by the fact that it’s not possible to add arbitrary name/value pairs to the POST data. The browser will percent-encode the values for the “document-url” and “violated-directive” parameters. Unless the browser incorrectly implements CSP reporting, it’s a half-successful measure at best.

POST /search?q=deadliest%20web%20attacks%20CSP HTTP/1.1
Host: www.bing.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17
Content-Length: 121
Origin: null
Content-Type: application/x-www-form-urlencoded
Referer: http://web.site/HWA/ch3/bing_csp_report_uri.html
Connection: keep-alive

document-url=http%3A%2F%2Fweb.site%2FHWA%2Fch3%2Fbing_csp_report_uri.html&violated-directive=default-src+%27none%27

There’s far more to finding and exploiting CSRF vulnerabilities than covered here. We didn’t mention risk, which in this example is low (there’s questionable benefit to the attacker or detriment to the victim, notice you can even turn history off, and the history feature is presented clearly rather than hidden in an obscure privacy setting). But the Bing example demonstrates the essential mechanics of an attack:

  • A site tracks per-user contexts.
  • A request is known to modify that context.
  • The request can be recreated by an attacker (i.e. parameter names and values are predictable without direct access to the victim’s context).
  • The forged request can be placed on a page unrelated to the site (i.e. in a different Origin) where the victim’s browser comes across it.
  • The victim’s browser submits the forged request and affects the user’s context (this usually requires the victim to be currently authenticated to the site).

Later on, we’ll explore attacks that affect a user’s security context and differentiate them from nuisance attacks or attacks with negligible impact to the user. We’ll also examine the forging of requests, including challenges of creating GET and POST requests. Then we’ll heighten those challenges against the attacker and explore ways to counter CSRF attacks.

Until then, consider who your User Agent is really working for. It might not be who you expect.


“We are what we pretend to be, so we must be careful about what we pretend to be.” — Kurt Vonnegut, Introduction to Mother Night.