I’ll ne’er look you i’ the plaintext again

 

Look at this playbill: air fresheners, web security, cats. Thanks to Let’s Encrypt, this site is now accessible via HTTPS by default. Even better, WordPress serves the Strict-Transport-Security header to ensure browsers adhere to HTTPS when visiting it. So, whether you’re being entertained by odors, HTML injection, or felines, your browser is encrypting traffic.
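For anyone who hasn’t looked at the header itself, here is a minimal Python sketch of how a client might read a Strict-Transport-Security value. The function name parse_hsts and the sample header value are illustrative assumptions, not a capture of what this site sends, and the parsing is simplified.

```python
def parse_hsts(value: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives.

    Simplified sketch: it handles only the max-age and includeSubDomains
    directives and ignores anything else.
    """
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            # max-age is a number of seconds, optionally quoted
            policy["max_age"] = int(arg.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


# Hypothetical header value, for illustration only.
example = parse_hsts("max-age=31536000; includeSubDomains")
```

A browser that has seen such a header keeps using HTTPS for the host until max-age seconds have elapsed, even if a later link points at plain HTTP.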

deadliestwebattacks TLS

Let’s Encrypt makes this possible for two reasons. The project provides free certificates, which addresses the economic aspect of obtaining and managing them. Users who blog, create content, or set up their own web sites can do so with free tools. But HTTPS certificates were never free, and there was little incentive for those users to spend money on them. To further compound the issue, users creating content and web sites rarely needed to know the technical underpinnings of how those sites were set up (which is perfectly fine!). Yet the secure handling and deployment of certificates requires more technical knowledge.

Most importantly, Let’s Encrypt addressed this latter challenge by establishing a simple, secure ACME protocol for the acquisition, maintenance, and renewal of certificates. Even when (or perhaps especially when) certificates have lifetimes of one or two years, site administrators would forget to renew them. It’s this level of automation that makes the project successful.

Hence, WordPress can now afford — both in the economic and technical sense — to deploy certificates for all the custom domain names it hosts. That’s what brings us to the cert for this site, which is but one domain in a list of SAN entries from deadairfresheners to a Russian-language blog about, inevitably, cats.

Yet not everyone has taken advantage of the new ease of encrypting everything. Five years ago I wrote about Why You Should Always Use HTTPS. Sadly, the article itself is served only via HTTP. You can request it via HTTPS, but the server returns a hostname mismatch error for the certificate, which breaks the intent of using a certificate to establish a server’s identity.

Intermission.

As with things that are new, free, and automated, there will be abuse. For one, malware authors, phishers, and the like will continue to move towards HTTPS connections. The key point there being “continue to”. Such bad actors already have access to certs and to compromised financial accounts with which to buy them. There’s little in Let’s Encrypt that aggravates this.

Attackers may start looking for letsencrypt clients in order to obtain certs by fraudulently requesting new ones, for example by provisioning a resource under a well-known URI for the domain. (This, and provisioning DNS records, are two ways of establishing trust with the Let’s Encrypt CA.)
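To make the “well-known URI” concrete, here is a rough Python sketch of the HTTP-based challenge as it was later standardized in RFC 8555: the CA fetches a path under /.well-known/acme-challenge/ and expects the token joined to a thumbprint of the account key. The helper names are my own, and a real client does far more (account registration, signing, polling).

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint (RFC 7638) for an RSA or EC public key."""
    # Only the required members are hashed, with sorted keys and no whitespace.
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_challenge(token: str, account_jwk: dict) -> tuple[str, str]:
    """Return the (URL path, response body) the CA expects to find."""
    path = "/.well-known/acme-challenge/" + token
    body = token + "." + jwk_thumbprint(account_jwk)
    return path, body
```

The takeaway for defenders: anyone who can write a file under that path on a domain can satisfy the challenge, so upload features and misrouted virtual hosts deserve a second look.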

Attackers may start accelerating domain enumeration via Let’s Encrypt SANs. Again, it’s trivial to walk through domains for any SAN certificate purchased today. This may only be a nuisance for hosting sites or aggregators who are jumbling multiple domains into a single cert.
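How trivial? A sketch using only Python’s standard library follows. The function names are mine, and the sample dictionary is made-up data shaped like what SSLSocket.getpeercert() returns.

```python
import socket
import ssl


def san_dns_names(peer_cert: dict) -> list[str]:
    """Pull the DNS entries out of a certificate's subjectAltName field.

    Expects the dict form returned by ssl.SSLSocket.getpeercert().
    """
    return [value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_san_dns_names(host: str, port: int = 443) -> list[str]:
    """Connect to host, validate its certificate, and list its SAN DNS names."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return san_dns_names(tls.getpeercert())


# Made-up certificate data, for illustration only.
sample = {
    "subjectAltName": (
        ("DNS", "example.com"),
        ("DNS", "blog.example.net"),
        ("IP Address", "192.0.2.1"),
    )
}
```

Calling fetch_san_dns_names() against a shared-hosting cert returns every sibling domain packed into it, which is the enumeration shortcut described above.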

Such attacks aren’t proposed as creaky boards on the Let’s Encrypt stage. They’re merely reminders that we should always be reconsidering how old threats and techniques apply to new technologies and processes. For many “astounding” hacks of today (namely the proliferation of Named-Ones-Who-I-Shall-Not-Name), there are likely close parallels to old Phrack articles or basic security principles awaiting clever reinterpretation for our modern times.

Finally, I must leave you with some sort of pop culture reference, or else this post wouldn’t befit the site. This is the 400th anniversary of Shakespeare’s death. So I shall leave you with yet another quote. May it take us far less time to finally bury HTTP and praise HTTPS in ubiquity.

Nay, an I tell you that, I’ll ne’er look you i’ the
face again: but those that understood him smiled at
one another and shook their heads; but, for mine own
part, it was Greek to me. I could tell you more
news too: Marullus and Flavius, for pulling scarfs
off Caesar’s images, are put to silence. Fare you
well. There was more foolery yet, if I could
remember it. (Julius Caesar. I.ii.278-284)

 

You’ve Violated APE Law!

Developers who wish to defend their code should be aware of Advanced Persistent Exploitability. It is a situation where breaking code remains possible due to broken code.

La Planète des Singes

Code has errors. Writing has errors. Consider the pervasiveness of spellcheckers and how often the red squiggle complains about a misspelling in as common an activity as composing email. Mistakes happen; they’re a natural consequence of writing, whether code, blog, email, or book. The danger here is that in code these mistakes lead to exploits.

Sometimes coding errors arise from a stubborn refusal to acknowledge fundamental principles, as seen in the Advanced Persistent Ignorance that lets SQL injection persist almost a decade after programming languages first provided countermeasures. That vuln is so old that anyone with sqlmap and a URL can trivially exploit it.

Other coding errors are due to the lack of follow-through to address the fundamental causes of a vuln; the defender fixes the observed exploit as opposed to understanding and fixing the underlying issue. This approach fails when the attacker merely needs to tweak an exploit in order to compromise the vuln again.

We’ll use the following PHP snippet as an example. It has an obvious flaw in the arg parameter:

<?php
$arg = $_GET['arg'];
$r = exec('/bin/ls ' . $arg);
?>

Confronted with an exploit that contains a semicolon to execute an arbitrary command, a developer might remember to apply input validation. This is not necessarily wrong, but it is a first step on the dangerous path of the “Clever Factor”. In this case, the developer chose to narrow the parameter to only contain alphabetic characters.

<?php
$arg = $_GET['arg'];
# did one better than escapeshellarg
if(preg_match('/[a-zA-Z]+/', $arg)) {
    $r = exec('/bin/ls ' . $arg);
}
?>

As a first offense, the regex should have been anchored to match the complete input string, i.e. '/^[a-zA-Z]+$/'. That mistake alone should dismiss this dev’s understanding of the problem and claim to a clever solution. But let’s continue the exercise with three more questions:

Is the intention clear? Is it resilient? Is it maintainable?

This developer declared they “did one better” than the documented solution by restricting input to mixed-case letters. One possible interpretation is that they only expected directories with mixed-case alpha names. A subsequent dev may point out the need to review directories that include numbers or a dot (.) and, as a consequence, relax the regex. That change may still be in the spirit of the validation approach (after all, it’s restricting input to expectations), but if the regex changes to where it allows a space or shell metacharacters, then it’ll be exploited. Again.

This leads to resilience against code churn. The initial code might be clear to someone who understands the regex to be an input filter (albeit an incorrect one in the first version). But the regex’s security requirements are ambiguous enough that someone else may mistakenly change it to allow metacharacters or introduce a typo that weakens it. Additionally, what kind of unit tests accompanied the original version? Merely some strings of known directories and a few negative tests with “./” and “..”? None of those tests would have demonstrated the vulnerability or conveyed the intended security aspect of the regex.
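Here is roughly what a test that does convey the security intent could look like. This is a Python sketch, not the PHP project’s real test suite: original_filter mirrors the unanchored preg_match() call, anchored_filter mirrors the corrected pattern, and both names are hypothetical.

```python
import re


def original_filter(arg: str) -> bool:
    """Mirror of preg_match('/[a-zA-Z]+/'): true if letters appear anywhere."""
    return re.search(r"[a-zA-Z]+", arg) is not None


def anchored_filter(arg: str) -> bool:
    """The whole string must be letters, nothing before or after."""
    return re.fullmatch(r"[a-zA-Z]+", arg) is not None


# The kind of negative tests a dev might have written: both filters pass them.
traversal_cases = ["./", ".."]

# The test that was missing: shell metacharacters alongside ordinary letters.
attack = "x; cat /etc/passwd"

results = {
    "original_blocks_traversal": not any(original_filter(c) for c in traversal_cases),
    "original_blocks_attack": not original_filter(attack),
    "anchored_blocks_attack": not anchored_filter(attack),
}
```

The traversal cases give the unanchored filter a clean bill of health while the attack string sails through it. One caveat when carrying this back to PHP: in PCRE, $ also matches just before a trailing newline, so an anchored pattern there would want \z or the D modifier.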

Code must be maintained over time. In the PHP example, the point of validation is right next to the point of usage. Think of this as the spatial version of the time of check to time of use flaw. In more complex code, especially long-lived code and projects with multiple committers, the validation check could easily drift further and further from the location where its argument is used. This dilutes the original developer’s intention since someone else may not realize the validation context and re-taint (such as with string concatenation with other input parameters) or otherwise misuse the parameter.

In this scenario, the solution isn’t even difficult. PHP’s documentation gives clear, prominent warnings about how to secure calls to the entire family of exec-style commands.

$r = exec('/bin/ls ' . escapeshellarg($arg));

The recommended solution has a clear intent — escape shell arguments passed to a command. It’s resilient — the PHP function will handle all shell metacharacters, not to mention the character encoding (like UTF-8). And it’s easy to maintain — whatever manipulation the $arg parameter suffers throughout the code, it will be properly secured at its point of usage.

It also requires less typing than the back-and-forth of multiple bug comments required to explain the pitfalls of regexes and the necessity of robust defenses. Applying a fix to stop an exploit is not the same as applying a fix to solve a vulnerability’s underlying problem.
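The quote-at-the-point-of-use idea isn’t unique to PHP. As a comparison, here is a Python sketch with shlex.quote() standing in for escapeshellarg(), plus the shell-free alternative of passing an argument list. The function names are mine.

```python
import shlex


def ls_command(arg: str) -> str:
    """Build a shell command line with the argument quoted where it is used."""
    return "/bin/ls " + shlex.quote(arg)


def ls_argv(arg: str) -> list[str]:
    """Build an argument vector for subprocess.run(), which needs no shell.

    The "--" keeps an argument that starts with "-" from being read as an option.
    """
    return ["/bin/ls", "--", arg]


attack = "x; cat /etc/passwd"
quoted = ls_command(attack)
```

Whatever happens to the argument upstream, the shell sees it as a single word: splitting the quoted command back apart yields exactly two tokens, the program and the untouched input.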

There is a wealth of examples for this phenomenon, from matching the string “alert” to block cross-site scripting attacks to renaming files to prevent repeat exploitation (oh, the obscurity!) to stopping a service only to have it restart when the system reboots.

 

What does the future hold for programmers? Pierre Boulle’s vacationing astronauts perhaps summarized it best in the closing chapter of La Planète des Singes:

Des hommes raisonnables ? … Non, ce n’est pas possible (Rational men? … No, that’s not possible)

May your interplanetary voyages lead to less strange worlds.

Codex Securum, Obiter Dictum

In the past, you have come here for truth. I now give you law.

Science fiction author Arthur C. Clarke succinctly described the wondrous nature of technology in what has come to be known as Clarke’s Third Law (from a letter published in Science in January 1968):

Any sufficiently advanced technology is indistinguishable from magic.

The sentiment of that law can be found in an earlier short story by Leigh Brackett, “The Sorcerer of Rhiannon,” published in Astounding Science-Fiction Magazine in February 1942:

Witchcraft to the ignorant . . . Simple science to the learned.

With those formulations as our departure point, we can now turn towards crypto, browser technologies, and privacy.

The Latinate Lex Cryptobellum:

Any sufficiently advanced cryptographic escrow system is indistinguishable from ROT13.

Or in Leigh Brackett’s formulation:

Cryptographic escrow to the ignorant . . . Simple plaintext to the learned.

A few Laws of Browser Plugins (somewhat like the fonts of dis-knowledge):

Any sufficiently patched Flash is indistinguishable from a critical update.

Any sufficiently patched Java is indistinguishable from Flash.

A few Laws of Browsers:

Any insufficiently patched browser is indistinguishable from malware.

Any sufficiently patched browser remains distinguishable from a privacy-enhancing one.

For what are browsers but thralls to Laws of Ads:

Any sufficiently targeted ad is indistinguishable from chance.

Any sufficiently distinguishable person’s browser has tracking cookies.

Any insufficiently distinguishable person has privacy.

Mike’s law of writing on schedule:

Any sufficiently delivered manuscript is indistinguishable from overdue.

Which leads us to the foundational Zeroth Law of Deadliest Web Attacks:

Any sufficiently popular post is indistinguishable from truth.

Please share!

 


 

p.s. I highly recommend Gary Westfahl’s Science Fiction Quotations: From the Inner Mind to the Outer Limits should you wish to discover more authors and their books to explore.

Battling Geologic Time

65 million years ago, dinosaurs ruled the earth. (Which also seems about the last time I wrote something new here.)

In 45 million lines of code, Windows XP dominated the desktop. Yes it had far too many security holes and people held onto it for far too long — even after Microsoft tried to pull support for the first time. But its duration is still a testament to a certain measure of success.

Much of today’s web still uses code that dates from the dawn of internet time, some new code is still written by dinosaurs, and even more code is written by the avian descendants of dinosaurs. These birds flock to new languages and new frameworks. Yet, looking at some of the trivial vulns that emerge (like hard-coded passwords and SQL built from string concatenation), it seems the bird brain hasn’t evolved as much security knowledge as we might wish.

I’m a fan of dead languages. I’ve mentioned before my admiration of Latin (as well as Harry Potter Latin). And hieroglyphs have an attractive mystery to them. This appreciation doesn’t carry over to Perl. (I wish I could find the original comment that noted an obfuscated Perl contest is a redundant effort.)

But I do love regular expressions. I’ve crafted, tweaked, optimized, and obscured my fair share of regexes over the years. And I’ve discovered the performance benefits of pcre_study() and JIT compilation mode.

Yet woe betide anyone using regexes as a comprehensive parser (especially for HTML). And if you’re trying to match quoted strings, be prepared to deal with complexities that turn a few-character pattern into a monstrous composition.
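A small Python illustration of that growth, using patterns of my own rather than anything from a particular scanner: the obvious pattern for a double-quoted string stops at the first escaped quote, and handling just that one case already triples its length.

```python
import re

# The "few-character" version: a quote, anything but a quote, a quote.
NAIVE = re.compile(r'"[^"]*"')

# Escape-aware version: either a non-quote, non-backslash character,
# or a backslash followed by any character.
ESCAPED = re.compile(r'"(?:[^"\\]|\\.)*"')

text = r'say "a \"quoted\" word" now'

naive_match = NAIVE.search(text).group()
escaped_match = ESCAPED.search(text).group()
```

The naive pattern returns only the fragment up to the first escaped quote; the longer one returns the full string literal. Neither yet copes with single quotes, language-specific escapes, or literals that span lines.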

Seeing modern day humans still rely on poorly written regexes to conduct code scanning made me wonder how little mammals have advanced beyond the dinosaurs of prehistory. They might not be burning themselves with fire, but they’re burning their chances of accurate, effective scans.

That was how I discovered pfff and its companion, sgrep. At the SOURCE Seattle conference this year I spoke a little about lessons learned from regexes and the advancements possible should you desire to venture into the realm of OCaml: SOURCE Seattle 2015 – Code Scanning. Who knows, if you can conquer fire you might be able to handle stone tools.

Bad Code Entitles Good Exploits

I have yet to create a full taxonomy of the mistakes developers make that lead to insecure code.
As a brief note towards that effort, here’s an HTML injection (aka cross-site scripting) example that’s due to a series of tragic assumptions that conspire to not only leave the site vulnerable, but waste lines of code doing so.

The first clue lies in the querystring’s state parameter. The site renders the state’s value into a title element. Naturally, a first probe for HTML injection would be attempting to terminate that tag. If successful, then it’s trivial to append arbitrary markup such as <script> tags. A simple probe looks like this:

http://web.site/cg/aLink.do?state=abc%3C/title%3E

The site responds by stripping the payload’s </title> tag (plus any subsequent characters). Only the text leading up to the injected tag is rendered within the title.

<HTML>
<HEAD>
<TITLE>abc</TITLE>

This seems to have effectively countered the attack and not exposed any vuln. Of course, if you’ve been reading this blog for any length of time, you’ll know this trope of deceitful appearances always leads to a vuln. That which seems secure shatters under scrutiny.

The developers knew that an attacker might try to inject a closing </title> tag. Consequently, they created a filter to watch for such things and strip them. This could be implemented as a basic case-insensitive string comparison or a trivial regex.

And it could be bypassed by just a few characters.

Consider the following closing tags. Regardless of whether they seem surprising or silly, the extraneous characters are meaningless to HTML yet meaningful to our exploit because they foil the assumption that regexes make good parsers.

<%00/title>
<""/title>
</title"">
</title id="">

After inspecting how the site responds to each of the tags, it’s apparent that the site’s filter only expected a so-called “good” </title> tag. Browsers don’t care about an attribute on the closing tag. (They’ll ignore such characters as long as they don’t violate parsing rules.)
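I never saw the site’s code, so the following Python is only a guess at the kind of filter that would behave this way: strip a literal closing title tag, case-insensitively, along with everything after it.

```python
import re


def naive_title_filter(value: str) -> str:
    """Hypothetical filter: drop a literal </title> and all text that follows."""
    return re.sub(r"</title>.*", "", value, flags=re.IGNORECASE | re.DOTALL)


blocked = naive_title_filter("abc</title><script>alert(9)</script>")
bypassed = naive_title_filter('abc</title id="a"><img src=x onerror=alert(9)>')
```

The first probe comes back as plain abc, matching the response shown earlier. The second passes through untouched because the pattern only knows the well-formed spelling of the tag.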

Next, we combine the filter bypass with a payload. In this case, we’ll use an image onerror event.

http://web.site/cg/aLink.do?state=abc%3C/title%20id=%22a%22%3E%3Cimg%20src=x%20onerror=alert%289%29%3E

The attack works! We should have been less sloppy and added an opening <TITLE> tag to match the newly orphaned closing one. A good exploit should not leave the page messier than it was before.

<HTML>
<HEAD>
<TITLE>abc</title id="a"><img src=x onerror=alert(9)> Vulnerable & Exploited Information Resource Center</TITLE>

The tragedy of this vuln is that it proves the site’s developers were aware of the concept of HTML injection exploits, but failed to grasp the fundamental characteristics of the vuln. The effort spent blocking an attack (i.e. countering an injected closing tag) not only wasted lines of code on an incorrect fix, but left the naive developers with a false sense of security. The code became more complex and less secure.

The mistake also highlights the danger of assuming that well-formed markup is the only kind of markup. Browsers are capricious beasts; they must dance around typos, stomp upon (or skirt around) errors, and walk bravely amongst bizarrely nested tags. This syntactic havoc is why regexes are notoriously worse at dealing with HTML than proper parsers.

There’s an ancillary lesson here in terms of automated testing (or quality manual pen testing, for that matter). A scan of the site might easily miss the vuln if it uses a payload that the filter blocks, or doesn’t apply any attack variants. This is one way sites “become” vulnerable when code doesn’t change, but attacks do.

And it’s one way developers must change their attitudes from trying to outsmart attackers to focusing on basic security principles.
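For this vuln the basic principle is output encoding: treat the parameter as text when it is written into the page rather than hunting for particular tags. A minimal Python sketch follows, assuming a hypothetical render_title() helper standing in for whatever the site’s templating layer does.

```python
import html


def render_title(state: str) -> str:
    """Write an untrusted value into a title element as text, not markup."""
    return "<TITLE>" + html.escape(state, quote=True) + "</TITLE>"


payload = 'abc</title id="a"><img src=x onerror=alert(9)>'
rendered = render_title(payload)
```

No list of bad tags is involved, so no variant of the closing tag matters: every angle bracket and quote in the payload is rendered as a character reference.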

RSA APJ 2014, CDS-W07 Slides

Here are the slides for my presentation, Building and Breaking Privacy Barriers, at this year’s RSA Asia Pacific and Japan conference in Singapore.

The slides convey more theory than practical examples, but the ideas should come across without too much confusion. I expect to revisit the idea of a Rot network (a play on Tor) and toy with an implementation. Instead of blocking tracking bugs, the concept is to reduce their utility by sharing them across unrelated browsers — essentially polluting the data.

In any case, with this presentation over and out of the way, it’s time to start working on more articles!

A Monstrous Confluence

You taught me language, and my profit on’t

Is, I know how to curse: the red plague rid you,

For learning me your language!

Caliban, (The Tempest, I.ii.363-365)

The announcement of the Heartbleed vulnerability revealed a flaw in OpenSSL that could be exploited by a simple mechanism against a large population of targets to extract random memory from the victim. At worst, that pilfered memory would contain sensitive information like HTTP requests (with cookies, credentials, etc.) or even parts of the server’s private key. (Or malicious servers could extract similarly sensitive data from vulnerable clients.)

In the spirit of Shakespeare’s freckled whelp, I combined a desire to learn about Heartbleed’s underpinnings with my ongoing experimentation with the new language features of C++11. The result is a demo tool named Hemorrhage.

Hemorrhage shows two different approaches to sending modified TLS heartbeats. One relies on the Boost.ASIO library to set up a TCP connection, then handles the SSL/TLS layer manually. The other uses a more complete adoption of Boost.ASIO and its asynchronous capabilities. It was this async aspect where C++11 really shone. Lambdas made setting up callbacks a pleasure — especially in terms of readability compared to prior techniques that required binds and placeholders.

Readable code is hackable (in the creation sense) code. Being able to declare variables with auto made code easier to read, especially when dealing with iterators. Although Hemorrhage only takes minimal advantage of move semantics and unique_ptr, they are currently my favorite aspects following lambdas and auto.

Hemorrhage itself is simple. Check out the README.md for more details about compiling it. (Hint: As long as you have Boost and OpenSSL it’s easy on Unix-based systems.)

The core of the tool is taking the tls1_heartbeat() function from OpenSSL’s ssl/t1_lib.c file and changing the payload length — essentially a one-line modification. Yet another approach might be to use the original tls1_heartbeat() function and modify the heartbeat data directly by manipulating the SSL* pointer’s s3->wrec data via the SSL_CTX_set_msg_callback().
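The payload length matters because a heartbeat message carries its own length field, separate from the length of the record that contains it. The Python sketch below is not Hemorrhage’s code; it models the message layout from RFC 6520 and the bounds check a receiver should make, with function names of my own and zero bytes standing in for the random padding.

```python
import struct

MIN_PADDING = 16  # RFC 6520 requires at least 16 bytes of padding


def heartbeat_message(payload: bytes, claimed_length: int) -> bytes:
    """Build a heartbeat request: type (1 byte), length (2 bytes), payload, padding.

    claimed_length is what the sender says the payload length is, which
    need not match len(payload).
    """
    return struct.pack("!BH", 1, claimed_length) + payload + bytes(MIN_PADDING)


def read_heartbeat_payload(message: bytes):
    """Return the payload, or None if the claimed length overruns the message."""
    if len(message) < 3:
        return None
    _msg_type, claimed = struct.unpack("!BH", message[:3])
    # Type, length field, claimed payload, and minimum padding must all fit.
    if 1 + 2 + claimed + MIN_PADDING > len(message):
        return None  # discard silently, as the RFC instructs
    return message[3:3 + claimed]


honest = read_heartbeat_payload(heartbeat_message(b"ping", 4))
lying = read_heartbeat_payload(heartbeat_message(b"ping", 0x4000))
```

A receiver that skips this comparison and echoes back the claimed number of bytes ends up reading past the payload into adjacent memory, which is the flaw in a nutshell.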

In any case, the tool’s purpose was to “learn by implementing something” as opposed to crafting more insidious exploits against Heartbleed. That’s why I didn’t bother with more handshake protocols or STARTTLS. It did give me a better understanding of OpenSSL’s internals (of which I’ll add my voice to the chorus bemoaning its readability).

Now I’m off to other projects and more writing.

RSA USA 2014, DSP-R04A Slides

Here are the slides for my presentation, DSP-R04A Is Your Browser a User Agent or a Double Agent?, at this year’s RSA USA conference in San Francisco.

This departed from a security focus into the realm of privacy, noting how browsers struggle (or not) against tracking mechanisms and how various organizations build views of web site visitors.

Fonts of Dis-Knowledge

The oracles of ancient Greece claimed to have the power of precognition, derived from the gods themselves. In the 17th century, John Locke wrote of more experiential sources for ideas, where sensation and reflection were two fountains of knowledge.

But none of these philosophical considerations are necessary to predict the effect of plugins on browser security. In the course of putting a presentation together, I’ve annotated two particular items.

Java. The Comic Sans of browser plugins. Inexplicably, some people think it’s a clever design choice. But it really conveys an uninformed decision, is ultimately useless (especially for modern browsers), and is a sign of incompetence.

Flash. Wingdings. At first it looks pretty. But its overuse quickly leads to annoyance. There’s no reason for a Flash plugin other than to look at legacy cat videos and suffer agitating ad banners.

They are nothing more than fonts of malware. When was the last time you installed a non-critical update for either one? If you haven’t disabled these plugins, do so now.

I’ll have new content soon. And, with my own knowledge of the future, here’s a peek at what those topics might be:

– Ruminations on privacy.
– Identity, passwords, personas.
– More examples of HTML injection.

And that doesn’t include putting up content for the newly released Anti-Hacker Tool Kit. Alas, my fountain of writing is a mere trickle!

The Rank Decay Contingency

The idea: Penalize a site’s ranking in search engine results if the site suffers a security breach.

Now, for some background and details…

In December 2013 Target revealed that it had suffered a significant breach that exposed over 40 million credit card numbers. A month later it upped the count to 70 million and noted the stolen information included customers’ names, mailing addresses, phone numbers, and email addresses.

Does anybody care?

Or rather, what do they care about? Sure, the media likes stories about hacking, especially ones that affect millions of their readers. The security community marks it as another failure to point to. Banks will have to reissue cards and their fraud departments will have to be more vigilant. Target will bear some costs. But will customers really avoid it to any degree?

Years ago, in 2007, a different company disclosed its discovery of a significant breach that affected at least 40 million credit cards. Check out the following graph of the stock price of the company (TJX Companies) from 2006 to the end of 2013.

TJX Price 2006-2014

Notice the dip in 2009 and the nice angle of recovery. The company’s stock didn’t take a hit until 2009 when TJX announced terms of its settlement. The price nose-dived, only to steadily recover as consumers stopped caring and spent money (amongst any number of arbitrary reasons, markets not being as rational or objective as one might wish).

Consider who bears the cost of breaches like these. Ultimately, merchants pay higher fees to accept credit cards, consumers pay higher fees to have cards. And, yes, TJX paid in lost valuation over a rather long period (roughly a year), but only when the settlement was announced — not when the breach occurred. The settlement suggests that lax security has consequences, but a breach in and of itself might not.

Truth of Consequences

But what if a company weighs the costs of a breach as more favorable than the costs of increasing security efforts? What if a company doesn’t even deal with financial information and therefore has no exposure to losses related to fraud? What about companies that deal in personal information or data, like Snapchat?

Now check out another chart. The following data from Quantcast shows daily visitors to a lyrics site. The number is steady until one day — boom! — visits drop by over 60% when the site is relegated to the backwaters of search results.

RapGenius Quantcast Measure

Google caught the site (Rap Genius) undertaking sociopathic search optimization techniques like spreading link spam. Not only does spammy, vapid content annoy users, but Google ostensibly suffers by losing users who flee poor quality results for alternate engines. (How much impact it has on advertising revenue is a different matter.) Google loses revenue if advertisers care about where the users are or they perceive the value of users to be low.

The two previous charts have different time scales and measure different dimensions. But there’s an underlying sense that they reflect values that companies care about.

Rank Decay

Think back to the Target breach. (Or TJX, or any one of many breaches reported over the years, whether they affected passwords or credit cards.)

What if a penalty affected a site’s ranking in search results? For example, it could be a threshold for the “best” page in which it could appear, e.g. no greater than the fourth page (where pages are defined as blocks of N results, say 10). Or an absolute rank, e.g. no higher than the 40th entry in a list.

The penalty would decay over time at a rate, linear or exponential, based on any number of mathematical details. For example, a page-based penalty might decay by one page per month. A list-based penalty might decay by one on a weekly basis.
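As a back-of-the-envelope model of those two examples, here is a Python sketch. Every name and default in it (one page per month, a four-week half-life) is an assumption for illustration, not a worked-out proposal.

```python
def page_penalty(initial_page: int, months_elapsed: int, pages_per_month: int = 1) -> int:
    """Linear decay: the best results page a site may appear on (1 = no penalty)."""
    return max(1, initial_page - pages_per_month * months_elapsed)


def exponential_rank_penalty(initial_rank: int, weeks_elapsed: int,
                             half_life_weeks: float = 4.0) -> int:
    """Exponential decay: the best absolute rank allowed, halving every half-life."""
    decayed = initial_rank * 0.5 ** (weeks_elapsed / half_life_weeks)
    return max(1, round(decayed))


def penalized_rank(natural_rank: int, best_allowed_rank: int) -> int:
    """The penalty only pushes a site down, never up."""
    return max(natural_rank, best_allowed_rank)


# A site that would naturally rank 3rd, held to the 40th entry after a breach.
at_breach = penalized_rank(3, exponential_rank_penalty(40, 0))
eight_weeks_later = penalized_rank(3, exponential_rank_penalty(40, 8))
```

With these numbers the site sits at rank 40 on the day of the breach and at rank 10 eight weeks later. A page-based penalty starting at page four would be gone after three months.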

The decay rate could be influenced by steps the site takes to remediate the underlying problem that led to the breach, improvements to a privacy policy, fines, or covering costs related to fraud as a result of the breach.

If a search engine drives a significant portion of traffic (traffic that results in revenue or influences valuation), then this creates an incentive for the site to maintain strong security. It’s like PCI with different teeth. It might incentivize the site to react promptly to breaches. At least one hopes.

But such a proposal could have insidious consequences.

Rank Implications

Suppose a site were able to merely buy advertising to artificially offset the rank penalty? After a breach you could have a search engine that’d love to penalize the “natural” ranking of a site only to rake in money as the site buys advertising to overcome the penalty. It’s not a smart idea to pay an executioner per head, let alone combine the role with judge and jury.

A penalty that a company fears might be one for which it suppresses the penalty’s triggers. Keeping a breach secret is a disservice to consumers. And companies subject to the S.E.C. may be required to disclose such events. But rules (and penalties) need to be clear in order to minimize legal maneuvering through loopholes.

The proposal also implies that a search engine has a near monopoly on directing traffic. Yes, I’m talking about Google. The hand waving about “search engines” is supposed to include sites like Yahoo! and Bing, even DuckDuckGo. But if you’re worried about one measure, it’s likely the Google PageRank. This is a lot of power for a company that may wish to direct traffic to its own services (like email, shopping, travel, news, etc.) in preference to competing ones.

It could also be that the Emperor wears no clothes. Google search and advertisements may not be the ultimate arbiter of traffic that turns into purchases. Strong, well-established sites may find that the traffic that drives engagement and money comes just as well from alternate sources like social media. Then again, losing any traffic source may be something no site wants to suffer.

Target is just the most recent example of breaches that will not end. Even so, Target demonstrated several positive actions before and after the breach:

– Transparency — periodic updates on breach details, remediation steps, complaint process.

– A clear privacy policy — written in accessible language (i.e. avoids a legal style that, however accurate, may be too dense, misleading, or ambiguous), including a summary of changes.

Thankfully, there were no denials, diminishing comments, or signs of incompetence on the part of Target. Breaches are inevitable for complex, distributed systems. Beyond prevention, goals should be minimizing their time to discovery and maximizing their containment.

And whether this rank idea decays from indifference or infeasibility, its sentiment should persist.