You’ve Violated APE Law!

Developers who wish to defend their code should be aware of Advanced Persistent Exploitability. It is a situation where breaking code remains possible due to broken code.

La Planète des Singes

Code has errors. Writing has errors. Consider the pervasiveness of spellcheckers and how often the red squiggle complains about a misspelling in as common an activity as composing email. Mistakes happen; they’re a natural consequence of writing, whether code, blog, email, or book. The danger here is that in code these mistakes lead to exploits.

Sometimes coding errors arise from a stubborn refusal to acknowledge fundamental principles, as seen in the Advanced Persistent Ignorance that lets SQL injection persist almost a decade after programming languages first provided countermeasures. That vuln is so old that anyone with sqlmap and a URL can trivially exploit it.
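The countermeasure in question is the parameterized query. Here's a minimal sketch in Python using the stdlib sqlite3 module (the table, names, and payload are all illustrative):

```python
import sqlite3

# In-memory database with an illustrative users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# Untrusted input carrying a classic injection payload.
name = "alice' OR '1'='1"

# Parameterized query: the driver binds the value as data, so the
# payload can never be interpreted as SQL syntax.
rows = conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()
print(rows)  # [] -- no user has that literal name

# A legitimate lookup still works as expected.
print(conn.execute("SELECT role FROM users WHERE name = ?", ("alice",)).fetchall())
```

The placeholder does all the work: the payload matches no row instead of rewriting the query's logic.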

Other coding errors are due to the lack of follow-through to address the fundamental causes of a vuln; the defender fixes the observed exploit as opposed to understanding and fixing the underlying issue. This approach fails when the attacker merely needs to tweak an exploit in order to compromise the vuln again.

We’ll use the following PHP snippet as an example. It has an obvious flaw in the arg parameter:

$arg = $_GET['arg'];
$r = exec('/bin/ls ' . $arg);

Confronted with an exploit that contains a semi-colon to execute an arbitrary command, a developer might remember to apply input validation. This is not necessarily wrong, but it is a first step on the dangerous path of the “Clever Factor”. In this case, the developer chose to narrow the parameter to only contain characters.

$arg = $_GET['arg'];
# did one better than escapeshellarg
if (preg_match('/[a-zA-Z]+/', $arg)) {
    $r = exec('/bin/ls ' . $arg);
}
As a first offense, the regex should have been anchored to match the complete input string, i.e. '/^[a-zA-Z]+$/'. That mistake alone should cast doubt on this dev’s understanding of the problem and on their claim to a clever solution. But let’s continue the exercise with three more questions:
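To see that first offense concretely, here’s a quick sketch in Python, whose re.search has the same substring-matching behavior as an unanchored preg_match (the payload is hypothetical):

```python
import re

# The flawed filter from the snippet above: an unanchored character class.
flawed = re.compile(r"[a-zA-Z]+")
# The corrected version, anchored to the complete input string.
anchored = re.compile(r"^[a-zA-Z]+$")

payload = "docs; cat /etc/passwd"  # hypothetical injection attempt

# An unanchored search is satisfied by any alpha substring ("docs"),
# so the payload sails right through the filter.
print(bool(flawed.search(payload)))    # True
print(bool(anchored.search(payload)))  # False
```

The filter never rejected anything; it only confirmed that some letters exist somewhere in the input.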

Is the intention clear? Is it resilient? Is it maintainable?

This developer declared they “did one better” than the documented solution by restricting input to mixed-case letters. One possible interpretation is that they only expected directories with mixed-case alpha names. A subsequent dev may point out the need to review directories that include numbers or a dot (.) and, as a consequence, relax the regex. That change may still be in the spirit of the validation approach (after all, it’s restricting input to expectations), but if the regex changes to where it allows a space or shell metacharacters, then it’ll be exploited. Again.

This leads to resilience against code churn. The initial code might be clear to someone who understands the regex to be an input filter (albeit an incorrect one in the first version). But the regex’s security requirements are ambiguous enough that someone else may mistakenly change it to allow metacharacters or introduce a typo that weakens it. Additionally, what kind of unit tests accompanied the original version? Merely some strings of known directories and a few negative tests with “./” and “..”? None of those tests would have demonstrated the vulnerability or conveyed the intended security aspect of the regex.

Code must be maintained over time. In the PHP example, the point of validation is right next to the point of usage. Think of this as the spatial version of the time of check to time of use flaw. In more complex code, especially long-lived code and projects with multiple committers, the validation check could easily drift further and further from the location where its argument is used. This dilutes the original developer’s intention since someone else may not realize the validation context and re-taint (such as with string concatenation with other input parameters) or otherwise misuse the parameter.

In this scenario, the solution isn’t even difficult. PHP’s documentation gives clear, prominent warnings about how to secure calls to the entire family of exec-style commands.

$r = exec('/bin/ls ' . escapeshellarg($arg));

The recommended solution has a clear intent — escape shell arguments passed to a command. It’s resilient — the PHP function will handle all shell metacharacters, not to mention the character encoding (like UTF-8). And it’s easy to maintain — whatever manipulation the $arg parameter suffers throughout the code, it will be properly secured at its point of usage.
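Other languages ship the same idea in their standard libraries; Python’s shlex.quote, for example, plays the role of escapeshellarg (the payload is again illustrative):

```python
import shlex

arg = "docs; cat /etc/passwd"  # untrusted input

# shlex.quote wraps the value so the shell sees one literal argument;
# the semicolon never acts as a command separator.
cmd = "/bin/ls " + shlex.quote(arg)
print(cmd)  # /bin/ls 'docs; cat /etc/passwd'
```

Whatever the input contains, it arrives at the command as a single quoted argument.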

It also requires less typing than the back-and-forth of multiple bug comments required to explain the pitfalls of regexes and the necessity of robust defenses. Applying a fix to stop an exploit is not the same as applying a fix to solve a vulnerability’s underlying problem.

There is a wealth of examples of this phenomenon, from blocking cross-site scripting attacks by string-matching on “alert” to renaming files to prevent repeat exploitation (oh, the obscurity!) to stopping a service only to have it restart when the system reboots.


What does the future hold for programmers? Pierre Boulle’s vacationing astronauts perhaps summarized it best in the closing chapter of La Planète des Singes:

Des hommes raisonnables ? … Non, ce n’est pas possible. (“Reasonable men? … No, it’s not possible.”)

May your interplanetary voyages lead to less strange worlds.

Battling Geologic Time

65 million years ago, dinosaurs ruled the earth. (Which also seems about the last time I wrote something new here.)

In 45 million lines of code, Windows XP dominated the desktop. Yes, it had far too many security holes, and people held onto it for far too long, even after Microsoft tried to pull support for the first time. But its longevity is still a testament to a certain measure of success.

Much of today’s web still uses code that dates from the dawn of internet time, some new code is still written by dinosaurs, and even more code is written by the avian descendants of dinosaurs. These birds flock to new languages and new frameworks. Yet, looking at some of the trivial vulns that emerge (like hard-coded passwords and SQL built from string concatenation), it seems the bird brain hasn’t evolved as much security knowledge as we might wish.

I’m a fan of dead languages. I’ve mentioned before my admiration of Latin (as well as Harry Potter Latin). And hieroglyphs have an attractive mystery to them. This appreciation doesn’t carry over to Perl. (I wish I could find the original comment that noted an obfuscated Perl contest is a redundant effort.)

But I do love regular expressions. I’ve crafted, tweaked, optimized, and obscured my fair share of regexes over the years. And I’ve discovered the performance benefits of pcre_study() and JIT compilation mode.

Yet woe betide anyone using regexes as a comprehensive parser (especially for HTML). And if you’re trying to match quoted strings, be prepared to deal with complexities that turn a few-character pattern into a monstrous composition.
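For a taste of that growth, the classic pattern for a double-quoted string with backslash escapes is already dense; a sketch in Python:

```python
import re

# "(?:[^"\\]|\\.)*" -- a double-quoted string: runs of characters that
# are neither a quote nor a backslash, or backslash-escaped characters.
quoted = re.compile(r'"(?:[^"\\]|\\.)*"')

text = r'say "hello \"world\"" and "bye"'
print(quoted.findall(text))  # ['"hello \\"world\\""', '"bye"']
```

And that still ignores multiline strings, alternate quote characters, and the catastrophic-backtracking variants of the same idea.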

Seeing modern-day humans still rely on poorly written regexes to conduct code scanning made me wonder how little mammals have advanced beyond the dinosaurs of prehistory. They might not be burning themselves with fire, but they’re burning their chances of accurate, effective scans.

That was how I discovered pfff and its companion, sgrep. At the SOURCE Seattle conference this year I spoke a little about lessons learned from regexes and the advancements possible should you desire to venture into the realm of OCaml: SOURCE Seattle 2015 – Code Scanning. Who knows, if you can conquer fire you might be able to handle stone tools.

Write It Like It’s Stolen

Source code. How would you alter the risks associated with your web site if its source code were stolen? Hard-coded passphrases? String concatenation of SQL statements? How much security relies on secrecy of functionality versus secrecy of data? Think of it in terms of Kerckhoffs’s Principle, roughly “The system must not require secrecy and can be stolen by the enemy without causing trouble.” Kerckhoffs was writing about cryptography, but the concept applies well to software (setting aside certain issues like intellectual property; here we’re just focusing on web security).

In January 2012 Reuters reported that Symantec source code had been stolen. A month later the source started appearing publicly. The compromise, initially dismissed as only affecting some five-year-old code, unleashed a slew of updates to the pcAnywhere products. Of the several notable points about this hack, one to emphasize is the speed with which vulnerabilities were identified after the initial compromise. The implication is that the hackers’ access to the source code surfaced vulnerabilities the original developers had failed to find, despite their far more convenient access to that same source.

Eyeballs and bugs make an unconditionally decent witch’s brew, whereas the rule “Given enough eyeballs, all bugs are shallow” fails as an unqualified statement. The most famous counter-example being the approximately two-year window of non-random randomness inside the Debian OpenSSL package. The rule’s important caveat being that not all eyeballs are created equal (nor are all eyeballs trained on the same bugs). Still, the transparency of open source provided not only the eventual discovery and fix, but — perhaps more important — a solid understanding of the explicit timeframe and software packages that were subpar. Such confidence, even when it applies to knowing software is broken, is a boon to security. (There are other fun reformulations of the rule, such as, “Given enough bugs, all software is exploitable.” or “Given enough shallowness, all bugs are not security concerns.” But that takes us off topic…)

In any case, write code like it’s going to be stolen. Or at least as if it’s going to be peer reviewed. Reviewed, that is, by someone who might be smarter, a better programmer, more knowledgeable about a security topic, or at the opposite end of the spectrum: fresh eyes that’ll notice a typo. Pick one of the web sites compromised by SQL injection in the last few years. Not only is SQL injection an inexcusable offense, but pointing out simple problems like SQL injection may also lead to pointing out other egregious mistakes like not salting passwords. Having your source code stolen might lead to heckling over stupid programming mistakes. Having your customers’ email addresses and passwords stolen has worse consequences.

In fact, we should lump privacy into this thought experiment as well. The explosion of mobile apps (along with their HTML5 and similar web sites) has significantly intertwined privacy and security. After all, much of privacy relates to control over the security of our own data. Malicious mobile apps aren’t always the ones trying to pop root on your phone or figuring out how to open a backdoor; they’re also the ones scraping your phone’s brain for everything it knows. It’s one thing to claim an app does or does not perform some action; it’s another to see what it’s actually doing, intended or not.

Your code doesn’t have to be stolen or open source to be secure, but your programmers need to care and need to know what to look for. Source code scanners are one way to add eyeballs, though not necessarily the easiest way. In fact, the future of code compilation (including for interpreted languages like PHP and Python) is more likely to bring source code scanning concepts into the compiler. After all, why lump on extra tools, and the all-too-often cumbersome configuration they entail, when you could get the same feedback from the compiler? Clang (and tools like cppcheck) work wonders for cleaning up problematic C++ code. They don’t generate warnings for web security concepts like XSS or SQL injection, but there’s no reason they couldn’t evolve to do so.

In fact, what would be cooler: a source code analyzer that you need to dump your web site into, then configure and tweak, or a compiler/interpreter that generates the security warnings for you? Imagine a mod_php for production and a mod_phpcheck that natively performs variable taint checking and flags function misuse for you. Not only should web sites shift more browser-based computing onto established JavaScript frameworks in order to minimize reinventing the wheel (and the associated security vulns); web languages should also move towards building security analysis into their compilation/interpretation environments. While a project like Emscripten isn’t a direct example of this, it’s a great example of bringing a notoriously loosely typed language like JavaScript into a complex analyzer like LLVM, and of the potential for better code. Imagine an LLVM “optimizer” that knew how to detect and warn about certain types of DOM-based XSS.
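To make the taint-checking idea concrete, here’s a toy sketch in Python. Every name in it (Tainted, sink_exec, escape) is illustrative, not a real API; real taint tracking lives in the compiler or interpreter, not in user code:

```python
class Tainted(str):
    """A string that remembers it came from untrusted input."""
    def __add__(self, other):
        # Concatenation with tainted data stays tainted.
        return Tainted(str.__add__(self, other))
    def __radd__(self, other):
        return Tainted(str.__add__(other, self))

def sink_exec(command):
    # A sensitive sink refuses raw tainted data.
    if isinstance(command, Tainted):
        raise ValueError("tainted data reached exec() unescaped")
    return "would run: " + command

def escape(arg):
    # Stand-in for escapeshellarg: returns a plain, untainted str.
    return "'" + arg.replace("'", "'\\''") + "'"

user_input = Tainted("docs; rm -rf /")

# Escaping clears the taint, so the sink accepts it.
safe = sink_exec("/bin/ls " + escape(user_input))
print(safe)  # would run: /bin/ls 'docs; rm -rf /'

# Taint propagates through concatenation, so the sink rejects this.
try:
    sink_exec("/bin/ls " + user_input)
except ValueError as err:
    print(err)
```

The point of the sketch is the bookkeeping, not the mechanism: untrusted data stays marked until it passes through an approved sanitizer, and the check happens at the sink rather than in scattered regexes.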

The day that your IDE draws red squiggles underneath code because it’s insecure (rather than because it’s a typo, badly typed, or has a signed/unsigned mismatch) will be a day that web security takes a positive evolutionary step. Evolution favors those best adapted to an environment. In this case, security is best served by the tools used to write and execute code. Until then, we’ll be stuck with the biotic diversity of cumbersome tools and the varyingly vigilant eyeballs that belong to developers and hackers alike.