
Battling Geologic Time

65 million years ago, dinosaurs ruled the earth. (Which also seems to be about the last time I wrote something new here.)

In 45 million lines of code, Windows XP dominated the desktop. Yes, it had far too many security holes, and people held onto it for far too long — even after Microsoft tried to pull support for the first time. But its longevity is still a testament to a certain measure of success.

Much of today’s web still uses code that dates from the dawn of internet time, some new code is still written by dinosaurs, and even more code is written by the avian descendants of dinosaurs. These birds flock to new languages and new frameworks. Yet, looking at some of the trivial vulns that emerge (like hard-coded passwords and SQL built from string concatenation), it seems the bird brain hasn’t evolved as much security knowledge as we might wish.

I’m a fan of dead languages. I’ve mentioned before my admiration of Latin (as well as Harry Potter Latin). And hieroglyphs have an attractive mystery to them. This appreciation doesn’t carry over to Perl. (I wish I could find the original comment that noted an obfuscated Perl contest is a redundant effort.)

But I do love regular expressions. I’ve crafted, tweaked, optimized, and obscured my fair share of regexes over the years. And I’ve discovered the performance benefits of pcre_study() and JIT compilation mode.

Yet woe betide anyone using regexes as a comprehensive parser (especially for HTML). And if you’re trying to match quoted strings, be prepared to deal with complexities that turn a few-character pattern into a monstrous composition.
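As a small illustration of how quickly that monstrosity sets in, here’s a sketch in Python’s `re` module: the obvious quoted-string pattern falls apart as soon as the string contains an escaped quote, and the fix is already noticeably harder to read.

```python
import re

# Naive attempt: a quote, any non-quotes, a quote.
# It breaks as soon as the string contains an escaped quote.
naive = re.compile(r'"[^"]*"')

# Handling backslash escapes: each character inside the quotes is either
# a non-quote/non-backslash character, or a backslash followed by anything.
quoted = re.compile(r'"(?:[^"\\]|\\.)*"')

sample = r'before "she said \"hi\" to me" after'

# The naive pattern stops at the first escaped quote and then matches
# a second bogus fragment; the escape-aware pattern matches the whole string.
print(naive.findall(sample))
print(quoted.findall(sample))
```

And this still ignores single quotes, multi-line strings, and language-specific escape rules — each of which grows the pattern further.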

Seeing modern day humans still rely on poorly written regexes to conduct code scanning made me wonder how little mammals have advanced beyond the dinosaurs of prehistory. They might not be burning themselves with fire, but they’re burning their chances of accurate, effective scans.

That was how I discovered pfff and its companion, sgrep. At the SOURCE Seattle conference this year I spoke a little about lessons learned from regexes and the advancements possible should you desire to venture into the realm of OCaml: SOURCE Seattle 2015 – Code Scanning. Who knows, if you can conquer fire you might be able to handle stone tools.

Write It Like It’s Stolen

Source code. How would you alter the risks associated with your web site if its source code were stolen? Hard-coded passphrases? String concatenation of SQL statements? How much security relies on secrecy of functionality versus secrecy of data? Think of it in terms of Kerckhoffs’s Principle, roughly “the system must not require secrecy, and it should be able to fall into the enemy’s hands without causing trouble.” Kerckhoffs was writing about cryptography, but the concept applies well to software (setting aside certain issues like intellectual property; we’re just focusing on web security).

In January 2012 Reuters reported that Symantec source code had been stolen. A month later the source started appearing publicly. The compromise, initially dismissed as only affecting some five-year old code, unleashed a slew of updates to the pcAnywhere products. Of the several notable points about this hack, one to emphasize was the speed with which vulnerabilities were identified after the initial compromise. The implication being that hackers’ access to source code highlighted vulnerabilities not otherwise found by the original developers’ more obvious access to source.

Eyeballs and bugs make an unconditionally decent witch’s brew, whereas the rule “Given enough eyeballs, all bugs are shallow” fails as an unqualified statement. The most famous counter-example being the approximately two-year window of non-random randomness inside the Debian OpenSSL package. The rule’s important caveat being that not all eyeballs are created equal (nor are all eyeballs trained on the same bugs). Still, the transparency of open source provided not only the eventual discovery and fix, but — perhaps more important — a solid understanding of the explicit timeframe and software packages that were subpar. Such confidence, even when it applies to knowing software is broken, is a boon to security. (There are other fun reformulations of the rule, such as, “Given enough bugs, all software is exploitable.” or “Given enough shallowness, all bugs are not security concerns.” But that takes us off topic…)

In any case, write code like it’s going to be stolen. Or at least as if it’s going to be peer reviewed. Reviewed, that is, by someone who might be smarter, a better programmer, more knowledgeable about a security topic, or at the opposite end of the spectrum: fresh eyes that’ll notice a typo. Pick one of the web sites compromised by SQL injection in the last few years. Not only is SQL injection an inexcusable offense, but pointing out simple problems like SQL injection may also lead to pointing out other egregious mistakes like not salting passwords. Having your source code stolen might lead to heckling over stupid programming mistakes. Having your customers’ email addresses and passwords stolen has worse consequences.
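To make the SQL injection point concrete, here’s a minimal sketch using Python’s built-in `sqlite3` module (the table and values are invented for illustration). The concatenated query lets attacker-controlled input rewrite the SQL; the parameterized query binds the same input as data, so it never gets parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3kr1t')")

user_input = "nobody' OR '1'='1"

# String concatenation: the attacker's quote closes the literal,
# and the injected OR clause returns every row in the table.
vulnerable = "SELECT secret FROM users WHERE name = '" + user_input + "'"
print(conn.execute(vulnerable).fetchall())

# Parameterized query: the placeholder binds the input as a value,
# so the malicious quote is just data and no rows match.
safe = "SELECT secret FROM users WHERE name = ?"
print(conn.execute(safe, (user_input,)).fetchall())
```

The fix is a one-line change, which is part of what makes the vulnerability so inexcusable in the first place.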

In fact, we should lump privacy into this thought experiment as well. The explosion of mobile apps (along with their HTML5 and similar web sites) has significantly intertwined privacy and security. After all, much of privacy relates to control over the security of our own data. Malicious mobile apps aren’t always the ones trying to pop root on your phone or figuring out how to open a backdoor; they’re also the ones scraping your phone’s brain for everything it knows. It’s one thing to claim an App does or does not perform some action; it’s another to see what it’s actually doing — intended or not.

Your code doesn’t have to be stolen or open source to be secure, but your programmers need to care and need to know what to look for. Source code scanners are one way to add eyeballs, though not necessarily the easiest way. In fact, the future of code compilation (including interpreted languages like PHP, Python, etc.) is more likely to bring source code scanning concepts into the compiler. After all, why lump on extra tools and the all-too-often cumbersome configuration they entail when you could get the same feedback from the compiler? Clang (and tools like cppcheck) work wonders for cleaning up problematic C++ code. They don’t generate warnings for web security concepts like XSS or SQL injection, but there’s no reason they couldn’t evolve to do so.

In fact, what would be cooler: A source code analyzer that you need to dump your web site into, then configure and tweak, or a compiler/interpreter that generates the security warnings for you? Imagine a mod_php for production and a mod_phpcheck that natively performs variable taint checking and function misuse detection for you. Not only should web sites shift more reliance on browser-based computing to established JavaScript frameworks in order to minimize reinventing the wheel (and associated security vulns); web languages should also move towards building security analysis into their compilation/interpretation environments. While a project like Emscripten isn’t a direct example of this, it’s a great example of bringing a notoriously loosely typed language like JavaScript into a complex analyzer like LLVM — and the potential for better code. Imagine an LLVM “optimizer” that knew how to detect and warn about certain types of DOM-based XSS.
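The taint-checking idea can be sketched in a few lines. This is a toy, not a real mod_phpcheck (all names here are hypothetical): untrusted input is wrapped in a `Tainted` string type that survives concatenation, and a SQL sink refuses any query still carrying the taint until it passes through a sanitizer. A real interpreter would do this tracking invisibly, at runtime or compile time.

```python
class Tainted(str):
    """A string that remembers it came from an untrusted source.
    Concatenation with ordinary strings propagates the taint."""
    def __add__(self, other):
        return Tainted(str.__add__(self, other))
    def __radd__(self, other):
        return Tainted(other + str(self))

def sanitize(value):
    # Stand-in for real escaping/validation; returns a plain (untainted) str.
    return str(value.replace("'", "''"))

def run_query(sql):
    # The sink: a taint-aware interpreter would flag this automatically.
    if isinstance(sql, Tainted):
        return "WARNING: tainted data reached a SQL sink"
    return "executed"

user = Tainted("alice' OR '1'='1")

# Taint survives string building, so the unsanitized query is caught...
query = "SELECT * FROM users WHERE name = '" + user + "'"
print(run_query(query))

# ...while the sanitized version passes.
print(run_query("SELECT * FROM users WHERE name = '" + sanitize(user) + "'"))
```

Perl’s taint mode and tools like Pyre/Psalm do variations of this today; the point is that the feedback belongs in the language toolchain, not bolted on afterward.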

The day that your IDE draws red squiggles underneath code because it’s insecure (rather than a typo, badly-typed, has a signed/unsigned mismatch, etc.) will be a day that web security takes a positive evolutionary step. Evolution favors those best adapted to an environment. In this case, security is best served by the tools used to write and execute code. Until then, we’ll be stuck with the biotic diversity of cumbersome tools and the varyingly-vigilant eyeballs that belong to developers and hackers alike.