Battling Geologic Time

65 million years ago, dinosaurs ruled the earth. (Which also seems to be about the last time I wrote something new here.)

With 45 million lines of code, Windows XP dominated the desktop. Yes, it had far too many security holes, and people held onto it for far too long, even after Microsoft first tried to pull support. But its longevity is still a testament to a certain measure of success.

Much of today’s web still runs on code that dates from the dawn of internet time. Some new code is still written by dinosaurs, and even more is written by the avian descendants of dinosaurs. These birds flock to new languages and new frameworks. Yet, looking at some of the trivial vulns that emerge (like hard-coded passwords and SQL built from string concatenation), it seems the bird brain hasn’t evolved as much security knowledge as we might wish.

I’m a fan of dead languages. I’ve mentioned before my admiration of Latin (as well as Harry Potter Latin). And hieroglyphs have an attractive mystery to them. This appreciation doesn’t carry over to Perl. (I wish I could find the original comment that noted an obfuscated Perl contest is a redundant effort.)

But I do love regular expressions. I’ve crafted, tweaked, optimized, and obscured my fair share of regexes over the years. And I’ve discovered the performance benefits of pcre_study() and JIT compilation mode.

Yet woe betide anyone using regexes as a comprehensive parser (especially for HTML). And if you’re trying to match quoted strings, be prepared to deal with complexities that turn a pattern of a few characters into a monstrous composition.
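
To make the quoted-string problem concrete, here is a minimal sketch in Python. The two patterns and the sample line are my own illustrations rather than anything from a real scanner, and the "better" pattern only copes with backslash escapes inside double quotes. Single quotes, triple quotes, raw strings, and language-specific escapes would each grow it further.

```python
import re

# Naive attempt: a quote, anything that is not a quote, then a quote.
NAIVE = re.compile(r'"[^"]*"')

# Escape-aware attempt: a quote, then any mix of ordinary characters
# (neither quote nor backslash) and backslash-escaped characters,
# then the closing quote.
ESCAPED = re.compile(r'"(?:[^"\\]|\\.)*"')

# Hypothetical line of source code containing escaped quotes.
sample = r'say("He said \"hi\" to me")'

naive_match = NAIVE.search(sample).group(0)
escaped_match = ESCAPED.search(sample).group(0)

print(naive_match)    # stops at the first escaped quote
print(escaped_match)  # spans the whole string literal
```

The naive pattern stops at the first escaped quote, so everything downstream of it is working with a truncated literal. And that is the easy case.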

Seeing modern-day humans still rely on poorly written regexes to conduct code scanning made me wonder how little mammals have advanced beyond the dinosaurs of prehistory. They might not be burning themselves with fire, but they’re burning their chances of accurate, effective scans.
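
Here is a rough sketch of how that burn happens, assuming a scanner that hunts for SQL built from string concatenation. Everything in it is hypothetical: the `PATTERN` rule, the `SOURCE` snippet, and the `regex_scan` and `ast_scan` helpers are made up for illustration. The syntax-aware half uses Python's standard `ast` module purely as a stand-in for the general idea of matching on parsed code, not as a depiction of how any particular tool works.

```python
import ast
import re

# Hypothetical code under review. Line 1 is a comment, lines 2-5 are a
# real concatenated query, and line 6 is a safe parameterized query.
SOURCE = '''\
# cursor.execute("SELECT * FROM t WHERE id = " + old_id)
cursor.execute(
    "SELECT * FROM t WHERE id = "
    + user_id
)
cursor.execute("SELECT * FROM t WHERE id = %s", (user_id,))
'''

# A typical line-oriented rule: execute( "..." +
PATTERN = re.compile(r'execute\(\s*".*"\s*\+')


def regex_scan(source):
    """Return 1-based line numbers where the regex rule fires."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if PATTERN.search(line)]


def ast_scan(source):
    """Return line numbers of .execute() calls whose first argument
    is an expression joined with the + operator."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args
                and isinstance(node.args[0], ast.BinOp)
                and isinstance(node.args[0].op, ast.Add)):
            hits.append(node.lineno)
    return sorted(hits)


print(regex_scan(SOURCE))  # flags only the commented-out line
print(ast_scan(SOURCE))    # flags only the real multi-line call
```

The regex flags the dead comment and misses the live bug because of a line break. The parsed version ignores comments and whitespace for free, though it would still miss a query assembled in a variable before the call, which takes data flow analysis to catch.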

That was how I discovered pfff and its companion, sgrep. At the SOURCE Seattle conference this year I spoke a little about lessons learned from regexes and the advancements possible should you desire to venture into the realm of OCaml: SOURCE Seattle 2015 – Code Scanning. Who knows, if you can conquer fire you might be able to handle stone tools.