Twist Two [SQL Injection]

Twist #2 — The time saved by not using parameterized queries to build SQL statements should be used to read about using parameterized queries.

Nothing much to add here that I haven’t already exhausted. Instead, revisit some web hacking history with one of the first SQL injection attacks from 1999, created by Rain Forest Puppy. The following snippet of code sums up the attack nicely (default files, no validation, shell execution):

$query="select * from MSysModules where name='|shell(\"$command\")|'";

There were several lessons to be learned from this hack:

  • Remove example content (e.g. databases — if you can call an MS Access file a database…) from systems. Note the omission of “production” in front of systems. Why do you need example content on QA systems? Most of the time you don’t even need it on dev systems.
  • Prohibit direct access to a datastore. (This goes double for the NoSQL crews for whom security relies on “run it on an isolated, trusted network.”) You’d think this is stating the obvious, but reality proves otherwise…
  • Remove or require strong authentication for administration capabilities. The newdsn.exe file was open to all. There have been hundreds of similar examples since then.

Even if SQL injection finally dies, it’s likely to be replaced by JavaScript injection against NoSQL data stores. Instead of OR 1=1 we’ll start creating lists of payloads like (while(1){}). There’s a big difference between throwing an access control mechanism on top of a datastore and applying security principles to the queries going into that datastore.

So You Want to Hash a Password…

Congratulations. You’re thinking about protecting a password; a concept that well-known1 sites, to this day2, fail3 to comprehend.

Choose an established, vetted algorithm (SHA-256 would suffice), include a salt (we’ll explain this a bit later), hash the password. Get rid of the plaintext password. Done. See how simple that was? There’s even Open Source code4 to help you with more complex issues.

But once you’ve set foot on the path of hashing passwords you might be tempted to make the hash Even Better. An apparently common idea is that if you hash a password once, hashing it twice makes it more secure. Being “more secure” is a commendable goal, but beware the wild beasts of cryptography, for they are subtle and quick to…well, you should be able to finish that thought.5

Repeating an encryption or hashing algorithm isn’t a bad idea; it’s just not a fully thought-through idea. First, we need a paragraph or three to catch everyone up on hashing and brute force attacks:

A cryptographic hash function takes an arbitrary-length input and produces a fixed-length output that has no statistical relation to the input. Consequently, a password like friend becomes an unintelligible string like 97823jnsndf234.6 An important property of a cryptographic hash function is that it’s irreversible (information is lost, similar to a lossy compression algorithm). No algorithm exists to turn the output 97823jnsndf234 back into friend. By contrast, an encryption function turns friend into mellon, and if you know the encryption scheme and the key, then you know how to turn mellon back into friend. AES is an example of an encryption function.
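As a minimal sketch of the fixed-length property, here is SHA-256 from Python’s standard hashlib module. The helper name sha256_hex is made up for this illustration, and the choice of Python is an assumption rather than anything the original article used.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different lengths produce outputs of the same length,
# and a one-letter change produces an unrelated-looking digest.
for word in ("friend", "friena", "a much longer passphrase than friend"):
    print(word, "->", sha256_hex(word))
```

There is deliberately no matching "unhash" function: the only way back to the input is to guess it and compare digests.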

Cracking a hashed password requires effort on the part of the attacker. This effort, or work factor, represents the time to execute a single hash function multiplied by the expected number of guesses required to find the correct input to the hash function. For example, an attacker might try all six-letter strings such as friena, frienb, frienc until finally hashing the guess of friend and observing that the output matches the reference hash, 97823jnsndf234.

Trying all six-letter lowercase combinations of the English alphabet requires 308,915,776 guesses (26 characters to the 6th power). This is actually a relatively small number in the age of multi-core behemoths and GPU trickery. If a single hash function takes 1 microsecond to execute on a particular system, then the complete brute force will take about 5 minutes. If you pass each input through the hash function N times, then you increase the work factor by N. With N = 100 the six-character attack would take close to 9 hours. The attacker is going to get the password eventually, but now it will take N times longer.
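The arithmetic can be checked with a few lines of Python. This is only a sketch under the paragraph’s own assumptions (1 microsecond per hash, every candidate tried); the constant and function names are invented for the example, and an attacker would on average need about half of this worst-case time.

```python
ALPHABET_SIZE = 26        # lowercase English letters
PASSWORD_LENGTH = 6
SECONDS_PER_HASH = 1e-6   # assumption from the text: 1 microsecond per hash

def brute_force_seconds(rounds: int = 1) -> float:
    """Worst-case time to try every candidate, hashing each one `rounds` times."""
    guesses = ALPHABET_SIZE ** PASSWORD_LENGTH
    return guesses * rounds * SECONDS_PER_HASH

print(ALPHABET_SIZE ** PASSWORD_LENGTH)   # number of candidates
print(brute_force_seconds() / 60)         # minutes with a single round
print(brute_force_seconds(100) / 3600)    # hours with N = 100 rounds
```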

Notice that the previous calculation only cared about the time required to execute a single hash function. From this perspective it doesn’t matter if the hash algorithm produces a 128 bit or 512 bit output. It only matters how long it takes to obtain the output. (We’re only talking about hashing and repeated hashing here; bit lengths and algorithm selection still have important security implications for other reasons and against other attacks.)

Here is a simplified explanation of how a repeated hash function fails to universally improve the work factor to brute force a value. The input plaintexts are marked P (with Greek letter subscripts). This brief example uses 10 iterations of a lossy hash function. The output of each intermediate hash is marked H with a numeric subscript. The final hash iteration is marked C with a subscript corresponding to the original plaintext.

The following line shows how the final value for an input plaintext is achieved:

Pα -> H1 -> H2 -> H3 -> H4 -> H5 -> H6 -> H7 -> H8 -> H9 -> Cα

A different input should produce a different final value:

Pβ -> H43 -> H44 -> H45 -> H46 -> H47 -> H48 -> H49 -> Cβ

A problem occurs when the original plaintext has a collision with one of the intermediate or final hash values. For example, what if Pγ and H7 have the same result when passed into the hash function? You have an overlapping sequence from the end of Pα’s chain:

Pγ -> H8 -> H9 -> H10 -> H11 -> H12 -> H13 -> H14 -> H15 -> H16 -> Cγ

A more pathological case happens when a sequence overlaps significantly:

Pδ -> H44 -> H45 -> H46 -> H47 -> H48 -> H49 -> H50 -> Cδ

The way an attacker would exploit these artifacts is by creating chain reference tables, much like a rainbow table. Yet in this case, the chain reference is used to skip rounds. For example, given an input plaintext Px, if the first hash round is H13 and the table has a precomputed chain with an H13 in it, then the attacker can fast-forward 10 steps (or however many steps have been precomputed) to get Cx.
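The fast-forward idea can be illustrated with a toy in Python. Everything here is an assumption made for demonstration: the "hash" is SHA-256 truncated to 2 bytes so that every possible intermediate value fits in a table, and the names toy_hash, repeated_hash, FAST_FORWARD, and shortcut_hash are invented. Against a full-size hash a table could cover only a tiny fraction of intermediate values, so treat this as a picture of the mechanism, not a measurement of its practicality.

```python
import hashlib

ROUNDS = 10

def toy_hash(data: bytes) -> bytes:
    """Deliberately weak hash: the first 2 bytes of SHA-256 (65,536 possible outputs)."""
    return hashlib.sha256(data).digest()[:2]

def repeated_hash(data: bytes, rounds: int = ROUNDS) -> bytes:
    """Naive repeated hashing: feed each output back in as the next input."""
    for _ in range(rounds):
        data = toy_hash(data)
    return data

# Precompute where every possible intermediate value ends up after the
# remaining ROUNDS - 1 rounds. This is the "chain reference table".
FAST_FORWARD = {
    value.to_bytes(2, "big"): repeated_hash(value.to_bytes(2, "big"), ROUNDS - 1)
    for value in range(2 ** 16)
}

def shortcut_hash(data: bytes) -> bytes:
    """One real hash, then a table lookup in place of the other nine rounds."""
    return FAST_FORWARD[toy_hash(data)]

for guess in (b"friend", b"iloveyou", b"KAR120C"):
    print(guess, shortcut_hash(guess) == repeated_hash(guess))
```

The lookup works because nothing in the naive chain depends on the original password after the first round; each round sees only the previous output.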

This case for Time-Memory Trade-Off (TMTO) attack chains didn’t present any math or probability calculations to back up its assertions. If you’re about to dismiss this attack based on the lack of hard evidence (in this article), consider something else about repeated rounds: they do not introduce additional entropy. Consequently, each round might actually weaken the entropy of the initial input despite the increased work factor due to additional rounds. In a worst case scenario, this dilution of entropy might lead to collisions that make a brute force search even easier.

Repeated hashing does not increase entropy (the “difficulty” of the initial password), it only increases the work factor. Repeated hashing of iloveyou doesn’t make the password any harder to guess, just longer to get there.

Think of it in terms of the attacker’s dictionary. The attacker has a pre-defined list of common passwords, from iloveyou to KAR120C. Neither the hashing algorithm nor the number of repetitions has any impact on this dictionary. Those only affect the amount of time required for the attacker to cycle through the dictionary.

The common theme for using cryptosystems is to first look for implementations that conform to a standard7 rather than creating something you think is new, novel, and unique.

In the case of repeated encryption, you should turn to RFC 2898 for the Password-Based Key Derivation Function 2 (PBKDF2).8 PBKDF2 inserts an iteration counter to prevent “chain” attacks. In other words, the attacker must perform every encryption stage. At a minimum, the iteration prevents the attacker from shortcutting rounds using a TMTO trick.
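Python’s standard library exposes PBKDF2 as hashlib.pbkdf2_hmac, so a sketch of hashing and verifying a password might look like the following. The iteration count, the 16-byte salt, the choice of HMAC-SHA256, and the function names are all assumptions for illustration, not recommendations taken from the article or the RFC.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only (an assumption)

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte value from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the derivation and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), expected)

salt = os.urandom(16)                      # store the salt next to the hash
stored = hash_password("me+galadriel", salt)
print(verify_password("me+galadriel", salt, stored))  # True
print(verify_password("iloveyou", salt, stored))      # False
```

Because the password keys the HMAC in every round, an intermediate value cannot be advanced without the password itself, unlike the naive chain of bare hashes.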

Where repeated rounds increase the attacker’s work factor, salts defeat other precomputation (a.k.a. rainbow table) attacks. A salt is merely a number of bytes (like a string, though it need not be) prefixed or suffixed to a password.

Salting passwords affects the composition of the attacker’s dictionary. Rather than trying the password me+galadriel the attacker must include a salt, which makes it somethinglongbefore-me+galadriel. Salts don’t make the dictionary bigger, they make the dictionary specific to the salt. The idea here is that all of the effort put forth to crack a password with a particular salt cannot be reused to crack the same password with a different salt — the brute force must begin anew. The hash for somethinglongbefore-me+galadriel is completely different from anotherstringinfrontof-me+galadriel. This is the primary way to prevent another TMTO attack, usually referred to as a rainbow table.
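A small Python sketch shows why precomputed work is tied to one salt. It uses a single SHA-256 pass and a three-word dictionary purely to keep the example short; salted_hash, precompute, and DICTIONARY are invented names.

```python
import hashlib

DICTIONARY = ["iloveyou", "ncc1701", "me+galadriel"]

def salted_hash(salt: str, password: str) -> str:
    """Hash the salt prefixed to the password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def precompute(salt: str) -> dict:
    """The attacker's lookup table: hash -> password, valid for one salt only."""
    return {salted_hash(salt, word): word for word in DICTIONARY}

table = precompute("somethinglongbefore-")

same_salt = salted_hash("somethinglongbefore-", "me+galadriel")
other_salt = salted_hash("anotherstringinfrontof-", "me+galadriel")

print(table.get(same_salt))   # me+galadriel
print(table.get(other_salt))  # None: the table must be rebuilt for this salt
```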

If you want a recommendation on the length of a salt, 19 is a nice, mystical number.9

Every measure you take to encrypt and obfuscate the password reduces the risk should the web site’s password store be stolen. (There’s quite a bit of precedent for such things.)

However, everything you do to protect the password in the database (or wherever it is stored) has no bearing on a multitude of other attack vectors, including the database itself.

Imagine a SQL injection attack that sets every user’s password to the hash of a password known to the attacker. What would you rather do? Download the entire DB over a period of several minutes or change every account to a password you know? These approaches have different goals: obtaining the original passwords pays off because they are likely re-used across email, banking, and other sites, whereas setting a known password gives immediate access to the site at the expense of blatant activity more likely to be noticed.

Imagine a scenario where the attacker is able to modify the login page so cleartext passwords are stored to a file or shuffled off to another web site.

The focus on encrypting the password and preserving its confidentiality is laudable. However, too much focus takes away from the more immediate threat of brute forcing the login form itself. The work factor to crack a short password like ncc1701 might be measured in days or weeks depending on the method of encryption. On the other hand, the attacker may have a list of the site’s users (or have a reliable way of generating likely user names). In this scenario, the attacker targets the login page with a static password (ncc1701) and cycles through the user list.

Once again, there’s precedent of success for this approach such as against our high-profile friend Twitter. In 2009 a hacker cracked the long (more than the mystical “8 character minimum”), but unsophisticated password happiness for an account that had permissions to reset passwords for any other account.10

Clearly, it didn’t matter how well happiness had been kept secret, encrypted, obfuscated, and otherwise concealed. There were no limits on how many times the login page could be requested for brute forcing the account. Furthermore, the password protection for every other account was moot since the hacker now had access to an admin account from which he could take over any other. The only apparent good news in this scenario is that, while several accounts were compromised, the original passwords to those accounts were not. This is possibly negligible consolation, but important nonetheless considering the prevalence of password re-use across web sites.
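Limiting attempts does not need to be elaborate. The following is a minimal in-memory sketch in Python, assuming a single process and a caller that supplies some key to count against (a client address, an account name, or both); the class name AttemptLimiter and its default limits are hypothetical. Counting per source matters for the static-password attack described earlier, since that attack makes only one guess per account.

```python
import time
from collections import defaultdict, deque

class AttemptLimiter:
    """Allow at most `max_attempts` per `window_seconds` for each key."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps of recent attempts

    def allow(self, key: str, now: float = None) -> bool:
        """Record an attempt for key and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True

limiter = AttemptLimiter(max_attempts=3, window_seconds=60.0)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
```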

By all means, put some effort into hashing passwords using well-established techniques. You’ll be adding to the work factor of anyone trying to crack the passwords should the password store ever be extracted from the site.

On the other hand, you may be increasing your own work factor with over-engineered solutions for password protection at the expense of other protections — like preventing SQL injection or rate limiting authentication points.

Here’s an additional note I made in the comments, but should highlight in the article:

For comparison, WPA2 uses PBKDF2 with the SSID of the network as a salt, a 256-bit key, HMAC-SHA1 for the algorithm, and 4096 iterations.
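Those parameters map directly onto a single call in Python’s hashlib. This is a sketch of the derivation as described above, not a full WPA2 implementation; the function name wpa2_psk and the sample passphrase and SSIDs are made up.

```python
import hashlib

def wpa2_psk(passphrase: str, ssid: str) -> bytes:
    """PBKDF2 with HMAC-SHA1, the SSID as salt, 4096 iterations, 256-bit output."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),
        4096,
        dklen=32,  # 32 bytes = 256 bits
    )

# The same passphrase on two differently named networks yields different keys.
print(wpa2_psk("correct horse battery", "HomeNetwork").hex())
print(wpa2_psk("correct horse battery", "CoffeeShop").hex())
```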

If you’re trying to figure out “what’s best” for hashing a password, consider WPA2 as the reference metric. For example, your hashing should generate a work factor of N times the work factor for WPA2 where N is your degree of paranoia that WPA2 is easily broken.

If you chose a double-digit N “just because”, then why would you ever use a wireless network (phone or Wi-Fi, GSM A5/3 or WPA2, etc.)? It’s much more likely someone will be able to sniff your encrypted traffic than they’ll ever get your hashed passwords. In fact, GSM’s A5/X algorithms have reported attacks. Seems like another reason to layer encryption, such as always using HTTPS.

For another comparison, OS X FileVault apparently uses PBKDF2 with 1000 iterations. (Although it’d be nice to have a more detailed reference.)


5 If you’re not well-read in crypto you should at least be well read in fiction. After all, the security of a home-grown cryptosystem is closer to fiction.
6 It’s not necessary to think of any specific hash function at this point, but if you want a more concrete example, “friend” is hashed to the hexadecimal string “8e8b4d64f704c7a6aa632a7e6c2024e4f9fed79caac319e6bb7754db587e6f58” using the SHA-256 algorithm.
9 Read the Dark Tower series by Stephen King, books V and VII in particular. The series also has one of the greatest first lines in a book, “The man in black fled across the desert, and the gunslinger followed.”

Advanced Persistent Ignorance

The biggest threat to modern web applications is developers who exhibit Advanced Persistent Ignorance. Developers rely on all sorts of APIs to build complex software. This one makes code insecure by default. API is the willful disregard of simple, established security designs.

First, we must step back into history to establish a departure point for ignorance. This is just one of many. Almost seven years ago on July 13, 2004 PHP 5.0.0 was officially released. Importantly, it included this note:

A new MySQL extension named MySQLi for developers using MySQL 4.1 and
later. This new extension includes an object-oriented interface in addition
to a traditional interface; as well as support for many of MySQL’s new
features, such as prepared statements.

Of course, any new feature can be expected to have bugs and implementation issues. Even with an assumption that serious bugs would take a year to be worked out, that means PHP has had a secure database query mechanism for the past six years.1

The first OWASP Top 10 list from 2004 mentioned prepared statements as a countermeasure.2 Along with PHP and MySQL, .NET and Java supported these, as did Perl (before its popularity was subsumed by buzzword-building Python and Ruby On Rails). In fact, PHP and MySQL trailed other languages and databases in their support for prepared statements.

SQL injection itself predates the first OWASP Top 10 list by several years. One of the first summations of the general class of injection attacks was the 1999 Phrack article, Perl CGI problems.3 SQL injection was simply a specialization of these problems to database queries.

So, we’ve established the age of injection attacks at over a dozen years old and reliable countermeasures at least six years old. These are geologic timescales for the Internet.4

There’s no excuse for SQL injection vulnerabilities to exist in 2011.

It’s not a forgivable coding mistake anymore. Coding mistakes most often imply implementation errors, meaning bugs due to typos, forgetfulness, or syntax. Modern SQL injection vulns are a sign of bad design. For six years, prepared statements have offered a means of establishing a fundamentally secure design for database queries. It takes actual effort to make them insecure. SQL injection attacks could still happen against a prepared statement, but only due to egregiously poor code that shouldn’t pass a basic review. (Yes, yes, stored procedures can be broken, too. String concatenation happens all over the place. Nevertheless, writing an insecure stored procedure or prepared statement should be more difficult than writing an insecure raw SQL statement.)
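The difference in design is easy to see side by side. This sketch uses Python’s built-in sqlite3 module with an in-memory database rather than PHP and MySQLi, purely so it runs without any setup; the table, data, and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

def find_user_unsafe(name: str) -> list:
    # String concatenation: the input becomes part of the SQL grammar.
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user(name: str) -> list:
    # Placeholder: the input is bound as data and is never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()

payload = "nobody' OR '1'='1"
print(find_user_unsafe(payload))  # [('alice@example.com',)]
print(find_user(payload))         # []
```

The safe version is no longer than the unsafe one, which is the point: the secure design costs nothing extra to write.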

Maybe one of the two billion PHP hobby projects on Sourceforge could be expected to still have these vulns, but not real web sites. And, please, never in sites for security firms. Let’s review the previous few months:

November 2010, military web site.

December 2010, open source code repository web site.

February 2011, HBGary Federal. Sauron’s inept little brother. You might have heard about this one.

February 2011, Dating web site.

March 2011, Umm…speechless. Let’s move on.

April 2011, web security firm Barracuda.

Looking back on the list, you might first notice that The Register is the of SQL injection vulns. (That is, in addition to a fine repository of typos and pun-based innuendos. I guess they’re just journalists after all, hackers don’t bother with such subtleties.)

The list will expand throughout 2011.

XSS is a little more forgivable, though no less embarrassing. HTML injection flaws continue to plague sites because of implementation bugs. There’s no equivalent of the prepared statement for building HTML or HTML snippets. This is why the vuln remains so pervasive: No one has figured out the secure, reliable, and fast way to build HTML with user-supplied data. This doesn’t imply that attempting to do so is a hopeless cause. On the contrary, JavaScript libraries can reduce these problems significantly.
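For the simplest case, plain text placed inside an HTML element, output encoding goes a long way. The sketch below uses html.escape from Python’s standard library, with render_comment as an invented helper. It covers only that one context: attribute values, inline scripts, and URLs each need their own handling, which is exactly why there is no single equivalent of the prepared statement.

```python
import html

def render_comment(user_text: str) -> str:
    """Place user-supplied text inside a paragraph element, encoded as data."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"

print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```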

For all the articles, lists, and books published on SQL injection one must assume that developers are being persistently ignorant of security concepts to such a degree that five years from now we may hear yet again of a database hack that disclosed unencrypted passwords.

If you’re going to use performance as an excuse for avoiding prepared statements, then either you haven’t bothered to measure the impact or you haven’t understood how to scale web architectures, and you might as well turn off HTTPS for the login page so you can get more users logging in per second. If you have other excuses for avoiding database security, ask yourself if it takes longer to write a ranting rebuttal or a wrapper for secure database queries.

There may in fact be hope for the future. The rush to scalability and the pious invocation of “Cloud” has created a new beast of NoSQL data stores. These NoSQL databases typically just have key-value stores with grammars that aren’t so easily corrupted by a stray apostrophe or semi-colon in the way that traditional SQL can be corrupted. Who knows, maybe security conferences will finally do away with presentations on yet another SQL injection exploit and find someone with a novel, new NoSQL Injection vulnerability.

Advanced Persistent Ignorance isn’t limited to SQL injection vulnerabilities. It has just spectacularly manifested itself in them. There are many unsolved problems in information security, but there are also many mostly-solved problems. Big unsolved problems in web security are password resets (overwhelmingly relying on e-mail) and using static credit card numbers to purchase items.

SQL injection countermeasures are an example of a mostly-solved problem. Using prepared statements isn’t 100% secure, but it makes a significant improvement. User authentication and password storage are another area of web security rife with errors. Adopting a solution like OpenID can reduce the burden of security around authentication. As with all things crypto-related, using well-maintained libraries and system calls is far superior to writing your own hash function or encryption scheme.

Excuses that prioritize security last in a web site design miss the point that not all security has to be hard. Nor does it have to impede usability or speed of development. Crypto and JavaScript libraries provide high-quality code primitives to build sites. Simple education about current development practices goes just as far. Sometimes the state of the art is actually several years old — because it’s been proven to work.

The antidote to API is the continuous acquisition of knowledge and experience. Yes, you can have your cake and eat it, too.


1 MySQL introduced support for prepared statements in version 4.1, which was first released April 3, 2003.
4 Perhaps a dangerous metaphor since here in the U.S. we still have school boards and prominent politicians for whom a complete geologic time spans a meager 4,000 years. Maybe some developers enjoy using ludicrous design patterns.