Choose an established, vetted algorithm (SHA-256 would suffice), include a salt (we’ll explain this a bit later), hash the password. Get rid of the plaintext password. Done. See how simple that was? There’s even Open Source code4 to help you with more complex issues.
But once you’ve set foot on the path of hashing passwords you might be tempted to make the hash Even Better. An apparently common idea is that if you hash a password once, hashing it twice makes it more secure. Being “more secure” is a commendable goal, but beware the wild beasts of cryptography, for they are subtle and quick to…well, you should be able to finish that thought.5
Repeating encryption or hashing algorithm isn’t a bad idea, it’s just not fully thought-through idea. First, we need a paragraph or three to catch everyone up on hashing and brute force attacks:
A cryptographic hash function takes an arbitrary-length input and produces a fixed-length output that has no statistical relation to the input. Consequently, a password like friend becomes an unintelligible string like 97823jnsndf234.6 An important property of a cryptographic hash function is that it’s irreversible (information is lost, similar to a lossy compression algorithm). No algorithm exists to turn the output 97823jnsndf234 back into friend. Alternately, an encryption function turns friend into mellon and if you know the encryption scheme and a key, then you know how to turn mellon back into friend. AES is an example of an encryption function.
Cracking a hashed password requires effort on the part of the attacker. This effort, or work factor, represents the time to execute a single hash function multiplied by the expected number of guesses required to find the correct input to the hash function. For example, an attacker might try all six-letter strings such as friena, frienb, frienc until finally hashing the guess of friend and observing that the output matches the reference hash, 97823jnsndf234.
Trying all six-letter lowercase combinations of the English alphabet requires 308,915,776 guesses (26 characters to the 6th power). This is actually a relatively small number in the age of multi-core behemoths and GPU trickery. If a single hash function takes 1 microsecond to execute on a particular system, then the complete brute force will take about 5 minutes. If you pass each input through the hash function N times, then you increase the work factor by N. With N = 100 the six-character attack would take close to 9 hours. The attacker is going to get the password eventually, but now it will take N times longer.
Notice that the previous equation only cared about the time required to execute a single hash function. From this perspective it doesn’t matter if the hash algorithm produces a 128 bit or 512 bit output. It only matters how long it takes to obtain the output. (We’re only talking about hashing and repeated hashing here; bit lengths and algorithm selection still have important security implications for other reasons and against other attacks.)
Here is a simplified explanation of how a repeated hash function fails to universally improve the work factor to brute force a value. The input plaintexts are marked P (with Greek letter subscripts). This brief examples uses 10 iterations of a lossy hash function. The output of each intermediate hash is marked H with a numeric subscript. The final hash iteration is marked C with a subscript corresponding to the original plaintext.
The following line shows how the final value for an input plaintext is achieved:
Pα -> H1 -> H2 -> H3 -> H4 -> H5 -> H6 -> H7 -> H8 -> H9 -> Cα
A different input should produce a different final value:
Pβ -> H43 -> H44 -> H45 -> H46 -> H47 -> H48 -> H49 -> Cβ
A problem occurs when the original plaintext has a collision with one of the intermediate or final hash values. For example, what if Pγ and H7 have the same result when passed into the hash function? You have an overlapping sequence from the end of Pα’s chain:
Pγ -> H8 -> H9 -> H10 -> H11 -> H12 -> H13 -> H14 -> H15 -> H16 -> Cγ
A more pathological case happens when a sequence overlaps significantly:
Pδ -> H44 -> H45 -> H46 -> H47 -> H48 -> H49 -> H50 -> Cδ
The way an attacker would exploit these artifacts is by creating some chain reference tables, much like a rainbow table. Yet in this case, the chain reference is used to skip rounds. For example, given an input plaintext Px, if the first hash round is H13 and the table has a precomputed chain with an H13 in it, then the attacker can fast-forward 10 steps (or however many steps have been precomputed) to get the Cx.
This case for this Time-Memory-Trade-Off (TMTO) attack chains didn’t present any math or probability calculations to back up these assertions. If you’re about to dismisse this attack based on the lack of hard evidence (in this article), consider something else about repeated rounds: they do not introduce additional entropy. Consequently, each round might actually weaken the entropy of the initial input despite the increased work factor due to additional rounds. In a worst case scenario, this dilution of entropy might lead to collisions that make a brute force search even easier.
Repeated hashing does not increase entropy (the “difficulty” of the initial password), it only increases the work factor. Repeated hashing of iloveyou doesn’t make the password any harder to guess, just longer to get there.
Think of it in terms of the attacker’s dictionary. The attacker has a pre-defined list of common passwords, from iloveyou to KAR120C. Neither the hashing algorithm nor the number of repetitions has any impact on this dictionary. Those only affect the amount of time required for the attacker to cycle through the dictionary.
The common theme for using cryptosystems is to first look for implementations that conform to a standard7 rather than creating something you think is new, novel, and unique.
In the case of repeated encryption, you should turn to RFC 2898 for the Password-Based Key Derivation Function 2 (PBKDF2).8 PBKDF2 inserts an iteration counter to prevent “chain” attacks. In other words, the attacker must perform every encryption stage. At a minimum, the iteration prevents the attacker from shortcutting rounds using a TMTO trick.
Where repeated rounds increase the attacker’s work factor, salts defeat other precomputation (a.k.a Rainbow table) attacks. A salt is merely a number of bytes (like a string, though it need not be) prefixed or suffixed to a password.
Salting passwords affects the composition of the attacker’s dictionary. Rather than trying the password me+galadriel the attacker must include a salt, which makes it somethinglongbefore-me+galadriel. Salts don’t make the dictionary bigger, they make the dictionary specific to the salt. The idea here is that all of the effort put forth to crack a password with a particular salt cannot be reused to crack the same password with a different salt — the brute force must begin anew. The hash for somethinglongbefore-me+galadriel is completely different from anotherstringinfrontof-me+galadriel. This is the primary way to prevent another TMTO attack, usually referred to as a rainbow table.
If you want a recommendation on the length of a salt, 19 is a nice, mystical number.9
Every measure you take to encrypt and obfuscate the password reduces the risk should the web site’s password store be stolen. (There’s quite a bit of precedent for such things.)
However, everything you do to protect the password in the database (or wherever it is stored) has no bearing on a multitude of other attack vectors, including the database itself.
Imagine a SQL injection attack that sets every user’s password to the hash of a password known to the attacker. What would you rather do? Download the entire DB over a period of several minutes or change every account to a password you know? These approaches have different goals: obtaining original passwords are likely re-used across email, banking, and other sites whereas setting a known password gives immediate access to the site at the expense of blatant activity more likely to be noticed.
Imagine a scenario where the attacker is able to modify the login page so cleartext passwords are stored to a file or shuffled off to another web site.
The focus on encrypting the password and preserving its confidentiality is laudable. However, too much focus takes away from the more immediate threat of brute forcing the login form itself. The work factor to crack a short password like ncc1701 might be measured in days or weeks depending on the method of encryption. On the other hand, the attacker may have a list of the site’s users (or have a reliable way of generating likely user names). In this scenario, the attacker targets the login page with a static password (ncc1701) and cycles through the user list.
Once again, there’s precedent of success for this approach such as against our high-profile friend Twitter. In 2009 a hacker cracked the long (more than the mystical “8 character minimum”), but unsophisticated password happiness for an account that had permissions to reset passwords for any other account.10
Clearly, it didn’t matter how well happiness had been kept secret, encrypted, obfuscated, and otherwise concealed. There were no limits on how many times the login page could be requested for brute forcing the account. Furthermore, the password protection for every other account was moot since the hacker now had access to an admin account from which he could take over any other. The only apparent good news in this scenario is that, while several accounts were compromised, the original passwords to those accounts were not. This is possibly negligible consolation, but important none the less considering the prevalence of password re-use across web sites.
By all means, put some effort into hashing passwords using well-established techniques. You’ll be adding to the work factor of anyone trying to crack the passwords should the password store ever be extracted from the site.
On the other hand, you may be increasing your own work factor with over-engineered solutions for password protection at the expense of other protections — like preventing SQL injection or rate limiting authentication points.
Here’s an additional note I made in the comments, but should highlight in the article:
For comparison, WPA2 uses PBKDF2 with the SSID of the network as a salt, a 256-bit key, HMAC-SHA1 for the algorithm, and 4096 iterations.
If you trying to figure out “what’s best” for hashing a password, consider WPA2 as the reference metric. For example, your hashing should generate a work factor of N times the work factor for WPA2 where N is your degree of paranoia that WPA2 is easily broken.
If you chose a double-digit N “just because”, then why would you ever use a wireless network (phone or Wi-Fi, GSM A5/3 or WPA2, etc.)? It’s much more likely someone will be able to sniff your encrypted traffic than they’ll ever get your hashed passwords. In fact, GSM’s A5/X algorithms have reported attacks. Seems like another reason to layer encryption, such as always using HTTPS.
For another comparison, the OSX File Vault apparently uses PBKDF2 with 1000 iterations. (Although it’d be nice to have a more detailed reference.)
5 If you’re not well-read in crypto you should at least be well read in fiction. After all, the security of a home-grown cryptosystem is closer to fiction.
6 It’s not necessary to think of any specific hash function at this point, but if you want a more concrete example, “friend” is hashed to the hexadecimal string “8e8b4d64f704c7a6aa632a7e6c2024e4f9fed79caac319e6bb7754db587e6f58″ using the SHA-256 algorithm.
9 Read the Dark Tower series by Stephen King, books V and VII in particular. The series also has one of the greatest first lines in a book, “The man in black fled across the desert, and the gunslinger followed.”