How to best protect your users’ passwords
It seems like this blog has turned into more and more of a programming blog, and here is yet another step in that direction. Perhaps somewhat boring to many, but hopefully useful to some.
Online services generally use usernames and passwords to identify their users. The simplest way of doing that is to save the information in clear text, perhaps in a database or in a text file. When a user logs in, look up the stored password and compare it with the one supplied by the user. If they match, authentication is successful. A simple solution, but a problematic one. If the username and password information gets in the hands of the wrong people. Since many user the same password over and over again it is likely that the username-email-password combinations can be used to log into other services.
This is a real problem. Writing secure web applications is difficult, and security problems that gives an attacker access to login data can be introduced not only by your own code but also by third party libraries and frameworks.
Because of this it is a good idea to make it somewhat more difficult to use the password information for an attacker using some sort of scrambling method. Many people call such methods password encryption, but encryption implies that its possible to decrypt the information with the right key, and keys can be lost so it is better to make the process one way. The basic idea is to take a plaintext password and change it into something called a hash that can not be reversed back into the plaintext. However, the scrambling process needs to be repeatable so that a plaintext password can be verified to match the password that was once used to create the hash.
To help in this process there is a family of cryptographic functions that called cryptographic hash functions. They work in a way that a variable length input is turned into a fixed length output in a way that it's very difficult to a) find two inputs that generate the same output and b) find the input given a specific output. Assuming that the cryptographic hash function works as advertised, shouldn't this should solve all our password storing problems? If we store a cryptographic hash of the user supplied password using for example the SHA-1 algorithm, an attacker that gets hold of our login data can't run SHA-1 backwards to get the passwords. All he has is a list of hash values, but at the same time we can repeat the SHA-1 function when a user needs to authenticate and compare the hashes of the new and old password. If they match they must be the same.
This would be an efficient method if it were not for one little detail: Users choose bad passwords. Some use password, some use their first name some a1. There are lists available with common passwords, and once an attacker puts a computer to the task of trying out passwords many of them gets broken. Worse yet, since the SHA-1 algorithm is such a common one, there is almost certainly hard drives out there filled with pre-calculated SHA-1 values for all common passords and even the ones for all possible password combinations shorter than for example 8 characters. With such pre-calculated values any short or simple password can be found in seconds.
To solve the second problem, the one with pre-calculated hash values, something called a salt value is used. A salt is a random number that is added to the plaintext password before it is encrypted. If the salt can have say a billion possible values, someone doing pre-calculated hash values need to do a billion times more pre-calculations and have a billion times more storage to store the hashes. The salt value can be stored in an unaltered form together with the hash value, and be used when verifying a password by simply doing the same hash operation with the same salt a second time. Another benefit of adding salt to the password is that if you have a large number of users, the attacker can't reuse each password guess with all the users, since their salt values differ.
The problem with easy to guess passwords is much more difficult to solve. The only thing that helps a bit, besides educating users in methods for choosing better passwords, is to make it more time consuming to do one hash calculation. That way it takes longer to try out millions of common passwords and password combination and guessing right will take longer, hopefully too long to be worth it. To make it take a bit longer to calculate the hash you can simply repeat the cryptographic hash function over and over again. Doing it once on my dual core desktop computer takes less than a millisecond. Doing it a thousand times increases the time spent calculating one hash value to about 200 milliseconds.
So, to take good care of your users' login information you should:
- pay attention to security on your servers. That includes operating system security, backup management as well as avoiding misstakes in your own code.
- encourage them to use good passwords.
- store hash values of the passwords instead of plaintext versions
- use a good cryptographic hash function to hash them
- use a large enough and random enough salt value when hashing
- repeat the hash function until you reach a reasonable tradeoff between efficiency and difficulty of repeating it by an attacker.
Soon, I'll publish some java code I've written that implements recommendations 3-6. But that's for another day, now I need to put this computer to sleep.
Update 090330: Since I wrote this post I have published the code for two implementations of these recommendations, in Java and Ruby.
Filed under Cryptography, Geeky | Comment (1)