The recommended way of storing passwords is as irreversible hashes. This exercise examines two different approaches to password hashing: the traditional algorithm that was once used by Unix systems and a more modern algorithm.
Note: you’ll need to have Python 3 installed if doing this on your own PC.
The traditional Unix algorithm involved a modified form of the DES symmetric cipher, in which the password was used as the key. Since DES expects a 56-bit key, this meant that only 7 bits from each of the first 8 characters in the password were of any significance! The key was used to repeatedly encrypt a block of bits that were initially all zero. A total of 25 iterations were used. Encryption was modified using 12 bits of random salt, represented in the form of two characters chosen from a set of 64 different possibilities.
You can investigate this approach to password hashing using Python’s
crypt
module.
Start a Python 3 REPL and do the following:
>>> from crypt import crypt
>>> crypt("kangaroo", salt="aa")
The first argument to crypt()
is our chosen password. The output from
the function is the final ciphertext generated by the encryption algorithm,
Base64-encoded to turn it into printable text, with the two characters
of salt added as a prefix. The latter is an important point: the salt
has to be stored with the hashed password, otherwise the application won’t
be able to generate a hash from the password supplied by a user, in order
to compare it with the stored hash.
In the same Python REPL, try this:
>>> crypt("kangaroos", salt="aa")
Notice that the output is identical to before. The addition of a ninth character to the password has no effect on the hash!
Now try this:
>>> crypt("kangaroo", salt="ab")
Changing the salt results in the same password producing a completely different hash.
By using a random salt each time a new password hash is generated, an application can prevent the same hash being stored in the password database for two users who happen to choose the same (presumably not very strong) password. If the same hash were stored, cracking that hash would compromise the accounts of both users.
Another important benefit of salting is that it makes password cracking attacks using a precomputed dictionary of hashes less feasible, by increasing the storage requirements for that dictionary. For the traditional Unix algorithm, 12 bits of salt means that 4,096 hashes would have to be computed and stored for each possible password appearing in that dictionary – something considered infeasible at the time the algorithm was devised (but certainly feasible today).
The traditional Unix algorithm is weak by today’s standards because of password truncation, salt that is too small, and its use of a weak cipher with an insufficient number of iterations. Modern approaches improve things by not truncating passwords so aggressively, and by combining the password with a much larger amount of salt, chosen using a cryptographically strong PRNG. We can also use a modern, cryptographically strong hash function and slow down hashing considerably by using a very large number of iterations. The latter is necessary to make brute-forcing with massively-parallel hardware such as GPUs less practical.
One of the modern standards is an algorithm known as PBKDF2. A convenient implementation of this is available in Python’s hashlib module.
In the Python REPL, create a password and some salt like so:
>>> password = b"kangaroo"
>>> salt = b"wxyz"
Note the ‘b’ prefix for both of these. They need to be sequences of
bytes rather than text strings. A relatively short fixed salt is used
here for testing purposes, but in a real application you would need
to generate at least 8, preferably 16, random bytes – e.g., using the
token_bytes()
function from the secrets
module.
Compute and display the hash like so:
>>> import hashlib
>>> phash = hashlib.pbkdf2_hmac("sha256", password, salt, 250000)
>>> phash.hex()
The first argument in the call to pbkdf2_hmac()
specifies the hash
function that will be used. The final argument is the number of
iterations performed using this hash function. The recommended number of
iterations increases over time, in line with increases in computational
power; it was only 1,000 when the standard was first defined twenty years
ago, but by 2013 had risen to 100,000. Version 3.1 of the Django web
framework, released in 2020, uses PBKDF2 with 216,000 iterations.
Output value phash
is a sequence of bytes, so the last of the lines above
renders it as a more user-friendly string of hex digits. You should see a
string that is 64 characters in length, indicating a 32-byte, or 256-bit,
hash – matching the choice of SHA-256 as the hash function. By default,
the size of the output matches the size of the chosen hash function,
although you can specify a different output size if you prefer.
Note: unlike the crypt()
function, the salt is not included in the output
here, so you would need to concatenate this with the hash yourself when
storing the hash in a password database.
Repeat the hash computation using a significantly larger number of iterations – e.g., try using 500,000, then 1,000,000. You should start to see a noticeable delay before the value is returned.
Choosing a value for the number of iterations is a trade-off: it needs to be big enough to offer good security against brute-forcing, but not so big that password hashing becomes a bottleneck (which could easily happen in web applications with large numbers of simultaneous users).
□