This exercise gives you practice at computing hashes using the facilities offered by the standard libraries of Python and Java. If you are doing this on your own PC, there is nothing else to install, beyond having working installations of Python 3 and the Java Development Kit.
Create a directory for this exercise and download message.txt into it.
If doing this exercise on a SoC Linux machine, activate the Anaconda Python distribution by entering the following in a terminal window:
module load legacy-eng
module add anaconda3/2020.11
This will ensure that the python command runs Python 3.
Enter python in the terminal window to run Python’s REPL
(read-eval-print loop). Read the contents of message.txt into a byte
string using the following code:
>>> import pathlib
>>> path = pathlib.Path("message.txt")
>>> data = path.read_bytes()
Use the hashlib module to compute and display a SHA-256 hash of the file contents, like so:
>>> import hashlib
>>> h = hashlib.sha256()
>>> h.update(data)
>>> print(h.hexdigest())
Note that the update() method can be called repeatedly, to feed the
hash function with multiple items of data. If you have a single chunk
of data, this example can be shortened to
print(hashlib.sha256(data).hexdigest())
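To illustrate the point about repeated update() calls, here is a minimal sketch that hashes the same bytes in one go and in small pieces. The sample data and the 64-byte chunk size are assumptions made for the illustration; they stand in for the contents of message.txt.

```python
import hashlib

# Hypothetical sample data standing in for the contents of message.txt
data = b"The quick brown fox jumps over the lazy dog\n" * 100

# One-shot: hash all of the data in a single call
one_shot = hashlib.sha256(data).hexdigest()

# Incremental: feed the same data to the hash function in 64-byte chunks
h = hashlib.sha256()
chunk_size = 64  # arbitrary size chosen for this sketch
for start in range(0, len(data), chunk_size):
    h.update(data[start:start + chunk_size])
incremental = h.hexdigest()

# Both approaches should yield the same hash
print(one_shot == incremental)
```

Feeding data in chunks like this is useful when a file is too large to read into memory in one go.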
Note also that you can use digest() instead of hexdigest(), to get
output from the hash function as raw bytes instead of a string of
printable hex digits. Try this now to compare the sizes of the input to
and output from the hash function. You can use len(data) to get the
former and len(h.digest()) to get the latter. You’ll see that the
hash is smaller than the input. The hash size never varies, regardless
of how large or small the input gets.
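The fixed output size can be checked with a short sketch like the one below. The input sizes are arbitrary values chosen for illustration and are not part of the exercise.

```python
import hashlib

# Arbitrary input sizes, from empty to one million bytes
input_sizes = (0, 1, 1000, 1_000_000)

digest_sizes = []
for size in input_sizes:
    sample = b"x" * size                      # input of the given length
    digest = hashlib.sha256(sample).digest()  # raw bytes, not hex digits
    digest_sizes.append(len(digest))
    print(size, len(digest))
```

Each line of output should show a digest length of 32 bytes (256 bits), whatever the input length.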
A wide range of hash functions is provided by the hashlib module.
You can see what is available on your platform by examining the value of
the variable hashlib.algorithms_available in the REPL.
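As a rough sketch of how these names can be used, hashlib.new() accepts an algorithm name as a string, which is handy when the choice of hash function is only known at run time. The input b"hello" is an assumption made for the example.

```python
import hashlib

# Algorithms that hashlib guarantees on every platform
print(sorted(hashlib.algorithms_guaranteed))

# Select a hash function by name rather than by calling its constructor
name = "sha256"
h = hashlib.new(name)
h.update(b"hello")  # hypothetical input for this sketch
by_name = h.hexdigest()

# The named constructor should produce the same result
direct = hashlib.sha256(b"hello").hexdigest()
print(by_name == direct)
```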
Hash functions can be accessed in Java using the MessageDigest class,
which is part of the java.security package in Java’s standard library.
Download Hash.java to the directory you are working in.
This file contains a Java program that is supposed to compute the hash
of a file named on the command line, using a hash function also
specified on the command line. Currently, all it does is read bytes
from the named file.
Under the ‘Apply hash function’ comment, add the following:
MessageDigest md = MessageDigest.getInstance(args[1]);
md.update(message);
byte[] hash = md.digest();
A MessageDigest object is obtained by calling the getInstance() method,
supplying the name of the desired hash function as a string. The Java
documentation has a list of supported hash function names. In this
code, the name comes from the second command line argument of the program.
As with the Python example, we feed data to the hash function by calling
the update() method, then call the digest() method to retrieve the
computed hash.
Check that the program compiles before continuing. If you see a compiler
error, make sure that you’ve imported MessageDigest from the
java.security package.
Under the ‘Display hash’ comment, add code that will display the hash
as a string of hex digits. Compile and run the program, specifying
message.txt and SHA-256 as command line arguments. Compare its
output with that obtained from Python.
Hint: loop over the bytes of the hash and use System.out.printf to print
each one, with %02x as the formatting directive.
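For comparison, the same per-byte formatting idea can be sketched in Python, where the % operator understands the same %02x directive (two hex digits per byte, padded with a leading zero). The input b"hello" is an assumption made for the example, and this is an illustration of the formatting only, not the Java solution.

```python
import hashlib

# Raw hash bytes for a hypothetical input
digest = hashlib.sha256(b"hello").digest()

# Format each byte as two lowercase hex digits and join the results
manual_hex = "".join("%02x" % b for b in digest)

# This should match the hex string that hashlib produces itself
print(manual_hex == digest.hex())
```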