Exercise 2: Hash Functions in Python & Java

This exercise gives you practice in computing hashes with the facilities offered by the standard libraries of Python and Java. If you are working on your own PC, you need nothing beyond working installations of Python 3 and the Java Development Kit (JDK).

Python

  1. Create a directory for this exercise and download message.txt into it.

  2. If you are doing this exercise on a SoC Linux machine, activate the Anaconda Python distribution by entering the following in a terminal window:

    module load legacy-eng
    module add anaconda3/2020.11
    

    This will ensure that the python command runs Python 3.

  3. Enter python in the terminal window to run Python’s REPL (read-eval-print loop). Read the contents of message.txt into a byte string using the following code:

    >>> import pathlib
    >>> path = pathlib.Path("message.txt")
    >>> data = path.read_bytes()
    
  4. Use the hashlib module to compute and display a SHA-256 hash of the file contents, like so:

    >>> import hashlib
    >>> h = hashlib.sha256()
    >>> h.update(data)
    >>> print(h.hexdigest())
    

    Note that the update() method can be called repeatedly to feed the hash function several pieces of data in turn. If all of your data is in a single byte string, you can pass it directly to the constructor and shorten the example to

    >>> print(hashlib.sha256(data).hexdigest())
    

    Note also that you can call digest() instead of hexdigest() to get the hash as raw bytes rather than as a string of printable hex digits. Try this now to compare the sizes of the hash function’s input and output: len(data) gives the former and len(h.digest()) the latter. Unless message.txt is very short, the hash will be much smaller than the input. For a given hash function the hash size is fixed (32 bytes for SHA-256), no matter how large or small the input is.

  5. The hashlib module provides a wide range of hash functions. To see which ones are available on your platform, examine the value of hashlib.algorithms_available in the REPL.
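To see the two claims from step 4 in action, here is a minimal sketch you could paste into a Python file or the REPL. It assumes that any byte string can stand in for the contents of message.txt, so it uses a made-up repeated sentence; the 64-byte chunk size is likewise an arbitrary choice for illustration.

```python
import hashlib

# Stand-in for the contents of message.txt (assumption: any bytes will do)
data = b"The quick brown fox jumps over the lazy dog\n" * 100

# One-shot: hash all of the data in a single call
one_shot = hashlib.sha256(data).hexdigest()

# Incremental: feed the same bytes to update() in 64-byte chunks
h = hashlib.sha256()
chunk_size = 64
for start in range(0, len(data), chunk_size):
    h.update(data[start:start + chunk_size])
incremental = h.hexdigest()

print(one_shot == incremental)  # True: splitting the input does not change the hash

# The digest size depends only on the algorithm, not on the input size
for size in (0, 1, 1_000, 1_000_000):
    digest = hashlib.sha256(b"x" * size).digest()
    print(f"{size:>9} bytes in -> {len(digest)} bytes out")  # always 32 out for SHA-256

# hexdigest() is simply the raw digest written as two hex digits per byte
print(h.digest().hex() == h.hexdigest())  # True
```

Chunked updates are what make it practical to hash files too large to read into memory in one go: you read and feed one block at a time, and the final hash is identical to hashing the whole file at once.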
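Following on from step 5, the sketch below contrasts the algorithms hashlib guarantees on every platform with those available on yours, and uses hashlib.new() to pick an algorithm by name at run time, which is much the same idea as the Java approach in the next part of this exercise. The input b"hello" and the particular algorithm names chosen are illustrative only.

```python
import hashlib

# Algorithms present on every platform vs. those available on this one
print(sorted(hashlib.algorithms_guaranteed))
print(sorted(hashlib.algorithms_available))

# hashlib.new() selects an algorithm from its name, given as a string
data = b"hello"
for name in ("sha1", "sha256", "sha512", "sha3_256"):
    h = hashlib.new(name, data)
    # digest_size reports the fixed output size of each algorithm, in bytes
    print(f"{name:>8}  {h.digest_size:3d} bytes  {h.hexdigest()}")
```

Selecting the algorithm by name lets one program support many hash functions without hard-coding a constructor such as hashlib.sha256() for each of them.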

Java

Hash functions can be accessed in Java using the MessageDigest class, which is part of the java.security package in Java’s standard library.

  1. Download Hash.java to the directory you are working in. This file contains a Java program that is intended to compute the hash of a file named on the command line, using a hash function also specified on the command line. Currently, all it does is read bytes from the named file.

  2. Under the ‘Apply hash function’ comment, add the following:

    MessageDigest md = MessageDigest.getInstance(args[1]);
    md.update(message);
    byte[] hash = md.digest();
    

    You obtain a MessageDigest object by calling the static getInstance() method, supplying the name of the desired hash function as a string. The Java documentation (the Java Security Standard Algorithm Names specification) lists the supported names. In this code, the name comes from the program’s second command-line argument, args[1].

    As with the Python example, we feed data to the hash function by calling the update() method, then call the digest() method to retrieve the computed hash.

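    Check that the program compiles before continuing. If you see a compiler error, make sure that you’ve imported MessageDigest from the java.security package. Note also that getInstance() throws the checked exception NoSuchAlgorithmException (also in java.security) when the named hash function is not supported, so the enclosing method must either catch it or declare it in a throws clause.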

  3. Under the ‘Display hash’ comment, add code that displays the hash as a string of hex digits, two per byte. Compile and run the program with message.txt and SHA-256 as its command-line arguments (for example, javac Hash.java followed by java Hash message.txt SHA-256). Check that its output matches the hex digest you obtained from Python.
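The hex conversion asked for in step 3 comes down to writing every byte as exactly two hex digits, padding values below 16 with a leading zero. The Python sketch below illustrates that rule with a hypothetical helper, to_hex(), so that you have something to check your Java output against. Keep in mind that Java’s byte type is signed (-128 to 127), so your Java code must still produce two digits per byte for the negative values.

```python
import hashlib

def to_hex(raw: bytes) -> str:
    """Hypothetical helper: render each byte as two lowercase hex digits."""
    # The "02x" format pads with a leading zero, so byte 0x0a becomes "0a", not "a"
    return "".join(format(b, "02x") for b in raw)

digest = hashlib.sha256(b"hello").digest()
print(to_hex(digest))
print(to_hex(digest) == hashlib.sha256(b"hello").hexdigest())  # True
```

If a hash printed by your Java program is shorter than 64 hex digits for SHA-256, a missing leading zero is the most likely cause.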