8. Hashing

We have seen most of the search trees that you will encounter in computer science.

We will now look at an ADT that you have already used before: the hash table ADT.

Hash tables support only a subset of the search tree operations:

Insertions
Deletions
Find

Operations that require ordering information among the elements are not supported.

8.1. General Idea

A hash table can be seen as an array containing elements.

Each element is composed of two parts:

The key.
The value associated with the key.

Another important element of the table is the table’s size. We will see that the size is a crucial part of the ADT.

package hashtables;

import java.util.*;

public class HashTable<K, V>{

    protected static class Entry<K, V> {

        /**
         * The key
         */
        private final K key;
        /**
         * The value
         */
        private V value;

        /**
         * Creates a new key-value pair.
         *
         * @param key   The key
         * @param value The value
         */
        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }

        /**
         * Retrieves the key.
         *
         * @return The key
         */
        public K getKey() {
            return key;
        }

        /**
         * Retrieves the value.
         *
         * @return The value
         */
        public V getValue() {
            return value;
        }

        /**
         * Sets the value.
         *
         * @param val The new value
         * @return The old value
         */
        public V setValue(V val) {
            V oldVal = value;
            value = val;
            return oldVal;
        }

        /**
         * Returns a string representation of this entry.
         *
         * @return "(key, value)"
         */
        @Override
        public String toString() {
            return "(" + key + ", " + value + ")";
        }
    }

}

8.1.1. How does it work?

Each key is mapped to some number in the range 0 to \(\text{TableSize}-1\).
This mapping is called a hash function.
- The function must be simple to compute
- And should, ideally, map any two distinct keys to different cells

Example

Imagine that we want to use a Hash Table to store names.

We could obtain the following table:

In this table, John hashes to \(3\) and Phil to \(4\).

Important

This example shows a perfect case. In general, several keys can have the same hash; this is called a collision.

8.2. Hash Function

The hash function is the most important part of the ADT.

Activity

Think about a possible function.
Discuss it with your neighbours.

8.2.1. A simple case

Consider a hash table that needs to store integers.

Then, a simple hash function could be

\[\text{key} \mod \text{TableSize}\]

Activity

Does it always work well for integers?

If yes, prove it.
Otherwise, show a counterexample.

Consider a table of size 10: if all the keys end with a zero, they all hash to cell 0, so the function produces nothing but collisions.

To counter this, we usually choose a prime table size. When the input keys are random integers, this tends to distribute the keys evenly.
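The following minimal sketch illustrates the point. The class name IntHash is hypothetical and not part of the course code. It uses Math.floorMod rather than %, an added precaution because % returns a negative remainder for negative keys in Java.

```java
// Hypothetical helper class used only for illustration.
final class IntHash {

    // Maps any int key to an index in the range 0 to tableSize - 1.
    // Math.floorMod is used instead of % so that negative keys
    // still produce a non-negative index.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};
        for (int key : keys) {
            // With size 10 every key lands in cell 0;
            // with the prime size 11 the keys are spread out.
            System.out.println(key + " -> size 10: " + hash(key, 10)
                    + ", size 11: " + hash(key, 11));
        }
    }
}
```

With a table of size 10 the five keys all collide in cell 0, whereas with size 11 they land in five different cells.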

8.2.2. More general case

Usually keys are strings.

With string keys, the hash function is more complex and must be chosen carefully.

We will consider three examples of hash functions.

8.2.2.1. First Example

Activity

Write a hash function that takes a string as key, adds up the ASCII values of its characters, and returns the sum modulo the table size.

int hash(String key, int tableSize) {
    // TODO: add the character values of key, then return the sum modulo tableSize
}

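This function is simple to implement, but it distributes the keys poorly when the table is large.

Consider a table size of \(10\,007\).
Suppose the keys have eight characters or fewer.
Since an ASCII character has a value of at most \(127\), the hash only takes values between 0 and \(1\,016\), which is \(127\times 8\).

Most of the table is never used: the values are not distributed evenly.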
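One possible way to write this additive hash is sketched below. The class name AdditiveHash is hypothetical, and the sketch assumes ASCII keys that are short enough for the sum not to overflow an int.

```java
// Hypothetical helper class used only for illustration.
final class AdditiveHash {

    // Adds the character codes of the key and reduces the sum
    // modulo the table size. Assumes short ASCII keys, so the sum
    // stays non-negative and far from int overflow.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }
}
```

Note that two keys made of the same characters in a different order, such as "ab" and "ba", always receive the same hash with this function.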

8.2.2.2. Second Example

We can consider a second example.

Assume that each key has at least three characters.
The value 27 is the number of letters in the English alphabet, plus the blank.
729 is \(27^2\).

We can implement the following function:

int hash(String key, int tableSize) {
    // Only the first three characters of the key are used.
    return (key.charAt(0) + 27 * key.charAt(1) + 729 * key.charAt(2)) % tableSize;
}

It could work if the first three characters of English words were random, but they are not.

Ignoring the blank, we could imagine that there are \(26^3 = 17\,576\) possible combinations of three characters.

In reality, only \(2\,851\) of them occur!

Even without any collision, only about 28 percent of a table of size \(10\,007\) can be hashed to, and increasing the size of the table does not help.

8.2.2.3. Third Example

The following code uses Horner’s method.

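int hash(String key, int tableSize) {
    int hashVal = 0;

    // Horner's rule: multiply the running value by 37, then add the next character.
    for (char ch : key.toCharArray()) {
        hashVal = 37 * hashVal + ch;
    }
    // hashVal may have overflowed to a negative value; floorMod keeps the index in range.
    return Math.floorMod(hashVal, tableSize);
}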

It computes a polynomial in 37 of degree \(n - 1\), \(n\) being the length of the string (KeySize in the formula):

\[\sum_{i=0}^{KeySize - 1} Key\left[KeySize - i - 1 \right] \times 37^i\]
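
As a quick check of the formula, here is a worked example for the key "abc" (ASCII values 97, 98 and 99), assuming a table size of \(10\,007\) as in the first example. The nested form is what the loop computes step by step, and it expands to the polynomial:

\[((97 \times 37) + 98) \times 37 + 99 = 97 \times 37^2 + 98 \times 37 + 99 = 136\,518\]

\[136\,518 \bmod 10\,007 = 6\,427\]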

Activity

Is it a good hash function?

Important

Every class in Java inherits a method hashCode() from Object, and many types, in particular String, override it to return a hash value computed from their content.
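
The value returned by hashCode() is an arbitrary int, possibly negative, so it still has to be reduced to a valid index. A minimal sketch of one way to do this follows; the names HashCodeIndex and indexFor are hypothetical.

```java
// Hypothetical helper class used only for illustration.
final class HashCodeIndex {

    // Turns the hashCode() of any non-null key into an index in the
    // range 0 to tableSize - 1. Math.floorMod handles negative hash codes.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println("abc".hashCode());        // hash code of the string
        System.out.println(indexFor("abc", 10007));  // index in a table of size 10007
    }
}
```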

8.3. Separate Chaining

8.3.1. How do we deal with collisions?

Even a good hash function can produce collisions, even if they are uncommon.

A first strategy is known as separate chaining.

8.3.2. The idea

We represent the hash table by an ArrayList (from the Java API) of lists.
When a key is hashed, it is added to the list stored at the corresponding index of the ArrayList, so keys whose hashes collide end up in the same list.

If it’s not very clear, here is an illustration:

Activity

Create the class HashTableChain.
- Create the constructor
- Create the private data members.

public class HashTableChain<K, V> extends HashTable<K, V> implements KWHashMap<K, V> {

    // Data Fields
    public HashTableChain(){
    }
}

8.4. Example

We will see a practical example.

Consider the following class Employee:

public class Employee {

    // Name of the employee
    private String name;
    // Salary of the employee
    private double salary;
    // Seniority of the employee
    private int seniority;

    public Employee(String name, double salary, int seniority){
        this.name = name;
        this.salary = salary;
        this.seniority = seniority;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public double getSalary() {
        return salary;
    }

    public void setSalary(double salary) {
        this.salary = salary;
    }

    public int getSeniority() {
        return seniority;
    }

    public void setSeniority(int seniority) {
        this.seniority = seniority;
    }

    @Override
    public int hashCode() {
        // TODO: implement (see the activity that follows)
    }
}

Activity

Implement hashCode()
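
One possible answer is sketched below, under the assumption (not stated in the notes) that an employee is identified by name only. The class EmployeeSketch is a hypothetical, reduced stand-in for Employee. Since equals and hashCode must agree, the sketch overrides both.

```java
// Hypothetical, reduced stand-in for the Employee class.
final class EmployeeSketch {
    private final String name;
    private final double salary;
    private final int seniority;

    EmployeeSketch(String name, double salary, int seniority) {
        this.name = name;
        this.salary = salary;
        this.seniority = seniority;
    }

    // Assumption: two employees are the same if they have the same name.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof EmployeeSketch)) {
            return false;
        }
        return name.equals(((EmployeeSketch) other).name);
    }

    // Equal objects must have equal hash codes, so the hash is
    // computed from the name only, reusing String.hashCode().
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

Salary and seniority are left out of the hash on purpose: they can change over time, and a key whose hash changes after insertion can no longer be found in the table.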

8.5. Implementation

There are 5 methods that need to be implemented:

get
put
remove
size
isEmpty

public interface KWHashMap<K, V> {

    /**
     * Returns the value associated with the key.
     * @param key the key
     * @return the value or null if the key is not present.
     */
    V get(K key);

    /**
     * Returns true if this table contains no key-value mappings.
     * @return true or false
     */
    boolean isEmpty();

    /**
     * Associates the value with the specified key.
     * @param key the key
     * @param value the value
     * @return the previous value associated with the specified key, or null if there was no mapping for the key
     */
    V put(K key, V value);

    /**
     * Remove the mapping for this key from the table if it is present.
     * @param key the key
     * @return Returns the previous value associated with the specified key, or null if there was no mapping.
     */
    V remove(K key);

    /**
     * Returns the size of the table, that is, the number of key-value mappings it contains.
     * @return the size
     */
    int size();

}

8.5.1. isEmpty

The method is simple: it returns whether the hash table is empty or not.

8.5.2. get

It is also simple.

You start by hashing the key.
Then, you get the list stored at that index in the table.
Finally, you search this list for the key and return the associated value, or null if the key is not found.

8.5.3. remove

The first steps are the same as for get.

If the key is not found, return null.

Otherwise, remove the entry from the list, decrease numKeys by 1 and return the value that was associated with the key.

8.5.4. put

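The first steps are the same as for get.

If the key is already in the list, replace its value and return the old value.

Otherwise, insert a new entry in the list and increase numKeys by 1. If numKeys is then greater than the size of the table (the number of lists), rehash.

Finally, return null, since there was no previous value.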
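
The five methods described in this section can be sketched as follows. This is a stand-alone illustration, not the expected HashTableChain solution: the class ChainedMapSketch, the helpers bucketFor and rehash, and the growth rule (twice the size plus one, which is not necessarily prime) are all assumptions. Null keys are not supported.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-alone sketch of separate chaining.
final class ChainedMapSketch<K, V> {

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<LinkedList<Entry<K, V>>> table;
    private int numKeys = 0;

    // capacity must be at least 1.
    ChainedMapSketch(int capacity) {
        table = newTable(capacity);
    }

    private static <K, V> List<LinkedList<Entry<K, V>>> newTable(int capacity) {
        List<LinkedList<Entry<K, V>>> t = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            t.add(new LinkedList<>());
        }
        return t;
    }

    // Hash the key and return the list stored at that index.
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return table.get(Math.floorMod(key.hashCode(), table.size()));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null; // key not present
    }

    V put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) { // key already present: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucket.add(new Entry<>(key, value));
        numKeys++;
        if (numKeys > table.size()) {
            rehash();
        }
        return null; // no previous value
    }

    V remove(K key) {
        Iterator<Entry<K, V>> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (e.key.equals(key)) {
                it.remove();
                numKeys--;
                return e.value;
            }
        }
        return null; // key not present
    }

    int size() { return numKeys; }

    boolean isEmpty() { return numKeys == 0; }

    // Assumed growth rule: move every entry into a table of size 2n + 1.
    private void rehash() {
        List<LinkedList<Entry<K, V>>> old = table;
        table = newTable(2 * old.size() + 1);
        for (LinkedList<Entry<K, V>> bucket : old) {
            for (Entry<K, V> e : bucket) {
                bucketFor(e.key).add(e);
            }
        }
    }
}
```

For example, after put("John", 3) and put("John", 5) on an empty table, the second call returns 3, get("John") returns 5 and size() is 1.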

Activity

Implement the methods.