8. Hashing

We have seen most of the search tree that you can encounter in CS.

We will now see an ADT that you already used before the hash table ADT.

Hash tables only have a subset of the search trees operations:

  • Insertions

  • Deletions

  • Find

All the operations that require any ordering information among the elements are not supported.

8.1. General Idea

A hash table can be seen as an array containing elements.

The elements are composed of two parts:

  • The key.

  • The element itself.

Another important element of the table is the table’s size. We will see that the size is crucial part of the ADT.

 1package hashtables;
 2
 3import java.util.*;
 4
 5public class HashTable<K, V>{
 6
 7    protected static class Entry<K, V> {
 8
 9        /**
10         * The key
11         */
12        private final K key;
13        /**
14         * The value
15         */
16        private V value;
17
18        /**
19         * Creates a new key-value pair.
20         *
21         * @param key   The key
22         * @param value The value
23         */
24        public Entry(K key, V value) {
25            this.key = key;
26            this.value = value;
27        }
28
29        /**
30         * Retrieves the key.
31         *
32         * @return The key
33         */
34        public K getKey() {
35            return key;
36        }
37
38        /**
39         * Retrieves the value.
40         *
41         * @return The value
42         */
43        public V getValue() {
44            return value;
45        }
46
47        /**
48         * Sets the value.
49         *
50         * @param val The new value
51         * @return The old value
52         */
53        public V setValue(V val) {
54            V oldVal = value;
55            value = val;
56            return oldVal;
57        }
58
59        @Override
60        /** Return a string representation of this entry
61         @return "(key, value)"
62         */
63        public String toString() {
64            return "(" + key + ", " + value + ")";
65        }
66    }
67
68}

8.1.1. How does it work?

  • Each key is mapped into some number in the range of 0 to \(\text{TableSize}-1\).

  • The mapping is called a hash function.

    • The function must be simple to compute

    • And should ensure that a key is unique

Example

Imagine that we want to use a Hash Table to store names.

We could obtain the following table:

../../_images/hash_table.png

In this table John has to \(3\), Phil to \(4\).

Important

This example shows a perfect case. Some key could have the same hash!

8.2. Hash Function

This hash function is the most important part of the ADT.

Activity

  • Think about a possible function.

  • Discuss it with your neighbours.

8.2.1. A simple case

Consider a hash table that need to store integers.

Then, a simple hash function could be

\[\text{key} \mod \text{TableSize}\]

Activity

Is it always working for integers?

  • If yes, prove it.

  • Otherwise, show a counterexample.

Consider a table of size 10, if you have multiple integers ending with a zero, then it will not work.

To counter this, we always ensure that the size is prime. It distributes the keys evenly.

8.2.2. More general case

Usually keys are strings.

With strings keys the hash function is more complex and must be chosen carefully.

We will consider three examples of hash functions.

8.2.2.1. First Example

Activity

  • Write a hash function that takes a string as key and hash it by adding the ASCII value of each char and return the modulo.

int hash(const String key, int tableSize){

}

The table size needs to be large enough.

  • Consider a size of \(10 007\)

  • Suppose the keys have a size of eight or less.

  • The hash will calculate only values between 0 and \(1 016\), which is \(127\times 8\).

It does not distribute the values evenly.

8.2.2.2. Second Example

We can consider a second example.

  • Consider that the key will have at least three characters.

  • The value 27 represents the number of letters in the English alphabet (plus the blank).

  • 729 is \(27^2\).

We can implement the following function:

int hash(const String key, int tableSize){
    return (key[0] + 27*key[1] + 729*key[2]) % tableSize;
}

It could work if the word in English are random, but they are not.

We could imagine there are \(26^3 =17576\) possible combinations of three characters.

In reality only \(2851\)!

Only 28 percent of the table can actually be hashed, and increasing the size of the table doesn’t change anything.

8.2.2.3. Third Example

The following code implement the Horner’s method.

int hash(const String key, int tableSize){
    int hashVal = 0;

    for (char ch: key){
        hashVal = 37 * hashVal + ch;
    }
    return hashVal % tableSize;
}

It calculates a polynomial function of 37 to an \(n`th degree, :math:`n\) being the size of the string:

\[\sum_{i=0}^{KeySize - 1} Key\left[KeySize - i - 1 \right] \times 37^i\]

Activity

Is it a good hash function?

Important

Types in Java, and especially, the type String already possesses a method hashCode() that return the hash value.

8.3. Separate Chaining

8.3.1. How are we dealing collision?

We can have a good hash function, but it can always create a collision even if it’s uncommon.

A first strategy is known as separate chaining.

8.3.2. The idea

  • We represent the HashTable by a ArrayList (from the Java API) of list.

  • When a key has a hash that collides we add the key to the list corresponding to the vector index.

If it’s not very clear, here is an illustration:

../../_images/separate_chaining.png

Activity

  • Create the class HashTableChain.

    • Create the constructor

    • Create the private data members.

1public class HashTableChain<K, V> extends HashTable<K, V> implements KWHashMap<K, V> {
2
3    // Data Fields
4    public HashTableChain(){
5    }
6}

8.4. Example

We will see a practical example.

Consider the following class Employee:

 1public class Employee {
 2
 3    // Name of the employee
 4    private String name;
 5    // Salary of the employee
 6    private double salary;
 7    // Seniority of the employee
 8    private int seniority;
 9
10    public Employee(String name, double salary, int seniority){
11        this.name = name;
12        this.salary = salary;
13        this.seniority = seniority;
14    }
15
16    public String getName() {
17        return name;
18    }
19
20    public void setName(String name) {
21        this.name = name;
22    }
23
24    public double getSalary() {
25        return salary;
26    }
27
28    public void setSalary(double salary) {
29        this.salary = salary;
30    }
31
32    public int getSeniority() {
33        return seniority;
34    }
35
36    public void setSeniority(int seniority) {
37        this.seniority = seniority;
38    }
39    @Override
40    public int hashCode() {
41    }

Activity

  • Implement hashCode()

8.5. Implementation

There are 5 methods that need to be implemented:

  • get

  • put

  • remove

  • size

  • isEmpty

 1public interface KWHashMap<K, V> {
 2
 3    /**
 4     * Return the value associated to the key.
 5     * @param key the key
 6     * @return the value or null if the key is not present.
 7     */
 8    V get(K key);
 9
10    /**
11     * Returns true if this table contains no key-value mappings.
12     * @return true or false
13     */
14    boolean isEmpty();
15
16    /**
17     * Associated the value with the speficied key.
18     * @param key the key
19     * @param value the value
20     * @return Returns the previous value associated with the specified key, or null if there was no mapping for the key
21     */
22    V put(K key, V value);
23
24    /**
25     * Remove the mapping for this key from the table if it is present.
26     * @param key the key
27     * @return Returns the previous value associated with the specified key, or null if there was no mapping.
28     */
29    V remove(K key);
30
31    /**
32     * Return the size of the table
33     * @return the size
34     */
35    int size();
36
37}

8.5.1. isEmpty

The method is simple, return if the hash table is empty or not.

8.5.2. get

It is also simple.

  • You start by hashing the object.

  • Then, you get the list in the vector.

  • Finally, you try to find the element in this list.

8.5.3. remove

The first steps are the same that for get.

If the element is not found, return false.

Otherwise, erase the element, reduce numKeys by 1 and return true.

8.5.4. put

The first steps are the same that for get.

If the element is not inside the list, insert it.

Increase numKeys by 1, and if numKeys is greater than the size of the list rehash.

Finally return true.

Activity

Implement the methods.