******* Hashing ******* We have seen most of the search tree that you can encounter in CS. We will now see an ADT that you already used before the **hash table** ADT. Hash tables only have a subset of the search trees operations: * Insertions * Deletions * Find All the operations that require any ordering information among the elements are not supported. General Idea ============ A hash table can be seen as an array containing elements. The elements are composed of two parts: * The key. * The element itself. Another important element of the table is the table's size. We will see that the size is crucial part of the ADT. .. literalinclude:: ./HashTable.java :language: java :linenos: How does it work? ----------------- * Each key is mapped into some number in the range of 0 to :math:`\text{TableSize}-1`. * The mapping is called a **hash function**. * The function must be simple to compute * And should ensure that a key is unique .. admonition:: Example :class: example Imagine that we want to use a Hash Table to store names. We could obtain the following table: .. figure:: ./img/hash_table.* :align: center In this table `John` has to :math:`3`, `Phil` to :math:`4`. .. important:: This example shows a perfect case. Some key could have the same hash! Hash Function ============= This hash function is the most important part of the ADT. .. admonition:: Activity :class: activity * Think about a possible function. * Discuss it with your neighbours. A simple case ------------- Consider a hash table that need to store integers. Then, a simple hash function could be .. math:: \text{key} \mod \text{TableSize} .. admonition:: Activity :class: activity Is it always working for integers? * If yes, prove it. * Otherwise, show a counterexample. Consider a table of size 10, if you have multiple integers ending with a zero, then it will not work. To counter this, we always ensure that the size is prime. It distributes the keys evenly. More general case ----------------- Usually keys are strings. With strings keys the hash function is more complex and must be chosen carefully. We will consider three examples of hash functions. First Example ************* .. admonition:: Activity :class: activity * Write a hash function that takes a string as key and hash it by adding the ASCII value of each char and return the modulo. .. code-block:: java int hash(const String key, int tableSize){ } The table size needs to be large enough. * Consider a size of :math:`10 007` * Suppose the keys have a size of eight or less. * The hash will calculate only values between 0 and :math:`1 016`, which is :math:`127\times 8`. It does not distribute the values evenly. Second Example ************** We can consider a second example. * Consider that the key will have at least three characters. * The value 27 represents the number of letters in the English alphabet (plus the blank). * 729 is :math:`27^2`. We can implement the following function: .. code-block:: java int hash(const String key, int tableSize){ return (key[0] + 27*key[1] + 729*key[2]) % tableSize; } It could work if the word in English are random, but they are not. We could imagine there are :math:`26^3 =17576` possible combinations of three characters. In reality only :math:`2851`! Only 28 percent of the table can actually be hashed, and increasing the size of the table doesn't change anything. Third Example ************* The following code implement the `Horner's method `_. .. code-block:: java int hash(const String key, int tableSize){ int hashVal = 0; for (char ch: key){ hashVal = 37 * hashVal + ch; } return hashVal % tableSize; } It calculates a polynomial function of 37 to an :math:`n`th degree, :math:`n` being the size of the string: .. math:: \sum_{i=0}^{KeySize - 1} Key\left[KeySize - i - 1 \right] \times 37^i .. admonition:: Activity :class: activity Is it a good hash function? .. important:: Types in Java, and especially, the type :code:`String` already possesses a method :code:`hashCode()` that return the hash value. Separate Chaining ================= How are we dealing collision? ----------------------------- We can have a good hash function, but it can always create a collision even if it's uncommon. A first strategy is known as **separate chaining**. The idea -------- * We represent the HashTable by a ArrayList (from the Java API) of list. * When a key has a hash that collides we add the key to the list corresponding to the vector index. If it's not very clear, here is an illustration: .. figure:: ./img/separate_chaining.* :align: center .. admonition:: Activity :class: activity * Create the class :code:`HashTableChain`. * Create the constructor * Create the private data members. .. literalinclude:: ./HashTableChain.java :language: java :linenos: :lines: 5-7, 15, 17, 110 .. Hashing .. ------- .. Now we need to implement the hash function. .. In C++, it is usually done with a function object template: .. .. code-block:: cpp .. template .. class hash{ .. public: .. size_t operator() (const Key &k) const; .. }; .. It is already implemented for basic types such as :code:`int` or :code:`string`. .. .. admonition:: Example .. The code given in the third example would be written like this: .. .. code-block:: cpp .. :linenos: .. template <> .. class hash{ .. public: .. size_t operator() (const string &key){ .. size_t hashVal = 0; .. for(char ch: key){ .. hashVal = 37*hashVal+ch; .. } .. return hashVal; .. } .. }; .. The type :code:`size_t` is an unsigned integer type that represents the size of an object. .. It guarantees that it can store an array index. .. We can finally see how to implement :code:`myhash()`. .. .. code-block:: cpp .. size_t myhash(const HashObj &x) const{ .. static hash hf; .. return hf(x) % theLists.size(); .. } Example ======= We will see a practical example. Consider the following class :code:`Employee`: .. literalinclude:: ./Employee.java :language: java :linenos: :lines: 3-40, 54-55, 57 .. admonition:: Activity :class: activity * Implement :code:`hashCode()` Implementation ============== There are 5 methods that need to be implemented: * get * put * remove * size * isEmpty .. literalinclude:: ./KWHashMap.java :language: java :linenos: :lines: 3-39 isEmpty ------- The method is simple, return if the hash table is empty or not. get --- It is also simple. * You start by hashing the object. * Then, you get the list in the vector. * Finally, you try to find the element in this list. remove ------ The first steps are the same that for :code:`get`. If the element is not found, return false. Otherwise, erase the element, reduce :code:`numKeys` by 1 and return true. put --- The first steps are the same that for :code:`get`. If the element is not inside the list, insert it. Increase :code:`numKeys` by 1, and if :code:`numKeys` is greater than the size of the list :code:`rehash`. Finally return true. .. admonition:: Activity :class: activity Implement the methods.