*****
Trees
*****

Trees are a crucial concept in computer science. When you have a large amount of data, the linear access time (:math:`O(N)`) of linked lists is prohibitive. We would like a complexity of at most :math:`O(\log N)` to access data. Trees allow us to do that, especially the binary search tree.

Concept
=======

A tree can be defined in several ways, but a natural way is recursively. In this representation, a tree is composed of a collection of nodes:

* The collection can be empty.
* Otherwise, it consists of a **root** (noted :math:`r`),
* with zero or more subtrees :math:`T_1,\dots,T_k`,
* each of whose roots is connected by an edge from :math:`r`.

The root of each subtree is called a **child** of :math:`r`, and :math:`r` is the parent of each subtree root. Here is a figure to visualize the structure.

.. figure:: ./img/FG_04_001.*
   :align: center
   :width: 120%

|

This definition has some implications:

* A tree is a collection of :math:`N` nodes.
* It has :math:`N-1` edges.
* Every node has one parent, except the root.

The next figure illustrates that:

.. figure:: ./img/FG_04_002.*
   :align: center
   :width: 120%

|

OK, we have some more vocabulary to review:

* Nodes without children are called **leaves**.
* Nodes with the same parent are called **siblings**.

.. admonition:: Activity
   :class: activity

   * What is the parent of F?
   * What are the children of F?
   * Give all the leaves of the tree.
   * How many siblings does K have?

Now, we can see the last few definitions that you should already know.

Path
   A path from node :math:`n_1` to :math:`n_k` is defined as a sequence of nodes :math:`n_1,n_2,\dots,n_k` such that :math:`n_i` is the parent of :math:`n_{i+1}` for :math:`1\leq i < k`.

Tree traversal
--------------

It is important to discuss how we can navigate through a binary tree. As the data stored in a binary tree is not necessarily ordered, the different algorithms to **traverse** the tree are not based on the data, but on the structure of the tree itself.
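To make this concrete, here is a minimal sketch of recursive traversals on a bare-bones binary node class. The :code:`Node` fields and the method names are my own for illustration; they are not the ones from the course's :code:`BinaryTree.java`. Each method visits the root either before (preorder), between (inorder), or after (postorder) the recursive calls on the two subtrees.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: minimal binary node and the three classical traversals.
public class Traversals {
    public static class Node {
        int value;
        Node left, right;
        public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Preorder: visit the root, then the left subtree, then the right subtree.
    public static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.value);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Inorder: left subtree, root, right subtree.
    public static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.value);
        inorder(n.right, out);
    }

    // Postorder: left subtree, right subtree, root.
    public static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.value);
    }
}
```

Note that the three methods differ only in where :code:`out.add` is placed relative to the two recursive calls; that single choice is the whole difference between the orders.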
We have three **tree traversal** algorithms:

* Preorder: visit the root node, traverse :math:`T_L`, then traverse :math:`T_R`.
* Inorder: traverse :math:`T_L`, visit the root node, then traverse :math:`T_R`.
* Postorder: traverse :math:`T_L`, traverse :math:`T_R`, then visit the root node.

.. admonition:: Activity
   :class: activity

   * What type of algorithm is it?
   * Give the visit sequence for each order for the following tree, which represents the equation :math:`(x+y)*((a+b)/c)`.

   .. figure:: ./img/tree_traversal.drawio.png
      :align: center

The Search Tree ADT
===================

An important application of binary trees is searching. Let us assume that each node in the tree stores an item. For the following examples, we will consider that it stores integers.

What is the difference between a binary tree and a binary search tree? In a **binary search tree**, for every node :math:`X`:

* the values of all the items in its *left subtree are smaller* than the item in :math:`X`,
* the values of all the items in its *right subtree are greater* than the item in :math:`X`.

.. note::
   It implies that all the elements in the tree can be ordered in some consistent manner.

.. admonition:: Activity
   :class: activity

   * Are both trees binary search trees?

   .. figure:: ./img/FG_04_015.svg
      :align: center

|

For the remainder, I will use the following class:

.. literalinclude:: ./BinaryTree.java
   :language: java
   :linenos:
   :lines: 5-15, 36-39

As you can see, the tree is only composed of the root of type :code:`Node`.

Searching
---------

The main utility of the binary search tree is to search for a specific element inside it. Indeed, all the elements are sorted in a binary search tree, so we can exploit this to obtain a very efficient search algorithm. We need two methods:

* a public one that has one parameter: the element we want to find;
* a private one, which will be recursive and compare the element with the current node.

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 35-40, 42, 44, 97-103, 114-115

The idea is very simple:

* Take a node of the tree (starting with the root).
* If the node is empty, return false.
* If the element is the one in the current node, return true.
* Otherwise:

  * if the element is greater, launch the recursion on the right child;
  * if the element is less, launch the recursion on the left child.

.. admonition:: Activity
   :class: activity

   * Implement both methods:

   .. literalinclude:: ./BinarySearchTree.java
      :language: java
      :linenos:
      :lines: 42, 44, 103, 114

Why are BSTs good for searching? We will do the algorithm analysis later, but finding an element, or returning false, takes at most :math:`O(\log N)` on a balanced tree. Each recursive call descends one level, so at each step we are operating on a subtree that is roughly half as large.

Find Min and Find Max
*********************

Finding the minimum and the maximum in a binary search tree is a special case of :code:`contains`. These routines are very simple:

* :code:`findMin`: start at the root and go left as long as there is a left child;
* :code:`findMax`: the same routine, except it goes to the right each time.

I will not show you the code, because it is very simple.

Inserting
---------

Binary search trees would not be as useful if we did not have a simple method to add a new node to the tree.

.. note::
   For now, consider that each element is unique.

As for the searching, we need a public method and a private one:

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 12-17, 19, 22, 116-122, 140

The insertion routine is simple:

* Start with the root.
* If the node is empty, insert the new element there.
* If the element is the one in the current node, do nothing.
* Otherwise:

  * if the **new** element is greater, launch the recursion on the right child;
  * if the **new** element is less, launch the recursion on the left child.

In the following example, we insert 5 in the tree following the previous routine:

.. figure:: ./img/FG_04_022.*
   :align: center

|

As you can see, nothing complicated.

.. admonition:: Activity
   :class: activity

   * Implement both methods:

   .. literalinclude:: ./BinarySearchTree.java
      :language: java
      :linenos:
      :lines: 19, 22, 122, 140

   * What is the time complexity of inserting a new element?

Removing
--------

It is very common for data structures (ADTs) that the hardest operation is deletion. We know how to find an element in a binary search tree, but once the element we want to remove is found, how do we proceed? It depends on where the node is.

Case 1 - The node is a leaf
***************************

If the node you want to delete is a leaf, we can delete it immediately.

Case 2 - The node has one child
*******************************

This is a simple case too. We need to link the parent of the node we want to delete to its child. Here is a simple example:

.. figure:: ./img/FG_04_024.*
   :align: center

   Deletion of node :math:`4`, which has one child, before and after.

Case 3 - The node has two children
**********************************

If the node you need to delete has two children, everything becomes more complicated. The general strategy is:

* Replace the node's data by the smallest data in its right subtree.
* Recursively delete the node that held that smallest data.

Because the smallest node in the right subtree cannot have a left child, the second :code:`remove` is an easy one. The following figure illustrates this situation:

.. figure:: ./img/FG_04_025.*
   :align: center

   Deletion of node :math:`2`, which has two children, before and after.

The code that works for the three cases:

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 142-183

.. admonition:: Activity
   :class: activity

   * Convince yourself that it works.

This implementation is not efficient. We use :code:`findMin` first, then we remove this element using :code:`delete`. There is a way to implement it that avoids the repeated work, but it is outside the scope of this review.
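The three cases above can be sketched in a few lines. This is my own minimal illustration, not the course's :code:`BinarySearchTree.java`; the :code:`Node` fields and method names are assumptions, and it follows the inefficient find-min-then-delete strategy described above.

```java
// Illustrative sketch of BST removal covering the three cases.
public class RemoveSketch {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    // Returns the (possibly new) root of the subtree after removing x.
    static Node remove(Node t, int x) {
        if (t == null) return null;               // element not found: nothing to do
        if (x < t.element) {
            t.left = remove(t.left, x);
        } else if (x > t.element) {
            t.right = remove(t.right, x);
        } else if (t.left != null && t.right != null) {
            // Case 3: two children. Copy the smallest element of the right
            // subtree here, then recursively delete that element from it.
            t.element = findMin(t.right).element;
            t.right = remove(t.right, t.element);
        } else {
            // Cases 1 and 2: zero or one child. Splice the node out by
            // returning its only child (or null for a leaf).
            t = (t.left != null) ? t.left : t.right;
        }
        return t;
    }

    static Node findMin(Node t) {
        while (t.left != null) t = t.left;
        return t;
    }
}
```

Note that the second recursive :code:`remove` call in case 3 always hits case 1 or 2, because the minimum of the right subtree has no left child.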
Average-Case Analysis
---------------------

Intuitively, we estimate that most operations have a time complexity of :math:`O(\log N)`. We descend a level in the tree in constant time, so the operations have a running time of :math:`O(d)`, where :math:`d` is the depth of the node we reach. As I said, this is our intuition, but we can try to prove it. What we want to prove is that the average depth over all nodes in a tree is :math:`O(\log N)`. To do that, we need a small definition.

Internal path length
   The sum of the depths of all nodes in a tree is known as the internal path length.

Proof
*****

Let :math:`D(N)` be the internal path length for some tree :math:`T` of :math:`N` nodes.

* :math:`D(1) = 0`.
* An :math:`N`-node tree consists of

  * an :math:`i`-node left subtree, of internal path length :math:`D(i)`,
  * an :math:`(N-i-1)`-node right subtree, of internal path length :math:`D(N - i - 1)`,
  * and a root at depth 0, for :math:`0\leq i < N`.

In the main tree, all the subtree nodes are one level deeper than in their own subtree, which adds :math:`N-1` to the total. We get the recurrence

.. math::

   D(N) = D(i) + D(N - i - 1) + N - 1

Assume that all subtree sizes are equally likely, which is true for binary search trees (since the subtree size depends only on the relative rank of the first element inserted into the tree).

.. important::
   It is not true for binary trees!

Then the average value of both :math:`D(i)` and :math:`D(N-i-1)` is :math:`(1/N)\sum_{j=0}^{N-1}D(j)`. This yields

.. math::

   D(N) = \frac{2}{N}\left[ \sum_{j=0}^{N-1}D(j) \right] + N - 1

We will not solve this recurrence today, but it yields an average value of :math:`D(N) = O(N \log N)`. Thus the expected depth of any node is :math:`O(\log N)`.

Limits
******

It is tempting to say that this implies that the average running time of all operations is :math:`O(\log N)`. However, it is not entirely true. The delete routine favours making the left subtrees deeper than the right ones, because we always replace a deleted node with a node from its right subtree.
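For the curious, here is a sketch of how the recurrence unrolls; it is the same telescoping trick used in the quicksort average-case analysis, and you can skip it on a first reading. Multiply the recurrence by :math:`N`, then subtract the same equation written for :math:`N-1`:

.. math::

   N\,D(N) = 2\sum_{j=0}^{N-1}D(j) + N(N-1)

.. math::

   (N-1)\,D(N-1) = 2\sum_{j=0}^{N-2}D(j) + (N-1)(N-2)

Subtracting cancels almost all of the sum and leaves

.. math::

   N\,D(N) = (N+1)\,D(N-1) + 2(N-1)

Dividing by :math:`N(N+1)` gives a telescoping relation whose right-hand terms are bounded by :math:`2/N`:

.. math::

   \frac{D(N)}{N+1} = \frac{D(N-1)}{N} + \frac{2(N-1)}{N(N+1)}

so :math:`D(N)/(N+1)` grows like a harmonic sum, i.e. :math:`O(\log N)`, and therefore :math:`D(N) = O(N \log N)`.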
The shape of the tree also depends on the root and on the order in which new nodes are inserted. Not all binary search trees are **balanced**. It is something we will see in the next topic.
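You can see the unbalanced worst case with a minimal sketch (again my own toy BST, not the course's class): inserting already-sorted keys produces a degenerate, list-like tree, so the depth is :math:`N-1` and searches degrade to :math:`O(N)`.

```java
// Illustrative sketch: sorted insertions produce a degenerate BST.
public class DegenerateDemo {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);
        if (x < t.element) t.left = insert(t.left, x);
        else if (x > t.element) t.right = insert(t.right, x);
        return t; // duplicates are ignored, as in the note above
    }

    // Height = number of edges on the longest root-to-leaf path.
    static int height(Node t) {
        if (t == null) return -1;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        Node root = null;
        for (int i = 1; i <= 100; i++) root = insert(root, i); // sorted input
        // Every node has only a right child: the tree is effectively a
        // linked list of depth 99, so searching costs O(N), not O(log N).
        System.out.println(height(root)); // prints 99
    }
}
```

Balanced trees, the subject of the next topic, prevent exactly this degeneration.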