******* B-Trees ******* In this topic we will discuss nonbinary trees. We will start with the *2-3-tree* that is a simpler version of the general B-tree. 2-3-tree ======== A *2-3-tree* is a tree that has 2 or 3 children for each node. These nodes are called *2-nodes* and *3-nodes*: * 2-nodes: * There are similar to the binary tree search nodes. * They contain one attribute and references to two children. .. figure:: ./img/2_trees.drawio.png :align: center * 3-nodes: * There are different as they contain two attributes ordered so the first one is less than the second. It also references three children. * The first child containing a value less than the first attribute. * The second has a value between the first and second attributes. * And the last has a value greater than the second attribute. .. figure:: ./img/3_trees.drawio.png :align: center .. important:: The main property is that all the leaves are at the lowest level of the tree. This is how this type of tree maintain the balance! .. admonition:: Activity :class: activity * Consider the following tree. * Ensure that the property of a 2-3-tree is respected. .. figure:: ./img/2_3_tree_example.drawio.png :align: center | * Think of a search strategy to find an element in a 2-3-tree. Operations ---------- 2-3-trees are different from the regular binary search trees that we covered until now, so we need to redefine all the operations: * Search * Insert * Remove Search ------ The search algorithm is more similar to the binary search tree that we could expect. The pseudocode of the search can be written as follow: .. code-block:: python :linenos: if the local root is null: return null else if 2-node: if item = data1: return data1 else if item < data1: recursively search the left subtree else: recursively search the right subtree else: if item = data1: return data1 else if item = data2: return data2 else if item < data1: recursively search the left subtree else if item < data2: recursively search the middle subtree else: recursively search the right subtree .. admonition:: Activity :class: activity * Apply the previous algorithm to find if :code:`13` is in the following tree. .. figure:: ./img/2_3_tree_example.drawio.png :align: center Insertion --------- When we insert a new item, we need to make sure that the tree stay balanced. The main difference with 2-3-trees is that the balance is down by building the tree from the bottom up and not the top down. The idea: * Start by finding where to insert the new item by using the search method defined previously. * Insert the new item **inside** a node and **not attached** to a node. Contrary to a regular binary search tree, we can be in three cases: * Case 1: Inserting into a 2-node leaf. * Case 2: Inserting into a 3-node leaf with 2-node parent. * Case 3: Inserting into a 3-node leaf with 3-node parent. Case 1: 2-node leaf ******************* This is the easiest case. If the leaf is a 2-node then there is enough *space* to insert an element inside it, thus creating a 3-node. .. admonition:: Example :class: example * Consider the tree on the left. * If we insert :code:`15` inside it, then we find a 2-node, :code:`11`, that can accept a new item inside it. .. figure:: ./img/2_3_tree_insert_case_1.drawio.png :align: center Case 2: 3-node leaf with 2-node parent ************************************** This case happens when the node we want to insert the item is already a 3-node and the parent is a 2-node parent. As 3-node cannot store three values, the strategy is to transform the parent node (which is a 2-node) to a 3-node by inserting the middle value in the parent directly. However, if we do that the smallest item is not at the correct position in the tree, so we need to create a middle node for the parent containing the smallest value. .. admonition:: Example :class: example * Consider the tree in the previous example. * If we insert :code:`17` inside it, then we find a 3-node :code:`11, 15`. * This node cannot contain three values, so we take the middle value :code:`15` and insert it in the parent :code:`7`. * Now, :code:`11` is misplaced, so create a middle node of :code:`7, 15` with :code:`11`. * The following trees shows the insertion step by step: .. figure:: ./img/2_3_tree_insert_case_2.drawio.png :align: center .. admonition:: Activity :class: activity * Insert :code:`5`, :code:`10` and :code:`20`. Case 3: 3-node leaf with 3-node parent ************************************** This case happens when the node we want to insert the item is already a 3-node and the parent is also a 3-node parent. As 3-node cannot store three values, we need to propagate up the middle value of the virtual node, as seen before. However, the parent node is also a 3-node, so we apply the same strategy and propagate up again. .. admonition:: Example :class: example * Consider the following example. .. figure:: ./img/case_3/2_3_tree_insert_case_3_1.drawio.png :align: center | * Now, if we insert :code:`13`, then a 3-node get a third value. .. figure:: ./img/case_3/2_3_tree_insert_case_3_2.drawio.png :align: center | * To solve that we propagate up the middle value (:code:`11`) of the node. .. figure:: ./img/case_3/2_3_tree_insert_case_3_3.drawio.png :align: center | * The parent was a 3-node, so we repeat the strategy and propagate up the middle value, creating a new root with two children. .. figure:: ./img/case_3/2_3_tree_insert_case_3_4.drawio.png :align: center | * We can see that the properties of a 2-3-tree are followed. The general pseudocode for insertion can be written as follow: .. code-block:: if root is null: Create a 2-node with the new item else if item in root: return False else if root is leaf: if root is 2-node: expand 2-node to 3 and insert item else: Create two 2-nodes and propagate up the new parent and the right child. else: if item < smaller item in local root: insert left else if local root is 2-node: insert right else if item < larger item in local root: insert middle else: insert right if new parent was propagated up: if new parent will be the tree root: Create 2-node with new parent Left child is old parent Right child is the propagated up child The 2-node becomes the new parent else: Insert new parent at local root return True Analysis -------- Before going further we need to discuss the advantages of a 2-3-tree. .. admonition:: Activity :class: activity * Discuss the advantages of a 2-3-tree against AVL trees or Red-Black Trees. The number of items that can contain a 2-3-tree of height :math:`h` is between :math:`2^h - 1` and :math:`3^h - 1`. Meaning that the height of 2-3-tree is between :math:`\log_3 n` and :math:`\log_2 n`. Thus the search time is :math:`O(\log n)`. Deleting -------- Deleting an item is very similar to inserting an item. The idea: * Find the item to remove. * If the item is a leaf -> remove it * Otherwise we swap it with its inorder predecessor in a leaf node and delete it in the leaf node. * If removing an item empty a leaf. * Items from the sibling and parent can be redistributed into that leaf. * Or the leaf can be merged with its parent and sibling. .. admonition:: Example :class: example * Consider the following tree: .. figure:: ./img/2_3_tree_delete/delete_1.drawio.png :align: center | * If we remove :code:`13`, the node becomes empty and :code:`15` in the parent node will not have left child. * We can merge :code:`15` with the right child creating the virtual node :code:`15, 17, 19`. * Then, we propagate up :code:`17` similarly to the insertion strategy. .. figure:: ./img/2_3_tree_delete/delete_2.drawio.png :align: center | * Now we remove :code:`11`. * As it's not a leaf we replace it with its predecessor :code:`9`. * Now the leaf is empty, so we merge its right child :code:`15`. .. figure:: ./img/2_3_tree_delete/delete_3.drawio.png :align: center | * Finally, we remove :code:`1`. * First we merge :code:`3` with its right child :code:`5`; it empties :code:`3`. * So the process repeat with :code:`7` and its right child :code:`17`. * It becomes the root node. .. figure:: ./img/2_3_tree_delete/delete_4.drawio.png :align: center | B-Trees ======= The 2-3-tree is inspired the B-trees which allow up to :math:`n` children per node, where :math:`n` can be a very large number. Originally, B-trees are designed to build indexes to very large databases stored on a hard disk. In the B-tree, the maximum number of children is the order of the B-tree. The general properties for B-trees are: * Other than the root, each node has between :code:`order/2` and :code:`order` children. * The number of items in a node is :code:`number_of_children - 1`. * The data items in each node are ordered. Below is an example of a B-tree: .. figure:: ./img/b-tree/b_tree_example.drawio.png :align: center | .. admonition:: Actvity :class: activity * What is the order of the B-Tree? Insertion --------- The insertion is similar to the 2-3-trees: * The new item is always inserted in a leaf. * If the leaf is not full, we are done. * Otherwise, if the leaf is full it is split into two nodes with the middle node propagated up. * If the root is being split, then the middle item becomes the new root. .. admonition:: Example :class: example * If we take the previous B-tree and we insert :code:`17` we obtain the following new tree. .. figure:: ./img/b-tree/b_tree_insert_example.drawio.png :align: center | The pseudocode for insertion is: .. code-block:: if the root is null Create a new Node that contains item else search item in local root: if item in local root: return False else: if local root is leaf: if local root is null: insert new item return null as the newChild and True to indicate successful insertion else: split local root return newParent and newChild and True to indicate successful insertion else: recursively call insert if newChild is not null: if local root not null insert newParent and newChild in local root return null as the newChild and True to indicate successful insertion else: split local root return newParent and newChild and True to indicate successful insertion else: return success/fail indicator for the insertion .. important:: The code and pseudocode is the hardest covered in class. You're not expected to be able to implement it, but you should understand how it works. Deletion -------- Removing an item in a B-tree is a generalization of a removing an item from a 2-3-tree. We can distinguish two main cases: * The item is in a leaf. * Then we just remove it. * If the item is in an interior node: * It can't be deleted otherwise it will damage the B-tree properties. * The item must be replaced by its inorder predecessor. We also need to consider the case in which deleting an item in a leaf results in a leaf being half full. * In this case, items from a sibling node and parent are redistributed into that leaf. * However if the sibling is itself exactly half full, the leaf, its parent item and siblings are merged into a single node. .. admonition:: Example :class: example * Consider the following B-tree. * If we remove :code:`40`, we are in the second case man case. .. figure:: ./img/b-tree/b_tree_delete_01.drawio.png :align: center | * We replace its inorder predecessor. * In this case :code:`39`. .. figure:: ./img/b-tree/b_tree_delete_02.drawio.png :align: center | * Now, we want to remove :code:`18`. * If we remove it, the leaf will have less than half full. * Because the sibling is exactly half full, we can merge the parent item and the leaf together. .. figure:: ./img/b-tree/b_tree_delete_03.drawio.png :align: center | * Once done, the parent node containing :code:`10` is less than half full. * The sibling is also exactly half full, so we repeat the procedure. .. figure:: ./img/b-tree/b_tree_delete_04.drawio.png :align: center | * The B-tree doesn't break any B-tree properties so we can stop.