*****
Trees
*****

Trees are a crucial concept in computer science. When you have a large amount of data, the linear access time (:math:`O(N)`) of linked lists is prohibitive. We would like a complexity of at most :math:`O(\log N)` to access data. Trees allow us to do that, especially the binary search tree.

Concept
=======

A tree can be defined in several ways, but a natural way is recursively. In this representation, a tree is composed of a collection of nodes:

* The collection can be empty.
* Otherwise, it consists of a **root** (noted :math:`r`),
* with zero or more subtrees :math:`T_1,\dots,T_k`,
* each of whose roots is connected by an edge from :math:`r`.

The root of each subtree is called a **child** of :math:`r`, and :math:`r` is the parent of each subtree root. Here is a figure to visualize the structure.

.. figure:: ./img/FG_04_001.*
   :align: center
   :width: 120%

|

This definition has some implications:

* A tree is a collection of :math:`N` nodes.
* It has :math:`N-1` edges.
* Every node has one parent, except the root.

The next figure illustrates that:

.. figure:: ./img/FG_04_002.*
   :align: center
   :width: 120%

|

OK, we have some more vocabulary to review:

* Nodes without children are called **leaves**.
* Nodes with the same parent are called **siblings**.

.. admonition:: Activity
   :class: activity

   * What is the parent of F?
   * What are the children of F?
   * Give all the leaves of the tree.
   * How many siblings does K have?

Now, we can see the last few definitions that you should already know.

Path
   A path from node :math:`n_1` to :math:`n_k` is defined as a sequence of nodes :math:`n_1,n_2,\dots,n_k` such that :math:`n_i` is the parent of :math:`n_{i+1}` for :math:`1\leq i < k`.

Tree traversal
--------------

It is important to discuss how we can navigate through a binary tree. As the data stored in a binary tree is not necessarily ordered, the different algorithms to **traverse** the tree are not based on the data, but on the structure of the tree itself.
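To make this concrete, here is a minimal sketch of recursive traversals on a bare-bones binary node class. The :code:`Node` fields and the method names are my own for illustration; they are not the ones from the course's :code:`BinaryTree.java`. Each method visits the root either before (preorder), between (inorder), or after (postorder) the recursive calls on the two subtrees.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: minimal binary node and the three classical traversals.
public class Traversals {
    public static class Node {
        int value;
        Node left, right;
        public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Preorder: visit the root, then the left subtree, then the right subtree.
    public static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.value);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Inorder: left subtree, root, right subtree.
    public static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.value);
        inorder(n.right, out);
    }

    // Postorder: left subtree, right subtree, root.
    public static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.value);
    }
}
```

Note that the three methods differ only in where :code:`out.add` is placed relative to the two recursive calls; that single choice is the whole difference between the orders.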
We have three **tree traversal** algorithms:

* Preorder: visit the root node, traverse :math:`T_L`, then traverse :math:`T_R`.
* Inorder: traverse :math:`T_L`, visit the root node, then traverse :math:`T_R`.
* Postorder: traverse :math:`T_L`, traverse :math:`T_R`, then visit the root node.

.. admonition:: Activity
   :class: activity

   * What type of algorithm is it?
   * Give the visit sequence for each order for the following tree, which represents the equation :math:`(x+y)*((a+b)/c)`.

   .. figure:: ./img/tree_traversal.drawio.png
      :align: center

The Search Tree ADT
===================

An important application of binary trees is searching. Let us assume that each node in the tree stores an item. For the following examples, we will consider that it stores integers.

What is the difference between a binary tree and a binary search tree? In a **binary search tree**, for every node :math:`X`:

* the values of all the items in its *left subtree are smaller* than the item in :math:`X`,
* the values of all the items in its *right subtree are greater* than the item in :math:`X`.

.. note::
   It implies that all the elements in the tree can be ordered in some consistent manner.

.. admonition:: Activity
   :class: activity

   * Are both trees binary search trees?

   .. figure:: ./img/FG_04_015.svg
      :align: center

|

For the remainder, I will use the following class:

.. literalinclude:: ./BinaryTree.java
   :language: java
   :linenos:
   :lines: 5-15, 36-39

As you can see, the tree is only composed of the root of type :code:`Node`.

Searching
---------

The main utility of the binary search tree is to search for a specific element inside it. Indeed, all the elements are sorted in a binary search tree, so we can exploit this to obtain a very efficient search algorithm. We need two methods:

* a public one that has one parameter: the element we want to find;
* a private one, which will be recursive and compare the element with the current node.

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 35-40, 42, 44, 97-103, 114-115

The idea is very simple:

* Take a node of the tree (starting with the root).
* If the node is empty, return false.
* If the element is the one in the current node, return true.
* Otherwise:

  * if the element is greater, launch the recursion on the right child;
  * if the element is less, launch the recursion on the left child.

.. admonition:: Activity
   :class: activity

   * Implement both methods:

   .. literalinclude:: ./BinarySearchTree.java
      :language: java
      :linenos:
      :lines: 42, 44, 103, 114

Why are BSTs good for searching? We will do the algorithm analysis later, but finding an element, or returning false, takes at most :math:`O(\log N)` on a balanced tree. Each recursive call descends one level, so at each step we are operating on a subtree that is roughly half as large.

Find Min and Find Max
*********************

Finding the minimum and the maximum in a binary search tree is a special case of :code:`contains`. These routines are very simple:

* :code:`findMin`: start at the root and go left as long as there is a left child;
* :code:`findMax`: the same routine, except it goes to the right each time.

I will not show you the code, because it is very simple.

Inserting
---------

Binary search trees would not be as useful if we did not have a simple method to add a new node to the tree.

.. note::
   For now, consider that each element is unique.

As for the searching, we need a public method and a private one:

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 12-17, 19, 22, 116-122, 140

The insertion routine is simple:

* Start with the root.
* If the node is empty, insert the new element there.
* If the element is the one in the current node, do nothing.
* Otherwise:

  * if the **new** element is greater, launch the recursion on the right child;
  * if the **new** element is less, launch the recursion on the left child.

In the following example, we insert 5 in the tree following the previous routine:

.. figure:: ./img/FG_04_022.*
   :align: center

|

As you can see, nothing complicated.

.. admonition:: Activity
   :class: activity

   * Implement both methods:

   .. literalinclude:: ./BinarySearchTree.java
      :language: java
      :linenos:
      :lines: 19, 22, 122, 140

   * What is the time complexity of inserting a new element?

Removing
--------

It is very common for data structures (ADTs) that the hardest operation is deletion. We know how to find an element in a binary search tree, but once the element we want to remove is found, how do we proceed? It depends on where the node is.

Case 1 - The node is a leaf
***************************

If the node you want to delete is a leaf, we can delete it immediately.

Case 2 - The node has one child
*******************************

This is a simple case too. We need to link the parent of the node we want to delete to its child. Here is a simple example:

.. figure:: ./img/FG_04_024.*
   :align: center

   Deletion of node :math:`4`, which has one child, before and after.

Case 3 - The node has two children
**********************************

If the node you need to delete has two children, everything becomes more complicated. The general strategy is:

* Replace the node's data by the smallest data in its right subtree.
* Recursively delete the node that held that smallest data.

Because the smallest node in the right subtree cannot have a left child, the second :code:`remove` is an easy one. The following figure illustrates this situation:

.. figure:: ./img/FG_04_025.*
   :align: center

   Deletion of node :math:`2`, which has two children, before and after.

The code that works for the three cases:

.. literalinclude:: ./BinarySearchTree.java
   :language: java
   :linenos:
   :lines: 142-183

.. admonition:: Activity
   :class: activity

   * Convince yourself that it works.

This implementation is not efficient. We use :code:`findMin` first, then we remove this element using :code:`delete`. There is a way to implement it that avoids the repeated work, but it is outside the scope of this review.
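The three cases above can be sketched in a few lines. This is my own minimal illustration, not the course's :code:`BinarySearchTree.java`; the :code:`Node` fields and method names are assumptions, and it follows the inefficient find-min-then-delete strategy described above.

```java
// Illustrative sketch of BST removal covering the three cases.
public class RemoveSketch {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    // Returns the (possibly new) root of the subtree after removing x.
    static Node remove(Node t, int x) {
        if (t == null) return null;               // element not found: nothing to do
        if (x < t.element) {
            t.left = remove(t.left, x);
        } else if (x > t.element) {
            t.right = remove(t.right, x);
        } else if (t.left != null && t.right != null) {
            // Case 3: two children. Copy the smallest element of the right
            // subtree here, then recursively delete that element from it.
            t.element = findMin(t.right).element;
            t.right = remove(t.right, t.element);
        } else {
            // Cases 1 and 2: zero or one child. Splice the node out by
            // returning its only child (or null for a leaf).
            t = (t.left != null) ? t.left : t.right;
        }
        return t;
    }

    static Node findMin(Node t) {
        while (t.left != null) t = t.left;
        return t;
    }
}
```

Note that the second recursive :code:`remove` call in case 3 always hits case 1 or 2, because the minimum of the right subtree has no left child.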
Average-Case Analysis
---------------------

Intuitively, we estimate that most operations have a time complexity of :math:`O(\log N)`. We descend a level in the tree in constant time, so the operations have a running time of :math:`O(d)`, where :math:`d` is the depth of the node we reach. As I said, this is our intuition, but we can try to prove it. What we want to prove is that the average depth over all nodes in a tree is :math:`O(\log N)`. To do that, we need a small definition.

Internal path length
   The sum of the depths of all nodes in a tree is known as the internal path length.

Proof
*****

Let :math:`D(N)` be the internal path length for some tree :math:`T` of :math:`N` nodes.

* :math:`D(1) = 0`.
* An :math:`N`-node tree consists of

  * an :math:`i`-node left subtree, of internal path length :math:`D(i)`,
  * an :math:`(N-i-1)`-node right subtree, of internal path length :math:`D(N - i - 1)`,
  * and a root at depth 0, for :math:`0\leq i < N`.

In the main tree, all the subtree nodes are one level deeper than in their own subtree, which adds :math:`N-1` to the total. We get the recurrence

.. math::

   D(N) = D(i) + D(N - i - 1) + N - 1

Assume that all subtree sizes are equally likely, which is true for binary search trees (since the subtree size depends only on the relative rank of the first element inserted into the tree).

.. important::
   It is not true for binary trees!

Then the average value of both :math:`D(i)` and :math:`D(N-i-1)` is :math:`(1/N)\sum_{j=0}^{N-1}D(j)`. This yields

.. math::

   D(N) = \frac{2}{N}\left[ \sum_{j=0}^{N-1}D(j) \right] + N - 1

We will not solve this recurrence today, but it yields an average value of :math:`D(N) = O(N \log N)`. Thus the expected depth of any node is :math:`O(\log N)`.

Limits
******

It is tempting to say that this implies that the average running time of all operations is :math:`O(\log N)`. However, it is not entirely true. The delete routine favours making the left subtrees deeper than the right ones, because we always replace a deleted node with a node from its right subtree.
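For the curious, here is a sketch of how the recurrence unrolls; it is the same telescoping trick used in the quicksort average-case analysis, and you can skip it on a first reading. Multiply the recurrence by :math:`N`, then subtract the same equation written for :math:`N-1`:

.. math::

   N\,D(N) = 2\sum_{j=0}^{N-1}D(j) + N(N-1)

.. math::

   (N-1)\,D(N-1) = 2\sum_{j=0}^{N-2}D(j) + (N-1)(N-2)

Subtracting cancels almost all of the sum and leaves

.. math::

   N\,D(N) = (N+1)\,D(N-1) + 2(N-1)

Dividing by :math:`N(N+1)` gives a telescoping relation whose right-hand terms are bounded by :math:`2/N`:

.. math::

   \frac{D(N)}{N+1} = \frac{D(N-1)}{N} + \frac{2(N-1)}{N(N+1)}

so :math:`D(N)/(N+1)` grows like a harmonic sum, i.e. :math:`O(\log N)`, and therefore :math:`D(N) = O(N \log N)`.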
The shape of the tree also depends on the root and on the order in which new nodes are inserted. Not all binary search trees are **balanced**. It is something we will see in the next topic.
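You can see the unbalanced worst case with a minimal sketch (again my own toy BST, not the course's class): inserting already-sorted keys produces a degenerate, list-like tree, so the depth is :math:`N-1` and searches degrade to :math:`O(N)`.

```java
// Illustrative sketch: sorted insertions produce a degenerate BST.
public class DegenerateDemo {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);
        if (x < t.element) t.left = insert(t.left, x);
        else if (x > t.element) t.right = insert(t.right, x);
        return t; // duplicates are ignored, as in the note above
    }

    // Height = number of edges on the longest root-to-leaf path.
    static int height(Node t) {
        if (t == null) return -1;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        Node root = null;
        for (int i = 1; i <= 100; i++) root = insert(root, i); // sorted input
        // Every node has only a right child: the tree is effectively a
        // linked list of depth 99, so searching costs O(N), not O(log N).
        System.out.println(height(root)); // prints 99
    }
}
```

Balanced trees, the subject of the next topic, prevent exactly this degeneration.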