Topic 3 - Tree Review

Trees are a crucial concept in computer science.

I consider that you already have a good knowledge of simple trees, such as binary trees and ADT trees.

However, I will review them quickly, so we can compare them to more advanced trees.

When you have large amount of data, the linear access (\(O(N)\)) of linked lists is prohibitive.

We would like to have at most a complexity of \(O(\log N)\) to access data.

Trees allow us to do that, especially the binary search tree.

Concept review

A tree can be defined in several ways, but a natural is way is recursively.

In this representation a tree is composed of a collection of nodes.

  • The collection can be empty.

  • Or consisting of a root (noted \(r\)),

    • With zero or more subtrees \(T_1,\dots,T_k\).

    • Each of whose roots are connected by an edge from \(r\).

The root of each subtree is called a child of \(r\), and \(r\) is the parent of each subtree root.

Here is a figure to visualize the structure.

_images/FG_04_001.svg

This definition has some implications.

  • A tree is a collection of \(N\) nodes

  • It has \(N-1\) edges.

  • Every node has one parent, except the root.

The next figure illustrates that:

_images/FG_04_002.svg

OK, we have some more vocabulary to review:

  • Nodes without children are called leaves.

  • Nodes with the same parents are called siblings.

Activity

  • What is the parent of F?

  • What are the children of F?

  • Gives all the leaves of the tree.

  • How many siblings K has?

Now, we can see the last few definitions that you should already know.

Path

A path from node \(n_1\) to \(n_k\) is defined as a sequence of nodes \(n_1,n_2,\dots,n_k\) such that \(n_i\) is the parent of \(n_{i+1}\) for \(1\leq i<k\).

Note

In a tree there is only one path from the root to each node.

Depth

From any node \(n_i\), the depth is the length of the unique path from the root to \(n_i\).

Height

The height of a node \(n_i\) is the length of the longest path from \(n_i\) to a leaf.

Note

All leaves are at height 0.

The height of a tree is equal to the height of the root.

Note

The depth of a tree is equal to the depth of the deepest leaf; this is always equal to the height of the tree.

Activity

  • What is the height of the tree?

  • What is the depth of N?

  • What is the height of J?

Tree Implementations

Implementing a tree depends mostly of the type of tree.

For now, we can consider a general tree, you don’t suppose anything on the purpose of the tree.

The first way to implement a tree would be to have in each node a link to each child.

It seems logical, but is it efficient?

Spoiler, no it is not.

  • We don’t know how many nodes we will have.

  • We could waste a lot of space, if you create arrays.

A solution is to keep the children of each node in a linked list of tree nodes.

The code would be very similar to the following:

struct TreeNode
{
    Object element;
    TreeNode *firstChild;
    TreeNode *nextSibling;
};

The previous tree would be very different:

_images/FG_04_004.svg

Of course this type of representation has its limits, access time, etc…

Binary Trees

Binary trees are the most common and well-known trees in computer science.

As its name says, binary trees are trees in which no node can have more than two children.

Meaning that a non-empty binary tree has a root \(r\) and two subtrees \(T_L\) and \(T_R\). Both subtrees can be empty or not.

_images/FG_04_011.svg

A property of the binary trees is in average the deph is \(O(\sqrt N)\).

More importantly, the average depth for binary search tree is \(O(\log N)\).

Activity

  • What is the worst-case depth?

  • Give one example.

Implementation

Binary tree nodes only have at most two children.

It is possible to keep direct links to them.

The node should have the element that it contains plus two pointers to the left and right nodes.

Activity

  • Create a template class BinaryNode.

  • The class should contain the element of the node.

  • It should also contain two pointers for the left and right nodes.

  • Create the appropriate methods.

The Search Tree ADT

An important application of binary trees is searching.

Let us assume that each node in the tree stores an item. For the following example, we will consider it stores integers.

What’s the difference between a binary tree and a binary search tree?

In a binary search tree for every node \(X\),

  • the values of all the items in its left subtree are smaller than the item in \(X\)

  • the values of all the items in its right subtree are greater than the item in \(X\).

Note

It implies that all the elements in the tree can be ordered in some consistent manner.

Activity

  • Are both trees a binary tree search?

_images/FG_04_015.svg

For the remaining, I will use the following class:

 1template <typename Comparable>
 2class BinarySearchTree
 3{
 4
 5  private:
 6    struct BinaryNode
 7    {
 8        Comparable element;
 9        BinaryNode *left;
10        BinaryNode *right;
11
12        BinaryNode( const Comparable & theElement, BinaryNode *lt, BinaryNode *rt )
13          : element{ theElement }, left{ lt }, right{ rt } { }
14        
15        BinaryNode( Comparable && theElement, BinaryNode *lt, BinaryNode *rt )
16          : element{ std::move( theElement ) }, left{ lt }, right{ rt } { }
17    };
18
19    BinaryNode * clone( BinaryNode *t ) const;

As you can see, the tree is only composed of the root of type BinaryNode.

Searching

The main utility of the binary search tree is to search for a specific element inside it.

Indeed, all the elements are sorted in a binary search tree, so we can exploit this to have a very efficient search algorithm.

We need two methods:

  • a public one that has one parameter the element we want to find.

  • a private one, that will be recursive and compare the element with the current node.

 1template <typename Comparable>
 2class BinarySearchTree
 3{
 4  public:
 5
 6    /**
 7     * Returns true if x is found in the tree.
 8     */
 9    bool contains( const Comparable & x ) const;
10  private:
11
12    /**
13     * Internal method to test if an item is in a subtree.
14     * x is item to search for.
15     * t is the node that roots the subtree.
16     */
17    bool contains( const Comparable & x, BinaryNode *t ) const;
18};

The idea is very simple:

  • Take a node of the tree (starting with the root).

  • If the node is empty; return false.

  • If the element is the one in the current node; return true.

  • Otherwise:

    • If the element is greater, launch the recursion on the right node.

    • If the element is less, launch the recursion on the left node.

Activity

  • Implement both methods:

1bool BinarySearchTree::contains( const Comparable & x ) const
2{
3}
4bool BinarySearchTree::contains( const Comparable & x, BinaryNode *t ) const
5{
6}
Why are BSTs good for searching?

We will do the algorithm analysis later, but finding an element or returning false takes at most \(O(\log N)\).

Each recursion reduce the depth of the tree by 1.

We are operating on a tree that is half large.

Find Min and Find Max

Finding the minimum and the maximum in a binary search tree is a special case of contains.

These routines are very simple:

  • findMin: Start at the root and go left as long as there is a left child.

  • findMax: Same routine except it goes to the right each time.

I will not show you the code, as it is very simple.

Inserting

Binary search trees would not be as useful if we don’t have a simple method to add a new node in the tree.

Note

For now consider that each element is unique.

As for the searching, we need a public method and a private one:

 1template <typename Comparable>
 2class BinarySearchTree
 3{
 4  public:
 5    /**
 6     * Insert x into the tree; duplicates are ignored.
 7     */
 8    void insert( const Comparable & x );
 9  private:
10    /**
11     * Internal method to insert into a subtree.
12     * x is the item to insert.
13     * t is the node that roots the subtree.
14     * Set the new root of the subtree.
15     */
16    void insert( const Comparable & x, BinaryNode * & t );
17};

The insertion routine is simple:

  • Start with the root.

  • If the node is empty; insert the new element.

  • If the element is the one in the current node; Do nothing.

  • Otherwise:

    • If the new element is greater, launch the recursion on the right node.

    • If the new element is less, launch the recursion on the left node.

In the following example, we insert 5 in the tree following the previous routine:

_images/FG_04_022.svg

As you can see, nothing complicated.

Activity

  • Implement both methods:

1void BinarySearchTree::insert( const Comparable & x )
2{
3}
4void BinarySearchTree::insert( const Comparable & x, BinaryNode * & t )
5{
6}
  • What is the time complexity of inserting a new element?

Removing

It is very common for data structures (ADT) that the hardest operation is deletion.

We know how to find an element in a binary search tree, but when the element we want to remove is found, how do we proceed?

It depends where is the node.

Case 1 - The node is a leaf

The node you want to delete is a leaf, then we can delete it immediately.

Case 2 - The node has one child

It is a simple case too.

We need to link the parent of the node that we want to delete to its child.

Here is a simple example:

_images/FG_04_024.svg

Deletion of a node \(4\) with one child, before and after.

Case 3 - The node has two children

The node you need to delete has two children, it makes everything more complicated.

The general strategy is:

  • Replace the node by the one containing the smallest data in the right subtree.

  • Recursively delete that node.

Because the smallest node in the right subtree cannot have a left child, the second remove is an easy one.

The following figure illustrates this situation:

_images/FG_04_025.svg

Deletion of a node \(2\) with two children, before and after.

The code that works for the three cases:

 1void BinarySearchTree::remove( const Comparable & x, BinaryNode * & t )
 2{
 3    if( t == nullptr )
 4        return;   // Item not found; do nothing
 5    if( x < t->element )
 6        remove( x, t->left );
 7    else if( t->element < x )
 8        remove( x, t->right );
 9    else if( t->left != nullptr && t->right != nullptr ) // Two children
10    {
11        t->element = findMin( t->right )->element;
12        remove( t->element, t->right );
13    }
14    else
15    {
16        BinaryNode *oldNode = t;
17        t = ( t->left != nullptr ) ? t->left : t->right;
18        delete oldNode;
19    }
20}

Activity

  • Convince yourself that it works.

This implementation is not efficient. We use findMin first then we remove this element using remove.

There is a way to implement it to avoid the repetitive work, but it’s outside the scope of the review.

Average-Case Analysis

Intuitively, we estimate that most operations have a time complexity of \(O(\log N)\).

We descend a level in the tree in constant time, so the operations have a running time of \(O(d)\), where \(d\) is the depth of the tree.

As I said it’s our intuition, but we could try to prove it.

What we want to prove is that the average depth over all nodes in a tree is \(O(\log N)\).

To do that we need a small definition.

Internal path length

The sum of the depths of all nodes in a tree is known as the internal path length.

Proof

Let \(D(N)\) be the internal path length for some tree \(T\) of \(N\) nodes.

  • \(D(1) = 0\)

  • An \(N\)-node tree consists of

    • an \(i\)-node left subtree, of internal path \(D(i)\).

    • an \((N-i-1)\)-node right subtree, of internal path \(D(N - i - 1)\).

    • a root at depth 0, for \(0\leq i < N\).

In the main tree, all these nodes are one level deeper than the root.

We get the recurrence

\[D(N) = D(i) + D(N - i - 1) + N - 1\]

If all subtree sizes are equally likely, which is true for binary search trees (since the subtree size depends only on the relative rank of the first element inserted into the tree).

Important

It is not true for binary trees!

Then the average value of both \(D(i)\) and \(D(N-i-1)\) is \((1/N)\sum_{j=0}^{N-1}D(j)\).

This yields

\[D(N) = \frac{2}{N}\left[ \sum_{j=0}^{N-1}D(j) \right] + N - 1\]

We will not solve this recurrence today, but we obtain an average value of \(D(N) = O(N \log N)\).

Thus the expected depth of any node is \(O(\log N)\).

Limits

It is tempting to say that it implies that the average running time is \(O(\log N)\).

However, it is not entirely true.

The delete routine favours making the left subtrees deeper than the right, because we always replace the deleted node with a node from the right subtree.

It is also depending on the root and the new nodes inserted in the tree.

Not all binary search trees are balanced.

It is something we will see in the next topic.