AVL Trees!

“Don’t stop when you are tired; stop when you are done” – Unknown

This post is a part of the IIT-M CSE “Concept-Sunday” series.


Our topic for today is AVL trees – a self balancing binary search tree.


Although AVL trees have been losing their popularity to red-black trees lately, it’s worth exploring them considering that the AVL tree is the first height-balanced binary search tree. AVL is named after its two inventors, G.M. Adelson-Velsky and E.M. Landis, who introduced these special trees in their 1962 paper titled “An algorithm for the organization of information.”

Balancing act:

They say life is all about maintaining balance: balanced diet, work-life balance, blah blah... And computer scientists are not left far behind; they too jumped into this race of finding balance and coined what are known as balanced search trees. In layman’s terms, when the heights of the left and right sub-trees of a binary search tree (BST) are almost the same, it becomes a height-balanced binary search tree.


In case you are finding it hard to recall what binary search trees are and are tempted to switch to a new tab, let me add a one-liner before you do that. A binary tree organises data in a tree structure with each node having a left child and a right child; there is no implicit relation between a node and its children.

Binary trees by themselves are not very useful, so binary search trees popped up, wherein the nodes of the tree follow the property that the key of the left child is always less than or equal to the key of the current node; similarly, the key of the right child is always greater than or equal to the key of the current node. This lets us search for an element in O(log n) time in a balanced tree, and even other operations like insertion and deletion take logarithmic time (at each node we make a binary decision, discarding half of the elements).
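That search property can be sketched in a few lines of Python (a minimal illustration with a bare-bones node class, not a production implementation):

```python
class Node:
    """A bare-bones BST node for illustration."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_search(node, key):
    # At each node we make one comparison and discard one whole
    # sub-tree, which is where the O(log n) behaviour comes from
    # when the tree is balanced.
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None
```

For example, in a small tree rooted at 8 with children 3 and 10, `bst_search(root, 6)` walks left then right, never visiting the sub-tree under 10.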


If the elements are presented in random order, then our good old BSTs work just fine. But if the elements arrive in sorted or somewhat sorted order, the BST gets skewed, with far more nodes in the left sub-tree or the right sub-tree. As a consequence, the tree data structure famous for its O(log n) behaviour starts showing O(n) behaviour, which is undesirable. We therefore re-balance the BST after each insertion and deletion to make sure that we keep a height-balanced tree and a complexity of O(log n) for most of our operations.
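To see the degeneration concretely, here is a quick Python sketch of a plain, unbalanced BST: feeding it already-sorted keys produces a right-leaning chain whose height equals the number of keys.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    """Plain BST insert with no re-balancing."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 8):        # keys arrive in sorted order: 1..7
    root = insert(root, k)

print(height(root))          # 7 -- every node hangs off the right: O(n) height
```

The same seven keys inserted in the order 4, 2, 6, 1, 3, 5, 7 would give a perfectly balanced tree of height 3, which is exactly what AVL balancing guarantees automatically.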

Applications :


These are also known as self-balancing trees. The most commonly used balanced trees are AVL trees, red-black trees and splay trees. We use these self-balancing trees in our everyday life, in most cases without knowing it. The operating system kernel makes extensive use of them for memory management, file system management and networking-related tasks. To be specific, file system directory entries are tracked using red-black trees; likewise, the VMAs (virtual memory areas) of a process are tracked using such trees. If you are a C++ programmer, how can you escape std::set and std::map, which internally use balanced trees? Moreover, the Linux kernel keeps track of the identification field in the IPv4 packet header using an AVL tree indexed by peer IP address. Generally, AVL trees are used for lookup-intensive tasks, as they perform better than red-black trees because of their stricter balance criteria.

We never really get a chance to implement or use such trees directly because they are buried deep inside standard libraries and APIs, and hence we eventually fail to appreciate their beauty!

So it starts with binary trees, moves on to binary search trees and concludes at height-balanced binary search trees. The single most important thing about these trees is the balancing operations, which are nothing but tree rotation operations.

Enough of theory for today, let’s see how the AVL tree performs its self-balancing act.

Balancing operations (LL, RR, LR, RL rotations):


It’s time to get technical 🙂. AVL trees maintain an extra piece of information in each node to support the balancing operation. This information is the balance factor: the difference between the heights of the node’s left and right sub-trees. The AVL tree enforces that this factor is always one of {-1, 0, 1} for every node. Any deviation from these three values results in an unbalanced tree that we need to take care of.
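In code, the balance factor is a one-liner once we can compute heights. A minimal Python sketch (a real AVL implementation would cache heights in the nodes instead of recomputing them):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def height(node):
    # The height of an empty sub-tree is taken as 0 in this sketch.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """height(left sub-tree) - height(right sub-tree).
    An AVL tree keeps this in {-1, 0, 1} at every node."""
    return height(node.left) - height(node.right)

# A node whose left sub-tree is two levels deep while the right is empty:
root = Node(5)
root.left = Node(3)
root.left.left = Node(1)
print(balance_factor(root))   # 2 -> outside {-1, 0, 1}, so a rotation is due
```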

To start with, we assume that the tree is initially balanced, i.e. the balance factor of each node is one of {-1, 0, 1}. The only operations that can change this state of the tree are insertion and deletion; so after each insertion and deletion we check whether any node has an undesirable balance factor, and apply the balancing algorithm if there is one.


The balancing operations apply rotations to parts of the tree in order to restore the balance factor. Instead of confusing you with the different types of rotations, let’s first look at the simple intuition that rotation is based on.

Imagine the left sub-tree’s height is more than the right sub-tree’s: we simply need to move nodes from the left sub-tree (or the root itself) to the right sub-tree, i.e. we need a right rotation. On similar lines, if the height of the right sub-tree is more than that of the left sub-tree, we make a left rotation. These simple, intuitive left and right rotations form the basis of the rotation algorithms.

There are two kinds of rotations: single rotations and double rotations. While we have already talked about single rotations, a double rotation is simply a combination of single rotations. There are two types of single rotations, LL rotation and RR rotation, and two types of double rotations, LR rotation and RL rotation.



Single rotations (LL and RR):

[Image: left rotation (right-right insertion)]

[Image: right rotation (left-left insertion)]

LL implies that an insertion has been made on the left side of the left sub-tree (of the node whose balance factor is violated). This naturally implies that the left sub-tree’s height is greater, and we need a right rotation. Similarly, RR implies that an insertion has been made on the right side of the right sub-tree, and we now need a left rotation to balance the tree.
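The two single rotations can be sketched in a few lines of Python (node fields as before; the height/balance-factor bookkeeping a full AVL implementation would update is omitted for brevity):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(y):
    """LL case: the left sub-tree is too tall, so lift the left child."""
    x = y.left
    y.left = x.right      # x's right sub-tree is re-parented under y
    x.right = y
    return x              # x becomes the new root of this sub-tree

def rotate_left(x):
    """RR case: the mirror image of rotate_right."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

# The left-skewed chain 3 -> 2 -> 1 is fixed by one right rotation:
root = Node(3)
root.left = Node(2)
root.left.left = Node(1)
root = rotate_right(root)
print(root.key, root.left.key, root.right.key)   # 2 1 3
```

Note that each rotation moves only a constant number of pointers, which is why re-balancing does not spoil the O(log n) cost of insertion.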

Double rotations (LR and RL):

[Image: left-right rotation (L-R insertion)]

[Image: right-left rotation (R-L insertion)]

We need a double rotation (making two rotations :P) when the insertion is made on the left side of the right sub-tree or the right side of the left sub-tree (of the node whose balance factor is violated). If the insertion is made on the left side of the right sub-tree (the RL case), we first make a right rotation on the right sub-tree and then a left rotation on the current node. Similarly, if an insertion is done on the right side of the left sub-tree (the LR case), we make a left rotation on the left sub-tree and then a right rotation on the current node.
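Since each double rotation is just two single rotations, the code falls out directly. A self-contained Python sketch (the single rotations are repeated here for completeness):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y

def rotate_left_right(node):
    """LR case: left rotation on the left child, then right rotation here."""
    node.left = rotate_left(node.left)
    return rotate_right(node)

def rotate_right_left(node):
    """RL case: right rotation on the right child, then left rotation here."""
    node.right = rotate_right(node.right)
    return rotate_left(node)

# LR example: 2 was inserted on the right side of 3's left child 1,
# so a single right rotation would not fix the zig-zag shape.
root = Node(3)
root.left = Node(1)
root.left.right = Node(2)
root = rotate_left_right(root)
print(root.key, root.left.key, root.right.key)   # 2 1 3
```

The first rotation straightens the zig-zag into a straight chain, and the second is then the ordinary single-rotation fix from the previous section.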

Similar rotations are used in case of deletion!


Feedback is welcome 🙂 !
