AVL Tree – Part A

DinS          Written on 2017/5/7

1. Visually, what is an AVL tree?

See featured image above.

AVL, designed by G. M. Adelson-Velsky and E. M. Landis in 1962, is an improved version of B.S.T. it can control the height of tree to O(logn), making all operations to O(logn) time. This is a great triumph in the field of data structure.

2. Balance factor(平衡因子)

As a B.S.T., AVL must meet the Search Tree Property. Besides this, there’s something called balance factor for AVL. Balance factor is defined as the difference of height between left subtree and right subtree. In a more formal way, it is defined as this:

balFac(v) = height(lc(v)) – height(rc(v))

AVL must meet this property: for any given node v, |balFac(v)| <= 1. In other words, the difference of height between left subtree and right subtree must be equal or less than one.

Let’s check whether this tree is an AVL.

balFac(5) = 2 – 3 = -1

balFac(3) = 1 – 0 = 1

balFac(2) = 0 – 0 = 0

balFac(13) = 2 – 1 = 1

balFac(7) = 0 – 1 = -1

balFac(47) = 0 – 0 = 0

balFac(10) = 0 – 0 = 0

*Notice: in some books you’ll see the height of an empty tree is defined as -1, and the height of a root-only tree defined as 0. Anyway as long as the rule is consistent, it works the same way.

This means for any node v, |balFac(v)| <= 1. This tree is indeed an AVL.

3. Why balance factor helps control the height of tree?

OK, we have balance factor now, but what about the height? What we care about is the height of tree, not balance factor. In fact, as long as we have |balFac(v)| <= 1, we can show that for an AVL with height h, it at least contains fib(h+2) -1 nodes. Fib(x) is the Fibonacci sequence. To prove this we need to do some math here.

Proof:

Define S(h) as the least possible size of an AVL with height h. Size means how many nodes the tree has. |S| means the size of tree S.

Suppose we have an AVL called S with height h and has the least possible size. As it is a tree, it has root node r, and left subtree Sl and right subtree Sr. Then we have

|S| = 1 + | Sl | + | Sr |

Since S has height h, either Sl or Sr must have height h – 1, otherwise S won’t have height h. Considering the rule of balance factor, we know that the difference of height for the two subtrees can’t be bigger than 1. As a result, the height of the two subtrees must be h-1 and h-2. In addition, for S to have the least possible size, Sl and Sr also have to be the least possible size.

Now we have

S(h) = 1 + S(h-1) + S(h-2)

Add 1to both sides and we get

S(h) + 1 = [S(h-1) + 1] + [S(h-2) + 1]

Let T(h) = S(h) + 1. We get

T(h) = T(h-1) + T(h-2)

From the format we can conclude that T(h) must be Fibonacci number.

Now let’s consider the initial condition.

For h=0, tree contains no nodes, S(0) = 0. T(0) = 1.

For h=1, tree contains only the root node, S(1) = 1. T(1) = 2.

For h=2, the least possible size of tree is 2 nodes, one root plus a single child, S(2) = 2. T(2) = 3.

For h=3, the least possible size of AVL is 4 nodes, one root with 2 children. One child has a single child. S(3) = 4. T(3) = 5.

We know that the Fibonacci sequence is 1,           1,            2,          3,          5,   …

|                                                                                      fib(1),  fib(2),  fib(3),  fib(4),  fib(5) …

We can rewrite it as                                              1,       T(0),        T(1),   T(2),       T(3)  …

From here we can conclude that T(h) = fib(h+2)

S(h) = fib(h+2) – 1

Thus we have proved that for an AVL with height h, it at least contains fib(h+2) -1 nodes.

Intuitively, if AVL of height h has a lower bound of size, we know that it can’t degenerate to a list. But what’s the exact height?

The least possible size is just n. Based on the Fibonacci formula, we get

We know that this is an exponential express, thus we can get

h = O(logn)

Finally we have proved that the AVL can control the height of tree to O(logn)!

4. Search

AVL is basically a B.S.T., and the balance factor has nothing to do with Search Tree Property. In short, the search operation in AVL is exactly the same in a normal B.S.T.

This article continues at AVL Tree – Part B.