We have seen several ways to construct dictionaries: hash tables, unbalanced binary search trees, red-black trees, and skip lists. There are several factors that influence the choice of an algorithm:

· *Sorted output*. If sorted output is required, then hash tables are not a viable alternative. Entries are stored in the table based on their hashed value, with no other ordering. For binary trees, the story is different: an in-order tree walk will produce a sorted list. For example:

```c
void WalkTree(Node *P) {
    if (P == NIL) return;
    WalkTree(P->Left);
    /* examine P->Data here */
    WalkTree(P->Right);
}

WalkTree(Root);
```

To examine skip list nodes in order, simply chain through the level-0 pointers. For example:

```c
Node *P = List.Hdr->Forward[0];
while (P != NIL) {
    /* examine P->Data here */
    P = P->Forward[0];
}
```

· *Space*. The amount of memory required to store a value should be minimized. This is especially true if many small nodes are to be allocated.

  - For hash tables, only one forward pointer per node is required. In addition, the hash table itself must be allocated.

  - For red-black trees, each node has a left, right, and parent pointer. In addition, the color of each node must be recorded. Although this requires only one bit, more space may be allocated to ensure that the size of the structure is properly aligned. Therefore each node in a red-black tree requires enough space for 3–4 pointers.

  - For skip lists, each node has a level-0 forward pointer. The probability of having a level-1 pointer is ½; the probability of having a level-2 pointer is ¼. In general, the average number of forward pointers per node is

    n = 1 + 1/2 + 1/4 + ⋯ = 2.

· *Time*. The algorithm should be efficient. This is especially true if a large dataset is expected. Table 3-2 compares the search time for each algorithm. Note that worst-case behavior for hash tables and skip lists is extremely unlikely. Actual timing tests are described below.

· *Simplicity*. If the algorithm is short and easy to understand, fewer mistakes may be made. This not only makes *your* life easy, but the maintenance programmer entrusted with the task of making repairs will appreciate any efforts you make in this area. The number of statements required for each algorithm is listed in Table 3-2.

| method          | statements | average time | worst-case time |
|-----------------|------------|--------------|-----------------|
| hash table      | 26         | *O*(1)       | *O*(n)          |
| unbalanced tree | 41         | *O*(lg n)    | *O*(n)          |
| red-black tree  | 120        | *O*(lg n)    | *O*(lg n)       |
| skip list       | 55         | *O*(lg n)    | *O*(n)          |

**Table 3-2:** Comparison of Dictionaries

Average time for insert, search, and delete operations on a database of 65,536 (2¹⁶) randomly input items may be found in Table 3-3. For this test the hash table size was 10,009 and 16 index levels were allowed for the skip list. Although there is some variation in the timings for the four methods, they are close enough that other considerations should come into play when selecting an algorithm.

| method          | insert | search | delete |
|-----------------|--------|--------|--------|
| hash table      | 18     | 8      | 10     |
| unbalanced tree | 37     | 17     | 26     |
| red-black tree  | 40     | 16     | 37     |
| skip list       | 48     | 31     | 35     |

**Table 3-3:** Average Time (ms), 65,536 Items, Random Input

| order   | count  | hash table | unbalanced tree | red-black tree | skip list |
|---------|--------|------------|-----------------|----------------|-----------|
| random  | 16     | 4          | 3               | 2              | 5         |
| input   | 256    | 3          | 4               | 4              | 9         |
|         | 4,096  | 3          | 7               | 6              | 12        |
|         | 65,536 | 8          | 17              | 16             | 31        |
| ordered | 16     | 3          | 4               | 2              | 4         |
| input   | 256    | 3          | 47              | 4              | 7         |
|         | 4,096  | 3          | 1,033           | 6              | 11        |
|         | 65,536 | 7          | 55,019          | 9              | 15        |

**Table 3-4:** Average Search Time (µs)

Table 3-4 shows the average search time for two sets of data: a random set, where all values are unique, and an ordered set, where values are inserted in ascending order. Ordered input is a worst-case scenario for the unbalanced tree, which degenerates into a linked list, as the timings make plain.
