Programming assignment #4: Average height of binary trees

The height of a binary search tree

As Weiss explains in chapter 18, the height of a tree is one greater than the height of its highest subtree, or zero if it has no subtrees. It is straightforward to compute the height of a binary search tree by adopting the convention that the height of null is -1 (base case) and using recursion on any non-null binary search tree to find the heights of its subtrees, taking the larger, and adding one.
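The recursion described above can be sketched as follows. This is a minimal illustration, not Weiss's code: the `Node` class and the `HeightSketch` wrapper are hypothetical stand-ins for whatever node type your tree actually uses.

```java
// Minimal sketch; Node is a hypothetical stand-in, not Weiss's node class.
class HeightSketch {
    static class Node {
        final int element;
        final Node left;
        final Node right;

        Node(int element, Node left, Node right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    // Base case: an empty (null) tree has height -1, so a lone leaf has height 0.
    static int height(Node t) {
        if (t == null) {
            return -1;
        }
        // Recursive case: one more than the taller of the two subtrees.
        return 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        Node balanced = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(height(balanced)); // prints 1
        System.out.println(height(chain));    // prints 2
    }
}
```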

Different binary search trees constructed from the same data values can have different heights. For instance, the binary search trees in Figure 19.20 on page 646 of our textbook are all constructed from the same data (the integers 1, 2, and 3), but the height of tree (c) is 1, while all the other trees have height 2.

The data in this example can be arranged in six different orders, and the structure of the constructed binary search tree depends on the order in which the data are added. Four of the six orders result in trees of height 2, while the other two orders result in tree (c), which has height 1. If we average over all possible orders, the mean height of the resulting tree is (2 + 2 + 2 + 2 + 1 + 1)/6, or 5/3.
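For a data set this small, the count can be checked by brute force. The sketch below tries every ordering of the values, builds a tree from each, and totals the heights. The names `PermutationHeights`, `Node`, `insert`, and `totals` are illustrative assumptions, and the tiny unbalanced tree here is a stand-in rather than Weiss's BinarySearchTree class.

```java
// Brute-force check over all orderings; all names here are hypothetical.
class PermutationHeights {
    static class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain unbalanced binary-search-tree insertion (keys assumed distinct).
    static Node insert(Node t, int key) {
        if (t == null) {
            return new Node(key);
        }
        if (key < t.key) {
            t.left = insert(t.left, key);
        } else {
            t.right = insert(t.right, key);
        }
        return t;
    }

    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Returns {sum of tree heights, number of orderings} over every
    // arrangement of a[k..] with the prefix a[0..k-1] held fixed.
    static long[] totals(int[] a, int k) {
        if (k == a.length) {
            Node root = null;
            for (int x : a) {
                root = insert(root, x);
            }
            return new long[] { height(root), 1 };
        }
        long[] acc = { 0, 0 };
        for (int i = k; i < a.length; i++) {
            swap(a, k, i);
            long[] sub = totals(a, k + 1);
            acc[0] += sub[0];
            acc[1] += sub[1];
            swap(a, k, i); // restore the array before trying the next choice
        }
        return acc;
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] result = totals(new int[] { 1, 2, 3 }, 0);
        System.out.println(result[0] + " / " + result[1]); // prints 10 / 6
    }
}
```

The total of 10 over 6 orderings agrees with the mean of 5/3 worked out above. The number of orderings grows factorially, so this approach is only practical for very small data sets.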

Empirically estimating mean height

As the data sets and trees get larger, it becomes more difficult to work out by hand what all the possible shapes of the binary search trees are and how many different orderings of the data might result in trees of each shape. One way to get some idea of how high the binary search trees will be, for data sets of a given size, is to use a random-number generator to construct random orderings of data sets of that size, actually build the binary search trees from them, and measure their height.

  1. Write a static method that takes a non-negative integer n as argument and returns an array of Integer, of size n, containing the integers from 0 to n - 1 in a random order (each wrapped as an Integer object).
  2. Write a constructor for Weiss's BinarySearchTree class that takes an array as its argument and constructs a binary search tree containing the elements of the array. Add the elements in the order in which they occur in the array.
  3. Write a program that constructs one thousand random arrays of one thousand Integer values and computes and outputs the mean height of the binary search trees constructed from them.
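One possible shape for these steps is sketched below, under several assumptions. The shuffle is a Fisher-Yates shuffle, which is one common way to make every ordering equally likely. The extra `Random` parameter is an addition for repeatable runs and is not part of the method signature the assignment asks for. The small `Node` tree stands in for Weiss's BinarySearchTree class, whose constructor you would use instead. All names are hypothetical.

```java
import java.util.Random;

// Sketch of the empirical experiment; Node is a stand-in for Weiss's class.
class RandomTreeSketch {
    static class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    // Returns 0..n-1 in random order, using a Fisher-Yates shuffle.
    static Integer[] randomOrder(int n, Random rng) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) {
            a[i] = i; // autoboxed to Integer
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in 0..i
            Integer tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return a;
    }

    static Node insert(Node t, int key) {
        if (t == null) {
            return new Node(key);
        }
        if (key < t.key) {
            t.left = insert(t.left, key);
        } else {
            t.right = insert(t.right, key);
        }
        return t;
    }

    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Builds `trials` trees of `size` elements each and averages their heights.
    static double meanHeight(int size, int trials, Random rng) {
        long total = 0;
        for (int trial = 0; trial < trials; trial++) {
            Node root = null;
            for (Integer x : randomOrder(size, rng)) {
                root = insert(root, x);
            }
            total += height(root);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        // The result varies from run to run because the orderings are random.
        System.out.println(meanHeight(1000, 1000, new Random()));
    }
}
```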

Computing the mean

It is also possible to calculate the mean directly, rather than estimating it. The calculation is based on the observation that the first value added to a binary search tree always becomes its root, so that the number of elements in each subtree is completely determined by that choice. For each possible choice r of the root, therefore, the mean of the heights of binary search trees with root r is the result of adding 1 to the mean of the heights of the left subtrees (containing elements smaller than r) or the mean of the heights of the right subtrees (containing elements larger than r), whichever is greater. But, if we're considering all possible ways of arranging the data as equally likely, any datum is equally likely to be added first and thus to become the root. So we can figure the mean height for each possible choice of root and simply average the results.

For instance, if there are four items in the data set, the root is equally likely to be the smallest, the second smallest, the second largest, or the largest datum. In the first and last of these cases, the completed binary search tree will have one subtree containing three items and one with none. The larger subtree, the one with three items, will have a mean height of 5/3, as we have seen, so the mean height of the overall tree in these cases will be 5/3 + 1, or 8/3. In the other two cases, the completed binary search tree will have one subtree containing two items and one with one. The larger subtree will have a mean height of 1 (indeed, it will always have a height of 1), so the overall tree will have a mean height of 2. Thus, combining the four cases, the mean height of a binary search tree containing four items will be (8/3 + 2 + 2 + 8/3)/4, or 7/3.
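The procedure described above can be written as a recurrence. The symbol M(n), for the value computed for a data set of size n, is notation introduced here and does not come from the textbook. Setting M(0) = -1 is an assumption that follows the earlier convention that the height of null is -1.

```latex
M(0) = -1, \qquad
M(n) = \frac{1}{n} \sum_{r=1}^{n} \Bigl( 1 + \max\bigl( M(r-1),\, M(n-r) \bigr) \Bigr)
\quad \text{for } n \ge 1 .
```

With M(1) = 0, M(2) = 1, and M(3) = 5/3, the four-item case reads

```latex
M(4) = \frac{1}{4} \Bigl( \bigl(1 + \tfrac{5}{3}\bigr) + (1 + 1) + (1 + 1) + \bigl(1 + \tfrac{5}{3}\bigr) \Bigr)
     = \frac{1}{4} \cdot \frac{28}{3} = \frac{7}{3} .
```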

  1. Using recursion, write a static method that takes any non-negative integer as argument and returns the mean height of binary search trees for data sets of that size.
  2. If you test your method on a large argument (such as 1000), you'll probably find that it is unacceptably slow. Rewrite it so that it generates the mean heights for all sizes less than or equal to the given argument, in ascending order, and stores the results in an array. Instead of issuing recursive calls, simply look in the array to find the computed values for smaller data set sizes.
  3. Compare your theoretical result for 1000 elements with the result of your empirical test. Are they close? If not, what went wrong?
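As a rough sketch of the first two steps, the two methods below follow the calculation exactly as the section describes it, first with plain recursion and then with an array of previously computed values. The class and method names are hypothetical, and returning -1 for an empty data set is an assumption carried over from the convention that the height of null is -1.

```java
// Sketch of the calculation described in this section; names are hypothetical.
class MeanHeightSketch {
    // Direct recursion: simple, but it recomputes the same sizes over and
    // over, so the running time grows exponentially with n.
    static double meanHeightRecursive(int n) {
        if (n == 0) {
            return -1.0; // assumption: an empty data set gives height -1
        }
        double sum = 0.0;
        for (int r = 1; r <= n; r++) {
            // Root is the r-th smallest: r - 1 items go left, n - r go right.
            sum += 1.0 + Math.max(meanHeightRecursive(r - 1),
                                  meanHeightRecursive(n - r));
        }
        return sum / n;
    }

    // Table version: fills in sizes 0..n in ascending order, so every value
    // needed for a given size is already in the array when it is looked up.
    static double[] meanHeightTable(int n) {
        double[] mean = new double[n + 1];
        mean[0] = -1.0;
        for (int size = 1; size <= n; size++) {
            double sum = 0.0;
            for (int r = 1; r <= size; r++) {
                sum += 1.0 + Math.max(mean[r - 1], mean[size - r]);
            }
            mean[size] = sum / size;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] table = meanHeightTable(1000);
        System.out.println(table[3]);    // approximately 5/3
        System.out.println(table[4]);    // approximately 7/3
        System.out.println(table[1000]);
    }
}
```

The table version does work proportional to the square of the argument, which is why it finishes quickly for 1000 where the recursive version does not.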

This assignment will be due on Tuesday, April 29.