Outline of Class 45: Priority Queues and Heaps
Held: Wednesday, April 22, 1998
 Reminder: our next exam is scheduled for Tuesday, April 28. I will
not give another assignment until after that exam, and the next
assignment will be our final assignment. I've now completed a
review sheet for that exam.
 I encourage those of you who haven't read the Kane and Krukowski
reports to do so. Learn about what people are recommending for
Grinnell.
 On Friday, May 1, the department will be hosting a picnic at Merrill
park. Sign up to have fun with the majors, nonmajors, faculty, and
families. It's gotta be more fun than the MathLAN, right?
 Start on Bailey, Chapter 13 (Dictionaries) for Friday's class.
 I've been asked to remind you that today is Earth Day. Hug a tree,
clean a park, or do whatever else you deem appropriate. I'm working
on recycling my office.
 Recall that in our discussion of linear structures, we suggested that
there could be a number of policies for determining which element is
removed. These included FIFO (first in, first out) and LIFO
(last in, first out).
 In a priority queue, the element removed is the least
(or, perhaps, greatest) element in the structure.
 As in our design of other structures, we need to consider the
efficiency of the various operations.
 It may be a goal of the data structure designer to make
removeLeast()
as efficient as possible.
 It may be a goal of the data structure designer to make
insert()
as efficient as possible.
 It may be a goal to balance the various methods
 or ...
 The design choices may be illustrated by an attempt to create a
PriorityQueue class.
 We could ensure that the elements in the vector are always in order
(smallest to largest), so that removeLeast() removes the first element.
 insert() is then an O(n) operation as we may have to shift all the
elements to insert an element in the "correct" place.
 removeLeast() is also an O(n) operation, as we need to shift
everything left after the removal.
 We could ensure that the elements in the vector are always in order
(largest to smallest), so that removeLeast() removes the last element.
 insert() is still an O(n) operation as we may still have to shift.
 removeLeast() becomes an O(1) operation.
 We could leave the elements in the vector unordered, and run a
min() routine to find the smallest element.
 insert() is either O(1) or O(n), depending on whether or not
we have to grow the vector.
 removeLeast() is now an O(n) operation.
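As an illustration of the second policy (largest to smallest, so removeLeast() is O(1) while insert() is O(n)), here is a minimal sketch. The class name SortedVectorPQ is hypothetical, and ArrayList stands in for the vector:

```java
import java.util.ArrayList;

// Sketch of a vector-based priority queue using the
// "sorted largest to smallest" policy: removeLeast() is O(1)
// (remove the last element), insert() is O(n) (shift to keep order).
public class SortedVectorPQ {
    private final ArrayList<Integer> data = new ArrayList<>();

    // O(n): find the insertion point so the list stays in
    // descending order; later elements shift right.
    public void insert(int value) {
        int i = 0;
        while (i < data.size() && data.get(i) > value) {
            i++;
        }
        data.add(i, value);
    }

    // O(1): the least element is always last.
    public int removeLeast() {
        return data.remove(data.size() - 1);
    }

    public static void main(String[] args) {
        SortedVectorPQ pq = new SortedVectorPQ();
        pq.insert(5);
        pq.insert(2);
        pq.insert(7);
        System.out.println(pq.removeLeast()); // prints 2
        System.out.println(pq.removeLeast()); // prints 5
    }
}
```

Swapping the comparison in insert() would give the smallest-to-largest policy, making removeLeast() the O(n) operation instead.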
 If we restrict ourselves to binary trees (particularly complete binary
trees), it is relatively easy to implement trees with arrays.
 How?
 Assume we have a complete binary tree in that every interior (nonleaf)
node has exactly two children.
 Number the nodes in the tree in a left-to-right, breadth-first
(level-order) traversal.
 This numbering gives you the positions in the array for each element.
 (If you don't want to build complete trees and are willing to waste space,
you can store a special value to represent "nothing presently at this
position".)
 As Bailey suggests in Section 11.4.1, this provides a very convenient
way of figuring out where children belong.
 The root of the tree is in location 0.
 The left child of an element stored at location i can
be found in location 2*i+1.
 The right child of an element stored at location i can
be found in location 2*i+2 (also representable as
2*(i+1)).
 The parent of an element stored at location i can be
found at location floor((i-1)/2).
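These formulas translate directly into index arithmetic. A small sketch (the class name TreeIndex is just for illustration):

```java
// Index arithmetic for a complete binary tree stored in an array,
// with the root at location 0 (as described in Bailey, Section 11.4.1).
public class TreeIndex {
    public static int left(int i)   { return 2 * i + 1; }
    public static int right(int i)  { return 2 * i + 2; }
    // For i >= 1 the operands are nonnegative, so Java's integer
    // division computes floor((i-1)/2).
    public static int parent(int i) { return (i - 1) / 2; }

    public static void main(String[] args) {
        System.out.println(left(1));   // 3
        System.out.println(right(1));  // 4
        System.out.println(parent(3)); // 1 (3 is a left child)
        System.out.println(parent(4)); // 1 (4 is a right child)
    }
}
```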
 Can we prove all this?
 The root is obviously at position 0.
 We may be able to prove the child property by induction.
 We may need to induct on both the level of the tree and the
position within that level.
 It may help to have an additional property that we'll also
prove using induction. The first element at level i is
at position (2^i)-1.
 The root is on level 0 and in position 0. 2^0-1 is 0.
 Assuming this property holds for all i between 0
and k, we need to prove it for k+1.
 The first element on level k+1 appears immediately
after the last element on level k (by traversal
order).
 The first element on level k appears at position
(2^k)-1 (induction hypothesis).
 There are 2^k elements on level k (because
it's a complete tree; this might also be proved by induction).
 The last element on level k is at position
(2^k)-1+2^k-1 (by previous results).
 The first element on level k+1 is at position
(2^k)-1+2^k-1+1 (by previous results).
 This can be simplified to 2^(k+1)-1.
 Using this result, we can prove the child property based on
induction over position within level.
 The initial (0th) element on level k is at position
2^k-1. Its left child is the initial element on
level k+1, which is at position 2^(k+1)-1; and indeed
2*(2^k-1)+1 = 2^(k+1)-1.
Its right child is the next element, which is therefore at
position 2^(k+1), which is the second form given above.
 The inductive part is trivial.
 We can prove the parent property based on the child property.
 Nodes in odd positions (of the form 2x+1) are left children.
Their parents are at position x.
 Nodes in even positions (of the form 2x) are right children.
Their parents are at position x1.
 The floor((i-1)/2) expression unifies these two cases.
 These properties make it simple to move the cursor around the tree
and to get values. However, they do make it more difficult to
do some operations. For example, setSubtree might require modifying
a large number of cells (since we've decided that it deletes the
old subtree).
 Heaps are a particular form of binary tree designed to provide
quick access to the smallest element in the tree.
 Heaps are yet another structure that have multiple definitions;
I'll use one slightly different from Bailey's.
 A heap is
 a binary tree,
 that is nearly complete in that
 at most one node has one child (the rest have zero or two)
 the nodes on the last level are at the left-hand side of the
level
 and that has the heap property: the value stored in
each node is smaller than or equal to the values stored below it.
 Unlike many other data structures we've considered, heaps focus
more on implementation than interface.
 (Bailey doesn't require the completeness property, but others do.)
 Here are some heaps of varying sizes
    2      2       2       2         2         2
          / \     /       / \       / \       / \
         3   7   3       3   7     3   7     3   7
                        / \       / \       / \
                       9   7     8   9     9   7
 Here are some nonheaps. Can you tell why?
      2            2
     / \          / \
    3   7        9   7
   /   / \      / \
  9   8   8    9   7
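In the array representation, testing heap order is a short loop over the child positions. Here is a sketch; the isHeap helper is hypothetical:

```java
public class HeapCheck {
    // Returns true if every element is <= its children, using the
    // child-position formulas for an array-stored tree. This checks
    // heap order only; storing the tree in an array with no gaps
    // makes the near-completeness property automatic.
    public static boolean isHeap(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < a.length && a[i] > a[left]) return false;
            if (right < a.length && a[i] > a[right]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isHeap(new int[] {2, 3, 7, 9, 7})); // true
        System.out.println(isHeap(new int[] {2, 9, 7, 9, 7})); // false: 9 > 7
    }
}
```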
 What good are heaps? They make it very easy to find the smallest element
in a group, which is something we've looked for in the past.
 How do we modify heaps? Through insertion and deletion.
 How do we do insertion while maintaining the two key properties
(near completeness and heap order)?
 It's clear where the heap expands ... it always expands at the end
of the lowest level (if that level is full, it is added to the beginning
of the next level).
 Putting the new element there may violate heap order, so we then need
to rearrange the tree. The process of rearranging is often called
percolating.
 Percolating is fairly simple: The present node is compared to its parent.
 If the present node is smaller (violating the heap property), we swap
the two and continue up the tree.
 Otherwise, we're done.
 When we do the swap, the subtree that contains the old parent is clearly
in heap order (the old parent was an ancestor to all the nodes in that
subtree, and therefore smaller). The present node is clearly smaller
than both of its new subtrees (it's smaller than the old parent, and
the old parent was smaller than everything else below it).
 Eventually, we stop (either because we no longer violate heap property
or because we reach the root).
 Here's an example, based on inserting the values 5, 6, 4, 4, 7, 2
 How much time does this take? Well, the depth of a complete binary
tree with n nodes is O(log_2(n)), and the algorithm may require swapping
up from leaf to root, so the running time is also O(log_2(n)).
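The insertion process described above can be sketched in Java, using the example values 5, 6, 4, 4, 7, 2. The class and method names are illustrative, not Bailey's:

```java
import java.util.ArrayList;

// Sketch of heap insertion: add at the end of the array
// (the end of the lowest level), then percolate up.
public class PercolateUp {
    private final ArrayList<Integer> heap = new ArrayList<>();

    public void insert(int value) {
        heap.add(value);                  // the heap expands at the end
        int i = heap.size() - 1;
        // Swap with the parent while the heap property is violated.
        while (i > 0 && heap.get(i) < heap.get((i - 1) / 2)) {
            int parent = (i - 1) / 2;
            int tmp = heap.get(parent);
            heap.set(parent, heap.get(i));
            heap.set(i, tmp);
            i = parent;
        }
    }

    public java.util.List<Integer> contents() { return heap; }

    public static void main(String[] args) {
        PercolateUp h = new PercolateUp();
        for (int v : new int[] {5, 6, 4, 4, 7, 2}) {
            h.insert(v);
        }
        // The least value, 2, percolates up to the root.
        System.out.println(h.contents().get(0)); // prints 2
    }
}
```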
 Can we also do deletion and still maintain the desired properties?
Certainly.
 After deleting the root, we move the rightmost leaf to the root.
This maintains completeness.
 It may, however, violate the heap property, so we must percolate
down.
 Percolating an element down is slightly more difficult, since there
are two possible subtrees to move to. As you might guess, you must
swap with the root of the smaller subtree and then continue within
that subtree.
 In some sense, deletion reverses the process of insertion (delete last
element in the heap vs. insert last element in heap; percolate down vs.
percolate up).
 Here's a sample case of removal of the least element.
     2          ?          5          3          3
    / \        / \        / \        / \        / \
   3   4  to  3   4  to  3   4  to  5   4  to  4   4
  / \ /      / \ /      / \        / \        / \
 6  4 5     6  4 5     6   4      6   4      6   5
 What's the running time? O(log_2(n)) again.
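Deletion with percolate-down can be sketched as follows, seeded with the six-element heap from the removal example. The class name is illustrative:

```java
import java.util.ArrayList;

// Sketch of removeLeast(): save the root, move the last leaf to
// the root (preserving completeness), then percolate down, always
// swapping with the root of the smaller subtree.
public class PercolateDown {
    private final ArrayList<Integer> heap;

    public PercolateDown(int[] initial) {
        heap = new ArrayList<>();
        for (int v : initial) heap.add(v);
    }

    public int removeLeast() {
        int least = heap.get(0);
        int last = heap.remove(heap.size() - 1); // the last leaf
        if (!heap.isEmpty()) {
            heap.set(0, last);
            int i = 0;
            while (2 * i + 1 < heap.size()) {
                int child = 2 * i + 1;               // left child
                if (child + 1 < heap.size()
                        && heap.get(child + 1) < heap.get(child)) {
                    child = child + 1;               // smaller child wins
                }
                if (heap.get(i) <= heap.get(child)) break;
                int tmp = heap.get(i);
                heap.set(i, heap.get(child));
                heap.set(child, tmp);
                i = child;
            }
        }
        return least;
    }

    public static void main(String[] args) {
        // The heap 2 / (3, 4), with 6 and 4 below 3 and 5 below 4.
        PercolateDown h = new PercolateDown(new int[] {2, 3, 4, 6, 4, 5});
        System.out.println(h.removeLeast()); // prints 2
        System.out.println(h.heap);          // prints [3, 4, 4, 6, 5]
    }
}
```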
 We can use the heap structure to provide a fairly simple and quick
sorting algorithm. To sort a set of n elements,
 insert them into a heap, one by one.
 remove them from the heap in order.
 What's the running time?
 There are n insertions. Each takes O(log_2(n)) time by our
prior analysis.
 There are n "delete least" operations. Each takes O(log_2(n))
by our prior analysis.
 Hence, we've developed yet another O(n*log_2(n)) sorting algorithm.
 It is not an in-place sorting algorithm, and does require O(n)
extra space for the heap.
 Most people implement heap sort with array-based trees. Some even
define heap sort completely in terms of the array operations, and
forget the origins.
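In the spirit of that last remark, here is a sketch of heap sort written entirely in terms of array operations. Note one assumption: a max-heap is used so the in-place result comes out in ascending order; the min-heaps above would produce descending order the same way:

```java
import java.util.Arrays;

// Sketch of an array-only, in-place heap sort (max-heap variant).
public class HeapSort {
    // Percolate a[i] down within a[0..size-1], swapping with the
    // larger child (max-heap order).
    static void siftDown(int[] a, int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) break;
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }

    public static void sort(int[] a) {
        // Build the heap: percolate down from the last parent.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
        // Repeatedly move the greatest element to the end and
        // restore heap order in the shrinking prefix.
        for (int end = a.length - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end);
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 6, 4, 4, 7, 2};
        sort(a);
        System.out.println(Arrays.toString(a)); // prints [2, 4, 4, 5, 6, 7]
    }
}
```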