Algorithms and OOD (CSC 207 2014F) : Assignments

Assignment 9: Skip Lists


Due: 10:30 p.m., Wednesday, 12 November 2014

Summary: In this assignment, you will implement a randomized data structure called the “skip list” and experimentally analyze the efficiency of that data structure.

Purposes: To give you further experience designing and implementing data structures. To give you experience working with “the literature”. To help you think differently about data structure design. To explore a few patterns of design.

Collaboration: You may choose to work alone or with an assigned partner. You may discuss this assignment with anyone you like, provided you credit those discussions.

Submitting: Please put all of your work in a GitHub repository named csc207-skip-lists. Email the address of that repository to grader-207-01@cs.grinnell.edu. Please use a subject of “CSC207 2014F Assignment 9 (Your Name)”.

Warning: So that this assignment is a learning experience for everyone, we may spend class time publicly critiquing your work.

Preparation

Fork the repository at https://github.com/Grinnell-CSC207/skip-lists-assignment. Rename it to csc207-skip-lists. Skim through the files other than those that include “SkipList” in their name to understand what code has been provided.

Background: About Skip Lists

You've learned about a variety of linked structures. A nice aspect of linked structures is that it's easy and fast to insert something in the middle of the structure or to remove something from the middle of the structure. However, getting to the right place in the structure is slow.

Skip lists solve this problem by giving each node multiple forward links: the links at level 0 step through every element, the links at level 1 skip about 1/2 the elements, the links at level 2 skip about 3/4 the elements, and so on. Skip lists use a random number generator to help ensure that we get the appropriate distribution of link levels. Most of the time, skip lists require O(log n) steps to find an element, O(log n) steps to insert an element, and O(log n) steps to remove an element.
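One common way to get that distribution is to flip a fair coin for each new node and keep promoting it while the coin comes up heads. The following is a minimal sketch under that assumption; the class name LevelChooser, the method randomLevel, and the maxLevel cap are illustrative names, not part of the provided code.

```java
import java.util.Random;

/** Hypothetical helper; names and parameters are illustrative only. */
class LevelChooser {
  /**
   * Pick the highest level (0..maxLevel) at which a new node gets a link.
   * Every node has a level-0 link. Each further level is reached with
   * probability 1/2, so about half the nodes also appear at level 1,
   * about a quarter at level 2, and so on.
   */
  static int randomLevel(Random rand, int maxLevel) {
    int level = 0;
    // Keep promoting while the coin comes up heads and we are below the cap.
    while (level < maxLevel && rand.nextBoolean()) {
      level++;
    }
    return level;
  }
}
```

Passing the Random object in, rather than creating it inside the method, lets a test supply a fixed seed and get repeatable levels.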

You can learn more about the design of skip lists from the following article. (And, yes, you should read that article.)

William Pugh. 1990. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33, 6 (June 1990), 668-676. DOI=10.1145/78973.78977 http://doi.acm.org/10.1145/78973.78977.

Skip lists are useful in a variety of situations. One possible use is for sets, collections of values in which you want to be able to expand and shrink the collection, and to be able to find values in the collection. Skip lists are also useful for “sorted lists”, lists in which you can iterate from smallest to largest.

The Assignment

Part One: Implement Basic Skip Lists

Start this assignment by implementing the following class, which follows the basic structure of skip lists in that it focuses on insertion, removal, searching, and iteration.

/**
 * A randomized implementation of sorted lists.  
 */
public class SkipList<T extends Comparable<T>>
    implements SortedList<T>
{
  . . .
} // class SkipList<T>

import java.util.Iterator;

/**
 * Sorted lists - dynamic collections that support insertion, removal,
 * search, and iteration from smallest to largest.
 */
public interface SortedList<T extends Comparable<T>>
    extends Iterable<T>, SimpleSet<T>, SemiIndexed<T>
{
  /**
   * Return an iterator that visits the elements of the list from
   * smallest to largest.
   */
  public Iterator<T> iterator();
} // interface SortedList<T>

The “<T extends Comparable<T>>” means that whatever type we use in skip lists needs to be comparable to itself (using the single-parameter compareTo method). So, we can put strings and integers in skip lists, but probably not people.
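To see what that bound permits, here is a small sketch that uses the same type bound on a generic method. The class ComparableDemo and the method larger are made-up names for illustration only.

```java
/** Hypothetical demo of the <T extends Comparable<T>> bound. */
class ComparableDemo {
  /**
   * Return the larger of two values. The bound guarantees that
   * a.compareTo(b) exists and accepts another T.
   */
  static <T extends Comparable<T>> T larger(T a, T b) {
    return (a.compareTo(b) >= 0) ? a : b;
  }
}
```

Calls such as ComparableDemo.larger("apple", "banana") and ComparableDemo.larger(3, 7) compile because String and Integer each implement Comparable of their own type. A class that does not implement Comparable of itself would be rejected by the compiler at the call site.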

We're using our own SimpleSet interface to indicate that these objects implement methods to add elements, remove elements, and check membership. Here's the interface.

/**
 * Simple sets of values.
 */
public interface SimpleSet<T>
{
  /**
   * Determine if the set contains a particular value.
   */
  public boolean contains(T val);

  /**
   * Add a value to the set.
   *
   * @post contains(val)
   * @post For all v != val, if contains(v) held before the call
   *   to add, contains(v) continues to hold.
   * @post For all v != val, if contains(v) did not hold before
   *   the call, contains(v) will not hold after the call.
   */
  public void add(T val);

  /**
   * Remove an element from the set.
   *
   * @post !contains(val)
   * @post For all v != val, if contains(v) held before the call
   *   to remove, contains(v) continues to hold.
   * @post For all v != val, if contains(v) did not hold before
   *   the call, contains(v) will not hold after the call.
   */
  public void remove(T val);
} // interface SimpleSet<T>

You may find it useful to write loop invariants for the three methods, but you are not required to do so.

You'll note that SortedList also extends SemiIndexed. For this part of the assignment, you need not worry about indexing; leave those methods as stubs.

Part Two: Implement Indexed Skip Lists

This problem is optional.

When we were comparing the relative benefits of arrays and linked lists, we observed that it would be useful to have a data structure that is dynamic, like linked lists, but provides relatively fast indexed access, like arrays. Skip lists seem to have some benefits over regular linked lists in that we can more easily find an element (although skip lists do require that we keep the elements in order, so they lose the “user ordered” aspect of regular lists).

Can we also provide fast indexed access to the elements of a skip list? In this case, “fast” will probably mean O(log n) rather than O(1). Presumably, that should be possible, but it may take some careful thought.

We can't store the index in each node as that would mean doing O(n) updates after each insertion. However, we can add a field to the structure that stores the total number of values in the structure, and we can add a set of fields to each node to keep track of how many nodes fall between that node and the next node at each level. We update each "before next" field as appropriate. (That is, you increment the field for level k if you insert a node between this node and the next node on level k, you decrement the field by 1 if you delete a node between the current node and the next node on level k, and you update the field when you delete the next node on level k.)

How do you find a node with a particular index? You step along at the top level until you know that you're in the right range. For example, if we are looking for the 15th node in a 5-level skip list (levels 0..4) and there are 10 values between the start of the list and the first node at level 4, then we know that the node with index 15 falls after the first node of level 4. If there are 20 nodes between the first node at level 4 and the second node at level 4, then the node with index 15 falls between those two nodes, so we drop down to level 3. We follow a similar process for level 3, then level 2, then level 1, then level 0.
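The walk described above can be sketched as follows. This is only one possible reading of the description, not a reference solution: the Node layout, the field name between, and the build helper (which assigns levels deterministically so the example is repeatable) are all assumptions made for illustration.

```java
/** Hypothetical sketch of indexed lookup using per-level counts. */
class IndexedSketch {
  /** Assumed node layout; field names are illustrative. */
  static class Node {
    final String value;   // null in the dummy header node
    final Node[] next;    // next[k] = following node at level k
    final int[] between;  // between[k] = nodes strictly between this and next[k]

    Node(String value, int levels) {
      this.value = value;
      this.next = new Node[levels];
      this.between = new int[levels];
    }
  }

  /** Build a list from sorted values, using fixed (non-random) levels. */
  static Node build(String[] sorted, int levels) {
    Node header = new Node(null, levels);
    Node[] last = new Node[levels];     // most recent node at each level
    int[] lastIndex = new int[levels];  // its index (the header counts as -1)
    for (int k = 0; k < levels; k++) {
      last[k] = header;
      lastIndex[k] = -1;
    }
    for (int j = 0; j < sorted.length; j++) {
      // Heights 1, 2, 1, 3, 1, 2, 1, ... capped at the number of levels.
      int height = Math.min(levels, 1 + Integer.numberOfTrailingZeros(j + 1));
      Node node = new Node(sorted[j], height);
      for (int k = 0; k < height; k++) {
        last[k].next[k] = node;
        last[k].between[k] = j - lastIndex[k] - 1;
        last[k] = node;
        lastIndex[k] = j;
      }
    }
    return header;
  }

  /** Find the value at index i by walking from the top level down. */
  static String get(Node header, int size, int i) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException("index " + i);
    }
    Node cur = header;
    int pos = -1;  // index of cur; the header sits just before index 0
    for (int k = header.next.length - 1; k >= 0; k--) {
      // Move forward on level k while the next node does not pass index i.
      while (cur.next[k] != null && pos + cur.between[k] + 1 <= i) {
        pos += cur.between[k] + 1;
        cur = cur.next[k];
      }
    }
    return cur.value;  // the level-0 walk ends exactly at index i
  }
}
```

For instance, after Node h = IndexedSketch.build(new String[] {"a", "b", "c", "d", "e", "f", "g", "h"}, 3), the call IndexedSketch.get(h, 8, 4) follows one level-2 link to "d" and then one level-0 link to "e". Keeping the counts correct during insertion and removal is the harder part and is not shown here.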

Note that this approach only works if we increment only when we insert a node after the current node and decrement only when we remove a node after the current node. You'll need to think about how to ensure that the count of intermediate values is correct if insertion can fail (e.g., if the value is already in the skip list, then insertion seems pointless), if removal can fail (e.g., if the value is not in the skip list), or if removal can remove more than 1 element (e.g., if you allow insertion to insert duplicate copies, the remove method has to remove all copies).

a. In the paragraphs above, we discuss a rough algorithm for finding the node with a particular index. Create visual and textual invariants for that algorithm.

b. Using the algorithm and invariants, implement the get(int i) and length() methods described below.

/**
 * Objects that index elements and allow you to retrieve the ith element.
 */
public interface SemiIndexed<T>
{
  /**
   * Get the element at index i.
   *
   * @throws IndexOutOfBoundsException
   *   if the index is out of range (index < 0 || index >= length)
   */
  public T get(int i);

  /**
   * Determine the number of elements in the collection.
   */
  public int length();
} // interface SemiIndexed<T>

Part Three: Test Skip Lists

You'll see that we've written a randomized test for skip lists. Randomized tests are good, but it's also useful to have some carefully designed predictable tests. Add at least six useful tests to SortedListTest.java.
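A predictable test often adds a fixed sequence of values and then checks a property of the result. As one possible building block, here is a sketch of a helper that checks whether anything iterable yields its values from smallest to largest. The class name SortedCheck and the method isSorted are hypothetical and are not part of the provided test file.

```java
import java.util.Iterator;

/** Hypothetical test helper; not part of the provided code. */
class SortedCheck {
  /** Return true if the values are visited in nondecreasing order. */
  static <T extends Comparable<T>> boolean isSorted(Iterable<T> values) {
    Iterator<T> it = values.iterator();
    if (!it.hasNext()) {
      return true;  // an empty collection is trivially sorted
    }
    T prev = it.next();
    while (it.hasNext()) {
      T cur = it.next();
      if (prev.compareTo(cur) > 0) {
        return false;  // found a value smaller than its predecessor
      }
      prev = cur;
    }
    return true;
  }
}
```

Because SortedList extends Iterable, a test could add values in descending order to a list and then check that SortedCheck.isSorted holds for it.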

Part Four: Experimentally Analyze Skip Lists

You'll note that we have a simple analysis package for implementations of SortedList. Use that package, as well as any extensions you choose to make to that package, to compare the efficiency of your skip lists and SortedArrayLists. What program behavior would lead you to use SkipLists rather than SortedArrayLists and vice versa?

Part Five: Reflect

As part of this assignment, you likely read a number of the code files that are part of the project. Pick three interesting things you learned in reading those files, summarize them in a way that one of your classmates could understand, and provide examples (at least one per thing) of other ways in which you might use that approach.

Citations

This assignment is based on a more general assignment on algorithm design and analysis that I assigned in a previous semester.

As the in-text citation suggests, skip lists were designed by Bill Pugh.