Class 42: Hash Tables
Held Tuesday, April 20
- Implementing dictionaries with hash tables
- Design decisions in hash tables
- Hash functions
- Your initial design notes are due today.
- Almost no one made it to the UML talk yesterday. We're currently
deciding what to do about that.
- I've heard some reports that some of you are missing group meetings.
From my perspective, it's one thing to blow off class; the only
people you affect are yourself and me. It's another thing altogether
to blow off your classmates. There will be repercussions.
- We've been looking at dictionaries: structures that store
values which are indexed by keys.
- Both values and keys are objects.
- We've seen three implementation strategies for dictionaries.
- Association lists
- Sorted arrays
- Search trees
- Association lists have
- O(n) lookup and delete
- O(1) add
- Sorted arrays have
  - O(log n) lookup
- O(n) add and delete
- Search trees have
- O(depth) lookup, add, and delete
  - The depth can range from O(log n) to O(n), depending on how
    the tree is built.
- Can we do better? (By using ``buckets'', as the terror trio suggested.)
- Surprisingly, if you're willing to sacrifice some space and increase
  your constant, it is possible to build a dictionary with O(1) lookup
  and add.
- How? By using an array, and numbering your keys in such a way that
- all numbers are between 0 and array.length-1
  - no two keys have the same number (or at least few keys have the
    same number)
- If there are no collisions (keys with the same number), the system is simple
  - To add a value, determine the number corresponding to the key
    and put the value in that place of the array. This is O(1 + cost
    of finding that number).
  - To look up a value, determine the number corresponding to the key
    and look in the appropriate cell. This is O(1 + cost of finding
    that number).
- Implementations of dictionaries using this strategy are called
  hash tables.
- The function used to convert an object to a number is the
  hash function.
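- To make the no-collision strategy concrete, here is a minimal sketch
  in Python. The class name, the default capacity, and the use of
  Python's built-in hash function are my own choices for illustration;
  it assumes (unrealistically) that no two keys in use share a position.

```python
class SimpleHashDictionary:
    """A dictionary with no collision handling: assumes no two keys
    in use hash to the same position (unrealistic in practice)."""

    def __init__(self, capacity=101):
        # Each slot holds a (key, value) pair or None.
        self.slots = [None] * capacity

    def _position(self, key):
        # Convert the key to a number between 0 and capacity-1.
        return hash(key) % len(self.slots)

    def add(self, key, value):
        # O(1 + cost of computing the position).
        self.slots[self._position(key)] = (key, value)

    def lookup(self, key):
        # O(1 + cost of computing the position).
        entry = self.slots[self._position(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None
```

Note that lookup still compares the stored key to the requested key,
so a request for an absent key returns None even if its position
happens to be occupied.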
- To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.
- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- Now, this isn't always possible (particularly if we have more
  objects than positions in the array).
- We'll discuss what to do about two objects mapping to
the same position later.
- Hence, we settle for a hash function that distributes the objects
  more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function.
We'd like something that is relatively low cost (not just constant time,
but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisions by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the
  hash function.
- We might strive for a hash function that produces values across the
  range of positive integers, and then mod the result by the size of
  the table.
- What are some hash functions you might use for strings?
- Sum the ASCII values in the string
- N*first letter + M*second letter
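- Here are the two string hash functions above, sketched in Python.
  The function names, the weights 31 and 7 for N and M, and the mod by
  the table size are my own choices for illustration.

```python
def hash_sum_ascii(s, table_size):
    """Sum the character codes in the string, mod the table size.
    Weakness: anagrams (e.g. "stop" and "pots") always collide."""
    return sum(ord(c) for c in s) % table_size

def hash_weighted(s, table_size, n=31, m=7):
    """N*first letter + M*second letter, mod the table size.
    Distinguishes anagrams of the first two letters, but ignores
    the rest of the string."""
    first = ord(s[0]) if len(s) > 0 else 0
    second = ord(s[1]) if len(s) > 1 else 0
    return (n * first + m * second) % table_size
```

The anagram collisions of the first function are one reason to
experiment: a function that looks reasonable may distribute real keys
quite badly.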
- What do you do when you try to insert an object into a hash table,
and there's already an object at the position?
- You can put multiple objects in the same position (by using
  a list, a tree, or even another hash table with a different
  hash function).
- You can find another place in the table for the object by adding
another value to the hash value.
- You can expand the hash table.
- The first method requires us to provide a dynamic data structure
for each cell of the table. It also means that the lookup cost
also involves searching through the structure.
- The second method requires us to do more than one ``step'' when
looking up a value (as we might need to repeatedly skip
already filled spaces).
- The third method requires us to grow the table regularly, and at
  a significant cost (proportional to the number of elements in the
  table).
- The second method also assumes a limited number of objects will
  be placed in the table. When you delete an object, you need to
consider whether one of the objects in other cells really belonged there.
- There are a number of ways you can compute the ``next'' space in
the second method.
- You can offset by a fixed amount.
- You can use a second function to determine the offset (making
the offset object-dependent).
  - You can use a sequence of hash functions.
- Despite all this joyous freedom, I'll admit that I prefer the first
  method (putting multiple objects in the same position).
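- The first method (often called ``chaining'') might be sketched as
  follows in Python. The class name and default capacity are my own
  choices for illustration; each cell holds a list of key/value pairs.

```python
class ChainedHashDictionary:
    """Collisions resolved by chaining: each cell of the table
    holds a list (``bucket'') of key/value pairs."""

    def __init__(self, capacity=101):
        self.buckets = [[] for _ in range(capacity)]

    def _position(self, key):
        return hash(key) % len(self.buckets)

    def add(self, key, value):
        bucket = self.buckets[self._position(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))

    def lookup(self, key):
        # O(1) to find the bucket, plus a search through it.
        for k, v in self.buckets[self._position(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        pos = self._position(key)
        self.buckets[pos] = [(k, v) for (k, v) in self.buckets[pos]
                             if k != key]
```

As noted above, the cost of lookup now includes searching the bucket,
so a badly distributed hash function degrades this toward an
association list.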
- Our analysis of hash tables to date has been based on two simple
operations: lookup and add.
- What happens if we want to remove elements? This can significantly
  complicate matters.
- If we've chosen the ``shift into a blank space'' technique for
  resolving collisions, what do we do when it comes time to remove
  elements?
  - Do we shift everything back? If so, think about how far we may
    have to shift.
  - Do we leave a marker in its place (treating the cell as
    ``deleted'' rather than blank)? We might then remove the marker
    later, when it's convenient to do so.
- Do we do something totally different?
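- The ``leave a marker'' option might be sketched as follows in
  Python, using a fixed offset of 1 to find the next space. The class
  name, the DELETED sentinel, and the capacity are my own choices for
  illustration.

```python
DELETED = object()  # marker left in place of removed entries

class ProbingHashDictionary:
    """Collisions resolved by probing with a fixed offset of 1.
    Deletion leaves a marker rather than shifting entries back."""

    def __init__(self, capacity=101):
        self.slots = [None] * capacity  # (key, value), DELETED, or None

    def _position(self, key):
        return hash(key) % len(self.slots)

    def add(self, key, value):
        pos = self._position(key)
        for _ in range(len(self.slots)):
            entry = self.slots[pos]
            # A blank, a marker, or a matching key can all be overwritten.
            # (A fuller version would also check whether the key appears
            # later in the probe sequence before reusing a marker.)
            if entry is None or entry is DELETED or entry[0] == key:
                self.slots[pos] = (key, value)
                return
            pos = (pos + 1) % len(self.slots)
        raise RuntimeError("table is full")

    def lookup(self, key):
        pos = self._position(key)
        for _ in range(len(self.slots)):
            entry = self.slots[pos]
            if entry is None:          # a true blank ends the search...
                return None
            if entry is not DELETED and entry[0] == key:
                return entry[1]
            pos = (pos + 1) % len(self.slots)  # ...but a marker does not
        return None

    def delete(self, key):
        pos = self._position(key)
        for _ in range(len(self.slots)):
            entry = self.slots[pos]
            if entry is None:
                return
            if entry is not DELETED and entry[0] == key:
                self.slots[pos] = DELETED  # leave the marker in place
                return
            pos = (pos + 1) % len(self.slots)
```

The marker matters: if delete simply blanked the cell, a later lookup
could stop early at that blank and miss an element that had been
probed past it.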
- Note also that there are different ways of specifying ``remove''. We
might remove the element with a particular key. We might instead remove
elements based on their value. The second is obviously a much slower
operation than the first (unless we've developed a special way to handle
that problem - see if you can think of one).
- Created Monday, January 11, 1999.
- Added short summary on Friday, January 22, 1999.
- Filled in some of the details on Monday, April 19, 1999. Details
  were based, in part, on outline 46 and outline 47 of
- Some redesign on Tuesday, April 20, 1999.