Outline of Class 46: Dictionaries and Hash Tables
Held: Friday, April 24, 1998
- See my quick notes on grading of assignments.
- We have an exam next Tuesday. A review
sheet is available. Any questions on that?
- Don't forget to sign up for the departmental picnic lunch. There's a
sign-up sheet outside my door. Only sign up once!
We've talked somewhat abaout dictionaries in
a previous class. Recall that dictionaries
are a lot like arrays that can be indexed by objects instead of numbers.
- How can we implement dictionaries?
- We could define a
Pair object that joins a
key and value.
- We could make an unordered array, vector, or list of those objects.
- To insert an element, we put it at the beginning or end of the array,
vector or list. This
is generally O(1), depending on the structure we use.
- To look up an element, we step through the pairs one by one, comparing
the key we are searching for to the key in each pair. This is
O(n), where n is the number of elements.
- We could make an ordered array of those objects (an ordered array
being one that is always sorted).
- To insert an element, we put it at the appropriate place, shifting
as necessary. This is O(n) and is again limited by the size of
the array.
- For lookup an element, we do binary search on the sorted array.
- Another strategy for building dictionaries is to use
search trees. These are binary trees that are ordered in
a particular way. Everything to the left of a node is smaller than
or equal to the node. Everything to the right is greater than or
equal to the node.
- To find an element in a search tree, you compare the object you're
looking for to the current node. If the object you're looking for
is smaller, you follow the left branch. If the object you're looking
for is bigger, choose the right branch.
- We can use a similar process to insert an element.
- If we can make the tree relatively balanced, this gives an
O(log_2(n)) algorithm. Unfortunately, keeping the tree balanced is
difficult. That is a topic for another course.
- Surprisingly, if you're willing to sacrifice some space and increase
your constant, it is possible to build an expected O(1) dictionary.
- How? By using an array, and numbering your keys in such a way that
- all numbers are between 0 and array.length-1
- no two keys have the same number (or at least few have the same
number).
- If there are no collisions, the system is simple
- To insert a value, determine the number corresponding to the key
and put it in that place of the array. This is O(1+cost of finding
that number).
- To lookup a value, determine the number corresponding to the key
and look in the appropriate cell. This is O(1+cost of finding that
number).
- Implementations of dictionaries using this strategy are called
hash tables.
- The function used to convert an object to a number is the
hash function.
- To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.
- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- Now, this isn't possible (particularly if we have more objects than
positions).
- We'll discuss what to do about two objects mapping to
the same position later.
- Hence, we sometimes accept a situation in which the hash function
distributes the objects more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function.
We'd like something that is relatively low cost (not just constant time,
but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisionss by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the
hash function.
- We might strive for a hash function that uses the range of positive
integers, and mod it by the size of the table.