Held Friday, November 12, 1999
Overview
Today, we'll consider another (and somewhat different) implementation
of dictionaries: the hash table. Hash tables usually provide O(1)
get and put operations.
Notes
- Don't forget that I expect mostly-working code for your portions
of the project on Monday.
- I need a volunteer to collect Pizza orders (I'll pick it up and
pay for it once I have details).
Contents
Summary
- Possible visit from prospectives
- Hash tables: efficient dictionaries
Handouts
- Since we are employing dictionary-like objects in the mail project
(in particular, the Options class), it's appropriate to spend a
few more minutes considering that part.
- Presumably, options need to be stored between invocations of the
email package.
- Where should those options be stored?
- And whose responsibility is it to remember the location?
- For our purposes, it will be easiest to store the options in a file.
The group and I have decided on one of the following formats:
Option-Name:Option-Type:Option-Value
or
Option-Name
Option-Type
Option-Value
- The group is willing to deal with the basic types: booleans, integers,
doubles, and Strings.
- However, they need the cooperation of other groups creating classes
that might be stored as options, such as the Ordering class
- You need to provide a reasonable toString method that generates
a string with no embedded carriage returns.
- You need to provide a constructor that takes a string (as generated
by your toString method) as its only parameter.
- Initially, they will only support a prespecified set of classes.
- I've written some code that lets them work with any class
that supports these two operations.
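As an illustration of the two requirements above, here is a minimal sketch of how one stored line could be turned back into an object. It assumes the one-line Option-Name:Option-Type:Option-Value format and assumes that Option-Type holds a fully qualified class name. The names SampleOrdering, OptionLoader, parseLine, and formatLine are hypothetical; this is not the code mentioned above, only one way it might look using Java's standard reflection calls.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a class another group might store as an option. */
class SampleOrdering {
    private final String field;

    /** Rebuilds the object from the string produced by toString. */
    public SampleOrdering(String text) {
        this.field = text;
    }

    /** One line of text, with no embedded carriage returns. */
    @Override
    public String toString() {
        return field;
    }
}

/** Hypothetical helper that writes and reads single option lines. */
class OptionLoader {
    /** Builds "Option-Name:Option-Type:Option-Value" from a name and a value. */
    static String formatLine(String name, Object value) {
        return name + ":" + value.getClass().getName() + ":" + value;
    }

    /**
     * Parses one line and rebuilds the value by calling the one-String
     * constructor of the named class. Assumes the name and type contain
     * no colons (the value may).
     */
    static Object parseLine(String line) {
        // Split into at most three parts so colons inside the value survive.
        String[] parts = line.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed option line: " + line);
        }
        try {
            Class<?> type = Class.forName(parts[1]);
            Constructor<?> fromString = type.getConstructor(String.class);
            return fromString.newInstance(parts[2]);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot rebuild option: " + line, e);
        }
    }
}
```

Any class with a public one-String constructor and a matching toString can pass through parseLine without the loader knowing about it in advance, which is the point of asking every group for those two operations.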
- Surprisingly, if you're willing to sacrifice some space and increase
your constant, it is possible to build a dictionary with O(1)
get and put.
- How? By using an array, and numbering your keys in such a way that
- all numbers are between 0 and array.length-1
- no two keys have the same number (or at least few have the same
number).
- If there are no collisions (keys with the same number), the system is simple:
- To put a value, determine the number corresponding to the key
and put it in that place of the array. This is O(1+cost of
computing that number).
- To get a value, determine the number corresponding to the key
and look in the appropriate cell. This is O(1+cost of finding that
number).
- Implementations of dictionaries using this strategy are called
hash tables.
- The function used to convert an object to a number is the
hash function.
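The collision-free case can be sketched in a few lines. This example assumes the keys are the lowercase letters 'a' through 'z', so that numbering them is trivial and no two keys share a number. The class name LetterTable and the method name number are made up for illustration.

```java
/** Sketch: a dictionary keyed by the letters 'a'..'z', with no collisions. */
class LetterTable {
    private final Object[] cells = new Object[26];

    /** The "hash function": numbers each key from 0 to cells.length-1. */
    private int number(char key) {
        if (key < 'a' || key > 'z') {
            throw new IllegalArgumentException("Key out of range: " + key);
        }
        return key - 'a';
    }

    /** O(1): compute the number, then store in that cell. */
    void put(char key, Object value) {
        cells[number(key)] = value;
    }

    /** O(1): compute the number, then look in that cell (null if empty). */
    Object get(char key) {
        return cells[number(key)];
    }
}
```

The price is the space: all 26 cells exist even if only a handful of keys are ever stored.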
- To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.
- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- That isn't always possible (and it is impossible if we have more
objects than positions).
- We'll discuss what to do about two objects mapping to
the same position later.
- Hence, we usually settle for a hash function that
distributes the objects more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function.
We'd like something that is relatively low cost (not just constant time,
but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisions by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the
hash function.
- We might strive for a hash function that uses the full range of
nonnegative integers, and then take its result mod the size of the table.
- What are some hash functions you might use for strings?
- Sum the ASCII values in the string
- N*first letter + M*second letter
- ...
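The two string hash functions suggested above might be sketched as follows. The table size is passed in as a parameter and the raw value is reduced mod that size, as discussed earlier. The class and method names are hypothetical, and the multipliers N and M are left to the caller.

```java
/** Sketch of two simple hash functions for strings. */
class StringHashes {
    /** Sums the character codes, then reduces to a table position. */
    static int sumHash(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        // floorMod keeps the result nonnegative even if the sum overflowed.
        return Math.floorMod(sum, tableSize);
    }

    /** N*first letter + M*second letter; a missing letter counts as 0. */
    static int twoLetterHash(String s, int n, int m, int tableSize) {
        int first = s.length() > 0 ? s.charAt(0) : 0;
        int second = s.length() > 1 ? s.charAt(1) : 0;
        return Math.floorMod(n * first + m * second, tableSize);
    }
}
```

Note that sumHash sends every anagram to the same position ("ab" and "ba" both sum to 195), while twoLetterHash with different N and M separates those two but ignores everything past the second letter. That is the kind of trade-off worth experimenting with.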
- What do you do when you try to insert an object into a hash table,
and there's already an object at the position (but with a different
key)?
- You can put multiple objects in the same position (by using
a list, a tree, or even another hash table with a different
hash function).
- You can find another place in the table for the object by
changing the hash value.
- You can expand the hash table.
- The first method requires us to provide a dynamic data structure
for each cell of the table. It also means that the cost for
getting an element also involves searching through the structure.
- The second method requires us to do more than one ``step'' when
looking up a value (as we might need to repeatedly skip
already filled spaces).
- The third method requires us to grow the table regularly, and at
a significant cost (proportional to the number of elements in the
table).
- The second method also assumes a limited number of objects will
be placed in the table. In addition, when you delete an object, you need
to consider whether one of the objects in other cells really belonged there.
- There are a number of ways you can compute the ``next'' space in
the second method.
- You can offset by a fixed amount.
- You can use a second function to determine the offset (making
the offset object-dependent).
- You can use a sequence of hash functions.
- ...
- Despite all this joyous freedom, I'll admit that I prefer the first
method.
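Here is a minimal sketch of the first method, with a list in each cell of the table. It assumes String keys and uses the key's built-in hashCode, reduced mod the table size, as the hash function. The class name ChainedTable and its internals are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Sketch: a hash table that resolves collisions with a list per cell. */
class ChainedTable {
    private static class Entry {
        final String key;
        Object value;

        Entry(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    /** Maps a key to a cell; floorMod handles negative hash codes. */
    private List<Entry> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    /** Replaces the value if the key is present, otherwise adds an entry. */
    void put(String key, Object value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value));
    }

    /** Searches only the one list the key can be in; null if absent. */
    Object get(String key) {
        for (Entry e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

The search inside get is the extra cost mentioned above: it stays small as long as the hash function spreads keys evenly and the table is not too crowded.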
- Hash tables are so useful that Java includes them as a standard
library class, java.util.Hashtable.
- Let's look over the documentation.
- Why are there three constructors?
- What methods are there other than get and put?
- Where's the hash function?
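A short sketch of java.util.Hashtable in use may help while reading the documentation. It shows the no-argument, initial-capacity, and capacity-plus-load-factor constructors, along with a few methods beyond get and put. The class name HashtableDemo and the sample keys and values are made up. As for the hash function, Hashtable relies on each key's own hashCode method (and on equals to tell keys apart).

```java
import java.util.Hashtable;

/** Sketch: basic use of the standard Hashtable class. */
class HashtableDemo {
    /** Builds a small table of made-up folder names and message counts. */
    static Hashtable<String, Integer> sample() {
        // Three ways to construct a table; only the last is used below.
        Hashtable<String, Integer> defaults = new Hashtable<>();
        Hashtable<String, Integer> sized = new Hashtable<>(64);
        Hashtable<String, Integer> tuned = new Hashtable<>(64, 0.5f);

        tuned.put("inbox", 12);
        tuned.put("sent", 3);
        tuned.put("drafts", 1);
        tuned.remove("drafts"); // remove by key
        return tuned;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> counts = sample();
        System.out.println(counts.get("inbox"));
        System.out.println(counts.containsKey("drafts"));
        System.out.println(counts.size());
    }
}
```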
- Our analysis of Hash Tables to date has been based on two simple
operations: get and put.
- What happens if we want to remove elements? This can significantly
complicate matters.
- If we've chosen the ``shift into a blank space'' technique for
resolving collisions, what do we do when it comes time to remove
elements?
- Do we shift everything back? If so, think about how far we may have
to look.
- Do we leave the thing there as a blank? We might then remove
it later when it's convenient to do so.
- Do we do something totally different?
- Note also that there are different ways of specifying ``remove''. We
might remove the element with a particular key. We might instead remove
elements based on their value. The second is obviously a much slower
operation than the first (unless we've developed a special way to handle
that problem - see if you can think of one).
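One possible answer to the ``leave it as a blank'' question is to mark the removed cell with a special placeholder, so that later lookups keep probing past it. The sketch below assumes String keys, a fixed-size table, and a fixed offset of 1 when skipping filled cells. The class name ProbingTable and the DELETED marker are hypothetical, and this is only one of several reasonable designs.

```java
/** Sketch: fixed-offset probing with placeholder markers for removed keys. */
class ProbingTable {
    // Marker for a removed key; compared by identity, never by equals.
    private static final String DELETED = new String("<deleted>");

    private final String[] keys;
    private final Object[] values;

    ProbingTable(int capacity) {
        keys = new String[capacity];
        values = new Object[capacity];
    }

    private int home(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    /** Returns the index holding key, or -1 if the key is absent. */
    private int find(String key) {
        int i = home(key);
        for (int step = 0; step < keys.length; step++) {
            if (keys[i] == null) {
                return -1; // a never-used cell ends the search
            }
            if (keys[i] != DELETED && keys[i].equals(key)) {
                return i;
            }
            i = (i + 1) % keys.length; // skip filled or removed cells
        }
        return -1;
    }

    void put(String key, Object value) {
        int at = find(key);
        if (at < 0) {
            // Take the first never-used or removed cell along the probe path.
            int i = home(key);
            for (int step = 0; step < keys.length && at < 0; step++) {
                if (keys[i] == null || keys[i] == DELETED) {
                    at = i;
                }
                i = (i + 1) % keys.length;
            }
            if (at < 0) {
                throw new IllegalStateException("Table is full");
            }
            keys[at] = key;
        }
        values[at] = value;
    }

    Object get(String key) {
        int at = find(key);
        return at < 0 ? null : values[at];
    }

    /** Removes by key, leaving a marker so later probes are not cut short. */
    Object remove(String key) {
        int at = find(key);
        if (at < 0) {
            return null;
        }
        Object old = values[at];
        keys[at] = DELETED;
        values[at] = null;
        return old;
    }
}
```

Without the marker, removing a key in the middle of a run of shifted keys would leave a true blank, and get would stop there and wrongly report that the later keys are missing.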
Tuesday, 10 August 1999
- Created as a blank outline.
Wednesday, 10 November 1999
Friday, 12 November 1999
- Added the section on the Options class.
- Added the section on hash tables in Java.
- Some minor cosmetic changes.