# Class 42: Hash Tables

Back to Dictionaries. On to Project Discussion.

Held Tuesday, April 20

Summary

• Implementing dictionaries with hash tables
• Java's `java.util.Hashtable` class
• Design decisions in hash tables
• Hash functions

Notes

• Your initial design notes are due today.
• Almost no one made it to the UML talk yesterday. We're currently deciding what to do about that.
• I've heard some reports that some of you are missing group meetings. From my perspective, it's one thing to blow off class; the only people you affect are yourself and me. It's another thing altogether to blow off your classmates. There will be repercussions.

## A Quick Review

• We've been looking at dictionaries: structures that store values which are indexed by keys.
• Both values and keys are objects.
• We've seen three implementation strategies for dictionaries.
• Association lists
• Sorted arrays
• Search trees
• Association lists have
• O(n) lookup and delete
• Sorted arrays have
• O(logn) lookup
• Search trees have
• O(depth) lookup, add, and delete
• The depth can range from O(logn) to O(n), depending on how the tree is built.
• Can we do better? (By using ``buckets'' as the terror trio suggested?)

## Hash Tables

• Surprisingly, if you're willing to sacrifice some space and increase your constant, it is possible to build a dictionary with O(1) lookup and addition.
• How? By using an array, and numbering your keys in such a way that
• all numbers are between 0 and array.length-1
• no two keys have the same number (or at least few have the same number).
• If there are no collisions (keys with the same number), the system is simple:
• To add a value, determine the number corresponding to the key and put the value in that cell of the array. This is O(1+cost of finding that number).
• To lookup a value, determine the number corresponding to the key and look in the appropriate cell. This is O(1+cost of finding that number).
• Implementations of dictionaries using this strategy are called hash tables.
• The function used to convert an object to a number is the hash function.
• To better understand hash tables, we need to consider
• The hash functions we might develop.
• What to do about collisions.
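
The collision-free case described above can be sketched in Java roughly as follows. This is only an illustration, not a definitive implementation: the class name `NoCollisionTable` is hypothetical, and using `hashCode()` reduced by the array length as the key's "number" is an assumption. Because the sketch has no collision strategy yet, it simply refuses a key that lands on an occupied cell.

```java
// Hypothetical sketch: a dictionary that assumes no two keys share a cell.
class NoCollisionTable {
    private final Object[] keys;
    private final Object[] values;

    NoCollisionTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Number the key so the result lies between 0 and keys.length - 1.
    // (Assumption: hashCode() stands in for the hash function.)
    private int index(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // O(1 + cost of hashing). Refuses to overwrite a different key.
    void put(Object key, Object value) {
        int i = index(key);
        if (keys[i] != null && !keys[i].equals(key)) {
            throw new IllegalStateException("collision at cell " + i);
        }
        keys[i] = key;
        values[i] = value;
    }

    // O(1 + cost of hashing). Returns null when the key is absent.
    Object get(Object key) {
        int i = index(key);
        return (keys[i] != null && keys[i].equals(key)) ? values[i] : null;
    }
}
```

With a capacity of 16, the integer keys 3 and 19 would both map to cell 3, so the second `put` would throw. That is exactly the situation the collision-handling discussion addresses.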

### Hash Functions

• The goal in developing a hash function is to come up with a function that is unlikely to map two objects to the same position.
• Now, this isn't always possible (in particular, if we have more objects than positions, some collisions are unavoidable).
• We'll discuss what to do about two objects mapping to the same position later.
• Hence, we often settle for a hash function that distributes the objects more or less uniformly across the positions.
• It is worth some experimentation to come up with such a function.
• In addition, we should consider the cost of computing the hash function. We'd like something that is relatively low cost (not only constant time, but also with few steps hidden inside that constant).
• We'd also like a function that does (or can) give us a relatively large range of numbers, so that we can get fewer collisions by increasing the size of the hash table.
• We might want to make the size of the table a parameter to the hash function.
• We might strive for a hash function that uses the whole range of positive integers, and then reduce its result mod the size of the table.
• What are some hash functions you might use for strings?
• Sum the ASCII values in the string
• N*first letter + M*second letter
• ...
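
The two string hash functions suggested above might look something like this in Java. The class and method names are hypothetical, the multipliers `n` and `m` are left to the caller because the notes do not fix them, and using the raw character code as the "letter value" is an assumption. Both functions take the table size as a parameter and reduce the result mod that size.

```java
// Hypothetical sketches of the two string hash functions listed above.
class StringHashes {
    // Sum the character codes, then reduce to a table position.
    static int sumHash(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return Math.floorMod(sum, tableSize);
    }

    // n * first letter + m * second letter, reduced to a table position.
    // Missing letters (strings shorter than two characters) count as 0.
    static int twoLetterHash(String s, int n, int m, int tableSize) {
        int first = s.length() > 0 ? s.charAt(0) : 0;
        int second = s.length() > 1 ? s.charAt(1) : 0;
        return Math.floorMod(n * first + m * second, tableSize);
    }
}
```

Note one weakness of the summing approach: any two strings with the same letters in a different order (such as "ab" and "ba") hash to the same position. The two-letter approach avoids that when `n` and `m` differ, but it ignores everything after the second character.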

### An Exercise

• Let's try an exercise. We'll come up with a hash value for everybody's first name. We'll then put things in the hash table.
• We'll use ``sum the values of the letters in the name''.
• We'll use the following table:
```
A: 1   F: 6   K: 11  P: 16  U: 21  Z: 26
B: 2   G: 7   L: 12  Q: 17  V: 22
C: 3   H: 8   M: 13  R: 18  W: 23
D: 4   I: 9   N: 14  S: 19  X: 24
E: 5   J: 10  O: 15  T: 20  Y: 25
```
• For my name (Samuel), the hash value is 19 (S) + 1 (A) + 13 (M) + 21 (U) + 5 (E) + 12 (L) = 71
• For one son's name (William), the hash value is 23 (W) + 9 (I) + 12 (L) + 12 (L) + 9 (I) + 1 (A) + 13 (M) = 79
• For the other son's name (Jonathan), the hash value is 10 (J) + 15 (O) + 14 (N) + 1 (A) + 20 (T) + 8 (H) + 1 (A) + 14 (N) = 83
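
For anyone who wants to check their own name, the exercise's hash function can be sketched as a short Java method. The class name `LetterSumHash` is hypothetical, and treating non-letters as 0 is an assumption, since the table above covers only A through Z.

```java
// Hypothetical sketch of "sum the values of the letters in the name".
class LetterSumHash {
    // A = 1, B = 2, ..., Z = 26; case is ignored; non-letters count as 0.
    static int hash(String name) {
        int sum = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                sum += c - 'A' + 1;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(hash("Samuel"));   // 71
        System.out.println(hash("William"));  // 79
        System.out.println(hash("Jonathan")); // 83
    }
}
```

To place a name in an actual table, the sum would still need to be reduced mod the table size, which the exercise leaves unspecified.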

### Handling Collisions

• What do you do when you try to insert an object into a hash table, and there's already an object at the position?
• You can put multiple objects in the same position (by using a list, a tree, or even another hash table with a different hash function).
• You can find another place in the table for the object by adding another value to the hash value.
• You can expand the hash table.
• The first method requires us to provide a dynamic data structure for each cell of the table. It also means that the lookup cost includes searching through that structure.
• The second method requires us to do more than one ``step'' when looking up a value (as we might need to repeatedly skip already filled spaces).
• The third method requires us to grow the table regularly, and at a significant cost (proportional to the number of elements in the table).
• The second method also assumes a limited number of objects will be placed in the table (no more than there are cells). When you delete an object, you need to consider whether an object in some other cell was displaced from the cell you just emptied.
• There are a number of ways you can compute the ``next'' space in the second method.
• You can offset by a fixed amount.
• You can use a second function to determine the offset (making the offset object-dependent).
• You can use a sequence of hash functions.
• ...
• Despite all this joyous freedom, I'll admit that I prefer the first method.
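
As an illustration of the first method (multiple objects in the same position), here is a minimal Java sketch that keeps a linked list in each cell. The names `ChainedTable` and `Node` are hypothetical, and `hashCode()` reduced by the table size stands in for whatever hash function you prefer. It is a sketch under those assumptions rather than how `java.util.Hashtable` is necessarily written.

```java
// Hypothetical sketch: collisions handled by a linked list in each cell.
class ChainedTable {
    // One entry in a cell's singly linked list.
    private static class Node {
        final Object key;
        Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets;

    ChainedTable(int capacity) {
        buckets = new Node[capacity];
    }

    private int index(Object key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Replace the value if the key is already in the list; otherwise
    // add a new node at the front of that cell's list.
    void put(Object key, Object value) {
        int i = index(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]);
    }

    // Lookup cost is hashing plus a search through one cell's list.
    Object get(Object key) {
        for (Node n = buckets[index(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}
```

The extra search inside `get` is the lookup cost mentioned above: as long as the hash function spreads keys evenly and the table is not overloaded, each list stays short.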

## Removing Elements

• Our analysis of Hash Tables to date has been based on two simple operations: lookup and add.
• What happens if we want to remove elements? This can significantly complicate matters.
• If we've chosen the ``shift into a blank space'' technique for resolving collisions, what do we do when it comes time to remove elements?
• Do we shift everything back? If so, think about how far we may have to look.
• Do we leave the thing there as a blank? We might then remove it later when it's convenient to do so.
• Do we do something totally different?
• Note also that there are different ways of specifying ``remove''. We might remove the element with a particular key. We might instead remove elements based on their value. The second is obviously a much slower operation than the first (unless we've developed a special way to handle that problem - see if you can think of one).
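
One way to realize the "leave it there as a blank" option is to mark removed cells with a special placeholder, so that later lookups keep probing past them. The Java sketch below assumes a fixed offset of 1 for finding the next space; the names `ProbingTable` and `DELETED` are hypothetical, and this is one possible design rather than the only answer to the questions above.

```java
// Hypothetical sketch: fixed-offset probing with "deleted" markers.
class ProbingTable {
    // Marker left behind by remove so that later probes keep going.
    private static final Object DELETED = new Object();

    private final Object[] keys;
    private final Object[] values;

    ProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int index(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns the cell holding key, or -1 if the key is absent.
    private int find(Object key) {
        int start = index(key);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (keys[i] == null) {
                return -1; // a never-used cell ends the search
            }
            if (keys[i] != DELETED && keys[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }

    void put(Object key, Object value) {
        int i = find(key);
        if (i >= 0) {
            values[i] = value;
            return;
        }
        int start = index(key);
        for (int step = 0; step < keys.length; step++) {
            i = (start + step) % keys.length;
            if (keys[i] == null || keys[i] == DELETED) {
                keys[i] = key; // empty and deleted cells are both reusable
                values[i] = value;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    Object get(Object key) {
        int i = find(key);
        return i >= 0 ? values[i] : null;
    }

    // Leave a marker instead of shifting other entries back.
    void remove(Object key) {
        int i = find(key);
        if (i >= 0) {
            keys[i] = DELETED;
            values[i] = null;
        }
    }
}
```

The trade-off is that markers accumulate: a table with many removals can make unsuccessful lookups slow until the markers are reused or the table is rebuilt.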

History

• Created Monday, January 11, 1999.
• Added short summary on Friday, January 22, 1999.
• Filled in some of the details on Monday, April 19, 1999. Details were based, in part, on outline 46 and outline 47 of CS152 98S.
• Some redesign on Tuesday, April 20, 1999.

Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.