Fundamentals of Computer Science II (CSC-152 97F)

# Outline of Class 47: Hashing

## Miscellaneous

• I'd like to remind you once again that Tuesday at 7:30 is the "I was a 70's math junkie" confessional in the forum coffeehouse.
• On Thursday, December 4, at 4:30 in Science 2413, Marc Chamberland will be talking about Dinosaur Hunting and the Mathematical Contest on Modeling.
• An advance warning: I will be gone between the afternoon of Friday, December 12 and the afternoon of Tuesday, December 16 so that I can help Michelle drive back from Maine. Get your questions about the final to me before then!
• I'm still not sure when I'll get grading done this week. I'll do my best to get grade summaries to you early next week.

## Hash Tables, Revisited

• Recall that hash tables are a particular implementation of dictionaries.
• Dictionaries are structures that support key/value pairs so that it is possible to
• insert a key/value pair
• look up a value based on a key.
• Hash tables support these operations in expected O(1) time by storing the pairs in an array, with each pair placed in the cell corresponding to the hash value of the key of that pair.
• We saw that the choice of hash function can have a significant impact on the usefulness of hashing as a technique.
• We also saw that collisions are likely.
• Today, we'll discuss collisions, hashing in Java, and other functions appropriate for hash tables.
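As a reminder of how a key ends up in a cell, and of how easily two keys end up in the same cell, here is a minimal sketch. The names `HashIndexDemo` and `indexFor` are hypothetical (they are not from the course code), and reducing the hash value modulo the table size is only one common way to pick a cell.

```java
// Sketch only: HashIndexDemo and indexFor are hypothetical names.
class HashIndexDemo {
    // Map a key to a cell of a table with the given capacity.
    // Math.floorMod keeps the index non-negative even if hashCode() is negative.
    static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        int capacity = 16;
        // "Aa" and "BB" are different strings with the same hashCode(),
        // so they land in the same cell no matter how big the table is.
        System.out.println("Aa -> cell " + indexFor("Aa", capacity));
        System.out.println("BB -> cell " + indexFor("BB", capacity));
    }
}
```

The two strings collide because `String.hashCode()` gives both the same value; most collisions in practice come instead from different hash values reducing to the same cell.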

### Handling Collisions

• What do you do when you try to insert an object into a hash table, and there's already an object at the position?
• You can put multiple objects in the same position (by using a list, a tree, or even another hash table with a different hash function).
• You can find another place in the table for the object by adding another value to the hash value.
• You can grow the hash table.
• The first method requires us to provide a dynamic data structure for each cell of the table. It also means that the lookup cost includes searching through that structure.
• The second method requires us to do more than one "step" when looking up a value (as we might need to repeatedly skip already filled spaces).
• The second method also assumes that the table never holds more objects than it has cells.
• The second method also makes deletion much more difficult. Why? Because an object in another cell may have landed there only after skipping past the cell you are emptying, so simply clearing that cell can make a later lookup stop too early.
• There are a number of ways you can compute the "next" space in the second method.
• You can offset by a fixed amount.
• You can use a second function to determine the offset (making the offset object-dependent).
• You can use a sequence of hash functions.
• ...
• Despite all this joyous freedom, I'll admit that I prefer the first method.
• We may try an example to consider the underlying meaning of each. You may also get an assignment to do it.
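To make the second method and its deletion problem concrete, here is a minimal sketch that offsets by a fixed amount of one cell (often called linear probing). Everything here is an assumption for illustration: `ProbingTable`, its method names, and the use of a special `DELETED` marker (a "tombstone") are one common way to handle deletion, not the only one and not necessarily the one we'll use in class.

```java
// Sketch only: ProbingTable and its members are hypothetical names.
class ProbingTable {
    // Marker left behind by remove() so that lookups keep probing past it.
    private static final Object DELETED = new Object();
    private final Object[] keys;
    private final Object[] values;

    ProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int home(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns false if there is no room left for a new key.
    boolean insert(Object key, Object value) {
        int firstFree = -1;
        int i = home(key);
        for (int step = 0; step < keys.length; step++) {
            Object k = keys[i];
            if (k == null) {                 // never used: key is not further along
                if (firstFree < 0) firstFree = i;
                break;
            } else if (k == DELETED) {       // reusable, but keep looking for the key
                if (firstFree < 0) firstFree = i;
            } else if (k.equals(key)) {      // key already present: replace its value
                values[i] = value;
                return true;
            }
            i = (i + 1) % keys.length;       // fixed offset of one cell
        }
        if (firstFree < 0) return false;
        keys[firstFree] = key;
        values[firstFree] = value;
        return true;
    }

    // Returns the value for key, or null if the key is absent.
    Object lookup(Object key) {
        int i = home(key);
        for (int step = 0; step < keys.length; step++) {
            Object k = keys[i];
            if (k == null) return null;      // a never-used cell ends the search
            if (k != DELETED && k.equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }

    // Marks the cell as DELETED rather than emptying it.
    boolean remove(Object key) {
        int i = home(key);
        for (int step = 0; step < keys.length; step++) {
            Object k = keys[i];
            if (k == null) return false;
            if (k != DELETED && k.equals(key)) {
                keys[i] = DELETED;
                values[i] = null;
                return true;
            }
            i = (i + 1) % keys.length;
        }
        return false;
    }
}
```

If `remove` set the cell back to `null` instead of `DELETED`, a lookup for an object that had been pushed past that cell would stop at the gap and wrongly report the object missing. Note also that `insert` simply fails once every cell is taken, which is the "limited number of objects" assumption in action.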

### Hash Tables in Java

• Because hash tables are so common, Java supports them in a number of ways.
• There is a `Hashtable` class (in `java.util`).
• These are expandable hash tables; when you exceed a load factor, they automatically increase in size.
• One of the standard Object methods is `hashCode()`, which is used to compute the hash value of each object.
• When you create your own objects, you should override this method.
• I have no idea what the default is, but it's probably based on the address of the object.
• I've been told by one of the designers of Java that one of the few platform-dependent things in Java was that the initial size of the hash table wasn't specified. That, in turn, could affect the order in which the iterator for that hash table returned its elements.
• Is there anything interesting you note in the description of the `Hashtable` class?
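Here is a minimal sketch of using `Hashtable` with a key class of your own. The classes `CourseId` and `HashtableDemo` are hypothetical, and the sketch assumes a modern Java with generics, which the Java of these notes did not have. It also overrides `equals()` along with `hashCode()`, since the table needs both: the hash value picks the cell and `equals()` decides whether a key found there is the one you asked for.

```java
import java.util.Hashtable;

// Sketch only: CourseId is a hypothetical key class.
final class CourseId {
    private final String dept;
    private final int number;

    CourseId(String dept, int number) {
        this.dept = dept;
        this.number = number;
    }

    // Two ids are equal when both fields match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CourseId)) return false;
        CourseId that = (CourseId) other;
        return number == that.number && dept.equals(that.dept);
    }

    // Equal ids must produce equal hash values, so use the same fields.
    @Override
    public int hashCode() {
        return 31 * dept.hashCode() + number;
    }
}

class HashtableDemo {
    static Hashtable<CourseId, String> sampleTable() {
        // Initial capacity 11 and load factor 0.75 are arbitrary choices here.
        Hashtable<CourseId, String> titles = new Hashtable<>(11, 0.75f);
        titles.put(new CourseId("CSC", 152), "Fundamentals of Computer Science II");
        return titles;
    }

    public static void main(String[] args) {
        // A fresh but equal key finds the entry because of the overrides above.
        System.out.println(sampleTable().get(new CourseId("CSC", 152)));
    }
}
```

Without the two overrides, the second `CourseId` would be a different object with (most likely) a different default hash value, and the lookup would come back `null`.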

### Other Methods

• Up to now, we've discussed hash tables that support only two operations: insert and lookup. Are there other operations we might want to support?
• At times, it may be useful to remove elements from the hash table.
• As I suggested earlier, this can significantly influence your choice of collision-handling technique.
• Removal is usually based on key.
• Removal could also be based on value (but with considerably slower running time).
• Since we've talked a lot about iterators, it seems reasonable to provide iterators for our hash tables. Note that there are two kinds of iterators appropriate for hash tables:
• `Iterator keys()`, which gives all of the keys in the table.
• `Iterator values()`, which gives all of the values in the table.
• What others might you add?
• Which ones appear in the standard Java `Hashtable` class?
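The operations above can be sketched on top of the first collision-handling method (a list in each cell). This is only an illustration under stated assumptions: `ChainedTable`, `Pair`, and `removeValue` are hypothetical names, values are assumed to be non-null, and the iterators walk a snapshot of the table rather than the live cells.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Sketch only: ChainedTable and its members are hypothetical names.
class ChainedTable<K, V> {
    private static final class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Pair<K, V>>> cells = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) cells.add(new LinkedList<>());
    }

    private LinkedList<Pair<K, V>> cellFor(K key) {
        return cells.get(Math.floorMod(key.hashCode(), cells.size()));
    }

    void insert(K key, V value) {
        for (Pair<K, V> p : cellFor(key)) {
            if (p.key.equals(key)) { p.value = value; return; }
        }
        cellFor(key).add(new Pair<>(key, value));
    }

    V lookup(K key) {
        for (Pair<K, V> p : cellFor(key)) {
            if (p.key.equals(key)) return p.value;
        }
        return null;
    }

    // Removal by key only searches the one cell the key hashes to.
    V remove(K key) {
        Iterator<Pair<K, V>> it = cellFor(key).iterator();
        while (it.hasNext()) {
            Pair<K, V> p = it.next();
            if (p.key.equals(key)) { it.remove(); return p.value; }
        }
        return null;
    }

    // Removal by value has to scan every cell, hence the slower running time.
    // Removes the first matching pair it finds.
    boolean removeValue(V value) {
        for (LinkedList<Pair<K, V>> cell : cells) {
            Iterator<Pair<K, V>> it = cell.iterator();
            while (it.hasNext()) {
                if (it.next().value.equals(value)) { it.remove(); return true; }
            }
        }
        return false;
    }

    // Iterates over a snapshot of the keys, in cell order.
    Iterator<K> keys() {
        List<K> out = new ArrayList<>();
        for (LinkedList<Pair<K, V>> cell : cells)
            for (Pair<K, V> p : cell) out.add(p.key);
        return out.iterator();
    }

    // Iterates over a snapshot of the values, in cell order.
    Iterator<V> values() {
        List<V> out = new ArrayList<>();
        for (LinkedList<Pair<K, V>> cell : cells)
            for (Pair<K, V> p : cell) out.add(p.value);
        return out.iterator();
    }
}
```

Notice that the order in which `keys()` and `values()` return things depends on the table size and the hash function, not on the order of insertion.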

Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.