# CSC 301.01, Class 09: Hash Tables

*Overview*

- Preliminaries
  - Notes and news
  - Upcoming work
  - Extra credit
  - Questions
- Review of hash tables.
- Hash functions, revisited.
- Other uses of hash tables and hash functions.

### News / Etc.

- Today’s class may be sketchier than normal because I lost thirty minutes of prep/reflection time due to fire alarms.

### Upcoming work

- Assignment 3, due 10:30 p.m. TONIGHT
  - Code via email
  - Printed under door
- Assignment 4, due 10:30 p.m. next Wednesday
  - Implement hash tables in Scheme.
  - Reflect on how to implement sets.

### Extra credit (Academic)

- CS Extras, Thursday, Klinge Map Group on Cauldron
- CS Table, Tuesday, ???

### Extra credit (Peer)

*???*

### Extra Credit (Misc)

*???*

### Other good things

### Questions

- Loop invariants help you a whole lot in writing partition correctly. What is a loop invariant?
  - It is a way of thinking about the state of the system.
  - Usually with arrays.
  - If the invariant holds at the start of one iteration of the loop, it still holds at the end of that iteration.
  - It provides useful information about what our loop accomplishes.
- What should we track?
  - How about array references?
- Can we assume that n is a power of whatever in the inductive proofs?
  - If you must.
  - But you could also try doing so without that assumption.
- What’s the difference between strong and weak induction?
  - Weak: If it works for n-1, it should work for n.
  - Strong: If it works for every value <= n-1, it should work for n.
- What do you want us to do for part b?
  - Figure out which of the three patterns is at play. Explain why. Use that pattern.
- Will you force us to argue regularity?
  - It would be nice, but no.
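The partition from the assignment isn't reproduced in these notes, so here is a hypothetical Lomuto-style partition in C (pivot = first element, half-open range `[lo, hi)`) that spells out its loop invariant in comments and, purely for illustration, checks it with `assert` on every iteration. The names `partition`, `invariant`, and `swap` are assumptions, not the assignment's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static void swap(int *a, int *b) { int tmp = *a; *a = *b; *b = tmp; }

// Checks the loop invariant stated in partition (illustration only; O(n) per call).
static bool invariant(const int *vals, size_t lo, size_t small, size_t i)
{
  int pivot = vals[lo];
  for (size_t j = lo + 1; j < small; j++)
    if (vals[j] >= pivot) return false;  // region of values < pivot
  for (size_t j = small; j < i; j++)
    if (vals[j] < pivot) return false;   // region of values >= pivot
  return true;
}

// Partitions vals[lo..hi) around the pivot vals[lo] and returns the
// pivot's final index. Requires lo < hi.
size_t partition(int *vals, size_t lo, size_t hi)
{
  int pivot = vals[lo];
  size_t small = lo + 1;  // next slot for a value < pivot
  for (size_t i = lo + 1; i < hi; i++)
    {
      // Invariant: vals[lo+1..small) < pivot <= vals[small..i),
      // and vals[i..hi) has not been examined yet.
      assert(invariant(vals, lo, small, i));
      if (vals[i] < pivot)
        {
          swap(&vals[i], &vals[small]);
          small++;
        }
    }
  // At exit i == hi, so every non-pivot element is in one of the two regions.
  swap(&vals[lo], &vals[small - 1]);
  return small - 1;
}
```

Because the invariant holds when the loop ends and `i == hi`, the final swap is all that is needed to put the pivot between the two regions.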

## Review of hash tables

*What are the key ideas of hash tables?*

- We have pairs of keys and values (dictionaries). We want to look up values by keys.
- We use a hash function that takes a key and returns an integer, and we use that integer to index into an array where we store key/value pairs.
- It gives expected constant-time lookup of values by keys.

*What are some design decisions we make in implementing hash tables?*

- Sometimes two keys end up at the same place in the array (a *collision*), particularly when we mod by the size of the array. Two common remedies:
  - We can put a linked list at that point in the array (chaining/bucketing).
  - We can look in a nearby cell (probing).
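A minimal sketch of the probing approach in C, assuming linear probing (try the next cell, wrapping around), string keys, int values, and a hypothetical fixed `CAPACITY` of 8. `Cell`, `str_hash`, `probe_put`, and `probe_get` are illustrative names; `str_hash` is just one simple hash (Horner's rule with base 31).

```c
#include <stdbool.h>
#include <string.h>

#define CAPACITY 8  // hypothetical fixed table size for illustration

typedef struct {
  const char *key;  // NULL marks an empty cell; key is not copied
  int value;
} Cell;

// A simple string hash; any reasonable hash function would do.
static size_t str_hash(const char *s)
{
  size_t h = 0;
  for (; *s; s++) h = h * 31 + (unsigned char) *s;
  return h;
}

// Inserts or updates key. Returns false if the table is full.
bool probe_put(Cell table[], const char *key, int value)
{
  size_t start = str_hash(key) % CAPACITY;
  for (size_t step = 0; step < CAPACITY; step++)
    {
      size_t idx = (start + step) % CAPACITY;  // linear probing: next cell
      if (table[idx].key == NULL || strcmp(table[idx].key, key) == 0)
        {
          table[idx].key = key;
          table[idx].value = value;
          return true;
        }
    }
  return false;
}

// Looks up key; stores the value in *out and returns true if found.
bool probe_get(const Cell table[], const char *key, int *out)
{
  size_t start = str_hash(key) % CAPACITY;
  for (size_t step = 0; step < CAPACITY; step++)
    {
      size_t idx = (start + step) % CAPACITY;
      if (table[idx].key == NULL)
        return false;  // an empty cell ends the probe sequence
      if (strcmp(table[idx].key, key) == 0)
        {
          *out = table[idx].value;
          return true;
        }
    }
  return false;
}
```

This sketch has no delete: removing an entry from a probed table needs a "tombstone" marker, since simply emptying the cell would cut off the probe sequence for keys stored after it.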

- To keep the buckets small, we generally grow the underlying array when the ratio of entries to array size (the *load factor*) exceeds some threshold.
- It is important to have a good hash function, one that distributes keys fairly uniformly across the number space.
- The size of the underlying array may be important, since we mod by it.
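A minimal sketch of chaining plus growth in C (the assignment itself is in Scheme). It assumes string keys, int values, a load-factor threshold of 0.75, and doubling the bucket array on growth; all of these are illustrative choices, as are the names `Table`, `table_new`, `table_put`, and `table_get`. Allocation failures are not checked, there is no free function, and keys are stored by pointer rather than copied.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
  const char *key;
  int value;
  struct Node *next;
} Node;

typedef struct {
  Node **buckets;   // each bucket is a linked list (chain)
  size_t capacity;  // number of buckets
  size_t size;      // number of key/value pairs
} Table;

#define MAX_LOAD 0.75  // assumed threshold: grow when size/capacity exceeds it

static size_t str_hash(const char *s)
{
  size_t h = 0;
  for (; *s; s++) h = h * 31 + (unsigned char) *s;
  return h;
}

Table table_new(size_t capacity)
{
  Table t = { calloc(capacity, sizeof(Node *)), capacity, 0 };
  return t;
}

// Moves every node into a bucket array twice as large.
static void table_grow(Table *t)
{
  size_t new_cap = t->capacity * 2;
  Node **fresh = calloc(new_cap, sizeof(Node *));
  for (size_t b = 0; b < t->capacity; b++)
    {
      Node *n = t->buckets[b];
      while (n != NULL)
        {
          Node *next = n->next;
          size_t idx = str_hash(n->key) % new_cap;  // rehash for the new size
          n->next = fresh[idx];
          fresh[idx] = n;
          n = next;
        }
    }
  free(t->buckets);
  t->buckets = fresh;
  t->capacity = new_cap;
}

void table_put(Table *t, const char *key, int value)
{
  size_t idx = str_hash(key) % t->capacity;
  for (Node *n = t->buckets[idx]; n != NULL; n = n->next)
    if (strcmp(n->key, key) == 0) { n->value = value; return; }  // update
  Node *n = malloc(sizeof(Node));
  n->key = key;
  n->value = value;
  n->next = t->buckets[idx];  // prepend to this bucket's chain
  t->buckets[idx] = n;
  t->size++;
  if ((double) t->size / t->capacity > MAX_LOAD)
    table_grow(t);
}

// Returns a pointer to the value for key, or NULL if absent.
int *table_get(const Table *t, const char *key)
{
  size_t idx = str_hash(key) % t->capacity;
  for (Node *n = t->buckets[idx]; n != NULL; n = n->next)
    if (strcmp(n->key, key) == 0) return &n->value;
  return NULL;
}
```

Doubling keeps the total cost of all rehashing proportional to the number of insertions, so puts stay expected constant time on average.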

## Hash functions, revisited

*What does the following hash function do?*

[Borrowed from Skiena p. 89]

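```c
#include <string.h>

#define alpha 257UL  // stand-in for SOME_LARGE_PRIME

// base^e by repeated multiplication; unsigned overflow wraps (mod 2^N).
static unsigned long expt(unsigned long base, size_t e)
{
  unsigned long result = 1;
  for (size_t i = 0; i < e; i++)
    result *= base;
  return result;
} // expt

unsigned long hash(const char *s)
{
  size_t len = strlen(s);
  unsigned long code = 0;  // unsigned, so overflow is well-defined
  for (size_t i = 0; i < len; i++)
    {
      // Character i is weighted by alpha^(len-(i+1)).
      code += (unsigned char) s[i] * expt(alpha, len - (i + 1));
    } // for
  return code;
} // hash
```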

*Suppose we have a really long string. What's the difference between
`hash(substring(str, 0, k))` and `hash(substring(str, 1, k+1))`?
E.g., `hash(substring(str, 0, 6))` vs. `hash(substring(str, 1, 7))`.*

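- Subtract s[0] × alpha^5 (the term for the character leaving the window).
- Multiply by alpha (every remaining character moves up one power).
- Add s[6] (the character entering the window).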
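The three steps above can be sketched as a constant-time "roll" of a length-k window. This assumes the same base-`alpha` polynomial as the hash function, with 257 as a stand-in for the large prime; `window_hash` and `roll` are hypothetical names. Because unsigned arithmetic wraps modulo 2^N, the rolled value still matches a fresh computation even after overflow.

```c
#include <stddef.h>

#define ALPHA 257UL  // stand-in for the large prime in the hash above

// Direct computation: sum of s[i] * ALPHA^(k-1-i) over the k chars at s.
unsigned long window_hash(const char *s, size_t k)
{
  unsigned long h = 0;
  for (size_t i = 0; i < k; i++)
    h = h * ALPHA + (unsigned char) s[i];
  return h;
}

// Slides a length-k window one character right in O(1):
// subtract the outgoing char's term, multiply by ALPHA, add the incoming char.
// high_pow must be ALPHA^(k-1).
unsigned long roll(unsigned long h, unsigned char out, unsigned char in,
                   unsigned long high_pow)
{
  return (h - out * high_pow) * ALPHA + in;
}
```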

## Other uses of hash tables and hash functions

*Ideas stolen from Skiena*

How could you use hash functions or tables to help you …

- Detect plagiarism?
- Determine if string `a` is a substring of string `b`?
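For the substring question, one well-known answer is the Rabin-Karp approach: hash `a`, then roll a window of the same length across `b` using the subtract/multiply/add update from the previous section, and compare strings directly only when the hashes match. A minimal sketch, assuming base 257 and unsigned wraparound in place of an explicit prime modulus; `rk_find` is a hypothetical name.

```c
#include <string.h>

#define BASE 257UL  // assumed base; arithmetic wraps modulo 2^N

// Returns the index of the first occurrence of a in b, or -1 if none.
long rk_find(const char *a, const char *b)
{
  size_t m = strlen(a), n = strlen(b);
  if (m > n) return -1;
  unsigned long ha = 0, hb = 0, high = 1;  // high ends as BASE^(m-1)
  for (size_t i = 0; i < m; i++)
    {
      ha = ha * BASE + (unsigned char) a[i];
      hb = hb * BASE + (unsigned char) b[i];
      if (i + 1 < m) high *= BASE;
    }
  for (size_t start = 0; ; start++)
    {
      // Equal hashes may be a collision, so confirm with a direct comparison.
      if (ha == hb && memcmp(a, b + start, m) == 0)
        return (long) start;
      if (start + m >= n)
        return -1;  // that was the last window
      // Roll the window: drop b[start], add b[start + m].
      hb = (hb - (unsigned char) b[start] * high) * BASE
           + (unsigned char) b[start + m];
    }
}
```

With few collisions, this does O(|a| + |b|) expected work rather than comparing `a` against every position character by character.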