CSC 301.01, Class 32: Sets and unionfind
Overview
 Preliminaries
 Notes and news
 Upcoming work
 Extra credit
 Questions
 A set ADT
 Data structure design, revisited
 The unionfind structure
 Analyzing union find
 Improving unionfind
News / Etc.
 I expect that most upperlevel CS classes will fill this semester. You help the department better consider stress points and how to address them if you register earlier, rather than later.
 Prof. French noted that Kruskal’s algorithm serves as second purpose: It allows you to count the number of conntected components in a graph.
 The due date of Exam 2 has been pushed to Friday.
Upcoming work
 Exam 2 due Friday at 4:00 p.m.
 Read Skiena 6.16.4 for Wednesday (or Friday).
Extra Credit (Academic/Artistic)
 CS Table, Tomorrow
 Convocation Thursday at 11 a.m. in JRC 101. “Work’s Provocative Future: Which Graduates Will Thrive?”
Extra credit (Peer)
 PubFree Quiz: Math/Stats, Wednesday
 Swim meet, Saturday
 Orchestra Saturday at 2pm in Sebring Lewis
Extra Credit (Misc)
Other good things
 “The First Time I Walked on the Moon”. Thursday 7:30, Friday 7:30, Saturday 2:00, Saturday 7:30, Sunday 2:00.
 Fresh Flutes Thursday
 Voice recitals Friday at 4:15 (Henderson) and 7:00 (Manuel)
 Women’s Basketball vs. Emmaus Wed at 5:00 p.m.
 Men’s Basketball vs. Emmaus Wed at 7:00 p.m.
Questions
 How do I do part c of problem 3? (That is, how do I determine the final value based on the initial values, given that things are random?)
 A good invariant will help.
 Should we prove the invarant correct
 A good argument suffices.
 Does pseudocode suffice for the online algorithm?
 Psuedocode is fine. But you do need to argue that it’s O(n).
 A description is also fine. But you do need to argue that it’s O(n).
 For problem 4: Are v and u already in the graph?
 Maybe. At least one is. You may suppose wlog that v is in the graph.
 “wlog” = “without loss of generality”
 Does that mean we need to check if it’s already in the graph?
 It seems so.
 Can we use existing algorithms and their running time?
 Certainly.
 Can we use the Web site that builds redblack trees for you?
 Yes, if you cite it.
 But you might want to build them yourself, first.
 Do we have to prove that our solution to #4 is correct?
 No.
 Do we have an inclass final?
 Yes.
 I have two inclass finals that day. Will you find another time?
 Yes.
 When is our final?
 Wednesday at 9am!
 Next Wednesday?
 No. Wednesday of finals week.
A set ADT
Suppose we have a program that requires us to deal with sets.

What goes in the set interface? (Alternately: What operations do you want as you deal with sets?)
set.add!(value)
orset.add(value) => set
set.remove!(value)
orset.remove(value) => set
set.contains?(value) => boolean
set.union!(set)
orset.union(set) => set
set.intersect!(set)
orset.intersect(set) => set
set.subtract!(set)
orset.subtract(set) => set
set.disjunction!(set)
orset.disjunction(set) => set
set.filter(proc) => set
set.subsetOf?(set) => boolean
set.reduce(binproc) => val
set.map(proc) => set
set.product(set) => set
set.iterate(proc)
orset.iterator() => Iterator
.
Data structure design, revisited
 Sometimes it’s better to focus on the particular things you need for a particular task, rather than the “kitchen sink” approach.
 We will look at a small set of operations and consider how to implement them efficiently.
SetOfSets.setofone(val)
SetOfSets.union!(set,set)
SetOfSets.findset(val)
 Assume that each value can only be in one set.
 Note: We will use
findset
primarily to determine if two values are in the same set.
What strategies might you use?
 “hash tables”
 “trees”
Examples
 Values a, b, c, d, e,f are all in separate sets.
 We union the {a} and {b} sets, giving us {a,b}
 We union the {a,b} and {c} sets, giving us {a,b,c}
 We union the {d} and {e} sets, giving us {d,e}
 We union the {a,b,c} and {d,e} sets, giving us {a,b,c,d,e}
 We union the {a,b,c,d,e} and {f} sets, giving us {a,b,c,d,e,f}
The unionfind structure
 Represent each set as a directed graph: An upsidedown tree. Children point to parents.
 The root is the representative element (what we get from findset)
 How do we keep the tree shallow?
 When unioning two trees, have the root of the smaller one point to the root of the bigger one.
 Our height will never be bigger than log(n).
 So we now have union is O(1) and find is O(logn)
 union could be O(logn) if we need to find the root
 Keep the size in each tree so that we know which is smaller
 Note: This is really helpful for Kruskal’s, which repeatedly asks if an edge joins two different components
Analyzing union find
Improving unionfind
 Whenver you run find, update the links from every element along the way.