# Class 10: Searching

Back to Analyzing Algorithms. On to Sorting.

Held Tuesday, February 8, 2000

Overview

Today we consider the problem of searching: given a collection of values with "keys", find an item with a designated key.

Question 10 for today's class: Explain unambiguously how to look up a phone number in a phone book given the person's name. Also describe how to look up a person's name in the phone book given their phone number.

Question 11 for Wednesday's class: Describe how to put a pile of books in alphabetical order by author.

Notes

• On Friday, class will be in 2424 rather than 2417 (another class needs to use this lab)
• As some of you have noted, I don't always respond to email promptly. For example, since almost all of your daily email messages have a title of "question", I let such email slip to the bottom of my priority list. If you have email that needs a response, please give it a title like HELP!

Summary

• Data structures
• The problem of searching
• Sequential search
• Binary search
• Handouts:
• Your answers to question 10: Explain unambiguously how to look up a phone number in a phone book given the person's name. Also describe how to look up a person's name in the phone book given their phone number.

## Detour: Data Structures and Arrays

• As you will see in some of our algorithms, many algorithms run more efficiently or read more clearly if we organize the input data in a logical way.
• We call the structures used to organize data data structures.
• In our algorithm to find matches in a list, it was convenient to number the input values.
• Given a number, i, we can find the corresponding input value, aᵢ.
• In effect, the structure maps i (called the index) to the corresponding value.
• We call structures that contain values indexed by numbers arrays.
• Often, we write A[i] for the ith element of array A.
• A second useful structure is the record. Records combine multiple values, such as names and phone numbers.
• In your algorithms, some of you wondered how we'd keep the two together. Records are how computers do it.
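
As a small illustration of both structures, here is a sketch in Python. The choice of language, the `Entry` record, and the sample names and numbers are all assumptions made for this example; they are not from class.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A record: keeps a name and a phone number together."""
    name: str
    number: str


# An array (here a Python list) maps an index i to the corresponding value.
# Python numbers the elements starting from 0 rather than 1.
phonebook = [
    Entry("Alice", "555-0101"),
    Entry("Bob", "555-0102"),
    Entry("Carol", "555-0103"),
]

second = phonebook[1]   # A[i] with i = 1
print(second.name)      # one field of the record
print(second.number)    # the other field, still attached to the same person
```

Because each name travels with its number inside one record, rearranging the array never separates the two.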

## The Problem of Searching

• Today's question was intended to reveal a common problem that we often solve with computers, that of searching.
• Given a collection of values, each of which has a special "key", find an object with a designated key.
• The two variations of the question were meant to help you think about how the organization of data might affect the running time of the algorithm.
• In one case (using the phone number as the key), the data were not organized.
• In the other (using the name as the key), the data were organized in alphabetical order.
• We'll consider each case in turn, and also look at the running time in Big-O notation.

### Sequential Search

• Suppose our data are not organized. How do we search?
• Most of you came up with some variant of the following algorithm:
To look up the name for phone number p in phonebook B
  1. For each record, r, in B
     a. If r.number equals p then
        i. Set answer to r.name
        ii. Don't look at other records
  2. If answer has a value then
     a. Return that value
  3. Otherwise
     a. Indicate that no one has that phone number

• What is the running time? If there are n entries, we potentially look at all of them. Each time, we use a constant number of steps (no more than three or four), so this is an O(n) algorithm.
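
The sequential search described above might be sketched in Python as follows. The `Entry` record, the function name `lookup_name`, and the use of `None` to signal "no one has that number" are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """A phone book record (hypothetical layout)."""
    name: str
    number: str


def lookup_name(phonebook: Sequence[Entry], p: str) -> Optional[str]:
    """Return the name listed with number p, or None if no one has it."""
    for r in phonebook:        # examine each record in turn
        if r.number == p:      # does this record have the key we want?
            return r.name      # found it, so don't look at other records
    return None                # every record was examined without a match


# The records are in no particular order with respect to phone number.
book = [Entry("Carol", "555-0103"), Entry("Alice", "555-0101")]
print(lookup_name(book, "555-0101"))
print(lookup_name(book, "555-9999"))
```

In the worst case (the number is missing or belongs to the last record), the loop body runs once per entry, which is where the O(n) bound comes from.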

### Binary Search

• Can we do better if the data are organized? Most of you felt that we could.
• Most of you quickly realized that we can ignore large portions of the data.
• Here's one formulation:
boolean binarySearch(Element element, Collection stuff)
  While there are usable elements in stuff
    Identify mid, the middle usable element
    If element equals mid then
      Return true
    Else if element < mid then
      Throw away mid and all elements greater than mid
    Else
      Throw away mid and all elements less than mid
  Return false
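
One way to make this loop concrete is sketched below in Python. It assumes that `stuff` is a list already sorted in ascending order, and it tracks the "usable" elements with two indices, `lo` and `hi`, which are names introduced here for illustration.

```python
from typing import Sequence


def binary_search(element: str, stuff: Sequence[str]) -> bool:
    """Report whether element occurs in stuff, which must be sorted ascending."""
    lo, hi = 0, len(stuff) - 1      # the usable elements are stuff[lo..hi]
    while lo <= hi:                 # while there are usable elements
        mid = (lo + hi) // 2        # index of the middle usable element
        if element == stuff[mid]:
            return True
        elif element < stuff[mid]:
            hi = mid - 1            # discard mid and everything after it
        else:
            lo = mid + 1            # discard mid and everything before it
    return False                    # no usable elements remain


names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
print(binary_search("Dave", names))
print(binary_search("Zed", names))
```

Moving `hi` or `lo` past `mid` on every pass guarantees that the usable range shrinks, so the loop always terminates.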

• Here's another formulation:
boolean binarySearch(Element element, Collection stuff)
  If stuff is empty then return false
  Identify mid, the middle element of stuff
  If element equals mid then
    Return true
  Else if element < mid then
    Let smaller be the elements of stuff smaller than mid
    Return binarySearch(element, smaller)
  Else // element > mid
    Let bigger be the elements of stuff larger than mid
    Return binarySearch(element, bigger)
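
A recursive version might look like the following Python sketch. Again it assumes a list sorted in ascending order. Instead of building new collections called smaller and bigger, it passes a pair of indices (`lo` and `hi`, hypothetical parameter names) that mark the part of the list still under consideration.

```python
def binary_search_rec(element, stuff, lo=0, hi=None):
    """Recursively search the sorted list stuff for element within stuff[lo:hi]."""
    if hi is None:
        hi = len(stuff)         # first call: consider the whole list
    if lo >= hi:                # the part under consideration is empty
        return False
    mid = (lo + hi) // 2        # index of the middle element
    if element == stuff[mid]:
        return True
    elif element < stuff[mid]:
        # "smaller": the elements before mid
        return binary_search_rec(element, stuff, lo, mid)
    else:
        # "bigger": the elements after mid
        return binary_search_rec(element, stuff, mid + 1, hi)


numbers = [2, 3, 5, 7, 11, 13]
print(binary_search_rec(7, numbers))
print(binary_search_rec(8, numbers))
```

Passing indices is a design choice: copying half of the list at every call would take time proportional to the size of that half, which would undo much of the benefit of discarding it.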

• This is an example of a technique known as divide and conquer. By breaking the problem up, we often come up with a better algorithm.
• So, what's the running time? Each step throws away about half of the remaining elements, so the number of steps is "the number of times it takes to go from n to 1 by splitting in half."
• There is a formal mathematical notation for this value, log₂ n, so binary search is an O(log₂ n) algorithm.
• Note that n can grow fairly quickly while log₂ n still grows slowly. For example, when n is 1,000,000, log₂ n is only about 20.
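
To see how slowly this value grows, the following Python sketch counts how many times n can be halved before reaching 1 and prints that count next to the base-2 logarithm rounded down. The function name `halvings` and the sample values of n are chosen here for illustration.

```python
import math


def halvings(n: int) -> int:
    """Count how many times n can be halved (rounding down) before reaching 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count


for n in (10, 1_000, 1_000_000, 1_000_000_000):
    # The two right-hand columns agree: the halving count is floor(log2(n)).
    print(n, halvings(n), math.floor(math.log2(n)))
```

Multiplying n by a thousand adds only about ten more halvings, which is why binary search stays fast even on very large sorted collections.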

## History

Saturday, 22 January 2000

• Created as a blank outline.

Tuesday, 8 February 2000

• Filled in the details.

Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.