The Tao of Computing: A Down-to-Earth Approach to Computer Fluency
by Henry M. Walker
Jones and Bartlett Publishers
Although computers are often considered to work very quickly, only some algorithms proceed rapidly; others take much longer. This laboratory exercise provides an intuitive framework for considering algorithm effectiveness, including the amount of time and memory an algorithm requires. When algorithms are applied to successively larger data sets, some algorithms scale up nicely, while others require more time than may be feasible. Also, when several algorithms are available to solve a problem, it is natural to wonder whether one solution is better than another. Altogether, we might use many criteria to evaluate such solutions, including:
Accuracy: Of course, any program should produce correct answers. (If we were satisfied with wrong results, it would be trivial to produce many such answers very quickly.) However, it may not be immediately clear just how accurate results should be in a specific instance. For example, one algorithm may be simple to program and may run quickly, but it may be accurate to only 5 decimal places. A second algorithm may be more complex and much slower, but may give 15-place accuracy. If 5-place accuracy is adequate for a specific application, the first algorithm is the better choice. However, if 10- or 12-place accuracy is required, the slower algorithm must be used.
Efficiency: Efficiency can be measured in many ways: programmer time, algorithm execution time, memory used, and so on. If a program is to be used once, then programmer time may be a major consideration, and a simple algorithm might be preferred. If a program is to be used many times, however, then it may be worth spending more development time with a complex algorithm, so the procedure will run very quickly.
Use of Memory: One algorithm may require more computer memory in which to execute. If space is a scarce resource, then the amount of space an algorithm requires should be taken into consideration when comparing algorithms.
Ease of Modification: It is common practice to modify old programs to solve new problems. A very obscure algorithm that is difficult to understand, therefore, is usually less desirable than one which can be easily read and modified.
For this laboratory exercise, we focus on algorithm execution time.
In determining algorithm execution time, we may proceed in several ways: we might run a program implementing the algorithm on a specific machine and time it; we might analyze the individual machine instructions the algorithm requires; or we might perform a high-level analysis of the types of activities the algorithm carries out.
Each of these approaches has advantages, but each also has drawbacks. Execution times on a specific machine normally depend upon details of the machine and on the specific data used. Timings may vary from data set to data set and from machine to machine, so experiments from one machine and one data set may not be very helpful in general.
The analysis of instructions may take into account the nature of the data -- for example, one might consider what happens in a worst case. Also, such analysis commonly is based on the size of the data being processed -- the number of items or how large or small the data are. This is sometimes called a microanalysis of program execution. Once again, however, the specific instructions may vary from machine to machine, and detailed conclusions from one machine may not apply to another.
A high-level analysis may identify types of activities performed, without considering exact timings of instructions. This is sometimes called a macroanalysis of program execution. This can give a helpful overall assessment of an algorithm, based on the size of the data. However, such an analysis cannot show fine variations among algorithms or machines.
For many purposes, it turns out that a high-level analysis provides adequate information to compare algorithms. For the most part, we follow that approach here.
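To make the macro-level idea concrete, the sketch below (Python; not part of the original lab materials) counts the comparisons each search would need in the worst case for a few array sizes. A linear search may examine every one of the n items, while a binary search halves the remaining range at each step, needing only about log2(n) comparisons.

```python
import math

# Worst-case comparison counts for the two searches: a macro-level,
# machine-independent measure of the work each algorithm performs.
for n in [1_000, 10_000, 50_000]:
    linear = n                             # may examine every element
    binary = math.ceil(math.log2(n)) + 1   # halves the range each step
    print(f"n = {n:>6}:  linear search <= {linear:>6},  binary search <= {binary}")
```

Note how slowly the binary-search count grows: multiplying the array size by 50 adds only a handful of comparisons.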
In class, we have discussed both the linear search and the binary search.
Write a paragraph or two describing how a linear search works.
Write a paragraph or two describing how a binary search works.
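For reference, the two searches might be written as follows in Python. This is an illustrative sketch only; the lab's searchTest program is a separate, compiled program whose details may differ.

```python
def linear_search(items, target):
    """Examine each element in turn; return its index, or -1 if absent."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

def binary_search(items, target):
    """Repeatedly halve a sorted range; return an index, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1
```

The linear search works on any sequence; the binary search requires the data to be sorted, which is why it can discard half the remaining range at each step.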
In class, we have discussed how efficient each of these algorithms might be. Now we take a more concrete, experimental approach.
We will search for random items in a sequence of even integers:
Technically, the structure holding such a collection of data is called an array. With the array data just described, a search should succeed if we are looking for an even, nonnegative integer that is not too large. The search should fail if the desired integer is negative, odd, or too large. In our experiment, we will pick 20 integers at random as candidates for the search. We pick the first 10 integers so that the searches will succeed, and the last 10 so that they will fail.
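The setup just described might be sketched as follows (Python; the details are assumptions for illustration: an array of the first n even integers, with 10 probe values drawn from the array and 10 odd probe values that cannot appear in it).

```python
import random

n = 1000
data = [2 * i for i in range(n)]          # 0, 2, 4, ..., 2(n-1)

# 10 values guaranteed to be present, so those searches succeed...
hits = [random.choice(data) for _ in range(10)]

# ...and 10 odd values, which can never appear in an even-integer array,
# so those searches fail.
misses = [random.randrange(0, 2 * n) * 2 + 1 for _ in range(10)]

assert all(v in data for v in hits)
assert all(v not in data for v in misses)
```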
The program searchTest performs both a linear search and a binary search for randomly selected items and records the time required for this work. Due to the speed of the computer and the limitations of the available clocking mechanism, we repeat each experiment 100,000 times. This magnifies the times, so we can easily see differences.
The mechanics of running these search experiments are as follows:
Log onto a computer in MathLAN, and open a terminal window.
In the terminal window, type the command:
The program itself can be run with the command
The program will ask you to enter the size of the sequence to be searched. Suggested values might be selected from the range 1000 to 50,000, although you can choose smaller or larger sizes for the array. The program will then report the time required to search for each of 20 values in the array, both for the linear search and for the binary search.
Note: After you run the above command the first time, you can use the upward-arrow key at your keyboard to retrieve the same command again. After hitting the upward-arrow key to get the desired command, hit Return or Enter to run the program again.
In this part of the lab, you are to gather experimental data regarding the search times for various size arrays for both the linear and binary searches. Since the searchTest program performs 20 trials, you will need to combine the results in some way. It is suggested that you average the results for the 10 trials that succeed and compute a second average for the 10 trials that fail.
Run searchTest for a variety of array sizes between 1000 and 50,000.
Use a spreadsheet to record the times required for each trial and for each searching method. Use separate columns to record the sample size, the linear-search time for each trial, and the binary-search time. Maintain separate statistics for trials in which the item was found and for trials in which it was not found.
Use the spreadsheet to tabulate the average times for each algorithm for each array size -- with one time for when an item is found and a second time for when the item is not found. Organize this work in a separate part of the spreadsheet:
Use the spreadsheet to plot the sizes and times for the four tables as separate graphs. The horizontal axis of each graph should indicate the size of the array, and the vertical axis should indicate time. You should conduct enough experiments that a fairly consistent pattern emerges.
Describe (in words) the nature of the graphs you have observed.
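If MathLAN is unavailable, an experiment in the same spirit can be run with a short timing harness like the sketch below (Python; the repetition count and array sizes are illustrative, and the absolute times will differ from those of the compiled searchTest program).

```python
import bisect
import random
import time

def linear_search(items, target):
    for value in items:          # examine each element in turn
        if value == target:
            return True
    return False

def binary_search(items, target):
    i = bisect.bisect_left(items, target)   # halving search on sorted data
    return i < len(items) and items[i] == target

def time_search(search, data, target, reps=100):
    """Return the total time for `reps` repetitions of one search."""
    start = time.perf_counter()
    for _ in range(reps):
        search(data, target)
    return time.perf_counter() - start

for n in [1_000, 5_000, 10_000]:
    data = [2 * i for i in range(n)]
    hit = random.choice(data)    # even, so it will be found
    miss = hit + 1               # odd, so it can never be found
    print(n,
          time_search(linear_search, data, hit),
          time_search(binary_search, data, hit),
          time_search(linear_search, data, miss),
          time_search(binary_search, data, miss))
```

As in the lab, each search is repeated many times so that the measured differences stand out above the clock's resolution.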
Your experimental results, of course, relied on a particular program running on a specific machine. Actual numbers are likely to vary from one computer and program to another. However, we still would anticipate the same general patterns -- even if the numbers differ.
The following table gives experimental measurements for the average time required for a linear search for several search trials on another machine with another program.
Array Size | Average Time If Value Found | Average Time If Value Not Found
Estimate the time for an average linear search of arrays of size 1500, 3000, 8000, and 16000. Briefly justify your answers.
Continuing this experiment, the following table gives experimental measurements for the average time required for a binary search for several search trials.
Array Size | Average Time If Value Found | Average Time If Value Not Found
Estimate the time for an average binary search of arrays of size 1500, 3000, 8000, and 16000. Briefly justify your answers.
created 31 December 2003
last revised 27 February 2006