# Lab L3: Sorting Algorithms

In this laboratory session, you will investigate a number of related algorithms. Initially, you will consider some algorithms used to find the smaller elements of an array. You will then consider ways to enhance these algorithms in order to sort arrays (place the elements in order).

The goals of this laboratory session are to:

• investigate a number of key algorithms, particularly important sorting algorithms;
• consider the effects of the constants in running times;
• consider the effects of more significant improvements to algorithm design;
• detour into the generation of random sequences; and
• further your skills at analyzing algorithms.

Your instructor will tell you which of the proposed experiments you are to perform.

Prerequisite skills:

• Arrays
• Loops
• Recursion
• Timing

Required files:

• `Counter.java`
• `SimpleInput.java`
• `SimpleOutput.java`
• `SortableIntSeq.java`
• `SortTester.java`

## Discussion

If you've done any reading about algorithm analysis, you've learned that computer scientists tend to analyze algorithms in terms of an upper bound on their expected or worst-case running times, and that they express those running times in terms of an unknown constant times a function of the size of the input. Such running times are written O(the function) and pronounced ``big O of the function'' or ``order the function''.

For example, if we were deleting the smallest element of an array, we might say that it takes O(n) time, where n is the number of elements in the array. Why? There may be cases in which it takes less. For example, if we know the smallest element is at the end of the array, then we can probably delete it in one step. But how do we determine that it's the smallest element? Usually by comparing it to all the other elements. In addition, if we don't want to leave gaps in the array, we may need to shift all the elements left one space. If we end up deleting the leftmost element, that's another n ``steps''.

When we say that a method requires O(f(n)) steps, we mean that there is some constant, `c`, such that the number of steps for an input of size n is never more than (but sometimes less than) c*f(n), no matter what we count as a step. (The choice of c may depend on our definition of ``step''.) Different methods with the same big-O running time may have very different constants. At the same time, choosing a different algorithm with a ``smaller'' function can have a much bigger impact on the actual running time, even when the smaller function has a larger constant.
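
To make the last point concrete, the following sketch compares two hypothetical step counts: 100*n (a linear function with a large constant) and n*n (a quadratic function with constant 1). The constants and the class name `StepCountSketch` are assumptions chosen only for illustration; they do not come from the lab files.

```java
/**
 * Compares hypothetical step counts for two methods: one linear with
 * a large constant, one quadratic with a small constant.
 */
class StepCountSketch {
  /** Steps for a hypothetical O(n) method whose constant is 100. */
  static long linearSteps(long n) {
    return 100 * n;
  }

  /** Steps for a hypothetical O(n^2) method whose constant is 1. */
  static long quadraticSteps(long n) {
    return n * n;
  }

  public static void main(String[] args) {
    long[] sizes = {10, 100, 1000, 10000};
    for (long n : sizes) {
      System.out.println(n + ": linear=" + linearSteps(n)
          + ", quadratic=" + quadraticSteps(n));
    }
  }
}
```

For small inputs the quadratic count is the smaller of the two, the counts meet at n = 100, and beyond that point the linear method wins by an ever larger margin.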

In the following discussion and subsequent experiments, we will investigate these issues in more depth.

### Finding the smaller elements of a sequence

Suppose you were asked to find the smallest element of a sequence. You might do this by assuming that the first element is the smallest and then stepping through the remaining elements, updating your estimate of the smallest whenever you found a smaller element. In pseudocode,

```
guess = the first element of the sequence
for each remaining element of the sequence, e
  if (e < guess) then
    guess = e;
  end if
end for
```

Using arrays in Java, we might express this as

```
  /**
   * Compute the smallest element in the sequence.
   */
  public int smallest() {
    // Our guess as to the smallest element
    int guess = this.elements[0];
    // A counter variable
    int i;
    // Look through all subsequent elements
    for (i = 1; i < this.elements.length; ++i) {
      // If the element is smaller than our guess, then
      //   update the guess
      if (this.elements[i] < guess) {
        guess = this.elements[i];
      } // if
    } // for
    // That's it, we're done
    return guess;
  } // smallest()
```

As a variation, we might write an `indexOfSmallest` method that returns the index of the smallest element in a subsequence. Why would we want such a method? As you've seen, whenever you write a method for a sequence, it is helpful to write a similar method for a subsequence. Why return an index rather than the actual value? Because it will be helpful for the subsequent experiments.

If we decided to generalize `smallest` in this way, we might write

```
  /**
   * Compute the index of the smallest element in the subsequence
   * given by lower bound lb and upper bound ub.
   */
  public int indexOfSmallest(int lb, int ub) {
    // Make sure the upper bound and lower bound are reasonable.
    if (lb < 0) {
      lb = 0;
    }
    if (ub >= this.elements.length) {
      ub = this.elements.length - 1;
    }
    // Our guess as to the index of the smallest element
    int guess = lb;
    // A counter variable
    int i;
    // Look through all subsequent elements
    for (i = lb + 1; i <= ub; ++i) {
      // If the element is smaller than our guess, then
      //   update the guess
      if (this.elements[i] < this.elements[guess]) {
        guess = i;
      } // if
    } // for
    // That's it, we're done
    return guess;
  } // indexOfSmallest()
```

In experiment L3.1 you will investigate these two methods in a little more depth. You will also begin to examine the three classes you will use in the remaining experiments.

### Finding groups of smallest entries

Suppose you were instead asked to find the two smallest entries in a sequence. One question you might ask would be ``How should I return two values?'' So that we need not concern ourselves with that question, let us instead try to move the two smallest entries to the first two positions of the array.

One approach would be to look through the sequence to find the smallest entry and move it to the front of the sequence, then look through all but the first element of the modified sequence for the next smallest element. Using the `indexOfSmallest` method described above, we might phrase this as

```
  /**
   * Put the two smallest elements of the sequence at the beginning
   * of the sequence.  The sequence must have at least two elements.
   */
  public void twoSmallest() {
    // Swap the initial element with the smallest
    swap(0, indexOfSmallest(0, this.length()-1));
    // Swap the next element with the smallest remaining
    swap(1, indexOfSmallest(1, this.length()-1));
  } // twoSmallest()
```

As you might guess, `swap(i,j)` swaps the elements at positions i and j in the sequence.
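
The lab files supply `swap`, so the version below is only a guess at what such a method might look like. It is written as a small self-contained class (the name `SwapSketch` and its constructor are hypothetical) so that it can be compiled on its own.

```java
/**
 * A minimal stand-in for a sequence class, showing one plausible way
 * to write swap.  The real SortableIntSeq may differ.
 */
class SwapSketch {
  /** The elements of the sequence. */
  int[] elements;

  SwapSketch(int[] elements) {
    this.elements = elements;
  }

  /** Swap the elements at positions i and j. */
  void swap(int i, int j) {
    // Hold one value in a temporary so it is not overwritten
    int tmp = this.elements[i];
    this.elements[i] = this.elements[j];
    this.elements[j] = tmp;
  } // swap(int, int)
} // SwapSketch
```

Swapping, rather than simply overwriting the front position, keeps every original value in the sequence, which matters once the same idea is used to sort.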

Now, how might we put the five smallest elements in a sequence of 50 elements at the front of that sequence? One approach would be to comb through the sequence to find the smallest entry and move it to the front of the sequence. Next, you could comb through the 49 entries following this newly positioned entry to find the next smallest entry and move it to the position following the smallest entry. By repeating this process three more times, each time finding the smallest entry remaining in the sequence and placing it just behind the entry found in the previous pass, you will have placed the five smallest entries at the beginning of the sequence in increasing order of size.

When turning this narrative into code, it is appropriate to use a loop (since the five pieces are quite similar). For example,

```
  /**
   * Put the five smallest elements of the array at the beginning of
   * the array (naive method).  The sequence should have at least
   * five elements.
   */
  public void fiveSmallest() {
    int i;
    // For each index i from 0 to 4,
    for (i = 0; i < 5; ++i) {
      // Swap the smallest element in [i .. last] with the ith element.
      swap(i, indexOfSmallest(i, this.length()-1));
    } // for
  } // fiveSmallest()
```

What we have accomplished is a partial sorting of the sequence by selecting the smallest entries. Thus, we call this algorithm the partial selection sort.

Our task now is to analyze the efficiency of this approach. This we do in terms of the number of times two entries in the sequence are compared. To find and position the smallest entry in the sequence requires 49 comparisons, to process the next smallest entry requires 48, and so on. Thus, the total number of comparisons to find the five smallest entries is

49 + 48 + 47 + 46 + 45 = 235

In general, applying this selection method to find the k smallest entries in a sequence of n entries requires

(1/2)(2*n*k - k^2 - k)

comparisons. (Can you derive this formula?) Thus, to find the 10 smallest entries in a sequence of 10,000 entries requires 99,945 comparisons. We might also say that this is an O(n*k) algorithm.
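
One way to check the formula is to count comparisons directly. The sketch below is a standalone illustration: the class `PartialSelectionCount` and its method names are hypothetical, and it works on a plain `int[]` rather than on the lab's `SortableIntSeq`.

```java
/**
 * Counts the comparisons made by partial selection sort and compares
 * the count with the formula (1/2)(2*n*k - k^2 - k).
 */
class PartialSelectionCount {
  /** Comparisons predicted by the formula for n entries and k passes. */
  static long predicted(long n, long k) {
    return (2 * n * k - k * k - k) / 2;
  }

  /**
   * Move the k smallest values to the front of the array and return
   * the number of element comparisons.  Assumes k <= values.length.
   */
  static long countedPartialSelection(int[] values, int k) {
    long comparisons = 0;
    for (int i = 0; i < k; ++i) {
      // Find the index of the smallest value in positions i .. last
      int guess = i;
      for (int j = i + 1; j < values.length; ++j) {
        ++comparisons;
        if (values[j] < values[guess]) {
          guess = j;
        }
      }
      // Swap it into position i
      int tmp = values[i];
      values[i] = values[guess];
      values[guess] = tmp;
    }
    return comparisons;
  }

  public static void main(String[] args) {
    int[] values = new int[50];
    for (int i = 0; i < values.length; ++i) {
      values[i] = 50 - i;
    }
    System.out.println(countedPartialSelection(values, 5)); // prints 235
    System.out.println(predicted(50, 5));                   // prints 235
    System.out.println(predicted(10000, 10));               // prints 99945
  }
}
```

Note that the count does not depend on the contents of the array: each pass compares every remaining entry once, whatever the values are.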

Can we do better? Recall that when we found the smallest element in a sequence, we began with a guess of the smallest and then refined that guess by looking at the remaining elements. We can do the same thing to find the five smallest elements. Initially, we'll assume that the first five elements are the five smallest elements. Sort the first five entries in the sequence by any method. Then consider the sixth entry. Compare it to the fifth entry in the sequence, which is now the largest of the first five entries. If the sixth entry is larger, pass over it because it is not one of the five smallest entries. If, however, the sixth entry is smaller than the fifth, compare it with the fourth, third, and so on, inserting it among the first five entries so that the first five entries in the list remain the smallest entries found so far in increasing order. Repeat this process for the entries in positions 7, 8, ..., 50.

Note that this process creates a partially sorted list by inserting the smallest entries into the beginning of the list. Thus, we call this approach the partial insertion sort. To see why the partial insertion sort is superior to the partial selection sort, let us compare the two approaches when searching for the 10 smallest entries within a list of 1,000 entries. We suppose that our partial insertion sort has reached the halfway point. The 10 smallest items in the first 500 have been found and we are about to consider the entry in position 501. If the original list was randomly scrambled, it is unlikely that this entry will be less than the tenth entry, and only one comparison is required to discover this. The same is true for all the entries in positions 501 through 1000. However, if one of these entries does belong among the top 10, then this will be discovered with one comparison, and at most nine more comparisons will be required to position it properly. This is much more efficient than our partial selection sort, in which each of the last 500 entries is involved in 10 comparisons.
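
The partial insertion sort described above might be sketched as follows. This is one possible rendering, not necessarily the `newFiveSmallest` supplied with the lab: the class name `PartialInsertionSketch`, the parameter `k`, and the use of a plain `int[]` are all assumptions made for illustration. The method returns the number of comparisons so the two approaches can be compared.

```java
/**
 * A sketch of partial insertion sort on a plain array.
 */
class PartialInsertionSketch {
  /**
   * Move the k smallest values to the front of the array, in
   * increasing order, and return the number of comparisons made.
   * Assumes 1 <= k <= values.length.
   */
  static long partialInsertion(int[] values, int k) {
    long comparisons = 0;
    for (int i = 1; i < values.length; ++i) {
      // Position at which the candidate enters the sorted prefix
      int pos = Math.min(i, k - 1);
      if (i >= k) {
        // One comparison against the largest of the current k smallest
        ++comparisons;
        if (values[i] >= values[k - 1]) {
          continue; // Not one of the k smallest; pass over it
        }
        // Exchange the candidate with the largest of the k smallest
        swap(values, i, k - 1);
      }
      // Slide the candidate left until it is in order
      int j = pos;
      while (j > 0) {
        ++comparisons;
        if (values[j - 1] <= values[j]) {
          break;
        }
        swap(values, j - 1, j);
        --j;
      }
    }
    return comparisons;
  }

  /** Swap the values at positions i and j. */
  static void swap(int[] values, int i, int j) {
    int tmp = values[i];
    values[i] = values[j];
    values[j] = tmp;
  }

  public static void main(String[] args) {
    int[] values = {9, 3, 7, 1, 8, 2, 6, 4, 5, 0};
    long comparisons = partialInsertion(values, 3);
    System.out.println(values[0] + " " + values[1] + " " + values[2]); // prints 0 1 2
    System.out.println(comparisons + " comparisons");
  }
}
```

Each entry beyond the first k costs one comparison when it is passed over and at most k comparisons when it is inserted, which matches the reasoning in the paragraph above.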

In experiment L3.2, you will investigate these two algorithms.

### Detour: random sequences

For our simple experiments, it is useful to be able to generate ``random'' sequences of numbers. What do we mean by ``random''? Typically, that each sequence of a particular size is equally likely or that at each point in the sequence, each number is equally likely as the next element. How can we generate such sequences? Fortunately, Java provides a standard utility class, `java.util.Random`. This class includes a `nextInt` method that gives the next ``random'' number in the sequence. In truth, this number is not random, in that it is generated by an algorithm. However, it is close enough to random for our purposes.

Hence, to fill the array `elements` with a random sequence of 100 integers, we might write

```
import java.util.Random;
...
    int i;
    Random generator = new Random();
    elements = new int[100];
    for (i = 0; i < 100; ++i) {
      elements[i] = generator.nextInt();
    } // for
```

However, when we're comparing two algorithms, it is helpful to have the same input to both algorithms. Fortunately, Java's random number generator can take a seed that uniquely determines the random sequence. You can think of a seed as being a number for the sequence. If you use the same seed, you end up with the same sequence. For example, to get the ``first'' sequence, you would write

```
    Random generator = new Random(1);
```
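
The effect of a seed can be seen in a small standalone program. The class `SeedSketch` and its helper `randomInts` are hypothetical names used only for this illustration; the calls to `java.util.Random` are the standard ones.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Shows that two generators built from the same seed produce the
 * same sequence.
 */
class SeedSketch {
  /** Build an array of the given length from a generator with the given seed. */
  static int[] randomInts(long seed, int length) {
    Random generator = new Random(seed);
    int[] result = new int[length];
    for (int i = 0; i < length; ++i) {
      result[i] = generator.nextInt();
    }
    return result;
  }

  public static void main(String[] args) {
    int[] first = randomInts(1, 5);
    int[] again = randomInts(1, 5);
    // Same seed, same sequence
    System.out.println(Arrays.equals(first, again)); // prints true
    // A different seed normally gives a different sequence
    System.out.println(Arrays.toString(randomInts(2, 5)));
  }
}
```

This is what makes a fair comparison possible: build the input for each algorithm from a generator with the same seed, and both algorithms see identical data.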

Note that random sequences are not always the best test cases for your algorithms. For example, when testing a sorting algorithm, you should also test sequences of varying lengths, sequences which contain all the same value, presorted sequences, and ``backwards'' sorted sequences (in which the numbers are organized largest to smallest). Nonetheless, random sequences still serve many purposes, and are often a good starting point.

In experiment L3.3 you will investigate random number generators.

### Sorting entire lists

By requesting the partial selection sort to find the n smallest entries in a list of n entries, we obtain an algorithm, known as the selection sort, for sorting an entire list. This algorithm first finds the smallest of the n entries of the list, requiring n - 1 comparisons, and places that entry at the top of the list. Next, the algorithm finds the smallest entry among the remaining n - 1 entries, requiring n - 2 comparisons, and moves it to the second position in the list. This process repeats until all the entries are in order. The entire process requires

(n - 1) + (n - 2) + ... + 2 + 1 = (n - 1)(n/2)

or

(1/2)(n^2 - n)

comparisons between list entries when sorting a list of length n. In big-O notation, the running time is O(n^2).

A similar analysis shows that insertion sort requires an average of

(1/4)(n^2 - n)

comparisons to sort a list of n entries. Again, in big-O notation, the running time is O(n^2).
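
To see where an average of this size might come from, it helps to count comparisons at the two extremes. The sketch below is a standalone insertion sort on a plain `int[]`; the class name `InsertionCountSketch` is hypothetical, and the `insertionSort` in the lab files may be written differently and may count steps differently.

```java
/**
 * Insertion sort that reports how many element comparisons it made.
 */
class InsertionCountSketch {
  /** Sort the array in place and return the number of comparisons. */
  static long insertionSort(int[] values) {
    long comparisons = 0;
    for (int i = 1; i < values.length; ++i) {
      // The value to insert into the sorted prefix 0 .. i-1
      int current = values[i];
      int j = i;
      while (j > 0) {
        ++comparisons;
        if (values[j - 1] <= current) {
          break;
        }
        // Shift the larger value one place to the right
        values[j] = values[j - 1];
        --j;
      }
      values[j] = current;
    }
    return comparisons;
  }

  public static void main(String[] args) {
    int n = 100;
    int[] increasing = new int[n];
    int[] decreasing = new int[n];
    for (int i = 0; i < n; ++i) {
      increasing[i] = i + 1;
      decreasing[i] = n - i;
    }
    System.out.println(insertionSort(increasing)); // prints 99
    System.out.println(insertionSort(decreasing)); // prints 4950
  }
}
```

An increasing list needs only n - 1 comparisons, while a decreasing list needs the full (1/2)(n^2 - n). The average quoted above is half of that worst case.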

In experiment L3.4, you will consider insertion sort. In experiment L3.5, you will consider selection sort. In experiment L3.6, you will compare the two.

### The Quicksort Algorithm

In the 1960s, C. A. R. Hoare, a pioneer in the field of computer science, discovered the Quicksort algorithm. In the average case, the number of comparisons performed by this algorithm when sorting a list of n entries is O(n*lg(n)). However, in the worst case, Quicksort is also O(n^2).
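
For readers who have not seen the algorithm, here is a minimal sketch of one common formulation: pick a pivot, rearrange the subsequence so that small values come before large ones, and then sort the two parts recursively. The class name `QuicksortSketch` and the choice of the middle element as pivot are assumptions; the `quickSort` in `SortableIntSeq` may choose its pivot differently, and which inputs trigger the worst case depends on that choice.

```java
/**
 * A sketch of Quicksort on a plain array, using the middle element
 * of each subsequence as the pivot.
 */
class QuicksortSketch {
  /** Sort values[lb .. ub] in place. */
  static void quickSort(int[] values, int lb, int ub) {
    // Subsequences of length zero or one are already sorted
    if (lb >= ub) {
      return;
    }
    int pivot = values[lb + (ub - lb) / 2];
    int i = lb;
    int j = ub;
    // Partition: move values smaller than the pivot left, larger right
    while (i <= j) {
      while (values[i] < pivot) {
        ++i;
      }
      while (values[j] > pivot) {
        --j;
      }
      if (i <= j) {
        int tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
        ++i;
        --j;
      }
    }
    // Sort the two parts
    quickSort(values, lb, j);
    quickSort(values, i, ub);
  }

  public static void main(String[] args) {
    int[] values = {5, 2, 9, 1, 5, 6, 0, -3};
    quickSort(values, 0, values.length - 1);
    System.out.println(java.util.Arrays.toString(values));
    // prints [-3, 0, 1, 2, 5, 5, 6, 9]
  }
}
```

When each partition splits the subsequence roughly in half, there are about lg(n) levels of recursion with about n comparisons per level. When the splits are badly unbalanced, there can be close to n levels, which gives the quadratic worst case.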

You will investigate the running time of Quicksort in experiments L3.7 and L3.8.

## Experiments

Name: ________________
ID:_______________

### Experiment L3.1: Finding the smallest element

Required files:

Step 1. Make copies of `Counter.java`, `SortableIntSeq.java`, and `SortTester.java`. Compile all three and execute `SortTester`. Find the smallest element in a list of size 50. Describe what `SortTester` does (or can do).

Step 2. One problem with `SortTester` and `SortableIntSeq` is that they do not provide an easy way to count the steps in an algorithm. How should we do that? Preferably with a `Counter` object. Read the code for that class and explain what it does.

Step 3. Build a new version of the `smallest` method from `SortableIntSeq` that takes a `Counter` as a parameter and uses that counter to count the steps it executes. Recompile `SortableIntSeq` and correct any errors. Summarize your changes.

Step 4. Extend `SortTester` so that it uses a `Counter` to count the steps in `SortableIntSeq`'s `smallest` method. Recompile `SortTester` and correct any errors. Summarize your changes.

Step 5. Execute `SortTester` and record the number of steps required to find the smallest element in lists of size 10, 20, 100, and 1000.

```
10:

20:

100:

1000:

```
After recording your results, you may want to look at our notes on this step.

### Experiment L3.2: Finding smaller elements

Required files:

Step 1. Make copies of `Counter.java`, `SortableIntSeq.java`, and `SortTester.java`. Compile all three and execute `SortTester`. Find the five smallest elements in a list of size 50. Record the results.

Step 2. Update `SortableIntSeq` so that `fiveSmallest`, `newFiveSmallest`, and any methods they use take `Counter`s as parameters and count their steps. Update `SortTester` to call those methods with a `Counter` and print out the number of steps executed. Recompile both files and correct any errors. Summarize your changes.

Step 3. Use your modified `SortTester` to fill in the following table.

```
Steps to find the smallest five elements in a list of size n, using
naive partial selection sort and the better partial insertion sort.

   n          steps               steps
             (naive)           (improved)

  500

 1000

 2000

```

Step 4. Update `SortableIntSeq` and `SortTester` to look for the seven smallest elements, rather than the five smallest elements. Fill in the table.

```
Steps to find the smallest seven elements in a list of size n, using
naive partial selection sort and the better partial insertion sort.

   n          steps               steps
             (naive)           (improved)

  500

 1000

 2000

```

Step 5. Add `kSmallest` and `newKSmallest` methods to `SortableIntSeq`. These will behave like `fiveSmallest` and `newFiveSmallest`, except that they take `k` (the number of small elements to find) as a parameter. Recompile `SortableIntSeq` and correct any errors. Summarize your changes.

Step 6. Update `SortTester` so that it reads in the number of elements to find (in the cases in which we want k small elements). Recompile `SortTester` and correct any errors. Summarize your changes.

Step 7. Using your augmented `SortTester`, record the number of steps for each of the following cases.

```
Steps to find the smallest k elements in a list of size n, using
naive partial selection sort and the better partial insertion sort.

  k      n          steps               steps
                   (naive)           (improved)

  5    500

 10   1000

 15   2000

```

Step 8. Repeat step 7 for the following table. In these cases you are selecting the top 10 percent of the list, while in step 7 you selected roughly the top 1 percent.

```
Steps to find the smallest k elements in a list of size n, using
naive partial selection sort and the better partial insertion sort.

  k      n          steps               steps
                   (naive)           (improved)

 25    250

 50    500

100   1000

200   2000
```

Step 9. Do those numbers match those theorized in the discussion? Why or why not?

Step 10. What do you conclude about the advantages of one partial sort over the other?

### Experiment L3.3: Random sequences

Required files:

Step 1. Make copies of `SortableIntSeq.java` and `SortTester.java`. Compile the two files. Using `SortTester`, make five lists of ten random numbers. Record those lists.

Step 2. Update `SortTester` to take a seed as an input, and pass that seed to the appropriate method of `SortableIntSeq`. Recompile the files and correct any errors. Using `SortTester`, make three lists of ten random numbers, using the same seed each time (do not use zero as your seed). Record your results.

Step 3. At times, you will want to use presorted sequences instead of random sequences. Read the code for `SortableIntSeq` and determine which methods can be used to generate presorted sequences. What command might you use to create the sequence [1,3,5,7,9,...,301]? What command might you use to create the sequence [301,299,297,...,5,3,1]?

### Experiment L3.4: Insertion sort

Required files:

Step 1. Make copies of `Counter.java`, `SortableIntSeq.java`, and `SortTester.java`. Compile all three and execute `SortTester`. Using `insertionSort`, sort a list of ten numbers. Did it work correctly? Record the original list and the sorted list.

Step 2. Update `SortTester` and `SortableIntSeq` to count the number of steps in insertion sort. Recompile the files and summarize your changes.

Step 3. Use insertion sort to sort ten randomly generated lists of 100 elements. Record the number of steps in each case.

Step 4. Is the number of steps always the same? Why or why not? After answering this question you may want to read our notes on this step.

Step 5. Use insertion sort to sort the lists

• [1,2,3,...,99,100]
• [2,4,6,...,198,200]
• [101,102,103,...,199,200]

Record the number of steps in each case.

Step 6. Is the number of steps always the same? Why or why not? After answering this question you may want to read our notes on this step.

Step 7. Use insertion sort to sort the lists

• [100,99,98,...,1]
• [200,198,196,...,4,2]
• [200,199,198,...,101]

Record the number of steps in each case.

Step 8. Is the number of steps always the same? Why or why not? After answering this question you may want to read our notes on this step.

Step 9. Reflecting on your experiments, which types of lists is insertion sort best at sorting? Worst at sorting? Why? Is its running time on random lists closer to the best time or worst?

### Experiment L3.5: Selection sort

Required files:

Step 1. Make copies of `Counter.java`, `SortableIntSeq.java`, and `SortTester.java`. Compile all three and execute `SortTester`. Using `selectionSort`, sort a list of ten numbers. Did it work correctly? Record the original list and the sorted list.

Step 2. If you read the code in `SortableIntSeq`, you will see that `selectionSort` is not defined. Fill in the body appropriately, recompile `SortTester`, test the new `selectionSort`, and correct any errors. Enter the definition of `selectionSort` here. Note that you may want to look at the definition of `fiveSmallest` as you define `selectionSort`.

Step 3. Update `SortTester` and `SortableIntSeq` to count the number of steps in selection sort. Recompile the files, correct any errors, and summarize your changes.

Step 4. Use selection sort to sort ten randomly generated lists of 100 elements. Record the number of steps in each case.

Step 5. Use selection sort to sort the lists

• [1,2,3,...,99,100]
• [2,4,6,...,198,200]
• [101,102,103,...,199,200]

Record the number of steps in each case.

Step 6. Use selection sort to sort the lists

• [100,99,98,...,1]
• [200,198,196,...,4,2]
• [200,199,198,...,101]

Record the number of steps in each case.

Step 7. Is the number of steps always the same? Why or why not?

Step 8. Reflecting on your experiments, which types of lists is selection sort best at sorting? Worst at sorting? Why? Is its running time on random lists closer to the best time or worst?

### Experiment L3.6: Comparing insertion sort and selection sort

Required files:

Step 1. Using the modified versions of `SortableIntSeq` and `SortTester`, fill in the following table. For each sequence length, try three different random sequences. Make sure that the two sorting methods are run on the same random sequences.

```
Running time of insertion sort and selection sort on different length
random sequences, with three tests per sequence length.

Sequence          Steps                    Steps
length       (insertion sort)         (selection sort)
            Test1  Test2  Test3      Test1  Test2  Test3
100

200

400

800

2000

```

Step 2. Using the modified `SortTester` and `SortableIntSeq`, fill in the following table, using sequences of the form [1,2,3,...,n].

```
Running time of insertion sort and selection sort on different length
increasing sequences, with one test per sequence length.

Sequence          Steps                    Steps
length       (insertion sort)         (selection sort)

100

200

400

800

2000

```

Step 3. Using the modified `SortTester` and `SortableIntSeq`, fill in the following table, using sequences of the form [n,n-1,n-2,...,3,2,1].

```
Running time of insertion sort and selection sort on different length
decreasing sequences, with one test per sequence length.

Sequence          Steps                    Steps
length       (insertion sort)         (selection sort)

100

200

400

800

2000

```

Step 4. What do you observe from the tables above? Explain your findings.

### Experiment L3.7: Quicksort

Required files:

Step 1. Make copies of `Counter.java`, `SortableIntSeq.java`, and `SortTester.java`. Compile all three and execute `SortTester`. Using `quickSort`, sort a list of ten numbers. Did it work correctly? Record the original list and the sorted list.

Step 2. Augment the classes to count the number of steps in Quicksort. Recompile the files and correct any errors. Summarize your changes.

Step 3. Run insertion sort and Quicksort on a few lists of different sizes, recording the number of steps.

```
Running time of insertion sort and Quicksort on different length
random sequences, with three tests per sequence length.

Sequence          Steps                    Steps
length       (insertion sort)           (Quicksort)
            Test1  Test2  Test3      Test1  Test2  Test3
100

200

400

800

2000

```

Step 4. Plot the results from the previous table.

### Experiment L3.8: Quicksort

Required files:

Step 1. Using the modified versions of `SortableIntSeq` and `SortTester`, fill in the following table. For each sequence length, try three different random sequences, one increasing sequence, and one decreasing sequence.

```
Running time of Quicksort on different types and lengths of
sequences.

Sequence                        Steps
length     Rand1   Rand2   Rand3   Inc.   Decr.

100

200

400

800

2000

```

Step 2. What do these results suggest?

## Post-Laboratory Problems

### Problem L3-A: Sorting objects

a. Develop a `Person` class in which each object contains information about a person, including last name, telephone number, city, state, and zip.

b. Write a program that sorts sequences of `Person` objects. You may want to use the `compareTo(String other)` method from the `String` class, which returns a negative number if the current string is less than the other string, zero if the two strings are equal, and a positive number if the current string is greater.

c. Allow the user to select a "tie-breaking" field by which to distinguish between records that have the same value. For example, you might wish to have last-name ties sorted by first name within each group. Incorporate this tie-breaking method into your program.

d. Note any efficiency issues that arise while implementing these various sorting routines.

### Problem L3-B: Shuffling decks

a. Develop a `PlayingCard` class.

b. Develop a `Deck` class, for decks of playing cards.

c. Create a `shuffle` method that shuffles a deck of playing cards.

You might shuffle a deck by randomly selecting cards to swap, and doing that some appropriate number of times. Remember that you can use absolute value and the modulus operator to translate a number to a particular range.
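
The range translation mentioned above needs a little care in Java, because `Math.abs(Integer.MIN_VALUE)` is still negative. The sketch below takes the remainder first and the absolute value second, which avoids that problem. The class `RangeSketch` and the method `toRange` are hypothetical names used only for illustration.

```java
import java.util.Random;

/**
 * Translates arbitrary integers into the range 0 .. size-1.
 */
class RangeSketch {
  /**
   * Translate value to the range 0 .. size-1.  Assumes size > 0.
   * The remainder lies strictly between -size and size, so its
   * absolute value cannot overflow.
   */
  static int toRange(int value, int size) {
    return Math.abs(value % size);
  }

  public static void main(String[] args) {
    Random generator = new Random(1);
    // Pick five positions in a 52-card deck
    for (int i = 0; i < 5; ++i) {
      System.out.println(toRange(generator.nextInt(), 52));
    }
  }
}
```

As an alternative, `java.util.Random` also provides `nextInt(int bound)`, which returns a value from 0 up to but not including `bound` directly.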

You might also shuffle a deck by assigning a random number to each card and then sorting by those numbers.

### Problem L3-C: Sorting decks

a. Add a `sort` method to the `Deck` class that sorts the cards in a deck into ascending order. Design your method to report the number of comparisons performed during the sorting process.

b. Using the methods `shuffle` and `sort`, write a program that reports statistics (such as the number of comparisons per sort) over numerous shuffles of the deck.

c. How difficult would it be to change your sorting algorithm to sort in, say, descending order, to use a different suit arrangement, or to sort the deck into groups of similarly valued cards?

### Problem L3-D: Finding primes

Write a program implementing the sieve of Eratosthenes for finding the prime numbers between 1 and n. Apply your solution to various values of n. How does the time required by the program increase as n increases? Explain your findings.

## Notes

Experiment L3.1, Step 5. If you only count the number of times the body in the loop is executed, it is likely that the number of steps in `smallest` is one less than the number of elements in the sequence.

Experiment L3.4, Step 4. Since the insertions we have to do may differ from sequence to sequence, it is likely that the running times will be different.

Experiment L3.4, Step 6. While the lists are different, they are ordered the same. This means that the number of swaps should be the same.

Experiment L3.4, Step 8. While the lists are different, they are ordered the same. This means that the number of swaps should be the same.
