Computer Science 235 :: Data Structures and Algorithms

Project 4 :: Sorting


Note: Projects are to be completed by each student individually (not by groups of students).

Introduction

This project requires you to add counters to a number of sorting algorithms to empirically measure the number of compares used by each algorithm. You will then write code to analyze the measured data and automatically determine the BigOh function for each set of data.

The files needed to complete the project are packaged together in a zip archive (sort.zip).

Make sure to check out the TIPS page, which gives good advice about counting compares.

Sorting

Sorting is the process of putting a sequence of items into order. Comparison-based sorting algorithms make ordering decisions using only pairwise comparisons of two items at a time. Many comparison-based sorting algorithms have been invented, including insertion sort, selection sort, merge sort, and quick sort. Any objects that implement the Comparable interface can be sorted this way.

Compares

The efficiency of a sorting algorithm is sometimes evaluated by counting the number of compares the algorithm uses to sort the data. A compare is counted each time two items are compared. You will count compares for a number of sorting algorithms as part of this project. The precise definition of a compare used in this project is given in the Counting Compares section below.


Requirement #1: Implement the Sort Interface

The file Sort.java defines the Sort interface. Classes that implement the Sort interface are capable of sorting arrays of Comparable objects and counting the number of compares used to do the sorting.

The Sort interface defines two methods:

void sort( Comparable [ ] a );

This method sorts the array a in ascending order: smaller items come first and larger items come later.

long getCompares();

This method returns the number of compares used in the sort.

Write four classes that implement the Sort interface. Name your classes InsertionSort, SelectionSort, MergeSort, and QuickSort. Put your classes in the cs235.sort package. Your classes must use the algorithms defined in the files InsertionSortStd.java, SelectionSortStd.java, MergeSortStd.java, and QuickSortStd.java. The insertion sort, merge sort, and quick sort algorithms are taken directly from the code in the textbook. You must modify the provided code to count compares and to implement the Sort interface. Don't change the workings of the algorithms, as this may cause your compare counts to be wrong.

Counting Compares

For the purposes of this project, the compare count is defined to be the number of times an algorithm executes the compareTo method.
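One way to meet this definition is to route every compareTo call through a small helper that increments a counter. The sketch below assumes the Sort interface matches the two signatures listed above (Sort.java is authoritative) and uses a common textbook form of insertion sort; it only illustrates where the counter goes and does not replace InsertionSortStd.java, whose code you must still follow. Resetting the count at the start of each call to sort is also an assumption.

```java
// Assumed to match the two signatures in the handout; Sort.java is authoritative.
interface Sort {
    void sort(Comparable[] a);
    long getCompares();
}

// Illustration only: a common insertion sort, NOT the provided InsertionSortStd.java.
class InsertionSort implements Sort {
    private long compares = 0;

    // Every compareTo call goes through here, so the counter matches
    // "the number of times the algorithm executes compareTo".
    @SuppressWarnings("unchecked")
    private int compare(Comparable x, Comparable y) {
        compares++;
        return x.compareTo(y);
    }

    @Override
    public void sort(Comparable[] a) {
        compares = 0; // assumption: each sort call starts a fresh count
        for (int p = 1; p < a.length; p++) {
            Comparable tmp = a[p];
            int j = p;
            // j > 0 is tested first, so no compare is made once j reaches 0
            while (j > 0 && compare(tmp, a[j - 1]) < 0) {
                a[j] = a[j - 1]; // shift the larger item one slot right
                j--;
            }
            a[j] = tmp;
        }
    }

    @Override
    public long getCompares() {
        return compares;
    }
}
```

With this loop, an already-sorted array of N distinct items costs N - 1 compares and a reverse-sorted one costs N(N - 1)/2, because the j > 0 test short-circuits before compareTo is reached.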

Testing Your Sort Classes

The first half of the test driver (TestDriver.java) tests the sorting classes using a number of different types and sizes of test data. The driver verifies that each sort actually sorts the data and that the numbers of compares are counted correctly. We recommend that you test and debug your sorting classes before continuing.
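To check your compare counts independently of your own counter, you can sort objects whose compareTo method counts its own executions. CountingInt below is a hypothetical helper, not one of the provided files. Because it implements Comparable, a CountingInt[] can be passed to sort(Comparable[] a); after resetting CountingInt.calls and sorting, getCompares() should equal CountingInt.calls. isAscending checks the order without calling compareTo, so it does not disturb the count.

```java
// Hypothetical test helper: an int wrapper that counts compareTo executions.
class CountingInt implements Comparable<CountingInt> {
    static long calls = 0; // compareTo executions since the last reset
    final int value;

    CountingInt(int value) {
        this.value = value;
    }

    @Override
    public int compareTo(CountingInt other) {
        calls++;
        return Integer.compare(value, other.value);
    }

    // Wraps plain ints so the array can be handed to any sort(Comparable[]).
    static CountingInt[] wrap(int... values) {
        CountingInt[] out = new CountingInt[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = new CountingInt(values[i]);
        }
        return out;
    }

    // True if a is in ascending order; reads value directly so calls is unchanged.
    static boolean isAscending(CountingInt[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1].value > a[i].value) return false;
        }
        return true;
    }
}
```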


Empirical BigOh Analysis

Empirical BigOh analysis is the process of estimating the BigOh of a program using measurements obtained from actually running the program. Suppose you want to show that the number of compares in a sort is O(F(N)). (The function F(N) could be N, N log N, or N^2, for example.) You could run the sort for a number of different input sizes and record the number of compares (C(N)) for each input size. If the function F(N) is a good estimate of the correct BigOh, the ratio C(N)/F(N) should converge to a positive constant.
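As a small worked example, the sketch below computes C(N)/F(N) for the BUBBLE sort compare counts in the sample output of Requirement #4. RatioDemo and ratios are hypothetical names.

```java
import java.util.function.IntToDoubleFunction;

class RatioDemo {
    // Returns data[i] / f(sizes[i]) for each measured input size.
    static double[] ratios(int[] sizes, long[] data, IntToDoubleFunction f) {
        double[] result = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            result[i] = data[i] / f.applyAsDouble(sizes[i]);
        }
        return result;
    }
}
```

For F(N) = N^2 (passed as n -> (double) n * n) the ratios are 1.98, 1.99, 1.995, 1.9975, and 1.99875, which level off near 2. For F(N) = N they are 198, 398, 798, 1598, and 3198, which keep doubling, so N is too small a guess.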

Requirement #2: Implement the Analyzer Interface

The file Analyzer.java defines the Analyzer interface. Classes that implement the Analyzer interface are capable of estimating the BigOh of a program using empirical data. Write a class named AnalyzerImpl that implements the Analyzer interface. Put your class in the cs235.sort package.

The Analyzer interface defines four methods:

void analyze(int[] sizes, long[] data);

This method is given two arrays. The first array provides the sizes of the datasets that were input to the program. The second array gives the values that were measured when running the program with each dataset. From this data, analyze decides which BigOh function gives ratios that converge to a constant, as described above. Analyze must consider the following possible BigOh functions:

F(N) = 1
F(N) = log N
F(N) = N
F(N) = N log N
F(N) = N^2
F(N) = N^3
F(N) = 2^N

Analyze tries each BigOh function and computes the ratio for each size of input data. How do you decide if the ratios converge to a constant? First compute the mean of the ratios. Then compute the relative error of each ratio with respect to the mean. If the average of the relative errors is a small number, the ratios are probably close to the same constant. Relative error can be computed like this:

|ratio - mean|/mean

(Those bars around 'ratio - mean' represent absolute value.)

If you compute the average relative error in the ratios for each possible BigOh function and select the BigOh function that gives the smallest error, it's likely that you've chosen the best BigOh function.
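The error measure described above fits in a short static method. RelativeError and averageRelativeError are hypothetical names. Returning positive infinity when the mean is zero or not finite is an assumption added so that a candidate whose ratios degenerate (for example 2^N, which overflows a double once N reaches 1024) is never preferred over a candidate with a finite error.

```java
class RelativeError {
    // Average of |ratio - mean| / mean over all ratios. Returns +Infinity when
    // the mean is zero or not finite (an assumption, not part of the handout).
    static double averageRelativeError(double[] ratios) {
        double sum = 0;
        for (double r : ratios) sum += r;
        double mean = sum / ratios.length;
        if (mean == 0 || !Double.isFinite(mean)) {
            return Double.POSITIVE_INFINITY;
        }
        double total = 0;
        for (double r : ratios) {
            total += Math.abs(r - mean) / mean;
        }
        return total / ratios.length;
    }
}
```

For the BUBBLE sort ratios in the sample output (1.98, 1.99, 1.995, 1.9975, 1.99875), the mean is 1.99225 and the average relative error is about 0.0029, which rounds to the 0.003 shown there.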

double[] getRatios();

This method returns the ratios computed by analyze for the BigOh function that has the smallest error.

double getError();

This method returns the error computed by analyze for the BigOh function that has the smallest error.

String getBigOh();

This method returns a string describing the BigOh function selected by analyze. One of the following strings must be returned.

"O(1)"
"O(log N)"
"O(N)"
"O(N log N)"
"O(N^2)"
"O(N^3)"
"O(2^N)"

Testing Your Analyzer

The second half of the test driver tests the AnalyzerImpl class using a number of different types and sizes of test data. The driver verifies that analyze selects the correct BigOh function for each set of test data. Note that the test driver does not test getRatios and getError. We recommend that you test and debug your analyze method before continuing.
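Putting the four methods together, a minimal AnalyzerImpl might look like the sketch below. It assumes the Analyzer interface matches the four signatures described above (Analyzer.java is authoritative), uses the natural logarithm (the base only rescales the ratios by a constant, so it does not change the error), and resolves ties in favor of the slower-growing function. The candidate table and the degenerate-mean rule are illustrative choices, not requirements.

```java
import java.util.function.IntToDoubleFunction;

// Assumed to match the four signatures in the handout; Analyzer.java is authoritative.
interface Analyzer {
    void analyze(int[] sizes, long[] data);
    double[] getRatios();
    double getError();
    String getBigOh();
}

// Sketch: try every candidate F(N) and keep the one whose ratios C(N)/F(N)
// have the smallest average relative error.
class AnalyzerImpl implements Analyzer {
    // Labels, in the same order as FUNCTIONS.
    private static final String[] LABELS = {
        "O(1)", "O(log N)", "O(N)", "O(N log N)", "O(N^2)", "O(N^3)", "O(2^N)"
    };

    // Natural log is used; a different base would only rescale the ratios.
    private static final IntToDoubleFunction[] FUNCTIONS = {
        n -> 1.0,
        n -> Math.log(n),
        n -> (double) n,
        n -> n * Math.log(n),
        n -> (double) n * n,
        n -> (double) n * n * n,
        n -> Math.pow(2, n)      // becomes Infinity once n reaches 1024
    };

    private double[] bestRatios = new double[0];
    private double bestError = Double.POSITIVE_INFINITY;
    private String bestLabel = null;

    @Override
    public void analyze(int[] sizes, long[] data) {
        bestRatios = new double[0];
        bestError = Double.POSITIVE_INFINITY;
        bestLabel = null;
        for (int k = 0; k < FUNCTIONS.length; k++) {
            double[] ratios = new double[sizes.length];
            for (int i = 0; i < sizes.length; i++) {
                ratios[i] = data[i] / FUNCTIONS[k].applyAsDouble(sizes[i]);
            }
            double error = averageRelativeError(ratios);
            // Strict < keeps the earlier (slower-growing) function on a tie.
            if (bestLabel == null || error < bestError) {
                bestRatios = ratios;
                bestError = error;
                bestLabel = LABELS[k];
            }
        }
    }

    // Mean of |ratio - mean| / mean; +Infinity if the mean is zero or not finite.
    private static double averageRelativeError(double[] ratios) {
        double sum = 0;
        for (double r : ratios) sum += r;
        double mean = sum / ratios.length;
        if (mean == 0 || !Double.isFinite(mean)) return Double.POSITIVE_INFINITY;
        double total = 0;
        for (double r : ratios) total += Math.abs(r - mean) / mean;
        return total / ratios.length;
    }

    @Override public double[] getRatios() { return bestRatios.clone(); }
    @Override public double getError() { return bestError; }
    @Override public String getBigOh() { return bestLabel; }
}
```

Run on the sample BUBBLE sort data, this sketch reports O(N^2) with an error of about 0.0029.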


Requirement #3: Test Your Code using the TestDriver Program

The TestDriver class (TestDriver.java) contains a number of test cases you may use to test your code. TestDriver does not test all of the functionality of your code. You should create additional test cases to completely test your code. Different test cases are used when you pass off your project. It is possible that your code will pass all the test cases in TestDriver and still fail at pass off.


Requirement #4: Write a driver program that empirically determines the BigOh for each of your sorting algorithms.

Write a class named SortAnalyzer. Put your class in the cs235.sort package. Within SortAnalyzer write a main method and supporting methods that do the following:

  1. Run each of your four sorting algorithms.
  2. Run each algorithm with three types of data: sorted, reverse-sorted, and random.
  3. Run each algorithm with five sizes of each type of data. The required five sizes are 100, 200, 400, 800, and 1600 elements. This means you will execute a total of 4*3*5 = 60 runs.
  4. Record the numbers of compares used in each sorting run.
  5. Pass the compare data to an Analyzer object.
  6. Print the results for each algorithm and type of data in a format similar to the following (the BUBBLE sort numbers only illustrate the format):
BUBBLE sort with SORTED data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003

BUBBLE sort with REVERSE data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003

BUBBLE sort with RANDOM data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003
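The data generation and output formatting in SortAnalyzer can be sketched as two helpers. DataType, SortAnalyzerHelpers, makeData, and report are hypothetical names, and building RANDOM data as a seeded shuffle of 0..N-1 is an assumption; the handout does not say whether random data may contain duplicates. The sketch covers only these helpers, not the driver loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical names; the three kinds of input data.
enum DataType { SORTED, REVERSE, RANDOM }

class SortAnalyzerHelpers {
    // Builds the Integers 0..n-1 arranged according to type.
    static Integer[] makeData(DataType type, int n, Random rng) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        if (type == DataType.REVERSE) Collections.reverse(list);
        if (type == DataType.RANDOM) Collections.shuffle(list, rng); // assumption
        return list.toArray(new Integer[0]);
    }

    // Formats one result block in the style of the sample output.
    static String report(String sortName, DataType type, int[] sizes, long[] compares,
                         double[] ratios, String bigOh, double error) {
        StringBuilder sb = new StringBuilder();
        sb.append(sortName).append(" sort with ").append(type).append(" data\n");
        sb.append("sizes:");
        for (int s : sizes) sb.append(' ').append(s);
        sb.append("\ncompares:");
        for (long c : compares) sb.append(' ').append(c);
        sb.append("\nratios:");
        for (double r : ratios) sb.append(String.format(Locale.US, " %.3f", r));
        sb.append(String.format(Locale.US, " %s error %.3f", bigOh, error)).append('\n');
        return sb.toString();
    }
}
```

Because each sort rearranges its input in place, hand every sort its own copy of a dataset (for example with clone()) so that all four algorithms see the same input.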

Important!

The purpose of this requirement is not to copy the TestDriver and modify it until it behaves like SortAnalyzer, but to learn to write your own program to debug your code and to better understand the BigOh functions of several sorting algorithms. Therefore, you may not copy the TestDriver; you must write your own code for SortAnalyzer.