Project 4 :: Sorting
Note: Projects are to be completed by each student individually (not by groups of students).
Introduction
This project requires you to add counters to a number of sorting algorithms to empirically measure the number of compares used by each algorithm. You will then write code to analyze the measured data and automatically determine the BigOh function for each set of data.
The files needed to complete the project are packaged together in a zip archive (sort.zip).
Make sure to check out the TIPS page, which gives good advice about counting compares.
Sorting
Sorting is the process of putting a sequence of items into order. Comparison-based sorting algorithms make ordering decisions using only pairwise comparisons of two items at a time. A number of comparison-based sorting algorithms have been invented, including insertion sort, selection sort, merge sort, and quick sort. Any objects that implement the Comparable interface can be sorted this way.
Compares
The efficiency of a sorting algorithm is sometimes evaluated by counting the number of compares the algorithm uses to sort the data. A compare is counted each time two items are compared. You will count compares for a number of sorting algorithms as part of this project. The specific definition of compare, for the purposes of this project, will be given in a later section.
Requirement #1: Implement the Sort Interface
The file Sort.java defines the Sort interface. Classes that implement the Sort interface are capable of sorting arrays of Comparable objects and counting the number of compares used to do the sorting.
The Sort interface defines two methods:
void sort( Comparable [ ] a );
This method sorts the array a in ascending order: smaller items come first and larger items come later.
long getCompares();
This method returns the number of compares used in the sort.
Write four classes that implement the Sort interface. Name your classes InsertionSort, SelectionSort, MergeSort, and QuickSort. Put your classes in the cs235.sort package. Your classes must use the algorithms defined in the files InsertionSortStd.java, SelectionSortStd.java, MergeSortStd.java, and QuickSortStd.java. The insertion sort, merge sort, and quick sort algorithms are taken directly from the code in the textbook. You must modify the provided code to count compares and to implement the Sort interface. Don't change the workings of the algorithms, as this may cause your compare counts to be wrong.
Counting Compares
For the purposes of this project, the compare count is defined to be the number of times an algorithm executes the compareTo method.
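One way to make sure every compareTo call is counted exactly once is to route all comparisons through a single helper method. The sketch below illustrates that pattern only. The class name CountingInsertionSketch, the compare helper, and the insertion sort loop are all made up for this example (it is not the code from InsertionSortStd.java), and resetting the counter at the start of each sort is an assumption about how getCompares should behave.

```java
// Illustration of the counting pattern only; NOT the textbook algorithm
// from InsertionSortStd.java. All names here are hypothetical.
class CountingInsertionSketch {
    private long compares = 0;

    // Every comparison goes through this helper, so each compareTo call
    // increments the counter exactly once.
    @SuppressWarnings({"rawtypes", "unchecked"})
    private int compare(Comparable x, Comparable y) {
        compares++;
        return x.compareTo(y);
    }

    @SuppressWarnings("rawtypes")
    public void sort(Comparable[] a) {
        compares = 0; // assumption: each call to sort reports its own count
        for (int i = 1; i < a.length; i++) {
            Comparable tmp = a[i];
            int j = i;
            // The index test (j > 0) is not a compare; only compareTo calls count.
            while (j > 0 && compare(tmp, a[j - 1]) < 0) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = tmp;
        }
    }

    public long getCompares() {
        return compares;
    }

    public static void main(String[] args) {
        Integer[] data = {3, 1, 2};
        CountingInsertionSketch s = new CountingInsertionSketch();
        s.sort(data);
        System.out.println(java.util.Arrays.toString(data)
                + " compares=" + s.getCompares());
    }
}
```

Because the index test short-circuits before the helper is called, a comparison that never happens is never counted. The same helper idea should carry over to the provided algorithms without changing how they work.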
Testing Your Sort Classes
The first half of the test driver (TestDriver.java) tests the sorting classes using a number of different types and sizes of test data. The driver verifies that each sort actually sorts the data and that the numbers of compares are counted correctly. We recommend that you test and debug your sorting classes before continuing.
Empirical BigOh Analysis
Empirical BigOh analysis is the process of estimating the BigOh of a program using measurements obtained from actually running the program. Suppose you want to show that the number of compares in a sort is O(F(N)). (The function F(N) could be N, N log N, or N^2, for example.) You could run the sort for a number of different input sizes and record the number of compares, C(N), for each input size. If the function F(N) is a good estimate of the correct BigOh, the ratio C(N)/F(N) should converge to a positive constant as N grows.
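As a worked illustration of the ratio test, the sketch below divides a set of compare counts by two candidate functions. The counts are the ones that appear in the sample output format under Requirement #4. The class name RatioSketch and the ratios method are hypothetical and exist only for this example.

```java
// Hypothetical helper class: computes C(N)/F(N) for a candidate F.
class RatioSketch {
    static double[] ratios(int[] sizes, long[] counts,
                           java.util.function.DoubleUnaryOperator f) {
        double[] r = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // Dividing by a double avoids integer division.
            r[i] = counts[i] / f.applyAsDouble(sizes[i]);
        }
        return r;
    }

    public static void main(String[] args) {
        int[] sizes = {100, 200, 400, 800, 1600};
        long[] counts = {19800, 79600, 319200, 1278400, 5116800};
        System.out.println("F(N) = N:   "
                + java.util.Arrays.toString(ratios(sizes, counts, n -> n)));
        System.out.println("F(N) = N^2: "
                + java.util.Arrays.toString(ratios(sizes, counts, n -> n * n)));
    }
}
```

With F(N) = N the ratios keep growing (from 198 at N = 100 to 3198 at N = 1600), so N is too small a bound. With F(N) = N^2 the ratios stay just below 2, which is the converging behavior described above.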
Requirement #2: Implement the Analyzer Interface
The file Analyzer.java defines the Analyzer interface. Classes that implement the Analyzer interface are capable of estimating the BigOh of a program using empirical data. Write a class named AnalyzerImpl that implements the Analyzer interface. Put your class in the cs235.sort package.
The Analyzer interface defines four methods:
void analyze(int[] sizes, long[] data);
This method is given two arrays. The first array provides the sizes of the datasets that were input to the program. The second array gives the values that were measured when running the program with each dataset. From this data, analyze decides which BigOh function gives ratios that converge to a constant as described above. The analyze method must consider the following possible BigOh functions:
F(N) = 1
F(N) = log N
F(N) = N
F(N) = N log N
F(N) = N^2
F(N) = N^3
F(N) = 2^N
Analyze tries each BigOh function and computes the ratio for each size of input data. How do you decide if the ratios converge to a constant? First compute the mean of the ratios. Then compute the relative error of each ratio with respect to the mean. If the average of the relative errors is a small number, the ratios are probably close to the same constant. Relative error can be computed like this:
|ratio - mean|/mean
(Those bars around 'ratio - mean' represent absolute value.)
So if you compute the average relative error in the ratios for each possible BigOh function and select the BigOh function that gives the smallest error, it's likely that you've chosen the best BigOh function.
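The average relative error described above can be sketched as follows. This is a minimal illustration, not the required AnalyzerImpl: the class name ErrorSketch and the method meanRelativeError are hypothetical, and the sketch assumes the mean of the ratios is nonzero (it does not guard against a zero mean).

```java
// Hypothetical helper class: average of |ratio - mean| / mean.
class ErrorSketch {
    static double meanRelativeError(double[] ratios) {
        // Step 1: mean of the ratios.
        double sum = 0.0;
        for (double r : ratios) {
            sum += r;
        }
        double mean = sum / ratios.length;

        // Step 2: average the relative error of each ratio.
        // Assumption: mean != 0; a zero mean is not handled here.
        double errSum = 0.0;
        for (double r : ratios) {
            errSum += Math.abs(r - mean) / mean;
        }
        return errSum / ratios.length;
    }

    public static void main(String[] args) {
        // Ratios of the sample compare counts against N^2 and against N.
        double[] againstNSquared = {1.98, 1.99, 1.995, 1.9975, 1.99875};
        double[] againstN = {198, 398, 798, 1598, 3198};
        System.out.printf("N^2 error: %.3f%n", meanRelativeError(againstNSquared));
        System.out.printf("N   error: %.3f%n", meanRelativeError(againstN));
    }
}
```

For the N^2 ratios the error comes out at about 0.003, while the error for the N ratios is far larger, so N^2 would be selected between these two candidates. A full analyzer would repeat this for every candidate function and keep the one with the smallest error.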
double[] getRatios();
This method returns the ratios computed by analyze for the BigOh function that has the smallest error.
double getError();
This method returns the error computed by analyze for the BigOh function that has the smallest error.
String getBigOh();
This method returns a string describing the BigOh function selected by analyze. One of the following strings must be returned.
"O(1)"
"O(log N)"
"O(N)"
"O(N log N)"
"O(N^2)"
"O(N^3)"
"O(2^N)"
Testing Your Analyzer
The second half of the test driver tests the AnalyzerImpl class using a number of different types and sizes of test data. The driver verifies that analyze selects the correct BigOh function for each set of test data. Note that the test driver does not test getRatios and getError. We recommend that you test and debug your analyze method before continuing.
Requirement #3: Test Your Code using the TestDriver Program
The TestDriver class (TestDriver.java) contains a number of test cases you may use to test your code. TestDriver does not test all of the functionality of your code. You should create additional test cases to completely test your code. Different test cases are used when you pass off your project. It is possible that your code will pass all the test cases in TestDriver and still fail at pass off.
Requirement #4: Write a driver program that empirically determines the BigOh for each of your sorting algorithms.
Write a class named SortAnalyzer. Put your class in the cs235.sort package. Within SortAnalyzer write a main method and supporting methods that do the following:
- Run each of your four sorting algorithms.
- Run each algorithm with three types of data:
  - data that is already sorted
  - data that is sorted in reverse order
  - data that is in random order
- Run each algorithm with five sizes of each type of data. The required five sizes are 100, 200, 400, 800, and 1600 elements. This means you will execute a total of 4*3*5 = 60 runs.
- Record the numbers of compares used in each sorting run.
- Pass the compare data to an Analyzer object.
- Print the results of each run in a format similar to the following:
BUBBLE sort with SORTED data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003
BUBBLE sort with REVERSE data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003
BUBBLE sort with RANDOM data
sizes: 100 200 400 800 1600
compares: 19800 79600 319200 1278400 5116800
ratios: 1.980 1.990 1.995 1.998 1.999 O(N^2) error 0.003
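The three types of input data could be generated along the following lines. This is only a sketch under stated assumptions: the class name TestDataSketch and its methods are hypothetical, Integer is chosen arbitrarily as the Comparable element type, and the fixed random seed is an assumption made so that runs are repeatable.

```java
// Hypothetical data generators for sorted, reverse, and random input.
class TestDataSketch {
    // 0, 1, 2, ..., n-1
    static Integer[] sorted(int n) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    // n-1, n-2, ..., 0
    static Integer[] reversed(int n) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - 1 - i;
        }
        return a;
    }

    // Pseudo-random values; the same seed always gives the same array.
    static Integer[] random(int n, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) {
            a[i] = rng.nextInt();
        }
        return a;
    }

    public static void main(String[] args) {
        int[] sizes = {100, 200, 400, 800, 1600};
        for (int n : sizes) {
            System.out.println(n + ": sorted starts with " + sorted(n)[0]
                    + ", reversed starts with " + reversed(n)[0]
                    + ", random starts with " + random(n, 42L)[0]);
        }
    }
}
```

Each generator returns a fresh array, which matters because sort modifies its argument in place: reusing an array that has already been sorted would turn a reverse or random run into a sorted run.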
Important!
The purpose of this requirement is not to copy TestDriver and modify it until it behaves like SortAnalyzer. The purpose is to learn to write your own program to debug your code and to better understand the BigOh functions of a number of sorting algorithms. Therefore, you may not copy TestDriver; you must write your own code for SortAnalyzer.