Implement and then benchmark a couple different sorts under a variety of conditions.
Text: 8.2, 8.7, Chapter 8
Concepts: sorting, big-O, micro-benchmarking.
In a class named UsernameA08
, write these three static sort methods:
public static <E extends Comparable<E>> void insertionSort(List<E> list)
public static <E extends Comparable<E>> void quickSort(List<E> list)
public static <E extends Comparable<E>> void otherSort(List<E> list)
Note that we are sorting java.util.List objects here, rather than arrays. When writing your sorts, you can assume these lists are random access (ArrayLists) in terms of performance. For example, you do not need to use an iterator and you can assume that a get
method call is an O(1) operation.
You should implement a standard insertion sort algorithm.
Your quickSort should select a random pivot, as in the lecture slides.
For otherSort, you can choose whatever comparison sort you want that is O(n2) or better. Remember that O(n lg n) is better than O(n2). This can be one of the sorts covered in lecture, or, for extra credit, a different interesting sort that you find on Wikipedia or elsewhere. You could also do quickSort again with a different pivot selection algorithm. Name your method otherSort so that Tamarin can invoke it, but document in your javadoc comments what sort you are actually implementing and any optimization choices you made.
Here is a program to generate some unsorted data: FileGenerator.java
See the javadoc documentation on how to run the program. You'll need to specify the number of integers you want in the file as a command line argument. Run the program with 1000, 10000, and 100000 to produce the files 1000.txt, 10000.txt, and 100000.txt. Since it's a little hard to quickly count that many 0s, I'll refer to these as your 1K, 10K, and 100K test cases.
Make sure your sort methods are correct. How you do this is up to you, but make sure you don't test all 3 sorts on the same list object.
Now time each of your sorts, in milliseconds, for lists of 1K, 10K, and 100K integers. As a control measure, also time the java.util.Collections.sort
method. To time a method, do something like this:
long sortTime = System.currentTimeMillis(); //start time insertionSort(listCopy); sortTime = System.currentTimeMillis() - sortTime; //duration = end - start time System.out.println("Insertion sort time: " + sortTime);
You will find that benchmarking is not a very precise science. If you run your program multiple times, you'll probably see that your times can vary about 20% either way. Run each test at least 3 or 4 times and take an average. (You can do this formally with a loop in your program, or you can just run your program a few times and eyeball it.)
Time each method for each of the 1K, 10K, and 100K data sets in each of these cases:
If a single call to a sort method takes more than a few minutes to complete, you can skip recording the time for that entry. You do not have produce all of these values with a single run of your program. That would certainly be acceptable, but you can also modify your program and run it again to handle the different cases. For example, you could read in the file as integers, get the resulting times, and then change your program to read the file in as strings.
Log into Laulima and go to the new Wiki section. Further instructions will be on the wiki page itself. We'll collect everyone's times there and then discuss any trends after the assignment deadline.
To speed up the edit process and so hopefully avoid edit conflicts, you'll need your times in the following format on one line: UH username, 3 times for insertion sort, 3 times for quick sort, 3 times for other sort, and 3 times for for Collections.sort. All of these elements should be separated by a | character. You'll have 1 line for each of the four conditions. An example:
Username | i-1K | i-10K | i-100K | q-1K | q-10K | q-100K | o-1K | o-10K | o-100K | C-1K | C-10K | C-100k #Unordered integers: ztomasze| 25 | 230 | 10900 | 6 | 75 | 50 | 12 | 60 | 95 | 5 | 50 | 105 #Unordered strings: ztomasze| 30 | 630 | 140500| 10 | 77 | 55 | 4 | 40 | 170 | 4 | 35 | 170 #Sorted integers: ztomasze| 1 | 9 | 13 | 9 | 58 | 48 | 1 | 9 | 30 | 1 | 9 | 15 #LinkedList of integers: ztomasze| 340 | 343365 | + | 30 | 2602 | 295548 | 19 | 1980 | 274976 | 3 | 30 | 105
Having your data already in this format means you can just paste one new row into each of the 4 wiki tables. You'll also be asked for your username, java version, OS, CPU speed (type optional), and specifically what other sort you implemented (if any):
ztomasze | 1.7.0_06 | Windows 7 | 2.4Ghz (Intel i5) | merge sort
Tamarin will test your 3 static methods directly for correctness. It won't time them, and it won't care what you did in your main method.
Up to you.
Upload your UsernameA08.java
file to Tamarin. (No jar files this time.)
Remember to follow the course coding standards on all assignments.
I recommend you add a main method to your class that reads in a file of integers. A quick way to do this is specify the filename as a command line argument. You can use a Scanner's hasNextInt() and nextInt() methods to read the numbers from the file as ints, or use hasNextLine() and nextLine() to read them as Strings. You should store them in an an ArrayList. Pass a copy of this List to your sort methods and then print the sorted list out afterwards to make sure the sort worked. You'll probably want to test it with a small file of 20 or 30 elements.
Note that you can't pass the same list to all of your sort methods because it will already have been sorted by the first sort you test. One way to avoid this is to re-read the contents from the file before each sort test, but a better approach is just to make a copy of the unsorted list. Remember that every Collection in Java has a constructor that takes another Collection, so you can do something like this:
List<Integer> unsorted = new ArrayList<Integer>(); // ...load unsorted list with file contents... List<Integer> listCopy = new ArrayList<Integer>(unsorted); insertionSort(listCopy); System.out.println("After insertion sort: " + listCopy); // although listCopy is now sorted, original unsorted list is still unsorted, // so you can make another copy of it to test quickSort
Benchmarking means running a program to evaluate the real-world performance of either the code or the underlying hardware. What we're doing here is micro-benchmarking: testing the performance of just a small section of code.
Benchmarking code is fraught with peril. There are many many factors that can affect the time you get: your hardware, your operating system, the possible precision of your timing mechanism, what else your OS was doing while running the test, what the compiler did to your code (where small implementation differences of the same algorithm can make big differences), what the JVM does while running your code (whether the garbage collector runs, whether the JVM pauses to compile part of your code into machine code for faster performance), etc.
The general advice on benchmarking is: don't do it (without very good reason). If you do it, don't put too much stock in it. (The first reading below is actually from the documentation of a Java microbenchmarking framework, and they say the same thing.)
So if the numbers we're getting are not very useful, why are we bothering? Because, while the "error rate" is very high in our measurements, we should still be able to see some significant trends as predicted by big-O. For example, a particular benchmark time might vary by 30% or more on the same machine, but if a different test takes 100x as long, that much larger difference will still be apparent despite the high error rate.
More reading: Java Microbenchmarks, Writing a Java Micro-Benchmark
For example, consider this chunk of code from array-based bubbleSort:
for (int i = 0; i < array.length - 1; i++) { if (array[i].compareTo(array[i+1]) > 0) { //swap the two E temp = array[i]; array[i] = array[i + 1]; array[i + 1] = temp; swapped = true; } }This can be easily converted to using List methods, like so:
for (int i = 0; i < list.size() - 1; i++) { if (list.get(i).compareTo(list.get(i+1)) > 0) { //swap the two E temp = list.get(i); // Or replace these 3 lines with: list.set(i, list.get(i + 1)); // Collections.swap(list, i, i+1); list.set(i + 1, temp); // swapped = true; } }
Note that we could have performed the swap by using the remove and add methods, but that would entail an O(n) shift for the remove and then again for the reinsertion. If you're assuming the List is an ArrayList, then the get/set is more efficient.
If you want to, you can write your sorts using iterators in places where you think it will improve performance. The translation from array-based sorting code will be a little trickier if you do so, though.
public static <T extends Comparable<T>> void mergeSort(T[] array) { public static <T extends Comparable<? super T>> void quickSort(T[] array) { ... }
Quicksort's definition is really the better way, but mergeSort's is simpler. You can declare your own methods either way, but here's what the difference means:
<T extends Comparable<T>>
<T extends Comparable<? super T>>
The later is more flexible. For example, consider your Meep class that implemented Comparable:
public class Meep implements Comparable<Meep> { ... public int compareTo(Meep m) { ...
Meeps are comparable to other Meeps. In other words, "Meep extends Comparable<Meep>", so Meep would be a valid value for T. If we created an array of Meeps, each element of the list will have a compareTo(Meep) method, and so we could pass this array to either of the sort methods above.
But now consider a subclass:
/** Exactly like a Meep, except a bit heavier than it should be. */ public class FatMeep extends Meep { public FatMeep(int size) { super(size + 1); } }
Thanks to inheritance, FatMeep is Comparable<Meep> and has a compareTo(Meep) method. This is fine; it means a FatMeep works just like a regular Meep in all comparison tests. You can compare a Meep to a FatMeep or a FatMeep to a Meep and get the same results. However, note that "FatMeep extends Comparable<Meep>" will match the generics requirement for quickSort but not for mergeSort. So we would not be able to sort a FatMeep[] using the mergeSort method above.
This situation is fairly rare, but it does reveal the complex underbelly of generics. (For more information, do a search for java generics PECS.)
List<E> aux = new ArrayList<E>(list);
Just leaving it as array is okay too, though.
Try to keep your edit times as short as possible to avoid trashing someone else's data. There's also a history record of all changes made to the document, so even if your times do get overwritten by someone else's changes, we will know that you did at least submit them before the deadline. So don't stress about your grade being affected by this. However, it would help us out if you could check back after the deadline and reinsert any missing sort-time data after the editing frenzy has died down.