Monday, May 08, 2006

C++ : Exploring Sorting Part IV

Let's start with the full code for the promised generic quicksort implementation that is my attempt at an "exercise for the reader" from an Alexandrescu article mentioned previously

template<class Iter>
void sort(Iter left, Iter right)
{
// make larger to invoke insertion sort sooner
enum { threshold = 56 };
while(right-left>1)
{
// do insertion sort on range
if(right-left<=threshold)
{
Bannister::insertionSort(left, right);
return; // done sorting range
}
else // do quicksort on range
{
const std::pair<Iter,Iter> pivot =
Bannister::partition(
left, right,
*Bannister::selectRandomPivot(
left, right));
if (left < pivot.first)
Bannister::sort(left, pivot.first);
if (right > pivot.second)
left = pivot.second;
else
break;
}
}
}


Quite different from the original C code I posted in my last blog! I'm sure you could have made that function generic using the same method as shown with the bubblesort, it's not worth going into detail again. The above algorithm uses a quicksort for arrays over 56 elements long, otherwise it uses an insertion sort which is another fast method. It's a matter of testing to get the best speed benefits, something I have as yet to write a test framework for. Now to explain the quicksort part of the algorithm. The selecting of a pivot value and partitioning of the array passed to the sort function is deferred to two functions for clarity and to make it easier to tinker with the algorithm. The sort function itself has been partially converted to an iterative solution to try and alleviate the speed costs of recursion.

The pivot value is chosen randomly to help prevent the "worst-case scenario" of always choosing the lowest number around which to partition the array, which is always the case when sorting an array which is already ordered using the first value as the pivot. Another speed increase is achieved by uses Alexandrescus "fit pivot" technique which is based on the more well-known "fat-pivot". The "fat pivot" algorithm partitions the array into 3, not 2, sub arrays. The 3rd contains all occurences of the pivot value and places this in the middle of the other two (which contain values lower and values higher than the pivot respectively). This reduces the number of partitioning steps needed when there are long runs of equal numbers. However there is an overhead cost in doing the partitioning. The "fit pivot" algorithm will simply apply the normal "two partition" algorithm to an array but carry on moving the right pointer to the left and vice versa until numbers are found which are not equal to the pivot value. This way, if there is a run of pivot-equal numbers in the middle of the array, it will be kept there, not passed to the quicksort again and the resulting two partitions will be smaller, increasing the speed in those situations at no extra cost. Here are the two functions (including a more standard selectFirstPivot).

template<class Iter>
Iter selectFirstPivot(Iter left, Iter right)
{
return left;
}

template<class Iter>
Iter selectRandomPivot(Iter left, Iter right)
{
long int index = Bannister::rand(0, right-left);
return left + index;
}

template<class I0, class I1, class T>
std::pair<I1,I0> partition(
I0 left, I1 right, T pivot)
{
--right;
while (left < right)
{
while ((*right >= pivot) && (left < right))
--right;
if (left != right)
{
*left = *right;
++left;
}
while ((*left <= pivot) && (left < right))
++left;
if (left != right)
{
*right = *left;
--right;
}
}
*left = pivot;
// all numbers in-between will be equal
return std::make_pair(right, left+1);
}


Note, I've put my functions in a namespace called "Bannister" (rubbish name I know, it's named after Roger Bannister, the first man to run the 4 minute mile). Here's the insertion sort algorithm.

template<class Iter, class T>
void insertionSort(Iter left, Iter right, T)
// T unused, only type is needed
{
Iter i, j;
T index;

for (i=left+1; i != right; ++i)
{
index = *i;
j = i;
while ((j > left) && (*(j-1) > index))
{
*j = *(j-1);
--j;
}
*j = index;
}
}

template<class Iter>
void insertionSort(Iter left, Iter right)
{
// dereference left to deduce type of iterator
Bannister::insertionSort(left, right, *left);
}


And finally I'll include the fast pseudo-random function I found to calculate the random pivot.

// Park-Miller "minimal standard" 31 bit
// pseudo-random number generator
// implemented with David G. Carta's
// optimisation: with 32 bit math and
// without division..

long unsigned int rand31()
{
static long unsigned int seed = 16807;
long unsigned int hi, lo;

lo = 16807 * (seed & 0xFFFF);
hi = 16807 * (seed >> 16);

lo += (hi & 0x7FFF) << 16;
lo += hi >> 15;

if(lo>0x7FFFFFFF) lo -= 0x7FFFFFFF;

return (seed=(long)lo);
}

long int rand(long int lo, long int hi)
{
return (long int)(rand31() % (hi-lo)) + lo;
}


I should include some examples of using these functions as well ... (I really am letting the code speak for itself in this blog)... disclaimer: really bad test harness with no error-checking!!

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <limits>
#include <vector>
// previous code goes in here...
#include "sort.h"

template<class Iter>
void output(Iter left, Iter right)
{
for(Iter it=left; it!=right; ++it)
std::cerr << *it << " ";
std::cerr << std::endl;
}

template<class Iter>
Iter find_unsorted(Iter left, Iter right)
{
for(Iter it=left+1; it!=right; ++it)
{
if(*it<*(it-1))
{
return it;
}
}
return right;
}

int unit(void)
{
const int size = 12;
int arr[] = {4, 0, 5, 7, 2, 2, 2, 2, 2, 1, 78, 4};
float arf[] = {4.2f, 0.1f, 5.3f, 7.8f, 8.02f,
2.3f, 45.9f, 2.1f, 2.1f, 1.2f,
78.0f, 4.2f};
int arr[] = {0, 7, 1, 8, 2, 9,
3, 10, 4, 11, 5, 6};
typedef std::vector<char> char_array_t;
char_array_t arc;
arc.push_back('k');
arc.push_back('c');
arc.push_back('e');
arc.push_back('f');
arc.push_back('b');
arc.push_back('g');
arc.push_back('i');
arc.push_back('h');
arc.push_back('l');
arc.push_back('d');
arc.push_back('j');
arc.push_back('a');

output(arr, arr+size);
bubbleSortT<int*>(arr, size);
Bannister::sort(arr, arr+size);
output(arr, arr+size);

output(arf, arf+size);
Bannister::insertionSort(arf, arf+size);
Bannister::sort(arf, arf+size);
output(arf, arf+size);

output(arc.begin(), arc.end());
Bannister::sort(arc.begin(), arc.end());
Bannister::insertionSort(arc.begin(), arc.end());
output(arc.begin(), arc.end());

return 1;
}

int main(int argc, char **argv)
{
if(argc==1)
{
return unit();
}

typedef std::vector<char> char_array_t;
char_array_t arc;

// Kilobytes of characters to sort
const int mem_size_bytes = 1024 * atoi(argv[1]);
for(int i=0; i<mem_size_bytes; ++i)
{
arc.push_back(Bannister::rand('A', 'Z'));
}

switch(argv[2][0])
{
case 'b':
std::cerr << "bubble sorting array...\n";
bubbleSort(arc.begin(), arc.end());
break;
case 'q':
std::cerr << "Bannister::sorting array... "
"(hybrid quicksort/insertion sort "
"with random fit pivot)\n";
Bannister::sort(arc.begin(), arc.end());
break;
case 'i':
std::cerr << "insertion sorting array...\n";
Bannister::insertionSort(
arc.begin(), arc.end());
break;
case 's':
std::cerr << "std::sorting array...\n";
std::sort(arc.begin(), arc.end());
break;
default:
break;
}

char_array_t::iterator it =
find_unsorted(
arc.begin(), arc.end());
if(it!=arc.end())
{
std::cerr << "Unsorted! Error!\n" << std::endl;
output(it-10, it+10);
//output(arc.begin(), arc.end());
}

return 1;
}


Now I propose an exercise for the reader... write some proper test code to automatically time and compare the results using different sorting methods.

Some simple timing tests I've done (using >time ./sort.exe on the bash command line under Cygwin) show that the Banniser::sort is many times quicker than a bubble sort or even an insertion sort with large arrays. It doesn't hold a candle to std::sort though. I suspect the std::sort implementation uses some really good platform-specific tricks, maybe implementing some inner loops in assembly. Anyone who could attempt modifying this code to go even faster is more than welcome!

No comments: