The second project is to build a sorting utility called lsort in C++ that
internally operates on linked lists. This program must implement the
following sorting methods or modes:
STL: This sorting method simply uses the std::sort function.
qsort: This sorting method simply uses the qsort function.
merge: This sorting method uses a custom implementation of the merge sort algorithm.
quick: This sorting method uses a custom implementation of the quick sort algorithm.
This project is to be done in groups of 1 - 3 and is due midnight Friday September 30, 2016.
A reference implementation of the lsort sorting utility can be found at
/afs/nd.edu/user15/pbui/pub/bin/lsort. Here is its usage message:
$ lsort -h
usage: lsort
-m MODE Sorting mode (stl, qsort, merge, quick)
-n Perform numerical ordering
As can be seen, lsort takes two optional arguments. The first is the -m
flag, which lets the user select which sorting method to use. The second is
the -n flag, which informs the program to treat the input values as numeric
integers rather than textual strings.
As with most Unix filters, lsort reads from [standand input] and then
outputs to standard output:
# Use the shuf program to generate 10 random numbers and then use lsort to order them $ shuf -i 1-100 -n 10 -r | lsort -m quick 12 32 37 37 50 61 69 71 90 97
To start this project, you must form a group of 1 - 3 people. One group member should fork the Project 02 repository on GitLab:
https://gitlab.com/nd-cse-30331-fa16/project02
Once this repository has been forked, follow the instructions from Reading 00 to:
Make the repository private
Configure access to the repository
Make sure you add all the members of the team in addition to the instructional staff.
As with the programming challenges in this course, the project repository
contains a Makefile, input, and output.{number,string} test files for
automated building and testing of your program.
In addition to these files, the repository should also contain the following:
measure.cpp: This is utility for measuring execution time and memory
usage of an application.
lsort.h: This is the project header file that contains the definitions
for the Node and List structures, a few type definitions, and some
function prototypes.
main.cpp: This contains the main function of the the lsort utility and
is in charge of parsing the command line, reading the input, dispatching
the appropriate sorting method, and outputting the results.
node.cpp: This contains comparison functions that will be used by the
sorting methods to determine the order of two different Nodes.
list.cpp: This contains the implementation of a simple [singly-linked
list] that will contain the data read in from standard input.
stl.cpp: This contains the implementation of the stl sorting mode
which uses std::sort internally.
qsort.cpp: This contains the implementation of the qsort sorting mode
which uses qsort internally.
merge.cpp: This contains a custom implementation of merge sort.
quick.cpp: This contains a custom implementation of quick sort.
The lsort.h header file contains the definition of the struct Node and struct List:
struct Node { std::string string; int number; Node *next; }; struct List { Node *head; size_t size; List(); ~List(); void push_front(const std::string &s); };
As can be seen, the struct List is a singly-linked list with a
constructor, destructor, and a single push_front method. This
struct List consists of struct Node entries that contain both string
values, the corresponding number value, and a pointer to the next struct
Node.
In list.cpp, you are to implement the struct List constructor,
destructor, and push_front method. The push_front method should
insert a new struct Node into the front of the list. To get the number
value for the new struct Node, you should use the std::stoi function (if
this conversion fails, you can default to 0 for the number value).
In node.cpp, you are to implement the struct Node comparison functions:
// C++ Style comparison function bool node_number_compare(const Node *a, const Node *b); bool node_string_compare(const Node *a, const Node *b); // C Style comparison function int void_number_compare(const void *a, const void *b); int void_string_compare(const void *a, const void *b);
There are four comparision functions in total: one for comparing the
number field and the other for comparing the string field of the
struct Node in both the C++ style and C style.
Additionally, you will need to implement the dump_node utility function
which outputs the contents of each node start from n until nullptr is
reached:
// Utility function void dump_node(Node *n);
This may be useful for debugging.
The first sorting mode is to use the std::sort method in the stl.cpp
file. Because our struct List does not implement random access iterators,
we will need to copy the struct Nodes in the struct List into another
container such as an array. Once we have this copy, we can then call the
std::sort function on this copy with the appropriate comparison function.
Finally, we must update the links between the struct Nodes to reflect the
sorted order.
Hint: You should store struct Node * in your copy array.
Hint: Make sure you set the head of the struct List after you have
set the links of all the struct Nodes.
The second sorting mode is similar to the first, except you will use qsort
instead of std::sort in the qsort.cpp file. In fact, the code could be
nearly identical to stl_sort except for the line calling the appropriate
sorting function.
The third sorting mode is a custom implementation of merge sort in the
merge.cpp file. You are to implement a top-down version of the
algorithm:
// Public functions void merge_sort(List &l, bool numeric); // Internal functions Node *msort(Node *head, CompareFunction compare); void split(Node *head, Node *&left, Node *&right); Node *merge(Node *left, Node *right, CompareFunction compare);
merge_sort takes a struct List and whether or not the comparison should
be numeric and performs the top-down merge sort algorithm. This
function serves as a wrapper or helper function for the recursive msort
function.
msort is the recursive portion of the algorithm and calls split to
divide and calls merge to conquer. It returns the new head of
the list.
split is a helper function that splits the singly-linked list in half
by using the slow and faster pointer trick (aka [tortoise and hare]).
merge is a helper function that combines both the left and right
lists and returns the new head of the list.
Both of the custom merge sort and quick sort implementations must use the following divide-and-conquer strategy:
Node *msort(Node *head, CompareFunction compare) { // Handle base case // Divide into left and right sublists // Conquer left and right sublists // Combine left and right sublists }
There are many examples of merge sort on linked lists. For instance, GeeksForGeeks has an article about Merge Sort for Linked Lists. You should resist the temptation to look at such sources.
If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.
Moreover, in this case, the example from GeeksForGeeks is suboptimal as it
uses recursion to perform the merge. You will need an iterative version
of merge, otherwise, you run the risk of overflowing the stack on large
input sizes.
The fourth sorting mode is a custom implementation of quick sort in the
quick.cpp file. You are to implement a simple version of the algorithm
that uses the [Lomuto partition scheme] (in this case we choose the first
element as the pivot rather than the last):
// Public functions void quick_sort(List &l, bool numeric); // Internal functions Node *qsort(Node *head, CompareFunction compare) void partition(Node *head, Node *pivot, Node *&left, Node *&right, CompareFunction compare); Node *concatenate(Node *left, Node *right);
quick_sort takes a struct List and whether or not the comparison should
be numeric and performs the quick sort algorithm. This function serves
as a wrapper or helper function for the recursive qsort function.
qsort is the recursive portion of the algorithm and calls partition to
divide the list and calls concatenate to conquer. It returns the
new head of the list.
partition is a helper function that splits the singly-linked list into
two left and right lists such that all elements less than the pivot
are in the left side and everything else is on the right.
concatenate is a helper function that combines both the left and
right lists and returns the new head of the list.
Again, there are many examples of quick sort on linked lists. For instance, GeeksForGeeks has an article about QuickSort on Singly Linked Lists. You should resist the temptation to look at such sources.
If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.
Moreover, in this case, the example from GeeksForGeeks is much more complex than the proposed divide-and-conquer strategy shown above.
When you are finished implementing your lsort sorting utility answer the
following questions in the project02/README.md:
What is complexity of each of the sorting modes (ie. STL, QSORT, MERGE, QUICK)? For each method, briefly explain the best, average, and worst case complexities of the sorting mode in terms of Big-O notation.
Benchmark all four different sorting modes on files with the following number of integers:
10, 100, 1000, 10000, 100000, 1000000, 10000000
Record the elapsed time and memory consumption of each mode and file size in a Markdown table:
| Mode | Size | Elapsed Time | Memory | |---------|-------|---------------|-----------| | STL | 10 | 0 | 0 | | STL | 100 | 0 | 0 | | ... | ... | ... | ... | | QSORT | 10 | 0 | 0 | | ... | ... | ... | ... | | MERGE | 10 | 0 | 0 | | ... | ... | ... | ... | | QUICK | 10 | 0 | 0 | | ... | ... | ... | ... |
Hint: Use the provided measure program to get the elapsed time and memory consumption.
Hint: Use shuf or create a Python script to generate the input files.
Hint: Write a shell script to automate the benchmarking.
You may wish to create your test files in /tmp as that is faster and
larger than your AFS directory.
Likewise, you should redirect the output of lsort to /dev/null for
the benchmarking.
After you have performed your benchmark:
Discuss the relative performance of each sorting method and try to explain the differences.
What do these results reveal about the relationship between theoretical complexity and actual performance?
In your opinion, which sorting mode is the best? Justify your conclusion by examining the trade-offs for the chosen mode.
In addition to the questions, please provide a brief summary of each individual group members contributions to the project.
Your project will be scored based on the following metrics:
| Metric | Points |
|---|---|
lsort -m stl passes output test |
2 |
lsort -m qsort passes output test |
2 |
lsort -m merge passes output test |
3 |
lsort -m quick passes output test |
3 |
lsort passes memory test |
2 |
lsort passes time test |
2 |
README.md responses |
3 |
| Coding Style | 1 |
| Individual contributions | 2 |
| Total | 20 |
To test your programs, simply run make test. Your programs must correctly
process the input correctly, have no memory errors, and run in the allotted
time.
To submit your project, simply push your changes to your Project 02 repository. There is no need to create a Merge Request. Make sure you have enable access to your repository to all of the instructional staff.