The second project is to build a sorting utility called lsort
in C++ that
internally operates on linked lists. This program must implement the
following sorting methods or modes:
STL: This sorting method simply uses the std::sort function.
qsort: This sorting method simply uses the qsort function.
merge: This sorting method uses a custom implementation of the merge sort algorithm.
quick: This sorting method uses a custom implementation of the quick sort algorithm.
This project is to be done in groups of 1 - 3 and is due midnight Friday September 30, 2016.
A reference implementation of the lsort
sorting utility can be found at
/afs/nd.edu/user15/pbui/pub/bin/lsort
. Here is its usage message:
$ lsort -h usage: lsort -m MODE Sorting mode (stl, qsort, merge, quick) -n Perform numerical ordering
As can be seen, lsort
takes two optional arguments. The first is the -m
flag, which lets the user select which sorting method to use. The second is
the -n
flag, which informs the program to treat the input values as numeric
integers rather than textual strings.
As with most Unix filters, lsort
reads from [standand input] and then
outputs to standard output:
# Use the shuf program to generate 10 random numbers and then use lsort to order them $ shuf -i 1-100 -n 10 -r | lsort -m quick 12 32 37 37 50 61 69 71 90 97
To start this project, you must form a group of 1 - 3 people. One group member should fork the Project 02 repository on GitLab:
https://gitlab.com/nd-cse-30331-fa16/project02
Once this repository has been forked, follow the instructions from Reading 00 to:
Make the repository private
Configure access to the repository
Make sure you add all the members of the team in addition to the instructional staff.
As with the programming challenges in this course, the project repository
contains a Makefile
, input
, and output.{number,string}
test files for
automated building and testing of your program.
In addition to these files, the repository should also contain the following:
measure.cpp
: This is utility for measuring execution time and memory
usage of an application.
lsort.h
: This is the project header file that contains the definitions
for the Node
and List
structures, a few type definitions, and some
function prototypes.
main.cpp
: This contains the main function of the the lsort
utility and
is in charge of parsing the command line, reading the input, dispatching
the appropriate sorting method, and outputting the results.
node.cpp
: This contains comparison functions that will be used by the
sorting methods to determine the order of two different Nodes
.
list.cpp
: This contains the implementation of a simple [singly-linked
list] that will contain the data read in from standard input.
stl.cpp
: This contains the implementation of the stl
sorting mode
which uses std::sort internally.
qsort.cpp
: This contains the implementation of the qsort
sorting mode
which uses qsort internally.
merge.cpp
: This contains a custom implementation of merge sort.
quick.cpp
: This contains a custom implementation of quick sort.
The lsort.h
header file contains the definition of the struct Node
and struct List
:
struct Node { std::string string; int number; Node *next; }; struct List { Node *head; size_t size; List(); ~List(); void push_front(const std::string &s); };
As can be seen, the struct List
is a singly-linked list with a
constructor, destructor, and a single push_front method. This
struct List
consists of struct Node
entries that contain both string
values, the corresponding number value, and a pointer to the next struct
Node
.
In list.cpp
, you are to implement the struct List
constructor,
destructor, and push_front method. The push_front method should
insert a new struct Node
into the front of the list. To get the number
value for the new struct Node
, you should use the std::stoi function (if
this conversion fails, you can default to 0
for the number value).
In node.cpp
, you are to implement the struct Node
comparison functions:
// C++ Style comparison function bool node_number_compare(const Node *a, const Node *b); bool node_string_compare(const Node *a, const Node *b); // C Style comparison function int void_number_compare(const void *a, const void *b); int void_string_compare(const void *a, const void *b);
There are four comparision functions in total: one for comparing the
number field and the other for comparing the string field of the
struct Node
in both the C++ style and C style.
Additionally, you will need to implement the dump_node
utility function
which outputs the contents of each node start from n
until nullptr
is
reached:
// Utility function void dump_node(Node *n);
This may be useful for debugging.
The first sorting mode is to use the std::sort method in the stl.cpp
file. Because our struct List
does not implement random access iterators,
we will need to copy the struct Node
s in the struct List
into another
container such as an array. Once we have this copy, we can then call the
std::sort function on this copy with the appropriate comparison function.
Finally, we must update the links between the struct Node
s to reflect the
sorted order.
Hint: You should store struct Node *
in your copy array.
Hint: Make sure you set the head of the struct List
after you have
set the links of all the struct Node
s.
The second sorting mode is similar to the first, except you will use qsort
instead of std::sort in the qsort.cpp
file. In fact, the code could be
nearly identical to stl_sort
except for the line calling the appropriate
sorting function.
The third sorting mode is a custom implementation of merge sort in the
merge.cpp
file. You are to implement a top-down version of the
algorithm:
// Public functions void merge_sort(List &l, bool numeric); // Internal functions Node *msort(Node *head, CompareFunction compare); void split(Node *head, Node *&left, Node *&right); Node *merge(Node *left, Node *right, CompareFunction compare);
merge_sort
takes a struct List
and whether or not the comparison should
be numeric
and performs the top-down merge sort algorithm. This
function serves as a wrapper or helper function for the recursive msort
function.
msort
is the recursive portion of the algorithm and calls split
to
divide and calls merge
to conquer. It returns the new head
of
the list.
split
is a helper function that splits the singly-linked list in half
by using the slow and faster pointer trick (aka [tortoise and hare]).
merge
is a helper function that combines both the left
and right
lists and returns the new head
of the list.
Both of the custom merge sort and quick sort implementations must use the following divide-and-conquer strategy:
Node *msort(Node *head, CompareFunction compare) { // Handle base case // Divide into left and right sublists // Conquer left and right sublists // Combine left and right sublists }
There are many examples of merge sort on linked lists. For instance, GeeksForGeeks has an article about Merge Sort for Linked Lists. You should resist the temptation to look at such sources.
If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.
Moreover, in this case, the example from GeeksForGeeks is suboptimal as it
uses recursion to perform the merge. You will need an iterative version
of merge
, otherwise, you run the risk of overflowing the stack on large
input sizes.
The fourth sorting mode is a custom implementation of quick sort in the
quick.cpp
file. You are to implement a simple version of the algorithm
that uses the [Lomuto partition scheme] (in this case we choose the first
element as the pivot rather than the last):
// Public functions void quick_sort(List &l, bool numeric); // Internal functions Node *qsort(Node *head, CompareFunction compare) void partition(Node *head, Node *pivot, Node *&left, Node *&right, CompareFunction compare); Node *concatenate(Node *left, Node *right);
quick_sort
takes a struct List
and whether or not the comparison should
be numeric
and performs the quick sort algorithm. This function serves
as a wrapper or helper function for the recursive qsort
function.
qsort
is the recursive portion of the algorithm and calls partition
to
divide the list and calls concatenate
to conquer. It returns the
new head
of the list.
partition
is a helper function that splits the singly-linked list into
two left
and right
lists such that all elements less than the pivot
are in the left
side and everything else is on the right
.
concatenate
is a helper function that combines both the left
and
right
lists and returns the new head
of the list.
Again, there are many examples of quick sort on linked lists. For instance, GeeksForGeeks has an article about QuickSort on Singly Linked Lists. You should resist the temptation to look at such sources.
If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.
Moreover, in this case, the example from GeeksForGeeks is much more complex than the proposed divide-and-conquer strategy shown above.
When you are finished implementing your lsort
sorting utility answer the
following questions in the project02/README.md
:
What is complexity of each of the sorting modes (ie. STL, QSORT, MERGE, QUICK)? For each method, briefly explain the best, average, and worst case complexities of the sorting mode in terms of Big-O notation.
Benchmark all four different sorting modes on files with the following number of integers:
10, 100, 1000, 10000, 100000, 1000000, 10000000
Record the elapsed time and memory consumption of each mode and file size in a Markdown table:
| Mode | Size | Elapsed Time | Memory | |---------|-------|---------------|-----------| | STL | 10 | 0 | 0 | | STL | 100 | 0 | 0 | | ... | ... | ... | ... | | QSORT | 10 | 0 | 0 | | ... | ... | ... | ... | | MERGE | 10 | 0 | 0 | | ... | ... | ... | ... | | QUICK | 10 | 0 | 0 | | ... | ... | ... | ... |
Hint: Use the provided measure program to get the elapsed time and memory consumption.
Hint: Use shuf or create a Python script to generate the input files.
Hint: Write a shell script to automate the benchmarking.
You may wish to create your test files in /tmp
as that is faster and
larger than your AFS directory.
Likewise, you should redirect the output of lsort
to /dev/null
for
the benchmarking.
After you have performed your benchmark:
Discuss the relative performance of each sorting method and try to explain the differences.
What do these results reveal about the relationship between theoretical complexity and actual performance?
In your opinion, which sorting mode is the best? Justify your conclusion by examining the trade-offs for the chosen mode.
In addition to the questions, please provide a brief summary of each individual group members contributions to the project.
Your project will be scored based on the following metrics:
Metric | Points |
---|---|
lsort -m stl passes output test |
2 |
lsort -m qsort passes output test |
2 |
lsort -m merge passes output test |
3 |
lsort -m quick passes output test |
3 |
lsort passes memory test |
2 |
lsort passes time test |
2 |
README.md responses |
3 |
Coding Style | 1 |
Individual contributions | 2 |
Total | 20 |
To test your programs, simply run make test
. Your programs must correctly
process the input
correctly, have no memory errors, and run in the allotted
time.
To submit your project, simply push your changes to your Project 02 repository. There is no need to create a Merge Request. Make sure you have enable access to your repository to all of the instructional staff.