Project 02: List Sort

Overview

The second project is to build a sorting utility called lsort in C++ that internally operates on linked lists. This program must implement the following sorting methods or modes:

STL: This sorting method simply uses the std::sort function.
qsort: This sorting method simply uses the qsort function.
merge: This sorting method uses a custom implementation of the merge sort algorithm.
quick: This sorting method uses a custom implementation of the quick sort algorithm.

This project is to be done in groups of 1 - 3 and is due midnight Friday September 30, 2016.

Reference

A reference implementation of the lsort sorting utility can be found at /afs/nd.edu/user15/pbui/pub/bin/lsort. Here is its usage message:

$ lsort -h
usage: lsort
    -m MODE   Sorting mode (stl, qsort, merge, quick)
    -n        Perform numerical ordering

As can be seen, lsort takes two optional arguments. The first is the -m flag, which lets the user select which sorting method to use. The second is the -n flag, which informs the program to treat the input values as numeric integers rather than textual strings.

As with most Unix filters, lsort reads from [standand input] and then outputs to standard output:

# Use the shuf program to generate 10 random numbers and then use lsort to order them
$ shuf -i 1-100 -n 10 -r | lsort -m quick
12
32
37
37
50
61
69
71
90
97

GitLab

To start this project, you must form a group of 1 - 3 people. One group member should fork the Project 02 repository on GitLab:

https://gitlab.com/nd-cse-30331-fa16/project02

Once this repository has been forked, follow the instructions from Reading 00 to:

Make the repository private
Configure access to the repository

Make sure you add all the members of the team in addition to the instructional staff.

As with the programming challenges in this course, the project repository contains a Makefile, input, and output.{number,string} test files for automated building and testing of your program.

In addition to these files, the repository should also contain the following:

measure.cpp: This is utility for measuring execution time and memory usage of an application.
lsort.h: This is the project header file that contains the definitions for the Node and List structures, a few type definitions, and some function prototypes.
main.cpp: This contains the main function of the the lsort utility and is in charge of parsing the command line, reading the input, dispatching the appropriate sorting method, and outputting the results.
node.cpp: This contains comparison functions that will be used by the sorting methods to determine the order of two different Nodes.
list.cpp: This contains the implementation of a simple [singly-linked list] that will contain the data read in from standard input.
stl.cpp: This contains the implementation of the stl sorting mode which uses std::sort internally.
qsort.cpp: This contains the implementation of the qsort sorting mode which uses qsort internally.
merge.cpp: This contains a custom implementation of merge sort.
quick.cpp: This contains a custom implementation of quick sort.

Node and List

The lsort.h header file contains the definition of the struct Node and struct List:

struct Node {
  std::string string;
  int         number;
  Node       *next;
};

struct List {
  Node       *head;
  size_t      size;

  List();
  ~List();

  void push_front(const std::string &s);
};

As can be seen, the struct List is a singly-linked list with a constructor, destructor, and a single push_front method. This struct List consists of struct Node entries that contain both string values, the corresponding number value, and a pointer to the next struct Node.

In list.cpp, you are to implement the struct List constructor, destructor, and push_front method. The push_front method should insert a new struct Node into the front of the list. To get the number value for the new struct Node, you should use the std::stoi function (if this conversion fails, you can default to 0 for the number value).

In node.cpp, you are to implement the struct Node comparison functions:

// C++ Style comparison function
bool node_number_compare(const Node *a, const Node *b);
bool node_string_compare(const Node *a, const Node *b);

// C Style comparison function
int  void_number_compare(const void *a, const void *b);
int  void_string_compare(const void *a, const void *b);

There are four comparision functions in total: one for comparing the number field and the other for comparing the string field of the struct Node in both the C++ style and C style.

Additionally, you will need to implement the dump_node utility function which outputs the contents of each node start from n until nullptr is reached:

// Utility function
void dump_node(Node *n);

This may be useful for debugging.

STL Sort

The first sorting mode is to use the std::sort method in the stl.cpp file. Because our struct List does not implement random access iterators, we will need to copy the struct Nodes in the struct List into another container such as an array. Once we have this copy, we can then call the std::sort function on this copy with the appropriate comparison function. Finally, we must update the links between the struct Nodes to reflect the sorted order.

Hint: You should store struct Node * in your copy array.

Hint: Make sure you set the head of the struct List after you have set the links of all the struct Nodes.

QSort

The second sorting mode is similar to the first, except you will use qsort instead of std::sort in the qsort.cpp file. In fact, the code could be nearly identical to stl_sort except for the line calling the appropriate sorting function.

Merge Sort

The third sorting mode is a custom implementation of merge sort in the merge.cpp file. You are to implement a top-down version of the algorithm:

// Public functions
void merge_sort(List &l, bool numeric);

// Internal functions
Node *msort(Node *head, CompareFunction compare);
void split(Node *head, Node *&left, Node *&right);
Node *merge(Node *left, Node *right, CompareFunction compare);

merge_sort takes a struct List and whether or not the comparison should be numeric and performs the top-down merge sort algorithm. This function serves as a wrapper or helper function for the recursive msort function.
msort is the recursive portion of the algorithm and calls split to divide and calls merge to conquer. It returns the new head of the list.
split is a helper function that splits the singly-linked list in half by using the slow and faster pointer trick (aka [tortoise and hare]).
merge is a helper function that combines both the left and right lists and returns the new head of the list.

Both of the custom merge sort and quick sort implementations must use the following divide-and-conquer strategy:

Node *msort(Node *head, CompareFunction compare) {
    // Handle base case

    // Divide into left and right sublists

    // Conquer left and right sublists

    // Combine left and right sublists
}

Online Code

There are many examples of merge sort on linked lists. For instance, GeeksForGeeks has an article about Merge Sort for Linked Lists. You should resist the temptation to look at such sources.

If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.

Moreover, in this case, the example from GeeksForGeeks is suboptimal as it uses recursion to perform the merge. You will need an iterative version of merge, otherwise, you run the risk of overflowing the stack on large input sizes.

Quick Sort

The fourth sorting mode is a custom implementation of quick sort in the quick.cpp file. You are to implement a simple version of the algorithm that uses the [Lomuto partition scheme] (in this case we choose the first element as the pivot rather than the last):

// Public functions
void quick_sort(List &l, bool numeric);

// Internal functions
Node *qsort(Node *head, CompareFunction compare)
void partition(Node *head, Node *pivot, Node *&left, Node *&right, CompareFunction compare);
Node *concatenate(Node *left, Node *right);

quick_sort takes a struct List and whether or not the comparison should be numeric and performs the quick sort algorithm. This function serves as a wrapper or helper function for the recursive qsort function.
qsort is the recursive portion of the algorithm and calls partition to divide the list and calls concatenate to conquer. It returns the new head of the list.
partition is a helper function that splits the singly-linked list into two left and right lists such that all elements less than the pivot are in the left side and everything else is on the right.
concatenate is a helper function that combines both the left and right lists and returns the new head of the list.

Online Code

Again, there are many examples of quick sort on linked lists. For instance, GeeksForGeeks has an article about QuickSort on Singly Linked Lists. You should resist the temptation to look at such sources.

If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.

Moreover, in this case, the example from GeeksForGeeks is much more complex than the proposed divide-and-conquer strategy shown above.

Questions

When you are finished implementing your lsort sorting utility answer the following questions in the project02/README.md:

What is complexity of each of the sorting modes (ie. STL, QSORT, MERGE, QUICK)? For each method, briefly explain the best, average, and worst case complexities of the sorting mode in terms of Big-O notation.
Benchmark all four different sorting modes on files with the following number of integers:
```
10, 100, 1000, 10000, 100000, 1000000, 10000000
```
Record the elapsed time and memory consumption of each mode and file size in a Markdown table:
```
| Mode    | Size  | Elapsed Time  | Memory    |
|---------|-------|---------------|-----------|
| STL     | 10    | 0             | 0         |
| STL     | 100   | 0             | 0         |
| ...     | ...   | ...           | ...       |
| QSORT   | 10    | 0             | 0         |
| ...     | ...   | ...           | ...       |
| MERGE   | 10    | 0             | 0         |
| ...     | ...   | ...           | ...       |
| QUICK   | 10    | 0             | 0         |
| ...     | ...   | ...           | ...       |
```
Hint: Use the provided measure program to get the elapsed time and memory consumption.

Hint: Use shuf or create a Python script to generate the input files.

Hint: Write a shell script to automate the benchmarking.

Scratch Space and Output

You may wish to create your test files in /tmp as that is faster and larger than your AFS directory.

Likewise, you should redirect the output of lsort to /dev/null for the benchmarking.
After you have performed your benchmark:
- Discuss the relative performance of each sorting method and try to explain the differences.
- What do these results reveal about the relationship between theoretical complexity and actual performance?
- In your opinion, which sorting mode is the best? Justify your conclusion by examining the trade-offs for the chosen mode.

In addition to the questions, please provide a brief summary of each individual group members contributions to the project.

Rubric

Your project will be scored based on the following metrics:

Metric	Points
`lsort -m stl` passes output test	2
`lsort -m qsort` passes output test	2
`lsort -m merge` passes output test	3
`lsort -m quick` passes output test	3
`lsort` passes memory test	2
`lsort` passes time test	2
`README.md` responses	3
Coding Style	1
Individual contributions	2
Total	20

To test your programs, simply run make test. Your programs must correctly process the input correctly, have no memory errors, and run in the allotted time.

Submission

To submit your project, simply push your changes to your Project 02 repository. There is no need to create a Merge Request. Make sure you have enable access to your repository to all of the instructional staff.

Project 02: List Sort

Overview

Reference

GitLab

Node and List

STL Sort

QSort

Merge Sort

Online Code

Quick Sort

Online Code

Questions

Scratch Space and Output

Rubric

Submission