Overview

The second project is to build a sorting utility called lsort in C++ that internally operates on linked lists. This program must implement the following sorting methods or modes:

  1. STL: This sorting method simply uses the std::sort function.

  2. qsort: This sorting method simply uses the qsort function.

  3. merge: This sorting method uses a custom implementation of the merge sort algorithm.

  4. quick: This sorting method uses a custom implementation of the quick sort algorithm.

This project is to be done in groups of 1 - 3 and is due midnight Friday September 30, 2016.

Reference

A reference implementation of the lsort sorting utility can be found at /afs/nd.edu/user15/pbui/pub/bin/lsort. Here is its usage message:

$ lsort -h
usage: lsort
    -m MODE   Sorting mode (stl, qsort, merge, quick)
    -n        Perform numerical ordering

As can be seen, lsort takes two optional arguments. The first is the -m flag, which lets the user select which sorting method to use. The second is the -n flag, which informs the program to treat the input values as numeric integers rather than textual strings.

As with most Unix filters, lsort reads from [standand input] and then outputs to standard output:

# Use the shuf program to generate 10 random numbers and then use lsort to order them
$ shuf -i 1-100 -n 10 -r | lsort -m quick
12
32
37
37
50
61
69
71
90
97

GitLab

To start this project, you must form a group of 1 - 3 people. One group member should fork the Project 02 repository on GitLab:

https://gitlab.com/nd-cse-30331-fa16/project02

Once this repository has been forked, follow the instructions from Reading 00 to:

  1. Make the repository private

  2. Configure access to the repository

    Make sure you add all the members of the team in addition to the instructional staff.

As with the programming challenges in this course, the project repository contains a Makefile, input, and output.{number,string} test files for automated building and testing of your program.

In addition to these files, the repository should also contain the following:

  1. measure.cpp: This is utility for measuring execution time and memory usage of an application.

  2. lsort.h: This is the project header file that contains the definitions for the Node and List structures, a few type definitions, and some function prototypes.

  3. main.cpp: This contains the main function of the the lsort utility and is in charge of parsing the command line, reading the input, dispatching the appropriate sorting method, and outputting the results.

  4. node.cpp: This contains comparison functions that will be used by the sorting methods to determine the order of two different Nodes.

  5. list.cpp: This contains the implementation of a simple [singly-linked list] that will contain the data read in from standard input.

  6. stl.cpp: This contains the implementation of the stl sorting mode which uses std::sort internally.

  7. qsort.cpp: This contains the implementation of the qsort sorting mode which uses qsort internally.

  8. merge.cpp: This contains a custom implementation of merge sort.

  9. quick.cpp: This contains a custom implementation of quick sort.

Node and List

The lsort.h header file contains the definition of the struct Node and struct List:

struct Node {
  std::string string;
  int         number;
  Node       *next;
};

struct List {
  Node       *head;
  size_t      size;

  List();
  ~List();

  void push_front(const std::string &s);
};

As can be seen, the struct List is a singly-linked list with a constructor, destructor, and a single push_front method. This struct List consists of struct Node entries that contain both string values, the corresponding number value, and a pointer to the next struct Node.

In list.cpp, you are to implement the struct List constructor, destructor, and push_front method. The push_front method should insert a new struct Node into the front of the list. To get the number value for the new struct Node, you should use the std::stoi function (if this conversion fails, you can default to 0 for the number value).

In node.cpp, you are to implement the struct Node comparison functions:

// C++ Style comparison function
bool node_number_compare(const Node *a, const Node *b);
bool node_string_compare(const Node *a, const Node *b);

// C Style comparison function
int  void_number_compare(const void *a, const void *b);
int  void_string_compare(const void *a, const void *b);

There are four comparision functions in total: one for comparing the number field and the other for comparing the string field of the struct Node in both the C++ style and C style.

Additionally, you will need to implement the dump_node utility function which outputs the contents of each node start from n until nullptr is reached:

// Utility function
void dump_node(Node *n);

This may be useful for debugging.

STL Sort

The first sorting mode is to use the std::sort method in the stl.cpp file. Because our struct List does not implement random access iterators, we will need to copy the struct Nodes in the struct List into another container such as an array. Once we have this copy, we can then call the std::sort function on this copy with the appropriate comparison function. Finally, we must update the links between the struct Nodes to reflect the sorted order.

Hint: You should store struct Node * in your copy array.

Hint: Make sure you set the head of the struct List after you have set the links of all the struct Nodes.

QSort

The second sorting mode is similar to the first, except you will use qsort instead of std::sort in the qsort.cpp file. In fact, the code could be nearly identical to stl_sort except for the line calling the appropriate sorting function.

Merge Sort

The third sorting mode is a custom implementation of merge sort in the merge.cpp file. You are to implement a top-down version of the algorithm:

// Public functions
void merge_sort(List &l, bool numeric);

// Internal functions
Node *msort(Node *head, CompareFunction compare);
void split(Node *head, Node *&left, Node *&right);
Node *merge(Node *left, Node *right, CompareFunction compare);

Both of the custom merge sort and quick sort implementations must use the following divide-and-conquer strategy:

Node *msort(Node *head, CompareFunction compare) {
    // Handle base case

    // Divide into left and right sublists

    // Conquer left and right sublists

    // Combine left and right sublists
}

Online Code

There are many examples of merge sort on linked lists. For instance, GeeksForGeeks has an article about Merge Sort for Linked Lists. You should resist the temptation to look at such sources.

If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.

Moreover, in this case, the example from GeeksForGeeks is suboptimal as it uses recursion to perform the merge. You will need an iterative version of merge, otherwise, you run the risk of overflowing the stack on large input sizes.

Quick Sort

The fourth sorting mode is a custom implementation of quick sort in the quick.cpp file. You are to implement a simple version of the algorithm that uses the [Lomuto partition scheme] (in this case we choose the first element as the pivot rather than the last):

// Public functions
void quick_sort(List &l, bool numeric);

// Internal functions
Node *qsort(Node *head, CompareFunction compare)
void partition(Node *head, Node *pivot, Node *&left, Node *&right, CompareFunction compare);
Node *concatenate(Node *left, Node *right);

Online Code

Again, there are many examples of quick sort on linked lists. For instance, GeeksForGeeks has an article about QuickSort on Singly Linked Lists. You should resist the temptation to look at such sources.

If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.

Moreover, in this case, the example from GeeksForGeeks is much more complex than the proposed divide-and-conquer strategy shown above.

Questions

When you are finished implementing your lsort sorting utility answer the following questions in the project02/README.md:

  1. What is complexity of each of the sorting modes (ie. STL, QSORT, MERGE, QUICK)? For each method, briefly explain the best, average, and worst case complexities of the sorting mode in terms of Big-O notation.

  2. Benchmark all four different sorting modes on files with the following number of integers:

    10, 100, 1000, 10000, 100000, 1000000, 10000000
    

    Record the elapsed time and memory consumption of each mode and file size in a Markdown table:

    | Mode    | Size  | Elapsed Time  | Memory    |
    |---------|-------|---------------|-----------|
    | STL     | 10    | 0             | 0         |
    | STL     | 100   | 0             | 0         |
    | ...     | ...   | ...           | ...       |
    | QSORT   | 10    | 0             | 0         |
    | ...     | ...   | ...           | ...       |
    | MERGE   | 10    | 0             | 0         |
    | ...     | ...   | ...           | ...       |
    | QUICK   | 10    | 0             | 0         |
    | ...     | ...   | ...           | ...       |
    

    Hint: Use the provided measure program to get the elapsed time and memory consumption.

    Hint: Use shuf or create a Python script to generate the input files.

    Hint: Write a shell script to automate the benchmarking.

    Scratch Space and Output

    You may wish to create your test files in /tmp as that is faster and larger than your AFS directory.

    Likewise, you should redirect the output of lsort to /dev/null for the benchmarking.

  3. After you have performed your benchmark:

    • Discuss the relative performance of each sorting method and try to explain the differences.

    • What do these results reveal about the relationship between theoretical complexity and actual performance?

    • In your opinion, which sorting mode is the best? Justify your conclusion by examining the trade-offs for the chosen mode.

In addition to the questions, please provide a brief summary of each individual group members contributions to the project.

Rubric

Your project will be scored based on the following metrics:

Metric Points
lsort -m stl passes output test 2
lsort -m qsort passes output test 2
lsort -m merge passes output test 3
lsort -m quick passes output test 3
lsort passes memory test 2
lsort passes time test 2
README.md responses 3
Coding Style 1
Individual contributions 2
Total 20

To test your programs, simply run make test. Your programs must correctly process the input correctly, have no memory errors, and run in the allotted time.

Submission

To submit your project, simply push your changes to your Project 02 repository. There is no need to create a Merge Request. Make sure you have enable access to your repository to all of the instructional staff.