Homework 07: Dynamic Memory

The goal of this homework assignment is to allow you to practice using dynamic memory in C. The first activity involves building a singly-linked list library, while the second activity requires you to use this library to build and benchmark a lsort utility similar to sort.

For this assignment, record your source code and any responses to the following activities in the homework07 folder of your assignments GitLab repository and push your work by 11:59 PM Friday, March 31, 2017.

Activity 1: Singly-Linked List Library (12 Points)

As discussed in class, arrays in C are fast for random access, but have of the limitation of being fixed sized. When we need a data structure that can grow organically at run-time, we can turn to the singly-linked list.

For this first activity, you are to create singly-linked list library, liblist.a, which contains functions for creating a list, deleting a list, adding elements to a list, dumping a list, converting a list to an array, sorting a list, and reversing a list.

Singly-Linked List

As explained in your data structures class, a singly-linked list is a data structure which consists of linked nodes. That is, it is a sequence container of node structures where each node consists of a value and a reference to the next node in the sequence as shown below:

For our library, we will be using a singly-linked list with a tail pointer and a stored size:

As with a normal singly-linked list, we will have a series of individual node structures which consist of a value and a reference to the next node.

In the diagram above, we have three individual node structures holding the values 5, 7, and 4. The 5 node has a reference to the 7 node, while the 7 node has a reference to the 4 node. Because there is nothing after the 4 node, the 4 node's next node reference is simply the NULL pointer.

This is useful because it allows us to iterate through our sequence using the following pseudo-code:
```
for (curr = head; curr != NULL; curr = curr.next)
    visit(curr)
```
To keep track of our nodes, we have a list structure that tracks the head of the list, the tail of the list, and the size of the list.

In the diagram above, head references the first node structure, which is the 5 node. Similarly, tail references the last node structure, which is the 4 node. Finally, size has the value of 3 since the sequence container has 3 elements.
Because our singly-linked list has both a head and tail pointer, we can insert new node structures to either the front or back of the sequence container in constant time (ie. O(1)).
Because we maintain the size of the list in the list structure, we do not need to iterate through the sequence container to compute this value and thus can report the number of elements in constant time.

For this first activity, you are to implement this version of a singly-linked list.

Starter Code

To help you get started, the instructor has provided you with the following starter code:

# Go to assignments repository
$ cd path/to/assignments/repository

# Download starter code tarball
$ curl -O http://www3.nd.edu/~pbui/teaching/cse.20289.sp17/static/tar/homework07.tar.gz

# Extract starter code tarball
$ tar xzvf homework07.tar.gz

Once downloaded and extracted, you should see the following files in your homework07 directory:

homework07
    \_ Makefile        # This is the Makefile for building all the project artifacts
    \_ README.md       # This is the README file for recording your responses
    \_ list.c          # This is the C99 implementation file for the list structure
    \_ list.h          # This is the C99 header file for the list structure
    \_ lsort.c         # This is the C99 implementation file for the list sorting utility
    \_ node.c          # This is the C99 implementation file for the node structure
    \_ node.h          # This is the C99 header file for the node structure
    \_ test_list.c     # This is the C99 implementation file for the list structure test program
    \_ test_lsort.sh   # This is the shell script for testing the list sorting utility
    \_ test_node.c     # This is the C99 implementation file for the node structure test program

The details on what you need to implement are described in the following sections.

Task 0: `node.h` and `list.h`

The node.h file is the header file for the node structure used in our implementation of a singly-linked list. Besides a few function prototypes, this header defines a struct node as:

struct node {
    char  *string;
    int    number;
    struct node *next;
};

This means that each node consists of a string value , its corresponding integer value, and a reference to the next node.

It also contains a type definition node_compare that defines a type for comparison functions on struct nodes:

typedef int   (*node_compare)(const void *a, const void *b);

This will allow us to define functions which accept as node comparison functions such as node_compare_string and node_compare_number as arguments.

The list.h file is the header file for the list structure used in our implementation of a singly-linked list. Besides a few function prototypes, this header defines a struct list as:

struct list {
    struct node *head;
    struct node *tail;
    size_t       size;
};

This reflects the design describe above. Each struct list has a **reference to the head of the list (ie. first element), the tail of the list (ie. last element), and the size of the list (ie. number of elements).

For this task, you do not need to modify these header files. Instead, you should review them and ensure you understand the provided code.

Task 1: `Makefile`

Once again, the Makefile contains all the rules or recipes for building the project artifacts (e.g. liblist.a, lsort, etc.). Although the provided Makefile contains most of the variable definitions and test recipes, you must add the appropriate rules for liblist.a, test_node, test_list, lsort, and any intermediate objects. The dependencies for these targets are shown in the DAG below:

Note: Although we are producing a static library in the form of liblist.a, we will not be statically linking our executables. Instead, we will use the liblist.a as another object file when we link our executables.

Makefile Variables

You must use the CC, CFLAGS, LD, LDFLAGS, AR, and ARFLAGS variables when appropriate in your rules.

Once you have a working Makefile, you should be able to run the following commands:

# Build all TARGETS
$ make
Compiling lsort.o...
Compiling node.o...
Compiling list.o...
list.c:151:15: warning: 'merge' defined but not used [-Wunused-function]
struct node * merge(struct node *left, struct node *right, node_compare f) {
                  ^~~~~
list.c:138:7: warning: 'split' defined but not used [-Wunused-function]
void  split(struct node *head, struct node **left, struct node **right) {
          ^~~~~
list.c:125:15: warning: 'msort' defined but not used [-Wunused-function]
struct node * msort(struct node *head, node_compare f) {
                  ^~~~~
list.c:102:14: warning: 'reverse' defined but not used [-Wunused-function]
struct node* reverse(struct node *curr, struct node *prev) {
             ^~~~~~~
Linking liblist.a...
Linking lsort...

# Run all tests
$ make test
Compiling test_node.o...
Linking test_node...
Testing node...
Segmentation fault
...

Depending on your compiler, you may see some warnings with the initial starter code. Likewise, the test programs will all fail in some fashion.

Note: To display messages rather than showing the full command you are executing for each recipe, you can use the following approach:

target:   source
        @echo "Action $@..."
        @command $@ $<

Prefixing a command in the action portion of a recipe with @ will suppress displaying the command (only the output will be displayed). If you wish to actually see the commands, you can temporarily display them by running make with the -n flag:

# Simulate build with tracing output
$ make -n
echo Compiling lsort.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -c -o lsort.o lsort.c
echo Compiling node.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -c -o node.o node.c
echo Compiling list.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -c -o list.o list.c
echo Linking liblist.a...
ar rcs liblist.a node.o list.o
echo Linking lsort...
gcc -L. -o lsort lsort.o liblist.a

Note: the -n has make simulate a build and display the actions it would take. It does not actually perform the actions.

Task 2: `node.c`

The node.c file contains the C99 implementation for the node structure. For this task, you will need to implement the following functions:

struct node * node_create(char *string, struct node *next)

This function allocates a new struct node, sets its string, number, and next fields, and returns the allocated structure.

Notes:
- You must check if memory allocation fails.
- You must allocate and copy the string (consider strdup).
- The number value comes from converting the string to an integer (consider use either atoi or strtol).
struct node * node_delete(struct node *n, bool recursive)

This function deallocates the given struct node (including the string field). If recursive is true this function will also deallocate the next struct node and any of its successors.

Notes: Althought the recursive parameter implies you should use recursion, you may wish to consider using an iterative implementation instead to avoid stack overflow.
void node_dump(struct node *n, FILE *stream)

This function prints the struct node information to the stream in the following format: "Node{string, number, next}".
int node_compare_number(const void *a, const void *b)

This function is used by sorting functions such as qsort to compare two struct nodes by their number fields.

Note: Although the parameters are const void *, they are really struct node**. Since C lacks templates, it has to use void * to represent any type. Sorting implementations such as qsort will pass in &element to the comparison function, meaning each parameter is really a struct node** since each element is originally a struce node*.

Hint: You can cast the void * parameters to an appropriate type.
int node_compare_string(const void *a, const void *b)

This function is used by sorting functions such as qsort to compare two struct nodes by their string fields.

As you implement node.c, you can test it by running the test-node target:

# Run test-node target
$ make test-node
Compiling test_node.o...
Compiling node.o...
Compiling list.o...
Linking liblist.a...
Linking test_node...
Testing node...

# View log
$ cat test_node.log

# Run test_node manually
$ ./test_node

These tests run on the principle of silence is golden, which means you will only get error messages when something fails (it will not say success or ok).

Feel free to use valgrind to gdb to track down any memory errors.

Task 3: `list.c`

The list.c file contains the C99 implementation for the list structure. For this task, you will need to implement the following functions:

struct list * list_create()

This function allocates a new struct list, initializes its fields, and returns the allocate structure.
struct list * list_delete(struct list *l)

This function deallocates the given struct list.
void list_push_front(struct list *l, char *s)

This function adds a new struct node with the given s string value to the front of the struct list.

Notes: Be sure to update the head and tail of the struct list appropriately. Also, you should use the node_create function.
void list_push_back(struct list *l, char *s)

This function adds a new struct node with the given s string value to the back of the struct list.

Notes: Be sure to update the head and tail of the struct list appropriately. Also, you should use the node_create function.
void list_dump(struct list *l, FILE *stream)

This function iterates through the struct list and calls node_dump on each element.
struct node ** list_to_array(struct list *l)

This function converts the struct list into an array of struct node pointers.

Notes: You should not create new nodes; rather, you should simply have the pointers in the array reference the existing nodes in the struct list.
void list_qsort(struct list *l, node_compare f)

This function uses the qsort function to sort the struct list.

Hints:
- You should use the list_to_array function to create an array.
- Remember to update the next fields of each element after sorting.
- Remember to update the head and tail fields of the struct list.
void list_reverse(struct list *l)

This function reverses the struct list recursively using the internal reverse function.

Hints:
- Draw a picture on a piece of paper and do this by hand first.
- reverse takes in the current node and what its previous node is. Because we are reversing the list, we want to end up saying the the next field of the current node is the previous node.
- Remember to update the head and tail fields of the struct list.
void list_msort(struct list *l, node_compare f)

This function performs merge sort, which is a recursive divide and conquer sorting algorithm, on the struct list using the internal msort, split, and merge functions.

Hints:
- Draw a picture on a piece of paper and do this by hand first.
- The msort function has the following pseudo-code:
```
# Divide
split(head, &left, &right)

# Conquer
left  = msort(left, f)
right = msort(right, f)

# Combine
head = merge(left, right, f);
```
- The split function divides the provided list into left and right sublists by using the fast and slow pointer trick. Be sure to NULL terminate your sublists.
- The merge function combines two sublists into a single ordered list.
- Remember to update the head and tail fields of the struct list.
Online Code

There are many examples of merge sort on linked lists. For instance, GeeksForGeeks has an article about Merge Sort for Linked Lists. You should resist the temptation to look at such sources.

If you do look at those examples, remember that any code you submit must be your own and that you should understand everything you submit.

Moreover, in this case, the example from GeeksForGeeks is suboptimal as it uses recursion to perform the merge. You will need an iterative version of merge, otherwise, you run the risk of overflowing the stack on large input sizes.

As you implement list.c, you can test it by running the test-list target:

# Run test-list target
$ make test-list
Compiling test_list.o...
Linking test_list...
Testing list...

# View log
$ cat test_list.log

# Run test_list manually
$ ./test_list
Push front...
Node{superman, 0, 0x13e4500}
Node{wolverine, 0, 0x13e44c0}
Node{batman, 0, 0x13e4480}
Node{deadpool, 0, 0x13e4440}
Node{spider-man, 0, 0x0}
Push back...
Node{spider-man, 0, 0x13e4500}
Node{deadpool, 0, 0x13e44c0}
Node{batman, 0, 0x13e4480}
Node{wolverine, 0, 0x13e4440}
Node{superman, 0, 0x0}
QSort...
Node{batman, 0, 0x13e4500}
Node{deadpool, 0, 0x13e4540}
Node{spider-man, 0, 0x13e4440}
Node{superman, 0, 0x13e4480}
Node{wolverine, 0, 0x0}
Reversing...
Node{wolverine, 0, 0x13e4440}
Node{superman, 0, 0x13e4540}
Node{spider-man, 0, 0x13e4500}
Node{deadpool, 0, 0x13e44c0}
Node{batman, 0, 0x0}
MSort...
Node{batman, 0, 0x13e4500}
Node{deadpool, 0, 0x13e4540}
Node{spider-man, 0, 0x13e4440}
Node{superman, 0, 0x13e4480}
Node{wolverine, 0, 0x0}

Task 4: `README.md`

In your README.md, respond to the following prompts:

Describe what memory you allocated in node_create and what memory you deallocated in node_delete. How did you handle the recursive flag?
Describe what memory you allocated in list_create and what memory you deallocated in list_delete. How did you handle internal struct nodes?
Describe how your list_qsort function works and analyze its time complexity and space complexity in terms of Big-O notation (both average and worst case).
Describe how your list_reverse function works and analyze its time complexity and space complexity in terms of Big-O notation (both average and worst case).
Describe how your list_msort function works and analyze its time complexity and space complexity in terms of Big-O notation (both average and worst case).

Activity 2: List Sort (3 Points)

Once you have a working list library, you are to complete the lsort.c program:

# Display usage
$ ./lsort -h
Usage: ./lsort
  -n   Numerical sort
  -q   Quick sort

# Perform string sort using merge sort
$ shuf -i1-10 -n 10 | ./lsort
1
10
2
3
4
5
6
7
8
9

# Perform numeric sort using merge sort
$ shuf -i1-10 -n 10 | ./lsort -n
1
2
3
4
5
6
7
8
9
10

# Perform string sort using quick sort
$ shuf -i1-10 -n 10 | ./lsort -q
1
10
2
3
4
5
6
7
8
9

# Perform numeric sort using quick sort
$ shuf -i1-10 -n 10 | ./lsort -n -q
1
2
3
4
5
6
7
8
9
10

The lsort program reads in data from standard input one line at a time, builds a singly-linked list, and then uses either qsort or your custom merge sort to order the data. The user may specify either a string or numerical based ordering based on the the -n flag.

Task 1: `lsort.c`

The lsort.c file is contains the C99 implementation of the list sorting tool described above.

In addition to implementing command line parsing in the main function and you will need to implement the lsort function:

void lsort(FILE *stream, bool numeric, bool quicksort)

This function reads one line at a time from the stream and builds a singly-linked list. It then sorts the list using list_qsort if quicksort is true, otherwise it uses the list_msort. To compare elements, it uses node_compare_number if numeric is true, otherwise it uses node_compare_string. Once the list is sorted, this function will print the string field of each element in the list.

Task 2: `test_lsort.sh`

As you implement lsort.c, you can test it by running the test-list target:

# Run test-lsort target
$ make test-lsort
make test-lsort
Compiling lsort.o...
Compiling node.o...
Compiling list.o...
Linking liblist.a...
Linking lsort...
Testing lsort...
Testing lsort output (strings & msort)...
Testing lsort output (numbers & msort)...
Testing lsort output (strings & qsort)...
Testing lsort output (numbers & qsort)...
Testing lsort memory (strings & msort)...
Testing lsort memory (numbers & msort)...
Testing lsort memory (strings & qsort)...
Testing lsort memory (numbers & qsort)...

# View log
$ cat test_lsort.log

# Run test_lsort.sh manually
$ ./test_lsort.sh
Testing lsort output (strings & msort)...
Testing lsort output (numbers & msort)...
Testing lsort output (strings & qsort)...
Testing lsort output (numbers & qsort)...
Testing lsort memory (strings & msort)...
Testing lsort memory (numbers & msort)...
Testing lsort memory (strings & qsort)...
Testing lsort memory (numbers & qsort)...

Task 3: `benchmark.{sh,py}`

Once you have a working lsort list sorting utility, you are to write a benchmark script in either shell or Python that measures the amount of time and space it takes to numerically sort the following number of items:

1, 10, 100, 1000, 10000, 100000, 1000000, 10000000

To help you measure the time and space a program uses, you can utilize the measure.c program written by David Chiang. Once you have downloaded and compiled this program, you can use it in the following manner:

$ shuf -i1-100000000 -n 10000000 | ./measure ./lsort -n > /dev/null
11.706000 seconds       611.625000 Mbytes

This means that the lsort utility performed a numerical merge sort on 10000000 numbers in 11.7 seconds and used 611.6 megabytes of memory.

Your script, either benchmarch.sh or benchmark.py, should automate the timing and measurement of these experiments to produce a markdown table looks like the following:

| NITEMS   | SORT     | TIME      | SPACE     |
| -------- | -------- |-----------|-----------|
| 1        | Merge    | 0.0       | 1.0       |
| 1        | Quick    | ...       | ...       |
| ...      | ...      | ...       | ...       |
| 10000000 | Merge    | 11.7      | 611.6     |
| 10000000 | Quick    | ...       | ...       |

Task 4: `README.md`

In your README.md, include markdown table your benchmark script generated and then answer the following questions:

Based on your experiments, what is the overall trend in terms of time and space as the number of items increases for both of your sorting methods?

Are the results surprising to you? Explain why or why not.
Given what you said about the space complexity and time complexity of list_msort and list_qsort in Activity 1 and the experimental results in Activity 2, what can you say is the relationship between theoretical complexity and real world performance? Explain.

Guru Point: Hacker Rank (1 Point)

A great way to practice your data structures knowledge (say for preparing for a technical interview) is to work on some programming challenges. Hacker Rank is a website that allows users to practice all sorts of different types of programming concepts including data structures and algorithms.

For extra credit, you are to signup for Hacker Rank and solve at least three Linked Lists Challenges.

Afterwards, simply show the instructor or TA the Hacker Rank dashboard showing you successfully solved at least three challenges to receive credit.

Submission

To submit your assignment, please commit your work to the homework07 folder in your assignments GitLab repository. Your homework07 folder should only contain at the following files:

Makefile
README.md
benchmark.sh or benchmark.py
list.c
list.h
lsort.c
node.c
node.h
test_list.c
test_lsort.sh
test_node.c

Homework 07: Dynamic Memory

Activity 1: Singly-Linked List Library (12 Points)

Singly-Linked List

Starter Code

Task 0: node.h and list.h

Task 1: Makefile

Makefile Variables

Task 2: node.c

Task 3: list.c

Online Code

Task 4: README.md

Activity 2: List Sort (3 Points)

Task 1: lsort.c

Task 2: test_lsort.sh

Task 3: benchmark.{sh,py}

Task 4: README.md

Guru Point: Hacker Rank (1 Point)

Submission

Task 0: `node.h` and `list.h`

Task 1: `Makefile`

Task 2: `node.c`

Task 3: `list.c`

Task 4: `README.md`

Task 1: `lsort.c`

Task 2: `test_lsort.sh`

Task 3: `benchmark.{sh,py}`

Task 4: `README.md`