The goal of this homework assignment is to allow you to practice using dynamic memory allocation to implement algorithms and build data structures in C99. In particular, you will be implementing a library and utility that computes the Huffman Coding for a stream of text. This encoding is a mapping from each character in the input stream to a unique bit sequence or prefix code, which unlike say ASCII is variable-length rather than fixed size. The purpose of the Huffman Coding is to utilize a greedy algorithm to produce an encoding that uses less space than the fixed-size ASCII encoding as show below:

In the example above, our input sequence is: `BACABA`, which in ASCII would consist of a total of `48` bits (e.g. `6` bytes):

```010000100100000101000011010000010100001001000001  # ASCII Representation
```

To generate a more compressed version of this text, we would need to construct a Huffman coding by performing the following process:

1. Frequency Table: First, read in the input stream and count the occurrences of each character to create a frequency table.

2. Priority Queue: Next, represent each letter and count pairs as a node and push them into a priority queue.

3. Tree: Afterwards, we construct the Huffman tree by doing the following:

```while q.size() > 1:
l = q.pop()                                           # Pop nodes with smallest counts
r = q.pop()
n = Node(count=l.count + r.count, left=l, right=r)    # Construct new composite node

q.push(n)                                             # Push new node into priority queue

t.root = q.pop()                                          # Last node is root of Huffman tree
```

Basically, as long there is more than one node in the priority queue, we will pop off the two smallest items, create a new node with the combined count of these two nodes, and insert this new node back into the priority queue. We repeat this process until there is only one node left in the priority queue, which we then set as the root of our Huffman tree.

By constantly grouping nodes with the smallest counts, we use a greedy problem solving approach in order to maximize the path from the root to the nodes with the lowest occurrences (and thus minimize the path from the root to the nodes with the highest occurrences.

4. Walk: To actually produce the Huffman Coding, we must walk the Huffman tree we constructed in the previous step and record the path we took to the leaf nodes (ie. our original nodes from step `2`). To ensure a consistent prefix code, we can use the convention where the left child gets a `0` and the right child gets a `1` when we take the path from parent to child. In the example above, using this traversal mechanism yields the following encoding:

Letter Count Encoding
A30
B211
C110

Using this new Huffman Coding, our original text, `BACABA`, can be encoding as follows:

```110100110     # Huffman Encoding
```

This means the Huffman Coding was about to compress down to about `20%` of the original ASCII representation. This is the reason why many real-world data compression schemes such as PKZIP, JPEG, and MP3 use elements of the Huffman Coding technique to achieve smaller file sizes for different types of data.

To keep things manageable, we will only worry about the construction of the Huffman Coding in this assignment and leave out the actual compression and decompressing process (though you are free to implement that on your own!).

For this assignment, record your source code and any responses to the following activities in the `homework07` folder of your assignments GitLab repository and push your work by noon Saturday, April 7.

## Activity 0: Preparation

Before starting this homework assignment, you should first perform a ```git pull``` to retrieve any changes in your remote GitLab repository:

```\$ cd path/to/repository                   # Go to assignments repository

\$ git checkout master                     # Make sure we are in master branch

\$ git pull --rebase                       # Get any remote changes not present locally
```

Next, create a new branch for this assignment:

```\$ git checkout -b homework07              # Create homework07 branch and check it out
```

### Starter Code

To help you get started, the instructor has provided you with some starter code.

```# Download starter code tarball
\$ curl -LO https://www3.nd.edu/~pbui/teaching/cse.20289.sp18/static/tar/homework07.tar.gz

# Extract starter code tarball
\$ tar xzvf homework07.tar.gz

# Add and commit starter code
\$ git commit -m "homework07: starter code"
```

Once downloaded and extracted, you should see the following files in your `homework07` directory:

```homework07
\_ Makefile                   # This is the Makefile for building all the project artifacts
\_ debug.h                    # This is the C99 header file with debugging macros
\_ huff.c                     # This is the C99 implementation file for the Huffman utility
\_ node.c                     # This is the C99 implementation file for the Huffman Node
\_ node.h                     # This is the C99 header file for the Huffman Node
\_ queue.c                    # This is the C99 implementation file for the Priority Queue
\_ queue.h                    # This is the C99 header file for the Priority Queue
\_ test_huff.sh               # This is the Shell test script for the Huffman utility
\_ test_node.c                # This is the C99 test file for the Huffman Node
\_ test_node.sh               # This is the Shell test script for the Huffman Node
\_ test_queue.c               # This is the C99 test file for the Priority Queue
\_ test_queue.sh              # This is the Shell test script for the Priority Queue
\_ test_tree.c                # This is the C99 test file for the Huffman Tree
\_ test_tree.sh               # This is the Shell test script for the Huffman Tree
\_ tree.c                     # This is the C99 implementation file for the Huffman Tree
\_ tree.sh                    # This is the C99 header file for the Huffman Tree
```

The details on what you need to implement are described in the following sections.

## Activity 1: Huffman Library (9.5 Points)

In order to follow the process described above, we will need to build a library that consists of the three following data structure components:

1. Huffman Node: This represents the symbols in the text stream along with their corresponding frequency counts.

2. Priority Queue: This is a binary heap that allows us to efficiently extrac the smallest Huffman Nodes so we can build our Huffman Tree.

3. Huffman Tree: This computes the table of frequencies, makes a priority queue of nodes, and then constructs a binary tree using a greedy approach.

The following tasks will walk you through implementing each of these components as part of the Huffman Library, `libhuff.a`.

### Task 0: `Makefile` (1 Point)

Before you implement any of the data structures, you should first complete the provided `Makefile`. Remeber, the `Makefile` contains all the rules or recipes for building the project artifacts (e.g. `libhuff.a`, `huff`, etc.). Although the provided `Makefile` contains most of the variable definitions and `test` recipes, you must add the appropriate rules for `libhuff.a` `test_node`, `test_queue`, `test_tree`, `huff`, and any intermediate objects. The dependencies for these targets are shown in the DAG below:

Note: Although we are producing a static library in the form of `libhuff.a`, we will not be statically linking our executables. Instead, we will use the `libhuff.a` as another object file when we link our executables.

#### Makefile Variables

You must use the `CC`, `CFLAGS`, `LD`, `LDFLAGS`, `AR`, and `ARFLAGS` variables when appropriate in your rules.

Moreover, since there are a lot of object files, it is recommend that you use a pattern rule for compiling `*.c` files into `*.o` files (rather than explicitly specifying each object rule).

Once you have a working `Makefile`, you should be able to run the following commands:

```\$ make                    # Build all targets
Compiling huff.o...
Compiling node.o...
Compiling queue.o...
Compiling tree.o...

\$ make clean              # Clean targets and intermediate files
Cleaning...

\$ make -n                 # Simulate build with tracing output
echo Compiling huff.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -fPIC -o huff.o -c huff.c
echo Compiling node.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -fPIC -o node.o -c node.c
echo Compiling queue.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -fPIC -o queue.o -c queue.c
echo Compiling tree.o...
gcc -g -gdwarf-2 -Wall -std=gnu99 -fPIC -o tree.o -c tree.c
ar rcs libhuff.a node.o queue.o tree.o
gcc -L. -o huff huff.o libhuff.a

\$ make test               # Run all tests
...
```

Depending on your compiler, you may see some warnings with the initial starter code. Likewise, the test programs will all fail in some fashion.

#### Warnings

You must include the `-Wall` flag in your `CFLAGS` when you compile. This also means that your code must compile without any warnings, otherwise points will be deducted.

### Task 1: `node.c` (1.5 Points)

Once you can build the project artifacts, your first task is to implement the `Node` structure, which is used to represent individual letter and count pairs in text stream. This structure is defined in the `node.h` header, which in addition to declaring function prototypes, also defines the following types:

```#define NODE_NOT_SYMBOL (-1)
```

The `NODE_NOT_SYMBOL` constant is used as the `symbol` for internal Huffman tree nodes (ie. non-leaf nodes), which do not correspond to any particular `symbol`.

```typedef union {
int64_t letter;
char    string[sizeof(int64_t)];
} Symbol;
```

The `Symbol` union stores the `symbol` of the `Node` structure. It can be accessed as either an integer or a string. In general, we use the `letter` field for initializing the `Symbol`, but then use the `string` for printing since the `string` representation handles escaping the `symbol` for printing.

```typedef struct node Node;
struct node {
Symbol  symbol;
size_t  count;
Node   *left;
Node   *right;
};
```

The `Node` struct represents our letter and count pairs and contains the following attributes:

• `symbol` is a union type that consists of either a string or an integer letter.

• `count` is an unsigned integer that stores the number of occurrences of the `symbol` in the text stream.

• `left` and `right` are pointers to the corresponding child `Node` structures.

```typedef void (*NodeFunc)(const Node *n, char *encoding, void *arg);
```

The `NodeFunc` type definition is used for declaring function arguments in functions such as `tree_walk`.

#### Type Definitions

The use of typedef in this header file allow us to simply say `Node *n` rather than `struct Node *n`. While there is some controversary over whether or not this is good practice, we will use these type definitions to make our code clearer and more compact.

Note: In C++, you do not need to typedef in order to do this since a struct is really just a public class.

#### Functions

To manipulate a `Node` struct, the `node.c` file defines the following functions:

```/**
* Create Node structure.
* @param   symbol      Symbol letter.
* @param   count       Initial count.
* @param   left        Left Node child.
* @param   right       Right Node child.
* @return  Newly allocated Node structure with specified attributes.
*/
Node *  node_create(int64_t symbol, size_t count, Node *left, Node *right);
```

The `node_create` function constructs a new `Node` struct and initializes its attributes to the specified `symbol`, `count`, `left` and `right` arguments.

Notes:

• You must check if memory allocation fails and handle those situations appropriately.

• You should use the internal `node_set_symbol` function to record the `symbol` argument.

```/**
* Delete Node structure.
* @param   n           Node structure.
* @return  NULL.
*/
Node *  node_delete(Node *n);
```

The `node_delete` function destructs the specified `Node` struct by recursively deleting its children and then deallocating itself.

```/**
* Dump Node structure.
* @param   n           Node structure.
* @param   stream      I/O to write to.
*/
void    node_dump(const Node *n, FILE *stream);
```

The `node_dump` function prints the `Node` structures attributes to the specified `stream` in the following format:

```Node{symbol: A, count: 0, left: 0x0, right: 0x0}
```

Notes:

• The `symbol` should be the `string` attribute of the `Value` union.

• The `left` and `right` pointers should be displayed in hexadecimal format (prefixed with `0x`).

```/**
* Set symbol attribute of Node structure.
* @param   n           Node structure.
* @param   letter      Node symbol letter value.
*/
void    node_set_symbol(Node *n, int64_t letter);
```

The internal `node_set_symbol` function is used by `node_create` to store the `letter` and then escape the `letter` value into a string (both stored in the `symbol` union).

Notes:

• The following `letters` need to be escaped: `"`, `\`, `\b`, `\f`, `\n`, `\r`, and `\t`.

#### Testing

As you implement `node.c`, you can test it by running the `test-node` target:

```\$ make test-node
Testing Huffman Node...
node_create                              ... Success
node_create    (valgrind)                ... Success
node_delete                              ... Success
node_delete    (valgrind)                ... Success
node_dump                                ... Success
node_dump      (output)                  ... Success
node_dump      (valgrind)                ... Success
Score 1.50
```

#### Debugging Macros

To help you with debugging, the `debug.h` header file includes the `debug` macro that you can use to insert debugging statements into your code. The advantage of this macro over printf is that `debug` displays information such as the file name, line number, and function name, along with the formatted message you specified. To take advantage of it, you simply use it as if it were printf:

```debug("n->symbol = %s", n->symbol.string);
```

Note: `debug` will automatically append a `\n` so you don't need to include that in your message.

### Task 2: `queue.c` (3 Points)

The next component you need to implement for the Huffman Library is the priority queue. In this project, we will use a binary heap with an array representation to implement an efficient priority queue. This `Queue` structure is defined in the `queue.h` header, which in addition to declaring function prototypes, also defines the following types:

```#define PRIORITY_QUEUE_CAPACITY (1<<8)
```

The `PRIORITY_QUEUE_CAPACITY` constant is used whenever the user specifies a `capacity` of `0` to the `queue_create` function.

```#define PARENT(i)       ((i-1)/2)
#define LEFT_CHILD(i)   (2*(i) + 1)
#define RIGHT_CHILD(i)  (2*(i) + 2)
```

These macros are used to access the `PARENT`, `LEFT_CHILD`, and `RIGHT_CHILD` of a `Node` located in an array at index `i`. This assumes that the binary tree is complete.

```typedef struct queue {
Node   **nodes;         /* Array of Node pointers */
size_t   size;          /* Number of Nodes in Queue */
size_t   capacity;      /* Total number of slots allocated */
} Queue;
```

The `Queue` structure stores represents the priority queue of `Node` structures contains of the following attributes:

Because of this design, the priority queue is implemented as a binary heap and grows like a vector in C++ when `size` reaches `capacity`.

#### Functions

To manipulate a `Queue` struct, the `queue.c` file defines the following functions:

```/**
* Create Queue structure.
* @param   capacity        Initial capacity.
* @return  Newly allocated Queue structure with specified capacity.
*/
Queue * queue_create(size_t capacity);
```

The `queue_create` function constructs a `Queue` struct with the specified initial `capacity`.

Note:

• You must check if memory allocation fails and handle those situations appropriately.

• Use `PRIORITY_QUEUE_CAPACITY` if the specified `capacity` is `0`.

• Consider using calloc instead of malloc to get zeroed chunk of memory.

```/**
* Delete Queue structure.
* @param   q           Queue structure.
* @return  NULL.
*/
Queue * queue_delete(Queue *q);
```

The `queue_delete` function destructs the specified `Queue` struct by deallocating the `nodes` array and itself.

```/**
* Push Node into Queue structure.
* @param   q           Queue structure.
* @param   n           Node structure.
* @return  Whether or not the operation was successful.
*/
bool    queue_push(Queue *q, Node *n);
```

The `queue_push` function inserts the specified `Node` struct into the priority queue.

Notes:

• This operation can be accomplished by placing the `Node` at the end of the `nodes` array and then bubbling up the `Node` using the internal `queue_bubble_up` function.

• Be sure to grow the `nodes` array using the internal `queue_resize` function if `size` has reached `capacity`.

```/**
* Pop minimum Node into Queue structure.
* @param   q           Queue structure.
* @return  Minimum Node structure (by count) or NULL.
*/
Node *  queue_pop(Queue *q);
```

The `queue_pop` function removes and returns the `Node` struct with the smallest value in priority queue.

Notes:

• This operation can be accomplished by replacing the root `Node` with the last `Node` in the `nodes` array and then bubbling down the `Node` using the internal `queue_bubble_down` function. Return the original root `Node` as the minimum value.

```/**
* Dump Queue structure.
* @param   q           Queue structure.
* @param   stream      I/O stream to write to.
*/
void    queue_dump(const Queue *q, FILE *stream);
```

The `queue_dump` function prints all the `Node` structures in the priority queue using the `node_dump` function.

```/**
* Resize Queue structure.
* @param   q           Queue structure.
* @param   capacity    New capacity.
* @return  Whether or not operation succeeded.
*/
bool    queue_resize(Queue *q, size_t capacity);
```

The internal `queue_resize` function grows the `nodes` array by the specified `capacity`.

Notes:

• Consider using realloc instead of malloc to avoid having to copy from the old array to the new one.

```/**
* Bubble up Node in Queue structure.
* @param   q           Queue structure.
* @param   i           Index to current Node structure.
*/
void    queue_bubble_up(Queue *q, size_t i);
```

The internal `queue_bubble_up` function continuously swaps the current `Node` with its parent if it is smaller than its parent, in order to maintain the binary heap invariant.

```/**
* Bubble down Node in Queue structure.
* @param   q           Queue structure.
* @param   i           Index to current Node structure.
*/
void    queue_bubble_down(Queue *q, size_t i);
```

The internal `queue_bubble_down` function continuously swaps the current `Node` with its minimum child if it is greater than this child `Node`, in order to maintain the binary heap invariant.

Notes:

• You should use the `queue_minimum_child` function to determine which child has the minimum value (e.g `count`).

```/**
* Determines index of child with minimum value.
* @param   q           Queue structure.
* @param   i           Index to current Node structure.
*/
size_t  queue_minimum_child(Queue *q, size_t i);
```

The internal `queue_minimum_child` function returns the index of the minimum child of the specified `Node`.

#### Testing

As you implement `queue.c`, you can test it by running the `test-queue` target:

```\$ make test-queue
Testing Priority Queue...
queue_create                             ... Success
queue_create   (valgrind)                ... Success
queue_delete                             ... Success
queue_delete   (valgrind)                ... Success
queue_push 0                             ... Success
queue_push 0   (valgrind)                ... Success
queue_push 1                             ... Success
queue_push 1   (valgrind)                ... Success
queue_push 2                             ... Success
queue_push 2   (valgrind)                ... Success
queue_pop                                ... Success
queue_pop      (valgrind)                ... Success
queue_dump                               ... Success
queue_dump     (output)                  ... Success
queue_dump     (valgrind)                ... Success
Score 3.00
```

#### GDB and Valgrind

If your program fails any of the tests, you can use gdb and valgrind to track down any of your bugs. To run a specific test case, simply pass the appropriate number as a command-line argument to the test program:

```\$ ./test_queue                # List all test cases
Usage: ./test_queue NUMBER

Where NUMBER is one of the following:
0. Test queue_create
1. Test queue_delete
2. Test queue_push 0
3. Test queue_push 1
4. Test queue_push 2
5. Test queue_pop
6. Test queue_dump

\$ ./test_queue 0              # Run queue_create test
```

### Task 3: `tree.c` (4 Points)

The last component of the Huffman Library is the binary tree required to construct the Huffman Coding. This `Tree` structure is defined in the `tree.h` header, which in addition to declaring function prototypes, also defines the following types:

```#define TREE_COUNTS_SIZE (1<<8)
```

The `TREE_COUNTS_SIZE` constant sets the size of the `counts` table, which is the total number of `char` values (ie. `256`).

```typedef struct {
Node   *root;
int64_t counts[TREE_COUNTS_SIZE];
} Tree;
```

The `Tree` struct represents the Huffman Tree and contains the following attributes:

• `root` is the root `Node` struct of the binary tree.

• `counts` is an array that represents the frequency counts of each letter in the data stream.

#### Functions

To manipulate a `Tree` struct, the `tree.c` file defines the following functions:

```/**
* Create Tree structure.
* @return  Newly allocated and initialized Tree structure.
*/
Tree *  tree_create();
```

The `tree_create` function constructs a `Tree` struct with an initialized `counts` array.

```/**
* Delete Tree structure.
* @param   t           Tree structure.
* @return  NULL.
*/
Tree *  tree_delete(Tree *t);
```

The `tree_delete` function destructs the specified `Tree` struct by deallocating `root` and the `Tree` itself.

```/**
* Build internal Tree structure.
* @param   t           Tree structure.
* @param   stream      I/O stream to read from.
* @return  Whether or not the operation succeeded.
*/
bool    tree_build(Tree *t, FILE *stream);
```

The `tree_build` function builds the Huffman tree from the specified text `stream`.

Notes:

This can be accomplished by following the process described at the beginning of the document:

1. Use the internal `tree_count` function to construct the frequencies table.

2. Use the internal `tree_queue` function to construct a priority queue of `Node` structs based on the `counts` array.

3. Process the priority queue using the greedy algorithm described above until there is only one `Node` left in the priority queue.

```/**
* Walk Tree structure.
* @param   t           Tree structure.
* @param   f           Node function.
* @param   arg         Ndoe function argument.
* @return  Whether or not the operation succeeded.
*/
void    tree_walk(const Tree *t, NodeFunc f, void *arg);
```

The `tree_walk` function walks the Huffman tree and calls the specified `NodeFunc` on each leaf `Node` by passing the current `encoding` and the specified `arg`.

Notes:

This can be accomplished by using the internal `tree_walk_node` to recursively walk the binary tree.

```/**
* Build internal counts table in Tree structure.
* @param   t           Tree structure.
* @param   stream      I/O stream to read from.
* @return
*/
void    tree_count(Tree *t, FILE *stream);
```

The internal `tree_count` function builds the `counts` frequency table using data from the specified `stream`.

```/**
* Create Priority Queue out of counts table in Tree structure.
* @param   t           Tree structure.
* @return  Priority Queue of Nodes corresponding to counts in Tree.
*/
Queue * tree_queue(Tree *t);
```

The internal `tree_queue` function creates a `Queue` struct and inserts new `Node` structs with values from the `counts` array.

Notes:

• You must check if memory allocation fails and handle those situations appropriately.

```/**
* Recursively walk Tree nodes.
* @param   n           Node structure.
* @param   f           Node function.
* @param   arg         Node function argument.
* @param   encoding    Node encoding.
*/
void    tree_walk_node(const Node *n, NodeFunc f, void *arg, char *encoding);
```

The internal `tree_walk_node` function recursively walks the binary tree, calling the `NodeFunc` with the specified `encoding` and `arg` when it reaches a leaf `Node`.

Notes:

• To build the `encoding` you should concatentate a `0` when you recurse to the left and a `1` when you recurse to the right.

• Be sure to handle the case where there is only one `Node` and an empty `encoding` by emitting a `0`.

#### Testing

As you implement `tree.c`, you can test it by running the `test-tree` target:

```\$ make test-tree
Testing Huffman Tree...
tree_create                              ... Success
tree_create    (valgrind)                ... Success
tree_delete                              ... Success
tree_delete    (valgrind)                ... Success
tree_build 0                             ... Success
tree_build 0   (valgrind)                ... Success
tree_build 1                             ... Success
tree_build 1   (valgrind)                ... Success
tree_walk 0                              ... Success
tree_walk 0    (valgrind)                ... Success
tree_walk 1                              ... Success
tree_walk 1    (valgrind)                ... Success
tree_walk 2                              ... Success
tree_walk 2    (valgrind)                ... Success
tree_walk 3                              ... Success
tree_walk 3    (valgrind)                ... Success
tree_walk 4                              ... Success
tree_walk 4    (valgrind)                ... Success
tree_walk 5                              ... Success
tree_walk 5    (valgrind)                ... Success
tree_walk 6                              ... Success
tree_walk 6    (valgrind)                ... Success
Score 4.00
```

## Activity 2: Huffman Utility (5.5 Points)

After completing the Huffman library, you can then implemet the Huffman utility, `huff`, which has the following command-line arguments:

```\$ ./huff -h
Usage: ./huff FILES...
Flags:

-f FORMAT    Output format (tsv, csv, yaml, json)
```

Here are some examples of `huff` in action:

```\$ ./huff <<< "BACABA"             # <<< Syntax only works on bash
A       0       3
B       10      2
\n      110     1
C       111     1

\$ ./huff -f csv <<< "BACABA"
"A","0",3
"B","10",2
"\n","110",1
"C","111",1

\$ ./huff -f yaml <<< "BACABA"
"A": {"encoding": "0", "count": 3}
"B": {"encoding": "10", "count": 2}
"\n": {"encoding": "110", "count": 1}
"C": {"encoding": "111", "count": 1}

\$ ./huff -f json <<< "BACABA"
{
"A": {"encoding": "0", "count": 3},
"B": {"encoding": "10", "count": 2},
"\n": {"encoding": "110", "count": 1},
"C": {"encoding": "111", "count": 1},
}
```

### Task 1: `huff.c` (2.5 Points)

To implement the Huffman utility, you will need to complete the `provided` huff.c, which is pretty bare bones. It is recommended that you review cat.c from Reading 08 and use it as the basis for your implementation of `huff`.

Notes:

• Consider using an enumeration to create constants you can check for the various output formats.

• You will probably want to define your own `NodeFunc` that prints out the Huffman coding and use it with the `tree_walk` function after you build the Huffman tree.

• KISS

#### Testing

As you implement `huff.c`, you can test it by running the `test-huff` target:

```\$ make test-huff
Testing Huffman utility...
Usage                                    ... Success
Input 0 (TSV)                            ... Success
Input 0 (TSV)  (valgrind)                ... Success
Input 0 (CSV)                            ... Success
Input 0 (CSV)  (valgrind)                ... Success
Input 0 (JSON)                           ... Success
Input 0 (JSON) (valgrind)                ... Success
Input 0 (YAML)                           ... Success
Input 0 (YAML) (valgrind)                ... Success
Input 1 (TSV)                            ... Success
Input 1 (TSV)  (valgrind)                ... Success
Input 1 (CSV)                            ... Success
Input 1 (CSV)  (valgrind)                ... Success
Input 1 (JSON)                           ... Success
Input 1 (JSON) (valgrind)                ... Success
Input 1 (YAML)                           ... Success
Input 1 (YAML) (valgrind)                ... Success
Score 2.50
```

### Task 2: `huff.py` (1.5 Points)

Once you have a working Huffman utility, you are to revisit our friend Python to make a script, `huff.py`, which utilizes the `huff` program to compute the Huffman Coding of a series of text files and then generates a table of the results in Markdown format:

```|        FILE        | OLD SIZE | NEW SIZE |   RATIO   |
|--------------------|----------|----------|-----------|
|     data/1342-0.txt|   5712512|   3238771|     56.70%|
|      data/219-0.txt|   1866312|   1065343|     57.08%|
|       data/84-0.txt|   3579736|   2005500|     56.02%|
|        data/844.txt|   1139056|    672349|     59.03%|
|       data/98-0.txt|   6298288|   3575868|     56.78%|
|       data/data.txt|       328|       127|     38.72%|
|      data/ideas.txt|      8360|      4934|     59.02%|
```

As can be seen, the `huff.py` should emit the name of the file, the old size of the file (in terms of bits), the new size of the file using the Huffman coding (in terms of bits), and then the compression ratio if we were to use the Huffman coding.

You must use this script on at least five different text files (of varying sizes).

Notes:

#### UTF-8 to ASCII

Most of the files from Project Gutenberg are in Unicode format rather than ASCII, which can cause your program to segfault. To avoid these errors, convert the Unicode files into plain ASCII by using the following command:

```\$ iconv -sc -f utf-8 -t iso-8859-1 unicode.txt > new.ascii
```

Replace `unicode.txt` with the name of the Unicode file and `new.ascii` with the name of the new file you want to create.

### Task 3: `README.md` (1.5 Points)

In your `README.md`, include the Markdown table your `huff.py` script generated and then answer the following questions:

1. Describe how the `symbol` is stored in the `Node` struct by diagramming the memory layout of the following `Node` structure:

```Node *n = node_create('\n', 2, NULL, NULL);
```

Be sure to indicate how much memory each `Node` structure occupies and how much is allocated for each attribute in the struct.

1. Explain what the code `assert(n->symbol.letter == '\n')` would result in?

2. Given that `symbol` is a union type, is it ever possible for both the `letter` and `string` of the symbol `\n` to co-exist (that is, the `string` doesn't clobber the `letter`)?

Hint: Consider alternative architectures.

2. To construct the Huffman code, you had to utilize both a priority queue (aka. min-heap) and a Huffman Tree, both of which were binary trees. For the former, you used an array representation while for the later you used a linked node representation. With this in mind, answer the following questions:

1. Would it have been easier to implement the priority queue with a linked node representation? Explain why this would be better or worse than the array representation.

2. Would it have been easier to implement the Huffman Tree with an array representation? Explain why this would be better or worse than the linked node representation.

3. Analyze the results of your experiements with you `huff.py` Python script. What sort of compression ratios were you able to achieve? How does this compare to standard utilities such as `gzip`, `bzip2`, and `xz`?

## Guru Point: ShareLaTeX (1 Point)

So far this semester, you have been using Markdown to format your `README` files and have dabbled a little bit with HTML, which powers the World Wide Web. For academic papers, many scholars and researchers use a language called LaTeX to create crisp documents such as this PhD Dissertation.

For extra credit, you are to use LaTeX to produce a document that contains a few sections, a few paragraphs, at least one diagram, and at least one table. You may choose, for instance, to write-up the responses to the questions in this assigment using LaTeX (just be sure to include the generated PDF and make a note of it in your `README`) or you may create a document for another class (or anything you want).

To produce a PDF from a LaTeX document, you first use your favorite `\$EDITOR` to compose the LaTeX source and then you compile the document using the following commands:

```\$ pdflatex article.tex
\$ pdflatex article.tex        # Run it twice
```

If you'd rather have a more graphical and real-time editing experience, you should consider ShareLaTeX. Notre Dame has a site license for this website, which is a great tool for interactive and collaborative LaTeX document editting. The website also provides great documentation:

You should skim these tutorials and take a look at the sample LaTeX file which will generate this sample PDF.

To get credit, show the final document and the LaTeX source code to either the instructor or one of the TAs.

## Submission

To submit your assignment, please commit your work to the `homework07` folder of your `homework07` branch in your assignments GitLab repository. Your `homework07` folder should only contain the following files:

• `Makefile`
• `README.md`
• `debug.h`
• `huff.c`
• `node.c`
• `node.h`
• `queue.c`
• `queue.h`
• `test_huff.sh`
• `test_node.c`
• `test_node.sh`
• `test_queue.c`
• `test_queue.sh`
• `test_tree.c`
• `test_tree.sh`
• `tree.c`
• `tree.h`

Note: Don't include any object files or executables.

```#--------------------------------------------------
# BE SURE TO DO THE PREPARATION STEPS IN ACTIVITY 0
#--------------------------------------------------

\$ cd homework07                           # Go to Homework 07 directory
...
\$ git add Makefile                        # Mark changes for commit
\$ git add node.c                          # Mark changes for commit
\$ git commit -m "homework07: node"        # Record changes
...
\$ git add Makefile                        # Mark changes for commit
\$ git add queue.c                         # Mark changes for commit
\$ git commit -m "homework07: queue"       # Record changes
...
\$ git add Makefile                        # Mark changes for commit
\$ git add tree.c                          # Mark changes for commit
\$ git commit -m "homework07: tree"        # Record changes
...
\$ git add Makefile                        # Mark changes for commit
\$ git add huff.c                          # Mark changes for commit
\$ git commit -m "homework07: huff"        # Record changes
...
\$ git add huff.py                         # Mark changes for commit
\$ git commit -m "homework07: huff.py"     # Record changes
...