Overview

The third project is to build your own implementation of malloc and free. That is, you will need to implement a library that interacts with the operating system to perform heap management on behalf of a user process as demonstrated in class.

As demonstrated in malloc0.c from Lecture 14, writing a functional heap management system is relatively straightforward:

Simply grow the heap using the sbrk system call whenever the user requests memory at run-time.

Unfortunately, this initial version of a heap manager has some serious problems:

  1. It doesn't actually free anything, so it is quite possible (and likely) that we would eventually run out of memory.

  2. It doesn't keep track of blocks that have been allocated but are no longer in-use (ie. free), so it will never re-use a previous allocation. This is both wasteful (as there is memory that we could use, but don't), and it is slow because we are then forced to call sbrk, which is a system call and thus incurs the overhead of context switching.

To remedy this, we wrote malloc1.c, which utilizes a circular doubly-linked free list to keep track of only the free blocks as shown below:

The malloc1.c shown in class has the following features:

  1. It maintains a free list of all the free blocks (previously allocated blocks that are no longer in use).

  2. malloc will search this free list for a free block it can re-use. If no such block exists, it will simply fall-back to growing the heap with sbrk.

  3. free will insert released blocks to the free list to make them available for future re-use.

While this second version is better than the first heap manager, it is still lacking important features. For this project, you are to extend the functionality of the malloc1.c allocator to incorporate the following:

  1. Ensure that all memory allocations are aligned to the nearest word size.

  2. Implement multiple search algorithms (First Fit, Best Fit, and Worst Fit).

    To support different strategies in the same source file, we will use C's #if directive to select different blocks of code at compile time.

  3. Implement shrinking the heap when a block is released.

  4. Implement splitting a block when it is re-used.

  5. Implement merging a block when it is inserted into the free list.

  6. Ensure that blocks in the free list are stored in sorted (ie. ascending) order.

  7. Implement the complementary calloc and realloc functions.

  8. Add counters to keep track of number of operations performed by the allocator and the amount of internal fragmentation and external fragmentation. When the process exits, the library should output the following type of information (based on the counters):

    blocks:      0
    free blocks: 0
    mallocs:     10
    frees:       10
    callocs:     0
    reallocs:    0
    reuses:      0
    grows:       10
    shrinks:     10
    splits:      0
    merges:      0
    requested:   10240
    heap size:   0
    internal:    0.00
    external:    0.00
    

Once you have a working malloc implementation (as verified by the provided functional and unit tests), you are to analyze the metrics above for the different heap management strategies.

Note: We will not consider thread-safety or support multiple threads at all for this project.

Working in groups of one or two people, you are to create a library that implements malloc and free by noon on Monday, November 18, 2024. More details about this project and your deliverables are described below.

Heap Management

Implementing your own heap management library is more common than you think. For instance, Firefox uses jemalloc, while GNU uses dlmalloc, and Google provides tcmalloc. As hardware and operating systems evolve, the optimal heap management strategies change as well, and so developers continue to advance the state of the art. For this project, we will focus on understanding the basics and getting something that works (good enough).

For a reference on how to create a simple heap management library, I recommend: A quick tutorial on implementing and debugging malloc, free, calloc, and realloc. This is what I initially used as the basis for this project (many moons ago).

Deliverables

As noted above, you are to work individually or in pairs to implement your heap management library. You must use C (not C++) as the implementation language. Any test scripts or auxillary tools can be written in any reasonable scripting language.

Timeline

Here is a timeline of events related to this project:

Date Event
Sunday, October 27 Project description and repository are available.
Sunday, November 10 Brainstorming should be completed.
Monday, November 18 Library is due (pushed to GitHub project03 branch).

Group Project

For the final submission, please open a Pull Request on your repository (for your project03 branch to the master branch) and assign it to the instructor.

Repository

To start this project, you must create a Project 03 repository on GitHub using the following template:

https://classroom.github.com/a/tSiXMZj7

Note: Do your work in a separate git branch that can be later merged into the master branch (ie. make a project03 branch, DO NOT COMMIT OR MERGE TO MASTER).

Documentation

The Project 03 repository includes a README.md file with the following sections:

  1. Student: This should be the names and email addresses of each group member.

  2. Video: This should have the link to your reflection video (described below).

  3. Brainstorming: This contains questions that should help you design and plan your approach to the project. You do not need to fill these out, but they are meant to be helpful.

  4. Errata: This should contain a description of any known errors or deficiencies in your implementation.

  5. Acknowledgments: This should list anyone you collaborated with or received help from (including TAs, students, and AI tools).

You must complete this document report as part of your project.

Source Code

As you can see, the base Project 03 repository contains a README.md file and the following folder hierarchy:

project03
    \_  Makefile        # This is the project Makefile
    \_  bin             # This contains the binary executables and test scripts
    \_  include
        \_  malloc      # This contains the malloc library header files
    \_  lib             # This contains the malloc library when built
    \_  src             # This contains the malloc library source code
    \_  tests           # This contains the functional and unit test code

You must maintain this folder structure for your project and place files in their appropriate place.

Compiling

To help you get started, we have provided a Makefile with all the necessary targets:

$ make                  # Build libraries and test applications
Building lib/libmalloc-ff.so
Building lib/libmalloc-bf.so
Building lib/libmalloc-wf.so
Building bin/test_00
Building bin/test_01
Building bin/test_02
Building bin/test_03
Building bin/test_04
Building bin/test_05
Building bin/unit_block
Building bin/unit_freelist

$ make clean            # Removes all targets
Removing libraries
Removing tests

K.I.S.S.

While the exact organization of the project code is up to you, keep in mind that you will be graded in part on coding style, cleaniness, and organization. This means your code should be consistently formatted, not contain any dead code, have reasonable comments, and appropriate naming among other things:

Please refer to these Coding Style slides for some tips and guidelines on coding style expectations.

Running

Once you have the library, you can use it to override the existing malloc functions from the standard library by using the LD_PRELOAD trick:

# Use our implementation of library
$ env LD_PRELOAD=lib/libmalloc-ff.so cat README.md

This will load our library first and thus link any calls to malloc and free in cat to our library rather than the standard C library. To make this more convenient, you should consider creating a bin/run.sh script that sets the environment variable for you and then runs the specified command.

Note: there are three versions of our custom malloc implementation, where each version corresponds to a different search algorithm (first fit (ff), best fit (bf), and worst fit (wf)).

Implementation

All of the C header files are in the include/malloc folder, while the C source code for the malloc library is in the src folder. To help you get started, parts of the project are already implemented:

[x] include/malloc/block.h      # Block structure header (implemented)
[x] include/malloc/counters.h   # Counters header (implemented)
[x] include/malloc/freelist.h   # Free List header (implemented)
[~] src/block.c                 # Block structure definitions (partially implemented)
[~] src/counters.c              # Counters definitions (partially implemented)
[~] src/freelist.c              # Free List definitions (partially implemented)
[~] src/posix.c                 # POSIX definitions (partially implemented)

Basically, the malloc1.c code demonstrated in Lecture 14 was taken and split into separate files to form the skeleton provided to you. Additionally, we have implemented most of the desired counting functionality for you, so you can concentrate on just the block manipulation and free list management operations. Each of the functions in the incomplete files above have comments that describe what needs to be done.

You will need to examine these source files and complete the implementation of a malloc library that uses a circular doubly-linked free list to track free blocks in sorted order (ie. smaller block addresses appear towards the front of the free list). To do so, you should start with the basic Block functions, then move to the free list functions. Once those pass the provided unit tests, you should then utilize those components to complete the malloc, free, calloc, and realloc functions:

  1. src/block.c: This file contains the functions for the Block structure, which contains the aligned capacity of the requested allocation and the unaligned size of the requested allocation (neither of which includes space for the embedded Block structure), and pointers to the previous and next Block structures. If the Block is detached, then it should just point to itself.

    You will need to think carefully about how to manipulate the Block structure in the context of circular doubly-linked list with a sentinel node.

  2. src/counters.c: This file contains the functions for keeping track of the various counters utilized in the library. Whenever malloc is called, it will call the init_counters function which will register the dump_counters function to run when the program properly terminates.

    You will need to implement the functions for computing the internal fragmentation and external fragmentation of the heap.

  3. src/freelist.c: This file contains the functions for the free list.

    You will need to think carefully about what needs to happen during searching and insertion. In particular, you will need to think about how to ensure the free list remains in sorted order and then how to exploit that ordering to merge adjacent blocks.

  4. src/posix.c: This file contains the POSIX functions: malloc, free, calloc, realloc.

    You will need to think carefully about which of the previous functions you want to utilize.

Note: There are also some TODOs to help identify which parts you need to implement, along with how you will want to modify the counters.

Alignment

For better performance, all memory allocations should be aligned by the word size of the current machine. For instance, if the user requests 14 bytes and our word size is 8 bytes, then we will actually allocate 16 bytes as this is the nearest multiple of the word size (that has sufficient capacity).

A ALIGN macro is provided to you that will perform this rounding automatically for you. Whenever you are comparing or determining sizes, you need to think carefully if you want to use the original size as requested by the user and stored in the Block structure (e.g. 14) or if you really want to use the capacity of the Block (e.g. block->capacity) which stores the original aligned allocation capacity.

Failure to properly utilize this ALIGN macro will most likely lead to subtle and frustrating errors.

Testing

As you develop your library, you should utilize the provided functional and unit tests that ensure your heap management library works correctly:

$ make test             # Build and runs test programs
Building bin/test_00
Building bin/test_01
Building bin/test_02
Building bin/test_03
Building bin/test_04
Building bin/test_05
Building bin/unit_block
Building bin/unit_freelist

Testing unit_block...
  block_allocate                         ... Success
  block_release                          ... Success
  block_detach                           ... Success
  block_merge                            ... Success
  block_split                            ... Success

Testing unit_freelist...
  free_list_search_ff                    ... Success
  free_list_search_bf                    ... Success
  free_list_search_wf                    ... Success
  free_list_insert                       ... Success
  free_list_length                       ... Success

Building lib/libmalloc-ff.so
Building lib/libmalloc-bf.so
Building lib/libmalloc-wf.so
Running run_test_00.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Running run_test_01.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Running run_test_02.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Running run_test_03.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Running run_test_04.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Running run_test_05.sh
  Timing libmalloc-ff.so                 ... 0m1.636s
  Timing libmalloc-bf.so                 ... 0m2.959s
  Timing libmalloc-wf.so                 ... 0m2.976s

Running run_test_06.sh
  Testing libmalloc-ff.so (cat)          ... Success
  Testing libmalloc-bf.so (cat)          ... Success
  Testing libmalloc-wf.so (cat)          ... Success
  Testing libmalloc-ff.so (md5sum)       ... Success
  Testing libmalloc-bf.so (md5sum)       ... Success
  Testing libmalloc-wf.so (md5sum)       ... Success
  Testing libmalloc-ff.so (sha1sum)      ... Success
  Testing libmalloc-bf.so (sha1sum)      ... Success
  Testing libmalloc-wf.so (sha1sum)      ... Success
  Testing libmalloc-ff.so (sort)         ... Success
  Testing libmalloc-bf.so (sort)         ... Success
  Testing libmalloc-wf.so (sort)         ... Success
  Testing libmalloc-ff.so (dd)           ... Success
  Testing libmalloc-bf.so (dd)           ... Success
  Testing libmalloc-wf.so (dd)           ... Success
  Testing libmalloc-ff.so (du)           ... Success
  Testing libmalloc-bf.so (du)           ... Success
  Testing libmalloc-wf.so (du)           ... Success
  Testing libmalloc-ff.so (find)         ... Success
  Testing libmalloc-bf.so (find)         ... Success
  Testing libmalloc-wf.so (find)         ... Success

Note, you can run tests individually:

$ ./bin/run_test_00.sh
  Testing libmalloc-ff.so                ... Success
  Testing libmalloc-bf.so                ... Success
  Testing libmalloc-wf.so                ... Success

Debugging

While running the tests, you may encounter some segfaults and other memory errors. Because we are side-loading our own malloc implementation, you will need to do the following to debug a test application using gdb.

$ gdb ./bin/test_01
...
(gdb) set exec-wrapper env LD_PRELOAD=./lib/libmalloc-ff.so
(gdb) run
...
(gdb) bt

Basically, you want to first load gdb with the test program that is crashing. Next, you need to tell gdb to utilize the appropriate malloc library by creating an exec-wrapper that loads it into memory. Next, simply run the program. Once it segfaults, you can print a stack trace by using the bt command. From there, you can explore the state of your program and see if you can determine what went wrong.

Reflection

To reflect on the project and the concepts involved, you will need to create a group video demonstration of your software artifact and complete an individual quiz (each member will submit their own answers to their own private assignments repository).

Video Demonstration

As part of your grade, your group will need to create a video that demonstrates and discusses the following:

  1. Your library passing the automated tests.

  2. Compares and contrast the three heap management strategies by discussing:

    a. Which heap management strategy uses the least amount of space?

    b. Which heap management strategy was the fastest?

    c. Which heap management strategy exhibited the least fragmentation?

  3. Any errata, quirks, or unresolved issues in your project.

  4. What you learned by doing this project.

The video should include narration to describe what is being presented and should cover the requirements enumerated above. It should be no longer than 5 minutes.

Please upload the video to either YouTube or Google Drive and provide a link to it in your README.md.

Individual Quiz

Once you have completed the project, answer the following Project 03 Quiz questions individually in your own personal assignments repository on GitHub:

The results of the quiz should look like the following:

Checking project03 quiz ...
      Q1 0.50
      Q2 0.50
      Q3 2.00
   Score 3.00 / 3.00
  Status Success

Individual Quiz

Each group member should do the quiz on their own and record their answers in the project03 folder in their assignments GitHub repository.

Once you have committed your work and pushed it to GitHub, remember to create a pull request and assign it to the appropriate teaching assistant from the Reading 09 TA List.

Grading

Your project will be graded on the following metrics:

Metric Points
Source Code
  1. General
    • Builds and cleans without warnings or errors
    • Manages resources such as memory and files appropriately
    • Is consistent, readable, and organized
    • Has regular commits and submitted on-time

  2. Block
    • Implements block_release properly
    • Implements block_detach properly
    • Implements block_merge properly
    • Implements block_split properly

  3. Counters
    • Implements internal_fragmentation properly
    • Implements external_fragmentation properly

  4. Free List
    • Implements free_list_search_ff properly
    • Implements free_list_search_bf properly
    • Implements free_list_search_wf properly
    • Implements free_list_insert properly
    • Implements free_list_length properly

  5. POSIX
    • Implements malloc properly
    • Implements free properly
    • Implements calloc properly
    • Implements realloc properly
10.00
  1. 1.00
    • 0.25
    • 0.25
    • 0.25
    • 0.25

  2. 2.5
    • 0.25
    • 0.25
    • 1.00
    • 1.00

  3. 0.50
    • 0.25
    • 0.25

  4. 3.0
    • 0.50
    • 0.50
    • 0.50
    • 1.00
    • 0.50

  5. 3.0
    • 1.0
    • 0.5
    • 0.5
    • 1.0
Reflection
  1. Group Video Demonstration
    • Exhibits reasonable audio and video quality
    • Demonstrates library passing automated tests
    • Compares and contrasts heap management strategies
    • Discusses errata, quirks, or unresolved issues
    • Discusses what the group learned from the project

  2. Individual Quiz
5.00
  1. 2.0
    • 0.25
    • 0.75
    • 0.50
    • 0.25
    • 0.25

  2. 3.0

Commit History

To encourage you to work on the project regularly (rather than leaving it until the last second) and to practice performing incremental development, part of your grade will be based on whether or not you have regular and reasonably sized commits in your git log.

That is, you will lose a point if you commit everything at once near the project deadline.