Your goal is to produce an annotated bibliography on a medium-size topic. An annotated bibliography is a list of publications accompanied by a short explanation of the value and purpose of each item. A good bibliography should serve as a guidepost for further research. It should identify what sort of research topics have been covered in the past, indicate the relationship between related publications, and suggest ideas for further research.
If you have no idea what you want to study, then begin by reading up on a very broad topic such as:
The exact form of the citation is not crucial so long as you are consistent and complete. A citation should give the authors' names, the title of the article, the title of the book/journal/conference, and enough information so that someone else could find it in another library. This means including the publisher, volume and number, page numbers, web address, and other details as appropriate. Each entry should explicitly indicate the type of citation: conference article, journal article, and so forth.
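If you want a concrete format to imitate, a BibTeX record is one common way to capture all of these fields. The entry below is purely illustrative (it uses the Olson paper cited later in this handout; the citation key and `note` field are choices, not requirements), and any other consistent, complete style is equally acceptable:

```bibtex
% Hypothetical example entry -- any consistent, complete format is fine.
@inproceedings{olson93inversion,
  author    = {Michael A. Olson},
  title     = {The Design and Implementation of the Inversion File System},
  booktitle = {Proceedings of the USENIX Winter 1993 Technical Conference},
  year      = {1993},
  note      = {Conference article},
}
```

Here the `note` field records the citation type, satisfying the requirement that each entry explicitly state whether it is a conference article, journal article, technical report, and so forth.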
The descriptive paragraph should give enough information to help you or another reader recall its relevance to the scientific community. Describe what the paper is trying to communicate. Is it proposing a new algorithm or architecture, comparing several existing systems, relating experience with an existing system, or something else entirely? What is the main idea expressed in the abstract? How does it relate to previous work? Does it build upon or discredit previous ideas? Either way, chase down several references within the paper, and add them to your bibliography if warranted.
All told, your annotated bibliography should have:
Technical Reports: A technical report is a very preliminary research report that is written internally and then archived at an institution. For example, Notre Dame has its own technical report series. Technical reports are generally written and deposited without being refereed, or sometimes even proofread. However, they are an important vehicle that allows researchers to publicly establish their activities or data without the delay of submitting to a conference or journal. Good technical reports are often revised and submitted to a conference or journal.
Conference Articles: A conference is usually a yearly gathering of researchers in the same area of specialization. A conference committee solicits papers for the conference perhaps six months in advance. Papers are refereed by the committee, and those with the best reviews are accepted to the conference. The authors attend the conference and give a short lecture on the paper. After the conference, a book is published, usually called "Proceedings of the Conference on XYZ," containing the papers presented. The best papers in a conference are often invited to be published in a journal.
Journal Articles: Journals are typically published several times a year. Much like a conference, a journal has a primary editor and a committee of reviewers. Papers may be submitted to a journal at any time, but are generally longer and more polished than those submitted to a conference. If the paper is accepted, the referees may require the paper to be revised before publication. This whole process from submission to publication may take several years. In computer science, a journal paper is considered to be slightly more valuable than a conference paper. (In other fields, a journal paper is much more valuable than a conference paper.)
Books and Book Chapters: Researchers often write books once they have gained a large amount of experience in a given field. Sometimes, an academic book will have each chapter written by a different author. Books are solely the work of the author(s) and are generally not peer reviewed. Thus, books can serve as an introduction to or overview of a given field, but are not likely to contain any hard research results.
Dissertations: A dissertation is the final result of a master's or doctoral degree. In some sense, it is peer-reviewed because it must pass the muster of the student's reviewing committee. Dissertations are usually a deposit of everything a student has learned in the last 2-7 years, and thus are long and quite detailed. A good dissertation should point you to other, more digestible papers written by the same student.
Start in a Known Place. Begin by skimming the papers on the class reading list related to your topic, and then follow the references that seem important. Likewise, skim appropriate sections in the recommended textbooks.
Be wary. Journal and conference articles vary widely. The vast majority are mediocre, and only a small number are of great value. Distinguishing between the two may be difficult at first -- that's ok! -- but you will gain confidence with this in time. If you are unsure about the value of a paper, there is no harm in mentioning this in the bibliography.
Search Effectively. Google Scholar (but not ordinary Google) is an easy place to start searching for papers, if you already know the right kinds of keywords to search for. For example, searching for "distributed file systems" in Google Scholar turns up a good selection of highly cited papers. However, you should not rely entirely upon Google, but you should also search in the archives of the professional organizations related to computer science and engineering: ACM, IEEE, and USENIX. Here are their library pages:
Here is where regular Google comes in. Do a search for the entire title with quotes around it: "Network Partitioning of Data Parallel Programs" and you may find a copy placed online by the authors or other readers. Or, you may find nothing.
Of course, before the web was invented, we all spent time in the library. Make a trip down to the first floor to get comfortable with an old friend. Conference proceedings are stored under the name of the conference, so find the ND library web site and search for "high performance distributed computing" in title keywords. The call number of the conference proceedings is QA 76.9 .D5 I593. Find the book and photocopy your article. Of course, while you are there, browse through other issues of the same conference to look for related work.
If you have difficulty finding what you are looking for, please see the instructor for some tips.
(Technical Report) Butler Lampson and Howard Sturgis, "Crash Recovery in a Distributed Data Storage System", Technical Report, Xerox Palo Alto Research Center, 1979.
A complete transaction system is built from the ground up in four layers of
abstraction, emphasizing the compositional nature of software. Everything is
proven using simple, exhaustive case analysis. I didn't totally understand the
difference between errors and disasters, so I'll have to go back and read that again.
Although it's just a technical report, there are many references to it, so it must
be a classic.
(Conference Article) Michael A. Olson, "The Design and Implementation of the
Inversion File System", Proceedings of the USENIX Winter 1993 Technical Conference.
A simple idea is proposed: Build a filesystem on top of a database by using tables for
metadata and directory structure. Not surprisingly, there is a significant performance
hit: only 30-80 percent of NFS throughput. On the other hand, you get vastly increased
flexibility, including the possibility of using the server itself for computing. It seems
like there should be a more efficient way of getting transactions into files. This paper
relies heavily on Margo Seltzer's work below.
(Journal Article) M. Stonebraker, et al. "Mariposa: A Wide-Area Distributed Database System", VLDB Journal 5:1 January 1996, pages 48-63.
This paper proposes that databases distributed over the WAN are fundamentally different
from databases distributed over the LAN because of the independence of individual
nodes and the expense of moving data over the wide area. Although this is nominally
about databases, I think it will apply to filesystems as well, because the same
distinction between LAN and WAN is necessary. There is a long section on bidding that
will require some careful reading. Stonebraker appears in many database papers.
(Book) Jim Gray and Andreas Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann, San Francisco, 1993.
This book is an algorithmic bible for building transaction-based systems.
Starting with the basics of storage devices, it builds up algorithms for logging,
transactions, recovery and more. Two surprising elements: One, there is a big section on
fault tolerance and the underlying sources of failures; Two, although the focus is
on databases, there is an entire section on filesystems. Note that Jim Gray
was a major figure in the System R database and the Tandem NonStop system.
(Dissertation) Margo Seltzer, "File System Performance and Transaction Support", Ph.D. Dissertation, University of California at Berkeley, 1992.
This dissertation explores adding transactions to file systems in excruciating detail.
The first few chapters focus on simulation of varying system structures and workloads.
Once a structure is chosen, a transaction-based filesystem is built and evaluated.
I'll have to return to this to see exactly what designs were considered or discarded.