Your goal is to produce an annotated bibliography on a medium-size topic. An annotated bibliography is a list of publications accompanied by a short explanation of the value and purpose of each item. A good bibliography should serve as a guidepost for further research. It should identify what sort of research topics have been covered in the past, indicate the relationship between related publications, and suggest ideas for further research.
If you have no idea what you want to study, then begin by reading up on a very broad topic such as:
The exact form of the citation is not crucial so long as you are consistent and complete. A citation should give the authors' names, the title of the article, the title of the book/journal/conference, and enough information so that someone else could find it in another library. This means including the publisher, volume and number, page numbers, web address, and other details as appropriate. Each entry should explicitly indicate the type of citation: conference article, journal article, and so forth.
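If you want a concrete format to imitate, a BibTeX record is one common way to capture all of these fields. The entry below is purely illustrative (it uses the Olson paper cited later in this handout; the citation key and `note` field are choices, not requirements), and any other consistent, complete style is equally acceptable:

```bibtex
% Hypothetical example entry -- any consistent, complete format is fine.
@inproceedings{olson93inversion,
  author    = {Michael A. Olson},
  title     = {The Design and Implementation of the Inversion File System},
  booktitle = {Proceedings of the USENIX Winter 1993 Technical Conference},
  year      = {1993},
  note      = {Conference article},
}
```

Here the `note` field records the citation type, satisfying the requirement that each entry explicitly state whether it is a conference article, journal article, technical report, and so forth.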
The descriptive paragraph should give enough information to help you or another reader recall its relevance to the scientific community. Describe what the paper is trying to communicate. Is it proposing a new algorithm or architecture, comparing several existing systems, relating experience with an existing system, or something else entirely? What is the main idea expressed in the abstract? How does it relate to previous work? Does it build upon or discredit previous ideas? Either way, chase down several references within the paper, and add them to your bibliography if warranted.
All told, your annotated bibliography should have:
Technical Reports: A technical report is a very preliminary research report that is written internally and then archived at an institution. For example, Notre Dame has its own technical report series. Technical reports are generally written and deposited without being refereed, or sometimes even proofread. However, they are an important vehicle that allows researchers to publicly establish their activities or data without the delay of submitting to a conference or journal. Good technical reports are often revised and submitted to a conference or journal.
Conference Articles: A conference is usually a yearly gathering of researchers in the same area of specialization. A conference committee solicits papers for the conference perhaps six months in advance. Papers are refereed by the committee, and those with the best reviews are accepted to the conference. The authors attend the conference and give a short lecture on the paper. After the conference, a book is published, usually called "Proceedings of the Conference on XYZ," containing the papers presented. The best papers in a conference are often invited to be published in a journal.
Journal Articles: Journals are typically published several times a year. Much like a conference, a journal has a primary editor and a committee of reviewers. Papers may be submitted to a journal at any time, but are generally longer and more polished than those submitted to a conference. If the paper is accepted, the referees may require the paper to be revised before publication. This whole process from submission to publication may take several years. In computer science, a journal paper is considered to be slightly more valuable than a conference paper. (In other fields, a journal paper is much more valuable than a conference paper.)
Books and Book Chapters: Researchers often write books once they have gained a large amount of experience in a given field. Sometimes, an academic book will have each chapter written by a different author. Books are solely the work of the author(s) and are generally not peer reviewed. Thus, books can serve as an introduction to or overview of a given field, but are not likely to contain any hard research results.
Dissertations: A dissertation is the final result of a master's or doctoral degree. In some sense, it is peer-reviewed because it must pass the muster of the student's reviewing committee. Dissertations are usually a deposit of everything a student has learned in the last 2-7 years, and thus are long and quite detailed. A good dissertation should point you to other, more digestible papers written by the same student.
Start in a Known Place. Begin by skimming the papers on the class reading list related to your topic, and then follow the references that seem important. Likewise, skim appropriate sections in the recommended textbooks.
Be wary. Journal and conference articles vary widely. The vast majority are mediocre, and only a small number are of great value. Distinguishing between the two may be difficult at first -- that's ok! -- but you will gain confidence with this in time. If you are unsure about the value of a paper, there is no harm in mentioning this in the bibliography.
Search Effectively. Google Scholar (but not ordinary Google) is an easy place to start searching for papers, if you already know the right kinds of keywords to search for. For example, searching for "distributed file systems" in Google Scholar turns up a good selection of highly cited papers. However, you should not rely entirely upon Google, but you should also search in the archives of the professional organizations related to computer science and engineering: ACM, IEEE, and USENIX. Here are their library pages:
Here is where regular Google comes in. Do a search for the entire title with quotes around it: "Network Partitioning of Data Parallel Programs" and you may find a copy placed online by the authors or other readers. Or, you may find nothing.
Of course, before the web was invented, we all spent time in the library. Make a trip down to the first floor to get comfortable with an old friend. Conference proceedings are stored under the name of the conference, so find the ND library web site and search for "high performance distributed computing" in title keywords. The call number of the conference proceedings is QA 76.9 .D5 I593. Find the book and photocopy your article. Of course, while you are there, browse through other issues of the same conference to look for related work.
If you have difficulty finding what you are looking for, please see the instructor for some tips.
(Technical Report) Butler Lampson and Howard Sturgis, "Crash Recovery in a Distributed Data Storage System", Technical Report, Xerox Palo Alto Research Center, 1979.
A complete transaction system is built from the ground up in four layers of
abstraction, emphasizing the compositional nature of software. Everything is
proven using simple, exhaustive case analysis. I didn't totally understand the
difference between errors and disasters, so I'll have to go back and read that again.
Although it's just a technical report, there are many references to it, so it must
be a classic.
(Conference Article) Michael A. Olson, "The Design and Implementation of the
Inversion File System", Proceedings of the USENIX Winter 1993 Technical Conference.
A simple idea is proposed: Build a filesystem on top of a database by using tables for
metadata and directory structure. Not surprisingly, there is a significant performance
hit: only 30-80 percent of NFS throughput. On the other hand, you get vastly increased
flexibility, including the possibility of using the server itself for computing. It seems
like there should be a more efficient way of getting transactions into files. This paper
relies heavily on Margo Seltzer's work below.
(Journal Article) M. Stonebraker, et al. "Mariposa: A Wide-Area Distributed Database System", VLDB Journal 5:1 January 1996, pages 48-63.
This paper proposes that databases distributed over the WAN are fundamentally different
from databases distributed over the LAN because of the independence of individual
nodes and the expense of moving data over the wide area. Although this is nominally
about databases, I think it will apply to filesystems as well, because the same
distinction between LAN and WAN is necessary. There is a long section on bidding that
will require some careful reading. Stonebraker appears in many database papers.
(Book) Jim Gray and Andreas Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann, San Francisco, 1993.
This book is an algorithmic bible for building transaction-based systems.
Starting with the basics of storage devices, it builds up algorithms for logging,
transactions, recovery and more. Two surprising elements: One, there is a big section on
fault tolerance and the underlying sources of failures; Two, although the focus is
on databases, there is an entire section on filesystems. Note that Jim Gray
was a major figure in the System R database and the Tandem NonStop system.
(Dissertation) Margo Seltzer, "File System Performance and Transaction Support", Ph.D. Dissertation, University of California at Berkeley, 1992.
This dissertation explores adding transactions to file systems in excruciating detail.
The first few chapters focus on simulation of varying system structures and workloads.
Once a structure is chosen, a transaction-based filesystem is built and evaluated.
I'll have to return to this to see exactly what designs were considered or discarded.