Scalable Solutions for DNA Sequence Analysis

February 17, 2010

University of Notre Dame
Department of Computer Science & Engineering
Presents: Michael Schatz: University of Maryland

Thursday, February 18, 2010 2:30 p.m. 115-B Galvin

Scalable Solutions for DNA Sequence Analysis

We are at the dawn of a new era in computational biology. DNA sequencing projects that required years of effort and hundreds of millions of dollars of equipment just a few years ago can now be performed quickly and cheaply by individual labs. This dramatic shift is expanding the scale and scope of sequencing to previously unimaginable limits, and will ultimately lead to new discoveries about our basic biology, the diversity of life, and personalized medicine. However, these ambitious goals can only be realized if we can develop new computational methods that can effectively analyze the overwhelming volumes of data generated.

In my presentation, I’ll describe my research developing efficient methods for analyzing large biological datasets, including by using highly parallel commodity graphics processing units produced by nVidia, and the parallel computing framework MapReduce developed by Google. My programs MUMmerGPU, CloudBurst, Crossbow, and Contrail demonstrate how these technologies can be applied to the critical tasks of large-scale alignment and assembly, enabling genotyping and de novo assembly of whole genomes from billions of short reads. Coupled with inexpensive cloud computing, these programs can quickly, cheaply, and accurately analyze tremendous biological datasets and have the potential to make otherwise infeasible studies practical.

Bio: Michael Schatz is a Ph.D. candidate in the Computer Science department at the University of Maryland, and holds positions at both the UMD Center for Bioinformatics and Computational Biology, and the UMD School of Medicine Institute for Genome Sciences.

Prior to starting his Ph.D., he worked for three years at The Institute for Genomic Research (TIGR), contributing to the assembly and analysis of the genomes of several significant organisms. His research interests include high performance computing and parallel algorithm design for problems in computational biology and genomics. He received his M.S. in Computer Science from the University of Maryland in 2008, and his B.S. in Computer Science from Carnegie Mellon University in 2000.

More information about Michael’s research and publications is available at
http://www.cbcb.umd.edu/~mschatz.
