Current Research Projects in the CCL

New Programming Models for Large Scale Distributed Computing

Large scientific applications need to harness hundreds to thousands of cores from clusters, clouds, and grids in order to attack problems of interest. At this scale, traditional programming languages are ineffective and unproductive. To address this problem, we are designing and implementing a variety of "little languages" that are easy to learn but are capable of operating at extreme scale. Examples of these little languages include Makeflow, Work Queue, All-Pairs, and Weaver. Our tools have been used to construct extraordinary applications in fields such as bioinformatics, molecular dynamics, and biometrics.

Long Term Preservation, Analysis and Evolution of Scientific Data

Many fields of science and engineering are awash in enormous amounts of data collected from physical instruments and computational simulations. For the typical researcher, it is a significant challenge to reproduce a result that was generated even a year ago, or to correlate new discoveries with previously collected data. We work closely with researchers in fields such as biometrics and molecular dynamics to design systems that preserve data for the long term while simultaneously enabling sophisticated analysis, reproduction, and review. Update: We recently joined a team (DASPOS) charged with developing a long-term data and software preservation strategy for the worldwide LHC high energy physics collaboration.

Collaborative Online Systems for Scientific Discovery

Modern science is inherently collaborative. Many results are obtained not by a single researcher toiling alone, but by many researchers working closely together, sharing results, and improving on each other's work. To facilitate this sort of discovery, we construct collaborative online systems that enable researchers to quickly access common tools, execute them on distributed systems, and share the results with each other. Examples of these systems include the Biocompute web portal for bioinformatics, the BXGrid data analysis system for biometrics, the CondorLog workflow analysis portal, and our work on open sourcing the design of civil infrastructure.

Completed Research Projects

  • CAREER: Data Intensive Grid Computing on Active Storage Clusters
  • HECURA: Data Intensive Abstractions for High End Biometric Applications
  • Filesystems for Grid Computing
  • Sub Identities: Practical Containment for Distributed Systems
  • Debugging Grids with Machine Learning Techniques
  • TeamTrak: A Testbed for Cooperative Mobile Computing

Our work has been generously supported by the U.S. National Science Foundation, the Department of Energy Office of Science, and the Department of Defense University Research Instrumentation Program.