DISC - Data Intensive Science Cluster
The DISC is a shared computing facility managed by the Cooperative Computing Lab and the Center for Research Computing at the University of Notre Dame. The facility provides unique capabilities for rapidly exploring, processing, and visualizing multi-terabyte datasets, in support of research groups in biology and bioinformatics, biometrics and computer vision, molecular dynamics, systems biology, and computer systems research.
The following interfaces are currently available for using the DISC:
The Biocompute web portal provides access to genomics data and processing tools stored on the DISC.
The BXGrid web portal provides access to biometrics research data stored on the DISC.
The Condor batch system provides access to the computing cycles available on the cluster.
The Hadoop data processing system provides the ability to run Map-Reduce jobs on the cluster.
The Chirp distributed filesystem presents the cluster as one big 180TB storage device visible at disc01.crc.nd.edu:9090.
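As a rough illustration of reaching the Chirp endpoint listed above, the sketch below builds command lines for the `chirp_put` and `chirp_get` tools from the CCTools distribution. It assumes those tools are installed and take their arguments in the order shown; check the installed manual pages before relying on it. The file names and remote paths are hypothetical.

```python
import shlex

# Server address taken from the interface list above.
CHIRP_SERVER = "disc01.crc.nd.edu:9090"


def chirp_put_command(local_path, remote_path, server=CHIRP_SERVER):
    """Build the argument list for copying a local file to the Chirp server.

    Assumed argument order: chirp_put <local-file> <host:port> <remote-file>.
    """
    return ["chirp_put", local_path, server, remote_path]


def chirp_get_command(remote_path, local_path, server=CHIRP_SERVER):
    """Build the argument list for copying a file from the Chirp server.

    Assumed argument order: chirp_get <host:port> <remote-file> <local-file>.
    """
    return ["chirp_get", server, remote_path, local_path]


if __name__ == "__main__":
    # Hypothetical file names, printed as shell-ready command lines.
    print(shlex.join(chirp_put_command("reads.fastq", "/myproject/reads.fastq")))
    print(shlex.join(chirp_get_command("/myproject/results.txt", "results.txt")))
```

The functions only construct the commands; passing a returned list to `subprocess.run` would perform the transfer.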
The DISC cluster was acquired via a Notre Dame Equipment Replacement and Renewal grant in early 2011.
The five parties to the grant have first priority on the cluster's resources, in approximately equal proportion:
Computer Vision Research Lab (Patrick Flynn (CSE) and Kevin Bowyer (CSE))
Bioinformatics and Biology (Scott Emrich (CSE), Jeanne Romero-Severson (BIOS), Frank Collins (BIOS), Nora Besansky (BIOS), Patricia Clark (Chem/Biochem), Michael Pfrender (BIOS))
Laboratory for Computational and Life Sciences (Jesus Izaguirre (CSE) and Chris Sweet (CRC))
Cyberinfrastructure Lab (Greg Madey (CSE))
The Cooperative Computing Lab (Douglas Thain (CSE))
Other parties on campus are welcome to make use of the DISC by submitting
Condor jobs or by accessing data in Hadoop. However, such use will have
lower priority and may be interrupted if needed to service the primary parties.
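As a rough illustration of what submitting a Condor job involves, the sketch below generates a minimal submit description in Python. The executable name, job name, and output file pattern are hypothetical, and the DISC may require additional settings that this page does not list. Because lower-priority jobs may be interrupted, it is safest to submit work that can be restarted from the beginning.

```python
def make_submit_description(executable, arguments="", job_name="job"):
    """Return the text of a minimal Condor submit description.

    Only widely used submit commands appear here; site-specific
    requirements, if any, are not covered.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # $(Cluster) and $(Process) are expanded by Condor, not by Python.
        f"output = {job_name}.$(Cluster).$(Process).out",
        f"error = {job_name}.$(Cluster).$(Process).err",
        f"log = {job_name}.log",
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical script and input file names.
    print(make_submit_description("./analyze.sh", "input.dat", job_name="analyze"))
```

Saving the returned text to a file such as `analyze.submit` and running `condor_submit analyze.submit` would queue the job.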
Users should note that the cluster is intended primarily for the analysis and processing of large data sets.
While data in active use may stay resident on the cluster for some time, the cluster is not a backup system,
nor is it guaranteed to be highly reliable. Valuable data should be backed up elsewhere, and cold data should be stored elsewhere.
The DISC contains 26 nodes, each consisting of:
12 x 2TB SATA disks
2 x 8-core Intel Xeon E5620 CPUs @ 2.40GHz
The disks on each node are operated individually, and are currently configured as follows:
|Disk || Use || Mount Point
|Disk 1 || Operating System || /
|Disk 2 || Condor || /var/condor
|Disk 3 || Chirp - General || /data/chirp
|Disk 4 || Chirp - Biocompute || /data/chirp/biocompute
|Disk 5 || Chirp - Biometrics || /data/chirp/bxgrid
|Disk 6 || Hadoop || /data/hadoop/volume1
|Disk 7 || Hadoop || /data/hadoop/volume2
|Disk 8 || Hadoop || /data/hadoop/volume3
|Disk 9 || Hadoop || /data/hadoop/volume4
|Disk 10 || Unassigned || /data/scratch1
|Disk 11 || Unassigned || /data/scratch2
|Disk 12 || Unassigned || /data/scratch3
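The layout above implies a raw capacity for each use, which the sketch below works out in Python. It assumes that all 26 nodes share this layout and that every disk is 2TB, as stated above. The figures are raw totals and ignore filesystem overhead and any replication, so they may differ from the usable space each service reports.

```python
NODES = 26    # number of nodes, from the text above
DISK_TB = 2   # capacity of each disk in TB, from the text above

# Disk number -> (use, mount point), copied from the table above.
DISK_LAYOUT = {
    1: ("Operating System", "/"),
    2: ("Condor", "/var/condor"),
    3: ("Chirp - General", "/data/chirp"),
    4: ("Chirp - Biocompute", "/data/chirp/biocompute"),
    5: ("Chirp - Biometrics", "/data/chirp/bxgrid"),
    6: ("Hadoop", "/data/hadoop/volume1"),
    7: ("Hadoop", "/data/hadoop/volume2"),
    8: ("Hadoop", "/data/hadoop/volume3"),
    9: ("Hadoop", "/data/hadoop/volume4"),
    10: ("Unassigned", "/data/scratch1"),
    11: ("Unassigned", "/data/scratch2"),
    12: ("Unassigned", "/data/scratch3"),
}


def raw_capacity_by_use(layout=DISK_LAYOUT, nodes=NODES, disk_tb=DISK_TB):
    """Sum raw cluster-wide capacity in TB for each use in the layout."""
    totals = {}
    for use, _mount_point in layout.values():
        totals[use] = totals.get(use, 0) + disk_tb * nodes
    return totals


if __name__ == "__main__":
    for use, terabytes in raw_capacity_by_use().items():
        print(f"{use}: {terabytes} TB raw")
```

Under these assumptions, the four Hadoop disks per node come to 4 x 2TB x 26 = 208TB raw, and all twelve disks together come to 624TB raw.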
Both AFS and CRC /pscratch are mounted on all nodes of the cluster, to facilitate data transfer between systems.