Grid Engine - Batch Queue Updates - Parallel Environment

With the December arrival of more than 3600 additional cores in the new HP AMD Istanbul-based servers, the SGE queueing configuration needed substantial changes to reduce fragmentation.

The first and most pressing change affects the Parallel Environments (PEs):

- The ompi PE will become ompi-4, ompi-8, and ompi-12
- Likewise, the mpich1 PE will become mpich1-4, mpich1-8, and mpich1-12

The new PEs are already in place and users should begin to use them immediately. The ompi and mpich1 PEs will be removed on Feb 1st.

The numeric designation indicates the number of cores per server. ompi-4 will only map to servers with a total of 4 cores (ddcopts, for example); ompi-8 will only map to servers with a total of 8 cores (dqcneh, for example); and ompi-12 will only map to servers with a total of 12 cores (d6copt, for example). This ensures that jobs submitted to a PE will not overlap one another on the same server. It is important to note that the number of cores requested MUST be a multiple of the number of cores per server for the chosen PE: with ompi-8, for example, request 8, 16, 24, and so on.
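
As an illustration of the multiple-of-cores rule, here is a minimal Python sketch that checks a core request against a PE name before a job is submitted. The function names (cores_per_server, check_request, pe_directive) are hypothetical helpers written for this note and are not part of SGE or any CRC tooling; the only SGE syntax assumed is the standard "-pe <pe_name> <slots>" request.

```python
# Hypothetical helpers for illustration; not part of SGE or CRC tooling.
PE_FAMILIES = {"ompi", "mpich1"}
CORE_COUNTS = {"4", "8", "12"}


def cores_per_server(pe_name: str) -> int:
    """Return the per-server core count encoded in a PE name such as 'ompi-8'."""
    family, sep, suffix = pe_name.rpartition("-")
    if not sep or family not in PE_FAMILIES or suffix not in CORE_COUNTS:
        raise ValueError(f"unrecognized PE name: {pe_name!r}")
    return int(suffix)


def check_request(pe_name: str, slots: int) -> int:
    """Return how many whole servers the request fills, or raise ValueError."""
    per_server = cores_per_server(pe_name)
    if slots <= 0 or slots % per_server != 0:
        raise ValueError(
            f"{slots} cores is not a positive multiple of {per_server} for {pe_name}"
        )
    return slots // per_server


def pe_directive(pe_name: str, slots: int) -> str:
    """Build the job-script line that requests the PE, after validating it."""
    check_request(pe_name, slots)
    return f"#$ -pe {pe_name} {slots}"


if __name__ == "__main__":
    print(pe_directive("ompi-8", 16))      # valid: fills two 8-core servers
    try:
        pe_directive("ompi-12", 16)        # invalid: 16 is not a multiple of 12
    except ValueError as err:
        print("rejected:", err)
```

The old PE names (ompi, mpich1) are deliberately rejected by this sketch, since they are scheduled for removal.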

This is the first of several changes. Additional near-term changes will focus on the queues and will add a 30-minute debug queue and a 4-hour short queue alongside the default 'long' queue, which supports run times of up to 30 days.

We have added a new wiki page to document the SGE environment.

http://crcmedia.hpcc.nd.edu/wiki/index.php/CRC_SGE_Environment

Please also note that CRC engineers are available, upon request, to provide CRC resource overview/training segments at your regular research group meetings. Simply send the request to crcsupport@nd.edu and we will work to coordinate suitable schedules.

Regards,
The CRC Staff