
BIM Compute Cluster

created by admin, last modified: Jul 21, 2010 03:14 PM

The BIM (Bioinformatics Initiative Munich) compute cluster is a high performance Linux cluster for the joint course in bioinformatics of LMU and TUM. The cluster consists of 24 default nodes, 2 extended nodes and 2 server nodes, all based on the AMD Opteron CPU architecture and Ubuntu Linux.

News:

  • 2010-07-23: BIM Cluster planned maintenance on July 23rd
    The whole BIM Cluster will be offline from Friday, July 23rd, 14:00 until further notice due to a major firmware upgrade on the storage systems. Make sure that you do not submit jobs that will still be running when the maintenance starts; all running jobs will have to be cancelled. Please note that none of the exported filesystems will be available during the maintenance. The end of the maintenance will be announced on the mailing list bimcluster@binfo.bio.wzw.tum.de. You may want to subscribe to this list at http://binfo.bio.wzw.tum.de/mailman/listinfo/bimcluster
  • 2010-05-27: The BIM cluster Compute Nodes have been upgraded to Ubuntu 10.04 LTS (same version as in the CIP pools)
  • 2009-10-19: We had a storage disconnect on one of the servers of the cluster. To recover the system, we had to terminate all batch jobs and to repair the filesystems. Please restart any batch jobs that were aborted. We are sorry for the inconvenience caused.
  • 2008-12-23: The BIM cluster has been upgraded to Ubuntu 8.04 LTS (same version as in the CIP pools).
  • 2008-06-04: New host keys had to be generated on all BIM cluster machines due to an openssh security upgrade. Please remove all previous host keys of BIM cluster hosts from your .ssh/known_hosts file and import the new keys during your next login.

Technical overview

[Photo: the complete compute cluster in two racks]
The BIM compute cluster is physically located at the TUM, Wissenschaftszentrum Weihenstephan, Freising. It fills two racks in the server room of the Informationstechnologie Weihenstephan, the local IT service group for the Wissenschaftszentrum Weihenstephan.
Although the hardware of this cluster consists of high-performance server components, the configuration of the cluster is kept as close as possible to the computer pools at the LMU, which are known to all students from practical courses. File shares of the cluster will soon be mounted on the machines of the computer pool Amalienstr. 17.
Some facts about the cluster:

  • 104 CPU cores for computing (221 GFLOPS)
  • 256 GB overall RAM for computing
  • 1000Base-T Gigabit LAN interconnects between the machines and to the intranet of the Munich universities
  • 5 years warranty on all hardware components (01/2007-12/2011)
  • Batch computing controlled by SUN GridEngine available for serial and parallelized jobs (OpenMP, OpenMPI)
  • All file server resources needed for batch computing are locally available, so interruptions of the intranet services do not affect running calculations
Compute nodes
  • 24 Sun Fire X2200 M2 x64 servers
  • 2xAMD Opteron 2218 (2.6 GHz/1MB) dual core CPUs per node
  • 8 GB ECC DDR2-667 RAM per node
  • 2x 250 GB internal SATA hard drives per node (mirrored)
  • hostnames bimsc01..24.cip.ifi.lmu.de
Server nodes and high memory compute nodes
  • 4 Sun Fire X4200 M2 x64 servers
  • 2xAMD Opteron 2218 (2.6 GHz/1MB) dual core CPUs per high memory compute node, 2xAMD Opteron 2216 (2.4 GHz/1MB) dual core CPUs per server node
  • 32 GB ECC DDR2-667 RAM per high memory compute node, 16 GB ECC DDR2-667 RAM per server node
  • 4x 73 GB internal SAS hard drives per node (mirrored)
  • 4GBit/s Fibre channel host adapters in the server nodes
  • hostnames bimsc25..26.cip.ifi.lmu.de for high memory compute nodes, bimscs1..2.cip.ifi.lmu.de for server nodes
Storage and UPS
  • Two redundant 4 GBit/s Fibre channel SAN switches
  • Sun StorageTek 6140 array with 2 GB cache and 4 host ports, 8x 500 GB SATA hard drives (7,200 RPM), providing 4 TB of SATA storage
  • Sun StorageTek 6140 array with 2 GB cache and 4 host ports, 16x 146 GB FC hard drives (10,000 RPM), providing 2 TB of FC storage
  • APC Smart UPS 5000VA/3750W for the protection of storage and server nodes from power failures

 

Access to the cluster

All students and staff of the bioinformatics course who have a user account with the Rechnerbetriebsgruppe of the Institut für Informatik at the LMU can access the cluster via ssh. To get a login, follow this link during the summer semester and this one during the winter semester.

There are two dedicated login nodes for students:

  • bimsc01.cip.ifi.lmu.de
  • bimsc02.cip.ifi.lmu.de
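
A login and a file transfer from the CIP pool could then look like this (a minimal sketch; "username" is a placeholder for your own account name):

# log in to one of the login nodes
ssh username@bimsc01.cip.ifi.lmu.de
# copy files to your home directory via the other login node
scp results.tar.gz username@bimsc02.cip.ifi.lmu.de:~/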

Scientists are permitted to access all nodes (bimsc01..26) interactively, but should also stick to the guidelines below. Students can get interactive access to all nodes upon request, e.g. for their master's or diploma thesis.

Usage of the login nodes

On the login nodes the following activities are permitted:

  • transfer of files
  • compilation, debugging and short-term (up to 15 min.) tests of programs
  • submission of batch jobs

All other activities, especially software development and the use of interactive programs, have to be performed outside the cluster, e.g. on the computers in the CIP pools.

Cluster queues

The cluster has four queues: all.q, short.q, highmem.q and prakt.q. The queues should NOT be accessed directly, but via the corresponding projects: long_proj, short_proj, highmem_proj and prakt_proj. Jobs running in short_proj will be killed if they run longer than 30 minutes. To avoid this, either use the "-l h_rt=" option of the qsub command to specify the required runtime, or explicitly specify the project you want to submit to (e.g. -P long_proj).
The prakt_proj is a special project used only temporarily for practical courses. While a practical course is running, only students participating in that course have access to the nodes assigned to prakt_proj. For the usage of the grid system it does not matter whether a practical course is running or not; no changes are needed on the user's side.
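
For example, both of the following submissions avoid the 30-minute limit of short_proj (a sketch; myjob.sh is a placeholder for your own job script):

# declare the required runtime so the scheduler can place the job in a suitable project
qsub -l h_rt=2:00:00 myjob.sh
# or request the long project explicitly
qsub -P long_proj myjob.sh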

Access to high memory nodes

Batch access to the high memory nodes bimsc25 and bimsc26 is open to every user as described below. Only the projects short_proj and long_proj have access to these hosts. If a job requesting high memory is submitted (i.e. highmem_proj is specified), it will be executed on these hosts as soon as possible. If there is no job requesting high memory, short jobs are allowed to run on these hosts to enable full usage of the nodes.
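
A minimal high memory submission could then look like this (bigmemjob.sh is a placeholder; a complete example script with all recommended options follows further below):

qsub -P highmem_proj bigmemjob.sh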

File system resources

There are four cluster specific mount points that will be also available in the CIP pool Amalienstr.:

  • /home/proj/BIMSC/home (home directories, 805GB FC storage, daily incremental backup)
  • /home/proj/BIMSC/proj (shared directories for projects, 805GB FC storage, daily incremental backup)
  • /home/proj/BIMSC/apps (bioinformatics applications, 1TB SATA storage, daily incremental backup)
  • /home/proj/BIMSC/data (bioinformatics databases, 1.7TB SATA storage, no backup)

The lmu, wzw and cip folders in apps and data are automatically kept up to date by the respective institutions.

Submitting batch jobs

The main purpose of the cluster is to run batch jobs via the GridEngine batch system. Users who are not familiar with this system should read the GridEngine user's guide first.

Temporary files should be removed when a job is finished. Do not keep any data in /tmp, because regular service procedures automatically remove all files in this directory. To ensure unique temporary directories even if several jobs are running on the same machine, the GridEngine system creates a specific, empty temporary directory for every job and provides its full path in the environment variable TMPDIR. This job-specific temporary directory is automatically removed by the GridEngine after the job is finished.
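
Inside a job script, using TMPDIR could look like the following sketch (my_tool and the file names are placeholders):

# copy the input to the job-specific scratch directory, work there, copy the result back
cp query.fa "$TMPDIR/"
my_tool "$TMPDIR/query.fa" > "$TMPDIR/result.out"
cp "$TMPDIR/result.out" "$HOME/results/"
# no cleanup needed: GridEngine removes $TMPDIR when the job finishes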

For optimal performance, batch jobs should not be shorter than 5 minutes; otherwise the overhead of the batch system wastes too much CPU time. Due to regular system maintenance, batch jobs should not run longer than one week. The optimal duration of a batch job is around one hour; if possible, distribute the work across multiple jobs of this runtime.

If you submit large numbers of jobs, decrease their priority using the -p option, e.g. "qsub -p -500 ...".
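
When a large workload has been split into many jobs of roughly one hour, the submission could look like this (a sketch assuming the input was split into files chunk_*.fa and that blastjob.sh takes the chunk file as its first argument):

for f in chunk_*.fa; do
    qsub -p -500 -l cpu=1:0:0 blastjob.sh "$f"
done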

Batch jobs up to 7800MB of RAM and running time < 30 minutes

Short batch jobs that need up to 7800 MB of RAM per job run on all compute nodes.

This is an example job script containing all necessary options for the BIM cluster grid:

# Script submit.sh for running a blast job
# This option sets the short project for the job, which in turn means
# that the job is allowed to run on all hosts, but the runtime may not
# exceed 30 minutes.
#$-P short_proj
#
# This option tells GridEngine to change to the current working directory before
# executing the job (the default is the user's home directory)
#$-cwd
#
# This option declares the amount of memory the job will
# use. Specify this value as accurately as possible. If you
# declare too little memory, your job may be aborted.
# If you declare too much memory, your job might
# wait for a long time until it is started on a machine
# that has a sufficient amount of free memory.
#$-l vf=1500m
#
# This option declares the cpu time the job will
# use. Specify this value as accurately as possible.
# This example job requests a cpu time of 15 minutes.
#$-l cpu=0:15:0
#
# Specify this option only for multithreaded jobs that use
# more than one cpu core. The value 1..4 denotes the number
# of requested cpu cores. Please note that multithreaded jobs
# are always calculated on a single machine - for parallel
# jobs you should use MPI instead.
# Another important hint: memory requests made with -l vf=... are
# multiplied by the number of requested cpu cores. Thus you
# should divide the overall memory consumption of your job by
# the number of parallel threads.
#$-pe serial 4
#
# Please insert your mail address!
#$-M username@host.de

# source the common environment
. /etc/profile

blastall -p blastp -i query.fa -d database.fa
# Script ends here.

 

Batch jobs up to 7800MB of RAM and running time > 30 minutes

Default batch jobs that need up to 7800 MB of RAM per job run only on a dynamically defined set of nodes, so that short jobs can still be executed even if several long jobs are queued.

This is an example job script containing all necessary options for the BIM cluster grid:

# Script submit.sh for running a blast job
# This option sets the long project for the job, which in turn means
# that the job is NOT allowed to run on all hosts, but the runtime is not limited.
# However, please avoid jobs with running times exceeding 2 days, and if submitting
# such _very_ long jobs please submit only a few (< 5) of them.
# This option is the default, i.e. if -P is not specified, -P long_proj is set.
#$-P long_proj
#
# This option tells GridEngine to change to the current working directory before
# executing the job (the default is the user's home directory)
#$-cwd
#
# This option declares the amount of memory the job will
# use. Specify this value as accurately as possible. If you
# declare too little memory, your job may be aborted.
# If you declare too much memory, your job might
# wait for a long time until it is started on a machine
# that has a sufficient amount of free memory.
#$-l vf=1500m
#
# This option declares the cpu time the job will
# use. Specify this value as accurately as possible.
# This example job requests a cpu time of 90 minutes.
#$-l cpu=1:30:0
#
# Please insert your mail address!
#$-M username@host.de

# source the common environment
. /etc/profile

blastall -p blastp -i query.fa -d database.fa
# Script ends here.

Calculations that need more than 7800MB RAM (high_mem jobs)

The high memory nodes bimsc25 and bimsc26 each provide 32 GB of RAM (up to 31800 MB can be used for batch jobs). Due to the limited number of jobs that can run on these two nodes, high memory nodes have to be requested explicitly in the batch scripts. Furthermore, one or both machines can be withdrawn from batch operation if a user has requested them for interactive calculations.

This is an example job script containing all necessary options for the BIM cluster grid to run high memory jobs:

# Script submit.sh for running a blast job
#
# Selecting the high memory project to use only nodes with sufficient memory
#$-P highmem_proj
#
# Other options, for explanation see the above example
#$-cwd
# Please note: the vf amount is multiplied by the number of requested slots
# (in this case: 4 slots * 7900m = 31600m, which fits within the 32 GB of physical RAM)
#$-l vf=7900m
#$-l cpu=0:45:0
# Request 4 CPU slots on the same machine (SMP)
#$-pe serial 4
#$-M username@host.de

# source the common environment
. /etc/profile

blastall -p blastp -i query.fa -d database.fa
# Script ends here.

 

Resource reservation and backfilling

Resource reservation enables you to reserve system resources (in our case either RAM or cpu cores) for specified pending jobs. When you reserve resources for a job, those resources are blocked from being used by jobs with lower priority. Jobs can reserve resources depending on criteria such as resource requirements, job priority, waiting time, resource sharing entitlements, and so forth. The scheduler enforces reservations in such a way that jobs with the highest priority get the earliest possible resource assignment, which avoids well-known problems such as "job starvation". You can use resource reservation to guarantee that resources are dedicated to jobs in job-priority order by using the "-R y" option of the qsub command, e.g.:

qsub -R y bigjob.sh

To keep the overall performance of the cluster high, it is important not to use reservation by default, but to limit reservation scheduling to those jobs that are important and require large amounts of memory or parallel environments.

Backfilling enables lower-priority jobs to use blocked resources when using those resources does not interfere with the reservation. This feature requires you to specify a hard runtime limit for the job (which is enforced by the scheduler) that is shorter than the default (not enforced) job runtime of 6 hours, using the "-l h_rt=" option of the qsub command, e.g.:

qsub -l h_rt=00:30:00 smallshortjob.sh
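
A typical combination of both features could look like this (a sketch; bigjob.sh and smallshortjob.sh are placeholder scripts):

# reserve resources for an important, memory-hungry parallel job
qsub -R y -P long_proj -l vf=2000m -pe serial 4 bigjob.sh
# short jobs with a hard runtime limit can backfill the reserved but still idle slots
qsub -P short_proj -l h_rt=00:20:00 smallshortjob.sh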

Submitting batch jobs from computer pool Amalienstr.

Batch jobs can be submitted directly from the computer pool Amalienstr. 17 (without logging into the BIM cluster). First source the grid environment:

source /home/proj/BIMSC/proj/gridengine/bimcluster/common/settings.sh

Then submit the batch job(s) as described above. Make sure that all files and applications needed by the batch jobs are stored in the /home/proj/BIMSC/ directories.
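
The complete workflow from the pool could therefore look like this (a sketch; the project directory and submit.sh are placeholders):

source /home/proj/BIMSC/proj/gridengine/bimcluster/common/settings.sh
cd /home/proj/BIMSC/proj/myproject
qsub submit.sh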
