CERM Cluster User's guide

Version 2.2

Abstract

These pages collect information to help CERM users make the best use of our computational resources. This manual can also be downloaded in PDF format.


Table of Contents

1. How to get an account
2. How to login
3. Login Info and Environment Setup
4. OpenPBS/Torque Usage
Batch Processing
PBS Options
Maui commands
PBS Environment Variables
Job Script Template
Job script examples
Submitting a Job
Monitoring a Job
5. Running serial codes
6. Running MPI parallel codes
Running interactive MPI programs
Job Script Template

List of Tables

4.1. PBS Options
4.2. Maui commands
4.3. PBS Environment Variables
4.4. PBS Environment Variables
4.5. Commands to monitor a job

Chapter 1. How to get an account

To ask for an account, send your request to morelli AT cerm.unifi.it, specifying the project that you will use for calculations. To request a cluster account:

  • You must have an active project

  • You must be the project's owner

Chapter 2. How to login

To ensure a secure login session, users must connect to the machines using the secure shell program, ssh. Telnet is not allowed because of the security vulnerabilities associated with it. The "r" commands rlogin, rsh, and rcp are also disabled on this machine for similar reasons. These commands are replaced by the more secure alternatives included in SSH: ssh and scp.

To submit, monitor, and delete jobs, you have to log in to the cluster server named athlon. On athlon it is also possible to make backups on CD or DVD.
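
For example, assuming your account name is user (a placeholder), a typical login and file copy look like:


$ ssh user@athlon
$ scp myfile.tar.gz user@athlon: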

[Important]Important
Please note that interactive login is allowed only on the cluster server (athlon). Computing nodes are accessed and used only through the queue system.

Chapter 3. Login Info and Environment Setup

The default shell is bash. To change it, use the chsh command. At login the /etc/motd file is displayed: please take care to read it, because information about the system is usually posted there.

A basic default environment is already set up by the system login configuration files. This includes variables and paths for all the compilers and their MPI wrappers, and for the OpenPBS/Torque batch queuing system with the MAUI scheduler.

Check your environment with the env command. Be careful when modifying the shell customization files (.cshrc, .profile, .login, .bashrc), since they could overwrite the default values, altering the behaviour of the compilers and of the batch queuing system.
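
For instance, you can quickly inspect the relevant settings (the grep pattern is just illustrative):


$ echo $SHELL          # confirm your login shell
$ env | grep PATH      # check that compiler and MPI directories appear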

Chapter 4. OpenPBS/Torque Usage

Batch Processing

The Portable Batch System, PBS, is a workload management system for Linux clusters. It supplies commands to submit, monitor, and delete jobs. It has the following components.

  • Job Server - also called pbs_server; it provides the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job.

  • Job Executor - a daemon (pbs_mom) that actually places the job into execution when it receives a copy of the job from the Job Server. Mom creates a new session as identical to a user login session as is possible and returns the job's output to the user.

  • Job Scheduler - a daemon that contains the site's policy controlling which job is run and where and when it is run. PBS allows each site to create its own Scheduler. We are using the Maui Scheduler. The Maui Scheduler can communicate with various Moms to learn about the state of a system's resources and with the Server to learn about the availability of jobs to execute.

Below are the steps needed to run production code:

  1. Create a job script containing the following PBS options:

    • request the resources that will be needed (e.g. number of processors, wall-clock time) and

    • use commands to prepare for execution of the executable (e.g. cd to the working directory).

  2. Submit the job script file to PBS.

  3. Monitor the job.

PBS Options

Below are some of the commonly used PBS options in a job script file. The options start with "#PBS".

Table 4.1. PBS Options

Option                        Description
#PBS -N myJob                 Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2         The number of nodes and processors per node. Only for parallel jobs.
#PBS -l walltime=01:00:00     The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out         The path and file name for standard output.
#PBS -e mypath/my.err         The path and file name for standard error.
#PBS -j oe                    Merges the standard error stream with the standard output stream of the job.
#PBS -k oe                    Defines which output of the batch job to retain on the execution host.
#PBS -W stagein=file_list     Copies the files onto the execution host before the job starts. (*)
#PBS -W stageout=file_list    Copies the files from the execution host after the job completes. (*)
#PBS -r n                     Indicates that the job should not be rerun if it fails.
#PBS -V                       Exports all environment variables to the job.
[Note]Note
(*) File staging specifies which files should be copied onto the execution host before the job starts and which files should be copied off it when the job completes. The file_list, regardless of the direction of the copy, has the form local_file@hostname:remote_file, where local_file is the name of the file on the system where the job executes and remote_file is the destination name on the host specified by hostname. For example:

stagein=my.input@frontend-0:/home/login_name/my.input
stageout=my.output@frontend-0:/home/login_name/my.output
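
As an illustration, a minimal job script header combining several of these options could look like the following sketch (the job name, program name, and resource values are placeholders to adapt):


#!/bin/bash
#PBS -N myJob
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -V
cd ${PBS_O_WORKDIR}
./my_program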

Maui commands

Maui provides some quite useful commands:

Table 4.2. Maui commands

Command             Description
showq               Shows a detailed list of submitted jobs.
showbf              Shows the free resources (time and processors) available at the moment.
checkjob job.ID     Shows a detailed description of the job job.ID.
showstart job.ID    Gives an estimate of the expected start time of the job job.ID.
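
For example, to see what resources are free and when a queued job is expected to run (the job identifier 123.athlon is hypothetical):


$ showbf
$ checkjob 123.athlon
$ showstart 123.athlon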

PBS Environment Variables

There are a number of predefined environment variables. These include the following:

  • Variables defined on the execution host;

  • Variables exported from the submission host to the execution host; and

  • Variables defined by PBS.

The following environment variables relate to the submission machine:

Table 4.3. PBS Environment Variables

Variable         Description
PBS_O_HOST       The host machine on which the qsub command was run.
PBS_O_LOGNAME    The login name on the machine on which qsub was run.
PBS_O_HOME       The home directory from which qsub was run.
PBS_O_WORKDIR    The working directory from which qsub was run.

The following variables relate to the environment where the job is executing:

Table 4.4. PBS Environment Variables

Variable          Description
PBS_ENVIRONMENT   Set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs.
PBS_O_QUEUE       The original queue to which the job was submitted.
PBS_JOBID         The identifier that PBS assigns to the job.
PBS_JOBNAME       The name of the job.
PBS_NODEFILE      The file containing the list of nodes assigned to a parallel job.
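
These variables can be used directly inside job scripts; a small sketch (the echo line is purely illustrative):


#!/bin/bash
cd ${PBS_O_WORKDIR}
echo "Job ${PBS_JOBID} (${PBS_JOBNAME}) submitted from ${PBS_O_HOST}"
cat ${PBS_NODEFILE}    # node list; only meaningful for parallel jobs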

Job Script Template

The following job script template should be adapted to the needs of the job.

A job script may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options. For example:


#PBS -N Job_name
#PBS -l walltime=10:30,mem=320kb
#
step1 arg1 arg2
step2 arg3 arg4

Job script examples

Dyana/Pseudyana/Paramagneticdyana

To run programs of the dyana family, given a RUN script like:


#!/bin/bash
/prog/pseudyana << EOF
./ANNEAL
exit
EOF

you can write a job script named run, for example, with the following content:


#!/bin/bash -f
# Keep stdout/stderr on the execution host; send no mail about the job
#PBS -k oe
#PBS -m n
LAUNCH="./RUN"
# Move to the directory from which qsub was run
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit


Amber8

To run amber calculations you can write the following job script (replacing every occurrence of filename with a real file name and adding other options if you need them):


#!/bin/bash -f
#PBS -k oe
#PBS -m n
#PBS -V
LAUNCH="/prog/amber8/exe/sander -O -i filename -o filename
         -c filename -p filename -r filename"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit

Bash script

To run calculations driven by a bash script you can write the following job script (remember to change the LAUNCH entry to point to your script):


#!/bin/bash -f
#PBS -k oe
#PBS -m n
LAUNCH="/home_nXX/project/bash_script"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit

Haddock 1.3

To run Haddock 1.3 calculations you can write the following job script (the WORKDIR entry points to the directory containing the user's haddock data; remember to change it):


#!/bin/bash
#PBS -j oe
#PBS -k oe
#PBS -V
HADDOCK="/prog/haddock1.3"
HADDOCKTOOLS="$HADDOCK/tools"
PYTHONPATH=$HADDOCK
NACCESS="/prog/naccess2.1.1/naccess"
PROFIT="/prog/profit/profit"
WORKDIR="/home_nXX/project/HADDOCK/run1"
LAUNCH="python $HADDOCK/Haddock/Runhaddock.py"
# Export the settings above so that child processes (python) inherit them
export HADDOCK HADDOCKTOOLS PYTHONPATH NACCESS PROFIT
cd $WORKDIR
$LAUNCH

Submitting a Job

Use the qsub command to submit the job script (in this example the name of the job script is run).


$ qsub run


PBS assigns the job a unique job identifier once it is submitted (e.g. 123.athlon). After a job has been queued, it is selected for execution based on the time it has spent in the queue, its wall-clock time limit, and the number of processors it requests.
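
A typical submission session therefore looks like this (the identifier returned on your system will differ):


$ qsub run
123.athlon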

Monitoring a Job

Below are commands for monitoring a job:

Table 4.5. Commands to monitor a job

Command            Description
qstat -a           Checks the status of jobs, queues, and the PBS server.
qstat -f job.ID    Gets all the information about a job: resources requested, resource limits, owner, source, destination, queue, etc.
canceljob job.ID   Deletes a job from the queue.
qhold job.ID       Holds a job that is in the queue.
qrls job.ID        Releases a job from hold.
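
For example, to list all jobs and then inspect one of them in detail (123.athlon is the identifier from the submission example):


$ qstat -a
$ qstat -f 123.athlon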

Chapter 5. Running serial codes

[Important]Important
On the master node no production runs are allowed, and any serial program execution lasting more than 5 minutes is automatically deleted.
[Note]Note
Serial codes are all non-parallel programs, such as dyana, pseudyana, and cyana.

Execution of serial applications on the computational nodes can only be done through the queuing system, even for interactive runs.
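
For example, to run a short serial program interactively on a computing node, you can request an interactive session through the queue (the program name and walltime value are placeholders):


$ qsub -I -l walltime=0:30:00
$ cd ${PBS_O_WORKDIR}
$ ./my_serial_program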

Chapter 6. Running MPI parallel codes

[Note]Note
To run MPI parallel programs, users have to use the LAM/MPI environment.

Running interactive MPI programs

Suppose, for instance, that you want to run a test program test.x interactively on four processors. You could then use the following sequence of commands:


$ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I 

At this point (if there are free resources) you will enter the interactive batch session, and you can run your test with:


$ lamboot -v $PBS_NODEFILE
$ cd testdir
$ mpirun -n 4 -no-shmem test.x

Example of a complete interactive execution:


$ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
$ lamboot -v $PBS_NODEFILE
$ cd testdir
$ mpirun -n 4 test.x

Job Script Template

The following job script template should be adapted to the needs of the job.


#!/bin/bash -f
#PBS -l nodes=2:ppn=2
#PBS -k oe
# Commands to start and stop the LAM environment on the assigned nodes
LAMSTART="lamboot $PBS_NODEFILE"
LAMSTOP="lamhalt $PBS_NODEFILE"
HOME="/home_n01/guest"
LAUNCH="mpirun -np 4 cpmd.x"
WORKDIR="${HOME}/cp_test"
export PP_LIBRARY_PATH=${WORKDIR}
# Move to the working directory, boot LAM, run the calculation, halt LAM
cd ${WORKDIR}
${LAMSTART}
${LAUNCH} au_surf_job1.in > au_surf_job1.out
${LAMSTOP}
#
exit

The following job scripts should be used for GROMACS parallel calculations. The first one performs the pre-minimization and the second one launches the dynamics calculation.


#!/bin/bash -f
#PBS -k oe
#PBS -m n
# Change this path to your own GROMACS working directory
PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
lamboot
LAUNCH="./SPI_MINI.csh"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit


#!/bin/bash -f
#PBS -k oe
#PBS -m n
# Change this path to your own GROMACS working directory
PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
lamboot
LAUNCH="./SPI_MD_5PR_1ns.csh"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit