Version 2.2
Abstract
These pages collect information to help CERM users make the best use of our computational resources. This manual is also available for download in PDF format.
Table of Contents
List of Tables
To ask for an account, send your request to morelli AT cerm.unifi.it, specifying the project that you will use for calculations. To request a cluster account:
You must have an active project.
You must be the project's owner.
To ensure a secure login session, users must connect to the machines using the secure shell program, ssh. Telnet is not allowed because of the security vulnerabilities associated with it. The "r" commands rlogin, rsh, and rcp are also disabled on this machine for similar reasons. These commands are replaced by the more secure alternatives included in SSH: ssh and scp.
To submit, monitor, and delete jobs, you have to log in to the cluster server, named athlon. On athlon it is also possible to make backups on CD or DVD.
Important | |
---|---|
Please note that interactive login is only allowed on the cluster server (athlon). Computing nodes are accessed and used only through the queue system. |
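As an illustration of the ssh and scp usage described above, here is a minimal sketch. The login name `login_name` and the file names `my.input` and `my.output` are placeholders, and the host is written simply as `athlon`; the fully qualified name to use from outside the local network is not given here. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers that only print the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Hypothetical values: replace them with your own login name and files.
CLUSTER_USER="login_name"
CLUSTER_HOST="athlon"      # cluster server named in this manual
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Open an interactive session on the cluster server.
run ssh "${CLUSTER_USER}@${CLUSTER_HOST}"

# Copy an input file to the home directory on the cluster server.
run scp my.input "${CLUSTER_USER}@${CLUSTER_HOST}:"

# Copy a result file back into the current directory.
run scp "${CLUSTER_USER}@${CLUSTER_HOST}:my.output" .
```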
The default shell is bash. To change it, use the chsh command. At login the /etc/motd file is displayed: please take care to read it, because information about the system is usually written there.
A basic default environment is already set up by the system login configuration files. It includes the variables and paths for all the compilers and their MPI wrappers, and for the OpenPBS/Torque batch queuing system with the Maui scheduler.
Check your environment with the env command. Be careful when modifying the shell customization files (.cshrc, .profile, .login, .bashrc), since they can override the default values and alter the behaviour of the compilers and of the batch queuing system.
The Portable Batch System, PBS, is a workload management system for Linux clusters. It supplies commands to submit, monitor, and delete jobs. It has the following components:
Job Server - also called pbs_server provides the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job.
Job Executor - a daemon (pbs_mom) that actually places the job into execution when it receives a copy of the job from the Job Server. Mom creates a new session as identical to a user login session as is possible and returns the job's output to the user.
Job Scheduler - a daemon that contains the site's policy controlling which job is run and where and when it is run. PBS allows each site to create its own Scheduler. We are using the Maui Scheduler. The Maui Scheduler can communicate with various Moms to learn about the state of a system's resources and with the Server to learn about the availability of jobs to execute.
Below are the steps needed to run production code:
Create a job script containing PBS options and commands that:
request the resources that will be needed (e.g. number of processors, wall-clock time), and
prepare for the execution of the executable (e.g. cd to the working directory).
Submit the job script file to PBS.
Monitor the job.
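The three steps above can be sketched as follows. This is only an illustration: the script name `run`, the job name `myJob`, and the executable `./my_program` are placeholders, and the `#PBS` options used are among those described in Table 4.1. The submission is skipped when qsub is not found, so the snippet can be tried safely on a machine other than the cluster server.

```shell
#!/bin/bash
# Step 1: write a job script (the file name "run" is just an example).
cat > run <<'EOF'
#!/bin/bash
#PBS -N myJob
#PBS -l walltime=01:00:00
#PBS -j oe

# Move to the directory from which the job was submitted.
cd "${PBS_O_WORKDIR}"

# Placeholder for the real executable.
./my_program
EOF

# Step 2: submit the script, if qsub is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub run
    # Step 3: monitor the job.
    qstat -a
else
    echo "qsub not found: run this on the cluster server" >&2
fi
```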
Below are some of the commonly used PBS options in a job script file. The options start with "#PBS".
Table 4.1. PBS Options
Option | Description |
---|---|
#PBS -N myJob | Assigns a job name. The default is the name of PBS job script. |
#PBS -l nodes=4:ppn=2 | The number of nodes and processors per node. Only for parallel jobs |
#PBS -l walltime=01:00:00 | The maximum wall-clock time during which this job can run. |
#PBS -o mypath/my.out | The path and file name for standard output. |
#PBS -e mypath/my.err | The path and file name for standard error. |
#PBS -j oe | Join option that merges the standard error stream with the standard output stream of the job. |
#PBS -k oe | Define which output of the batch job to retain on the execution host. |
#PBS -W stagein=file_list | Copies the file onto the execution host before the job starts. (*) |
#PBS -W stageout=file_list | Copies the file from the execution host after the job completes. (*) |
#PBS -r n | Indicates that a job should not rerun if it fails. |
#PBS -V | Exports all environment variables to the job. |
Note | |
---|---|
(*) File staging can specify which files should be copied onto the execution host before the job starts and which files should be copied off the execution host when it completes. The file_list, regardless of the direction of copy, has the following form, where local_file is the name of the file on the system where the job executes and remote_file is the destination name on the host specified by hostname: local_file@hostname:remote_file. For example: stagein=my.input@frontend-0:/home/login_name/my.input and stageout=my.output@frontend-0:/home/login_name/my.output |
There are some quite useful Maui commands:
Table 4.2. Maui commands
Command | Description |
---|---|
showq | Show a detailed list of submitted jobs |
showbf | Show the free resources (time and processors available) at the moment |
checkjob job.ID | Show a detailed description of the job job.ID |
showstart job.ID | Give an estimate of the expected start time of the job job.ID |
There are a number of predefined environment variables. These include the following:
Variables defined on the execution host;
Variables exported from the submission host to the execution host; and
Variables defined by PBS.
The following environment variables relate to the submission machine:
Table 4.3. PBS Environment Variables
Variable | Description |
---|---|
PBS_O_HOST | The host machine on which the qsub command was run. |
PBS_O_LOGNAME | The login name on the machine on which the qsub was run. |
PBS_O_HOME | The home directory of the user who ran qsub (the value of HOME on the submission host). |
PBS_O_WORKDIR | The working directory from which the qsub was run. |
The following variables relate to the environment where the job is executing:
Table 4.4. PBS Environment Variables
Variable | Description |
---|---|
PBS_ENVIRONMENT | This is set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs. |
PBS_O_QUEUE | The original queue to which the job was submitted. |
PBS_JOBID | The identifier that PBS assigns to the job. |
PBS_JOBNAME | The name of the job. |
PBS_NODEFILE | The file containing the list of nodes assigned to a parallel job. |
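As a small sketch of how these variables can be used inside a job script, the function below prints the ones listed in the two tables. The function name `describe_job` is hypothetical. Outside a PBS job the variables are normally unset, so a placeholder is printed instead.

```shell
#!/bin/bash
# Print the PBS variables for the current job. Outside a PBS job the
# variables are unset, so a placeholder is shown instead.
describe_job() {
    echo "Job ID:      ${PBS_JOBID:-<not set>}"
    echo "Job name:    ${PBS_JOBNAME:-<not set>}"
    echo "Queue:       ${PBS_O_QUEUE:-<not set>}"
    echo "Submit host: ${PBS_O_HOST:-<not set>}"
    echo "Submit dir:  ${PBS_O_WORKDIR:-<not set>}"
    echo "Environment: ${PBS_ENVIRONMENT:-<not set>}"
}

describe_job

# Move to the submission directory only when it is defined.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "${PBS_O_WORKDIR}" || exit 1
fi
```

Placing a call like this at the top of a job script records in the job output where and how the job was started, which can help when a run has to be traced later.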
The following job script template should be modified to suit the needs of the job.
A job script may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options. For example:
    #PBS -N Job_name
    #PBS -l walltime=10:30,mem=320kb
    #
    step1 arg1 arg2
    step2 arg3 arg4
To run a program of the dyana family that has a RUN script like:
    #!/bin/bash
    /prog/pseudyana << EOF
    ./ANNEAL
    exit
    EOF
you can write a job script named run, for example, with the following content:
    #!/bin/bash -f
    #PBS -k oe
    #PBS -m n
    LAUNCH="./RUN"
    cd ${PBS_O_WORKDIR}
    ${LAUNCH}
    exit
To run amber calculations you can write the following job script (replacing every occurrence of filename with the real file names and adding other options if you need them):
    #!/bin/bash -f
    #PBS -k oe
    #PBS -m n
    #PBS -V
    LAUNCH="/prog/amber8/exe/sander -O -i filename -o filename -c filename -p filename -r filename"
    cd ${PBS_O_WORKDIR}
    ${LAUNCH}
    exit
To run bash script based calculations you can write the following job script (remember to change the LAUNCH entry):
    #!/bin/bash -f
    #PBS -k oe
    #PBS -m n
    LAUNCH="/home_nXX/project/bash_script"
    cd ${PBS_O_WORKDIR}
    ${LAUNCH}
    exit
To run Haddock 1.3 calculations you can write the following job script (the WORKDIR entry points to the directory containing the user's haddock data, remember to change it):
    #!/bin/bash
    #PBS -j oe
    #PBS -k oe
    #PBS -V
    HADDOCK="/prog/haddock1.3"
    HADDOCKTOOLS="$HADDOCK/tools"
    PYTHONPATH=$HADDOCK
    NACCESS="/prog/naccess2.1.1/naccess"
    PROFIT="/prog/profit/profit"
    WORKDIR="/home_nXX/project/HADDOCK/run1"
    LAUNCH="python $HADDOCK/Haddock/Runhaddock.py"
    cd $WORKDIR
    $LAUNCH
Use the qsub command to submit the job script (in this example the name of the job script is run).
$ qsub run
PBS assigns a job a unique job identifier once it is submitted (e.g. 123.athlon). After a job has been queued, it is selected for execution based on the time it has been in the queue, wall-clock time limit, and number of processors.
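The identifier printed by qsub can be captured and split into its number and server parts, as in the following sketch. The helper names `job_number` and `job_server` are hypothetical, and the example identifier from the text is used in place of a real submission so that the snippet also runs away from the cluster.

```shell
#!/bin/bash
# Split a PBS job identifier such as "123.athlon" into its parts.
job_number() { echo "${1%%.*}"; }   # text before the first dot
job_server() { echo "${1#*.}"; }    # text after the first dot

# On the cluster server the identifier would come from qsub itself:
#   jobid=$(qsub run)
# Here the example identifier from the text is used instead.
jobid="123.athlon"

echo "Submitted job $(job_number "$jobid") on server $(job_server "$jobid")"
```

Keeping the identifier in a variable makes it easy to pass to the monitoring commands, for example `checkjob "$jobid"`.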
Below are commands for monitoring a job:
Table 4.5. Commands to monitor a job
Command | Description |
---|---|
qstat -a | check status of jobs, queues, and the PBS server |
qstat -f | get all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc. |
canceljob job.ID | delete a job from the queue |
qhold job.ID | hold a job if it is in the queue |
qrls job.ID | release a job from hold |
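The monitoring commands can also be combined in a script. The sketch below is one possible approach, not a site-provided tool: `job_state` and `wait_for_job` are hypothetical helpers, the 30 second interval is arbitrary, and the parsing assumes that `qstat -f` prints a line of the form `job_state = R`, as Torque does. State `C` (completed) is treated as finished.

```shell
#!/bin/bash
# Extract the one-letter job state from "qstat -f" style output on stdin.
# Assumes a line of the form "    job_state = R".
job_state() {
    sed -n 's/^[[:space:]]*job_state = \([A-Z]\).*/\1/p'
}

# Poll a job until qstat no longer reports it or reports it as completed.
wait_for_job() {
    local jobid="$1" state
    while true; do
        state=$(qstat -f "$jobid" 2>/dev/null | job_state) || true
        case "$state" in
            ""|C) break ;;
        esac
        echo "job $jobid is in state $state"
        sleep 30
    done
    echo "job $jobid has finished or is no longer reported by qstat"
}

# Usage on the cluster server (not executed here):
#   wait_for_job 123.athlon
```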
Important | |
---|---|
No production runs are allowed on the master node, and any serial program that runs there for more than 5 minutes is automatically terminated. |
Note | |
---|---|
Serial codes are all non-parallel programs such as dyana, pseudyana, and cyana. |
Serial applications can be executed on the computational nodes only through the queuing system, even for interactive runs.
Note | |
---|---|
To run MPI parallel programs, users have to use the LAM environment. |
Suppose, for instance, that you want to run your program test.x interactively on four processors. You could use the following sequence of commands:
$ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
At this point (if there are free resources) you will enter the interactive batch session, and you can run your test with:
    $ lamboot -v $PBS_NODEFILE
    $ cd testdir
    $ mpirun -n 4 -no-shmem test.x
    $ mpirun -np 4
Example of an interactive execution:
    $ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
    $ cd testdir
    $ mpirun -n 4 test.x
The following job script template should be modified to suit the needs of the job.
    #!/bin/bash -f
    #PBS -l nodes=2:ppn=2
    #PBS -k oe
    LAMSTART="lamboot $PBS_NODEFILE"
    LAMSTOP="lamhalt $PBS_NODEFILE"
    HOME="/home_n01/guest"
    LAUNCH="mpirun -np 4 cpmd.x"
    WORKDIR="${HOME}/cp_test"
    export PP_LIBRARY_PATH=${WORKDIR}
    cd ${WORKDIR}
    ${LAMSTART}
    ${LAUNCH} au_surf_job1.in > au_surf_job1.out
    ${LAMSTOP}
    #
    exit
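In the template above the process count passed to mpirun (4) has to be kept in step with the nodes and ppn request by hand. As a possible alternative, the count can be derived from the node file, which under Torque normally holds one line per assigned processor. The helper name `count_procs` is hypothetical, and the fallback to 1 outside a PBS job is a choice made only for this sketch.

```shell
#!/bin/bash
# Count the entries in the PBS node file. With nodes=2:ppn=2 the file
# is expected to hold four lines, one per assigned processor.
count_procs() {
    local nodefile="${1:-${PBS_NODEFILE:-}}"
    # Outside a PBS job there is no node file: fall back to one process.
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo 1
        return
    fi
    # Count the non-empty lines of the node file.
    grep -c . "$nodefile" || true
}

NPROCS=$(count_procs)

# Show the resulting command line instead of running it.
echo "mpirun -np ${NPROCS} cpmd.x"
```

Inside a job script the LAUNCH entry could then be written as `LAUNCH="mpirun -np ${NPROCS} cpmd.x"`.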
The following job scripts should be used for GROMACS parallel calculations. The first one is for the pre-minimization and the second one launches the dynamics calculation.
    #!/bin/bash -f
    #PBS -k oe
    #PBS -m n
    PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
    lamboot
    LAUNCH="./SPI_MINI.csh"
    cd ${PBS_O_WORKDIR}
    ${LAUNCH}
    exit
    #!/bin/bash -f
    #PBS -k oe
    #PBS -m n
    PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
    lamboot
    LAUNCH="./SPI_MD_5PR_1ns.csh"
    cd ${PBS_O_WORKDIR}
    ${LAUNCH}
    exit