
Tips about Job Submission

Basics

  • Samples for each application can be found in the /local/apl/lx/(application name)/samples directory. These sample files can be used as templates for your own jobs.
  • skeleton for sh/bash (a filled-in example is shown after the two skeletons)

#!/bin/sh
#PBS -l select=...
#PBS -l walltime=(elaps)

# PBS_O_WORKDIR holds the working directory from which the job was submitted.
# If you want to use that same directory when the job runs on the computation node(s),
# you need to "cd" into it first.

if [ ! -z "${PBS_O_WORKDIR}" ]; then
  cd "${PBS_O_WORKDIR}" # if PBS_O_WORKDIR is set, cd to that directory.
fi

(actual commands: file copy, environment setup, run, etc.)

  • skeleton for csh/tcsh

#!/bin/csh -f
#PBS -l select=...
#PBS -l walltime=(elaps)

if ( $?PBS_O_WORKDIR ) then
  cd "$PBS_O_WORKDIR" # same as the sh/bash case.
endif

(actual commands; file copy, setenv, run, etc.)
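
For reference, below is a filled-in version of the sh/bash skeleton above, reusing the 1-CPU-core jobtype=core header from the next section. This is only a sketch: the walltime value, the program name (./my_program), and the input/output file names are hypothetical placeholders to be replaced with your own.

#!/bin/sh
#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1:jobtype=core
#PBS -l walltime=24:00:00

# move to the directory the job was submitted from
if [ ! -z "${PBS_O_WORKDIR}" ]; then
  cd "${PBS_O_WORKDIR}"
fi

# run a (hypothetical) serial program; input.dat and output.log are placeholders
./my_program input.dat > output.log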

Sample Job Header (select line)

  • The number following select= represents the number of nodes (1 if omitted).
  • Other numbers (ncpus, mpiprocs, ompthreads, ngpus) are the resource amounts per node.
  • The OMP_NUM_THREADS environment variable is automatically set to the value specified by ompthreads.
  • If you employ one of the MPI environments installed under /local/apl/lx (IntelMPI or OpenMPI), you don't need to specify a host list (or machine file) as an mpirun argument.
    • "mpirun -np (number specified for "mpiprocs") command options" will work (see the example script right after this list).
    • (If you add the --with-tm=/local/apl/lx/pbs14 option to "configure" when building your own OpenMPI, you may not need to specify a host list either.)
  • When you request GPUs, you don't need to pay special attention to the CUDA_VISIBLE_DEVICES environment variable. (The job server will handle the resources appropriately.)
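
As a concrete illustration of the points above, the following single-node script reuses the "jobtype=large, MPI*20, OpenMP*2" header from the examples below. OMP_NUM_THREADS is already set to 2 (the ompthreads value) when the script runs, and mpirun is called without any host list; the program name ./hybrid_prog is a hypothetical placeholder.

#!/bin/sh
#PBS -l select=1:ncpus=40:mpiprocs=20:ompthreads=2:jobtype=large
#PBS -l walltime=00:30:00

if [ ! -z "${PBS_O_WORKDIR}" ]; then
  cd "${PBS_O_WORKDIR}"
fi

# OMP_NUM_THREADS has already been set to 2 (the ompthreads value) by the system
echo "OMP_NUM_THREADS = ${OMP_NUM_THREADS}"

# no host list / machine file is needed; -np matches the mpiprocs value
mpirun -np 20 ./hybrid_prog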

jobtype=small, 1 node, for each node: 40 CPU cores, MPI*40, no OpenMP (Flat MPI) (1 node and 40 CPU cores in total), 72 hours (= 3 days)

#PBS -l select=1:ncpus=40:mpiprocs=40:ompthreads=1:jobtype=small
#PBS -l walltime=72:00:00

jobtype=small, 4 nodes, for each node: 40 CPU cores, MPI*40, no OpenMP (4 nodes and 160 CPU cores in total), 168 hours (= a week)

#PBS -l select=4:ncpus=40:mpiprocs=40:ompthreads=1:jobtype=small
#PBS -l walltime=168:00:00

jobtype=large, 1 node, for each node: 40 CPU cores, MPI*20, OpenMP*2 (1 node and 40 CPU cores in total), 30 minutes

#PBS -l select=1:ncpus=40:mpiprocs=20:ompthreads=2:jobtype=large
#PBS -l walltime=00:30:00

jobtype=large, 2 nodes, for each node: 40 CPU cores, MPI*40, no OpenMP (2 nodes and 80 CPU cores in total), 1 hour

#PBS -l select=2:ncpus=40:mpiprocs=40:ompthreads=1:jobtype=large
#PBS -l walltime=01:00:00

jobtype=core, 1 CPU core, 168 hours

#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1:jobtype=core
#PBS -l walltime=168:00:00

jobtype=core, 12 CPU cores, MPI*4, no OpenMP, 12 hours

#PBS -l select=1:ncpus=12:mpiprocs=4:ompthreads=1:jobtype=core
#PBS -l walltime=12:00:00

(The available memory is proportional to the number of CPU cores requested. If you want to increase the memory amount without increasing mpiprocs, this type of specification is advantageous for jobtype=core.)
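
For example (a sketch of the same idea, with a placeholder walltime), the following jobtype=core header reserves 8 CPU cores, and therefore 8 cores' worth of memory, while actually running only a single process:

#PBS -l select=1:ncpus=8:mpiprocs=1:ompthreads=1:jobtype=core
#PBS -l walltime=24:00:00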

jobtype=core, 18 CPU cores, MPI*9, OpenMP*2, 2 hours

#PBS -l select=1:ncpus=18:mpiprocs=9:ompthreads=2:jobtype=core
#PBS -l walltime=02:00:00

jobtype=core, 32 CPU cores, MPI*8, OpenMP*4, 168 hours

#PBS -l select=1:ncpus=32:mpiprocs=8:ompthreads=4:jobtype=core
#PBS -l walltime=168:00:00

jobtype=core, 36 CPU cores, OpenMP*36, 168 hours

#PBS -l select=1:ncpus=36:mpiprocs=1:ompthreads=36:jobtype=core
#PBS -l walltime=168:00:00

jobtype=gpu, 1 CPU core, 1 GPU, 24 hours

#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1:ngpus=1:jobtype=gpu
#PBS -l walltime=24:00:00

jobtype=gpup, 12 CPU cores, MPI*12, no OpenMP, 1 GPU, 12 hours

#PBS -l select=1:ncpus=12:mpiprocs=12:ompthreads=1:ngpus=1:jobtype=gpup
#PBS -l walltime=12:00:00

(When ngpus=1, ncpus must be <= 12.)

jobtype=gpup, 4 nodes, for each node: 8 CPU cores, MPI*2, OpenMP*4, 2 GPUs (4 nodes, 32 CPU cores, 8 GPUs), 12 hours

#PBS -l select=4:ncpus=8:mpiprocs=2:ompthreads=4:ngpus=2:jobtype=gpup
#PBS -l walltime=12:00:00

(The value of "mpiprocs" must be a multiple of the value of "ngpus". Even if you use only one CPU core and multiple GPUs, your "ncpus" specification must be at least equal to the number of GPUs.)

jobtype=gpup, 2 nodes, for each node: 24 CPU cores, MPI*24, no OpenMP, 2 GPUs (2 nodes, 48 CPU cores, 4 GPUs), 24 hours

#PBS -l select=2:ncpus=24:mpiprocs=24:ompthreads=1:ngpus=2:jobtype=gpup
#PBS -l walltime=24:00:00

(If you employ multiple nodes with ngpus=2 and jobtype=(gpu|gpup), the MPI processes might be scattered over twice the number of nodes you specified.)

jobtype=gpuv, 1 node: 4 CPU cores, MPI*4, no OpenMP, 4 GPUs, 30 minutes

#PBS -l select=1:ncpus=4:mpiprocs=4:ompthreads=1:ngpus=4:jobtype=gpuv
#PBS -l walltime=00:30:00

(The program would be run via "mpirun -np 4 (prog)".)

jobtype=gpuv, 1 node: 8 CPU cores, MPI*8, no OpenMP, 8 GPUs, 30 minutes

#PBS -l select=1:ncpus=8:mpiprocs=8:ompthreads=1:ngpus=8:jobtype=gpuv
#PBS -l walltime=00:30:00

(The program would be run via "mpirun -np 8 (prog)". Multi-node jobs are not allowed for jobtype=gpuv.)

jobtype=gpuv, 1 node: 24 CPU cores, MPI*8, OpenMP*3, 8 GPUs, 30 minutes

#PBS -l select=1:ncpus=24:mpiprocs=8:ompthreads=3:ngpus=8:jobtype=gpuv
#PBS -l walltime=00:30:00

(The program would be run via "mpirun -np 8 (prog)". Multi-node jobs are not allowed for jobtype=gpuv.)