Tips about Job Submission

Last update: Jul 22, 2024

Basics

  • Sample files for each application can be found in the /apl/(application name)/(version/revision)/samples directory. You can use these files as templates for your own jobs.
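
For example, you can copy a sample directory into your own work area and modify the copied files there (the names "someapp" and "1.0" below are placeholders; check the actual directory names with "ls /apl"):

ls /apl                                        # list installed applications
ls /apl/someapp                                # list available versions/revisions
cp -r /apl/someapp/1.0/samples ~/someapp_test  # copy the sample files to your work area
cd ~/someapp_test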

Jobscript skeleton for /bin/sh or /bin/bash:

#!/bin/sh
#PBS -l select=...
#PBS -l walltime=(walltime; 24:00:00 for example)

# PBS_O_WORKDIR corresponds to the working directory when you submit the job.
# If you want to keep using that directory as the working directory on the computation node(s),
# you need to "cd" to it in the job script.
if [ ! -z "${PBS_O_WORKDIR}" ]; then
  cd "${PBS_O_WORKDIR}" # if PBS_O_WORKDIR exists, cd to that dir.
fi

(actual commands; file copy, setenv, run, etc.)

Jobscript skeleton for csh/tcsh:

#!/bin/csh -f
#PBS -l select=...
#PBS -l walltime=(walltime; 24:00:00 for example)

if ( $?PBS_O_WORKDIR ) then
  cd $PBS_O_WORKDIR # same as sh/bash case.
endif

(actual commands; file copy, setenv, run, etc.)
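
For example, a filled-in sh/bash version of the skeleton above for a flat-MPI run might look like the following sketch (the executable name "my_program" and the input/output file names are placeholders, and the MPI library is assumed to be one of those installed under /apl):

#!/bin/sh
#PBS -l select=1:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=24:00:00

if [ ! -z "${PBS_O_WORKDIR}" ]; then
  cd "${PBS_O_WORKDIR}"
fi

# 64 MPI processes on one vnode; no host list is needed for the MPI environments under /apl
mpirun -np 64 ./my_program input.inp > output.log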

Sample Job Header (select line)

  • The number following select= represents the number of nodes (1 if omitted).
  • Other numbers (ncpus, mpiprocs, ompthreads, ngpus) are the resource amounts per node.
  • The OMP_NUM_THREADS environment variable is automatically set to the value specified by ompthreads.
  • If you use one of the MPI environments installed under /apl (OpenMPI, IntelMPI, MVAPICH), you do not need to specify a host list (or machine file) as an mpirun argument.
    • "mpirun -np (number specified for mpiprocs) command options" will work; see the sketch after this list.
    • (If you build OpenMPI yourself and add the --with-tm=/apl/pbs/22.05.11 option to "configure", you may likewise not need to specify a host list.)
  • When you request GPUs, you don't need to pay special attention to the CUDA_VISIBLE_DEVICES environment variable. (The job server handles the resource assignment for you.)
    • (Some software may need special settings when multiple GPU cards are used.)
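
As a sketch of how these values map onto the script body: with a header such as the largemem example shown later in this section (ncpus=128, mpiprocs=64, ompthreads=2), OMP_NUM_THREADS is set to 2 automatically, and the run line only needs the number of MPI processes (the executable name "my_program" and file names are placeholders):

#PBS -l select=1:ncpus=128:mpiprocs=64:ompthreads=2:jobtype=largemem
#PBS -l walltime=00:30:00

# OMP_NUM_THREADS is already 2 (from ompthreads); no host list is required
mpirun -np 64 ./my_program input.inp > output.log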

1 vnode, for each vnode: 64 CPU cores, MPI*64, no OpenMP (Flat MPI) (1 node and 64 CPU cores in total), 72 hours (= 3 days) (jobtype=vnode implicitly)

#PBS -l select=1:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=72:00:00

4 nodes, for each node: 128 CPU cores, MPI*128, no OpenMP (4 nodes and 512 CPU cores in total), 168 hours (= a week) (jobtype=vnode implicitly)

#PBS -l select=4:ncpus=128:mpiprocs=128:ompthreads=1
#PBS -l walltime=168:00:00

jobtype=largemem, 1 node, for each node: 128 CPU cores, MPI*64, OpenMP*2 (1 node and 128 CPU cores in total), 30 minutes

#PBS -l select=1:ncpus=128:mpiprocs=64:ompthreads=2:jobtype=largemem
#PBS -l walltime=00:30:00

jobtype=largemem, 2 vnodes, for each vnode: 64 CPU cores, MPI*64, no OpenMP (2 vnodes and 128 CPU cores in total), 1 hour

#PBS -l select=2:ncpus=64:mpiprocs=64:ompthreads=1:jobtype=largemem
#PBS -l walltime=01:00:00

This job may be placed on two separate nodes or packed onto a single node. The example just above (select=1), on the other hand, always runs on a single node.

1 CPU core, 168 hours (jobtype=core implicitly)

#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1
#PBS -l walltime=168:00:00

12 CPU cores, MPI*4, no OpenMP, 12 hours (jobtype=core implicitly)

#PBS -l select=1:ncpus=12:mpiprocs=4:ompthreads=1
#PBS -l walltime=12:00:00

(Available memory is proportional to the number of CPU cores requested. If you want to increase the amount of memory without increasing mpiprocs, this type of specification is useful/necessary for jobtype=core.)

18 CPU cores, MPI*9, OpenMP*2, 3 hours (jobtype=core implicitly)

#PBS -l select=1:ncpus=18:mpiprocs=9:ompthreads=2
#PBS -l walltime=03:00:00

32 CPU cores, MPI*8, OpenMP*4, 168 hours (jobtype=core implicitly)

#PBS -l select=1:ncpus=32:mpiprocs=8:ompthreads=4
#PBS -l walltime=168:00:00

60 CPU cores, OpenMP*60, 168 hours (jobtype=core implicitly)

#PBS -l select=1:ncpus=60:mpiprocs=1:ompthreads=60
#PBS -l walltime=168:00:00

64 CPU cores, OpenMP*60, 168 hours (jobtype=vnode implicitly)

#PBS -l select=1:ncpus=64:mpiprocs=1:ompthreads=60
#PBS -l walltime=168:00:00

The calculation itself is the same as in the previous example: 60 cores are used with OpenMP. However, this job consumes 45 CPU points per hour, fewer than the previous one (60 points/hour). The difference comes from the jobtypes of the two jobs.
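
If the program in either of these OpenMP-only examples is a pure OpenMP (non-MPI) program, no mpirun is needed; OMP_NUM_THREADS is already set to 60 from ompthreads, so the script body can run the program directly (the executable and file names are placeholders):

./my_program input.inp > output.log   # runs with 60 OpenMP threads (OMP_NUM_THREADS=60)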

1 CPU core, 1 GPU, 24 hours (jobtype=gpu implicitly)

#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1:ngpus=1
#PBS -l walltime=24:00:00

12 CPU cores, MPI*12, no OpenMP, 1 GPU, 12 hours (jobtype=gpu implicitly)

#PBS -l select=1:ncpus=12:mpiprocs=12:ompthreads=1:ngpus=1
#PBS -l walltime=12:00:00

(In case of ngpus=1, ncpus must be <= 16.)
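
The script body for this kind of GPU job needs no special GPU handling (see the note about CUDA_VISIBLE_DEVICES above); as a sketch, with placeholder executable and file names:

# no need to touch CUDA_VISIBLE_DEVICES; the assigned GPU is already visible to the job
mpirun -np 12 ./my_gpu_program input.inp > output.log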

4 nodes, for each node: 8 CPU cores, MPI*2, OpenMP*4, 2 GPUs (4 nodes, 32 CPU cores, 8 GPUs), 12 hours (jobtype=gpu implicitly)

#PBS -l select=4:ncpus=8:mpiprocs=2:ompthreads=4:ngpus=2
#PBS -l walltime=12:00:00

(The value of "mpiprocs" must be a multiple of that of "ngpus". Even if you use only one CPU core with multiple GPUs, your "ncpus" specification must be at least equal to the number of GPUs.)

2 nodes, for each node: 24 CPU cores, MPI*24, no OpenMP, 2 GPUs (2 nodes, 48 CPU cores, 4 GPUs), 24 hours (jobtype=gpu implicitly)

#PBS -l select=2:ncpus=24:mpiprocs=24:ompthreads=1:ngpus=2
#PBS -l walltime=24:00:00

(The MPI processes may be scattered across up to 4 nodes.)