Submit Jobs (jsub)

(Last update: Apr 22, 2024)

You can submit your jobs with the "jsub" command. Input files (and the jobscript) should be placed on ccfep beforehand using scp, sftp, etc. (For file transfer using scp and sftp, please check the quick start guide page.)
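For example, a transfer from your local machine might look like the following. (This is just a sketch: the full hostname ccfep.ims.ac.jp and the destination directory ~/work are assumptions here; adjust them to your environment.)

$ scp job.sh myinp.inp (your account)@ccfep.ims.ac.jp:~/work/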

For Gaussian jobs, there are special commands g16sub and g09sub. Although jsub can be used, please consider g16sub/g09sub first.

Basic Usage

You first need to prepare a job script (named "job.sh" here; the details are described below). You can then submit the job using the "jsub" command and the "job.sh" file as follows. (Don't type the leading $.) If the job is successfully accepted by the job server, the job ID and server name will be shown.

$ jsub job.sh
6169323.ccpbs1
$

The "jsub" command itself finishes immediately. However, submitted job is not always executed immediately. If there are no available resources (CPU or GPU), the job needs to wait for a while. Submitted jobs remain in the system even when you log out.

The displayed number (6169323 in the example above) is the ID of the job (job ID). It is used when you delete the job. You don't have to remember it, though, because job IDs can be listed with the "jobinfo" command after submission.

You usually don't need to specify a queue name, but you can do so like "jsub -q H job.sh".

Optional parameters of the "jsub" command can be listed with "jsub --help". Please use the "jobinfo" command to check the status of submitted jobs. If you want to delete or cancel jobs, please use the "jdel" command.
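For example, to list your jobs and then cancel the one submitted above (here the job ID shown at submission is passed to jdel):

$ jobinfo
$ jdel 6169323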

Prepare Jobscript

For applications installed by RCCS, sample jobscripts and input files are available. For the standard location of those files, please check the Sample Jobs page. These sample jobscripts can be used as templates for your own jobs.

An example jobscript is shown below. It requests 64 CPU cores, 32 MPI processes, and 2 OpenMP threads; the maximum wall time of the job is 24 hours. It runs "my-program" installed in the home directory, which uses Open MPI 4.1.6 installed by RCCS. (NOTE: the annotation parts beginning with ⇐ in the example must not be present in the actual job script file; they would cause errors.)

#!/bin/sh   ⇐ 1 Choose Interpreter (shebang)
#PBS -l select=1:ncpus=64:mpiprocs=32:ompthreads=2  ⇐ 2 Request Resources
#PBS -l walltime=24:00:00  ⇐ 3 Max Walltime of Job
                           ⇐ 4 Optional Parameters can be added here

# chdir to the directory from which you submitted the job (recommended; can be skipped)
# When the job starts on a computation node, the working directory is usually your home directory.
# If you skip this, you may have trouble with input and output file paths, for example.
cd ${PBS_O_WORKDIR}

# set up modules if necessary
module -s purge
module -s load openmpi/4.1.6

# you can set parameters for your application here
export PATH="/home/users/${USER}/bin:$PATH"

INPUT=myinp.inp
OUTPUT=myout.out

# run program
mpirun -np 32 my-program -i ${INPUT} -o ${OUTPUT}

1 - Choose Interpreter (shebang; required)

Choose the interpreter of the script (shebang). Normally, one of the following is used (corresponding to sh/bash, csh/tcsh, and zsh, respectively).

  • #!/bin/sh
  • #!/bin/csh -f
  • #!/bin/zsh

This interpreter definition is necessary.

2 - Request Resources (select; required)

Queue information and queue factors (CPU points definition) can be found on this page.
Some samples for this section are also available on this page.

The numbers of CPU cores, MPI processes, OpenMP threads, and GPUs are specified here (the "#PBS -l" at the beginning of the line is also necessary). The basic format is as follows.

#PBS -l select=(# of nodes):ncpus=(# of CPU cores):mpiprocs=(# of MPI processes):ompthreads=(# of OpenMP threads)[:ngpus=(# of GPUs)][:jobtype=largemem]

For the number of nodes:

  • If ncpus is 1-63 and no GPUs are requested (jobtype=core), the number of nodes is always 1.
  • If ncpus is 64 or 128 (jobtype=vnode), the number of nodes can be 1 or more. The total number of CPU cores is (# of nodes) x (# of CPU cores; ncpus). See the example after this list.
  • If ngpus >= 1 (jobtype=gpu), the number of nodes can be 1 or more. The total number of CPU cores is (# of nodes) x (# of CPU cores; ncpus), and the total number of GPUs is (# of nodes) x (# of GPUs; ngpus).
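For example, a jobtype=vnode job using two full nodes (128 CPU cores in total) could be requested as follows. (The mpiprocs and ompthreads values are illustrative choices that fill the 64 cores of each node.)

# jobtype=vnode example (2 nodes, 128 CPU cores in total)
#PBS -l select=2:ncpus=64:mpiprocs=32:ompthreads=2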

Apart from the number of nodes, these numbers are per-node values. The value of mpiprocs is related to the machine file (host file) of MPI. The value of ompthreads corresponds to the "OMP_NUM_THREADS" environment variable.

  • ncpus= specifies the number of CPU cores per node. Valid values are 1-64 or 128.
  • mpiprocs= specifies the number of MPI processes per node. Usually mpiprocs * ompthreads is equal to ncpus (see the examples after this list).
  • ompthreads= specifies the number of OpenMP threads per process. For non-OpenMP thread parallelism, this value is simply ignored.
  • The available memory amount is proportional to the ncpus= value: 7.875 GB/core for jobtype=largemem, and 1.875 GB/core for the other types. Please also check the queue information page.
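For instance, a single 64-core node can be requested either as a pure MPI job or as a hybrid MPI/OpenMP job (illustrative values; choose what matches your application):

# pure MPI: 64 processes per node, 1 thread each
#PBS -l select=1:ncpus=64:mpiprocs=64:ompthreads=1

# hybrid MPI/OpenMP: 16 processes per node, 4 threads each
#PBS -l select=1:ncpus=64:mpiprocs=16:ompthreads=4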

Depending on how the MPI library is built, it may not properly respect the machine file (host file) provided by the queuing system. In that case, you need to manually give the host file (its path is stored in the PBS_NODEFILE environment variable) to the mpirun or mpiexec command. MPI libraries installed by RCCS implicitly respect the host file provided by the queuing system.
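As a sketch, with a self-built Open MPI you could pass the host file explicitly as follows. (--hostfile is Open MPI's option; other MPI libraries use different options, such as -machinefile.)

mpirun -np 32 --hostfile ${PBS_NODEFILE} my-program -i ${INPUT} -o ${OUTPUT}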

  • The number of GPUs is specified with the ngpus= keyword, where the number of CPU cores per GPU (ncpus/ngpus) must be less than or equal to 16. The valid values of ngpus are 1-8, since one TypeG node has 8 GPUs.

# 32 CPU cores and 2 GPUs example
#PBS -l select=1:ncpus=32:mpiprocs=2:ompthreads=16:ngpus=2

  • You need to add jobtype=largemem to use TypeF nodes, which have 1 TB of memory per node. Note that there are only 14 TypeF nodes in total; please use the regular TypeC nodes as much as possible. TypeF nodes are not equipped with GPUs; therefore, ncpus is 64 or 128 and ngpus is not available in the case of jobtype=largemem.

# jobtype=largemem sample (2 nodes, 256 CPU cores in total)
#PBS -l select=2:ncpus=128:mpiprocs=64:ompthreads=2:jobtype=largemem

3 - Max Walltime of Job (walltime; required)

The format of wall time is:

#PBS -l walltime=(hours):(minutes):(seconds)

If you write "walltime=30:00", it is interpreted as 30 minutes, not 30 hours. Please be careful.
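For example:

# 30 hours
#PBS -l walltime=30:00:00

# 30 minutes (not 30 hours!)
#PBS -l walltime=30:00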

CPU points of jobs are not calculated from this walltime specification; they are calculated from the actual job duration (elapsed time). Therefore, you can request a somewhat longer time than you expect to need. However, please do not specify an excessively long time, since very long jobs can be disadvantaged in scheduling, especially during busy times.

If the requested time is longer than the remaining time until the next maintenance, the job won't run until the maintenance is over (jsub will issue a warning). In other words, such long jobs will begin to run immediately after the maintenance.

Special Usage