RCCS Reference Manual

[Last update] 2024-Mar-21

Table of Contents

Connection

Login to RCCS

  • Frontend nodes (ccfep.ims.ac.jp) can be accessed using ssh with public key authentication.
  • All systems are stopped for maintenance from 9:00 to 19:00 on the first Monday of each month. The maintenance window may be extended.
  • Access to the frontend nodes is allowed only from IPv4 addresses assigned to Japan or from other registered IP addresses. See "Application for SSH connection from outside Japan" for details.
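
For example, once your public key is registered (see the next section), you can log in from your local machine with a command like the following (the username and key file path are placeholders):

$ ssh -i ~/.ssh/id_ed25519 your_username@ccfep.ims.ac.jp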

Register SSH Public Key and Password for Web

Please prepare an SSH public/private key pair.  If you are not familiar with the procedure, please look it up on the internet.
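
For example, a key pair can be generated on your local machine with ssh-keygen (the ed25519 key type and default file name below are just one common choice):

$ ssh-keygen -t ed25519
$ cat ~/.ssh/id_ed25519.pub    # paste the contents of this file into the "Public key" field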

First registration / Missing your username or password for web

  1. Open https://ccportal.ims.ac.jp/en/user/password in a web browser to request a registration e-mail.
  2. Enter the e-mail address written in your application, then press the "E-mail new password" button.
  3. After you receive an e-mail from RCCS, open the URL in the mail to log in with your web browser.
  4. Enter your new password in "Password" and "Confirm password".
  5. Paste your public key into "Public key".
  6. Press the "Save" button.

Using your username and password for web

  1. Open https://ccportal.ims.ac.jp/en/frontpage in a web browser, enter your username and password, and press the "Log in" button.
  2. Press "My account" in the top right corner.
  3. Press the "Edit" tab.
  4. To change your password, enter the current password and the new password (twice).
  5. Paste your public key.
  6. Press the "Save" button.

Login Shell

  • /bin/bash, /bin/tcsh and /bin/zsh are available.
  • You can select your login shell on the same page as the SSH public key.  It may take some time for a login shell change to take effect.
  • You can customize .login or .cshrc in your home directory, but please do so carefully.

Whole System of RCCS

  • The overall RCCS system is shown in the figure below.
  • The interactive nodes are ccfep (4 nodes). You can build and debug applications on them.
  • There are several kinds of disk areas, namely /home, /save, /gwork, /lwork (computation nodes only), and /ramd (computation nodes only). They differ in access speed and data lifetime.
  • The width of the lines between disks and computers in the figure represents transfer speed; wider is faster.
  • The /gwork, /lwork, and /ramd areas are temporary space for your calculations. All files there will be DELETED after your job completes.
  • /lwork is an NVMe SSD with a capacity of about 1.5 TB; you can use 11.9 GB per core.
  • /ramd is a RAM disk on each node. Its size is about 240 GB or 1,008 GB, depending on the node type. The sum of the memory used by the job and by the RAM disk is controlled by the queueing system.
  • There are no differences between /home and /save other than their names. (Formerly, they differed in backup policy.)
  • Use of /tmp, /var/tmp, or /dev/shm is not allowed. Jobs using those directories will be killed by the administrator.
  • InfiniBand is used for the interconnect.

RCCS Resources

CPU Points and Queue Factor

CPU points are consumed when you use CPUs or GPUs.
The queue factors below determine the rate for each system.

System                     CPU Queue Factor                  GPU Queue Factor
ccap (jobtype=largemem)    60 points / (1 vnode * 1 hour)    -
ccap (jobtype=vnode)       45 points / (1 vnode * 1 hour)    -
ccap (jobtype=core)         1 point  / (1 core * 1 hour)     -
ccap (jobtype=gpu)          1 point  / (1 core * 1 hour)     60 points / (1 GPU * 1 hour)
  • On ccfep, CPU points will be consumed according to CPU time spent.
  • On computation nodes, CPU points are calculated from the elapsed time of the job processes.
  • We do not charge money for supercomputer usage.
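
As an illustrative calculation using the queue factors above (assuming CPU and GPU charges add up): a jobtype=vnode job using 2 vnodes for 10 hours consumes 45 * 2 * 10 = 900 points, while a jobtype=gpu job using 8 cores and 1 GPU for 10 hours consumes (1 * 8 + 60 * 1) * 10 = 680 points.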

If you want to know your current CPU points, run "showlim -c".

Checking Resources

  • CPU points used by batch jobs and total disk usage (corresponding to "showlim -c" and "showlim -d", respectively) are updated every 10 minutes.
  • CPU points used on the interactive nodes are updated at 2:20.
  • If CPU points run out, all running jobs of your group will be killed and further job submissions won't be allowed.
  • If your disk usage exceeds the limit, new job submissions will be rejected.

Individual Limitation of Resources

Access to "Limiting Resources Page"  with your web browser.

  • Only the representative user can limit the maximum resources of each member.
  • Normal users can only view the maximum resource values.
  • The maximum number of CPUs, CPU points, and amount of disk space can be limited.

Queueing System

Overview of Queueing System

Queue Classes

Queue class for all users

All queue classes below belong to the ccap system.

Class (Jobtype)   Node    Memory          Limitation for a job            # of total vnodes (cores)
H (largemem)      TypeF   7.875 GB/core   1-14 vnodes (64-896 cores)      28 vnodes (1,792 cores)
H (vnode)         TypeC   1.875 GB/core   1-50 vnodes (64-3,200 cores)    1,248+ vnodes (79,872+ cores)
H (core)          TypeC   1.875 GB/core   1-63 cores                      200+ vnodes (12,800+ cores)
H (gpu)           TypeG   1.875 GB/core   1-48 gpus, 1-16 cores/gpu       32 vnodes (2,048 cores, 128 GPUs)

The per-group limits (common to all jobtypes above) depend on the points assigned to the group:

Assigned points   Max # of cores / gpus per group
7,200,000 -       9,600 / 64
2,400,000 -       6,400 / 42
720,000 -         4,096 / 28
240,000 -         3,200 / 12
- 240,000           768 /  8

The maximum number of jobs per group is 1,000.
  • There are additional limitations for per-core assignment jobs (jobtype=core with ncpus<64, or gpu jobs) and for jobtype=largemem jobs. The limit values can be shown with the "jobinfo -s" command.
  • Jobs must be finished before the scheduled maintenance.
  • Only around half of available nodes may accept long jobs (more than a week).
  • You can omit jobtype in your jobscript except for jobtype=largemem; other types can be judged from the requested resources.
  • 80 nodes of TypeC (160 vnodes) will be shared by "vnode" and "core" jobs.
  • Short "vnode" jobs can be assigned to a "largemem" node.
  • Short "core" jobs can be assigned to a "gpu" node.

Special queue class

The settings of this queue class are as follows.

System   Class      Node    Wall Time   Memory        # of cores per job   # of cores per group
ccap     (occupy)   TypeC   7 days      4.4 GB/core   ask us               allowed number of cores

Show Job Status

ccfep% jobinfo [-h HOST] [-q QUEUE] [-c|-s|-m|-w|-n] [-g GROUP|-a]

choose additional information type

One of the following options can be added.

  • -c  show the latest result (number of GPUs and jobtype etc. are not available)
  • -s  show status summary of the queue(s)
  • -m  show memory information
  • -w  show working directory of jobs
  • -n  show node status

jobs of other users

  • -g  group member's jobs are also shown
    • If you belong to multiple groups, try -g [GROUP name] to specify a group.
  • -a  show all users' jobs
    • most of the information (username etc.) will be hidden.

queue specification

Usually you do not need to specify these options, since all user-available queues (H and HR[0-9]) are the default targets.

  • -h HOST: specify host type (only ccap is available)
  • -q QUEUE: specify queue name (such as H or HR1)

Example: show queue summary

The "jobinfo -s" command shows the numbers of jobs, CPU cores, and GPUs in the User/Group Stat section, together with the limits on those resources. The Queue Status section shows the number of waiting jobs and the numbers of available nodes/cores/gpus.

ccfep% jobinfo -s

User/Group Stat:
--------------------------------------------------------------------------------
 queue: H                       | user(***)             | group(***)           
--------------------------------------------------------------------------------
   NJob (Run/Queue/Hold/RunLim) |    1/    0/    0/-    |    1/    0/    0/6400
   CPUs (Run/Queue/Hold/RunLim) |    4/    0/    0/-    |    4/    0/    0/6400
   GPUs (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/  48
   core (Run/Queue/Hold/RunLim) |    4/    0/    0/1200 |    4/    0/    0/1200
--------------------------------------------------------------------------------
note: "core" limit is for per-core assignment jobs (jobtype=core/gpu*)

Queue Status (H):
----------------------------------------------------------------------
      job         | free   |     free     | # jobs  |  requested  
      type        | nodes  | cores (gpus) | waiting | cores (gpus)
----------------------------------------------------------------------
week jobs
----------------------------------------------------------------------
1-4 vnodes        |    705 |  90240       |       0 |      0      
5+  vnodes        |    505 |  64640       |       0 |      0      
largemem          |      0 |      0       |       0 |      0      
core              |    179 |  23036       |       0 |      0      
gpu               |      0 |      0 (0)   |       0 |      0 (0)  
----------------------------------------------------------------------
long jobs
----------------------------------------------------------------------
1-4 vnodes        |    325 |  41600       |       0 |      0      
5+  vnodes        |    225 |  28800       |       0 |      0      
largemem          |      0 |      0       |       0 |      0      
core              |     50 |   6400       |       0 |      0      
gpu               |      0 |      0 (0)   |       0 |      0 (0)  
----------------------------------------------------------------------
Job Status at 2023-01-29 17:40:12

"core (Run/Queue/Hold/RunLim)" in User/Group Stat is a CPU cores limit for jobtype=core/gpu* jobs, where CPU cores used by jobtype=vnode/largemem jobs are not taken into account. For example, in this example, you can use up to 1200 cores in total.

Example: show status of jobs

You can see the latest status of jobs by specifying the "-c" option. (The -l option can be added but is ignored.)

ccfep% jobinfo -c

--------------------------------------------------------------------------------
Queue   Job ID Name            Status CPUs User/Grp       Elaps Node/(Reason)
--------------------------------------------------------------------------------
H       9999900 job0.csh       Run      16  zzz/---    24:06:10 ccc047
H       9999901 job1.csh       Run      16  zzz/---    24:03:50 ccc003
H       9999902 job2.sh        Run       6  zzz/---     0:00:36 ccc091
H       9999903 job3.sh        Run       6  zzz/---     0:00:36 ccc091
H       9999904 job4.sh        Run       6  zzz/---     0:00:36 ccc090
...
H       9999989 job89.sh       Run       1  zzz/---     0:00:11 ccg013
H       9999990 job90.sh       Run       1  zzz/---     0:00:12 ccg010
--------------------------------------------------------------------------------

If you do not specify "-c", some more details (jobtype and number of GPUs) are also shown. The information may be slightly old, though (usually by 2-3 minutes).

ccfep% jobinfo
--------------------------------------------------------------------------------
Queue   Job ID Name            Status CPUs User/Grp       Elaps Node/(Reason)
--------------------------------------------------------------------------------
H(c)    9999900 job0.csh       Run      16  zzz/zz9    24:06:10 ccc047
H(c)    9999901 job1.csh       Run      16  zzz/zz9    24:03:50 ccc003
H(c)    9999902 job2.sh        Run       6  zzz/zz9     0:00:36 ccc091
H(c)    9999903 job3.sh        Run       6  zzz/zz9     0:00:36 ccc091
H(c)    9999904 job4.sh        Run       6  zzz/zz9     0:00:36 ccc090
...
H(g)    9999989 job89.sh       Run     1+1  zzz/zz9     0:00:11 ccg013
H(g)    9999990 job90.sh       Run     1+1  zzz/zz9     0:00:12 ccg010
--------------------------------------------------------------------------------


Example: show working directory

In case you forget where you ran your jobs, try the "-w" option. The working directories will be shown as below.

ccfep% jobinfo -w
--------------------------------------------------------------------------------
Queue   Job ID Name            Status Workdir
--------------------------------------------------------------------------------
H      9999920 H_12345.sh      Run    /home/users/zzz/gaussian/mol23
H      9999921 H_23456.sh      Run    /home/users/zzz/gaussian/mol74
...

(The "-c" option cannot be used together with "-w".)


Submit Your Jobs

Description of the header part

You have to write a job script to submit your job. Examples are shown below.

  • csh, bash (/bin/sh), zsh can be used for the job submission script.
  • Lines starting with #PBS are common to all shell types.
  • Sample scripts can be found in ccfep:/apl/(application name)/samples/.
  • The first line (Required; choose one):
      (csh)  #!/bin/csh -f
      (bash) #!/bin/sh
      (zsh)  #!/bin/zsh
  • Needed number of CPUs (Required):
      #PBS -l select=[Nnode:]ncpus=Ncore:mpiprocs=Nproc:ompthreads=Nthread:jobtype=Jobtype[:ngpus=Ngpu]
  • Wall time (Required):
      #PBS -l walltime=72:00:00
  • Mail at start and end (Optional):
      #PBS -m abe
  • Prevent rerun (Optional):
      #PBS -r n
  • Change to the submitted directory (Recommended):
      cd ${PBS_O_WORKDIR}
  • Nnode: # of physical or virtual nodes (vnodes)
  • Ncore: # of reserved cores per physical or virtual node
    • largemem, vnode: 64 (for a virtual node) or 128 (for a physical node)
  • Nproc: # of processes per node
  • Nthread: # of threads per process
  • Jobtype: largemem, vnode, core, gpu, gpup, gpuv
    • largemem: 7.875 GB / core
    • vnode: 1.875 GB / core
    • core: jobs using 63 cores or less
    • gpu: GPU jobs
    • gpu: GPU jobs
  • Ngpu: # of GPUs

example 1: use 5 nodes (640 (128*5) cores, 320 (64*5) MPI)

#PBS -l select=5:ncpus=128:mpiprocs=64:ompthreads=2

example 2: use 10 vnodes (64 cores each) (640 (64*10) cores, 320 (32*10) MPI)

#PBS -l select=10:ncpus=64:mpiprocs=32:ompthreads=2

example 3: 16-core (16 MPI)

#PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1

example 4: 16-core (16 OpenMP) +  1 GPU

#PBS -l select=1:ncpus=16:mpiprocs=1:ompthreads=16:ngpus=1

note: there are 8 GPUs in a node. # of CPU cores per GPU (ncpus/ngpus) must be <= 16.
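
For instance, a 2-GPU request that respects this limit could look like the following (an illustrative variant of example 4; 32 cores / 2 GPUs = 16 cores per GPU):

#PBS -l select=1:ncpus=32:mpiprocs=2:ompthreads=16:ngpus=2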

example 5: 64 cores (1 vnode), large memory node (~500 GB of memory / vnode)

#PBS -l select=1:ncpus=64:mpiprocs=32:ompthreads=2:jobtype=largemem

You can find some other examples on this page.
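
Putting the header parts together, a complete job script might look like the following sketch (the resource values and program path are placeholders):

#!/bin/sh
#PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1
#PBS -l walltime=24:00:00
#PBS -m abe
#PBS -r n
cd ${PBS_O_WORKDIR}
mpirun -np 16 /path/to/your/program input.dat > output.log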

Job submission

After writing the script, type the following command to submit it.

ccfep% jsub [-q HR[0-9]] [-g XXX] [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh

  • If you run too many jobs in a short period of time, a penalty might be imposed.  If you plan to run thousands of jobs in a day, please merge them.
  • If you want to submit your jobs as a member of the 'Supercomputing Consortium for Computational Materials Science' group, use the -g option (XXX is the name of that group).
  • You can describe job dependencies using the -W option, as in the example below.
  • If a job should run only after the preceding job exits successfully, use the keyword "afterok". If it should run after the preceding job finishes regardless of its exit status (including abnormal exit), use "afterany".
  • Sample script files exist in ccfep:/apl/*/samples/.
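
For example, to submit a job that starts only after job 4000000 exits successfully (the job ID and script name here are placeholders):

ccfep% jsub -W depend=afterok:4000000 nextjob.csh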


Sequential Jobs (step jobs)

You can easily submit a sequence of jobs by using the --step or --stepany option.

step job 1 (afterok): run a series of jobs sequentially. If a job exits abnormally, the remaining jobs will be deleted.

ccfep% jsub -q (H|HR[0-9]) [-g XXX] --step [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh script2.csh ...

step job 2 (afterany): run a series of jobs sequentially. When a job finishes, the next one will run regardless of the exit status.

ccfep% jsub -q (H|HR[0-9]) [-g XXX] --stepany [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh script2.csh ...


Example:

ccfep% jsub -q H --stepany job1.csh job2.csh job3.csh


Define variables which can be used in jobscript

You can define variables via the -v option. The argument to this option is a comma-separated list of variable definitions of the form (variable-name)=(value).

ccfep% jsub -v INPUT=myfile.inp,OUTPUT=myout.log script.sh

In this example, $INPUT and $OUTPUT will be set to myfile.inp and myout.log in "script.sh", respectively.
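
A minimal sketch of what "script.sh" might look like when it uses these variables (the select line and program path are placeholders):

#!/bin/sh
#PBS -l select=1:ncpus=4:mpiprocs=1:ompthreads=4
#PBS -l walltime=01:00:00
cd ${PBS_O_WORKDIR}
# $INPUT and $OUTPUT are supplied by "jsub -v INPUT=...,OUTPUT=..."
/path/to/program < ${INPUT} > ${OUTPUT}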

Notes:

  • If -v is specified multiple times, only the last one takes effect.
  • This cannot be used to set ncpus, jobtype, etc. in the select= line of the job script.


Delete Jobs

First, get the ID of the target job with the "jobinfo" command. Then run the following command, where RequestID is the job ID.

ccfep% jdel RequestID

Hold/Release Jobs

You can prevent a queued job from running (in other words, hold it) with the following command. (Use jobinfo to get the target job ID.)

ccfep% jhold RequestID

You can release the restriction using the jrls command.

ccfep% jrls RequestID

Get Information of Finished Jobs

You can get information about finished jobs, such as finish time, elapsed time, and parallel efficiency, with the joblog command.

ccfep% joblog [-d ndays] [-o item1[,item2[,...]]]

If the target period is not specified, information about jobs of the current fiscal year (FY) will be shown. The following options can be used to specify the target period.

  • -d ndays: jobs finished within last "ndays"
    • -d 7 : jobs finished within last 7 days
  • -y year: jobs in FY "year"
    • -y 2021 : jobs in FY 2021 (2021/4-2022/3)
  • -f YYYY[MM[DD[hh[mm]]]] -t YYYY[MM[DD[hh[mm]]]]: from-to specification. (month, day, hour, etc. can be omitted)
    • -f 202107 -t 202108 : jobs finished in July or August of year 2021.

You can choose which items are displayed with the -o option.  Available keywords are:

  • queue: Queue name
  • jobid: Job ID
  • user: User name
  • group: Group name
  • node: Job head node name
  • Node: All node names
  • type: jobtype
  • start: Start time(YYYY/MM/DD HH:MM)
  • Start: Start time(YYYY/MM/DD HH:MM:SS)
  • finish: Finish time(YYYY/MM/DD HH:MM)
  • Finish: Finish time(YYYY/MM/DD HH:MM:SS)
  • elaps: Elapsed time
  • cputime: Total CPU time
  • used_memory: Used memory size
  • ncpu: Number of reserved cpu cores
  • ngpu: Number of reserved gpus
  • nproc: Number of MPI processes
  • nsmp: Number of threads per process
  • peff: Efficiency of job
  • attention: Bad efficiency job
  • command: Job name
  • point: CPU points
  • all: show all


e.g. 1: Show job ID, start and end datetime, CPU points of jobs finished within last 10 days.

ccfep% joblog -d 10 -o jobid,start,finish,point

e.g. 2: Show job ID, end datetime, CPU points, and working directory of jobs in FY2020.

ccfep% joblog -y 2020 -o jobid,finish,point,Workdir

e.g. 3: Show all the parameters of jobs finished within last two days.

ccfep% joblog -d 2 -o all


Build and Run

Command to Build

  • gcc, aocc, NVIDIA HPC SDK are already installed.
  • For Intel oneAPI, only the libraries are installed. Compilers (icc, ifort, icpc, etc.) are not installed. If you need the Intel compilers, please install the oneAPI Base Toolkit and/or HPC Toolkit yourself into your own directory.
  • For gcc, the system default version (8.5) and gcc-toolset versions (9.2, 10.3, and 11.2) are installed. You can use the gcc-toolset compilers via the module command (e.g. module load gcc-toolset/11).

Please also check the package program list page for the available libraries and MPI environments.
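
For example, to build a program with a newer gcc from gcc-toolset (the source and output file names are placeholders):

ccfep% module load gcc-toolset/11
ccfep% gcc --version
ccfep% gcc -O2 -o myprog myprog.c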

How to load my oneAPI environment

The oneAPI Base Toolkit can be downloaded from this page. Compilers, MKL, and MPI are included. Please use the offline or online installer for Linux. You need to install the oneAPI HPC Toolkit if you need the Fortran compilers (ifort, ifx); please download it from this page.

For bash users, the module-based method below works, but simply sourcing ~/intel/oneapi/setvars.sh is easier.
You can also enable an individual oneAPI component, such as the compilers or MKL, by sourcing the "vars.sh" file in that component's directory
(e.g. source ~/intel/oneapi/compiler/latest/env/vars.sh).

Here we introduce a simple way using the module command (there may be other ways to do this).
It is assumed that oneAPI is already installed under ~/intel.

$ cd ~/intel/oneapi
$ sh modulefiles-setup.sh
$ cd modulefiles/
$ module use .
$ module save

This gathers the oneAPI module files into the modulefiles/ directory and registers that directory in the module search path.
The final "module save" command saves the setting. The saved environment will be restored when you log in to the RCCS system.
If you want to use the Intel compilers, you need to run "module load compiler/latest".
(Please check "module avail" for details about packages and their versions.)

You can also load compilers and other packages before "module save".
In that case, the compilers and libraries are available immediately after your next login.

$ cd modulefiles/
$ module use .
$ module load compiler/latest
$ module load mkl/latest
$ module load mpi/latest
$ module save

If you want to remove the saved module environment, please try "module saverm". Note that your saved environment is independent of the system default settings; changes to the system defaults do not affect your environment.

Running Parallel Program

Please specify the correct numbers of MPI processes and OpenMP threads in the "mpiprocs" and "ompthreads" values of the select line in your job script.
E.g., for 12 CPU cores with 4 MPI processes and 3 OpenMP threads each: ncpus=12:mpiprocs=4:ompthreads=3

Please also check the sample job scripts provided by RCCS (located at /apl/(software name)/samples/).

Number of threads specification for OpenMP jobs

When a job is submitted with "jsub", the value specified by "ompthreads" is used as the number of OpenMP threads.
You can override this value by setting the OMP_NUM_THREADS environment variable manually.
If you run a program not via "jsub" (on a frontend node, for example), you should set OMP_NUM_THREADS yourself, as shown below.
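
For example, to run an OpenMP program with 4 threads on a frontend node (the program name is a placeholder):

ccfep% export OMP_NUM_THREADS=4     # bash/zsh (use "setenv OMP_NUM_THREADS 4" in csh)
ccfep% ./myprogram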

Host list specification for MPI

The list of hosts assigned to a job can be found in the file whose name is given by the PBS_NODEFILE environment variable.
The MPI environments installed by RCCS (Intel MPI and OpenMPI) automatically use this environment variable.
Therefore, you can omit the machine file specification when invoking the "mpirun" command.

e.g. 4 MPI * 3 OpenMP hybrid parallel

#!/bin/sh
#PBS -l select=1:ncpus=12:mpiprocs=4:ompthreads=3:jobtype=core
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
mpirun -np 4 /some/where/my/program options

  • The number of OpenMP threads is determined by the "ompthreads" value in the select line (equivalent to OMP_NUM_THREADS=3).
  • The MPI machine file can be omitted when invoking mpirun (if that mpirun was installed by RCCS).
    • You can still pass the file named by the PBS_NODEFILE environment variable as the machine file; this does not change anything.

Environment Modules

  • In job scripts, csh users need to run "source /etc/profile.d/modules.csh" before using the module command (see the example after this list).
    • In the job script, "source /etc/profile.d/modules.csh" is likewise necessary for /bin/bash users.
    • ". /etc/profile.d/modules.sh" is necessary in a /bin/sh or /bin/bash job script if your login shell is /bin/csh.
  • You should add -s to module command in the script. (e.g. module -s load openmpi/3.1.6)
  • You can save current module status by "module save" command. The saved status will be restored automatically upon login.
  • In case "setvars.sh" is loaded in ~/.bashrc, sftp (including WinSCP) failed to connect to ccfep due to the output from that script.
    • You may be able to avoid this error by discarding output from setvars.sh like "source ~/intel/oneapi/setvars.sh >& /dev/null"
    • (Load setvars.sh only if $PS1 is not null should work. Just move that line to ~/.bash_profile may also work.)
  • See this page for detailed information.
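
For example, a csh job script that loads a module might begin like this (the resource values and program are placeholders; openmpi/3.1.6 is taken from the example above):

#!/bin/csh -f
#PBS -l select=1:ncpus=8:mpiprocs=8:ompthreads=1
#PBS -l walltime=12:00:00
source /etc/profile.d/modules.csh
module -s load openmpi/3.1.6
cd ${PBS_O_WORKDIR}
mpirun -np 8 ./myprogram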

Package Programs

  • The installed package programs for each system are listed at https://ccportal.ims.ac.jp/en/installed_applications.
  • You can see a similar list with the "module avail" command. (To quit module avail, press the "q" key or scroll to the bottom of the output.)
  • Sample script files are located in ccfep:/apl/appname/samples/.
  • Installation directories are located at /apl/appname/ on each system.
  • Applications built by the center are listed at https://ccportal.ims.ac.jp/en/how_to_configure with detailed descriptions.
  • Many software packages no longer provide csh setup scripts. We recommend that csh users use the module command.

Requesting installation of software you want to use

Please fill in the following items and send them to rccs-admin[at]ims.ac.jp.

  • Software name and version that you want to use
  • Overview of the software and its feature
  • Necessity of installing it on the RCCS supercomputers
  • URL of the software's development site

Special Commands of RCCS

Commands Related to the Queueing System

Descriptions of "jobinfo", "jsub", "jdel", "jhold", "jrls", and "joblog" can be found above on this page.

Submitting Gaussian Jobs

Case of Gaussian 16

ccfep% g16sub [-q "QUE_NAME"] [-j "jobtype"] [-g "XXX"] [-walltime "hh:mm:ss"] [-noedit] \
              [-rev "g16xxx"] [-np "ncpus"] [-ngpus "n"] [-mem "size"] [-save] [-mail] input_files

  • Command "g09sub" are also available to use Gaussian 09.
  • Default walltime is set to 72 hours.  Please set excepted time for calculation and extra to do long job or run early.
  • If you want to know the meaning of options and examples, please just type "g16sub".
  • %mem, %nproc, %cpu in the original input files will be overwritten by g16sub / g09sub. Please specify those values with -mem, -np etc. command line options.
    • You can prevent this automatic overwrite by -noedit option. But this is not recommended.
    • Safe maximum %mem value will be automatically assigned by g09sub/g16sub. You may not need to set that manually unless you need to reduce the amount of memory.
  • If you want to use large memory node (jobtype=largemem) you need to add "-j largemem".
  • Jobtype=vnode will be used if -np 64 or -np 128 is specified.

basic usage (8 cores, 72 hours)

[user@ccfep somewhere]$ g16sub input.gjf

more cores, longer walltime (16 cores, 168 hours)

[user@ccfep somewhere]$ g16sub -np 16 --walltime 168:00:00 input.gjf

To use formchk, please check this FAQ item.

Job wait time estimator (waitest)

On ccfep, you can estimate the start time of jobs with the "waitest" command. waitest assumes that jobs will run for the full time specified by their "walltime" parameter, so the estimated datetime is a worst-case estimate. On the other hand, other users' jobs submitted later might start earlier than yours. (Many parameters, such as job priority, jobtype, and remedies for small groups, are involved.) You should not expect high accuracy from the estimation. You may also want to check the queue status (jobinfo -s) in addition.

Basic Usage

Estimate start time of submitted job(s):

$ waitest [jobid1] ([jobid2] ...)

Estimate start time of not yet submitted job(s):

$ waitest -s [job script1] ([jobscript2] ...)

Example 1: estimate start time of a submitted job

[user@ccfep2]$ waitest 4923556
Current Date   : 2023-02-15 14:32:30
2023-02-15 14:32:30 ...
2023-02-15 16:40:44 ...
2023-02-15 22:26:07 ...
2023-02-16 00:43:43 ...
2023-02-16 03:03:11 ...
2023-02-16 05:58:00 ...
2023-02-16 11:34:12 ...
Job 4923556 will run at 2023-02-16 13:03:11 on ccc500.
Estimation completed.

Example 2: estimate start time of not yet submitted jobs

[user@ccfep2]$ waitest -s vnode4N1D.sh vnode1N1D.sh

Job Mapping "vnode4N1D.sh" -> jobid=1000000000
Job Mapping "vnode1N1D.sh" -> jobid=1000000001

Current Date   : 2023-09-06 16:43:10
2023-09-06 16:43:10 ...
2023-09-06 18:43:42 ...
2023-09-06 21:19:19 ...
Job 1000000001 will run at 2023-09-06 21:39:34 on ccf013.
2023-09-06 22:02:09 ...
2023-09-07 01:02:14 ...
2023-09-07 03:34:18 ...
Job 1000000000 will run at 2023-09-07 05:28:07 on ccc428,ccc571,ccc356,ccc708.
Estimation completed.

(In some cases, larger jobs run first due to job priority and other parameters.)

Example 3: show periodically estimated wait time for general types of jobs

For some basic types of jobs, wait times are estimated periodically. The results can be accessed via the following command.

[user@ccfep2]$ waitest --showref

Showing Used Resources

ccfep% showlim (-cpu|-c|-disk|-d) [-m]

  • -cpu|-c: Show used CPU points and the limit.
  • -disk|-d: Show current disk usage and the limit.
  • -m: Also show the values for each member.

example 1: show CPU points (approved and used) of YOU and YOUR WHOLE GROUP

ccfep% showlim -c

example 2: show CPU points (approved and used) of YOU, YOUR GROUP MEMBERS, and WHOLE GROUP

ccfep% showlim -c -m

example 3: show disk status (approved and used)  of YOU, YOUR GROUP MEMBERS, and WHOLE GROUP

ccfep% showlim -d -m


Utility Commands for Batch Jobs

Limiting the walltime of a command

  • The ps_walltime command formerly provided by RCCS has been removed.  Please use the timeout command instead.

Showing statistics of the current job

  • The jobstatistic command formerly provided by RCCS has been removed.  Please use the joblog command instead.

Manipulation of Files on Computation Nodes

Local files on computation nodes cannot be accessed directly from the frontend nodes (ccfep), but you can access them via the "remsh" command.

remsh hostname command options

  • hostname: Hostname such as cccc???, cccca???, ccnn???, or ccnf???.
  • command: Command to be executed on the node. Acceptable commands are ls, cat, cp, head, tail, and find.
  • options: Options of command.

Example: how user "zzz" can access the RAM disk of computation node ccnnXXX

remsh ccnnXXX ls /ramd/users/zzz

remsh ccnnXXX tail /ramd/users/zzz/99999/fort.10

Host names and job IDs of your jobs can be found in the output of the "jobinfo" command (see above for usage).

Inquiry

See https://ccportal.ims.ac.jp/en/contact.