Please prepare an SSH public/private key pair. If you are not familiar with the procedure, please look it up on the internet.
CPU points are consumed when you use CPUs or GPUs.
The queue factors for each system are determined as follows.
System | CPU Queue Factor | GPU Queue Factor |
---|---|---|
ccap (jobtype=largemem) | 60 points / (1 vnode * 1 hour) | - |
ccap (jobtype=vnode) | 45 points / (1 vnode * 1 hour) | - |
ccap (jobtype=core) | 1 point / (1 core * 1 hour) | - |
ccap (jobtype=gpu) | 1 point / (1 core * 1 hour) | 60 points / (1 GPU * 1 hour) |
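For example (hypothetical jobs, for illustration only): a jobtype=core job that uses 16 cores for 10 hours consumes 16 * 1 * 10 = 160 points, while a jobtype=gpu job that uses 6 cores and 1 GPU for 10 hours consumes 6 * 1 * 10 + 1 * 60 * 10 = 660 points.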
If you want to know your current CPU points, run "showlim -c".
Access the "Limiting Resources Page" with your web browser.
System | Class (Jobtype) | Node | Memory | Limitation for a job | # of cores/gpus per group | # of jobs |
---|---|---|---|---|---|---|
ccap | H (largemem) | TypeF | 7.875GB/core | 1-14 vnodes (64-896 cores) | depends on assigned points (see the table below) | 1,000 |
ccap | H (vnode) | TypeC | 1.875GB/core | 1-50 vnodes (64-3,200 cores) | (same as above) | (same as above) |
ccap | H (core) | TypeC | 1.875GB/core | 1-63 cores | (same as above) | (same as above) |
ccap | PN (gpu) | TypeG | 1.875GB/core | 1-48 gpus, 1-16 cores/gpu | (same as above) | (same as above) |

The number of cores and gpus available to a group depends on the points assigned to it:

Assigned points | # of cores / # of gpus per group |
---|---|
7,200,000 - | 9,600 / 64 |
2,400,000 - | 6,400 / 42 |
720,000 - | 4,096 / 28 |
240,000 - | 3,200 / 12 |
- 240,000 | 768 / 8 |
The queue class settings are as follows.
System | Class | Node | Wall Time | Memory | # of cores per job | # of cores per group |
---|---|---|---|---|---|---|
ccap | (occupy) | TypeC | 7 days | 4.4GB/core | ask us | allowed number of cores |
ccfep% jobinfo [-h HOST] [-q QUEUE] [-c|-s|-m|-w] [-n] [-g GROUP|-a]
One of those options can be added.
Usually you don't need to specify this, since all queues available to users (PN and PNR[0-9]) are the default targets.
The User/Group Stat section of "jobinfo -s" shows the numbers of jobs, CPU cores, and GPUs, together with the limits on those resources. The Queue Status section shows the number of waiting jobs and the available nodes/cores/gpus.
ccfep% jobinfo -s
User/Group Stat:
--------------------------------------------------------------------------------
queue: H | user(***) | group(***)
--------------------------------------------------------------------------------
NJob (Run/Queue/Hold/RunLim) | 1/ 0/ 0/- | 1/ 0/ 0/6400
CPUs (Run/Queue/Hold/RunLim) | 4/ 0/ 0/- | 4/ 0/ 0/6400
GPUs (Run/Queue/Hold/RunLim) | 0/ 0/ 0/- | 0/ 0/ 0/ 48
core (Run/Queue/Hold/RunLim) | 4/ 0/ 0/1200 | 4/ 0/ 0/1200
--------------------------------------------------------------------------------
note: "core" limit is for per-core assignment jobs (jobtype=core/gpu*)Queue Status (H):
----------------------------------------------------------------------
job | free | free | # jobs | requested
type | nodes | cores (gpus) | waiting | cores (gpus)
----------------------------------------------------------------------
week jobs
----------------------------------------------------------------------
1-4 vnodes | 705 | 90240 | 0 | 0
5+ vnodes | 505 | 64640 | 0 | 0
largemem | 0 | 0 | 0 | 0
core | 179 | 23036 | 0 | 0
gpu | 0 | 0 (0) | 0 | 0 (0)
----------------------------------------------------------------------
long jobs
----------------------------------------------------------------------
1-4 vnodes | 325 | 41600 | 0 | 0
5+ vnodes | 225 | 28800 | 0 | 0
largemem | 0 | 0 | 0 | 0
core | 50 | 6400 | 0 | 0
gpu | 0 | 0 (0) | 0 | 0 (0)
----------------------------------------------------------------------
Job Status at 2023-01-29 17:40:12
"core (Run/Queue/Hold/RunLim)" in User/Group Stat is a CPU cores limit for jobtype=core/gpu* jobs, where CPU cores used by jobtype=vnode/largemem jobs are not taken into account. For example, in this example, you can use up to 1200 cores in total.
You can see the latest status of jobs by specifying the "-c" option. (The -l option can be added but is ignored.)
ccfep% jobinfo -c
--------------------------------------------------------------------------------
Queue Job ID Name Status CPUs User/Grp Elaps Node/(Reason)
--------------------------------------------------------------------------------
H 9999900 job0.csh Run 16 zzz/--- 24:06:10 ccc047
H 9999901 job1.csh Run 16 zzz/--- 24:03:50 ccc003
H 9999902 job2.sh Run 6 zzz/--- 0:00:36 ccc091
H 9999903 job3.sh Run 6 zzz/--- 0:00:36 ccc091
H 9999904 job4.sh Run 6 zzz/--- 0:00:36 ccc090
...
H 9999989 job89.sh Run 1 zzz/--- 0:00:11 ccg013
H 9999990 job90.sh Run 1 zzz/--- 0:00:12 ccg010
--------------------------------------------------------------------------------
If you don't specify "-c", some more details (jobtype and number of GPUs) are also shown. However, the information may be slightly out of date (usually by 2-3 minutes).
ccfep% jobinfo
--------------------------------------------------------------------------------
Queue Job ID Name Status CPUs User/Grp Elaps Node/(Reason)
--------------------------------------------------------------------------------
H(c) 9999900 job0.csh Run 16 zzz/zz9 24:06:10 ccc047
H(c) 9999901 job1.csh Run 16 zzz/zz9 24:03:50 ccc003
H(c) 9999902 job2.sh Run 6 zzz/zz9 0:00:36 ccc091
H(c) 9999903 job3.sh Run 6 zzz/zz9 0:00:36 ccc091
H(c) 9999904 job4.sh Run 6 zzz/zz9 0:00:36 ccc090
...
H(g) 9999989 job89.sh Run 1+1 zzz/zz9 0:00:11 ccg013
H(g) 9999990 job90.sh Run 1+1 zzz/zz9 0:00:12 ccg010
--------------------------------------------------------------------------------
In case you forget where you ran your jobs, try the "-w" option. The working directories will be shown as below.
ccfep% jobinfo -w
--------------------------------------------------------------------------------
Queue Job ID Name Status Workdir
--------------------------------------------------------------------------------
H 9999920 H_12345.sh Run /home/users/zzz/gaussian/mol23
H 9999921 H_23456.sh Run /home/users/zzz/gaussian/mol74
...
(You can't use "-c" in this case.)
You have to write a job script (in csh, bash, or zsh) to submit your job. An example for each system is shown below.
Meaning | Header part | Importance |
---|---|---|
The First Line | (csh) #!/bin/csh -f / (bash) #!/bin/sh / (zsh) #!/bin/zsh | Required (choose one) |
Needed Number of CPU | #PBS -l select=[Nnode:]ncpus=Ncore:mpiprocs=Nproc:ompthreads=Nthread:jobtype=Jobtype[:ngpus=Ngpu] | Required |
Wall Time | #PBS -l walltime=72:00:00 | Required |
Mail at Start and End | #PBS -m abe | Optional |
Prevent Rerun | #PBS -r n | Optional |
Change to Submitted Directory | cd ${PBS_O_WORKDIR} | Recommended |

Examples of the select line:

#PBS -l select=3:ncpus=64:mpiprocs=64:ompthreads=1:jobtype=vnode
#PBS -l select=1:ncpus=6:mpiprocs=1:ompthreads=1:jobtype=gpu:ngpus=1
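Putting the required and recommended lines together, a minimal job script might look like the following sketch (the resource values and the program path are placeholders, not an actual RCCS program):

#!/bin/csh -f
#PBS -l select=1:ncpus=64:mpiprocs=64:ompthreads=1:jobtype=vnode
#PBS -l walltime=24:00:00
# move to the directory where the job was submitted
cd ${PBS_O_WORKDIR}
# run a (placeholder) MPI program with 64 processes
mpirun -np 64 /some/where/my/program options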
You can find some other examples on this page.
After you write the script, type the following command to submit it.
ccfep% jsub -q (H|HR[0-9]) [-g XXX] [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh
If you want to submit your jobs using the points of the 'Supercomputing Consortium for Computational Materials Science' quota, specify the corresponding group with the -g option.
You can describe job dependencies using the -W option.
If a job should run only after the dependent job exits successfully, use the keyword "afterok".
If a job should run after the dependent job finishes regardless of how it exited (including abnormal exit), use the keyword "afterany".
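For example (with a hypothetical job ID), the following submits job2.csh so that it starts only after job 1234567 finishes successfully:

ccfep% jsub -q H -W depend=afterok:1234567 job2.csh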
Sample script files exist in ccfep:/apl/*/samples/.
You can easily submit sequential (step) jobs by using the --step or --stepany option.
ccfep% jsub -q (H|HR[0-9]) [-g XXX] --step [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh script2.csh ...
ccfep% jsub -q (H|HR[0-9]) [-g XXX] --stepany [-W depend=(afterok|afterany):JOBID1[:JOBID2...]] script.csh script2.csh ...
Example:
ccfep% jsub -q H --stepany job1.csh job2.csh job3.csh
You can define environment variables via the -v option. The argument to the option is a comma-separated list of variable definitions in the form (variable name)=(value).
ccfep% jsub -v INPUT=myfile.inp,OUTPUT=myout.log script.sh
In this example, $INPUT and $OUTPUT will be set to myfile.inp and myout.log in "script.sh", respectively.
First, get the ID of the target job using the "jobinfo" command. Then run the following command, where RequestID is that job ID.
ccfep% jdel RequestID
You can prevent a queued job from running (in other words, hold it) with the following command. (Use jobinfo to get the target job ID.)
ccfep% jhold RequestID
You can release the restriction using the jrls command.
ccfep% jrls RequestID
You can get information about finished jobs, such as finish time, elapsed time, and parallel efficiency, with the joblog command.
ccfep% joblog [-d ndays] [-y year] [-o item1[,item2[,...]]]
If the target period is not specified, information about jobs in the current fiscal year will be shown. The following options can be used to specify the target period.
You can customize which items are displayed with the -o option. For example:
ccfep% joblog -d 10 -o jobid,start,finish,point
ccfep% joblog -y 2020 -o jobid,finish,point,Workdir
ccfep% joblog -d 2 -o all
Please also check the package program list page for the available libraries and MPI environments.
Please specify the correct numbers of MPI processes and OpenMP threads in the "mpiprocs" and "ompthreads" values of the select line in your job script.
e.g. for 12 CPUs with 4 MPI processes and 3 OpenMP threads -> ncpus=12:mpiprocs=4:ompthreads=3
Please also check the sample job scripts provided by RCCS (located at /apl/(software name)/samples).
When a job is submitted with "jsub", the value specified by "ompthreads" is used as the number of OpenMP threads.
You can specify or change the value by setting the OMP_NUM_THREADS environment variable manually.
If you run a command not via "jsub" (for example, on a frontend node), you should set the OMP_NUM_THREADS environment variable manually.
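For example, it can be set as follows (3 threads is just an illustrative value):

export OMP_NUM_THREADS=3     # sh/bash/zsh
setenv OMP_NUM_THREADS 3     # csh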
The list of allocated hosts can be found in the file whose name is stored in the PBS_NODEFILE environment variable.
The MPI environments installed by RCCS (Intel MPI and OpenMPI) automatically take this environment variable into account.
Therefore, you can skip specifying a machine file when invoking the "mpirun" command.
e.g. 4 MPI * 3 OpenMP hybrid parallel
#!/bin/sh
#PBS -l select=1:ncpus=12:mpiprocs=4:ompthreads=3:jobtype=core
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
mpirun -np 4 /some/where/my/program options
Environment Modules (the "module" command) has been available since July 2018. See this page for detailed information.
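As a brief illustration (standard Environment Modules subcommands; the module name is a placeholder):

ccfep% module avail                 # list available modules
ccfep% module load (module name)    # load a module into the current environment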
Please fill in the following items and send them to rccs-admin[at]ims.ac.jp.
You can find descriptions about "jobinfo", "jsub", "jdel", "jhold", "jrls", and "joblog" above in this page.
ccfep% g16sub [-q "QUE_NAME"] [-j "jobtype"] [-g "XXX"] [-walltime "hh:mm:ss"] [-noedit] \
[-rev "g16xxx"] [-np "ncpus"] [-ngpus "n"] [-mem "size"] [-save] [-mail] input_files
ccfep% g16sub -rev g16c01 -walltime 24:00:00 -np 12 my_gaussian_input.gjf
To use formchk, please check this FAQ item.
On ccfep, you can estimate the start time of jobs with the "waitest" command. waitest assumes that each job runs for the full time specified by its "walltime" parameter, so the estimated time is a worst-case estimate. On the other hand, other users' jobs submitted later might run earlier than yours (many parameters, such as job priority, jobtype, and remedies for small groups, are involved). You should not expect high accuracy from the estimation. You may also want to check the queue status (jobinfo -s) in addition.
Estimate start time of submitted job(s):
$ waitest [jobid1] ([jobid2] ...)
Estimate start time of not yet submitted job(s):
$ waitest -s [job script1] ([jobscript2] ...)
[user@ccfep2]$ waitest 4923556
Current Date : 2019-10-15 14:32:30
2019-10-15 14:32:30 ...
2019-10-15 16:40:44 ...
2019-10-15 22:26:07 ...
2019-10-16 00:43:43 ...
2019-10-16 03:03:11 ...
2019-10-16 05:58:00 ...
2019-10-16 11:34:12 ...
Job 4923556 will run at 2019-10-16 13:03:11 on ccnn500,ccnn496,ccnn497,ccnn494,ccnn495,ccnn489.
Estimation completed.
[user@ccfep2]$ waitest -s run_small_4N.sh run_small_6N.sh run_small_8N.sh
Current Date : 2019-10-15 14:34:39
Job Mapping "run_small_4N.sh" -> jobid=1000000000
Job Mapping "run_small_6N.sh" -> jobid=1000000001
Job Mapping "run_small_8N.sh" -> jobid=1000000002
2019-10-15 14:34:39 ...
2019-10-15 16:40:52 ...
2019-10-15 22:26:15 ...
2019-10-16 00:43:51 ...
2019-10-16 03:03:18 ...
2019-10-16 05:58:08 ...
2019-10-16 11:34:10 ...
Job 1000000001 will run at 2019-10-16 13:03:18 on ccnn500,ccnn496,ccnn497,ccnn494,ccnn495,ccnn489.
2019-10-16 13:28:41 ...
2019-10-16 16:00:36 ...
2019-10-16 20:52:10 ...
2019-10-17 01:08:27 ...
Job 1000000002 will run at 2019-10-17 03:03:18 on ccnn458,ccnn329,ccnn515,ccnn520,ccnn373,ccnn437,ccnn380,ccnn352.
2019-10-17 03:33:56 ...
2019-10-17 06:43:51 ...
2019-10-17 08:50:03 ...
2019-10-17 11:34:10 ...
2019-10-17 13:03:18 ...
2019-10-17 16:08:16 ...
2019-10-17 18:35:08 ...
2019-10-17 20:36:56 ...
2019-10-17 22:38:49 ...
Job 1000000000 will run at 2019-10-18 00:18:34 on ccnn760,ccnn789,ccnn791,ccnn787.
Estimation completed.
(In some cases, larger jobs may run first due to job priority and other parameters.)
For some basic types of jobs, the wait times are estimated periodically. The results can be accessed via the following command.
[user@ccfep2]$ waitest --showref
ccfep% showlim (-cpu|-c|-disk|-d) [-m]
ccfep% showlim -c
ccfep% showlim -c -m
ccfep% showlim -d -m
You can access local files on the compute nodes, which cannot be accessed directly from the frontend nodes (ccfep), via the "remsh" command.
remsh hostname command options
remsh ccnnXXX ls /ramd/users/zzz
remsh ccnnXXX tail /ramd/users/zzz/99999/fort.10
The host names and job IDs of your jobs can be found in the output of the "jobinfo" command (see above for usage).