Status of Jobs (jobinfo)

(Last update: Dec 3, 2024; added a description of the "sncore" limit values shown by "jobinfo -s")

The status of submitted jobs can be checked with the "jobinfo" command.

Show Job Status ("jobinfo -c" and "jobinfo")

When the -c option is added, the latest status of your jobs is shown. (You can also add the -l option, as on the previous system; it does nothing and is accepted only for compatibility.)

$ jobinfo -c

--------------------------------------------------------------------------------
Queue   Job ID Name            Status CPUs User/Grp       Elaps Node/(Reason)
--------------------------------------------------------------------------------
H       9999900 job0.csh       Run      16  zzz/---    24:06:10 ccc047
H       9999901 job1.csh       Run      16  zzz/---    24:03:50 ccc003
H       9999902 job2.sh        Run       6  zzz/---     0:00:36 ccc091
H       9999903 job3.sh        Run       6  zzz/---     0:00:36 ccc091
H       9999904 job4.sh        Run       6  zzz/---     0:00:36 ccc090
...
H       9999989 job89.sh       Run       1  zzz/---     0:00:11 ccg013
H       9999990 job90.sh       Run       1  zzz/---     0:00:12 ccg010
--------------------------------------------------------------------------------

If -c is omitted, slightly older but more detailed information is shown, such as the number of GPUs ("1+1" in the CPUs column means 1 CPU core and 1 GPU) and the group name. The jobtype is also appended to the queue name column: (c) core, (v) vnode, (l) largemem, (g) gpu.

$ jobinfo
--------------------------------------------------------------------------------
Queue   Job ID Name            Status CPUs User/Grp       Elaps Node/(Reason)
--------------------------------------------------------------------------------
H(c)    9999900 job0.csh       Run      16  zzz/zz9    24:06:10 ccc047
H(c)    9999901 job1.csh       Run      16  zzz/zz9    24:03:50 ccc003
H(c)    9999902 job2.sh        Run       6  zzz/zz9     0:00:36 ccc091
H(c)    9999903 job3.sh        Run       6  zzz/zz9     0:00:36 ccc091
H(c)    9999904 job4.sh        Run       6  zzz/zz9     0:00:36 ccc090
...
H(g)    9999989 job89.sh       Run     1+1  zzz/zz9     0:00:11 ccg013
H(g)    9999990 job90.sh       Run     1+1  zzz/zz9     0:00:12 ccg010
--------------------------------------------------------------------------------
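For scripting, the two listings above can be split into fields. The following is a minimal Python sketch, not part of jobinfo itself. It assumes that columns are separated by whitespace, that job names contain no spaces, and that job lines look like the samples above; the names Job, parse_job_line, and parse_jobs are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Jobtype letters appended to the queue name when -c is omitted.
JOBTYPES = {"c": "core", "v": "vnode", "l": "largemem", "g": "gpu"}
QUEUE_RE = re.compile(r"(\w+)(?:\((\w)\))?")

SAMPLE = """\
Queue   Job ID Name            Status CPUs User/Grp       Elaps Node/(Reason)
H(c)    9999900 job0.csh       Run      16  zzz/zz9    24:06:10 ccc047
H(c)    9999904 job4.sh        Run       6  zzz/zz9     0:00:36 ccc090
H(g)    9999989 job89.sh       Run     1+1  zzz/zz9     0:00:11 ccg013
H       9999902 job2.sh        Run       6  zzz/---     0:00:36 ccc091
"""


@dataclass
class Job:
    queue: str
    jobtype: Optional[str]   # None when the listing has no "(x)" suffix (-c output)
    job_id: str
    name: str
    status: str
    ncpus: int
    ngpus: int               # 0 unless the CPUs column has the "N+M" form
    user: str
    group: Optional[str]     # None when the group is shown as "---"
    elapsed: str
    node_or_reason: str


def parse_job_line(line: str) -> Optional[Job]:
    """Return a Job for a job line, or None for headers, rules, and '...'."""
    fields = line.split()
    if len(fields) != 8 or not fields[1].isdigit():
        return None
    queue_col, job_id, name, status, cpus, user_grp, elapsed, last = fields
    match = QUEUE_RE.fullmatch(queue_col)
    if match is None:
        return None
    cpu_part, _, gpu_part = cpus.partition("+")      # "1+1" -> 1 core, 1 GPU
    user, _, group = user_grp.partition("/")
    return Job(
        queue=match.group(1),
        jobtype=JOBTYPES.get(match.group(2)),
        job_id=job_id,
        name=name,
        status=status,
        ncpus=int(cpu_part),
        ngpus=int(gpu_part) if gpu_part else 0,
        user=user,
        group=None if group in ("", "---") else group,
        elapsed=elapsed,
        node_or_reason=last,
    )


def parse_jobs(text: str) -> List[Job]:
    return [job for job in map(parse_job_line, text.splitlines()) if job]


jobs = parse_jobs(SAMPLE)
for job in jobs:
    print(job.job_id, job.jobtype, job.ncpus, job.ngpus, job.node_or_reason)
```

The parser keeps the elapsed time as a string and silently skips any line that does not have exactly eight fields, so lines with a different layout would need extra handling.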

Job Status

In the job status column, one of the following values is shown.

  • Run : job is running now.
  • Queue : job is waiting to run.
  • Hold : job is held, for example because of a job dependency.
  • Array : array job that has not finished yet.
  • Exit : job process has finished. Termination processing is being performed by the system.

Reason for Waiting

When a job is waiting to run, the reason for waiting is shown in the rightmost column.

  • (cpu) : not enough CPU cores
  • (gpu) : not enough GPUs
  • (cpu/gpu) : not enough resources (CPU and/or GPU etc.)
  • (long) : too long; this job won't run until the next maintenance has finished
    • If you submit long jobs before the scheduled maintenance and leave them in this status, they will run after the maintenance. (Please be careful about the walltime setting.)
  • (group) : won't run due to the resource (usually CPU or GPU) limit for the group
  • (user) : won't run due to the resource (usually CPU or GPU) limit for you
  • (other) : other reasons. Jobs usually show this value immediately after submission.
  • (njob) : won't run due to the limit on the number of jobs
  • (never) : won't run. If you have questions about this, please contact us with the ID of the job.
  • (error), (close) : something is probably wrong with the system. Please wait for a while. If this state persists, please contact us.
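When monitoring jobs from a script, the reason codes listed above can be kept in a lookup table. The following is a small Python sketch, assuming that the reason appears in parentheses in the last column, as the "Node/(Reason)" header suggests; WAIT_REASONS and waiting_reason are hypothetical names, and the descriptions are copied from the list above.

```python
import re
from typing import Optional

# Reason codes and descriptions taken from the list above.
WAIT_REASONS = {
    "cpu": "not enough CPU cores",
    "gpu": "not enough GPUs",
    "cpu/gpu": "not enough resources (CPU and/or GPU etc.)",
    "long": "too long; won't run until the next maintenance has finished",
    "group": "resource limit for the group",
    "user": "resource limit for the user",
    "other": "other reasons (usual right after submission)",
    "njob": "limit on the number of jobs",
    "never": "won't run; contact support with the job ID",
    "error": "possible system problem; wait, then contact support",
    "close": "possible system problem; wait, then contact support",
}


def waiting_reason(last_column: str) -> Optional[str]:
    """Describe a '(reason)' value; return None for a node name such as ccc047."""
    match = re.fullmatch(r"\((.+)\)", last_column.strip())
    if match is None:
        return None
    # Codes not in the table are reported as unknown rather than guessed.
    return WAIT_REASONS.get(match.group(1), "unknown reason code")


for value in ("(cpu/gpu)", "(njob)", "ccc047"):
    print(value, "->", waiting_reason(value))
```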

Show Queue Usage and Availability Status (jobinfo -s)

"jobinfo -s" shows the number of running, waiting, and held jobs together with the limits on the number of jobs, CPU cores, GPUs, and so on. It also shows queue usage and availability, where "week jobs" and "long jobs" mean jobs of up to one week and jobs of more than one week, respectively.

$ jobinfo -s

User/Group Stat:
--------------------------------------------------------------------------------
 queue: H                       | user(***)             | group(***)           
--------------------------------------------------------------------------------
   NJob (Run/Queue/Hold/RunLim) |    1/    0/    0/-    |    1/    0/    0/5000
   CPUs (Run/Queue/Hold/RunLim) |    4/    0/    0/-    |    4/    0/    0/6400
   GPUs (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/48
   core (Run/Queue/Hold/RunLim) |    4/    0/    0/1200 |    4/    0/    0/1200
   lmem (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/896
 sncore (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/6400
--------------------------------------------------------------------------------
note: "core" limit is for per-core assignment jobs (jobtype=core/gpu*)
note: "lmem" limit is for jobtype=largemem
note: "sncore" limit is for 64 or 128 cores jobs

Queue Status (H):
----------------------------------------------------------------------
      job         | free   |     free     | # jobs  |  requested  
      type        | nodes  | cores (gpus) | waiting | cores (gpus)
----------------------------------------------------------------------
week jobs
----------------------------------------------------------------------
1-4 vnodes        |    705 |  90240       |       0 |      0      
5+  vnodes        |    505 |  64640       |       0 |      0      
largemem          |      0 |      0       |       0 |      0      
core              |    179 |  23036       |       0 |      0      
gpu               |      0 |      0 (0)   |       0 |      0 (0)  
----------------------------------------------------------------------
long jobs
----------------------------------------------------------------------
1-4 vnodes        |    325 |  41600       |       0 |      0      
5+  vnodes        |    225 |  28800       |       0 |      0      
largemem          |      0 |      0       |       0 |      0      
core              |     50 |   6400       |       0 |      0      
gpu               |      0 |      0 (0)   |       0 |      0 (0)  
----------------------------------------------------------------------
Job Status at 2023-01-29 17:40:12

The first part shows the usage status for you and for your group: the number of jobs (NJob), the number of CPU cores in use (CPUs), and the number of GPUs in use (GPUs). Each line gives four values for the resource: running / queued (waiting) / held / limit.

In the example output above, this group can use 6,400 CPU cores and 48 GPUs.

The "core (Run/Queue/Hold/RunLim)" line is the usage status and limit for per-core jobs (jobtype=core, gpu). In the example above, you can use up to 1,200 cores with jobtype=core (ncpus<64) or gpu (ngpus>0) jobs.

The "lmem (Run/Queue/Hold/RunLim)" line is the usage status and limit for jobtype=largemem. In the example above, your group can use up to 896 cores (7 nodes) with jobtype=largemem.

The "sncore (Run/Queue/Hold/RunLim)" line is the usage status and limit for single-node jobs whose total ncpus value is 64 or 128. The limit equals the CPUs limit when there are enough free nodes. Conversely, when there are not enough free nodes, the limit becomes smaller. Please watch this value if you plan to run many ncpus=64 or ncpus=128 jobs. Groups with fewer resources (CPUs limit is 768) are not affected by the "sncore" limit; for them the limit always equals the CPUs limit.
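The Run/Queue/Hold/RunLim values can also be read by a script. The following Python sketch parses the first part of the sample output above and computes how much of each limit is still unused. It assumes that "remaining" simply means RunLim minus Run and that "-" means no limit is displayed; Counts, parse_counts, and parse_usage are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional

SAMPLE = """\
 queue: H                       | user(***)             | group(***)
   NJob (Run/Queue/Hold/RunLim) |    1/    0/    0/-    |    1/    0/    0/5000
   CPUs (Run/Queue/Hold/RunLim) |    4/    0/    0/-    |    4/    0/    0/6400
   GPUs (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/48
   core (Run/Queue/Hold/RunLim) |    4/    0/    0/1200 |    4/    0/    0/1200
   lmem (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/896
 sncore (Run/Queue/Hold/RunLim) |    0/    0/    0/-    |    0/    0/    0/6400
"""


class Counts(NamedTuple):
    run: int
    queue: int
    hold: int
    limit: Optional[int]  # None when the column shows "-"

    def headroom(self) -> Optional[int]:
        """Assumed meaning: limit minus running amount; None without a limit."""
        return None if self.limit is None else self.limit - self.run


def parse_counts(cell: str) -> Counts:
    run, queue, hold, limit = (part.strip() for part in cell.split("/"))
    return Counts(int(run), int(queue), int(hold),
                  None if limit == "-" else int(limit))


def parse_usage(text: str) -> Dict[str, Dict[str, Counts]]:
    usage = {}
    for line in text.splitlines():
        # Only the resource lines carry this marker; the header line does not.
        if "(Run/Queue/Hold/RunLim)" not in line:
            continue
        label, user_cell, group_cell = line.split("|")
        usage[label.split()[0]] = {
            "user": parse_counts(user_cell),
            "group": parse_counts(group_cell),
        }
    return usage


usage = parse_usage(SAMPLE)
for name, row in usage.items():
    print(name, "user:", row["user"].headroom(), "group:", row["group"].headroom())
```

With the sample values, the group has 6400 - 4 = 6396 CPU cores left under its limit, and the user has 1200 - 4 = 1196 cores left for per-core jobs.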

The second part of the output shows the availability for each jobtype.
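The availability table can be read in the same way. The following Python sketch assumes the layout of the sample output above: cells separated by "|", sections introduced by "week jobs" and "long jobs" lines, and GPU counts in parentheses. The names parse_queue_status and split_cores_gpus are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

SAMPLE = """\
week jobs
----------------------------------------------------------------------
1-4 vnodes        |    705 |  90240       |       0 |      0
5+  vnodes        |    505 |  64640       |       0 |      0
largemem          |      0 |      0       |       0 |      0
core              |    179 |  23036       |       0 |      0
gpu               |      0 |      0 (0)   |       0 |      0 (0)
----------------------------------------------------------------------
long jobs
----------------------------------------------------------------------
1-4 vnodes        |    325 |  41600       |       0 |      0
core              |     50 |   6400       |       0 |      0
gpu               |      0 |      0 (0)   |       0 |      0 (0)
"""

CELL_RE = re.compile(r"(\d+)(?:\s*\((\d+)\))?")


def split_cores_gpus(cell: str) -> Tuple[int, Optional[int]]:
    """'90240' -> (90240, None); '0 (0)' -> (0, 0)."""
    match = CELL_RE.fullmatch(cell)
    if match is None:
        raise ValueError(f"unexpected cell: {cell!r}")
    gpus = match.group(2)
    return int(match.group(1)), None if gpus is None else int(gpus)


def parse_queue_status(text: str) -> Dict[str, Dict[str, dict]]:
    table: Dict[str, Dict[str, dict]] = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("week jobs", "long jobs"):
            section = stripped
            table[section] = {}
            continue
        cells = [cell.strip() for cell in line.split("|")]
        # Skip rules, column headers, and anything before the first section.
        if section is None or len(cells) != 5 or not cells[1].isdigit():
            continue
        jobtype, free_nodes, free_cores, waiting, requested = cells
        table[section][jobtype] = {
            "free_nodes": int(free_nodes),
            "free_cores": split_cores_gpus(free_cores),
            "waiting_jobs": int(waiting),
            "requested": split_cores_gpus(requested),
        }
    return table


status = parse_queue_status(SAMPLE)
print(status["week jobs"]["1-4 vnodes"])
```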

Show Working Directories of Jobs (jobinfo -w)

The working directory of your jobs ($PBS_O_WORKDIR) can be shown by adding the "-w" option. This is useful when you cannot find the job you are looking for by its name or ID.

$ jobinfo -w
--------------------------------------------------------------------------------
Queue   Job ID Name            Status Workdir
--------------------------------------------------------------------------------
H      9999920 H_12345.sh      Run    /home/users/zzz/gaussian/mol23
H      9999921 H_23456.sh      Run    /home/users/zzz/gaussian/mol74
...

This option is not compatible with "-c". If a job has just been submitted, you may have to wait a few minutes before it can be displayed.

Show Memory Usage of Jobs (jobinfo -m)

You can see the memory usage of jobs by adding the "-m" option. The "Used.Mem/MB" column shows the maximum memory usage of the job (in MB). The "Lim.Mem/MB" column shows the amount of memory available to that job (memory used by system processes is allocated separately).

$ jobinfo -m
--------------------------------------------------------------------------------
Queue   Job ID Name            Status CPUs User/Grp   Used.Mem/MB    Lim.Mem/MB
--------------------------------------------------------------------------------
H(v)    6245590 sample.sh      Run     128  ***/???         66520        245760
H(v)    6245593 sample.sh      Run     128  ***/???         78918        245760

This option is not compatible with "-c". If a job has just been submitted, you may have to wait a few minutes before it can be displayed. The memory usage of finished jobs can be checked with the "joblog" command.
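To see at a glance how close each job is to its memory limit, the two memory columns can be turned into a ratio. The following Python sketch assumes whitespace-separated columns and job names without spaces, as in the sample above; memory_usage is a hypothetical name.

```python
from typing import Dict, Tuple

SAMPLE = """\
Queue   Job ID Name            Status CPUs User/Grp   Used.Mem/MB    Lim.Mem/MB
H(v)    6245590 sample.sh      Run     128  ***/???         66520        245760
H(v)    6245593 sample.sh      Run     128  ***/???         78918        245760
"""


def memory_usage(text: str) -> Dict[str, Tuple[int, int, float]]:
    """Map job ID -> (used MB, limit MB, used/limit) for each job line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Job lines have 8 fields with a numeric job ID and numeric memory columns.
        if len(fields) != 8 or not fields[1].isdigit():
            continue
        used, limit = fields[-2], fields[-1]
        if not (used.isdigit() and limit.isdigit()) or int(limit) == 0:
            continue  # values not available yet, for example right after submission
        result[fields[1]] = (int(used), int(limit), int(used) / int(limit))
    return result


mem = memory_usage(SAMPLE)
for job_id, (used_mb, limit_mb, ratio) in mem.items():
    print(f"{job_id}: {used_mb} / {limit_mb} MB ({ratio:.1%})")
```

For the two sample jobs the ratios are about 27% and 32% of the available memory.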

Options of jobinfo

Option            Description
--help            show help message.
-c                show the latest information. Not compatible with the -m, -w, or -s option.
-s                show queue usage and availability. Not compatible with other options.
-w                show working directories of jobs. Not compatible with "-c" etc.
-m                show memory usage status. Not compatible with "-c" etc.
-l                do nothing. This option remains only to keep compatibility.
-L                show the full list of computation nodes.
-n                show the status of nodes.
-g                also show jobs of group members.
-g [group name]   if you belong to multiple groups, you can specify the group name with this option.
-q [queue name]   specify the queue. Usually you don't need to add this.
-a                show all jobs. User names and job names are hidden.
[job id]          if used in combination with "-c", only the specified jobs are shown.