Notices on Use
Last update: Jul 12, 2024
User Account
You must not share your user account with anyone. Do not let any other person use your account.
Group members can be added or removed once the corresponding application has been approved.
When Submitting a Large Number of Jobs
Please pay attention to the following points when submitting a large number of jobs (1,000 or more).
E-mail notification
Please disable e-mail notifications (e.g. #PBS -m abe) for massive numbers of jobs. The resulting flood of e-mails can get the mail queue stuck, and those mails might be treated as spam by some servers and mail software.
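As a minimal sketch, the header below shows one way to suppress mail for a PBS job; the resource request line, program name, and log file name are placeholders for illustration, so adjust them to your actual job.

```sh
#!/bin/sh
#PBS -l select=1:ncpus=1
# "-m n" tells PBS never to send mail for this job.
# Avoid "#PBS -m abe" when submitting thousands of jobs.
#PBS -m n

cd "${PBS_O_WORKDIR}"
./my_program > result.log 2>&1
```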
Additional notes about massive short jobs
If you keep submitting as many short jobs (< 1 minute each) as you can, the load on the job scheduler can become extremely heavy. In extreme cases, we will delete your jobs. Please pack several such jobs into one, as in the sketch below.
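A minimal sketch of packing short tasks into a single job; "task.sh", the input/output file names, and the task count are hypothetical placeholders.

```sh
#!/bin/sh
#PBS -m n

cd "${PBS_O_WORKDIR}"

# Run 100 short tasks inside a single job instead of
# submitting 100 separate jobs to the scheduler.
for i in $(seq 1 100); do
    ./task.sh "input_${i}.dat" > "output_${i}.log" 2>&1
done
```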
File System Usage
Avoid creating a massive number of files
A distributed file system is not good at handling numerous files (especially when they are all placed in a single directory).
If you need to keep many small files, you are better off combining them into a single file with the "tar" command, as in the example below.
Moreover, you can save disk space by compressing the combined (archived) file with "gzip" or a similar tool.
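The commands below are a minimal sketch; the directory name "small_files_dir" and the archive name are assumptions for illustration.

```sh
# Pack the many small files into one compressed archive.
tar czf small_files.tar.gz small_files_dir/

# Remove the originals once the archive is verified, reducing the file count.
rm -r small_files_dir/

# Extract the archive later when the individual files are needed again.
tar xzf small_files.tar.gz
```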
If a huge number of files is unavoidable, please contact us. We might be able to work around the issue by modifying the job script or by using tools such as "fuse2fs".
Avoid accessing a single large file from many processes
This type of access can significantly degrade the performance of the entire storage system.
For example, if 500 jobs (processes) try to read a single 100 GB file simultaneously, it can cause trouble for the whole system.
Please avoid this type of file access. If it is unavoidable, please contact us first.
(There may be a way to overcome this. For example, the large file can be spread over several storage targets with Lustre's "migrate" function, as sketched below.)
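A rough sketch, assuming a Lustre client with the "lfs" tool available; the file name and stripe count are illustrative only, so please check with us before applying this to real data.

```sh
# Show the current stripe layout of the large file.
lfs getstripe large_input.dat

# Restripe the file across 8 storage targets (OSTs) so that
# simultaneous reads from many processes are spread over several disks.
lfs migrate -c 8 large_input.dat
```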
Avoid massive writes to stdout/stderr of jobs
The contents of a job's stdout/stderr are first stored on the disk of the computation node(s) and then copied to the specified directory (default: the working directory) after the job terminates. Massive writes to stdout/stderr may cause unexpected job termination or an error during that final copy.
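One common workaround is to redirect verbose program output to a regular file instead of relying on the job's stdout/stderr. This is a sketch only; the program and log file names are hypothetical.

```sh
#!/bin/sh
#PBS -m n

cd "${PBS_O_WORKDIR}"

# Send the verbose output to a normal file in the working directory
# so that the job's own stdout/stderr stays small.
./my_program > my_program.log 2>&1

# Keep the job's stdout to a short summary only.
echo "job ${PBS_JOBID} finished"
```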
Storage Types
The following disk types are available. Use them wisely.
| name | backup | data retention period | disk quota | purpose |
| --- | --- | --- | --- | --- |
| /home | NO | 1 year after the end of use | YES | there are no differences other than the name between /home and /save now |
| /save | NO | 1 year after the end of use | YES | same as /home |
| /gwork | NO | basically, only while the job is running | NO | temporary disk space for jobs |
| /lwork | NO | only while the job is running | YES | temporary disk space for jobs; created as /lwork/users/${USER}/${PBS_JOBID} on the computation node and removed immediately after the job terminates |
- Retention period may be extended if enough disk space is available.
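As a sketch of how /lwork can be used, assuming the per-job directory from the table above is created by the system; the program and file names are placeholders.

```sh
#!/bin/sh
#PBS -m n

# Per-job scratch directory on the computation node (removed after the job ends).
SCRATCH="/lwork/users/${USER}/${PBS_JOBID}"

cd "${PBS_O_WORKDIR}"
cp input.dat "${SCRATCH}/"

cd "${SCRATCH}"
"${PBS_O_WORKDIR}/my_program" input.dat > output.dat 2>&1

# Copy results back before the job terminates, because /lwork is wiped afterwards.
cp output.dat "${PBS_O_WORKDIR}/"
```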
System Trouble and CPU Points
When system trouble (hardware or software) occurs, jobs may be killed or stopped by the system. In this case, CPU points for those jobs will not be consumed.
If a job is restarted after an unexpected system crash, CPU points will be charged only for the restarted run.
Running Programs on Login Servers
You can run short test jobs and data analysis calculations directly on the login servers. For this purpose, the software environment of the login servers is equivalent to that of the computation nodes.
However, there are some restrictions.
- We will kill, without warning, processes that use huge resources for a long time or that cause trouble for other users.
- You cannot expect performance comparable to the computation nodes because of special resource limitations on the login servers.
- Performance evaluation should be done on the computation nodes.