Notices on Use
Last update: Jul 12, 2024
User Account
You must not share your user account with anyone. Do not let any other person use your account.
Group members can be added or removed once the corresponding application has been approved.
When Submitting a Large Number of Jobs
Please pay attention to the following points when submitting a large number of jobs (1,000 or more).
E-mail notification
Please disable e-mail notifications (e.g. #PBS -m abe) for massive numbers of jobs. The resulting flood of e-mails can get the mail queue stuck, and those mails might be treated as spam by some servers and mail software.
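As a minimal sketch, the header below shows one way to suppress mail for a PBS job; the resource request line, program name, and log file name are placeholders for illustration, so adjust them to your actual job.

```sh
#!/bin/sh
#PBS -l select=1:ncpus=1
# "-m n" tells PBS never to send mail for this job.
# Avoid "#PBS -m abe" when submitting thousands of jobs.
#PBS -m n

cd "${PBS_O_WORKDIR}"
./my_program > result.log 2>&1
```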
Additional notes about massive short jobs
If you keep submitting as many short jobs (< 1 minute each) as you can, the load on the job scheduler can become extremely heavy. In extreme cases, we will delete your jobs. Please pack several such jobs into one, as in the sketch below.
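A minimal sketch of packing short tasks into a single job; "task.sh", the input/output file names, and the task count are hypothetical placeholders.

```sh
#!/bin/sh
#PBS -m n

cd "${PBS_O_WORKDIR}"

# Run 100 short tasks inside a single job instead of
# submitting 100 separate jobs to the scheduler.
for i in $(seq 1 100); do
    ./task.sh "input_${i}.dat" > "output_${i}.log" 2>&1
done
```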
File System Usage
Avoid creating a massive number of files
A distributed file system is not good at handling numerous files (especially when they are all placed in a single directory).
If you need to keep many small files, you are better off combining them into a single file with the "tar" command, as in the example below.
Moreover, you can save disk space by compressing the combined (archived) file with "gzip" or a similar tool.
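The commands below are a minimal sketch; the directory name "small_files_dir" and the archive name are assumptions for illustration.

```sh
# Pack the many small files into one compressed archive.
tar czf small_files.tar.gz small_files_dir/

# Remove the originals once the archive is verified, reducing the file count.
rm -r small_files_dir/

# Extract the archive later when the individual files are needed again.
tar xzf small_files.tar.gz
```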
If a huge number of files is unavoidable, please contact us. We might be able to work around the issue by modifying the job script or by using tools such as "fuse2fs".
Avoid accessing a single large file from many processes
This type of access can significantly degrade the performance of the entire storage system.
For example, if 500 jobs (processes) try to read a single 100 GB file simultaneously, it can cause trouble for the whole system.
Please avoid this type of file access. If it is unavoidable, please contact us first.
(There may be a way to overcome this. For example, the large file can be spread over several storage targets with Lustre's "migrate" function, as sketched below.)
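A rough sketch, assuming a Lustre client with the "lfs" tool available; the file name and stripe count are illustrative only, so please check with us before applying this to real data.

```sh
# Show the current stripe layout of the large file.
lfs getstripe large_input.dat

# Restripe the file across 8 storage targets (OSTs) so that
# simultaneous reads from many processes are spread over several disks.
lfs migrate -c 8 large_input.dat
```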
Avoid massive writes to stdout/stderr of jobs
The contents of a job's stdout/stderr are first stored on the disk of the computation node(s) and then copied to the specified directory (default: the working directory) after the job terminates. Massive writes to stdout/stderr may cause unexpected job termination or an error during that final copy.
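One common workaround is to redirect verbose program output to a regular file instead of relying on the job's stdout/stderr. This is a sketch only; the program and log file names are hypothetical.

```sh
#!/bin/sh
#PBS -m n

cd "${PBS_O_WORKDIR}"

# Send the verbose output to a normal file in the working directory
# so that the job's own stdout/stderr stays small.
./my_program > my_program.log 2>&1

# Keep the job's stdout to a short summary only.
echo "job ${PBS_JOBID} finished"
```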
Storage Types
The following disk types are available. Use them wisely.
| name | backup | data retention period | disk quota | purpose |
| --- | --- | --- | --- | --- |
| /home | NO | 1 year after the end of use | YES | there are no differences other than the name between /home and /save now |
| /save | NO | 1 year after the end of use | YES | same as /home |
| /gwork | NO | basically, only while the job is running | NO | temporary disk space for jobs |
| /lwork | NO | only while the job is running | YES | temporary disk space for jobs; created as /lwork/users/${USER}/${PBS_JOBID} on the computation node and removed immediately after the job terminates |
- Retention period may be extended if enough disk space is available.
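As a sketch of how /lwork can be used, assuming the per-job directory from the table above is created by the system; the program and file names are placeholders.

```sh
#!/bin/sh
#PBS -m n

# Per-job scratch directory on the computation node (removed after the job ends).
SCRATCH="/lwork/users/${USER}/${PBS_JOBID}"

cd "${PBS_O_WORKDIR}"
cp input.dat "${SCRATCH}/"

cd "${SCRATCH}"
"${PBS_O_WORKDIR}/my_program" input.dat > output.dat 2>&1

# Copy results back before the job terminates, because /lwork is wiped afterwards.
cp output.dat "${PBS_O_WORKDIR}/"
```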
System Trouble and CPU Points
When system trouble (hardware or software) occurs, jobs may be killed or stopped by the system. In this case, CPU points for those jobs will not be consumed.
If a job is restarted after an unexpected system crash, CPU points will be charged only for the restarted run.
Running Programs on Login Servers
You can run short test jobs and data analysis calculations directly on the login servers. For this purpose, the software environment of the login servers is equivalent to that of the computation nodes.
However, there are some restrictions.
- We will kill, without warning, processes that use huge resources for a long time or that cause trouble for other users.
- You cannot expect performance comparable to the computation nodes because of special resource limitations on the login servers.
- Performance evaluation should be done on the computation nodes.