July 13: Temporary power failure of the cooling system due to lightning (Jul 20 update)

At around 18:50 on July 13, our cooling system (cooling tower) temporarily lost power due to lightning. As a result, this website went down, and some commands in the RCCS system (such as showlim) were unavailable between 20:00 and 21:00 on July 13. In addition, six computation nodes went down. These problems, except for those on the computation nodes, were fixed by 22:00 on July 13. The file system, the frontend nodes, and the other computation nodes were not seriously affected.
(Jul 20 update) CPU points for the affected jobs listed below have been paid back.

IDs of killed jobs (CPU points paid back on Jul 20):

  • 3437723
  • 3438122
  • 3442078
  • 3435416
  • 3439766
  • 3443992
  • 3443996
  • 3443999

In addition to the killed jobs listed above, job 3427986 was affected by the incident and was rerun.