
Slurm Workload Management

H Ruthrash edited this page Dec 27, 2023 · 7 revisions

Meeting notes 12/10/22:

  • Check the [video](link goes here) on Compute Canada.
  • Users log in to the "login node" with their UTORid. The login node runs no jobs itself, but jobs should be launched from it. Slurm talks to the connected computers to assign a job; they should have a shared file system (chat with Andrew about how to set it up).
  • From the login node, we should be able to access the computers.
  • /project, /scratch, and /temp are separate storage hierarchies, each with its own timed backups.
  • Research groups should have shared, common /dataset directories.
  • The sbatch command should request n GPUs + n CPUs for x amount of time and save output in specific directories. It should also allow selecting one specific robot.
  • For teaching you get only n = 1 farms, but for research n > 1, so we need to look at user groups.
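
The sbatch requirements in the notes above (n GPUs, n CPUs, a time limit, an output directory, and one specific machine) could be sketched roughly as follows. This is a minimal sketch, not a decided setup: `render_sbatch` is a hypothetical helper, and the node name, paths, and job command are placeholders. It assumes GPUs are configured in Slurm as a `gpu` generic resource and that "selecting one specific robot" maps to picking a node by name with `--nodelist`.

```python
from typing import Optional


def render_sbatch(gpus: int, cpus: int, time_limit: str, output_dir: str,
                  command: str, node: Optional[str] = None) -> str:
    """Build the text of a Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        # Assumes GPUs are exposed as the generic resource "gpu".
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --cpus-per-task={cpus}",
        # Wall-clock limit, e.g. "02:00:00" for two hours.
        f"#SBATCH --time={time_limit}",
        # Slurm replaces %j with the job ID in the output file name.
        f"#SBATCH --output={output_dir}/%j.out",
    ]
    if node is not None:
        # Assumption: a specific robot corresponds to a named node.
        lines.append(f"#SBATCH --nodelist={node}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
script = render_sbatch(gpus=1, cpus=4, time_limit="02:00:00",
                       output_dir="/scratch/example_user/logs",
                       command="python train.py", node="robot-pc-01")
print(script)
```

The printed text can be saved to a file and submitted with `sbatch <file>`. The output directory needs to exist before the job starts, since Slurm does not create it.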
