Slurm Workload Management
H Ruthrash edited this page Dec 27, 2023 · 7 revisions
- Slurm for dummies
- official quick start user guide
- tutorials for slurm by U of Utah
- pytorch with slurm scheduling
- tensorflow with slurm
Meeting notes 12/10/22:
- Check the [video](link goes here) on Compute Canada.
- Log in to the "login node" with a UTORid. The login node runs no jobs itself, but jobs should be launched from it.
- Slurm talks to the connected computers to assign a job.
- The computers should have a shared file system (chat with Andrew about how to set it up).
- From the login node, we should be able to access the computers.
- /project, /scratch, and /temp are different hierarchies, each with its own timed backups.
- Research groups should have shared, common /dataset directories.
- The sbatch command should request n GPUs and n CPUs for x amount of time, and save output in specific directories. It should also allow selecting one specific robot.
- For teaching, you get only n = 1 farms, but for research n > 1, so we need to look at user groups.
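
The sbatch requirements in the notes above (GPUs, CPUs, a time limit, an output directory, and one specific machine) could be sketched as a small Python helper that writes the batch script. This is only a sketch under assumptions: the function name `render_sbatch_script`, the node name `robot01`, the log directory `/scratch/mygroup/logs`, and the command `python train.py` are all hypothetical and not part of our setup. It also assumes GPUs are exposed to Slurm as a generic resource named `gpu`. The `#SBATCH` options themselves (`--gres`, `--cpus-per-task`, `--time`, `--output`, `--nodelist`, `--job-name`) are standard sbatch options.

```python
from pathlib import PurePosixPath
from typing import Optional


def render_sbatch_script(
    command: str,
    gpus: int,
    cpus: int,
    time_limit: str,
    output_dir: str,
    node: Optional[str] = None,
    job_name: str = "job",
) -> str:
    """Return the text of a batch script that can be passed to `sbatch`.

    All argument values are supplied by the caller; nothing here reflects
    an actual cluster configuration.
    """
    if gpus < 0:
        raise ValueError("gpus must be >= 0")
    if cpus < 1:
        raise ValueError("cpus must be >= 1")

    # %x expands to the job name and %j to the job ID in sbatch filename patterns.
    log_path = PurePosixPath(output_dir) / "%x-%j.out"

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --time={time_limit}",  # e.g. HH:MM:SS
        f"#SBATCH --output={log_path}",
    ]
    if gpus > 0:
        # Assumes the cluster defines a generic resource called "gpu".
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    if node is not None:
        # Pin the job to one specific machine.
        lines.append(f"#SBATCH --nodelist={node}")

    lines += ["", command, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(
        render_sbatch_script(
            command="python train.py",
            gpus=1,
            cpus=4,
            time_limit="02:00:00",
            output_dir="/scratch/mygroup/logs",
            node="robot01",
            job_name="demo",
        )
    )
```

Saving the printed text to a file such as `job.sh` and running `sbatch job.sh` from the login node would submit it. Slurm does not create the output directory, so it has to exist before the job starts.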