TL;DR: Search for `todo` and update all occurrences to your desired name.
Docker and Singularity are not required unless you cannot install some dependencies locally in the HPC shell environment due to permission issues.
- Change `LICENSE` if necessary
- Modify `.pre-commit-config.yaml` according to your needs
- Modify/add GitHub workflow status badges in `README.md`
Continue on a machine where you have Docker permission; HPC clusters usually restrict Docker access for security reasons.
- Modify `todo-docker-user`, `todo-base-image`, `todo-image-name`, and `todo-image-user` in `.env`
  - `.env` will be loaded when you use Docker Compose for build/run/push
  - `todo-docker-user` refers to your Docker Hub account username
  - `todo-base-image` is the image the Dockerfile is based on, such as `nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04`
  - `todo-image-user` refers to the default user inside the image, which is used to determine the home folder
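As an illustration, a filled-in `.env` might look like the sketch below. The variable names here are assumptions for illustration only; search for the actual `todo` placeholders in the template to find the real ones.

```shell
# Illustrative .env sketch -- the real variable names in this template may differ.
DOCKER_USER=todo-docker-user   # your Docker Hub account username
BASE_IMAGE=nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04   # image the Dockerfile builds FROM
IMAGE_NAME=todo-image-name     # name of the image to build/run/push
IMAGE_USER=todo-image-user     # default user inside the image; determines the home folder
```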
- Modify the service name from `todo-service-name` to your service name in `docker-compose.yml`, and add additional volume mount options such as dataset directories
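For example, extra dataset mounts might be added under the service's `volumes` key; the paths below are placeholders, not the template's actual values.

```yaml
services:
  todo-service-name:
    volumes:
      - /path/to/datasets:/data:ro   # additional read-only dataset mount (illustrative)
```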
- Update `Dockerfile` and `.dockerignore`
  - The existing Dockerfile includes screen & tmux config, oh-my-zsh, CMake, and other basic goodies
  - Add any additional dependency installations at the appropriate locations
- `build_docker_image.sh` to build and test the image locally on your machine's architecture
  - The script uses buildx to build a multi-arch image; you can disable this by removing redundant architectures in `docker-compose.yml`
  - The build stage does not have GPU access; if some of your dependencies need a GPU, build them inside a running container and commit the result to the final image
- `run_docker_container.sh` or `docker compose up -d` to run and test a built image
  - The service by default mounts the whole repository onto `CODE_FOLDER` inside the container, so any modification inside also takes effect outside; this is useful when you use the VS Code Remote extension to develop inside a running container with a remote Docker context
  - You should be able to run and see GUI applications inside the container if `DISPLAY` is set correctly when you run the script
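As a minimal sketch of the `DISPLAY` requirement, a run script could guard GUI usage with a check like the following. This helper is hypothetical, not part of the template:

```shell
# Hypothetical guard: succeed only when DISPLAY is set and non-empty,
# i.e. when GUI applications inside the container can reach an X server.
check_display() {
  [ -n "${DISPLAY:-}" ]
}

check_display || echo "DISPLAY is not set; GUI apps inside the container will not show" >&2
```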
- `push_docker_image.sh` to push the multi-arch image to Docker Hub
  - You should have the Docker Hub repository set up before pushing
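The image reference that gets pushed conventionally has the form `<docker-hub-user>/<image-name>:<tag>`. A sketch of how it could be assembled from `.env`-style values (the variable names are assumptions, not the template's actual ones):

```shell
# Assemble the Docker Hub image reference from .env-style values (names are illustrative).
DOCKER_USER=todo-docker-user
IMAGE_NAME=todo-image-name
IMAGE_TAG="${DOCKER_USER}/${IMAGE_NAME}:latest"
echo "$IMAGE_TAG"
# buildx would then push this multi-arch reference, roughly:
#   docker buildx build --push --tag "$IMAGE_TAG" ...
```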
Continue on the actual HPC cluster environment
- `pull_singularity_image.sh` to build the Singularity image locally
  - The Singularity image can be built from an existing Docker image
  - You should see the image `todo-image-name_latest.def` after a successful build
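Building from an existing Docker image typically boils down to a pull from Docker Hub; a sketch (the script's exact invocation and flags may differ):

```shell
# Sketch: build a Singularity image from the pushed Docker Hub image.
singularity pull docker://todo-docker-user/todo-image-name:latest
```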
- `run_singularity_instance.sh` to test the image
  - Add additional volume binding options to the script, such as dataset directories; best practice is to define them in `.env`, then export them in `variables.sh` with `resolve_host_path` to turn relative paths into absolute real paths
  - Singularity instances by default have less environment isolation than Docker containers, unless you specify additional options as the script does
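`resolve_host_path` might look something like the following; this is a guess at its behavior based on the name, and the template's actual implementation may differ:

```shell
# Hypothetical sketch of resolve_host_path: expand a possibly-relative,
# possibly-symlinked path into an absolute real path suitable for bind mounts.
resolve_host_path() {
  realpath -m -- "$1"
}
```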
- Modify the job specifications under `jobs/`
  - Each (HPC) Slurm environment has different partition definitions, which are often heterogeneous; you can query them with `sinfo` and its options
  - `--ntasks-per-node` specifies the number of parallel tasks; it is convenient to tie other resources to each task, e.g., `--gpus-per-task`, `--cpus-per-task`, `--mem-per-gpu`, so that you only need to increase the number of tasks to scale up on a node
  - All the jobs have the `-l` (login) option in their shebang, so any command that works in your current shell environment should also run as a job
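Tying resources to tasks as described, a hypothetical `jobs/your-cluster/example.job` might look like the sketch below; the partition name, resource counts, and the command are placeholders:

```shell
#!/bin/bash -l
#SBATCH --job-name=todo_your_job_name
#SBATCH --partition=your-partition   # query available partitions with sinfo
#SBATCH --ntasks-per-node=2          # scale up by increasing the number of tasks
#SBATCH --gpus-per-task=1            # resources tied to each task
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-gpu=32G
#SBATCH --output=todo_your_job_name_%j.out   # %j expands to the Slurm job ID

srun your-command
```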
- `sbatch jobs/your-cluster/your-job.job` or `jobs/your-cluster/your-job.job` to submit jobs
  - You should see a file `todo_your_job_name_slurm_job_id.out` in the base folder of this repository, which contains the job logs
- turm is recommended for monitoring jobs from outside the job; run `turm -u your-slurm-user` after installation
- Run `dev_setup.sh` to set up the development environment
- Mukai (Tom Notch) Yu: [email protected]