TSCC and Snakemake

Content and Goal

Content:

  1. how to submit jobs in TSCC
  2. how to organize jobs using snakemake

Goal:

  1. present a global picture of jobs in TSCC
  2. provide a relatively complete reference

Installation of snakemake (three ways)

  • Install snakemake using conda (install miniforge (fast and preferred) or conda first)
    1. conda install -n base -c conda-forge mamba
    2. conda activate base
    3. mamba create -c conda-forge -c bioconda -n snakemake snakemake
  • conda install -c conda-forge -c bioconda snakemake in a given conda env
  • run make install in the current directory; see the Makefile for details
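
A quick way to confirm the installation worked (a minimal sketch, assuming the environment created by the first option is named snakemake):

    conda activate snakemake
    snakemake --version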

Prepare the demo scripts

  1. git clone https://github.com/beyondpie/tools
  2. cd tools/useSnakemake

Introduction to TSCC

TSCC is the Triton Shared Computing Cluster at UC San Diego (UCSD).

Login nodes:

  • when you log into TSCC via SSH, you land on one of these nodes
  • SSH keys: fast access without typing a password
    • set up just like GitHub SSH keys
    • See Generating SSH Keys in TSCC
  • Use an SSH config to simplify the ssh command, with a demo below
    • You can paste this into the file ~/.ssh/config
    • Next time, you can use ssh tscc as a short command (see the sketch after this config block).
    • NOTE: replace all the contents within [], including [ and ]
    Host *
        ControlMaster auto
        ServerAliveInterval 5
        ServerAliveCountMax 2
        ControlPath ~/.ssh/control-%h-%p-%r
        ControlPersist yes
    
    Host tscc
        HostName [tscc login node address]
        User [your name]
        IdentityFile ~/.ssh/[your private key file]
        
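A minimal sketch of the key setup plus the shortened login; the key file name is just an example, and ssh-copy-id assumes you can still log in with a password once:

    # on your local machine: generate a key pair (ed25519 is a common choice)
    ssh-keygen -t ed25519 -f ~/.ssh/id_tscc
    # copy the public key to TSCC (asks for your TSCC password one last time)
    ssh-copy-id -i ~/.ssh/id_tscc.pub [your name]@[tscc login node address]
    # with the config above (IdentityFile pointing to ~/.ssh/id_tscc):
    ssh tscc
    # the same alias also works for file transfer
    scp myscript.pbs tscc:~/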

Computing nodes:

  • where your jobs run when you submit them (interactive or not) with qsub
  • TSCC condo program
    • CPU Nodes:
      • The Gold node: 2 CPUs * 18 cores/CPU, 256GB RAM in total (36 cores in total, so about 7GB/core)
      • The Platinum node: 2 CPUs * 32 cores/CPU, 1TB RAM in total (64 cores in total, so about 15GB/core)
      • nodes=1:ppn=16:icelake:mem1024
        • starts your job with 16 CPU cores on a Platinum node (i.e., one with 1TB memory); see the example after this list
        • only the condo / home / glean / gpu-condo queues can use this
          • the condo queue has an 8-hour walltime limit
          • home means the nodes we purchased
        • this does not request 1024GB of RAM; it asks for an Icelake node that has 1024GB of memory
      • hotel nodes are not the Platinum/Gold nodes
    • GPU Nodes:
      • CPU side: 2 CPUs * 32 cores/CPU, 256GB RAM in total (64 cores, so about 4GB/core)
      • NVIDIA Ampere A100, 40GB of GPU memory
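
For example, requesting an interactive session on a Platinum (Icelake, 1TB) node could look like the following; the queue and walltime here are only illustrative:

    # 16 cores on one Icelake node that has 1TB of RAM, via the condo queue
    qsub -I -q condo -l nodes=1:ppn=16:icelake:mem1024 -l walltime=02:00:00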

Data-intensive nodes:

  • named dm1
  • dedicated nodes for data-intensive work, such as copying large datasets
  • No need to submit jobs; just run your commands directly.

Storage

  • OASIS: 800+ TB Data Oasis Lustre-based high-performance parallel file system
    • Each user has 25TB of space.
    • 90-day purge policy: files not touched for 90 days may be removed.
    • Use touch [your file] to update a file's timestamp.
    • Use find . -exec touch {} \; to update all files in the current directory (see the sketch after this list).
    • Efficient for large files; not good at handling many small files.
  • HOME: ~, 100 GB per user
    • Heavy job output should not be written here:
      • This NFS file system has only one server handling all the metadata and storage requirements.
  • PROJECT: condo storage we buy
    • Typically 300 TB shared by all the users in our lab
    • use df -h [project dir] to check the space information
  • TMPDIR
    • node-local scratch for handling many temporary files; it is removed after the job is done
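
A small sketch of the housekeeping commands mentioned above; the Oasis path here is hypothetical:

    # refresh timestamps under an Oasis folder so files are not hit by the 90-day purge
    cd /oasis/tscc/scratch/$USER/myproject   # hypothetical path
    find . -exec touch {} \;
    # check how much condo (PROJECT) storage is left
    df -h [project dir]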

SU: how we pay for TSCC

1 SU: 1 core * 1 hour

  • the core count is fixed by what you request when you submit a job
  • the time is re-estimated from how long the job actually runs
  • for example, a job using 4 cores for 10 hours consumes 40 SUs
  • $250 (10K SUs):
    • 1 SU ~ $0.025; 40 SUs ~ $1; 200 SUs ~ a Starbucks coffee

How to check the remaining balance:

  • gbalance -u [user name]
  • add the line alias mymoney="gbalance -u [user name]" to your ~/.bashrc file and run source ~/.bashrc; after that, mymoney shows your balance.

Queue: assign a queue when you submit a job

  • hotel
    • max walltime: 168 hours (1 week); max cores/user: 128
  • home
    • max walltime: unlimited; max cores/user: unlimited
  • glean: free of charge, but jobs may be stopped by the system at any time
    • max walltime: 8 hours; max cores/user: 1024
  • gpu-related queues:
    • gpu-hotel: like hotel
    • gpu-condo: max walltime: 8 hours; max cores/user: 84

Submitting jobs in TSCC

Job managers/schedulers in HPC (High-Performance Computing) systems

  • TORQUE Resource Manager (or Portable Batch System, PBS)
    • TSCC now uses this
  • Slurm workload manager

Typical PBS script

A draft PBS script:

#! /bin/bash
#PBS -q glean
#PBS -N test_pbs
#PBS -l nodes=1:ppn=1
#PBS -l walltime=[hh:mm:ss]
#PBS -o [output file]
#PBS -e [error file]
#PBS -V
#PBS -M [email address list]
#PBS -m abe
#PBS -A ren-group
[All the shell commands you want to have here]
  • Create a script like the one above, then submit it with qsub [the_script]
  • Use qstat -u [user name] to get the status of the submitted job
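
For reference, a filled-in version of the draft above; the queue, job name, resources, output files, and the commands at the end are made-up examples:

    #!/bin/bash
    #PBS -q hotel
    #PBS -N test_pbs
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=04:00:00
    #PBS -o test_pbs.out
    #PBS -e test_pbs.err
    #PBS -V
    #PBS -M [email protected]
    #PBS -m abe
    #PBS -A ren-group
    # PBS starts the job in $HOME; move to the submission directory first
    cd $PBS_O_WORKDIR
    echo "running on $(hostname) with 4 cores"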

Interactive job

  • qsub -I -q glean -l nodes=1:ppn=2 -l walltime=08:00:00
  • Add alias myjob="qsub -I -q glean -l nodes=1:ppn=2 -l walltime=08:00:00" to your ~/.bashrc, then source ~/.bashrc. You can then use myjob to quickly start an interactive job without needing to remember the details.

Why we need a workflow manager

  • I want to submit 1,000 jobs.
  • Some of them failed, and I need to rerun only those.
  • I have a pipeline, which means the jobs have dependencies.

A workflow manager can:

  • handle the dependencies between jobs
  • automatically resume the pipeline from where it failed
  • if some intermediate files or scripts are updated, automatically rerun all the later rules that depend on them

Snakemake = Snake[Python] + make[GNU make]

make: a dependency-tracking build utility, first written in 1976

  • Still widely used today, especially GNU make
  • Drawbacks:
    • no native support for configuration files such as YAML or JSON
    • no built-in support for submitting jobs on HPC
  • Snakemake may fade away in the future, but make will most likely still be around.

Snakemake:

  • written in Python, which makes it simple to use
  • supports config files such as YAML and JSON
  • supports HPC schedulers: both PBS and Slurm

What it looks like:

## optional config file
configfile: "config.yaml"
content = "say hi"
samples = ["a", "b"]

## rule all lists all the output files you want to have
rule all:
    input:
        expand("flag/pre_{s}.done", s=samples),
        expand("flag/first_{s}.done", s=samples)

## then set up the rules that generate them
rule pre:
    output:
        # s will be inferred from the outputs requested in rule all;
        # here it will be a or b.
        # snakemake will run s=a and s=b in parallel if possible.
        # touch() automatically generates the flag file
        # once the rule is done.
        touch("flag/pre_{s}.done")
    log:
        "log/{s}.log"
    shell:
        """
        # use wildcards.s to get a / b
        echo "pre:" {content} {wildcards.s} > {log} 2>&1
        """

rule first:
    # snakemake knows this rule depends on the output of pre,
    # so it will run after pre
    input:
        "flag/pre_{s}.done"
    output:
        touch("flag/first_{s}.done")
    log:
        "log/{s}.log"
    shell:
        """
        echo "first:" {content} {wildcards.s} > {log} 2>&1
        """
  • Then save the above into a file named demo.snakefile.
  • Run snakemake --snakefile demo.snakefile --cores 1 to execute it locally (recent Snakemake versions require a --cores argument for local runs).
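
A few standard Snakemake command-line flags that are handy while developing the demo (drawing the DAG needs Graphviz installed for the dot command):

    snakemake --snakefile demo.snakefile --cores 1 -n -p              # dry run, print the shell commands
    snakemake --snakefile demo.snakefile --cores 2                    # run locally, up to 2 rules in parallel
    snakemake --snakefile demo.snakefile --cores 1 --dag | dot -Tpdf > dag.pdf   # draw the rule graph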

Summary

  • snakefile
    • just Python syntax + snakemake keywords, such as config, rule, expand, wildcards and so on
    • [OPTIONAL but good in practice] a config.yaml to declare variables
  • The input of rule all declares all the final outputs
  • The input and output of the rules organize the dependencies between tasks
  • wildcards
    • inferred based on file names
    • the key mechanism to run multiple jobs in parallel

Use Snakemake to control the jobs in TSCC

Use a profile to set up the HPC-specific environment

Demo for PBS

  • mkdir profile
  • under profile, we create two files.
    • one is config.yaml.
      cluster-config: "profile/cluster.yaml"
      cluster: "qsub -N {cluster.jobname} -l nodes={cluster.nodes}:ppn={cluster.ppn},walltime={cluster.walltime} -A {cluster.account} -q {cluster.queue} -M {cluster.email} -m {cluster.mailon} -j {cluster.jobout} -e {cluster.log} -V "
      jobs: 100
              
    • the other is cluster.yaml:
      __default__:
          jobname: "{rule}.{wildcards}"
          nodes: 1
          ppn: 1
          walltime: "02:00:00"
          account: "ren-group"
          queue: "glean"
          email: "[email protected]"
          mailon: "ae"
          jobout: "oe"
          log: "{rule}.{wildcards}.tscc.log"
      pre:
          ppn: 1
          queue: "glean"
          walltime: 00:10:00
      first:
          ppn: 1
          queue: "glean"
          walltime: 00:10:00
              
    • Then run snakemake --snakefile demo.snakefile --profile profile to submit the jobs to PBS, as sketched below.
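
Putting it together, a typical session could look like the sketch below; qstat works the same as for hand-written PBS jobs:

    # check what would be submitted, without submitting anything
    snakemake --snakefile demo.snakefile --profile profile -n
    # submit for real; snakemake keeps at most 100 jobs in the PBS queue
    # (the "jobs: 100" setting in profile/config.yaml)
    snakemake --snakefile demo.snakefile --profile profile
    # monitor the submitted jobs
    qstat -u [user name]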

Tips

  • Use screen on a login node, then start snakemake inside it (see the sketch after this list)
  • Use glean to test your pipeline free of charge
  • Use glean to run your snakemake jobs free of charge
    • as long as each job can finish within 8 hours
    • if jobs fail for unknown reasons (glean jobs can be stopped by the system), just rerun snakemake
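
A minimal screen workflow for the first tip; the session name smk is arbitrary:

    # on a login node: start a named screen session and launch snakemake inside it
    screen -S smk
    snakemake --snakefile demo.snakefile --profile profile
    # detach with Ctrl-a d, log out, and later re-attach with:
    screen -r smk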