PMBB-Informatics-and-Genomics/pmbb-nf-toolkit-exwas-meta-analysis

Documentation for ExWAS Meta-Analysis

Module Overview

This module runs several statistical tests for meta-analyzing effects and/or p-values from exome-wide association studies (ExWASs). It covers both gene-burden region-based tests and single rare variants, with analyses that include Fisher’s tests and inverse variance-weighted meta-analysis.
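
To make the two named approaches concrete, here is a minimal sketch of a fixed-effect inverse variance-weighted estimate and of Fisher's method for combining p-values. It uses only the Python standard library and is not taken from the module's own scripts; the function names and the example numbers are hypothetical, and the pipeline may handle weighting, missing values, or heterogeneity differently.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse variance-weighted estimate: returns (beta, se, p)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, normal_two_sided_p(beta / se)


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # the test statistic divided by 2
    # Closed-form chi-square survival function, valid for even degrees of freedom (2k).
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


# Hypothetical per-cohort estimates for one variant in two cohorts.
meta_beta, meta_se, meta_p = inverse_variance_meta([0.10, 0.30], [0.10, 0.10])
combined_p = fisher_combined_p([0.05, 0.05])
print(meta_beta, meta_se, meta_p, combined_p)
```

The inverse variance-weighted estimate needs an effect size and standard error from every cohort, while Fisher's method needs only p-values, which is why a p-value-only test can still be combined across cohorts.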

Example Module Config File

Example nextflow.config File

Cloning Github Repository

  • Command: git clone https://github.com/PMBB-Informatics-and-Genomics/geno_pheno_workbench.git

  • Navigate to relevant workflow directory...

Software Requirements

Commands for Running the Workflow

  • Singularity Command: singularity build exwas_meta.sif docker://pennbiobank/exwas-meta:latest

  • Docker Command: docker pull pennbiobank/exwas-meta:latest

  • Pull from Google Container Registry: docker pull gcr.io/verma-pmbb-codeworks-psom-bf87/exwas-meta:latest

  • Run Command: nextflow run /path/to/toolkit/module/exwas_meta_analysis.nf

  • Common nextflow run flags:

    • -resume flag picks up workflow where it left off

    • -stub performs a dry run, checks channels without executing code

    • -profile selects the compute profiles in nextflow.config

    • -profile standard uses the Docker image to execute processes

    • -profile cluster uses the Singularity container and submits processes to a queue

    • -profile all_of_us uses the Docker image on All of Us Workbench

  • More info: Nextflow documentation

Input Files for ExWAS_Meta-Analysis

  • ExWAS Singles Summary Statistics

    • Input summary statistics of the singles tests. They are expected to be organized in the directory from which you’re running the pipeline like so: “COHORT/Sumstats/PHENO.SUFFIX”. This matches output of your other workflows, but if you’re starting here, you may use the python script “scripts/set_up_cohort_directory_structure.py” to create symlinks with the correct structure

    • Type: Summary Statistics

    • Format: tsv.gz

    • File Header:

    #CHROM  BP      A1      A2      BETA    OR      SE      P       A1_FREQ N       N_CASES N_CONTROLS
    21      46482292        G       T       0.06082811129594274     1.0627162295714525      0.061462174537242044    0.48893376459755766     0.10478298166200951     49575   12384      37191
    21      38975062        T       C       0.17065668690623242     1.1860834811261391      0.705489644020694       0.7748787400623763      0.026454221287319335    49575   12384      37191
    21      44118844        A       C       0.1934617601789459      1.2134429838217287      0.09367821095381963     0.09458445160078595     0.2519137577341014      49575   12384      37191
    21      24024049        G       A       0.2981114423114272      1.3473119270688356      0.08850429395727123     0.0027432656092397415   0.07047045978001559     49575   12384      37191
    
    
  • ExWAS Regions Summary Statistics

    • Input summary statistics of the regions tests. They are expected to be organized in the directory from which you’re running the pipeline like so: “COHORT/Sumstats/PHENO.SUFFIX”. This matches output of your other workflows, but if you’re starting here, you may use the python script “scripts/set_up_cohort_directory_structure.py” to create symlinks with the correct structure

    • Type: Summary Statistics

    • Format: tsv.gz

    • File Header:

    BETA    OR      SE      P       N       N_CASES N_CONTROLS      REGION  MAX_MAF ANNOT
    0.10749207338367353     1.113482034566681       0.10861255486849287     0.48893376459755766     16082   5316    10766   ENSG00000160256 0.01    pLOF
    0.2752070632715491      1.3168033082414594      1.1376977756875413      0.7748787400623763      16082   5316    10766   ENSG00000160256 0.01    damaging_missense
    0.09866625123248854     1.103697880275314       0.04777625246679377     0.09458445160078595     16082   5316    10766   ENSG00000160256 0.01    other_missense
    0.1869359835432516      1.2055501075333186      0.05549816240001925     0.0027432656092397415   16082   5316    10766   ENSG00000160256 0.01    synonymous
    
    
  • Gene Location File

    • Tab-separated file of gene coordinates: gene ID, chromosome, start and end positions, and gene symbol

    • Type: Data Table

    • Format: tsv

    • File Header:

    gene_id chromosome  seq_region_start    seq_region_end  gene_symbol
    GENE1   1   1   90  GS1
    GENE2   2   91  100 GS2
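
If you are preparing input files by hand, it can help to check that a file parses with the expected header before starting a run. The sketch below reads a gzipped singles file with the header shown above, assuming the columns are tab-delimited. The function name, the demo file name, and the rounded demo values are hypothetical and are not part of the pipeline.

```python
import csv
import gzip
import os
import tempfile

FLOAT_COLUMNS = ("BETA", "OR", "SE", "P", "A1_FREQ")
INTEGER_COLUMNS = ("BP", "N", "N_CASES", "N_CONTROLS")


def read_singles_sumstats(path):
    """Read a tab-separated, gzipped singles file into a list of typed dicts."""
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        for raw in csv.DictReader(handle, delimiter="\t"):
            row = dict(raw)
            for col in FLOAT_COLUMNS:
                row[col] = float(row[col])
            for col in INTEGER_COLUMNS:
                row[col] = int(row[col])
            rows.append(row)
    return rows


# Demo: write a two-row file (values rounded from the sample rows), then read it back.
demo_lines = [
    "#CHROM\tBP\tA1\tA2\tBETA\tOR\tSE\tP\tA1_FREQ\tN\tN_CASES\tN_CONTROLS",
    "21\t46482292\tG\tT\t0.0608\t1.0627\t0.0615\t0.4889\t0.1048\t49575\t12384\t37191",
    "21\t24024049\tG\tA\t0.2981\t1.3473\t0.0885\t0.0027\t0.0705\t49575\t12384\t37191",
]
demo_path = os.path.join(tempfile.mkdtemp(), "PHENO.singles.tsv.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("\n".join(demo_lines) + "\n")

rows = read_singles_sumstats(demo_path)
print(rows[0]["#CHROM"], rows[0]["BP"], rows[0]["P"])
```

A missing or misspelled column raises a KeyError here, which surfaces header problems before they reach the workflow.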
    

Output Files for ExWAS_Meta-Analysis

  • Meta-Analysis Sample Sizes

    • A table containing the maximum sample size of each of the meta-analyses based on the input cohorts and phenotypes. The actual numbers for each test may vary if there is missingness for certain variants, but this captures the largest sample size.

    • Type: Summary Table

    • Format: csv

    • File Header:

    ANALYSIS,PHENO,N_Samples
    AFR_EUR,AAA,31265
    AFR_EUR,AAA,31265
    AFR_EUR,BMI_median,38134
    AFR_EUR,BMI_median,38134
    
    
  • Singles Meta-Analysis Top Hits Table

    • A FILTERED top hits csv summary file of results including cohort, phenotype, gene, group annotation, p-values, and other counts. One single summary file will be aggregated from all the “top hits” in each “Singles (Variant) Summary Statistics” file.

    • Type: Summary Table

    • Format: csv

    • File Header:

    chr,pos,effect_allele,other_allele,analysis,phenotype,p_single_stouffer_meta,p_single_stouffer_N_eff,p_single_stouffer_N_studies,beta_single_inv_var_meta,se_single_inv_var_meta,p_single_inv_var_meta,N_eff_inv_var_meta,N_studies_inv_var_meta,p_single_chi2_stat,p_single_heterogeneity
    1,69745,T,C,ALL_M,AAA,0.579096,15172.0,2,-1.06221,1.91491,0.5790965097927716,15172.0,2,,
    1,930282,A,G,ALL_M,LDL_median,0.4106934,12364.0,2,-8.493,10.3236,0.410691052172855,12364.0,2,,
    1,935839,T,C,ALL,T2D,0.619276,39632.0,2,-0.334515,0.673235,0.6192757781492377,39632.0,2,,
    1,935849,C,G,ALL,T2D,0.1225267999999999,39632.0,2,-0.515683,0.333937,0.1225272089986704,39632.0,2,,
    
  • Singles Meta-Analysis Summary Statistics

    • A gzipped, unfiltered TSV (tab-separated) file of the results for the variant (singles) analysis. One file will be created for each unique Cohort, Phenotype, and analysis (regular, cauchy, rare, ultra rare) combination.

    • Type: Summary Statistics

    • Format: tsv.gz

    • File Header:

    phenotype|chromosome|base_pair_location|variant_id        |other_allele|effect_allele|effect_allele_count|effect_allele_frequency|missing_rate|beta      |standard_error|t_statistic|variance|p_value   |p_value_na|is_spa_test|allele_freq_case|allele_freq
    T2Diab   |21        |41801254          |21_41801254_TCTG_T|TCTG        |T            |277                |0.0046611              |0.0         |-0.099231 |0.167775      |-3.52526   |35.5258 |0.5542179 |0.5542179 |False      |0.00426841      |0.00474126
    T2Diab   |21        |41801360          |21_41801360_C_T   |C           |T            |41                 |0.00068991             |0.0         |-0.864441 |0.633121      |-3.98228   |5.38237 |0.08606924|0.08606924|False      |0.000297796     |0.000769948
    T2Diab   |21        |41801603          |21_41801603_C_T   |C           |T            |24                 |0.00040385             |0.0         |0.322923  |0.570593      |0.991852   |3.07148 |0.5714322 |0.5714322 |False      |0.000496327     |0.000384974
    T2Diab   |21        |41801645          |21_41801645_G_A   |G           |A            |58                 |0.000975971            |0.0         |0.0167811 |0.35132       |0.135962   |8.10206 |0.9619027 |0.9619027 |False      |0.00109192      |0.000952304
    
      • Parallel By: Cohort, Phenotype
    
  • Singles Meta-Analysis QQ Plots

    • A QQ Plot of the Null Model vs Log10P results of the analysis for variants. One plot will be created for each unique combination of phenotype, cohort, annotation group (pLoF, etc.), and MAF threshold.

    • Type: QQ Plot

    • Format: png

      • Parallel By: Cohort, Phenotype
  • Singles Meta-Analysis Manhattan Plots

    • A dot plot (Manhattan plot) of significant variants associated with a phenotype. One plot will be created for each unique combination of phenotype, cohort, annotation group (pLoF, etc.), and MAF threshold.

    • Type: Manhattan Plot

    • Format: png

      • Parallel By: Cohort, Phenotype
  • Regions Meta-Analysis Top Hits Table

    • A FILTERED top hits csv summary file of results including cohort, phenotype, gene, group annotation, p-values, and other counts. One single summary file will be aggregated from all the “top hits” in each “Regions Summary Statistics” file.

    • Type: Summary Table

    • Format: csv

    • File Header:

    region,annot_group,max_maf,chr,pos_start,pos_stop,gene_symbol,analysis,phenotype,p_burden_stouffer_meta,p_burden_stouffer_N_eff,p_burden_stouffer_N_studies,beta_burden_inv_var_meta,se_burden_inv_var_meta,p_burden_inv_var_meta,N_eff_inv_var_meta,N_studies_inv_var_meta,p_burden_chi2_stat,p_burden_heterogeneity
    ENSG00000000419,damaging_missense,0.0001,20,50934867,50959140,DPM1,Leave_EUR_Out,T2D,0.606368639793734,11702.0,3,0.0336482650736499,0.0653029773095249,0.6063686397937349,11702.0,3,,
    ENSG00000000419,damaging_missense,0.0001,20,50934867,50959140,DPM1,AFR_EUR,LDL_median,0.4348806366999341,24879.0,2,-0.2393382100454541,0.3751452681678022,0.5234814383226609,24879.0,2,38.57666608752835,
    ENSG00000000419,damaging_missense,0.0001,20,50934867,50959140,DPM1,ALL_F,LDL_median,0.0654224944953294,12515.0,2,-0.9459153671090688,0.5391298446427648,0.0793410420455428,12515.0,2,25.260017289413383,
    ENSG00000000419,damaging_missense,0.0001,20,50934867,50959140,DPM1,ALL_M,AAA,0.4794196005233119,15172.0,2,-0.0451279677642891,0.0638088900735573,0.4794196005233122,15172.0,2,,
    
    
  • Regions Meta-Analysis Summary Statistics

    • A gzipped, unfiltered TSV (tab-separated) file of the results for the gene (regions) analysis if run. One file will be created for each unique Cohort, Phenotype, and analysis (regular, cauchy, rare, ultra rare) combination.

    • Type: Summary Statistics

    • Format: tsv.gz

    • File Header:

    phenotype|gene           |annot            |max_maf|p_value           |p_value_burden    |p_value_skat      |beta_burden        |se_burden         |mac   |mac_case|mac_control|rare_var_count|ultrarare_var_count
    T2Diab   |ENSG00000141956|pLoF             |0.0001 |0.0479451461682565|0.0479451461682565|0.0479451461682565|0.0588652858997042 |0.0297621953331829|12.0  |5.0     |7.0        |0.0           |9.0
    T2Diab   |ENSG00000141956|pLoF             |0.001  |0.0479451461682565|0.0479451461682565|0.0479451461682565|0.0588652858997042 |0.0297621953331829|12.0  |5.0     |7.0        |0.0           |9.0
    T2Diab   |ENSG00000141956|pLoF             |0.01   |0.0479451461682565|0.0479451461682565|0.0479451461682565|0.0588652858997042 |0.0297621953331829|12.0  |5.0     |7.0        |0.0           |9.0
    T2Diab   |ENSG00000141956|damaging_missense|0.0001 |0.464219450219203 |0.464219450219203 |0.464219450219203 |-0.0110759683619445|0.0151328276810456|52.0  |7.0     |45.0       |0.0           |41.0
    
      • Parallel By: Cohort, Phenotype
    
  • Regions Meta-Analysis QQ Plots

    • A QQ Plot of the Null Model vs Log10P results of the analysis for gene regions. One plot will be created for each unique combination of phenotype, cohort, annotation group (pLoF, etc.), and MAF threshold.

    • Type: QQ Plot

    • Format: png

      • Parallel By: Cohort, Phenotype, Annot Group, MAF
  • Regions Meta-Analysis Manhattan Plots

    • A dot plot (Manhattan plot) of significant gene regions associated with a phenotype. One plot will be created for each unique combination of phenotype, cohort, annotation group (pLoF, etc.), and MAF threshold.

    • Type: Manhattan Plot

    • Format: png

      • Parallel By: Cohort, Phenotype, Annot Group, MAF
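
The top hits tables described above are filtered versions of the full results. As a rough illustration of that kind of filter, the sketch below keeps rows whose p-value falls at or below a cutoff. Which p-value column the pipeline filters on, and whether the comparison is strict, are not stated in this documentation, so the column choice, the function name, and the demo cutoff of 0.2 are assumptions made only so the example returns a row.

```python
import csv
import io


def filter_top_hits(csv_text, p_column, p_cutoff):
    """Keep rows whose p-value in p_column is present and at or below p_cutoff."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[p_column]
        if value == "":
            continue  # skip rows where the test was not computed
        if float(value) <= p_cutoff:
            hits.append(row)
    return hits


# Demo table with a subset of the singles top hits columns.
demo_csv = (
    "chr,pos,analysis,phenotype,p_single_inv_var_meta\n"
    "1,69745,ALL_M,AAA,0.5790965097927716\n"
    "1,935849,ALL,T2D,0.1225272089986704\n"
    "1,935850,ALL,T2D,\n"  # hypothetical row with a missing p-value
)
hits = filter_top_hits(demo_csv, "p_single_inv_var_meta", p_cutoff=0.2)
print([(h["chr"], h["pos"], h["phenotype"]) for h in hits])
```

Rows with an empty p-value are skipped rather than treated as zero, so a test that was never run cannot appear as a hit.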

Parameters for ExWAS_Meta-Analysis

Post-Processing

  • region_plot_pcol (Type: String)

    • One of three values: p_burden, p_skat, or p_skato. While all possible p-values will be utilized for meta-analyses, this flag chooses which will be plotted. Can be left null (defaults to p_skato)
  • gene_location_file (Type: File Path)

    • This file is used for getting gene-based coordinates for plotting.

    • Corresponding Input File: Gene Location File

      • Tab-separated file of gene coordinates: gene ID, chromosome, start and end positions, and gene symbol

      • Type: Data Table

      • Format: tsv

      • File Header:

      gene_id chromosome  seq_region_start    seq_region_end  gene_symbol
      GENE1   1   1   90  GS1
      GENE2   2   91  100 GS2
      

Pre-Processing

  • singles_effect_cols (Type: Map (Dictionary))

    • A map with three keys: beta, se, and p_value, where the values are the corresponding test statistic columns in the singles files.
  • skato_p_col (Type: String)

    • The name of the SKAT-O test p-value from the ExWAS summary stats. It can be set to null if you don’t want to meta-analyze these p-values.
  • skat_p_col (Type: String)

    • The name of the SKAT test p-value from the ExWAS summary stats. It can be set to null if you don’t want to meta-analyze these p-values.
  • burden_cols (Type: Map (Dictionary))

    • A map with three keys: beta, se, and p_value, where the values are the corresponding gene burden test statistic columns in the regions files.
  • singles_info_cols (Type: Map (Dictionary))

    • A map with seven keys: chr, pos, effect_allele, other_allele, n, n_case, and n_control, where the values are the corresponding test information columns in the singles files.
  • regions_info_cols (Type: Map (Dictionary))

    • A map with six keys: region, annot_group, max_maf, n, n_case, n_control, where the values are the corresponding test information columns in the regions files.
  • singles_sumstats_suffix (Type: String)

    • Suffix for singles files from the ExWAS summary stats

    • Corresponding Input File: ExWAS Singles Summary Statistics

      • Input summary statistics of the singles tests. They are expected to be organized in the directory from which you’re running the pipeline like so: “COHORT/Sumstats/PHENO.SUFFIX”. This matches output of your other workflows, but if you’re starting here, you may use the python script “scripts/set_up_cohort_directory_structure.py” to create symlinks with the correct structure

      • Type: Summary Statistics

      • Format: tsv.gz

      • File Header:

      #CHROM  BP      A1      A2      BETA    OR      SE      P       A1_FREQ N       N_CASES N_CONTROLS
      21      46482292        G       T       0.06082811129594274     1.0627162295714525      0.061462174537242044    0.48893376459755766     0.10478298166200951     49575   12384      37191
      21      38975062        T       C       0.17065668690623242     1.1860834811261391      0.705489644020694       0.7748787400623763      0.026454221287319335    49575   12384      37191
      21      44118844        A       C       0.1934617601789459      1.2134429838217287      0.09367821095381963     0.09458445160078595     0.2519137577341014      49575   12384      37191
      21      24024049        G       A       0.2981114423114272      1.3473119270688356      0.08850429395727123     0.0027432656092397415   0.07047045978001559     49575   12384      37191
      
      
  • regions_sumstats_suffix (Type: String)

    • Suffix for regions files from the ExWAS summary stats

    • Corresponding Input File: ExWAS Regions Summary Statistics

      • Input summary statistics of the regions tests. They are expected to be organized in the directory from which you’re running the pipeline like so: “COHORT/Sumstats/PHENO.SUFFIX”. This matches output of your other workflows, but if you’re starting here, you may use the python script “scripts/set_up_cohort_directory_structure.py” to create symlinks with the correct structure

      • Type: Summary Statistics

      • Format: tsv.gz

      • File Header:

      BETA    OR      SE      P       N       N_CASES N_CONTROLS      REGION  MAX_MAF ANNOT
      0.10749207338367353     1.113482034566681       0.10861255486849287     0.48893376459755766     16082   5316    10766   ENSG00000160256 0.01    pLOF
      0.2752070632715491      1.3168033082414594      1.1376977756875413      0.7748787400623763      16082   5316    10766   ENSG00000160256 0.01    damaging_missense
      0.09866625123248854     1.103697880275314       0.04777625246679377     0.09458445160078595     16082   5316    10766   ENSG00000160256 0.01    other_missense
      0.1869359835432516      1.2055501075333186      0.05549816240001925     0.0027432656092397415   16082   5316    10766   ENSG00000160256 0.01    synonymous
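
The column maps above translate file-specific column names into a fixed set of keys. The sketch below shows that idea for one singles row, using the key layout from the example config file contents (identifying columns in singles_info_cols, test statistics in singles_effect_cols). The function name and the sample size values in the demo row are hypothetical, and the pipeline's own handling of these maps may differ.

```python
def standardize_row(row, info_cols, effect_cols):
    """Rename file-specific columns to the canonical keys used in the maps."""
    merged = {**info_cols, **effect_cols}
    missing = [col for col in merged.values() if col not in row]
    if missing:
        raise KeyError(f"columns missing from summary statistics: {missing}")
    return {key: row[col] for key, col in merged.items()}


singles_info_cols = {
    "chr": "chromosome",
    "pos": "base_pair_location",
    "effect_allele": "effect_allele",
    "other_allele": "other_allele",
    "n": "n",
    "n_case": "n_case",
    "n_control": "n_ctrl",
}
singles_effect_cols = {"beta": "beta", "se": "standard_error", "p_value": "p_value"}

# One row as read from a singles file; the n, n_case, and n_ctrl values are made up.
demo_row = {
    "chromosome": "21",
    "base_pair_location": "41801254",
    "effect_allele": "T",
    "other_allele": "TCTG",
    "n": "1000",
    "n_case": "400",
    "n_ctrl": "600",
    "beta": "-0.099231",
    "standard_error": "0.167775",
    "p_value": "0.5542179",
}
standardized = standardize_row(demo_row, singles_info_cols, singles_effect_cols)
print(standardized)
```

Failing loudly on a missing column makes a mismatch between the config maps and the file header visible right away.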
      
      

Workflow

  • my_python (Type: File Path)

    • Path to the python executable to be used for python scripts - often it comes from the docker/singularity container (/opt/conda/bin/python)
  • analyses (Type: Map (Dictionary))

    • Map of lists where keys are meta-analysis group nicknames and values are the lists of cohorts to include in that meta-analysis. This allows for multiple combinations of meta-analyses, for example all cohorts of one sex or ancestry, or leave-one-biobank-out.
  • bin_pheno_list (Type: List)

    • Binary phenotype list
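
Combined with the “COHORT/Sumstats/PHENO.SUFFIX” layout described for the input files, the analyses map determines which files each meta-analysis would draw on. The sketch below lists those expected paths. It assumes every phenotype is paired with every analysis group, which the pipeline may not require, and the function name and run_dir argument are hypothetical.

```python
from pathlib import Path


def expected_sumstats_paths(analyses, phenotypes, suffix, run_dir="."):
    """Map each (analysis, phenotype) pair to the cohort files it would need."""
    expected = {}
    for analysis, cohorts in analyses.items():
        for pheno in phenotypes:
            expected[(analysis, pheno)] = [
                Path(run_dir) / cohort / "Sumstats" / f"{pheno}{suffix}"
                for cohort in cohorts
            ]
    return expected


# Values borrowed from the example config; only one analysis group is shown.
analyses = {"AFR_EUR": ["PMBB_AFR_ALL", "PMBB_EUR_ALL"]}
paths = expected_sumstats_paths(analyses, ["T2D", "AAA"], ".exwas_singles.saige.gz")
for key, files in paths.items():
    print(key, [f.as_posix() for f in files])
```

Checking these paths with Path.exists() before launching is one way to spot a cohort that is missing summary statistics for a phenotype.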

Configuration and Advanced Workflow Files

Example Config File Contents (From Path)

params {
    // Map the overall "analyses" (meta-analysis combinations) to cohort/study population lists
    analyses = [
        'AFR_EUR': ['PMBB_AFR_ALL', 'PMBB_EUR_ALL'],
        'ALL': ['PMBB_AFR_ALL', 'PMBB_EUR_ALL', 'PMBB_EAS_ALL', 'PMBB_AMR_ALL', 'PMBB_SAS_ALL'],
        'ALL_M': ['PMBB_AFR_M', 'PMBB_EUR_M', 'PMBB_EAS_M', 'PMBB_AMR_M', 'PMBB_SAS_M'],
        'ALL_F': ['PMBB_AFR_F', 'PMBB_EUR_F', 'PMBB_EAS_F', 'PMBB_AMR_F', 'PMBB_SAS_F'],
        'Leave_EUR_Out': ['PMBB_AFR_ALL', 'PMBB_EAS_ALL', 'PMBB_AMR_ALL', 'PMBB_SAS_ALL']
    ]

    // Executable for python
    my_python = '/opt/conda/bin/python'

    // Lists of phenotypes
    bin_pheno_list =  ['T2D', 'AAA']
    quant_pheno_list = ['LDL_median', 'BMI_median']

    // Pre- and Post-Processing Params (probably starts with .)
    regions_sumstats_suffix = '.exwas_regions.saige.gz'
    singles_sumstats_suffix = '.exwas_singles.saige.gz'

    // Top-Hits tables will be filtered to this p-value
    p_cutoff_summarize = 0.00001

    // When plotting regions, choose p-values from different tests:
    // Possible values are p_burden, p_skat, and p_skato
    region_plot_pcol = 'p_burden'

    // this is for getting ENSEMBL gene symbols and coordinates for summary stats and plotting
    // tab-separated, columns include: gene_id, chromosome, seq_region_start, seq_region_end, gene_symbol
    gene_location_file = '/path/to/data/homo_sapiens_111_b38.txt'

    // Meta-Analysis Test Info
    regions_info_cols = [
        'region': 'gene',
        'annot_group': 'annot',
        'max_maf': 'max_maf',
        'n': 'N',
        'n_case': 'N_case',
        'n_control': 'N_ctrl'
    ]

    // Single-Variant Test Info
    singles_info_cols = [
        'chr': 'chromosome',
        'pos': 'base_pair_location',
        'effect_allele': 'effect_allele',
        'other_allele': 'other_allele',
        'n': 'n',
        'n_case': 'n_case',
        'n_control': 'n_ctrl'
    ]

    // set any of these column parameters to null 
    // if you don't want to meta-analyze those effects
    burden_cols = ['beta': 'beta_burden', 'se': 'se_burden', 'p_value' : 'p_value_burden']
    skat_p_col = 'p_value_skat'
    skato_p_col = null
    singles_effect_cols = ['beta': 'beta', 'se': 'standard_error', 'p_value': 'p_value']
}

Current Dockerfile for Container/Image

FROM continuumio/miniconda3
WORKDIR /app

# biofilter version argument
ARG BIOFILTER_VERSION=2.4.3

RUN apt-get update \
    # install packages needed to install biofilter and NEAT-plots
    && apt-get install -y --no-install-recommends libz-dev g++ gcc git wget tar unzip make \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    # install python packages needed for pipeline
    && conda install -y -n base -c conda-forge wget libtiff conda-build scipy pandas seaborn matplotlib numpy apsw sqlite \
    && conda clean --all --yes \
    # install NEAT-plots
    && git clone https://github.com/PMBB-Informatics-and-Genomics/NEAT-Plots.git \
    && mv NEAT-Plots/manhattan-plot/ /app/ \
    && conda develop /app/manhattan-plot/ \
    # install biofilter
    && wget https://github.com/RitchieLab/biofilter/releases/download/Biofilter-${BIOFILTER_VERSION}/biofilter-${BIOFILTER_VERSION}.tar.gz -O biofilter.tar.gz \
    && tar -zxvf biofilter.tar.gz --strip-components=1 -C /app \
    && /opt/conda/bin/python setup.py install \
    # make biofilter executable
    && chmod a+rx /app/biofilter.py \
    # remove biofilter tarball and NEAT-plots directory
    && rm -R biofilter.tar.gz NEAT-Plots

USER root

Current nextflow.config contents

includeConfig 'exwas_meta_analysis.config'

profiles {
    // NOTE: 'awsbatch-or-lsf-or-slurm-etc' is a placeholder; replace it with your executor name
    non_docker_dev {
        process.executor = 'awsbatch-or-lsf-or-slurm-etc'
        process.queue = 'epistasis_normal'
        process.memory = '15GB'
    }

    standard {
        process.executor = 'awsbatch-or-lsf-or-slurm-etc'
        process.container = 'guarelin/exwas_meta:latest'
        docker.enabled = true
    }

    cluster0 {
        process.executor = 'awsbatch-or-lsf-or-slurm-etc'
        process.queue = 'epistasis_normal'
        process.memory = '15GB'
        process.container = 'exwas_meta.sif'
        singularity.enabled = true
        singularity.runOptions = '-B /root/,/directory/,/names/'
    }

    all_of_us {
        process.executor = 'awsbatch-or-lsf-or-slurm-etc'
        process.memory = '15GB'
        process.container = 'gcr.io/ritchie-aou-psom-9015/exwas_meta:latest'
        docker.enabled = true
    }
}

params {
    skip_postprocessing_errors = true
}

process {
    withLabel: safe_to_skip {
        errorStrategy=params.skip_postprocessing_errors ? 'ignore' : 'terminate'
    }
}

Detailed Pipeline Steps

Detailed Steps for Running One of our Pipelines

Note: test data were obtained from the SAIGE github repo.

Part I: Setup

  1. Start your own tools directory and go there. You may do this in your project analysis directory, but it often makes sense to clone into a general tools location.
# Make a directory to clone the pipeline into
TOOLS_DIR="/path/to/tools/directory"
mkdir -p "$TOOLS_DIR"
cd "$TOOLS_DIR"
  2. Download the source code by cloning from git.
git clone https://github.com/PMBB-Informatics-and-Genomics/pmbb-nf-toolkit-saige-family.git
cd "${TOOLS_DIR}/pmbb-nf-toolkit-saige-family/"
  3. Build the saige.sif Singularity image.
  • You may call the image whatever you like, and store it wherever you like. Just make sure you specify the name in nextflow.config.
  • This does NOT have to be done for every SAIGE-based analysis, but it is good practice to rebuild every so often, as we update regularly.
cd "${TOOLS_DIR}/pmbb-nf-toolkit-saige-family/"
singularity build saige.sif docker://pennbiobank/saige:latest

Part II: Configure your run

  1. Make a separate analysis/run/working directory.
    • The quickest way to get started is to run the analysis in the same folder as the pipeline. However, subsequent analyses will overwrite results from previous analyses.
    • ❗This step is optional, but we highly recommend keeping your tools directory separate from your run directory. The only items that need to be in the run directory are the nextflow.config file and the ${workflow}.conf file.
WDIR="/path/to/analysis/run1"
mkdir -p "$WDIR"
cd "$WDIR"
  2. Fill out the nextflow.config file for your system.

    • See Nextflow configuration documentation for information on how to configure this file. An example can be found on our GitHub: Nextflow Config.
    • ❗IMPORTANT: you must configure a user-defined profile for your run environments (local, docker, saige, cluster, etc.). If multiple profiles are specified, run with a specific profile using nextflow run -profile ${MY_PROFILE}.
    • For Singularity, the profile's process.container attribute should be set to '/path/to/saige.sif' (replace /path/to with the location where you built the image above). See Nextflow Executor Information for more details.
    • ⚠️As this file remains mostly unchanged for your system, we recommend storing this file in the tools/pipeline directory and symlinking it to your run directory.
  3. Create a pipeline-specific .config file specifying your run parameters and input files. See below for workflow-specific parameters and what they mean.

    • Everything in here can be configured in nextflow.config; however, we find it easier to separate the system-level profiles from the individual run parameters.
    • Examples can be found in our Pipeline-Specific Example Config Files.
    • You can compartmentalize your config file as much as you like by passing
    • There are 2 ways to specify the config file during a run:
      • with the -c option on the command line: nextflow run -c /path/to/workflow.conf
      • in nextflow.config: at the top of the file, add: includeConfig '/path/to/workflow.conf'

Part III: Run your analysis

  • ❗We HIGHLY recommend doing a STUB run to test the analysis using the -stub flag. This is a dry run to make sure your environment, parameters, and input files are specified and formatted correctly.
  • ❗We HIGHLY recommend doing a test run with the included test data in ${TOOLS_DIR}/pmbb-nf-toolkit-saige-family/test_data
  • In the test_data/ directory for each pipeline, we have several pre-configured analysis runs with input data and fully specified config files.
# run an exwas stub
nextflow run /path/to/pmbb-nf-toolkit-saige-family/workflows/saige_exwas.nf -profile cluster -c /path/to/run1/exwas.conf -stub
# run an exwas for real
nextflow run /path/to/pmbb-nf-toolkit-saige-family/workflows/saige_exwas.nf -profile cluster -c /path/to/run1/exwas.conf
# resume an exwas run if it was interrupted or ran into an error
nextflow run /path/to/pmbb-nf-toolkit-saige-family/workflows/saige_exwas.nf -profile cluster -c /path/to/run1/exwas.conf -resume
