MS2CrossCompare Workflow

Overview

This workflow enables efficient all-vs-all MS2 spectra comparison between two folders, supporting both mzML and mgf formats. It uses fast indexing and cosine similarity, and is designed for large-scale GNPS2 batch spectral comparison tasks.

Main Features

Input two folders (each can contain multiple mzML/mgf files); all spectra are automatically discovered recursively.
Only compares spectra between folder1 and folder2 (no intra-folder comparison).
Flexible parameter configuration: tolerance, threshold, minimum matched peaks, alignment strategy, peak filtering, threads, and output filename.
Outputs all high-scoring matches as a TSV file with columns: set1, set2, delta_mz, cosine.

Key Parameters

Parameter	Description	Default Value
inputspectra1	Path to input folder 1	(must be specified)
inputspectra2	Path to input folder 2	(must be specified)
tolerance	MS2 tolerance (Da)	0.01
threshold	Cosine similarity threshold	0.7
minmatches	Minimum number of matched peaks	6
alignment_strategy	Alignment strategy (index_single_charge/index_multi_charge)	index_single_charge
enable_peak_filtering	Enable peak filtering (yes/no)	no
threads	Number of threads	1
output	Output filename	cross_compare_results.tsv

Usage

Command Line

python bin/ms2crosscompare.py data/round1 data/round2 \
    --tolerance 0.01 \
    --threshold 0.7 \
    --minmatches 6 \
    --alignment_strategy index_single_charge \
    --enable_peak_filtering no \
    --threads 1 \
    --output cross_compare_results.tsv

Nextflow Workflow

All parameters can be configured via workflowinput.yaml and are compatible with GNPS2 web interface forms.

nextflow run nf_workflow.nf --inputspectra1 data/round1 --inputspectra2 data/round2 --tolerance 0.01 --threshold 0.7 --minmatches 6 --alignment_strategy index_single_charge --enable_peak_filtering no --threads 1 --output cross_compare_results.tsv

Output

The output is a TSV file with the following columns:

set1: Spectrum ID from folder 1 (including filename and scan number)
set2: Spectrum ID from folder 2
delta_mz: Precursor mass difference
cosine: Cosine similarity score

Workflow Download and Test

Make sure to have the NextflowModule updated if you plan on using it:

git submodule update --init --remote --recursive

To run the workflow to test simply do

make run

To learn NextFlow checkout this documentation:

https://www.nextflow.io/docs/latest/index.html

Installation

You will need to have conda, mamba, and nextflow installed to run things locally.

GNPS2 Workflow Input information

Check the definition for the workflow input and display parameters: https://wang-bioinformatics-lab.github.io/GNPS2_Documentation/workflowdev/

Deployment to GNPS2

In order to deploy, we have a set of deployment tools that will enable deployment to the various gnps2 systems. To run the deployment, you will need the following setup steps completed:

Checked out of the deployment submodules
Conda environment and dependencies
SSH configuration updated

Checking out the deployment submodules

use the following commands from the deploy_gnps2 folder.

You might need to checkout the module, do this by running

git submodule init
git submodule update

You will also need to specify the user on the server that you've been given that your public key has been associated with. If you want to not enter this every time you do a deployment, you can create a Makefile.credentials file in the deploy_gnps2 folder with the following contents

USERNAME=<enter the username>

Deployment Dependencies

You will need to install the dependencies in GNPS2_DeploymentTooling/requirements.txt on your own local machine.

You can find this here.

One way to do this is to use conda to create an environment, for example:

conda create -n deploy python=3.8
pip install -r GNPS2_DeploymentTooling/requirements.txt

SSH Configuration

Also update your ssh config file to include the following ssh target:

Host ucr-gnps2-dev
    Hostname ucr-lemon.duckdns.org

Deploying to Dev Server

To deploy to development, use the following command, if you don't have your ssh public key installed onto the server, you will not be able to deploy.

make deploy-dev

Deploying to Production Server

To deploy to production, use the following command, if you don't have your ssh public key installed onto the server, you will not be able to deploy.

make deploy-prod

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github/workflows		.github/workflows
GNPS2_DeploymentTooling @ a92a048		GNPS2_DeploymentTooling @ a92a048
bin		bin
data		data
debug		debug
deploy_gnps2		deploy_gnps2
test		test
.gitignore		.gitignore
.gitmodules		.gitmodules
Makefile		Makefile
README.md		README.md
nextflow.config		nextflow.config
nextflow_slurm.config		nextflow_slurm.config
nf_workflow.nf		nf_workflow.nf
workflowdisplay.yaml		workflowdisplay.yaml
workflowinput.yaml		workflowinput.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MS2CrossCompare Workflow

Overview

Main Features

Key Parameters

Usage

Command Line

Nextflow Workflow

Output

Workflow Download and Test

Installation

GNPS2 Workflow Input information

Deployment to GNPS2

Checking out the deployment submodules

Deployment Dependencies

SSH Configuration

Deploying to Dev Server

Deploying to Production Server

About

Uh oh!

Releases

Packages

Languages

Wang-Bioinformatics-Lab/MS2CrossCompare

Folders and files

Latest commit

History

Repository files navigation

MS2CrossCompare Workflow

Overview

Main Features

Key Parameters

Usage

Command Line

Nextflow Workflow

Output

Workflow Download and Test

Installation

GNPS2 Workflow Input information

Deployment to GNPS2

Checking out the deployment submodules

Deployment Dependencies

SSH Configuration

Deploying to Dev Server

Deploying to Production Server

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages