GitHub - Wang-Bioinformatics-Lab/PublicDataset_ReDU_Metadata_Workflow

GNPS Metadata Tools
This repository contains three Python scripts for working with GNPS metadata files: gnps_downloader.py, gnps_validator.py, and gnps_name_matcher.py. These scripts allow you to aggregate lists of GNPS metadata files, validate the files using the GNPS metadata validator, and match the files to their respective CCMS peak dataset names.

Installation
To use these scripts, you'll need to have Python 3 installed on your system.
You can download Python from the official Python website: https://www.python.org/downloads/

You can install the dependencies using pip (urllib is part of the Python standard library and does not need to be installed separately):
pip install pandas requests

Usage
gnps_downloader.py
This script aggregates a list of GNPS metadata files, sorts them by creation time, and downloads the latest GNPS metadata file. It then appends the file path and file name to a TSV file.

To run the script, use the following command:
python3 gnps_downloader.py
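
The selection step described above (sort by creation time, keep the newest file, append its path and name to a TSV) can be sketched as follows. This is only an illustration and not the code in gnps_downloader.py: the record layout, the field names `path` and `create_time`, the dataset identifier, and the helper names are all assumptions made for the example.

```python
import csv
import io
from pathlib import PurePosixPath


def latest_metadata_file(records):
    """Return the record with the most recent creation time."""
    # Sorting ascending by creation time puts the newest record last.
    return sorted(records, key=lambda r: r["create_time"])[-1]


def append_tsv_row(handle, record):
    """Append one (file path, file name) row to an open TSV handle."""
    path = PurePosixPath(record["path"])
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([str(path), path.name])


# Hypothetical listing; real records would come from the GNPS file listing.
records = [
    {"path": "MSV000000001/metadata/gnps_metadata_v1.tsv", "create_time": 1600000000},
    {"path": "MSV000000001/metadata/gnps_metadata_v2.tsv", "create_time": 1650000000},
]

buffer = io.StringIO()  # stands in for a file opened in append mode
append_tsv_row(buffer, latest_metadata_file(records))
print(buffer.getvalue(), end="")
```

In a real run the `io.StringIO` buffer would be replaced by a file opened with mode `"a"` so that each run adds a row instead of overwriting the file.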

gnps_validator.py
This script runs the downloaded GNPS metadata files against the metadata validator. Files that fail validation are rejected, and the names of the files that pass are appended to a TSV file.

To run the script, use the following command:
python3 gnps_validator.py
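
The pass/reject split can be sketched as below. The check used here is a stand-in and not the GNPS metadata validator: it only tests for a hypothetical set of required columns, and the names `REQUIRED_COLUMNS`, `passes_validation`, and `partition_files` are invented for the example.

```python
import csv
import io

# Hypothetical requirement, chosen only to make the example concrete.
REQUIRED_COLUMNS = {"filename", "SampleType"}


def passes_validation(tsv_text):
    """Stand-in check: the header must contain every required column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS <= header


def partition_files(files):
    """Split a {name: tsv_text} mapping into passed and rejected names."""
    passed, rejected = [], []
    for name, text in files.items():
        (passed if passes_validation(text) else rejected).append(name)
    return passed, rejected


files = {
    "good.tsv": "filename\tSampleType\nsample1.mzML\tanimal\n",
    "bad.tsv": "filename\nsample2.mzML\n",
}
passed, rejected = partition_files(files)
print("passed:", passed)
print("rejected:", rejected)
```

Only the names in `passed` would then be appended to the output TSV.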

gnps_name_matcher.py
This script matches the GNPS metadata files to their respective CCMS peak dataset names and writes a TSV file containing all the names that match unambiguously.

To run the script, use the following command:
python3 gnps_name_matcher.py
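
One way to read "match unambiguously" is that a metadata file name is kept only when exactly one dataset file shares its base name. The sketch below implements that reading; it is an assumption about the matching rule, not the logic of gnps_name_matcher.py, and the paths and function name are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def match_unambiguously(metadata_names, dataset_paths):
    """Map each metadata file name to a dataset path when exactly one path matches."""
    by_basename = defaultdict(list)
    for path in dataset_paths:
        by_basename[PurePosixPath(path).name].append(path)

    matches = {}
    for name in metadata_names:
        candidates = by_basename.get(PurePosixPath(name).name, [])
        # Zero candidates means no match; more than one is ambiguous. Skip both.
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


# Hypothetical paths used only for illustration.
dataset_paths = [
    "MSV000000001/ccms_peak/a.mzML",
    "MSV000000001/ccms_peak/batch1/b.mzML",
    "MSV000000001/ccms_peak/batch2/b.mzML",
]
matches = match_unambiguously(["a.mzML", "b.mzML", "c.mzML"], dataset_paths)
print(matches)
```

Here `b.mzML` is dropped because two dataset files share that name, and `c.mzML` is dropped because none does.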

data/allowed_terms.json
The terms allowed in ReDU are pulled from this JSON file. Terms from controlled ontologies for the variables MassSpectrometer, NCBITaxonomy, UBERONBodyPartName, and DOIDCommonName are added to the JSON within the workflow. Run data/get_data.sh to download the required data. Additional terms can be added to the JSON, but don't forget to also update the Google Sheet (https://docs.google.com/spreadsheets/d/1v71bnUd8fiXX51zuZIUAvYETWmpwFQj-M3mu4CNsHBU/edit#gid=791995663).
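
As a rough illustration of how a list of allowed terms can be applied, the sketch below checks one metadata row against a term table. The JSON layout shown (column name mapped to a list of allowed values) is a simplifying assumption and may differ from the actual structure of data/allowed_terms.json; the example values and the function name are hypothetical.

```python
import json


def find_disallowed(row, allowed_terms):
    """Return {column: value} for values that are not in the allowed list."""
    return {
        column: value
        for column, value in row.items()
        # Columns without an entry in allowed_terms are not checked.
        if column in allowed_terms and value not in allowed_terms[column]
    }


# Assumed layout; in practice this would be read from the JSON file on disk.
allowed_terms = json.loads(
    '{"SampleType": ["animal", "plant"], "MassSpectrometer": ["Q Exactive"]}'
)
row = {
    "filename": "sample1.mzML",
    "SampleType": "animal",
    "MassSpectrometer": "Unknown instrument",
}
problems = find_disallowed(row, allowed_terms)
print(problems)
```

A row with an empty result contains only allowed terms in the checked columns.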

Folders and files
GNPS2_DeploymentTooling @ 659667b
bin
data
deploy_gnps2
jupyter
.gitignore
.gitmodules
Makefile
README.md
nextflow.config
nextflow_hpcc.config
nf_workflow.nf
requirements.txt
workflowdisplay.yaml
workflowinput.yaml