-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy path07-annotation.Rmd
438 lines (246 loc) · 26.8 KB
/
07-annotation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
---
bibliography: references.bib
---
# Annotation
When you get the peaks table or features table, annotation of the peaks would help you. Check this review[@domingo-almenara2018] or other reviews[@chaleckis2019; @lai2018; @nash2019; @viant2017; @allard2017; @domingo-almenara2018] for a detailed notes on annotation. The first paper proposed five levels regarding currently computational annotation strategies.
- Level 1: Peak Grouping: MS Psedospectra extraction based on peak shape similarity and peak abundance correlation
- Level 2: Peak Annotation: Adducts, Neutral losses, isotopes, and other mass relationships based on mass distances
- Level 3: Biochemical knowledge based on putative identification, potential biochemical reaction and related statistical analysis
- Level 4: Use and integration of tandem MS data based on data dependent/independent acquisition mode or **in silico** prediction
- Level 5: Retention time prediction based on library-available retention index or quantitative structure-retention relationships (QSRR) models.
Most of the software are at level 1 or 2. If we only have compounds structure, we could guess ions under different ionization method. If we have mass spectrum, we could match the mass spectral by a similarity analysis to the database. In metabolomics, we only have mass spectrum or mass-to-charge ratios. Single mass-to-charge ratio is not enough for identification. That's the one bottleneck for annotation. So prediction is always performed on MS/MS data.
## Issues in annotation
The major issue in annotation is the redundancy peaks from same metabolite. Unlike genomes, peaks or features from peak selection are not independent with each other. Adducts, in-source fragments and isotopes would lead to wrong annotation. A common solution is that use known adducts, neutral losses, molecular multimers or multiple charged ions to compare mass distances.
Another issue is about the MS/MS database. Only 10% of known metabolites in databases have experimental spectral data. Thus *in silico* prediction is required. Some works try to fill the gap between experimental data, theoretical values(from chemical database like chemspider) and prediction together. Here is a nice review about MS/MS prediction[@hufsky2014].
## Peak misidentification
- Isomer
Use separation methods such as chromatography, ion mobility MS, MS/MS. Reversed-phase ion-pairing chromatography and HILIC is useful. Chemical derivatization is another option.
- Interfering compounds
20ppm is the least exact mass accuracy for HRMS.
- In-source degradation products
## Annotation v.s. identification
According to the definition from the Chemical Analysis Working Group of the Metabolomics Standards Intitvative[@sumner2007; @viant2017]. Four levels of confidence could be assigned to identification:
- Level 1 'identified metabolites'
- Level 2 'Putatively annotated compounds'
- Level 3 'Putatively characterised compound classes'
- Level 4 'Unknown'
In practice, data analysis based annotation could reach level 2. For level 1, we need at extra methods such as MS/MS, retention time, accurate mass, 2D NMR spectra, and so on to confirm the compounds. However, standards are always required for solid proof.
For specific group of compounds such as PFASs, the communication of confidence level could be slightly different[@charbonnet2022].
Through MS/MS seemed a required step for identification, recent study found ESI might also generate fragments ions for structure identification [@xue2020a; @xue2021;@bernardo-bermejo2023;@xue2023].
## Molecular Formula Assignment
Cheminformatics will help for MS annotation. The first task is molecular formula assignment. For a given accurate mass, the formula should be constrained by predefined element type and atom number, mass error window and rules of chemical bonding, such as double bond equivalent (DBE) and the nitrogen rule. The nitrogen rule is that an odd nominal molecular mass implies also an odd number of nitrogen. This rule should only be used with nominal (integer) masses. Degree of unsaturation or DBE use rings-plus-double-bonds equivalent (RDBE) values, which should be interger. The elements oxygen and sulphur were not taken into account. Otherwise the molecular formula will not be true.
$$RDBE = C+Si - 1/2(H+F+Cl+Br+I) + 1/2(N+P)+1 $$
To assign molecular formula to a mass to charge ratio, Seven Golden Rules [@kind2007] for heuristic filtering of molecular formulas should be considered:
- Apply heuristic restrictions for number of elements during formula generation. This is the table for known compounds:
```{r echo=FALSE}
df <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
`Mass.Range.[Da]` = c("< 500", NA, "< 1000", NA, "< 2000", NA, "< 3000"),
Library = c("DNP", "Wiley", "DNP", "Wiley", "DNP", "Wiley", "DNP"),
C.max = c(29L, 39L, 66L, 78L, 115L, 156L, 162L),
H.max = c(72L, 72L, 126L, 126L, 236L, 180L, 208L),
N.max = c(10L, 20L, 25L, 20L, 32L, 20L, 48L),
O.max = c(18L, 20L, 27L, 27L, 63L, 40L, 78L),
P.max = c(4L, 9L, 6L, 9L, 6L, 9L, 6L),
S.max = c(7L, 10L, 8L, 14L, 8L, 14L, 9L),
F.max = c(15L, 16L, 16L, 34L, 16L, 48L, 16L),
Cl.max = c(8L, 10L, 11L, 12L, 11L, 12L, 11L),
Br.max = c(5L, 4L, 8L, 8L, 8L, 10L, 8L),
Si.max = c(NA, 8L, NA, 14L, NA, 15L, NA)
)
df
```
- Perform LEWIS and SENIOR check. The LEWIS rule demands that molecules consisting of main group elements, especially carbon, nitrogen and oxygen, share electrons in a way that all atoms have completely filled s, p-valence shells ('[octet rule](https://en.wikipedia.org/wiki/Octet_rule)'). Senior's theorem requires three essential conditions for the existence of molecular graphs
- The sum of valences or the total number of atoms having odd valences is even;
- The sum of valences is greater than or equal to twice the maximum valence;
- The sum of valences is greater than or equal to twice the number of atoms minus 1.
- Perform isotopic pattern filter. Isotope ratio abundance was included in the algorithm as an additional orthogonal constraint, assuming high quality data acquisitions, specifically sufficient ion statistics and high signal/noise ratio for the detection of the M+1 and M+2 abundances. For monoisotopic elements (F, Na, P, I) this rule has no impact. isotope pattern will be useful for brominated, chlorinated small molecules and sulphur-containing peptides.
- Perform H/C ratio check (hydrogen/carbon ratio). In most cases the hydrogen/carbon ratio does not exceed H/C \> 3 with rare exception such as in methylhydrazine (CH6N2). Conversely, the H/C ratio is usually smaller than 2, and should not be less than 0.125 like in the case of tetracyanopyrrole (C8HN5).
- Perform NOPS ratio check (N, O, P, S/C ratios).
```{r echo=FALSE}
df <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Element.ratios = c("H/C","F/C","Cl/C","Br/C",
"N/C","O/C","P/C","S/C","Si/C"),
`Common.range.(covering.99.7%)` = c("0.2–3.1","0–1.5","0–0.8",
"0–0.8","0–1.3","0–1.2","0–0.3","0–0.8",
"0–0.5"),
`Extended.range.(covering.99.99%)` = c("0.1–6","0–6","0–2","0–2",
"0–4","0–3","0–2","0–3","0–1"),
`Extreme.range.(beyond.99.99%)` = c("< 0.1 and 6–9","> 1.5","> 0.8",
"> 0.8","> 1.3","> 1.2","> 0.3",
"> 0.8","> 0.5")
)
df
```
- Perform heuristic HNOPS probability check (H, N, O, P, S/C high probability ratios)
```{r}
df <- data.frame(
stringsAsFactors = FALSE,
Element.counts = c("NOPS all > 1","NOP all > 3","OPS all > 1",
"PSN all > 1","NOS all > 6"),
Heuristic.Rule = c("N< 10, O < 20, P < 4, S < 3",
"N < 11, O < 22, P < 6","O < 14, P < 3, S < 3",
"P < 3, S < 3, N < 4","N < 19 O < 14 S < 8"),
DB.examples.for.maximum.values = c("C15H34N9O8PS, C22H44N4O14P2S2, C24H38N7O19P3S","C20H28N10O21P4, C10H18N5O20P5",
"C22H44N4O14P2S2, C16H36N4O4P2S2",
"C22H44N4O14P2S2, C16H36N4O4P2S2","C59H64N18O14S7")
)
df
```
- Perform TMS check (for GC-MS if a silylation step is involved). For TMS derivatized molecules detected in GC/MS analyses, the rules on element ratio checks and valence tests are hence best applied after TMS groups are subtracted, in a similar manner as adducts need to be first recognized and subtracted in LC/MS analyses.
Seven Golden Rules were built for GC-MS and Hydrogen Rearrangement Rules were major designed for LC-CID-MS/MS[@tsugawa2016]. Based on extensively curated database records and enthalpy calculations, "hydrogen rearrangement (HR) rules" could be extending the even-electron rule for carbon (C) and heteroatoms, oxygen (O), nitrogen (N), phosphorus (P), and sulfur (S). They used high abundance MS/MS peaks that exceeded 10% of their base peaks to identify common features in terms of 4 HR rules for positive mode and 5 HR rules for negative mode.
Seven Golden Rules and Hydrogen Rearrangement Rules might also be captured by statistical models. However, such heuristic rules could reduce the searching space of possible formula.
[molgen](http://molgen.de) generating all structures (connectivity isomers, constitutions) that correspond to a given molecular formula, with optional further restrictions, e.g. presence or absence of particular substructures [@gugisch2015].
[mfFinder](http://www.chemcalc.org/mf_finder/mfFinder_em_new) can predict formula based on accurate mass [@patiny2013].
RAMSI is the robust automated mass spectra interpretation and chemical formula calculation method using mixed integer linear programming optimization [@baran2013].
Here is some other Cheminformatics tools, which could be used to assign meaningful formula or structures for mass spectra.
- [RDKit](https://www.rdkit.org/) Open-Source Cheminformatics Software
- [cdk](https://sourceforge.net/projects/cdk/) The Chemistry Development Kit (CDK) is a scientific, LGPL-ed library for bio- and cheminformatics and computational chemistry written in Java [@guha2007].
- [Open Babel](http://openbabel.org/wiki/Main_Page) Open Babel is a chemical toolbox designed to speak the many languages of chemical data [@oboyle2011].
- [ClassyFire](http://classyfire.wishartlab.com/) is a tool for automated chemical classification with a comprehensive, computable taxonomy [@djoumboufeunang2016].
- BUDDY can perform molecular formula discovery via bottom-up MS/MS interrogation[@xing2023].
## Redundant peaks
Full scan mass spectra always contain lots of redundant peaks such as adducts, isotope, fragments, multiple charged ions and other oligomers. Such peaks dominated the features table[@xu2015; @sindelar2020; @mahieu2017]. Annotation tools could label those peaks either by known list or frequency analysis of the paired mass distances[@ju2020; @kouril2020].
### Adducts list
You could find adducts list [here](https://github.com/stanstrup/commonMZ) from commonMZ project.
### Isotope
Here is [Isotope](https://www.envipat.eawag.ch/index.php) pattern prediction.
### CAMERA
Common [annotation](https://bioconductor.org/packages/release/bioc/html/CAMERA.html) for xcms workflow[@kuhl2012].
### RAMClustR
The software could be found [here](https://github.com/cbroeckl/RAMClustR) [@broeckling2014; @broeckling2016]. The package included a vignette to follow.
### BioCAn
BioCAn combines the results from database searches and in silico fragmentation analyses and places these results into a relevant biological context for the sample as captured by a metabolic model [@alden2017].
### mzMatch
[mzMatch](https://github.com/andzajan/mzmatch.R) is a modular, open source and platform independent data processing pipeline for metabolomics LC/MS data written in the Java language. [@chokkathukalam2013; @scheltema2011] and MetAssign is a probabilistic annotation method using a Bayesian clustering approach, which is part of mzMatch[@daly2014].
### xMSannotator
The software could be found [here](https://github.com/yufree/xMSannotator)[@uppal2017].
### mWise
[mWise](https://github.com/b2slab/mWISE) is an Algorithm for Context-Based Annotation of Liquid Chromatography--Mass Spectrometry Features through Diffusion in Graphs[@barranco-altirriba2021].
### MAIT
You could find source code [here](https://github.com/jpgroup/MAIT)[@fernandez-albert2014a].
### pmd
[Paired Mass Distance(PMD)](https://github.com/yufree/pmd) analysis for GC/LC-MS based nontarget analysis to remove redundant peaks[@yu2019a].
### nontarget
[nontarget](https://github.com/blosloos/nontarget) could find Isotope & adduct peak grouping, and perform homologue series detection [@loos2017].
### Binner
[Binner](https://binner.med.umich.edu/) Deep annotation of untargeted LC-MS metabolomics data [@kachman2020]
### mz.unity
You could find source code [here](https://github.com/nathaniel-mahieu/mz.unity) [@mahieu2016a] and it's for detecting and exploring complex relationships in accurate-mass mass spectrometry data.
### MS-FLO
[ms-flo](https://bitbucket.org/fiehnlab/ms-flo/src/657d85ec7bdd?at=master) A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography--Mass Spectroscopy (LC-MS) Data Processing [@defelice2017].
### CliqueMS
CliqueMS is a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network [@senan2019].
### InterpretMSSpectrum
This [package](https://github.com/cran/InterpretMSSpectrum) is for annotate and interpret deconvoluted mass spectra (mass\*intensity pairs) from high resolution mass spectrometry devices. You could use this package to find molecular ions for GC-MS [@jaeger2016].
### NetID
NetID is a global network optimization approach to annotate untargeted LC-MS metabolomics data[@chen2021].
### ISfrag
De Novo Recognition of In-Source Fragments for Liquid Chromatography--Mass Spectrometry Data[@guo2021]
### FastEI
Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library[@yang2023a]
## MS1 MS2 connection
### PMDDA
Three step workflow: MS1 full scan peak-picking, GlobalStd algorithm to select precursor ions for MS2 from MS1 data and collect the MS2 data and annotation with GNPS[@yu2022b].
### HERMES
A molecular-formula-oriented method to target the metabolome[@gine2021].
### dpDDA
Similar work can be found here with inclusion list of differential and preidentified ions (dpDDA)[@zhang2023].
## MS2 MSn connection
A computational approach to generate adatabase of high-resolution-MS n spectra by converting existing low-resolution MSn spectra using complementary high-resolution-MS2 spectra generated by beam-type CAD[@lieng2023].
## MS/MS annotation
MS/MS annotation is performed to generate a matching score with library spectra. The most popular matching algorithm is dot product similarity. A recent study found spectral entropy algorithm outperformed dot product similarity [@li2021;@li2023b;]. Comparison of Cosine, Modified Cosine, and Neutral Loss Based Spectrum Alignment showed modified cosine similarity outperformed neutral loss matching and the cosine similarity in all cases. The performance of MS/MS spectrum alignment depends on the location and type of the modification, as well as the chemical compound class of fragmented molecules[@bittremieux2022]. This work proposed a method weighting low-intensity MS/MS ions and m/z frequency for spectral library annotation, which will be help to annotate unknown spectra[@englerhart2024]. [BLINK](https://github.com/biorack/blink) enables ultrafast tandem mass spectrometry cosine similarity scoring[@harwood2023]. MS2Query enable the reliable and scalable MS2 mass spectra-based analogue search by machine learning[@dejonge2023]. However, A spectroscopic test suggests that fragment ion structure annotations in MS/MS libraries are frequently incorrect[@vantetering2024].
Machine learning can also be applied for MS2 annotation[@codrean2023;@guo2023;@bilbao2023].
You could check $$Workflow$$ section for popular platform. Here are some stand-alone annotation software:
### Matchms
[Matchms](https://github.com/matchms/matchms) is an open-source Python package to import, process, clean, and compare mass spectrometry data (MS/MS). It allows to implement and run an easy-to-follow, easy-to-reproduce workflow from raw mass spectra to pre- and post-processed spectral data. Spectral data can be imported from common formats such mzML, mzXML, msp, metabolomics-USI, MGF, or json (e.g. GNPS-syle json files). Matchms then provides filters for metadata cleaning and checking, as well as for basic peak filtering. Finally, matchms was build to import and apply different similarity measures to compare large amounts of spectra. This includes common Cosine scores, but can also easily be extended by custom measures. Example for spectrum similarity measures that were designed to work in matchms are Spec2Vec and MS2DeepScore[@huber2020].
### MetDNA
MetDNA is the Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics [@shen2019].
### MetFusion
Java based [integration](https://github.com/mgerlich/MetFusion) of compound identification strategies. You could access the application [here](https://msbi.ipb-halle.de/MetFusion/) [@gerlich2013].
### MS2Analyzer
MS2Analyzer could annotate small molecule substructure from accurate tandem mass spectra. [@ma2014a]
### MetFrag
[MetFrag](http://c-ruttkies.github.io/MetFrag/) could be used to make *in silico* prediction/match of MS/MS data[@ruttkies2016; @wolf2010].
### CFM-ID
[CFM-ID](https://sourceforge.net/projects/cfm-id/) use Metlin's data to make prediction [@allen2014] and 4.0 [@allen2014].
### LC-MS2Struct
A machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements.[@bach2022]
### LipidFrag
[LipidFrag](https://msbi.ipb-halle.de/msbi/lipidfrag) could be used to make *in silico* prediction/match of lipid related MS/MS data [@witting2017].
### Lipidmatch
[in silico](http://secim.ufl.edu/secim-tools/lipidmatch/): *in silico* lipid mass spectrum search [@koelmel2017].
### BarCoding
Bar coding select mass-to-charge regions containing the most informative metabolite fragments and designate them as bins. Then translate each metabolite fragmentation pattern into a binary code by assigning 1's to bins containing fragments and 0's to bins without fragments. Such coding annotation could be used for MRM data [@spalding2016].
### iMet
This online [application](http://imet.seeslab.net/) is a network-based computation method for annotation [@aguilar-mogas2017].
### DNMS2Purifier
XGBoost based MS/MS spectral cleaning tool using intensity ratio fluctuation, appearance rate, and relative intensity[@zhao2023].
### IDSL.CSA
Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets[@baygi2023].
## Knowledge based annotation
### Experimental design
Physicochemical Property can be used for annotation with a specific experimental design[@abrahamsson2023].
### Chromatographic retention-related criteria
For targeted analysis, chromatographic retention time could be the qualitative method for certain compounds with a carefully designed pre-treatment. For untargeted analysis, such information could also be used for annotation. GC-MS usually use retention index for certain column while LC-MS might not show enough reproducible results as GC. Such method could be tracked back to quantitative structure-retention relationship (QSRR) models or linear solvation energy relationship (LSER). However, such methods need molecular descriptors as much as possible. For untargeted analysis, retention time and mass to charge ratio could not generate enough molecular descriptors to build QSPR models. In this case, such criteria might be usefully for validation instead of annotation unless we could measure or extract more information such as ion mobility from unknown compounds.
- [Retip](https://www.retip.app/) Retention Time Prediction for Compound Annotation in Untargeted Metabolomics [@bonini2020].
- JAVA based [MolFind](http://metabolomics.pharm.uconn.edu/?q=Software.html) could make annotation for unknown chemical structure by prediction based on RI, ECOM50, drift time and CID spectra [@menikarachchi2012].
- [For-ident](https://water.for-ident.org/#!search) could give a score for identification with the help of logD(relative retention time) and/or MS/MS.
- [RT-Transformer](https://github.com/01dadada/RT-Transformer): retention time prediction for metabolite annotation to assist in metabolite identification,which is a novel deep neural network model coupled with graph attention network and 1D-Transformer, which can predict retention times under any chromatographic methods.
- RT prediction model(random forest) of unified-HILIC/AEX/HRMS/MS, which enables the comprehensive structural annotation of polar metabolites(Unified-HILIC/AEX/HRMS/MS)[@torigoe2024a].
### ProbMetab
Provides probability ranking to candidate compounds assigned to masses, with the prior assumption of connected sample and additional previous and spectral information modeled by the user. You could find source code [here](https://github.com/rsilvabioinfo/ProbMetab) [@silva2014].
### MI-Pack
You could find python software [here](http://www.biosciences-labs.bham.ac.uk/viant/mipack/) [@weber2010].
### MetExpert
[MetExpert](https://sourceforge.net/projects/metexpert/) is an expert system to assist users with limited expertise in informatics to interpret GCMS data for metabolite identification without querying spectral databases [@qiu2018].
### MycompoundID
[MycompoundID](http://www.mycompoundid.org/mycompoundid_IsoMS/single.jsp) could be used to search known and unknown metabolites online [@li2013a].
### MetFamily
[Shiny app](https://msbi.ipb-halle.de/MetFamily/) for MS and MS/MS data annotation [@treutler2016].
### CoA-Blast
For certain group of compounds such as [Acyl-CoA](https://github.com/urikeshet/CoA-Blast), you might build a class level in silico database to annotated compounds with certain structure[@keshet2022].
### KGMN
Knowledge-guided multi-layer network (KGMN) integrates three-layer networks, including knowledge-based metabolic reaction network, knowledge-guided MS/MS similarity network, and global peak correlation network for annotaiton [@zhou2022].
### CCMN
CCMNs were then constructed using metabolic features shared classes, which facilitated the structure- or class annotation for completely unknown metabolic features[@zhang2024].
## MS Database for annotation
### MS
- [Fiehn Lab](http://fiehnlab.ucdavis.edu/projects/binbase-setup)
- [NIST](https://www.nist.gov/srd/nist-standard-reference-database-1a-v17): No free
- [Spectral Database for Organic Compounds, SDBS](https://sdbs.db.aist.go.jp/sdbs/cgi-bin/cre_index.cgi?lang=eng)
- [MINE](http://minedatabase.mcs.anl.gov/#/faq) is an open access database of computationally predicted enzyme promiscuity products for untargeted metabolomics. The annotation would be accurate for general compounds database.
### MS/MS
LibGen can generate high quality spectral libraries of Natural Products for EAD-, UVPD-, and HCD-High Resolution Mass Spectrometers[@kong2023].
- [MoNA](http://mona.fiehnlab.ucdavis.edu/) Platform to collect all other open source database
- [MassBank](http://www.massbank.jp/?lang=en)
- [GNPS](https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp) use inner correlationship in the data and make network analysis at peaks' level instand of annotated compounds to annotate the data.
- [ReSpect](http://spectra.psc.riken.jp/): phytochemicals
- [Metlin](https://metlin.scripps.edu/) is another useful online application for annotation[@guijas2018].
- [LipidBlast](http://fiehnlab.ucdavis.edu/projects/LipidBlast): *in silico* prediction
- [Lipid Maps](http://www.lipidmaps.org/)
- [MZcloud](https://www.mzcloud.org/)
- [NIST](https://www.nist.gov/srd/nist-standard-reference-database-1a-v17): Not free
- [GMDB](https://jcggdb.jp/rcmg/glycodb/Ms_ResultSearch) a multistage tandem mass spectral database using a variety of structurally defined glycans.
- [HMDB](http://www.hmdb.ca/) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body.
- [KEGG](https://www.genome.jp/kegg/compound/) is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems.
## Compounds Database
- [PubChem](https://pubchem.ncbi.nlm.nih.gov/) is an open chemistry database at the National Institutes of Health (NIH).
- [Chemspider](http://www.chemspider.com/) is a free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources.
- [ChEBI](https://www.ebi.ac.uk/chebi/) is a freely available dictionary of molecular entities focused on 'small' chemical compounds.
- [RefMet](http://www.metabolomicsworkbench.org/databases/refmet/index.php) A Reference list of Metabolite names.
- [CAS](https://www.cas.org/support/documentation/chemical-substances/cas-registry-100-millionth-fun-facts) Largest substance database
- [CompTox](https://comptox.epa.gov/dashboard) compounds, exposure and toxicity database. [Here](https://www.epa.gov/chemical-research/downloadable-computational-toxicology-data) is related data.
- [T3DB](http://www.t3db.ca/) is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information.
- [FooDB](http://foodb.ca/) is the world's largest and most comprehensive resource on food constituents, chemistry and biology.
- [Phenol explorer](http://phenol-explorer.eu) is the first comprehensive database on polyphenol content in foods.
- [Drugbank](https://www.drugbank.ca/releases/latest) is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
- [LMDB](http://lmdb.ca) is a freely available electronic database containing detailed information about small molecule metabolites found in different livestock species.
- [HPV](https://iaspub.epa.gov/oppthpv/hpv_hc_characterization.get_report?doctype=2) High Production Volume Information System
There are also metabolites atlas for specific domain.
- PMhub 1.0: a comprehensive plant metabolome database[@tian2023]
- Atlas of Circadian Metabolism[@dyar2018]
- Plantmat [excel library](https://sourceforge.net/projects/plantmat/) based prediction for plant metabolites[@qiu2016].