Merge pull request #36 from ypriverol/readthedocs

ypriverol · web-flow · commit d097ade4d08d · 2021-11-26T10:43:59.000Z
Readthedocs
diff --git a/docs/formats.rst b/docs/formats.rst
@@ -14,7 +14,42 @@ Apart of this three main file formats, additionally, multiple file formats are u
 Input formats
 ---------------------------
 
-The quantms should receive three main inputs: Spectra data files (RAW or mzML); Protein database (Fasta);  Experimental design (SDRF).
+The quantms workflow requires three main inputs: experimental design (SDRF); spectra data files (RAW or mzML); protein database (FASTA).
+
+SDRF: experimental design
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HUPO-PSI and ProteomeXchange recently developed MAGE-TAB, a standard file format for representing experimental designs. Within MAGE-TAB, the Sample and Data Relationship Format (SDRF) is a lightweight tab-delimited format that represents the sample metadata and its relation to the data files (RAW or mzML files).
+
+.. image:: https://raw.githubusercontent.com/bigbio/proteomics-metadata-standard/master/sdrf-proteomics/images/sdrf-nutshell.png
+   :width: 900
+   :align: center
+
+|
+Multiple concepts from SDRF are **relevant and important** for the quantms pipeline:
+
+**Peptide Search Parameters**:
+
+- comment[cleavage agent details]: enzyme used in the experiment, including sites and positions.
+- comment[modification parameters]: post-translational modifications that will be considered in the peptide/protein search.
+- comment[precursor mass tolerance], comment[fragment mass tolerance]: precursor and fragment mass tolerances for the peptide search. Both search engines, Comet and MSGF+, use the precursor mass tolerance.
+
+**Experimental Design**:
+
+- factor value[disease]: The factor value is the variable under study. In a proteomics study it can be the disease, organism part, tumor location, etc. The study variable will have multiple values depending on the samples and conditions. For example, in the SDRF above, the variable under study **factor value[phenotype]** has two values (one for each sample): control (sample 1) and primary tumor (sample 2).
+
+.. important:: When multiple conditions are under study, the user can create multiple SDRFs (one for each variable under study). This is needed because, in LFQ data analysis with match between runs (MBR) enabled, the proteomicsLFQ quantification step needs to match samples that belong to the same condition value.
+
+- characteristics[biological replicate]: Biological replicates are samples that belong to the same condition value and material source.
+- comment[technical replicate]: Technical replicates are repeated measurements of the same sample.
+- comment[fraction identifier]: Fraction identifiers are used to number and identify each fraction (for any fractionation method).
+- comment[label]: Label is used by quantms to associate samples to labels/channels in the experiment (e.g. TMT127).
+
+Spectra Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The spectra data can be provided as RAW files (Thermo instruments) or, preferably, as mzML. If RAW files are provided, the first step of the identification pipeline `converts them into mzML <https://quantms.readthedocs.io/en/latest/identification.html#mass-spectra-processing-raw-conversion>`_.
+
 
 Protein databases
 ~~~~~~~~~~~~~~~~~~
@@ -23,10 +58,6 @@ Protein databases can be download from multiple sources; the most common ones ar
 
 .. hint:: Contaminants should be appended to the database. For each contaminant protein the prefix ``CONTAMINANT_`` should be added as prefix of the protein.
 
-Spectra Data
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The spectra data can be provided in RAW files (Thermo instruments) or preferably in mzML. If RAW files are provided, the first step of the identification pipeline convert them into mzML, read :ref:`identification:Mass spectra processing: Raw conversion`.
 
 Output formats
 ---------------------------
diff --git a/docs/index.rst b/docs/index.rst
@@ -3,6 +3,10 @@ quantms: A cloud-based workflow for peptide and protein quantification.
 
 Welcome to the `quantms workflow <https://github.com/bigbio/quantms>`_, a cloud-based workflow for quantitative proteomics analysis of large mass-spectrometric data sets. Several labeling techniques as well as label-free quantification are supported.
 
+.. image:: images/ms-proteomics.png
+   :width: 500
+   :align: center
+
 Contents
 --------
 
@@ -25,6 +29,8 @@ Contents
    .. toctree::
    :maxdepth: 2
 
+|
+
 The following links should be follow to get support and help with the quantms maintainers:
 
 |Get help on Slack|   |Report Issue| |Get help on GitHub Forum|
diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -8,6 +8,7 @@ Bottom-up proteomics is a common method to identify proteins and characterize th
    :width: 400
    :align: center
 
+|
 
 .. sidebar:: Pipelines and tools
    :subtitle: **It can make your life easier** if you want to explore individual tools:
@@ -30,7 +31,8 @@ Mass spectrometry quantitative data analysis can be divided in three main steps:
 - downstream data analysis and quality control
 
 .. image:: images/quantms.png
-   :width: 350
+   :width: 450
+   :align: center
 
 References
 --------------------------------