1.2.1 Data files
Users must place their data files in the `data` folder. The original data files themselves have to be placed in the dataset subfolder (within `data/`), along with a tabular metadata file that describes the experimental setup corresponding to the data.
Both the quantification and metadata files must be provided in tab-delimited .csv format.
Note: even if you provide only one type of measure, you can still run some of the analyses, as long as the data you provide is suitable for the analysis you choose.
The tabular metadata file must contain 6 columns named `name_to_plot`, `timepoint`, `timenum`, `condition`, `compartment` and `original_name`.
Explanation of the metadata format:
Here are the semantics of the columns:
- `name_to_plot` is the string that will appear on the figures produced by DIMet
- `condition` is the experimental condition
- `timepoint` is the sampling time as defined in your experimental setup (an arbitrary string that may contain non-numerical characters)
- `timenum` is the numerical encoding of the `timepoint`
- `compartment` is the name of the cellular compartment in which the measurement was made (e.g. "endo", "endocellular", "cyto", etc.)
- `original_name` contains the column names used in the quantification files
Example:
name_to_plot | condition | timepoint | timenum | compartment | original_name |
---|---|---|---|---|---|
Cond1 T0 | cond1 | T0 | 0 | comp_name | T0_cond_1 |
Cond1 T24 | cond1 | T24 | 24 | comp_name | T24_cond_1 |
Cond2 T0 | cond2 | T0 | 0 | comp_name | T0_cond_2 |
Cond2 T24 | cond2 | T24 | 24 | comp_name | T24_cond_2 |
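As a minimal sketch (not DIMet's own validation code), the Python snippet below reads a tab-delimited metadata file with the standard `csv` module, checks that the six documented columns are present (their order is not checked) and that every `timenum` value parses as a number. The function name `check_metadata` is hypothetical.

```python
import csv
import io

# The six column names listed in the documentation for the metadata file
REQUIRED_COLUMNS = {
    "name_to_plot", "condition", "timepoint",
    "timenum", "compartment", "original_name",
}


def check_metadata(handle):
    """Read a tab-delimited metadata file and return its rows as dicts.

    Raises ValueError if a required column is missing or if a
    `timenum` value cannot be parsed as a number.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing metadata columns: {sorted(missing)}")
    rows = list(reader)
    for row in rows:
        try:
            float(row["timenum"])
        except (TypeError, ValueError):
            raise ValueError(f"non-numeric timenum: {row['timenum']!r}") from None
    return rows


example = (
    "name_to_plot\tcondition\ttimepoint\ttimenum\tcompartment\toriginal_name\n"
    "Cond1 T0\tcond1\tT0\t0\tcomp_name\tT0_cond_1\n"
    "Cond1 T24\tcond1\tT24\t24\tcomp_name\tT24_cond_1\n"
)
rows = check_metadata(io.StringIO(example))
print([row["original_name"] for row in rows])  # ['T0_cond_1', 'T24_cond_1']
```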
Each quantification file is expected to correspond to one type of measure. Supported measure types are:
- Isotopologue absolute values
- Total metabolite abundances
- Mean enrichment (also called Fractional contribution)
- Isotopologue proportions
Expected format of the quantification files, with examples:
Each row in a quantification file contains the measurements for a given metabolite. The expected columns are:
- `ID` contains the molecule identifiers
- all the other columns contain measurements in numeric format (no letters or symbols, only numbers)

Note 1: the names of the quantification columns must match the values of the `original_name` column in the metadata file.
Note 2: for isotopologues, the ID must follow the convention `metaboliteID_m+X` (for example: `AMP_m+4`, `cit_m+0`, `cit_m+1`).
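Both notes can be checked mechanically. The sketch below, assuming Python 3 and only the standard library, parses IDs that follow the `metaboliteID_m+X` convention with a regular expression and lists quantification columns that have no counterpart in `original_name`. The helper names are hypothetical. Metabolite identifiers may themselves contain underscores (e.g. `Glucose_6P`); the greedy pattern handles this by splitting on the last `_m+`.

```python
import re

# Documented isotopologue ID convention: metaboliteID_m+X
ISOTOPOLOGUE_ID = re.compile(r"^(?P<metabolite>.+)_m\+(?P<label>\d+)$")


def parse_isotopologue_id(identifier):
    """Split an ID such as 'cit_m+1' into ('cit', 1)."""
    match = ISOTOPOLOGUE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a metaboliteID_m+X identifier: {identifier!r}")
    return match.group("metabolite"), int(match.group("label"))


def unmatched_columns(quant_header, original_names):
    """Return quantification columns (other than ID) missing from original_name."""
    known = set(original_names)
    return [col for col in quant_header if col != "ID" and col not in known]


print(parse_isotopologue_id("AMP_m+4"))  # ('AMP', 4)
header = ["ID", "T0_cond_1", "T24_cond_1", "T0_cond_2", "T24_cond_2"]
names = ["T0_cond_1", "T24_cond_1", "T0_cond_2", "T24_cond_2"]
print(unmatched_columns(header, names))  # []
```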
The Total metabolite abundances file:
ID | T0_cond_1 | T24_cond_1 | T0_cond_2 | T24_cond_2 |
---|---|---|---|---|
PEP | 3364610.46 | 10250098.25 | 1124772.29 | 1035932.25 |
citrate | 5783654.51 | 5934305.65 | 3546334.99 | 3460334.88 |
fumarate | 354387.74 | 360087.74 | 334287.74 | 350387.74 |
OA | 9435186.33 | 9435186.33 | 9435186.33 | 9435186.33 |
The Mean enrichment (also called Fractional contribution) file:
ID | T0_cond_1 | T24_cond_1 | T0_cond_2 | T24_cond_2 |
---|---|---|---|---|
PEP | 0.5603 | 0.6391 | 0.9591 | 0.9553 |
citrate | 0.8057 | 0.8870 | 0.7809 | 0.6918 |
fumarate | 0.001 | 0 | 0.1508 | 0.1511 |
OA | 0.7030 | 0.7006 | 0.001 | 0 |
The Isotopologue absolute values file:
ID | T0_cond_1 | T24_cond_1 | T0_cond_2 | T24_cond_2 |
---|---|---|---|---|
PEP_m+0 | 357354.66 | 387054.66 | 0 | 0 |
PEP_m+1 | 965435.68 | 975030.68 | 668.91 | 568.87 |
PEP_m+2 | 1435050.95 | 7987654.66 | 136749.05 | 137709.05 |
PEP_m+3 | 606769.17 | 900358.25 | 987354.33 | 897654.33 |
The Isotopologue proportions file:
ID | T0_cond_1 | T24_cond_1 | T0_cond_2 | T24_cond_2 |
---|---|---|---|---|
PEP_m+0 | 0.106 | 0.038 | 0.000 | 0.000 |
PEP_m+1 | 0.287 | 0.095 | 0.001 | 0.001 |
PEP_m+2 | 0.427 | 0.779 | 0.122 | 0.133 |
PEP_m+3 | 0.180 | 0.088 | 0.878 | 0.867 |
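The four example tables are mutually consistent for PEP: in each sample, the isotopologue absolute values sum to the total abundance, each proportion is an absolute value divided by that sum, and the mean enrichment equals the label-weighted sum of the proportions divided by the number of labelable atoms (3 for PEP, whose highest isotopologue is `m+3`). The sketch below reproduces the `T0_cond_1` column using this standard definition of fractional contribution. It only illustrates how the measure types relate and is not DIMet's implementation (for example, no natural-abundance correction is applied); the function names are hypothetical.

```python
def isotopologue_proportions(absolute):
    """Divide each isotopologue value by the sum over all isotopologues."""
    total = sum(absolute)
    return [value / total for value in absolute]


def mean_enrichment(proportions, n_atoms):
    """Fractional contribution: sum_i i * p_i / n over isotopologues m+0..m+n."""
    return sum(i * p for i, p in enumerate(proportions)) / n_atoms


# PEP in sample T0_cond_1, isotopologues m+0 .. m+3 (from the example tables)
pep_t0_cond_1 = [357354.66, 965435.68, 1435050.95, 606769.17]

props = isotopologue_proportions(pep_t0_cond_1)
print(round(sum(pep_t0_cond_1), 2))          # 3364610.46 (total abundance)
print([round(p, 3) for p in props])          # [0.106, 0.287, 0.427, 0.18]
print(round(mean_enrichment(props, 3), 4))   # 0.5603
```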
DIMet offers the possibility of pathway-based integration of the metabolome and the transcriptome through metabolograms.
Data files required for omics integration:
Two data types are required:
1. Metabolite quantification files in the dataset subfolder.
2. Results of the differential analysis of the transcriptome data, provided by the user and placed in the dataset subfolder.

In addition to the files of differentially expressed genes, the user must also provide the pathway files (details in item 2.2 of this subsection).
Thus the expected project data structure becomes:
MYPROJECT
├── config
│ ├── analysis
│ │ ├── dataset
│ │ │ └── # --->'dataset configuration' yml files
│ │ ├── # --->'analysis configuration' yml files
│ ├── # ---> 'general configuration' yml files
└── data
└── DATASET1_data
├── # ---> tabular .csv files of metabolomics data
├── # ---> .csv files required for omics integration (genes and pathways)
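The directory part of this layout can be created with a few lines of Python. The sketch below uses `pathlib` and creates only the folders, not the `.yml` or `.csv` files. `MYPROJECT` and `DATASET1_data` are the placeholder names from the tree, and `create_project_skeleton` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def create_project_skeleton(root, dataset_name="DATASET1_data"):
    """Create the config/ and data/ folders of the expected layout (no files)."""
    root = Path(root)
    folders = [
        root / "config" / "analysis" / "dataset",  # yml configuration files go here and above
        root / "data" / dataset_name,              # metabolomics and integration .csv files
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Usage: build the skeleton inside a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(Path(tmp) / "MYPROJECT")
    print(all(folder.is_dir() for folder in created))  # True
```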
- 2.1 Files for differentially expressed genes (DEGs)
Files for differentially expressed genes (DEGs) must be provided in tab-delimited .csv format. For each file:
- The rows represent the genes (except the first row, which is the header containing the column names)
- The columns provide the information to be integrated; two columns are mandatory:
  - the gene names, given as strings
  - the fold changes (or log2 fold changes), in numeric format (no letters or symbols, only numbers)
Formatting example of differentially expressed genes files:
log2FoldChange | gene_symbol |
---|---|
-16.1660338229612 | GPI |
3.32192809488736 | HK1 |
2.32192809488736 | RPIA |
0.807354922057604 | PFKL |
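A minimal sketch for loading such a file with Python's `csv` module follows. The default column names `gene_symbol` and `log2FoldChange` are taken from the example table and are assumptions; adjust them to the names your file actually uses. `read_degs` is a hypothetical helper, not part of DIMet.

```python
import csv
import io


def read_degs(handle, gene_column="gene_symbol", fc_column="log2FoldChange"):
    """Return {gene: fold_change} from a tab-delimited DEG file.

    The default column names are assumptions based on the example table.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for column in (gene_column, fc_column):
        if column not in (reader.fieldnames or []):
            raise ValueError(f"missing DEG column: {column!r}")
    # float() raises ValueError on non-numeric text, flagging malformed values
    return {row[gene_column]: float(row[fc_column]) for row in reader}


example = (
    "log2FoldChange\tgene_symbol\n"
    "-16.1660338229612\tGPI\n"
    "3.32192809488736\tHK1\n"
)
degs = read_degs(io.StringIO(example))
print(degs["HK1"])  # 3.32192809488736
```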
- 2.2 The metabolites per pathway and genes or transcripts per pathway files
These files contain the user-provided metabolites and genes for each pathway. A metabolite or gene may appear in several pathways. Metabolite identifiers must match those appearing in the quantification files in the dataset subfolder, and gene names must match those appearing in the DEGs file.
Example for metabolites per pathway:
GLYCOLYSIS | PENTOSE_PHOSPHATE | ... |
---|---|---|
Glucose_6P | Ribose_5P | ... |
Pyruvate | Xylulose_5P | ... |
PEP | Glucose_6P | ... |
... | ... | ... |
Example for genes per pathway:
GLYCOLYSIS | PENTOSE_PHOSPHATE | ... |
---|---|---|
GPI | RPIA | ... |
HK1 | PGD | ... |
PFKL | RBKS | ... |
... | ... | ... |
All these files must be provided in tab-delimited .csv format.
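Assuming the wide layout shown in the examples (one pathway per column, with shorter columns padded by empty cells), the sketch below turns a pathway file into a mapping from pathway name to its members and reports members that do not appear in a given list of identifiers, for instance the gene names of the DEGs file. The helper names are hypothetical and this is not DIMet's own parser.

```python
import csv
import io


def read_pathway_file(handle):
    """Map each pathway (column header) to the non-empty entries below it."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    pathways = {name: [] for name in header}
    for row in body:
        for name, entry in zip(header, row):
            if entry.strip():  # skip empty padding cells
                pathways[name].append(entry.strip())
    return pathways


def unknown_members(pathways, known_ids):
    """Sorted list of pathway members absent from known_ids."""
    known = set(known_ids)
    return sorted({m for members in pathways.values() for m in members if m not in known})


genes_per_pathway = (
    "GLYCOLYSIS\tPENTOSE_PHOSPHATE\n"
    "GPI\tRPIA\n"
    "HK1\tPGD\n"
    "PFKL\tRBKS\n"
)
pathways = read_pathway_file(io.StringIO(genes_per_pathway))
print(pathways["GLYCOLYSIS"])                                     # ['GPI', 'HK1', 'PFKL']
print(unknown_members(pathways, ["GPI", "HK1", "RPIA", "PFKL"]))  # ['PGD', 'RBKS']
```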