DEPLOYING.md: 6 additions & 4 deletions
@@ -66,7 +66,7 @@ To import all databases in MongoDB:
  Alternatively (but **not recommended** due to high computational demands) you can run a separate ETL process to download from source, process and import the databases into MongoDB.

  1. Install the necessary requirements:
- - [R languaje](https://www.r-project.org/). Version 4.1.2 or later (Only necessary if you want to update the Gene information database from Ensembl)
+ - [R language](https://www.r-project.org/). Version 4.1.2 or later (Only necessary if you want to update the Gene information database from Ensembl)
  - Some Python packages. They can be installed using:
  2. The ETL process is programmed in a single bash script for each database. In the bash file of the database that you want to update, edit the **user** and **password** parameters, using the same values that you set in the `docker-compose.yml` file. Bash files can be found in the *'databases'* folder, within the corresponding directory for each database:
@@ -76,7 +76,8 @@ Alternatively (but **not recommended** due to high computational demands) you ca
  - For Gene information ([Ensembl genomic data](https://www.ensembl.org/biomart/martview/), [RefSeq gene summaries](https://www.ncbi.nlm.nih.gov/refseq/), and [CiVIC gene descriptions](https://civicdb.org/welcome)) use "databases/gene_info" directory and the *geneinfo2mongodb.sh* file.
  - For OncoKB cancer genes and drug information, it is necessary to download some datasets from their [official site](https://www.oncokb.org/actionableGenes) (**registration required**). You need to download the _Therapeutic, Diagnostic, and Prognostic_ dataset from [Actionable Genes page](https://www.oncokb.org/actionableGenes) by clicking the _Association button_. Place it within the directory "databases/oncokb" with the name "oncokb_biomarker_drug_associations.tsv". Then, download the dataset from the [Cancer Genes](https://www.oncokb.org/cancerGenes) page by clicking the _Cancer Gene List_ button. Place it within the same directory as above, with the name "cancerGeneList.tsv". Finally, execute the `oncokb2mongodb.sh` script to load both datasets into MongoDB.
  - For cancer related drugs ([Pharmacogenomics Knowledge Base (PharmGKB)](https://www.pharmgkb.org/)) use "databases\pharmGKB" directory and the *pharmgkb2mongodb.sh* file.
- - For Gene ontology ([Gene Ontology (GO)](http://geneontology.org/use/)) "databases\gene_ontology" directory and the *go2mongodb.sh* file. **NOTE:** This import needs the "Gene nomenclature" databases (2) already imported to properly process the gene ontology databases
+ - For Gene ontology ([Gene Ontology (GO)](http://geneontology.org/)) use "databases\gene_ontology" directory and the *go2mongodb.sh* file. **NOTE:** This import needs the "Gene nomenclature" databases (2) already imported to properly process the gene ontology databases
+ - For the predicted functional associations network (STRING), it is necessary to download some datasets from their [official site](https://string-db.org/cgi/download). Make sure that the **selected organism is Homo sapiens** (the file sizes should be in MB). From "INTERACTION DATA", download "protein network data (full network, incl. distinction: direct vs. interologs)" and rename it to "protein.links.full.txt.gz". Then, from "ACCESSORY DATA", download "list of STRING proteins incl. their display names and descriptions" and rename it to "protein.aliases.txt.gz". Place the 2 files in the "databases\string" directory and use the *string2mongodb.sh* file.
  3. Run bash files.
  `./<file.sh>`
  where file.sh can be *cpdb2mongodb.sh*, *hgnc2mongodb.sh*, *gtex2mongodb.sh*, *go2mongodb.sh*, *pharmgkb2mongodb.sh*, or *ensembl_gene2mongodb.sh*, as appropriate.
@@ -110,10 +111,11 @@ Where *\<service\>* could be `nginx`, `web` or `mongo`.
  ## Update genomic databases
  If new versions are released for the genomic databases included in BioAPI, you can update them by following the instructions below:
- - For the "Metabolic pathways (ConsensusPathDB)", "Gene nomenclature (HUGO Gene Nomenclature Committee)", "Gene ontology (GO)", "Cancer related drugs (PharmGKB)","Gene information (from Ensembl and CiVIC)" and "Cancer and Accionable genes (OncoKB)" databases, it is not necessary to make any modifications to any script. This is because the datasets are automatically downloaded in their most up-to-date versions when the bash file for each database is executed as described in the **Manually import the different databases** section of this file.
+ - For the "Metabolic pathways (ConsensusPathDB)", "Gene nomenclature (HUGO Gene Nomenclature Committee)", "Gene ontology (GO)", "Cancer related drugs (PharmGKB)", "Gene information (from Ensembl and CiVIC)" and "Cancer and Actionable genes (OncoKB)" databases, it is not necessary to make any modifications to any script. This is because the datasets are automatically downloaded in their most up-to-date versions when the bash file for each database is executed as described in the **Manually import the different databases** section of this file.
  **Important notes**:
  - For OncoKB the download is not automatic since it requires registration, but the steps to download them manually are explained in the same section mentioned above.
- - For RefSeq gene summaries, the R package [GeneSummary](https://bioconductor.org/packages/release/data/annotation/html/GeneSummary.html) is used. The update of the database will depend on the version that the package includes.
+ - For RefSeq gene summaries, the R package [GeneSummary](https://bioconductor.org/packages/release/data/annotation/html/GeneSummary.html) is used. The update of the database will depend on the version that the package includes.
+ - For STRING the download is not automatic, but the steps to download the files manually are explained in the same section mentioned above.
  - If you need to update the "Gene expression (Genotype-Tissue Expression)" database, you should also follow the procedures in the section named above, but first you should edit the bash file as follows:
  1. Modify the **gtex2mongodb.sh** file. Edit the variables *"expression_url"* and *"annotation_url"*.
  1. In the *expression_url* variable, set the URL corresponding to the GTEx "RNA-Seq Data" compressed file (gz compression). This file should contain the Gene TPMs values (Remember that Gene expression values on the GTEx Portal are shown in Transcripts Per Million or TPMs).
README.md: 88 additions & 38 deletions
@@ -463,25 +463,25 @@ as significant. Must be a float. Not recommended to set it higher than 0.05.
  - Code: 200
  - Content:
  The response you get is a list. Each element of the list is a GO term that fulfills the conditions of the query. GO terms can contain name, definition, relations to other terms, etc.
- - `go_id`: Unique identifier.
- - `name`: human-readable term name.
- - `ontology_type`: Denotes which of the three sub-ontologies (cellular component, biological process or molecular function) the term belongs to.
- - `definition`: A textual description of what the term represents, plus reference(s) to the source of the information.
+ - `<go_id>`: Unique identifier.
+ - `<name>`: Human-readable term name.
+ - `<ontology_type>`: Denotes which of the three sub-ontologies (cellular component, biological process or molecular function) the term belongs to.
+ - `<definition>`: A textual description of what the term represents, plus reference(s) to the source of the information.
  - relations to other terms: Each GO term can be related to many other terms with a [variety of relations](http://geneontology.org/docs/ontology-relations/).
- - `synonyms`: Alternative words or phrases closely related in meaning to the term name, with indication of the relationship between the name and synonym given by the synonym scope.
- - `subset`: Indicates that the term belongs to a designated subset of terms.
- - `relations_to_genes`: list of elements of type Json. Each element corresponds to a to a gene and how it's related to the term.
- - `gene`: name of the gene.
- - `relation_type`: the type of relation between the gene and the GO term. When `filter_type` is enrichment, extra relation will be gather from g:Profiler database. These relations will be shown as "relation obtained from gProfiler".
- - `evidence`: evidence code to indicate how the annotation to a particular term is supported.
- - `enrichment_metrics`: .
- - `p_value`: Hypergeometric p-value after correction for multiple testing.
- - `intersection_size`: The number of genes in the query that are annotated to the corresponding term.
- - `effective_domain_size`: The total number of genes "in the universe " which is used as one of the four parameters for the hypergeometric probability function of statistical significance.
- - `query_size`: The number of genes that were included in the query.
- - `term_size`: The number of genes that are annotated to the term.
- - `precision`: The proportion of genes in the input list that are annotated to the function. Defined as intersection_size/query_size.
- - `recall`: The proportion of functionally annotated genes that the query recovers. Defined as intersection_size/term_size.
+ - `<synonyms>`: Alternative words or phrases closely related in meaning to the term name, with indication of the relationship between the name and synonym given by the synonym scope.
+ - `<subset>`: Indicates that the term belongs to a designated subset of terms.
+ - `<relations_to_genes>`: List of elements of type JSON. Each element corresponds to a gene and how it's related to the term.
+ - `<gene>`: Name of the gene.
+ - `<relation_type>`: The type of relation between the gene and the GO term. When `filter_type` is enrichment, extra relations will be gathered from the g:Profiler database. These relations will be shown as "relation obtained from gProfiler".
+ - `<evidence>`: Evidence code to indicate how the annotation to a particular term is supported.
+ - `<enrichment_metrics>`: .
+ - `<p_value>`: Hypergeometric p-value after correction for multiple testing.
+ - `<intersection_size>`: The number of genes in the query that are annotated to the corresponding term.
+ - `<effective_domain_size>`: The total number of genes "in the universe", which is used as one of the four parameters for the hypergeometric probability function of statistical significance.
+ - `<query_size>`: The number of genes that were included in the query.
+ - `<term_size>`: The number of genes that are annotated to the term.
+ - `<precision>`: The proportion of genes in the input list that are annotated to the function. Defined as intersection_size/query_size.
+ - `<recall>`: The proportion of functionally annotated genes that the query recovers. Defined as intersection_size/term_size.
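The `precision` and `recall` definitions above reduce to two divisions. A minimal sketch, assuming the three sizes have already been read from one response element; the function name and the sample numbers are illustrative and not part of BioAPI:

```python
def enrichment_ratios(intersection_size: int, query_size: int, term_size: int) -> tuple:
    """Return (precision, recall) following the definitions in the field list."""
    if query_size <= 0 or term_size <= 0:
        raise ValueError("query_size and term_size must be positive")
    # Share of the query genes that are annotated to the term.
    precision = intersection_size / query_size
    # Share of the term's annotated genes that the query recovers.
    recall = intersection_size / term_size
    return precision, recall


# Made-up numbers: 5 of 20 query genes hit a term that annotates 50 genes.
precision, recall = enrichment_ratios(intersection_size=5, query_size=20, term_size=50)
print(precision, recall)  # 0.25 0.1
```

Recomputing the two ratios this way can serve as a quick consistency check on the values returned under `enrichment_metrics`.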
  - Example:
  - URL: http://localhost:8000/genes-to-terms
  - body:
@@ -528,20 +528,20 @@ Gets the list of related terms to a term.
  - URL: /related-terms
  - Method: POST
  - Params: A body in JSON format with the following content
- - `term_id`: the term if of the term you want to search
- - `relations`: filters the non-hierarchical relations between terms. By default it's ["part_of","regulates","has_part"]. It should always be a list
- - `ontology_type`: filters the ontology type of the terms in the response. By default it's ["biological_process", "molecular_function", "cellular_component"]It should always be a list containing any permutation of the default relations
- - `general_depth`: the search depth with the non-hierarchical relations
- - `hierarchical_depth_to_children`: the search depth with the hierarchical relations in the direction of the children
+ - `term_id`: The term ID of the term you want to search
+ - `relations`: Filters the non-hierarchical relations between terms. By default it's ["part_of","regulates","has_part"]. It should always be a list
+ - `ontology_type`: Filters the ontology type of the terms in the response. By default it's ["biological_process", "molecular_function", "cellular_component"]. It should always be a list containing any permutation of the default relations
+ - `general_depth`: The search depth for the non-hierarchical relations
+ - `hierarchical_depth_to_children`: The search depth for the hierarchical relations in the direction of the children
  - `to_root`: 0 for false, 1 for true. If true, gets all the terms in the hierarchical relations in the direction of the root
  - Success Response:
  - Code: 200
  - Content: The response you get is a list of GO terms related to the searched term that fulfill the conditions of the query. Each term has:
- - `go_id`: id of the GO term
- - `name`: name of the GO term
- - `ontology_type`: the ontology that the GO term belongs to
- - `relations`: dictionary of relations
- - `relation type`: list of terms related by that relation type to the term
+ - `<go_id>`: ID of the GO term
+ - `<name>`: Name of the GO term
+ - `<ontology_type>`: The ontology that the GO term belongs to
+ - `<relations>`: Dictionary of relations
+ - `<relation type>`: List of terms related by that relation type to the term
  - Example:
  - URL: http://localhost:8000/related-terms
  - body:
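The README's own example body is not part of this hunk. As a rough sketch of how the `/related-terms` parameters listed above could be assembled on the client side: the helper name, the placeholder term ID and the choice to omit unset keys so that the server-side defaults apply are all assumptions, not BioAPI code.

```python
import json


def build_related_terms_body(term_id, relations=None, ontology_type=None,
                             general_depth=None, hierarchical_depth_to_children=None,
                             to_root=None):
    """Assemble a JSON-serialisable body for POST /related-terms (hypothetical helper)."""
    body = {"term_id": term_id}
    # The docs state that both filters should always be lists.
    for key, value in (("relations", relations), ("ontology_type", ontology_type)):
        if value is not None:
            if not isinstance(value, list):
                raise TypeError(f"{key} must be a list")
            body[key] = value
    # Depth values are passed through unchanged when provided.
    for key, value in (("general_depth", general_depth),
                       ("hierarchical_depth_to_children", hierarchical_depth_to_children)):
        if value is not None:
            body[key] = value
    # to_root is documented as 0 for false and 1 for true.
    if to_root is not None:
        if to_root not in (0, 1):
            raise ValueError("to_root must be 0 or 1")
        body["to_root"] = to_root
    return body


# Placeholder term ID; the exact ID format the endpoint expects is not shown here.
body = build_related_terms_body("GO:0000079", relations=["part_of"], to_root=1)
print(json.dumps(body))
```

The resulting string can be sent as the request body to `http://localhost:8000/related-terms` with any HTTP client.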
@@ -574,7 +574,7 @@ Gets the list of related terms to a term.
  ### Cancer related drugs (PharmGKB)

- Gets the list of related drugs to a list of genes.
+ Gets a list of related drugs to a list of genes.

  - URL: /drugs-pharm-gkb
  - Method: POST
@@ -583,14 +583,14 @@ Gets the list of related drugs to a list of genes.
  - Success Response:
  - Code: 200
  - Content: The response you get is a list of genes containing the related drug information
- - `pharmGKB_id`: Identifier assigned to this drug label by PharmGKB
- - `name`: Name assigned to the label by PharmGKB
- - `source`: The source that originally authored the label (e.g. FDA, EMA)
- - `biomarker_flag`: "On" if drug in this label appears on the FDA Biomarker list; "Off (Formerly On)" if the label was on the FDA Biomarker list at one time; "Off (Never On)" if the label was never listed on the FDA Biomarker list (to PharmGKB's knowledge)
- - `Testing Level`: PGx testing level as annotated by PharmGKB based on definitions at https://www.pharmgkb.org/page/drugLabelLegend
- - `Chemicals`: Related chemicals
- - `Genes`: List of related genes
- - `Variants-Haplotypes`: Related variants and/or haplotypes
+ - `<pharmGKB_id>`: Identifier assigned to this drug label by PharmGKB
+ - `<name>`: Name assigned to the label by PharmGKB
+ - `<source>`: The source that originally authored the label (e.g. FDA, EMA)
+ - `<biomarker_flag>`: "On" if drug in this label appears on the FDA Biomarker list; "Off (Formerly On)" if the label was on the FDA Biomarker list at one time; "Off (Never On)" if the label was never listed on the FDA Biomarker list (to PharmGKB's knowledge)
+ - `<Testing Level>`: PGx testing level as annotated by PharmGKB based on definitions at https://www.pharmgkb.org/page/drugLabelLegend
+ - `<Chemicals>`: Related chemicals
+ - `<Genes>`: List of related genes
+ - `<Variants-Haplotypes>`: Related variants and/or haplotypes
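The three documented `biomarker_flag` values lend themselves to simple client-side filtering. A small sketch, assuming each drug label arrives as a dictionary keyed by the field names above without the angle brackets; how labels are nested under each gene is not shown in this hunk, and the sample records are invented:

```python
def labels_on_fda_biomarker_list(labels):
    """Keep only the drug labels whose biomarker_flag is exactly "On"."""
    return [label for label in labels if label.get("biomarker_flag") == "On"]


# Invented records that only mimic the documented field names.
sample_labels = [
    {"pharmGKB_id": "EXAMPLE-1", "name": "Example label A", "source": "FDA",
     "biomarker_flag": "On"},
    {"pharmGKB_id": "EXAMPLE-2", "name": "Example label B", "source": "EMA",
     "biomarker_flag": "Off (Never On)"},
    {"pharmGKB_id": "EXAMPLE-3", "name": "Example label C", "source": "FDA",
     "biomarker_flag": "Off (Formerly On)"},
]

kept = labels_on_fda_biomarker_list(sample_labels)
print([label["pharmGKB_id"] for label in kept])  # ['EXAMPLE-1']
```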
@@ -613,7 +613,57 @@ Gets the list of related drugs to a list of genes.
The possible error codes are 400, 404 and 500. The content of each of them is a Json with a unique key called "error" where its value is a description of the problem that produces the error. For example:
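The README's own sample payload is not part of this excerpt. On the client side, reading the "error" key described above might look like the following sketch; the function name and the sample message are made up for illustration:

```python
import json


def extract_error(status_code, raw_body):
    """Return the error description for a 400/404/500 response, otherwise None."""
    if status_code not in (400, 404, 500):
        return None
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        # The body was not valid JSON, so there is no description to report.
        return None
    if isinstance(payload, dict):
        return payload.get("error")
    return None


# Made-up payload that follows the documented single-key shape.
print(extract_error(404, '{"error": "hypothetical description of the problem"}'))
```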