Commit c5155ab

Merge branch 'release3' of https://github.com/bio2rdf/bio2rdf-scripts into release3
2 parents 74af641 + 94e91e2 commit c5155ab

23 files changed: +32156 -27983 lines changed

clinicaltrials/clinicaltrials.php

+6-2
@@ -174,8 +174,12 @@ function parse_dir(){
 			if($i % 10000 == 0) {parent::clear();}
 			$trial_id = basename($file,'.xml');
 			if(parent::getParameterValue('id_list') == '' || in_array($trial_id, $ids)) {
-				echo "Processing $trial_id".PHP_EOL;
-				$this->process_file($file);
+				if(filesize($file)!=0) {
+					echo "Processing $trial_id".PHP_EOL;
+					$this->process_file($file);
+				} else{
+					echo "Processing $trial_id -> Empty!".PHP_EOL;
+				}
 			}
 		}
 		echo "Finished.".PHP_EOL;
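Note on the change above: the new branch skips trial files whose size is zero instead of handing them to process_file(). The same guard can be sketched in Python (the language of the other scripts in this commit). This is only an illustration: the function name, the demo helper and the trial file names are hypothetical and are not part of the repository.

```python
import os
import tempfile


def files_to_process(paths):
    """Split paths into (non_empty, empty) lists based on file size."""
    non_empty, empty = [], []
    for path in paths:
        # os.path.getsize plays the role of PHP's filesize() here.
        if os.path.getsize(path) > 0:
            non_empty.append(path)
        else:
            empty.append(path)
    return non_empty, empty


def demo():
    """Create one non-empty and one empty file, then classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "NCT00000001.xml")   # hypothetical trial id
        blank = os.path.join(tmp, "NCT00000002.xml")  # hypothetical trial id
        with open(good, "w") as handle:
            handle.write("<clinical_study/>")
        open(blank, "w").close()
        return files_to_process([good, blank])
```

Only the first list would be parsed; the second corresponds to the trials that the PHP code reports as "Empty!".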

linkedSPLs/LinkedSPLs-activeMoiety/README

+3-42
@@ -7,17 +7,11 @@ This folder will hold a graph that maps the active moiety components of an SPL t
 Inputs are original mappings listed as belows:

 (1) PT to UNII
-
 (2) UNII to RxNORM
-
 (3) PT to Drugbank
-
 (4) PT to ChEBI
-
 (5) RxNORM to OMOP
-
 (6) RxNORM to DrOn
-
 (7) RxNORM to NDFRT (EPC)

 Outputs is a RDF/XML graph that represents all active moiety with linked resouces.
@@ -27,28 +21,10 @@ Outputs is a RDF/XML graph that represents all active moiety with linked resouce
 Procedures to get active moieties RDF graph
 ################################################################################

-<STEP 1>: prepare mappings in folder mappings/
-
-----------------------------------------------
-mappings of PT to UNII (FDA)
-----------------------------------------------
+<1>: query to get omopid mapping and put in mapping folder

-FROM "../../LinkedSPLs-update/data/FDA/FDAPreferredSubstanceToUNII.txt"
+mappings/active-ingredient-omopid-rxcui.dsv

-----------------------------------------------
-mappings of UNII to RXCUI (UMLS)
-----------------------------------------------
-
-FROM "../../LinkedSPLs-update/data/UMLS/UNIIs-Rxcuis-from-UMLS.txt"
-
-----------------------------------------------
-mappings/dron-to-chebi-and-rxnorm.txt
-----------------------------------------------
-FROM "../../LinkedSPLs-update/mappings/DrOn-to-RxNorm/dron-to-chebi-and-rxnorm.txt"
-
-----------------------------------------------
-mappings/active-ingredient-omopid-rxcui-09042015.dsv
-----------------------------------------------

 query OMOP CDM V5 (GeriOMOP) by SQL query below:
 SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
@@ -60,21 +36,6 @@ cpt.CONCEPT_CLASS = 'Ingredient';

 right click result table and export to csv ('|' delimited)

-----------------------------------------------
-mappings/UNIIToChEBI-<DATE>.txt
-----------------------------------------------
-FROM "../../LinkedSPLs-update/mappings/PT-UNII-ChEBI-mapping/UNIIToChEBI-<DATE>.txt"
-
-----------------------------------------------
-mappings/pt_drugbank-<DATE>.txt
-----------------------------------------------
-FROM "../../LinkedSPLs-update/mappings/ChEBI-DrugBank-bio2rdf-mapping/fda-substance-preferred-name-to-drugbank-<DATE>.txt"
-
-----------------------------------------------
-mappings/EPC_extraction_most_recent_<DATE>.txt"
-----------------------------------------------
-FROM "../../LinkedSPLs-update/mappings/pharmacologic_class_indexing/EPC_extraction_most_recent_<DATE>.txt"
-

 <STEP 2>: run python script to merge those mappings together

@@ -92,7 +53,7 @@ outputs: activeMoietySub-in-rdf.xml


 ################################################################################
-PRE#REQUISITES:
+PRE-REQUISITES:
 ################################################################################

 python libraries:
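Note on the README change above: step <1> now produces a '|'-delimited export of the OMOP query, saved as mappings/active-ingredient-omopid-rxcui.dsv. A minimal sketch of reading such an export with the standard csv module follows. It assumes two columns named after the SQL aliases (omopid, rxcui) and an optional header row; whether the real export carries a header is not stated in the README, the function name is hypothetical, and the sample ids are made up.

```python
import csv
import io


def read_omop_rxcui(handle):
    """Return a dict mapping rxcui -> omopid from a '|'-delimited export."""
    mapping = {}
    for row in csv.reader(handle, delimiter="|"):
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        omopid, rxcui = (field.strip() for field in row)
        if omopid.lower() == "omopid":
            continue  # assumed header row, skipped if present
        mapping[rxcui] = omopid
    return mapping


# Made-up rows, for illustration only.
sample = io.StringIO("omopid|rxcui\n11|1001\n22|1002\n")
mapping = read_omop_rxcui(sample)
```

Keying the result by rxcui fits the merge step, which joins the other mappings on RxNorm identifiers.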

linkedSPLs/LinkedSPLs-activeMoiety/mergeToActiveMoiety.py

+5-5
@@ -20,12 +20,12 @@
 PT_UNII = "../LinkedSPLs-update/data/FDA/FDAPreferredSubstanceToUNII.txt"
 UNII_RXCUI = "../LinkedSPLs-update/data/UMLS/UNIIs-Rxcuis-from-UMLS.txt"

-PT_CHEBI = "mappings/UNIIToChEBI-06102015.txt"
-PT_DRUGBANK = "mappings/fda-substance-preferred-name-to-drugbank-06102015.txt"
+PT_CHEBI = "../LinkedSPLs-update/mappings/PT-UNII-ChEBI-mapping/UNIIToChEBI.txt"
+PT_DRUGBANK = "../LinkedSPLs-update/mappings/ChEBI-DrugBank-bio2rdf-mapping/fda-substance-preferred-name-to-drugbank.txt"

-UNII_NUI_PREFERRED_NAME_ROLE = "mappings/EPC_extraction_most_recent_06102015.txt"
-DRON_CHEBI_RXCUI = "mappings/cleaned-dron-chebi-rxcui-ingredient-06222015.txt"
-OMOP_RXCUI = "mappings/active-ingredient-omopid-rxcui-09042015.dsv"
+UNII_NUI_PREFERRED_NAME_ROLE = "../LinkedSPLs-update/mappings/pharmacologic_class_indexing/EPC_extraction_most_recent.txt"
+DRON_CHEBI_RXCUI = "../LinkedSPLs-update/mappings/DrOn-to-RxNorm/cleaned-dron-chebi-rxcui-ingredient.txt"
+OMOP_RXCUI = "mappings/active-ingredient-omopid-rxcui.dsv"

 ## Get UNII - PT - RXCUI
 unii_pt_cols = ['unii','pt']
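Note on the change above: the inputs now point at undated files under ../LinkedSPLs-update/, so the script depends on those files having been generated first. A small pre-flight check along the following lines could report missing inputs before any merging starts. This is a sketch, not code from the repository: the INPUTS dict, the function name and the injectable `exists` predicate are assumptions introduced here, and only the paths are taken from the new version of the script.

```python
import os

# Paths copied from the new version of mergeToActiveMoiety.py.
INPUTS = {
    "PT_UNII": "../LinkedSPLs-update/data/FDA/FDAPreferredSubstanceToUNII.txt",
    "UNII_RXCUI": "../LinkedSPLs-update/data/UMLS/UNIIs-Rxcuis-from-UMLS.txt",
    "PT_CHEBI": "../LinkedSPLs-update/mappings/PT-UNII-ChEBI-mapping/UNIIToChEBI.txt",
    "OMOP_RXCUI": "mappings/active-ingredient-omopid-rxcui.dsv",
}


def missing_inputs(inputs, base_dir=".", exists=os.path.isfile):
    """Return the sorted names of inputs whose files are not found.

    `exists` is injectable so the check can be exercised without real files.
    """
    return sorted(
        name
        for name, relative_path in inputs.items()
        if not exists(os.path.join(base_dir, relative_path))
    )
```

Calling missing_inputs(INPUTS) from the LinkedSPLs-activeMoiety folder would list any mapping that still has to be produced or copied.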

linkedSPLs/LinkedSPLs-activeMoiety/mergedActiveMoiety.csv

+23,924-9,625
Large diffs are not rendered by default.

linkedSPLs/LinkedSPLs-clinicalDrug/README

+7-1
@@ -37,14 +37,20 @@ converted_rxnorm_mappings_<DATE>.txt

 Dailymed product label indexing mapping from "linkedSPLs/LinkedSPLs-update/mappings/RxNORM-mapping/converted_rxnorm_mappings_<DATE>"

-$ cat converted_rxnorm_mappings_06262015.txt | cut -f1,2 -d\| | sort | uniq > setid_rxcui.txt
+$ cd mappings
+
+$ cp ../../LinkedSPLs-update/mappings/RxNORM-mapping/converted_rxnorm_mappings_10132015.txt converted_rxnorm_mappings.txt
+
+$ cat converted_rxnorm_mappings.txt | cut -f1,2 -d\| | sort | uniq > setid_rxcui.txt

 ----------------------------------------------------------
 mappings of FDA preferred term and setId
 ----------------------------------------------------------

 $ mysql -u <username> -p

+$ use linkedSPLs;
+
 SELECT setId, fullName FROM linkedSPLs.structuredProductLabelMetadata INTO OUTFILE '/tmp/setid_fullname.txt' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

 $ cp /tmp/setid_fullname.txt ../linkedSPLs/LinkedSPLs-clinicalDrug/mappings/
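Note on the README change above: the `cut -f1,2 -d\| | sort | uniq` pipeline keeps the first two '|'-separated fields of each line and removes duplicates. For readers who prefer to do this step from Python, a rough equivalent is sketched below. The function name and the sample values are hypothetical, and Python's default string ordering may differ from the locale-dependent ordering of `sort`.

```python
def setid_rxcui_pairs(lines):
    """Keep the first two '|'-separated fields of each line, de-duplicated and sorted."""
    pairs = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Like cut, a line with fewer than two fields is passed through as is.
        pairs.add("|".join(fields[:2]))
    return sorted(pairs)


# Made-up rows, for illustration only.
sample_lines = [
    "setid-a|1001|extra\n",
    "setid-a|1001|other\n",
    "setid-b|1002|extra\n",
]
pairs = setid_rxcui_pairs(sample_lines)
```

Writing "\n".join(pairs) to setid_rxcui.txt would give the file that mergeToClinicalDrug.py reads as SETID_RXCUI.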

linkedSPLs/LinkedSPLs-clinicalDrug/mergeToClinicalDrug.py

+5-4
@@ -17,11 +17,12 @@
 RXNORM_BASE_URI = "http://purl.bioontology.org/ontology/RXNORM/"

 ## Define data inputs
-DRON_RXCUI = "mappings/cleaned-dron-to-rxcui-drug-06222015.txt"
-OMOP_RXCUI = "mappings/imeds_drugids_to_rxcuis.csv"
+DRON_RXCUI = "../LinkedSPLs-update/mappings/DrOn-to-RxNorm/cleaned-dron-to-rxcui-drug.txt"
 SETID_RXCUI = "mappings/setid_rxcui.txt"
 FULLNAME_SETID = "mappings/setid_fullname.txt"
-#OMOP_RXCUI = "mappings/clinical-drug-omopid-rxcui-09042015.dsv"
+
+#OMOP_RXCUI = "mappings/imeds_drugids_to_rxcuis.csv"
+OMOP_RXCUI = "mappings/clinical-drug-omopid-rxcui-09042015.dsv"



@@ -71,6 +72,6 @@
 output_DF = fullname_rxcui_setid_DF.merge(dron_omop_rxcui_DF, on=['rxcui'], how='left')

 print output_DF.info()
-print output_DF
+#print output_DF

 output_DF.to_csv('mergedClinicalDrug.tsv', sep='\t', index=False)
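Note on the hunk above: the final step is a left join on rxcui, so every row from the setId/full-name side is kept even when no DrOn or OMOP mapping exists for its rxcui. The plain-Python sketch below shows that behaviour without pandas. It is a stand-in for the semantics of `how='left'`, not the script's code, and the function name, column names other than rxcui, and row values are hypothetical.

```python
def left_join_on_rxcui(left_rows, right_rows):
    """Left-join two lists of dicts on the 'rxcui' key.

    Unmatched left rows are kept, with None in the right-hand columns.
    """
    extra_cols = sorted({key for row in right_rows for key in row if key != "rxcui"})
    right_by_rxcui = {}
    for row in right_rows:
        right_by_rxcui.setdefault(row["rxcui"], []).append(row)

    joined = []
    for left in left_rows:
        matches = right_by_rxcui.get(left["rxcui"])
        if not matches:
            joined.append({**left, **{col: None for col in extra_cols}})
            continue
        for match in matches:
            joined.append({**left, **{col: match.get(col) for col in extra_cols}})
    return joined


# Made-up rows, for illustration only.
labels = [{"setid": "setid-a", "rxcui": "1001"}, {"setid": "setid-b", "rxcui": "1002"}]
mappings = [{"rxcui": "1001", "dron": "DRON_X", "omopid": "11"}]
result = left_join_on_rxcui(labels, mappings)
```

The second label has no mapping, so it appears in the result with empty dron and omopid values rather than being dropped.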

linkedSPLs/LinkedSPLs-update/README

+58-34
@@ -1,6 +1,11 @@
 CODE TO GENERATE THE SQL AND LINKED-DATA VERSION OF LINKEDSPLS
 Authors: Richard Boyce, Greg Gardner, Yifan Ning
-Date: 09/25/2015
+
+Last updated Date: 09/25/2015
+
+problematic labels:
+spls/e8ae0b66-25de-41b1-8013-7f414bbb7568.xml
+

 ################################################################################
 OVERVIEW
@@ -37,34 +42,49 @@ Update all linkedSPLs mappings by command below
 $ ant linkedSPLs-update

 Update piece by piece (recommended)
+create schema linkedSPLs
+modify db connection information in db-connection.properties (if creating linkedSPLs)
+$ ant unzip-spls (if creating linkedSPLs)
+$ ant createTableSchema (if creating linkedSPLs)
+$ ant load-loincSection (if creating linkedSPLs)
+$ ant loadDailymedSPLsToRDB (ensure tables are truncated)

 $ ant load-FDAPreferredSubstanceToUNII
 $ ant load-FDA_UNII_to_ChEBI
+$ ant load-FDAPreferredSubstanceToRxNORM
+$ ant load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF
+$ ant load-SPLSetIDToRxNORM
+$ ant load-RXNORM_NDFRT_INGRED_Table
+$ ant load-FDA_EPC_Table
 $ ant load-ChEBI_DRUGBANK_BIO2RDF
-$ ant loadDailymedSPLsToRDB
-$ ant load-DrOn_RXCUI_DRUG
-$ ant load-DrOn_RXCUI_INGREDIENT
-$ ant load-FDA_EPC_Table
+
 $ ant load-FDAPharmgxTable
-$ ant load-FDAPharmgxTableToOntologyMap
-$ ant load-FDAPreferredSubstanceToRxNORM
-$ ant load-FDAPreferredSubstanceToRxNORM-restAPI
-$ ant load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF
-$ ant load-loincSection
+$ ant load-FDAPharmgxTableToOntologyMap
+
+$ ant load-DrOn_RXCUI_DRUG
+$ ant load-DrOn_RXCUI_INGREDIENT
+
+-- deprecated --
+$ ant load-FDAPreferredSubstanceToRxNORM-restAPI
 $ ant load-OMOPId-RXCUIs-from-OHDSI
-$ ant load-RXNORM_NDFRT_INGRED_Table
-$ ant load-SPLSetIDToRxNORM



 ################################################################################
 PRE-REQUISITES (Download all source data before run any ant command)
 ################################################################################

+Install python libs if deploy linkedSPLs on new environment
+
+apt-get install libxml2-dev libxslt1-dev python-dev
+apt-get install python-lxml
+apt-get install python-feedparser
+
+
 Download and organize all source data files in data folder

 --------------------------------------------------------
-product label sections and mappings from Dailymed:
+Dailymed (product label sections, indexing and mappings)
 --------------------------------------------------------

 (1) dailymed-labels:
@@ -91,7 +111,7 @@ unzip XMLs to folder "pharmacologic_class_indexing_spl_files"
 $ cd pharmacologic_class_indexing_spl_files; unzip \*.zip; rm \*.zip

 --------------------------------------------------------
-FDA Preferred terms, UNIIs from FDA:
+FDA (Preferred terms, UNIIs):
 --------------------------------------------------------

 Download from http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp
@@ -109,47 +129,51 @@ Keep in directory LinkedSPLs-update/data/FDA
 Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS

 --------------------------------------------------------
-Drug bank Id from Drugbank:
+Drugbank (Drug bank Id) :
 --------------------------------------------------------

 Download from http://www.drugbank.ca/downloads

 download drugbank.xml as drugbankX.X and keep in directory LinkedSPLs-update/data/DrugBank

 --------------------------------------------------------
-UMLS:
+UMLS (rxcui):
 --------------------------------------------------------

 Download RXNORM mappings (full rxnorm) from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"

 keep in directory: "LinkedSPLs-update/data/UMLS"

---------------------------------------------------------
-umlsdbmi:
---------------------------------------------------------
-Repository: https://bitbucket.org/uamsdbmi/dron
+------------------------------------------------------------------------
+PharmagxTable && FDAPharmgxTableToOntologyMap
+-----------------------------------------------------------------------

-DrOn to RxNorm and ChEBI
+Get CSVs from solomon
+biomarker-to-ontology-mapping.csv
+genetic-biomarker-table-raw-import.csv

-Dron mapping file (dron-rxnorm.owl for drug and dron-ingredient.owl for ingredients) download from:
-https://bitbucket.org/uamsdbmi/dron/src
+put at "LinkedSPLs-update/mappings/FDA-pharmacogenetic-info-mapping/"

-Install readland:
-$ sudo apt-get install redland-utils
+Edit data-source.properties
+ex.
+BIOMARKER = mappings/FDA-pharmacogenetic-info-mapping/biomarker-to-ontology-mapping-07242015.xlsx
+GENETIC= genetic-biomarker-table-update-07242015.csv

-Load dron mapping for ingredient into a triple store by:
-rdfproc -n dron parse dron-ingredient.owl
+--------------------------------------------------------
+umlsdbmi (DronId for drug and ingredient):
+--------------------------------------------------------
+Repository: https://bitbucket.org/uamsdbmi/dron
+Code: https://bitbucket.org/uamsdbmi/dron/src

-rdfproc -c dron-ingredient query sparql - '
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dron: <http://purl.obolibrary.org/obo/dron#> SELECT * WHERE { ?dron dron:DRON_00010000 ?rxcui. }' > dron-chebi-rxcui-ingredient.txt
+Download repository from "https://bitbucket.org/uamsdbmi/dron/downloads"

+unzip repository and then:
+copy dron-rxnorm.owl at data/umasdbmi/dron-rxnorm.owl
+copy dron-ingredient.owl at data/umasdbmi/dron-ingredient.owl

-Load dron mapping for drug product into triple store by:
-rdfproc -n dron-drug parse dron-rxnorm.owl
+dron-rxnorm.owl for drug product
+dron-ingredient.owl for active ingredients

-Mappings pulled using:
-rdfproc -c dron-drug query sparql - '
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dron: <http://purl.obolibrary.org/obo/dron#> SELECT * WHERE { ?dron dron:DRON_00010000 ?rxcui. }' > dron-rxcui-drug.txt

 ------------------------------------------------------------------------
 OMOP concept Id from OHDSI or query OMOP CDM V5 (GeriOMOP) by SQL query
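Note on the LinkedSPLs-update README change: the "piece by piece" section now lists the ant targets in a specific order, with three of them marked "(if creating linkedSPLs)". One way to avoid retyping that sequence is a small driver such as the sketch below. It is not part of the repository: the function names and the dry-run default are assumptions, the target names and their order are copied from the README, and the two targets listed after the "-- deprecated --" marker are left out.

```python
import subprocess

# Targets the README marks "(if creating linkedSPLs)".
CREATE_ONLY_TARGETS = ["unzip-spls", "createTableSchema", "load-loincSection"]

# Remaining targets, in the order the updated README lists them.
LOAD_TARGETS = [
    "loadDailymedSPLsToRDB",  # README: ensure tables are truncated first
    "load-FDAPreferredSubstanceToUNII",
    "load-FDA_UNII_to_ChEBI",
    "load-FDAPreferredSubstanceToRxNORM",
    "load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF",
    "load-SPLSetIDToRxNORM",
    "load-RXNORM_NDFRT_INGRED_Table",
    "load-FDA_EPC_Table",
    "load-ChEBI_DRUGBANK_BIO2RDF",
    "load-FDAPharmgxTable",
    "load-FDAPharmgxTableToOntologyMap",
    "load-DrOn_RXCUI_DRUG",
    "load-DrOn_RXCUI_INGREDIENT",
]


def build_commands(creating=False):
    """Return the ant invocations as argument lists, in README order."""
    targets = (CREATE_ONLY_TARGETS if creating else []) + LOAD_TARGETS
    return [["ant", target] for target in targets]


def run_all(creating=False, dry_run=True):
    """Print the commands, or execute them and stop at the first failure."""
    for command in build_commands(creating):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

With the default dry_run=True nothing is executed, which makes it easy to review the sequence before running it from the directory that holds the ant build file.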
