CODE TO GENERATE THE SQL AND LINKED-DATA VERSION OF LINKEDSPLS
Authors: Richard Boyce, Greg Gardner, Yifan Ning
- Date: 6/08/2015
+ Date: 09/25/2015

################################################################################
OVERVIEW
@@ -28,24 +28,37 @@ Config mysql db connection at : db-connection.properties

Run the shell commands below to create the database schema, unzip and parse the DailyMed XMLs, and load the product label sections from those XMLs into the MySQL schema linkedSPLs:

- $ cd LinkedSPLs-update
+ $ cd bio2rdf/linkedSPLs/LinkedSPLs-update
+
$ ant linkedSPLs-setup

Update all linkedSPLs mappings with the command below:
- (1) mappings of preferred term to UNII
- (2) mappings of preferred term to ChEBI
- (3) mappings of Preferred term and Rxnorm
- (4) mappings of Preferred term, UNII and Drugbank URI
- (5) mappings of Preferred term, rxcui and Dailymed setid
- (6) mappings of RxNORM, NUI and NDFRT label
- (7) mappings of setId, UNII, NUI and PreferredNameAndRole
-

$ ant linkedSPLs-update

+ Alternatively, update piece by piece (recommended):
+
+ $ ant load-FDAPreferredSubstanceToUNII
+ $ ant load-FDA_UNII_to_ChEBI
+ $ ant load-ChEBI_DRUGBANK_BIO2RDF
+ $ ant loadDailymedSPLsToRDB
+ $ ant load-DrOn_RXCUI_DRUG
+ $ ant load-DrOn_RXCUI_INGREDIENT
+ $ ant load-FDA_EPC_Table
+ $ ant load-FDAPharmgxTable
+ $ ant load-FDAPharmgxTableToOntologyMap
+ $ ant load-FDAPreferredSubstanceToRxNORM
+ $ ant load-FDAPreferredSubstanceToRxNORM-restAPI
+ $ ant load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF
+ $ ant load-loincSection
+ $ ant load-OMOPId-RXCUIs-from-OHDSI
+ $ ant load-RXNORM_NDFRT_INGRED_Table
+ $ ant load-SPLSetIDToRxNORM
+
+
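The piece-by-piece targets can also be run from a script. This is a sketch, not part of LinkedSPLs. It assumes `ant` is on the PATH and the script runs from bio2rdf/linkedSPLs/LinkedSPLs-update. The helper `run_targets` and its `dry_run` flag are hypothetical. The script runs the targets in the order listed above and stops at the first one that fails.

```python
import subprocess

# Targets copied from the piece-by-piece list above, in the same order.
TARGETS = [
    "load-FDAPreferredSubstanceToUNII",
    "load-FDA_UNII_to_ChEBI",
    "load-ChEBI_DRUGBANK_BIO2RDF",
    "loadDailymedSPLsToRDB",
    "load-DrOn_RXCUI_DRUG",
    "load-DrOn_RXCUI_INGREDIENT",
    "load-FDA_EPC_Table",
    "load-FDAPharmgxTable",
    "load-FDAPharmgxTableToOntologyMap",
    "load-FDAPreferredSubstanceToRxNORM",
    "load-FDAPreferredSubstanceToRxNORM-restAPI",
    "load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF",
    "load-loincSection",
    "load-OMOPId-RXCUIs-from-OHDSI",
    "load-RXNORM_NDFRT_INGRED_Table",
    "load-SPLSetIDToRxNORM",
]


def run_targets(targets, dry_run=False, cwd=None):
    """Run each ant target in order (hypothetical helper).

    Returns the list of commands that were run, or would be run when
    dry_run is True. check=True raises CalledProcessError on the first
    target that exits with a non-zero status, so later targets are skipped.
    """
    executed = []
    for target in targets:
        cmd = ["ant", target]
        executed.append(cmd)
        if dry_run:
            continue
        subprocess.run(cmd, check=True, cwd=cwd)
    return executed
```

Call `run_targets(TARGETS)` to run everything. To re-run only the mappings whose source data changed, pass a shorter list.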

################################################################################
- PRE-REQUISITES
+ PRE-REQUISITES (download all source data before running any ant command)
################################################################################

Download and organize all source data files in the data folder
@@ -56,7 +69,8 @@ product label sections and mappings from Dailymed:

(1) dailymed-labels:

- Download dm_spl_release_human_rx.zip and dm_spl_release_human_otc.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
+ Download dm_spl_release_human_rx.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
+ (skip OTC drugs: dm_spl_release_human_otc.zip)

Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-labels/"

@@ -72,11 +86,15 @@ Download pharmacologic_class_indexing_spl_files.zip from http://dailymed.nlm.nih

Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-indexings/"

+ Unzip the XMLs into the folder "pharmacologic_class_indexing_spl_files":
+
+ $ cd pharmacologic_class_indexing_spl_files; unzip \*.zip; rm *.zip
+
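The sketch below is a Python equivalent of the unzip-then-delete step, for machines without `unzip`. It is a sketch only. `extract_and_remove_zips` is a hypothetical helper, and it assumes the archives are trusted DailyMed downloads.

```python
import zipfile
from pathlib import Path


def extract_and_remove_zips(folder):
    """Extract every *.zip in `folder` into `folder`, then delete the archive.

    Returns the names of the archives that were processed.
    """
    folder = Path(folder)
    processed = []
    # sorted() snapshots the archive list, so files extracted below are not revisited.
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
        archive.unlink()  # remove the archive only after extraction succeeded
        processed.append(archive.name)
    return processed
```

Usage: `extract_and_remove_zips("data/dailymed-indexings/pharmacologic_class_indexing_spl_files")`.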
--------------------------------------------------------
FDA Preferred terms, UNIIs from FDA:
--------------------------------------------------------

- Download from http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm162523.htm
+ Download from http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp

(1) FDA_UNII_Names
Download UNII List ('UNIIs <DATE> Names.txt' as UNII lists)
@@ -86,8 +104,9 @@ Downloads UNII Data ('UNIIs <DATE> Records.txt' as UNII records)

Keep in directory LinkedSPLs-update/data/FDA

- Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS
+ (replace the whitespace ' ' in the file names with underscores '_')
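The rename can be scripted when several UNII files arrive together. This minimal sketch assumes the files already sit in LinkedSPLs-update/data/FDA. `underscore_filenames` is a hypothetical helper.

```python
from pathlib import Path


def underscore_filenames(folder):
    """Rename files in `folder` so that spaces become underscores.

    Returns a dict mapping each old name to its new name (changed files only).
    """
    renamed = {}
    # sorted() takes a snapshot, so renaming while looping is safe.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and " " in path.name:
            target = path.with_name(path.name.replace(" ", "_"))
            path.rename(target)
            renamed[path.name] = target.name
    return renamed
```

For example, `underscore_filenames("data/FDA")` turns a file named like 'UNIIs <DATE> Names.txt' into 'UNIIs_<DATE>_Names.txt'.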

+ Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS to the new file names
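The two keys can also be updated by a script. This sketch assumes data-source.properties uses plain KEY=value lines; the file's actual layout is not shown here. `set_properties` is a hypothetical helper, and the file names used with it are placeholders.

```python
def set_properties(text, updates):
    """Return `text` with the values of the keys in `updates` replaced.

    Only plain KEY=value lines are matched; comments and other lines are
    kept unchanged. Keys not already present are appended at the end.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = None
        # '#' and '!' start comment lines in Java-style properties files.
        if "=" in stripped and not stripped.startswith(("#", "!")):
            key = stripped.split("=", 1)[0].strip()
        if key in updates:
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in updates.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

To use it, read data-source.properties into a string. Call `set_properties` with the renamed FDA_UNII_NAMES and FDA_UNII_RECORDS files, then write the result back.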

--------------------------------------------------------
DrugBank ID from DrugBank:
@@ -101,7 +120,7 @@ download drugbank.xml as drugbankX.X and keep in directory LinkedSPLs-update/dat
UMLS:
--------------------------------------------------------

- Download RXNORM mappings from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"
+ Download RXNORM mappings (full RxNorm release) from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"

Keep in directory: "LinkedSPLs-update/data/UMLS"

@@ -132,41 +151,37 @@ Mappings pulled using:
rdfproc -c dron-drug query sparql - '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dron: <http://purl.obolibrary.org/obo/dron#> SELECT * WHERE { ?dron dron:DRON_00010000 ?rxcui. }' > dron-rxcui-drug.txt

- --------------------------------------------------------
- OMOP concept Id from OHDSI:
- --------------------------------------------------------
+ ------------------------------------------------------------------------
+ OMOP concept IDs from OHDSI, or from an OMOP CDM V5 (GeriOMOP) SQL query
+ ------------------------------------------------------------------------

Download from "https://github.com/OHDSI/KnowledgeBase/tree/master/LAERTES/terminology-mappings/StandardVocabToRxNorm/imeds_drugids_to_rxcuis.csv"

+ OR
+
+ SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
+ CONCEPT cpt
+ WHERE
+ cpt.CONCEPT_CLASS_ID = 'Clinical Drug';
+
+ Right-click the result table and export it to CSV ('|' delimited).
+ Keep the CSV in LinkedSPLs-clinicalDrug/mappings/
+
+ AND
+
+ Query OMOP CDM V5 (GeriOMOP) with the SQL query below:
+ SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
+ CONCEPT cpt
+ WHERE
+ cpt.CONCEPT_CLASS_ID = 'Ingredient';
+
+ Keep the CSV in LinkedSPLs-activeMoiety/mappings/
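The query results can also be written by a script instead of a GUI export. This sketch assumes a DB-API 2.0 driver suited to the GeriOMOP database, with the query already executed on a cursor. `export_pipe_delimited` is a hypothetical helper. It writes a header row of column names, then the rows, separated by '|'.

```python
import csv


def export_pipe_delimited(cursor, path, batch_size=1000):
    """Write the rows of an already-executed DB-API 2.0 cursor to a '|'-delimited file.

    The first line holds the column names taken from cursor.description.
    Returns the number of data rows written.
    """
    header = [column[0] for column in cursor.description]
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="|", lineterminator="\n")
        writer.writerow(header)
        # fetchmany keeps memory bounded for large concept tables.
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            written += len(batch)
    return written
```

Run the 'Clinical Drug' query and the 'Ingredient' query separately. Export each one into its own mappings folder.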
+

################################################################################
Details for updating each linkedSPLs mapping
################################################################################

- -Scripts are using in Ant task linkedSPLs-setup
-
- update_lodd_dailymed.py:
- To do a full update, run update_lodd_dailymed.py directly. This simply executes
- dailymed_rss.run() and loadDailymedToSql.update() with a custom logger (update_lodd_dailymed.log)
-
- dailymed_rss.py:
- Several functions for downloading and extracting the spls updated
- within the past 7 days from the rss feed
- http://dailymed.nlm.nih.gov/dailymed/rss.cfm. The feedparser module
- is used to parse the rss feed. Each entry in the feed provides a link
- to an information page for the insert. The html on this page is
- parsed for the link to the zipped xml file, which is then downloaded
- to a temp directory. After all inserts in the feed have been
- downloaded, the xml files are extracted into the temp directory. If
- they don't exist, the script will create two other directories in the
- current directory, ./spls (holds a master set of all spls in their
- most current form) and ./spls/updates (holds the spls from the most
- recent execution of dailymed_rss.run()). All files in ./spls/updates
- are removed. All xml files in the temp directory are then copied to
- ./spls/updates. Finally, the temp directory and all files in it are
- removed.
-
- loadDailymedToSql.py
A number of functions for parsing SPLs and loading the information into
the local lodd_dailymed MySQL database. In particular, run() is used
to insert new SPLs into the database, and update() is used for
@@ -179,6 +194,8 @@ insert in the database, removes that filename from ./spls and copies
the updated spl to ./spls. NOTE: if the script is run directly, it
will truncate all SPL tables and load all SPLs in the 'spls' folder.

+ $ cd bio2rdf/linkedSPLs/LinkedSPLs-update/load-dailymed-spls
+ $ python loadDailymedToSql.py
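The file bookkeeping described for update() can be sketched as below: drop the outdated file from ./spls, then copy the updated SPL in. The sketch covers only that file swap, not the database updates. `replace_master_spl` is hypothetical and assumes the outdated file name is already known.

```python
import shutil
from pathlib import Path


def replace_master_spl(updated_spl, old_filename, master_dir="spls"):
    """Swap an outdated SPL in the master folder for its updated version.

    `old_filename` is the name previously recorded for this label.
    Returns the path of the new master copy.
    """
    master = Path(master_dir)
    old_copy = master / old_filename
    if old_copy.exists():
        old_copy.unlink()  # remove the outdated file from ./spls
    new_copy = master / Path(updated_spl).name
    shutil.copy2(updated_spl, new_copy)  # copy the updated SPL into ./spls
    return new_copy
```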

------------------------------------------------------------
LOADING THE FDA UNII TO CHEBI MAPPING
@@ -403,6 +420,33 @@ CREATE TABLE `linkedSPLs`.`OMOP_RXCUI` (

LOAD DATA LOCAL INFILE 'data/OMOP-OHDSI/imeds_drugids_to_rxcuis.csv' INTO TABLE `OMOP_RXCUI` FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 LINES (OMOPConceptId, RxCUI);
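Before loading, it can help to check that the file parses the way the statement reads it: '|' between fields, '\n' between lines, and the first line skipped. This minimal sketch uses a hypothetical helper, `read_omop_rxcui`. It assumes the OMOP concept ID is an integer.

```python
def read_omop_rxcui(path):
    """Parse a '|'-delimited OMOP-to-RxCUI file into (OMOPConceptId, RxCUI) tuples.

    Mirrors IGNORE 1 LINES by skipping the header. Raises ValueError on a
    row with fewer than two fields or a non-integer concept ID.
    """
    pairs = []
    with open(path) as fh:
        next(fh, None)  # IGNORE 1 LINES: skip the header row
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue
            omop_id, rxcui = line.split("|")[:2]
            pairs.append((int(omop_id), rxcui))
    return pairs
```

The helper strips a trailing '\r'. LOAD DATA with LINES TERMINATED BY '\n' would keep it in the RxCUI column, so a file with Windows line endings is worth converting first.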

+
+
+ --------------------------------------
+ CREATE AND LOAD UNII to ChEBI MAPPING
+ --------------------------------------
+
+ - Method: use BioPortal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF labels of concepts in BioPortal
+
+ - Base folder in SVN:
+ <linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
+
+ - Date performed: 04/13/2012 and 09/14/2012
+
+ - Input: active_moieties.txt -- all unique UNII preferred names listed in "UNIIs 2Mar2012.txt"
+
+ - Script: sparql1-for-drug-entities.py
+
+ - Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
+
+ - Results (04/13/2012): 4,234 mappings
+
+ - Results (09/14/2012): 2,180 mappings
+
+ - Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
+
+
+
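The matching and merging steps can be sketched locally. This sketch does not query BioPortal, and it is not the contents of sparql1-for-drug-entities.py. It assumes the concept labels have already been fetched into a dictionary keyed by label. If neither run contained duplicates of its own, the counts above imply that 4,234 + 2,180 - 4,411 = 2,003 mappings appeared in both runs.

```python
def exact_label_matches(preferred_names, label_index):
    """Return (name, concept_uri) pairs where a preferred name equals a label exactly.

    label_index maps a label string to the concept URIs that carry it.
    Matching is case-sensitive, since it is an exact string match.
    """
    matches = []
    for name in preferred_names:
        for uri in label_index.get(name, ()):
            matches.append((name, uri))
    return matches


def combine_unique(*result_sets):
    """Union several mapping runs, keeping first-seen order and dropping repeats."""
    seen = set()
    combined = []
    for results in result_sets:
        for pair in results:
            if pair not in seen:
                seen.add(pair)
                combined.append(pair)
    return combined
```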
------------------------------------------------------------
TESTING THE D2R SERVER ON THE DEVELOPMENT MACHINE
------------------------------------------------------------
@@ -454,32 +498,6 @@ SQL> update DB.DBA.load_list set ll_state = 0 where ll_file = '<name of RDF file
SQL> rdf_loader_run();
SQL> select * from DB.DBA.load_list;

- -----------
-
- *UNII to ChEBI
-
- Approach 1:
-
- - Method: use Bioportal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF label of concepts in Bioportal
-
- - Base folder in SVN:
- <linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
-
- - Date performed: 04/13/2012 and 09/14/2012
-
- - Input: active_moieties.txt -- all unique UNII preferred names from listed in "UNIIs 2Mar2012.txt"
-
- - Script: sparql1-for-drug-entities.py
-
- - Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
-
- - Results (4/13/2012): 4,234 mappings
-
- - Results (09/14/2012): 2,180 mappings
-
- - Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
-
- -----------


- The D2R file has the mapping from RDB tables to RDF: ../LinkedSPLs-core/linkedSPLs_dump_rdf_config.n3