11CODE TO GENERATE THE SQL AND LINKED-DATA VERSION OF LINKEDSPLS
22Authors: Richard Boyce, Greg Gardner, Yifan Ning
3- Date: 6/08/2015
3+ Date: 09/25/2015
44
55################################################################################
66OVERVIEW
@@ -28,24 +28,37 @@ Config mysql db connection at : db-connection.properties
2828
2929Run the shell commands below to create the database schema, unzip and parse the dailymed XMLs, and load the product label sections from the dailymed XMLs into the Mysql schema linkedSPLs
3030
31- $ cd LinkedSPLs-update
31+ $ cd bio2rdf/linkedSPLs/LinkedSPLs-update
32+
3233$ ant linkedSPLs-setup
3334
3435Update all linkedSPLs mappings with the command below
35- (1) mappings of preferred term to UNII
36- (2) mappings of preferred term to ChEBI
37- (3) mappings of Preferred term and Rxnorm
38- (4) mappings of Preferred term, UNII and Drugbank URI
39- (5) mappings of Preferred term, rxcui and Dailymed setid
40- (6) mappings of RxNORM, NUI and NDFRT label
41- (7) mappings of setId, UNII, NUI and PreferredNameAndRole
42-
4336
4437$ ant linkedSPLs-update
4538
39+ Update piece by piece (recommended)
40+
41+ $ ant load-FDAPreferredSubstanceToUNII
42+ $ ant load-FDA_UNII_to_ChEBI
43+ $ ant load-ChEBI_DRUGBANK_BIO2RDF
44+ $ ant loadDailymedSPLsToRDB
45+ $ ant load-DrOn_RXCUI_DRUG
46+ $ ant load-DrOn_RXCUI_INGREDIENT
47+ $ ant load-FDA_EPC_Table
48+ $ ant load-FDAPharmgxTable
49+ $ ant load-FDAPharmgxTableToOntologyMap
50+ $ ant load-FDAPreferredSubstanceToRxNORM
51+ $ ant load-FDAPreferredSubstanceToRxNORM-restAPI
52+ $ ant load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF
53+ $ ant load-loincSection
54+ $ ant load-OMOPId-RXCUIs-from-OHDSI
55+ $ ant load-RXNORM_NDFRT_INGRED_Table
56+ $ ant load-SPLSetIDToRxNORM
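
The piece-by-piece update above can also be driven from a small script. The following is a minimal sketch, not part of the repository: it assumes `ant` is on the PATH, that it is started from the LinkedSPLs-update folder, and that the order in which the targets are listed is the order in which they should run (the README does not state this explicitly). The helper name `run_targets` is hypothetical.

```python
import subprocess

# Target names copied from the list above. Running them in listed order is
# an assumption; the README does not say the order is mandatory.
TARGETS = [
    "load-FDAPreferredSubstanceToUNII",
    "load-FDA_UNII_to_ChEBI",
    "load-ChEBI_DRUGBANK_BIO2RDF",
    "loadDailymedSPLsToRDB",
    "load-DrOn_RXCUI_DRUG",
    "load-DrOn_RXCUI_INGREDIENT",
    "load-FDA_EPC_Table",
    "load-FDAPharmgxTable",
    "load-FDAPharmgxTableToOntologyMap",
    "load-FDAPreferredSubstanceToRxNORM",
    "load-FDAPreferredSubstanceToRxNORM-restAPI",
    "load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF",
    "load-loincSection",
    "load-OMOPId-RXCUIs-from-OHDSI",
    "load-RXNORM_NDFRT_INGRED_Table",
    "load-SPLSetIDToRxNORM",
]


def run_targets(targets, run=subprocess.call):
    """Run `ant <target>` for each target and stop at the first failure.

    `run` takes an argument list and returns an exit code; it defaults to
    subprocess.call and can be swapped out for a dry run.
    Returns the list of targets that finished with exit code 0.
    """
    done = []
    for target in targets:
        if run(["ant", target]) != 0:
            break  # leave the remaining targets for a manual re-run
        done.append(target)
    return done
```

Calling `run_targets(TARGETS)` returns the targets that completed, so the first entry of `TARGETS` missing from the result is the one to investigate before continuing.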
57+
58+
4659
4760################################################################################
48- PRE-REQUISITES
61+ PRE-REQUISITES (Download all source data before running any ant command)
4962################################################################################
5063
5164Download and organize all source data files in data folder
@@ -56,7 +69,8 @@ product label sections and mappings from Dailymed:
5669
5770(1) dailymed-labels:
5871
59- Download dm_spl_release_human_rx.zip and dm_spl_release_human_otc.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
72+ Download dm_spl_release_human_rx.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
73+ (skip otc drugs - dm_spl_release_human_otc.zip)
6074
6175Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-labels/"
6276
@@ -72,11 +86,15 @@ Download pharmacologic_class_indexing_spl_files.zip from http://dailymed.nlm.nih
7286
7387Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-indexings/"
7488
89+ unzip XMLs to folder "pharmacologic_class_indexing_spl_files"
90+
91+ $ cd pharmacologic_class_indexing_spl_files; unzip \*.zip; rm \*.zip
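
Where the shell `unzip` tool is not available, the same step can be approximated with the Python standard library. This is a sketch under assumptions: the function name `unzip_all` is hypothetical, and unlike the shell one-liner it keeps an archive on disk if extraction raises an error.

```python
import glob
import os
import zipfile


def unzip_all(folder):
    """Extract every .zip found directly in `folder` into `folder`,
    then delete the archive.

    Returns the sorted base names of the archives that were processed.
    """
    processed = []
    for path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(folder)
        # Only reached when extraction succeeded.
        os.remove(path)
        processed.append(os.path.basename(path))
    return processed
```

For example, `unzip_all("pharmacologic_class_indexing_spl_files")` would be run from the dailymed-indexings data folder after the outer archive has been unpacked.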
92+
7593--------------------------------------------------------
7694FDA Preferred terms, UNIIs from FDA:
7795--------------------------------------------------------
7896
79- Download from http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm162523.htm
97+ Download from http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp
8098
8199(1) FDA_UNII_Names
82100Downloads UNII List ('UNIIs <DATE> Names.txt' as UNII lists)
@@ -86,8 +104,9 @@ Downloads UNII Data ('UNIIs <DATE> Records.txt' as UNII records)
86104
87105Keep in directory LinkedSPLs-update/data/FDA
88106
89- Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS
107+ (replace each whitespace ' ' in the file names with an underscore '_')
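
The renaming can be done by hand or with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the sample file name in the comment is made up from the 'UNIIs <DATE> Names.txt' pattern rather than taken from a real download.

```python
import os


def underscore_name(file_name):
    """Return `file_name` with every space replaced by an underscore."""
    return file_name.replace(" ", "_")


def rename_in_dir(directory):
    """Rename every entry in `directory` whose name contains a space.

    Returns the new names, sorted, so they can be copied into
    data-source.properties (FDA_UNII_NAMES and FDA_UNII_RECORDS).
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = underscore_name(name)
        if new_name != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed


# Hypothetical example: "UNIIs 2Mar2012 Names.txt" -> "UNIIs_2Mar2012_Names.txt"
```

Running `rename_in_dir("LinkedSPLs-update/data/FDA")` from the folder that contains LinkedSPLs-update would cover both downloaded UNII files in one pass.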
90108
109+ Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS
91110
92111--------------------------------------------------------
93112Drug bank Id from Drugbank:
@@ -101,7 +120,7 @@ download drugbank.xml as drugbankX.X and keep in directory LinkedSPLs-update/dat
101120UMLS:
102121--------------------------------------------------------
103122
104- Download RXNORM mappings from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"
123+ Download RXNORM mappings (full rxnorm) from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"
105124
106125keep in directory: "LinkedSPLs-update/data/UMLS"
107126
@@ -132,41 +151,37 @@ Mappings pulled using:
132151rdfproc -c dron-drug query sparql - '
133152PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dron: <http://purl.obolibrary.org/obo/dron#> SELECT * WHERE { ?dron dron:DRON_00010000 ?rxcui. }' > dron-rxcui-drug.txt
134153
135- --------------------------------------------------------
136- OMOP concept Id from OHDSI:
137- --------------------------------------------------------
154+ ------------------------------------------------------------------------
155+ OMOP concept Id from OHDSI or query OMOP CDM V5 (GeriOMOP) by SQL query
156+ -----------------------------------------------------------------------
138157
139158Download from "https://github.com/OHDSI/KnowledgeBase/tree/master/LAERTES/terminology-mappings/StandardVocabToRxNorm/imeds_drugids_to_rxcuis.csv"
140159
160+ OR
161+
162+ SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
163+ CONCEPT cpt
164+ WHERE
165+ cpt.CONCEPT_CLASS = 'Clinical Drug';
166+
167+ right click result table and export to csv ('|' delimited)
168+ keep csv in LinkedSPLs-clinicalDrug/mappings/
169+
170+ AND
171+
172+ query OMOP CDM V5 (GeriOMOP) by SQL query below:
173+ SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
174+ CONCEPT cpt
175+ WHERE
176+ cpt.CONCEPT_CLASS = 'Ingredient';
177+
178+ keep csv in LinkedSPLs-activeMoiety/mappings/
179+
141180
142181################################################################################
143182Details for updating each linkedSPLs mapping
144183################################################################################
145184
146- -Scripts are using in Ant task linkedSPLs-setup
147-
148- update_lodd_dailymed.py:
149- To do a full update, run update_lodd_dailymed.py directly. This simply executes
150- dailymed_rss.run() and loadDailymedToSql.update() with a custom logger (update_lodd_dailymed.log)
151-
152- dailymed_rss.py:
153- Several functions for downloading and extracting the spls updated
154- within the past 7 days from the rss feed
155- http://dailymed.nlm.nih.gov/dailymed/rss.cfm. The feedparser module
156- is used to parse the rss feed. Each entry in the feed provides a link
157- to an information page for the insert. The html on this page is
158- parsed for the link to the zipped xml file, which is then downloaded
159- to a temp directory. After all inserts in the feed have been
160- downloaded, the xml files are extracted into the temp directory. If
161- they don't exist, the script will create two other directories in the
162- current directory, ./spls (holds a master set of all spls in their
163- most current form) and ./spls/updates (holds the spls from the most
164- recent execution of dailymed_rss.run()). All files in ./spls/updates
165- are removed. All xml files in the temp directory are then copied to
166- ./spls/updates. Finally, the temp directory and all files in it are
167- removed.
168-
169- loadDailymedToSql.py
170185A number of functions for parsing spls and loading the information to
171186the local lodd_dailymed mysql database. In particular, run() is used
172187to insert new spls into the database, and update() is used for
@@ -179,6 +194,8 @@ insert in the database, removes that filename from ./spls and copies
179194the updated spl to ./spls. NOTE: if the script is run directly, it
180195will truncate all SPL tables and load all SPLs in the 'spls' folder.
181196
197+ $ cd bio2rdf/linkedSPLs/LinkedSPLs-update/load-dailymed-spls
198+ $ loadDailymedToSql.py
182199
183200------------------------------------------------------------
184201LOADING THE FDA UNII TO CHEBI MAPPING
@@ -403,6 +420,33 @@ CREATE TABLE `linkedSPLs`.`OMOP_RXCUI` (
403420
404421LOAD DATA LOCAL INFILE 'data/OMOP-OHDSI/imeds_drugids_to_rxcuis.csv' INTO TABLE `OMOP_RXCUI` FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 LINES (OMOPConceptId, RxCUI);
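
Because LOAD DATA does not reject malformed lines on its own terms here (it expects '|' between fields and skips the first line), it can help to check the CSV before loading it. The following is a minimal sketch with a hypothetical helper name, `check_pipe_csv`; it assumes each data line should hold exactly two non-empty fields, matching the (OMOPConceptId, RxCUI) column list above.

```python
import csv


def check_pipe_csv(path):
    """Check a '|'-delimited file that has one header line.

    Returns (row_count, problems), where problems is a list of
    (line_number, message) tuples for data lines that do not hold
    exactly two non-empty fields.
    """
    problems = []
    rows = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="|")
        next(reader, None)  # skip the header, mirroring IGNORE 1 LINES
        for line_no, row in enumerate(reader, start=2):
            rows += 1
            if len(row) != 2:
                problems.append(
                    (line_no, "expected 2 fields, got %d" % len(row)))
            elif not row[0].strip() or not row[1].strip():
                problems.append((line_no, "empty field"))
    return rows, problems
```

An empty `problems` list for data/OMOP-OHDSI/imeds_drugids_to_rxcuis.csv suggests the delimiter chosen at export time matches what the LOAD DATA statement expects.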
405422
423+
424+
425+ --------------------------------------
426+ CREATE AND LOAD UNII to ChEBI MAPPING
427+ -------------------------------------
428+
429+ - Method: use Bioportal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF label of concepts in Bioportal
430+
431+ - Base folder in SVN:
432+ <linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
433+
434+ - Date performed: 04/13/2012 and 09/14/2012
435+
436+ - Input: active_moieties.txt -- all unique UNII preferred names listed in "UNIIs 2Mar2012.txt"
437+
438+ - Script: sparql1-for-drug-entities.py
439+
440+ - Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
441+
442+ - Results (4/13/2012): 4,234 mappings
443+
444+ - Results (09/14/2012): 2,180 mappings
445+
446+ - Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
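
The combined count is smaller than the sum of the two runs because mappings found on both dates are counted once. The sketch below shows one way such a merge could be done; it is not the script that produced the PLUS file. It assumes one mapping per line in each results file, and the function names are hypothetical.

```python
def read_mappings(path):
    """Read one mapping per line (assumed file layout), ignoring blanks."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def combine(first, second):
    """Return (union, overlap) of two sets of mapping lines."""
    return first | second, first & second


def implied_overlap(n_first, n_second, n_union):
    """Number of mappings shared by two runs, given the three counts.

    Only valid if each run contains no internal duplicates and the
    combined file is the plain union of the two runs.
    """
    return n_first + n_second - n_union


# With the counts reported above, and under those assumptions:
# implied_overlap(4234, 2180, 4411) gives 2003 shared mappings.
```

Under the same assumptions, about 177 mappings (4,411 minus 4,234) would be new in the 09/14/2012 run.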
447+
448+
449+
406450------------------------------------------------------------
407451TESTING THE D2R SERVER ON THE DEVELOPMENT MACHINE
408452------------------------------------------------------------
@@ -454,32 +498,6 @@ SQL> update DB.DBA.load_list set ll_state = 0 where ll_file = '<name of RDF file
454498SQL> rdf_loader_run();
455499SQL> select * from DB.DBA.load_list;
456500
457- -----------
458-
459- *UNII to ChEBI
460-
461- Approach 1:
462-
463- - Method: use Bioportal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF label of concepts in Bioportal
464-
465- - Base folder in SVN:
466- <linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
467-
468- - Date performed: 04/13/2012 and 09/14/2012
469-
470- - Input: active_moieties.txt -- all unique UNII preferred names from listed in "UNIIs 2Mar2012.txt"
471-
472- - Script: sparql1-for-drug-entities.py
473-
474- - Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
475-
476- - Results (4/13/2012): 4,234 mappings
477-
478- - Results (09/14/2012): 2,180 mappings
479-
480- - Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
481-
482- -----------
483501
484502
485503- The D2R file has the mapping from RDB tables to RDF: ../LinkedSPLs-core/linkedSPLs_dump_rdf_config.n3