
Commit 6539d4d

Richard Boyce committed "edit readme"
1 parent 4137c68, commit 6539d4d


6 files changed: +149 -90 lines


linkedSPLs/LinkedSPLs-clinicalDrug/mergeToClinicalDrug.py

+3
@@ -21,6 +21,9 @@
 OMOP_RXCUI = "mappings/imeds_drugids_to_rxcuis.csv"
 SETID_RXCUI = "mappings/setid_rxcui.txt"
 FULLNAME_SETID = "mappings/setid_fullname.txt"
+#OMOP_RXCUI = "mappings/clinical-drug-omopid-rxcui-09042015.dsv"
+
+
 
 ## read mappings of dron and rxcui
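A minimal Python sketch of reading the OMOP_RXCUI mapping file into a lookup table. It assumes the layout implied by the README's LOAD DATA statement for `imeds_drugids_to_rxcuis.csv`: '|'-delimited, one header row, columns OMOPConceptId and RxCUI. This diff does not show how mergeToClinicalDrug.py itself parses the file, and `read_omop_rxcui` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable


def read_omop_rxcui(lines: Iterable[str]) -> Dict[str, str]:
    """Build an OMOP concept id -> RxCUI lookup (hypothetical helper).

    Assumes the layout implied by the README's LOAD DATA statement:
    FIELDS TERMINATED BY '|', IGNORE 1 LINES, columns (OMOPConceptId, RxCUI).
    """
    reader = csv.reader(lines, delimiter="|")
    next(reader, None)  # skip the header row, like IGNORE 1 LINES
    mapping: Dict[str, str] = {}
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# Made-up rows, only to show the expected shape of the input.
sample = io.StringIO("omopid|rxcui\n1001|2001\n1002|2002\n")
print(read_omop_rxcui(sample))  # {'1001': '2001', '1002': '2002'}
```

For the real file, pass in a handle opened with `open(OMOP_RXCUI, newline="")`.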

linkedSPLs/LinkedSPLs-update/README

+86-68
@@ -1,6 +1,6 @@
 CODE TO GENERATE THE SQL AND LINKED-DATA VERSION OF LINKEDSPLS
 Authors: Richard Boyce, Greg Gardner, Yifan Ning
-Date: 6/08/2015
+Date: 09/25/2015
 
 ################################################################################
 OVERVIEW
@@ -28,24 +28,37 @@ Config mysql db connection at : db-connection.properties
 
 Run shell commands below to create database schema and unzip and parse dailymed XMLs and load product label sections from dailymed XMLs into Mysql Schema linkedSPLs
 
-$ cd LinkedSPLs-update
+$ cd bio2rdf/linkedSPLs/LinkedSPLs-update
+
 $ ant linkedSPLs-setup
 
 Update all linkedSPLs mappings by command below
-(1) mappings of preferred term to UNII
-(2) mappings of preferred term to ChEBI
-(3) mappings of Preferred term and Rxnorm
-(4) mappings of Preferred term, UNII and Drugbank URI
-(5) mappings of Preferred term, rxcui and Dailymed setid
-(6) mappings of RxNORM, NUI and NDFRT label
-(7) mappings of setId, UNII, NUI and PreferredNameAndRole
-
 
 $ ant linkedSPLs-update
 
+Update piece by piece (recommended)
+
+$ ant load-FDAPreferredSubstanceToUNII
+$ ant load-FDA_UNII_to_ChEBI
+$ ant load-ChEBI_DRUGBANK_BIO2RDF
+$ ant loadDailymedSPLsToRDB
+$ ant load-DrOn_RXCUI_DRUG
+$ ant load-DrOn_RXCUI_INGREDIENT
+$ ant load-FDA_EPC_Table
+$ ant load-FDAPharmgxTable
+$ ant load-FDAPharmgxTableToOntologyMap
+$ ant load-FDAPreferredSubstanceToRxNORM
+$ ant load-FDAPreferredSubstanceToRxNORM-restAPI
+$ ant load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF
+$ ant load-loincSection
+$ ant load-OMOPId-RXCUIs-from-OHDSI
+$ ant load-RXNORM_NDFRT_INGRED_Table
+$ ant load-SPLSetIDToRxNORM
+
+
 
 ################################################################################
-PRE-REQUISITES
+PRE-REQUISITES (Download all source data before running any ant command)
 ################################################################################
 
 Download and organize all source data files in data folder
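The piece-by-piece targets added in this hunk can be run from a small Python script that stops at the first failing target, so it is easier to see which load step broke. This is a hypothetical helper and is not part of the repository. The target names are copied from the README and the order follows the README list. Stopping on the first failure is a choice made here, not something build.xml requires.

```python
import subprocess
from typing import Callable, List, Sequence

# Targets listed under "Update piece by piece (recommended)", in README order.
TARGETS: List[str] = [
    "load-FDAPreferredSubstanceToUNII",
    "load-FDA_UNII_to_ChEBI",
    "load-ChEBI_DRUGBANK_BIO2RDF",
    "loadDailymedSPLsToRDB",
    "load-DrOn_RXCUI_DRUG",
    "load-DrOn_RXCUI_INGREDIENT",
    "load-FDA_EPC_Table",
    "load-FDAPharmgxTable",
    "load-FDAPharmgxTableToOntologyMap",
    "load-FDAPreferredSubstanceToRxNORM",
    "load-FDAPreferredSubstanceToRxNORM-restAPI",
    "load-FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF",
    "load-loincSection",
    "load-OMOPId-RXCUIs-from-OHDSI",
    "load-RXNORM_NDFRT_INGRED_Table",
    "load-SPLSetIDToRxNORM",
]


def run_targets(
    targets: Sequence[str],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run `ant <target>` for each target in order (hypothetical helper).

    Stops at the first non-zero exit status and returns the targets that
    completed successfully. `runner` is injectable so the loop can be tested
    without ant installed.
    """
    completed: List[str] = []
    for target in targets:
        status = runner(["ant", target])
        if status != 0:
            print(f"[ERROR] ant {target} exited with status {status}; stopping")
            break
        completed.append(target)
    return completed
```

Call `run_targets(TARGETS)` from bio2rdf/linkedSPLs/LinkedSPLs-update, which is where build.xml lives.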
@@ -56,7 +69,8 @@ product label sections and mappings from Dailymed:
 
 (1) dailymed-labels:
 
-Download dm_spl_release_human_rx.zip and dm_spl_release_human_otc.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
+Download dm_spl_release_human_rx.zip from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm
+(skip otc drugs - dm_spl_release_human_otc.zip)
 
 Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-labels/"
 
@@ -72,11 +86,15 @@ Download pharmacologic_class_indexing_spl_files.zip from http://dailymed.nlm.nih
 
 Put in folder at "bio2rdf/linkedSPLs/LinkedSPLs-update/data/dailymed-indexings/"
 
+Unzip the XMLs into folder "pharmacologic_class_indexing_spl_files":
+
+$ cd pharmacologic_class_indexing_spl_files; unzip \*.zip; rm *.zip
+
 --------------------------------------------------------
 FDA Preferred terms, UNIIs from FDA:
 --------------------------------------------------------
 
-Download from http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm162523.htm
+Download from http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp
 
 (1) FDA_UNII_Names
 Downloads UNII List ('UNIIs <DATE> Names.txt' as UNII lists)
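This hunk adds a shell step that unzips the archives in pharmacologic_class_indexing_spl_files and then deletes them. The same step can be done from Python with the standard `zipfile` module. In this minimal sketch each archive is deleted only after it has been extracted successfully. `unzip_and_remove` is a hypothetical helper and is not part of the repository.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_and_remove(folder: Path) -> List[str]:
    """Extract every *.zip directly inside `folder`, then delete it.

    Hypothetical helper. An archive is deleted only after its extraction
    finished without raising, so a corrupt download is left in place.
    Returns the member names that were extracted.
    """
    extracted: List[str] = []
    # sorted() snapshots the listing, so zips unpacked here are not revisited.
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
        archive.unlink()
    return extracted
```

Usage: `unzip_and_remove(Path("pharmacologic_class_indexing_spl_files"))`.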
@@ -86,8 +104,9 @@ Downloads UNII Data ('UNIIs <DATE> Records.txt' as UNII records)
 
 Keep in directory LinkedSPLs-update/data/FDA
 
-Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS
+(replace the spaces ' ' in the file names with underscores '_')
 
+Edit LinkedSPLs-update/data-source.properties to reset FDA_UNII_NAMES and FDA_UNII_RECORDS
 
 --------------------------------------------------------
 Drug bank Id from Drugbank:
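The FDA UNII section of this hunk asks for spaces in the downloaded UNII file names to be replaced with underscores, so that 'UNIIs <DATE> Names.txt' matches the underscore form used in data-source.properties. That renaming can be scripted. The sketch below is minimal. The default glob pattern `UNIIs *.txt` is an assumption based on the README's file-name template, and the helper refuses to overwrite a file that already exists. `underscore_filenames` is hypothetical.

```python
from pathlib import Path
from typing import List


def underscore_filenames(folder: Path, pattern: str = "UNIIs *.txt") -> List[Path]:
    """Rename files matching `pattern`, replacing each ' ' with '_'.

    Hypothetical helper; the default pattern follows the README's
    'UNIIs <DATE> Names.txt' template. Returns the new paths.
    """
    renamed: List[Path] = []
    for src in sorted(folder.glob(pattern)):
        dst = src.with_name(src.name.replace(" ", "_"))
        if dst.exists():
            # Refuse to clobber a file that was already renamed earlier.
            raise FileExistsError(dst)
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Usage: run `underscore_filenames(Path("data/FDA"))` from LinkedSPLs-update, then point FDA_UNII_NAMES and FDA_UNII_RECORDS at the new names.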
@@ -101,7 +120,7 @@ download drugbank.xml as drugbankX.X and keep in directory LinkedSPLs-update/dat
 UMLS:
 --------------------------------------------------------
 
-Download RXNORM mappings from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"
+Download RXNORM mappings (full rxnorm) from UMLS at "http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html"
 
 keep in directory: "LinkedSPLs-update/data/UMLS"
 
@@ -132,41 +151,37 @@ Mappings pulled using:
 rdfproc -c dron-drug query sparql - '
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dron: <http://purl.obolibrary.org/obo/dron#> SELECT * WHERE { ?dron dron:DRON_00010000 ?rxcui. }' > dron-rxcui-drug.txt
 
---------------------------------------------------------
-OMOP concept Id from OHDSI:
---------------------------------------------------------
+------------------------------------------------------------------------
+OMOP concept Id from OHDSI or query OMOP CDM V5 (GeriOMOP) by SQL query
+-----------------------------------------------------------------------
 
 Download from "https://github.com/OHDSI/KnowledgeBase/tree/master/LAERTES/terminology-mappings/StandardVocabToRxNorm/imeds_drugids_to_rxcuis.csv"
 
+OR
+
+SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
+CONCEPT cpt
+WHERE
+cpt.CONCEPT_CLASS_ID = 'Clinical Drug';
+
+Right-click the result table and export it to CSV ('|' delimited).
+Keep the CSV in LinkedSPLs-clinicalDrug/mappings/
+
+AND
+
+Query OMOP CDM V5 (GeriOMOP) with the SQL query below:
+SELECT cpt.CONCEPT_ID as omopid, cpt.CONCEPT_CODE as rxcui FROM
+CONCEPT cpt
+WHERE
+cpt.CONCEPT_CLASS_ID = 'Ingredient';
+
+Keep the CSV in LinkedSPLs-activeMoiety/mappings/
+
 
 ################################################################################
 Details for update each linkedSPLs mappings
 ################################################################################
 
--Scripts are using in Ant task linkedSPLs-setup
-
-update_lodd_dailymed.py:
-To do a full update, run update_lodd_dailymed.py directly. This simply executes
-dailymed_rss.run() and loadDailymedToSql.update() with a custom logger (update_lodd_dailymed.log)
-
-dailymed_rss.py:
-Several functions for downloading and extracting the spls updated
-within the past 7 days from the rss feed
-http://dailymed.nlm.nih.gov/dailymed/rss.cfm. The feedparser module
-is used to parse the rss feed. Each entry in the feed provides a link
-to an information page for the insert. The html on this page is
-parsed for the link to the zipped xml file, which is then downloaded
-to a temp directory. After all inserts in the feed have been
-downloaded, the xml files are extracted into the temp directory. If
-they don't exist, the script will create two other directories in the
-current directory, ./spls (holds a master set of all spls in their
-most current form) and ./spls/updates (holds the spls from the most
-recent execution of dailymed_rss.run()). All files in ./spls/updates
-are removed. All xml files in the temp directory are then copied to
-./spls/updates. Finally, the temp directory and all files in it are
-removed.
-
-loadDailymedToSql.py
 A number of functions for parsing spls and loading the information to
 the local lodd_dailymed mysql database. In particular, run() is used
 to insert new spls into the database, and update() is used for
@@ -179,6 +194,8 @@ insert in the database, removes that filename from ./spls and copies
 the updated spl to ./spls. NOTE: if the script is ran directly, it
 will truncate all SPL tables and load all SPLs in the 'spls' folder.
 
+$ cd bio2rdf/linkedSPLs/LinkedSPLs-update/load-dailymed-spls
+$ python loadDailymedToSql.py
 
 ------------------------------------------------------------
 LOADING THE FDA UNII TO CHEBI MAPPING
@@ -403,6 +420,33 @@ CREATE TABLE `linkedSPLs`.`OMOP_RXCUI` (
 
 LOAD DATA LOCAL INFILE 'data/OMOP-OHDSI/imeds_drugids_to_rxcuis.csv' INTO TABLE `OMOP_RXCUI` FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 LINES (OMOPConceptId, RxCUI);
 
+
+
+--------------------------------------
+CREATE AND LOAD UNII to ChEBI MAPPING
+-------------------------------------
+
+- Method: use Bioportal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF label of concepts in Bioportal
+
+- Base folder in SVN:
+<linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
+
+- Date performed: 04/13/2012 and 09/14/2012
+
+- Input: active_moieties.txt -- all unique UNII preferred names listed in "UNIIs 2Mar2012.txt"
+
+- Script: sparql1-for-drug-entities.py
+
+- Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
+
+- Results (4/13/2012): 4,234 mappings
+
+- Results (09/14/2012): 2,180 mappings
+
+- Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
+
+
+
 ------------------------------------------------------------
 TESTING THE D2R SERVER ON THE DEVELOPMENT MACHINE
 ------------------------------------------------------------
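The UNII-to-ChEBI record added in this hunk describes exact string matching of preferred names against concept labels. The matching was run twice, and the two result sets were merged into one set of unique results. The sketch below shows only that matching-and-merging logic. The real script, sparql1-for-drug-entities.py, queries BioPortal's SPARQL endpoint; here an in-memory label dictionary stands in for it. The function names and the (name, URI) pair representation are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (UNII preferred name, ChEBI concept URI)


def exact_matches(names: Iterable[str], label_to_uri: Dict[str, str]) -> List[Pair]:
    """Pair each preferred name with the concept whose label equals it exactly.

    `label_to_uri` stands in for the BioPortal SPARQL lookup; the comparison is
    case-sensitive and does no normalisation.
    """
    return [(name, label_to_uri[name]) for name in names if name in label_to_uri]


def combine_unique(*runs: Iterable[Pair]) -> List[Pair]:
    """Union several result sets, keeping the first occurrence of each pair."""
    seen: Set[Pair] = set()
    combined: List[Pair] = []
    for run in runs:
        for pair in run:
            if pair not in seen:
                seen.add(pair)
                combined.append(pair)
    return combined
```

Matching here is case-sensitive, which is the literal reading of "exact string matches". The record does not say whether the original queries normalised case.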
@@ -454,32 +498,6 @@ SQL> update DB.DBA.load_list set ll_state = 0 where ll_file = '<name of RDF file
 SQL> rdf_loader_run();
 SQL> select * from DB.DBA.load_list;
 
------------
-
-*UNII to ChEBI
-
-Approach 1:
-
-- Method: use Bioportal's SPARQL endpoint to identify exact string matches between the UNII preferred names and the RDF label of concepts in Bioportal
-
-- Base folder in SVN:
-<linkedSPLs/LinkedSPLs-update/mappings/UNII-to-ChEBI-mapping>
-
-- Date performed: 04/13/2012 and 09/14/2012
-
-- Input: active_moieties.txt -- all unique UNII preferred names from listed in "UNIIs 2Mar2012.txt"
-
-- Script: sparql1-for-drug-entities.py
-
-- Results Files: FDA-UNII-to-ChEBI-bioportal-mapping-04132012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-09142012.txt, FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt
-
-- Results (4/13/2012): 4,234 mappings
-
-- Results (09/14/2012): 2,180 mappings
-
-- Combined unique results: 4,411 (loaded into linkedSPLs: FDA-UNII-to-ChEBI-bioportal-mapping-04132012-PLUS-09142012.txt)
-
------------
 
 
 - The D2R file has the mapping from RDB tables to RDF: ../LinkedSPLs-core/linkedSPLs_dump_rdf_config.n3

linkedSPLs/LinkedSPLs-update/build.xml

+50-16
@@ -284,34 +284,68 @@
 folder: PT-UNII-ChEBI-mapping
 -->
 
-<target name="load-FDA_UNII_to_ChEBI" >
+<target name="load-FDA_UNII_to_ChEBI">
 
 <delete file="${PT-UNII-ChEBI-mapping}/UNIIToChEBI-${TODAY_US}.txt"/>
 
-<exec executable="python" failonerror="true">
-<arg line="${PT-UNII-ChEBI-mapping}/getChebiMappingsFromJSON.py data/FDA/FDAPreferredSubstanceToUNII.txt" />
-<redirector append="true">
-<outputmapper type="merge" to="${PT-UNII-ChEBI-mapping}/UNIIToChEBI-${TODAY_US}.txt"/>
-<errormapper type="merge" to="${ERROR_LOG}"/>
-</redirector>
-</exec>
+<delete dir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/bin"/>
+<mkdir dir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/bin" />
+
+
+<path id="external.classpath">
+<pathelement location="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/libs/chebiWS-client-2.2.1.jar"/>
+</path>
+
+<javac srcdir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/src" destdir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/bin" includeantruntime="false" debug="on">
+<classpath>
+<path refid="external.classpath" />
+</classpath>
+</javac>
+
+<delete dir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/jar" />
+<mkdir dir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/jar" />
+
+
+<jar destfile="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/jar/ChEBIJavaClient.jar" basedir="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/bin">
+<manifest>
+<attribute name="Main-Class" value="chebi.service.GetChEBIbyNames"/>
+</manifest>
+</jar>
+
+<echo>[INFO] : It takes about 4 hours ... </echo>
+
+<java classname="chebi.service.GetChEBIbyNames" fork="true" >
+<classpath>
+<path refid="external.classpath"/>
+<pathelement location="${PT-UNII-ChEBI-mapping}/ChEBIJavaClient/jar/ChEBIJavaClient.jar"/>
+</classpath>
+</java>
+
+<!-- old approach that queries bioportal -->
+<!-- <exec executable="python" failonerror="true"> -->
+<!-- <arg line="${PT-UNII-ChEBI-mapping}/getChebiMappingsFromJSON.py data/FDA/FDAPreferredSubstanceToUNII.txt" /> -->
+<!-- <redirector append="true"> -->
+<!-- <outputmapper type="merge" to="${PT-UNII-ChEBI-mapping}/UNIIToChEBI-${TODAY_US}.txt"/> -->
+<!-- <errormapper type="merge" to="${ERROR_LOG}"/> -->
+<!-- </redirector> -->
+<!-- </exec> -->
 
 <antcall target="set.timestamp">
-<param name="message" value="mappings of FDA preferred term and ChEBI is created at ${PT-UNII-ChEBI-mapping}/UNIIToChEBI-${TODAY_US}.txt" />
+<param name="message" value="mappings of FDA preferred term and ChEBI is created at ${PT-UNII-ChEBI-mapping}/UNIIToChEBI.txt" />
 </antcall>
 
 <sql
-driver="com.mysql.jdbc.Driver"
-url="jdbc:mysql://localhost:3306/${mysql-schema}"
-userid="${mysql-u}"
-password="${mysql-p}" >
+driver="com.mysql.jdbc.Driver"
+url="jdbc:mysql://localhost:3306/${mysql-schema}"
+userid="${mysql-u}"
+password="${mysql-p}" >
 <classpath>
 <pathelement location="${mysql-driver}"/>
 </classpath>
 
 <transaction>
-truncate FDA_UNII_to_ChEBI;
-LOAD DATA LOCAL INFILE "${PT-UNII-ChEBI-mapping}/UNIIToChEBI-${TODAY_US}.txt" INTO TABLE FDA_UNII_to_ChEBI(PreferredSubstance, ChEBI)
+truncate FDA_UNII_to_ChEBI;
+LOAD DATA LOCAL INFILE "${PT-UNII-ChEBI-mapping}/UNIIToChEBI.txt" INTO TABLE FDA_UNII_to_ChEBI(PreferredSubstance, ChEBI)
 </transaction>
 </sql>
 
@@ -486,7 +520,7 @@
 
 <exec executable="bash">
 <arg value="-c" />
-<arg value ='cat ${RXNORM_SETID} | cut -f 1,3,4 -d "|" | sort > ${RxNORM-mapping}/converted_rxnorm_mappings_${TODAY_US}.txt' />
+<arg value ='tail -n +2 ${RXNORM_SETID} | cut -f 1,3,4 -d "|" | sort > ${RxNORM-mapping}/converted_rxnorm_mappings_${TODAY_US}.txt' />
 </exec>
 
 <antcall target="set.timestamp">
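Replacing `cat` with `tail -n +2` makes the pipeline skip the first line of ${RXNORM_SETID} (presumably a header row). `cut` then keeps '|'-separated fields 1, 3 and 4, and `sort` orders the result. A Python equivalent, which could be used to check the converted file, might look like the sketch below. `convert_rxnorm_mappings` is a hypothetical name.

```python
from typing import Iterable, List


def convert_rxnorm_mappings(lines: Iterable[str]) -> List[str]:
    """Python equivalent of: tail -n +2 FILE | cut -f 1,3,4 -d "|" | sort.

    Hypothetical helper. Skips the first line, keeps '|'-separated fields
    1, 3 and 4 (missing fields are dropped, as cut does), and sorts by code
    point, which matches `LC_ALL=C sort`.
    """
    out: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        if lineno == 1:
            continue  # tail -n +2 starts output at the second line
        fields = line.rstrip("\n").split("|")
        out.append("|".join(fields[i] for i in (0, 2, 3) if i < len(fields)))
    return sorted(out)
```

Python's `sorted` compares code points. Plain `sort` uses the current locale's collation, so its order can differ unless `LC_ALL=C` is set.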

linkedSPLs/LinkedSPLs-update/data-source.properties

+2-2
@@ -8,8 +8,8 @@ SCHEMA_SQL = load-dailymed-spls/TableSchema.sql
 
 ## Mapping data sources:
 
-FDA_UNII_NAMES = data/FDA/UNIIs_27Mar2015_Names.txt
-FDA_UNII_RECORDS = data/FDA/UNIIs_27Mar2015_Records.txt
+FDA_UNII_NAMES = data/FDA/UNIIs_1Sept2015_Names.txt
+FDA_UNII_RECORDS = data/FDA/UNIIs_1Sept2015_Records.txt
 DRUGBANK_XML = data/DrugBank/drugbank.xml
 PG_CLASS_INDEXING_SPLS = data/dailymed-indexing/pharmacologic_class_indexing_spl_files/
 RXNORM_SETID = data/dailymed-mappings/rxnorm_mappings.txt
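data-source.properties hard-codes dated UNII file names, so a stale entry after a new download is easy to miss. The small check below parses the file and lists the keys whose paths do not exist relative to LinkedSPLs-update. It is a sketch that assumes the file only uses the simple `KEY = value` and `#`-comment forms seen here, so it is not a full Java .properties parser. `parse_properties` and `missing_paths` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple `KEY = value` lines, skipping blanks and '#' comments.

    Hypothetical helper; it does not handle the rest of the Java .properties
    syntax (':' separators, '!' comments, line continuations, escapes).
    """
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_paths(props: Dict[str, str], keys: Iterable[str], base: Path) -> List[str]:
    """Return the keys whose configured path does not exist under `base`."""
    return [key for key in keys if not (base / props[key]).exists()]
```

Usage, from LinkedSPLs-update: parse the file with `parse_properties(open("data-source.properties"))`, then call `missing_paths(props, ["FDA_UNII_NAMES", "FDA_UNII_RECORDS"], Path("."))`.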

linkedSPLs/LinkedSPLs-update/mappings/PT-UNII-ChEBI-mapping/ChEBIJavaClient/src/chebi/service/GetChEBIbyNames.java

+7-3
@@ -53,8 +53,9 @@ public static Map<String, String> getMappingsD(List<String> drugL) {
 
 for (String drug : drugL) {
 if (!drug.isEmpty()) {
-System.out.println(drug);
 String chebiURI = getChEBIByName(drug);
+
+System.out.println(drug + "\t" + chebiURI);
 if (!chebiURI.isEmpty())
 mappingD.put(drug.trim(), ("http://purl.obolibrary.org/obo/"+chebiURI.replace(":","_")));
 }
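The changed loop in getMappingsD now prints each drug name with its lookup result, tab-separated. It still stores the ChEBI identifier as an OBO PURL, built by replacing ':' with '_'. The Python rendering of the same logic below is for illustration only. `lookup` stands in for the Java client's getChEBIByName web-service call and is assumed to return a CURIE (presumably of the form 'CHEBI:nnnn') or an empty string. Both function names are hypothetical.

```python
from typing import Callable, Dict, Iterable

OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def to_obo_purl(curie: str) -> str:
    """'CHEBI:15365' -> 'http://purl.obolibrary.org/obo/CHEBI_15365'."""
    return OBO_PREFIX + curie.replace(":", "_")  # replaces every ':', like Java's String.replace


def get_mappings(drugs: Iterable[str], lookup: Callable[[str], str]) -> Dict[str, str]:
    """Rendering of the getMappingsD loop: drug name -> ChEBI OBO PURL.

    `lookup` is a hypothetical stand-in for getChEBIByName and is assumed to
    return a CURIE, or '' when nothing matched.
    """
    mapping: Dict[str, str] = {}
    for drug in drugs:
        if not drug:
            continue
        curie = lookup(drug)
        print(f"{drug}\t{curie}")  # progress line, as in the new println
        if curie:
            mapping[drug.strip()] = to_obo_purl(curie)
    return mapping
```

As in the Java code, the name passed to the lookup is untrimmed, while the stored key is trimmed.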
@@ -133,8 +134,11 @@ public static List<String> readDrugListFile(String filePath) {
 break;
 }
 
-if (!line.isEmpty())
-drugL.add(line.trim());
+if (!line.isEmpty()){
+int idx = line.indexOf("\t");
+String drugname = line.substring(idx+1);
+drugL.add(drugname.trim());
+}
 }
 br.close();
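readDrugListFile now keeps only the text after the first tab on each line. `String.indexOf` returns -1 when there is no tab, and `substring(idx+1)` then returns the whole line, so lines without a tab still work. The Python sketch below mirrors that behaviour (`str.find` also returns -1). The helper names are hypothetical, and this diff does not show which input file the client reads.

```python
from typing import Iterable, List


def drug_name_from_line(line: str) -> str:
    """Keep the text after the first tab, or the whole line if there is none."""
    idx = line.find("\t")  # -1 when absent, like String.indexOf
    return line[idx + 1:].strip()


def read_drug_list(lines: Iterable[str]) -> List[str]:
    """Rendering of the new readDrugListFile loop (hypothetical helper)."""
    drugs: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # readLine() drops the line terminator
        if line:  # mirrors !line.isEmpty(); whitespace-only lines still pass
            drugs.append(drug_name_from_line(line))
    return drugs
```

Like the Java version, a whitespace-only line passes the emptiness check and yields an empty name after trimming. A stricter reader would test the trimmed line instead.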

linkedSPLs/LinkedSPLs-update/mappings/pharmacologic_class_indexing/README

+1-1
@@ -1,7 +1,7 @@
 This data, totally 9479, is up to date with all current SPLs as of March 20 2014.
 
 (1) To update, get the most current SPL class mappings from Dailymed, unzip
-the file into the /pharmacologic_class_indexing_spl_files_most_recent/ folder.
+the file into the /pharmacologic_class_indexing_spl_files/ folder.
 
 (2) Run python script parseEPCfromXMLs.py that write SPL EPCs into a txt file with columns setid, UNII,
 NUI and display name.
