Bio2RDF v5 release preparation #462

Merged on Jun 30, 2020 (79 commits)

Commits
42e2978
Added MIT license hopefully as per mailing list discussion.
Mar 19, 2013
fa4b693
Merge remote-tracking branch 'bio2rdf-master/master'
micheldumontier Jun 16, 2013
38ed7b3
Merge pull request #194 from jeremycarroll/with-license
micheldumontier Aug 8, 2013
ef0c906
Update README.md
micheldumontier Aug 8, 2013
7c5b436
Fixed (line 392): removed premature closing parenthesis in chembl qua…
theoryno3 Sep 17, 2013
761f759
Merge pull request #320 from theoryno3/patch-1
micheldumontier Sep 17, 2013
37775ec
added license notice at the bottom
jctoledo Jan 24, 2014
5d3cfc8
Update README.md
jctoledo Jan 24, 2014
7eddae2
Update README.md
jctoledo Jan 24, 2014
a383804
Update MIT-LICENSE.txt
jctoledo Jan 24, 2014
4baf88c
Update README.md
jctoledo Jan 24, 2014
74ea49f
merged release3
zorino Feb 2, 2016
7012bbe
Merge pull request #428 from zorino/master
micheldumontier Feb 2, 2016
33c8245
Merge branch 'master' of https://github.com/bio2rdf/bio2rdf-scripts
micheldumontier Feb 2, 2016
bb06b37
changed gene id to uppercase organism id #430
micheldumontier Mar 27, 2016
59801de
include more than just human genes
micheldumontier Mar 27, 2016
38e86a5
fixed field tokenizer #429
micheldumontier Mar 27, 2016
7dbd974
fixed multi entry parsing error #432
micheldumontier Mar 29, 2016
e32c860
parse the mesh ids out of the xml files #433
micheldumontier Mar 29, 2016
be0a89e
use Bio2RDF's GO to replace labels with codes #431
micheldumontier Mar 29, 2016
5b2b525
fixed changed column issue #436
micheldumontier Apr 1, 2016
ff3b213
fixed parsing of multi-value fields for pharmgkb gene data #436
micheldumontier Apr 1, 2016
6fd2ca6
added parselist to drugs. more ids extracted #437
micheldumontier Apr 1, 2016
84dcc73
Fixed disease data parsing #438
micheldumontier Apr 1, 2016
36001d3
fixed pharmgkb gene labels #436
micheldumontier Apr 1, 2016
349384c
generating only a single entity-entity-association #439
micheldumontier Apr 1, 2016
be0c05c
Fixed clinical and variant annotations parsing #441
micheldumontier Apr 11, 2016
8959c9d
Fixed phargmbk pathway download #442
micheldumontier Apr 11, 2016
f7e4a6f
complete pharmgkb pathway parse #442
micheldumontier Apr 11, 2016
b565f38
fix for multiple control and cell types #442
micheldumontier Apr 12, 2016
108801d
added 'map' to kegg pathway identifiers, to conform with source #440
micheldumontier Apr 12, 2016
8e94574
Update README.md
micheldumontier May 13, 2016
913d139
fixed premature end of parse due to blank entry
micheldumontier May 19, 2016
83f3c7c
simplify column id and includes a column fix for aliases
micheldumontier Jul 22, 2016
b04a2f2
Merge pull request #444 from micheldumontier/master
micheldumontier Jul 25, 2016
f1e2855
MD5 Hash Drugbank Mixture URIs
maulikkamdar Nov 1, 2016
3c50194
Merge pull request #448 from bio2rdf/release3
Jan 23, 2017
317f22d
Merge pull request #449 from bio2rdf/release3
Jan 23, 2017
7c99b15
Merge pull request #450 from bio2rdf/release3
Jan 30, 2017
3d21e00
Merge pull request #451 from bio2rdf/release3
Jan 30, 2017
ce09d83
Array_shift only pops the first location from the stack, therefore ra…
Apr 14, 2017
c30e9b3
adding arc2 through composer
micheldumontier Aug 30, 2017
9619f97
added check for null oversight value
micheldumontier Jan 23, 2018
8b9922d
Link DOIs to preferred resolver
katrinleinweber Mar 27, 2018
2a31ccc
Merge pull request #456 from katrinleinweber/resolve-DOIs-securely
micheldumontier Mar 27, 2018
93a2c32
fixed merge error
micheldumontier Oct 29, 2018
4bc0cce
Use HTTPS instead of HTTP to resolve dependencies
JLLeitschuh Feb 11, 2020
45cd715
update path
micheldumontier May 25, 2020
87330b6
updated with new format
micheldumontier May 25, 2020
fc42265
updates to file processing; removed automatic download
micheldumontier May 26, 2020
6fc18ae
update the orphanet disease processor
micheldumontier May 26, 2020
e26d8d5
escape the definition
micheldumontier May 26, 2020
103506d
fixes for orphanet genes
micheldumontier May 26, 2020
b021002
correctly parse the list of external references for the genes
micheldumontier May 26, 2020
777a99b
added source of validation to gene disease association
micheldumontier May 26, 2020
85d2f7c
revised processing of orphanet signs and frequencies
micheldumontier May 26, 2020
103b826
id fixes; addition of prevalence parser
micheldumontier May 27, 2020
c62841f
fix for weird character exceptions and multiple entries
micheldumontier May 31, 2020
1d28031
added check for entries without abstract
micheldumontier Jun 1, 2020
b7b5382
changed output file names
micheldumontier Jun 1, 2020
9164b90
removed html tags, extra spaces, and escaped special chars in abstract
micheldumontier Jun 1, 2020
b5bff4e
update the path and version number of the latest entry
micheldumontier Jun 1, 2020
f9e22c2
added domain namespaces; new url for goa; changed output file format
micheldumontier Jun 1, 2020
15e100f
made it so you can create the bioportal download directory for apo.obo
micheldumontier Jun 1, 2020
dc77276
replaced empty string initialisation to array
micheldumontier Jun 1, 2020
62a1b47
fixed string initialisation of array
micheldumontier Jun 1, 2020
0d1b76b
removed files no longer available; changed output file names
micheldumontier Jun 1, 2020
2570f3a
escape allele label
micheldumontier Jun 1, 2020
c16a16a
downloads zip file with all entries; fixes xml string processing prob…
micheldumontier Jun 1, 2020
748a480
updated download file location
micheldumontier Jun 1, 2020
416c212
updates for mgi inc. download location, column mappings, and identifiers
micheldumontier Jun 1, 2020
1fc4163
fixed comparator error
micheldumontier Jun 1, 2020
df74a3a
updates to wormbase
micheldumontier Jun 1, 2020
75e0743
fixes for taxonomy
micheldumontier Jun 2, 2020
fc9bea9
changes to download urls; fixes to record processor
micheldumontier Jun 2, 2020
efd924b
Merge pull request #460 from bio2rdf/release3
micheldumontier Jun 9, 2020
84944dc
fix for namespace not provided in obo file
micheldumontier Jun 9, 2020
2871972
Merge pull request #459 from JLLeitschuh/fix/JLL/use_https_to_resolve…
micheldumontier Jun 30, 2020
4d6c512
Merge pull request #454 from jvsoest/include_all_locations
micheldumontier Jun 30, 2020
21 changes: 21 additions & 0 deletions MIT-LICENSE.txt
@@ -0,0 +1,21 @@
Copyright 2014 Bio2RDF project team and other contributors
http://bio2rdf.org

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 changes: 10 additions & 0 deletions README.md
@@ -0,0 +1,10 @@
Bio2RDF-scripts
===============
This Git repository holds all of the RDF converter scripts used to generate Bio2RDF linked data.

Requirements
-------------
See the [wiki](https://github.com/bio2rdf/bio2rdf-scripts/wiki) for details.

---
Licensed under [MIT License](http://en.wikipedia.org/wiki/MIT_License), see [license page](https://github.com/bio2rdf/bio2rdf-scripts/wiki/MIT-License) for details.
25 changes: 16 additions & 9 deletions bioportal/bioportal.php
@@ -36,7 +36,7 @@ function __construct($argv) {
parent::__construct($argv,'bioportal');
parent::addParameter('files',true,null,'all','all or comma-separated list of ontology short names to process');
parent::addParameter('download_url',false,null,'http://data.bioontology.org/');
-parent::addParameter('exclude',false,null,"AURA",'ontologies to exclude - use acronyms');
+parent::addParameter('exclude',false,null,"AURA,HOOM",'ontologies to exclude - use acronyms');
parent::addParameter('continue_from',false,null,"",'the ontology abbreviation to restart from');
parent::addParameter('ncbo_api_key',false,null,null,'BioPortal API key (please use your own)');
parent::addParameter('ncbo_api_key_file',false,null,'ncbo.api.key','BioPortal API key file');
@@ -123,7 +123,6 @@ function Run()
if(isset($ls['description'])) $description = $ls['description'];

$rfile = $ls['ontology']['links']['download'];

$lfile = $abbv.".".$format.".gz";
if(!file_exists($idir.$lfile) or parent::getParameterValue('download') == 'true') {
echo "downloading ... ";
@@ -134,7 +133,7 @@ function Run()
$ret = curl_setopt($ch, CURLOPT_HEADER, 1);
$ret = curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$ret = curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
-$ret = curl_setopt($ch, CURLOPT_TIMEOUT, 300);
+$ret = curl_setopt($ch, CURLOPT_TIMEOUT, 600);
$ret = curl_exec($ch);
if(!$ret) {echo "no content";continue;}

@@ -167,12 +166,13 @@ function Run()

// process
echo "converting ... ";
set_time_limit(0);

// let's double check the format
$fp = gzopen($idir.$lfile,"r");
$l = gzgets($fp);
if(strstr($l,"xml")) $format= "owl";
gzclose($fp);

if($format == 'obo') {
$this->OBO2RDF($abbv);
} else if($format == 'owl') {
@@ -182,6 +182,7 @@ function Run()
} else {
echo "no processor for $label (format $format)".PHP_EOL;
}

if(!file_exists($odir.$ofile)) { echo "no output".PHP_EOL;continue;}
parent::getWriteFile()->close();
parent::clear();
@@ -366,7 +367,7 @@ public function TriplifyMap($a, $prefix)

} else {
parent::addRDF(
-parent::triplifyString($s_uri,$p_uri,$a['o'],(($a['o_datatype'] == '')?null:$a['o_datatype']),(($a['o_lang'] == '')?null:$a['o_lang']))
+parent::triplifyString($s_uri,$p_uri,addslashes($a['o']),(($a['o_datatype'] == '')?null:$a['o_datatype']),(($a['o_lang'] == '')?null:$a['o_lang']))
);
}
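
Note on the `addslashes()` change above: `addslashes()` escapes single quotes, double quotes, backslashes, and NUL bytes, but it leaves newlines and carriage returns untouched, and those also need escaping inside a double-quoted N-Triples/N-Quads literal. A minimal sketch of a dedicated escaper follows, assuming the output is N-Quads; `escapeLiteral()` is a hypothetical helper and not part of the Bio2RDF API, and whether `triplifyString()` already escapes its argument is not shown in this diff.

```php
<?php
// Hypothetical helper, not part of the Bio2RDF API: escapes a string for use
// inside a double-quoted N-Triples/N-Quads literal.
function escapeLiteral(string $value): string
{
    // strtr() with an array makes a single pass over the input, so the
    // backslashes it inserts are never escaped a second time.
    return strtr($value, [
        '\\' => '\\\\',
        '"'  => '\\"',
        "\n" => '\\n',
        "\r" => '\\r',
        "\t" => '\\t',
    ]);
}

$raw = "5' UTR \"region\"\nsecond line";

// Dedicated escaper: quotes, backslashes, and line breaks are all escaped.
echo escapeLiteral($raw), PHP_EOL;

// addslashes(): the raw newline survives and would split the statement.
echo addslashes($raw), PHP_EOL;
```

If the value has already been escaped upstream, either function would escape it a second time, so the escaping is best applied at exactly one point in the pipeline.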

@@ -394,7 +395,7 @@ function OBO2RDF($abbv)
$graph_uri = '<'.parent::getRegistry()->getFQURI(parent::getGraphURI()).'>';
$bid = 1;

-while($l = parent::getReadFile()->read()) {
+while(FALSE !== ($l = parent::getReadFile()->read())) {
$lt = trim($l);
if(strlen($lt) == 0) continue;
if($lt[0] == '!') continue;
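
Note on the loop condition change above: in PHP the empty string and the string `"0"` are both falsy, so `while($l = ...->read())` ends the parse at the first such line, whereas `FALSE !== (...)` stops only at end of input. A minimal sketch, assuming a hypothetical `LineReader` class that returns `false` when exhausted; it stands in for the object returned by `parent::getReadFile()`, whose actual behaviour (for example, whether it strips newlines) is not shown in this diff.

```php
<?php
// Hypothetical stand-in for the Bio2RDF file reader: returns each line in
// turn, then false once the input is exhausted.
final class LineReader
{
    private array $lines;
    private int $pos = 0;

    public function __construct(array $lines)
    {
        $this->lines = $lines;
    }

    /** @return string|false */
    public function read()
    {
        return $this->pos < count($this->lines) ? $this->lines[$this->pos++] : false;
    }
}

// Old loop shape: stops at the first falsy value, including "" and "0".
function countLinesLoose(LineReader $r): int
{
    $n = 0;
    while ($l = $r->read()) {
        $n++;
    }
    return $n;
}

// New loop shape: stops only when read() returns false.
function countLinesStrict(LineReader $r): int
{
    $n = 0;
    while (false !== ($l = $r->read())) {
        $n++;
    }
    return $n;
}

$lines = ["[Term]", "id: GO:1", "", "0", "name: x"];
echo countLinesLoose(new LineReader($lines)), PHP_EOL;  // 2
echo countLinesStrict(new LineReader($lines)), PHP_EOL; // 5
```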
@@ -461,6 +462,7 @@ function OBO2RDF($abbv)
else {$ns = strtolower($c[0]);$id=$c[1];}
$id = str_replace( array("(",")"), array("_",""), $id);
$tid = $ns.":".$id;
echo $tid.PHP_EOL;
} else if($a[0] == "name") {
$buf .= parent::describeClass($tid,addslashes(stripslashes($a[1])));
} else if($a[0] == "is_a") {
@@ -483,7 +485,8 @@ function OBO2RDF($abbv)
$buf .= $t;
$is_deprecated = true;
} else if($a[0] == "id") {
-parent::getRegistry()->parseQName($a[1],$ns,$id);
+parent::getRegistry()->parseQName($a[1],$ns,$id);
+if(trim($ns) == '') $ns = "unspecified";
$tid = "$ns:$id";
// $buf .= parent::describeClass($tid,null,"owl:Class");
// $buf .= parent::triplify($tid,"rdfs:isDefinedBy",$ouri);
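
Note on the `"unspecified"` fallback added in this and the following hunks: some OBO files contain identifiers with no prefix, which would otherwise yield a qualified name with an empty namespace. A minimal sketch of the idea, assuming identifiers are split on the first colon; `splitQName()` is a hypothetical helper and not the registry's `parseQName()`, whose implementation is not shown here.

```php
<?php
// Hypothetical helper, not the Bio2RDF registry's parseQName(): splits
// "prefix:local" on the first colon and substitutes a placeholder prefix
// when none is present.
function splitQName(string $qname, string $fallback = 'unspecified'): array
{
    $parts = explode(':', $qname, 2);
    if (count($parts) === 2) {
        [$ns, $id] = $parts;
    } else {
        // No colon at all: the whole string is the local identifier.
        $ns = '';
        $id = $parts[0];
    }
    if (trim($ns) === '') {
        $ns = $fallback;
    }
    return [$ns, trim($id)];
}

[$ns, $id] = splitQName('GO:0008150');
echo "$ns:$id", PHP_EOL; // GO:0008150

[$ns, $id] = splitQName('part_of');
echo "$ns:$id", PHP_EOL; // unspecified:part_of
```

Centralising the fallback in one helper would avoid repeating the same `trim($ns) == ''` check at every call site, at the cost of changing the call signature.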
@@ -610,6 +613,7 @@ function OBO2RDF($abbv)
} else if($a[0] == "is_a") {
// do subclassing
parent::getRegistry()->parseQName($a[1],$ns,$id);
+if(trim($ns) == '') $ns = "unspecified";
$t = parent::triplify($tid,"rdfs:subClassOf","$ns:$id");
$buf .= $t;
$min .= $t;
@@ -657,17 +661,19 @@ function OBO2RDF($abbv)
$c = explode(" ",$a[1]);
if(count($c) == 1) { // just a class
parent::getRegistry()->parseQName($c[0],$ns,$id);
+if(trim($ns) == '') $ns = "unspecified";
$relationship .= parent::getRegistry()->getFQURI("$ns:$id");
$buf .= parent::triplify($tid,"rdfs:subClassOf","$ns:$id");

} else if(count($c) == 2) { // an expression
parent::getRegistry()->parseQName($c[0],$pred_ns,$pred_id);
parent::getRegistry()->parseQName($c[1],$obj_ns,$obj_id);
+if(trim($obj_ns) == '') $obj_ns = "unspecified";

$relationship .= '_:b'.$bid.' <'.parent::getRegistry()->getFQURI('owl:onProperty').'> <'.parent::getRegistry()->getFQURI("obo_vocabulary:".$pred_id)."> $graph_uri .".PHP_EOL;
$relationship .= '_:b'.$bid.' <'.parent::getRegistry()->getFQURI('owl:someValuesFrom').'> <'.parent::getRegistry()->getFQURI("$obj_ns:$obj_id")."> $graph_uri .".PHP_EOL;

-$buf .= parent::triplify($tid,"obo_vocabulary:$pred_id","$obj_ns:$obj_id");
+$buf .= parent::triplify($tid,"obo_vocabulary:$pred_id","$obj_ns:$obj_id"); #@todo this causes problem with OGG-MM
}
} else {
// default handler
@@ -676,7 +682,8 @@ function OBO2RDF($abbv)
} else {
//header
//format-version: 1.0
-$buf .= parent::triplifyString($ouri,"obo_vocabulary:$a[0]",str_replace( array('"','\:'), array('\"',':'), isset($a[1])?$a[1]:""));
+$buf .= parent::triplifyString($ouri,"obo_vocabulary:$a[0]",
+str_replace( array('"','\:'), array('\"',':'), isset($a[1])?$a[1]:""));
}

if($minimal || $minimalp) parent::getWriteFile()->write($min);
63 changes: 62 additions & 1 deletion chembl/chembl.php
@@ -1017,8 +1017,68 @@ function compounds($connection) {
}
parent::writeRDFBufferToWriteFile();
}
}
$result->free();
}

$result->free();
/*
* parse the assays tables
*/
function process_assays() {

$this->set_write_file("assays");

$allIDs = mysql_query(
"SELECT DISTINCT * FROM assays, assay_type " .
"WHERE assays.assay_type = assay_type.assay_type"
);

$num = mysql_numrows($allIDs);

while ($row = mysql_fetch_assoc($allIDs)) {

$assay = "chembl:assay_".$row['assay_id'];
$this->AddRDF($this->QQuad($assay,"rdf:type","chembl_vocabulary:Assay"));

//chembl assay id
$chembl = "chembl:". $row['chembl_id'];
$this->AddRDF($this->QQuadl($assay,"dc:identifier",$row['chembl_id']));
$this->AddRDF($this->QQuad($assay,"owl:equivalentClass",$chembl));
$this->AddRDF($this->QQuad($chembl,"owl:equivalentClass",$assay));
$this->WriteRDFBufferToWriteFile();

if ($row['description']) {
# clean up description
$description = $row['description'];
$description = str_replace("\\", "\\\\", $description);
$description = str_replace("\"", "\\\"", $description);
$this->AddRDF($this->QQuadl($assay,"chembl_vocabulary:hasDescription",$description));
}

if ($row['doc_id']){
$this->AddRDF($this->QQuad($assay,"chembl_vocabulary:citesAsDataSource","chembl:reference_".$row['doc_id']));
}

$props = mysql_query("SELECT DISTINCT * FROM assay2target WHERE assay_id = " . $row['assay_id']);

while ($prop = mysql_fetch_assoc($props)) {
if ($prop['tid']) {
$target = "chembl:target_".$prop['tid'];
$this->AddRDF($this->QQuad($assay,"chembl_vocabulary:hasTarget",$target));

if ($prop['confidence_score']) {
$targetScore = "chembl:tscore_".md5($assay.$prop['tid']);
$this->AddRDF($this->QQuad($assay,"chembl_vocabulary:hasTargetScore",$targetScore));
$this->AddRDF($this->QQuad($targetScore,"chembl_vocabulary:forTarget",$target));
$this->AddRDF($this->QQuadl($targetScore,"rdf:value",$prop['confidence_score']));
}
}

$this->WriteRDFBufferToWriteFile();

}
$this->AddRDF($this->QQuad($assay,"chembl_vocabulary:hasAssayType","chembl_vocabulary:".$row['assay_desc']));
$this->WriteRDFBufferToWriteFile();
}
}
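
Note on the target score identifier in `process_assays()`: `md5($assay.$prop['tid'])` hashes a plain concatenation, so an assay URI ending in `_1` with target `23` and an assay URI ending in `_12` with target `3` both hash the string `chembl:assay_123`. Whether such pairs occur in the ChEMBL data is not established here. A minimal sketch of one way to rule the collision out, assuming a separator character that cannot appear in either part; `targetScoreId()` is a hypothetical helper, not code from this pull request.

```php
<?php
// Hypothetical helper: builds a deterministic identifier for an
// (assay, target) pair. The "|" separator is an assumption added here so
// that different pairs cannot concatenate to the same string.
function targetScoreId(string $assay, string $tid): string
{
    return 'chembl:tscore_' . md5($assay . '|' . $tid);
}

// Without a separator, both pairs hash the same input: "chembl:assay_123".
var_dump(md5('chembl:assay_1' . '23') === md5('chembl:assay_12' . '3')); // bool(true)

// With the separator the hashed inputs differ.
var_dump(targetScoreId('chembl:assay_1', '23') === targetScoreId('chembl:assay_12', '3')); // bool(false)
```

Changing the hash input would also change every previously published `tscore_` URI, so a fix like this fits best at a release boundary.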

@@ -1287,4 +1347,5 @@ function protein_families($connection){
}
}
}

?>