
Commit db83035

Merge pull request #347 from dice-group/general_adjustments
Triplestore KB and Ontosample integration
2 parents ba15bbf + 13814b0 commit db83035

29 files changed: +1365 -1182 lines

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@
 ]
 
 # autoapi for ontolearn and owlapy. for owlapy we need to refer to its path in GitHub Action environment
-autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy']
+autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy',
+                '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/ontosample']
 
 # by default all are included but had to reinitialize this to remove private members from shoing
 autoapi_options = ['members', 'undoc-members', 'show-inheritance', 'show-module-summary', 'special-members',

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Ontolearn is an open-source software library for explainable structured machine
    usage/09_further_resources
    autoapi/ontolearn/index
    autoapi/owlapy/index
+   autoapi/ontosample/index
 
 
 .. raw:: latex

docs/usage/01_introduction.md

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Ontolearn
 
-**Version:** ontolearn 0.6.2
+**Version:** ontolearn 0.6.1
 
 **GitHub repository:** [https://github.com/dice-group/Ontolearn](https://github.com/dice-group/Ontolearn)
 
@@ -31,12 +31,13 @@ Owlready2 library made it possible to build more complex algorithms.
 
 ---------------------------------------
 
-**Ontolearn (including owlapy) can do the following:**
+**Ontolearn (including owlapy and ontosample) can do the following:**
 
 - Load/save ontologies in RDF/XML, OWL/XML.
 - Modify ontologies by adding/removing axioms.
 - Access individuals/classes/properties of an ontology (and a lot more).
 - Define learning problems.
+- Sample ontologies.
 - Construct class expressions.
 - Use concept learning algorithms to classify positive examples in a learning problem.
 - Use local datasets or datasets that are hosted on a triplestore server, for the learning task.

docs/usage/04_knowledge_base.md

Lines changed: 98 additions & 1 deletion
@@ -49,7 +49,7 @@ kb = KnowledgeBase(path="file://KGs/father.owl")
 What happens in the background is that the ontology located in this path will be loaded
 in the `OWLOntology` object of `kb` as done [here](03_ontologies.md#loading-an-ontology).
 
-In our recent version you can also initialize the KnowledgeBase using a dataset hosted in a triplestore.
+In our recent version you can also initialize a knowledge base using a dataset hosted in a triplestore.
 Since that knowledge base is mainly used for executing a concept learner, we cover that matter more in depth
 in _[Use Triplestore Knowledge Base](06_concept_learners.md#use-triplestore-knowledge-base)_
 section of _[Concept Learning](06_concept_learners.md)_.
@@ -285,6 +285,103 @@ requires only the `mode` parameter.
 > only the latter subsumption axiom will be returned.
 
 
+## Sampling the Knowledge Base
+
+Sometimes ontologies, and therefore knowledge bases, can get very large and our
+concept learners become inefficient in terms of runtime. Sampling is an approach
+to extract a portion of the whole knowledge base without changing its semantics, while
+still being expressive enough to yield results with as little loss of quality as
+possible. [OntoSample](https://github.com/alkidbaci/OntoSample/tree/main) is
+a library that we use to perform the sampling process. It offers different sampling
+techniques which fall into the following categories:
+
+- Node-based samplers
+- Edge-based samplers
+- Exploration-based samplers
+
+and almost every sampler is offered in 3 modes:
+
+- Classic
+- Learning problem first (LPF)
+- Learning problem centered (LPC)
+
+You can check them [here](ontosample).
+
+When operated on its own, OntoSample uses a light version of Ontolearn (`ontolearn_light`)
+to reason over ontologies, but when both packages are installed in the same environment
+it will use the `ontolearn` module instead. This is done for compatibility reasons.
+
+OntoSample treats the knowledge base as a graph where nodes are individuals
+and edges are object properties. However, OntoSample also offers support for
+sampling data properties, although they are not considered as _"edges"_.
+
+#### Sampling steps:
+1. Initialize the sampler using a `KnowledgeBase` object. If you are using an LPF or LPC
+sampler then you also need to pass the set of learning problem individuals (`lp_nodes`).
+2. To perform the sampling use the `sample` method, where you pass the number
+of nodes (`nodes_number`) that you want to sample, the percentage of data properties
+(`data_properties_percentage`) that you want to sample, represented by a float value
+from 0 to 1, and the jump probability (`jump_prob`) for samplers that
+use "jumping", a technique to avoid infinite loops during sampling.
+3. The `sample` method returns the sampled knowledge base, which you can store in a
+variable, use directly in the code, or save locally by using the static method
+`save_sample`.
+
+Let's see an example where we use [RandomNodeSampler](ontosample.classic_samplers.RandomNodeSampler) to sample a
+knowledge base:
+
+```python
+from ontolearn.knowledge_base import KnowledgeBase
+from ontosample.classic_samplers import RandomNodeSampler
+
+# 1. Initialize KnowledgeBase object using the path of the ontology
+kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
+
+# 2. Initialize the sampler and generate the sample
+sampler = RandomNodeSampler(kb)
+sampled_kb = sampler.sample(30)  # will generate a sample with 30 nodes
+
+# 3. Save the sampled ontology
+sampler.save_sample(kb=sampled_kb, filename="some_name")
+```
+
+Here is another example where this time we use an LPC sampler:
+
+```python
+from ontolearn.knowledge_base import KnowledgeBase
+from ontosample.lpc_samplers import RandomWalkerJumpsSamplerLPCentralized
+from owlapy.model import OWLNamedIndividual, IRI
+import json
+
+# 0. Load json that stores the learning problem
+with open("examples/uncle_lp2.json") as json_file:
+    examples = json.load(json_file)
+
+# 1. Initialize KnowledgeBase object using the path of the ontology
+kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
+
+# 2. Initialize learning problem (required only for LPF and LPC samplers)
+pos = set(map(OWLNamedIndividual, map(IRI.create, set(examples['positive_examples']))))
+neg = set(map(OWLNamedIndividual, map(IRI.create, set(examples['negative_examples']))))
+lp = pos.union(neg)
+
+# 3. Initialize the sampler and generate the sample
+sampler = RandomWalkerJumpsSamplerLPCentralized(graph=kb, lp_nodes=lp)
+sampled_kb = sampler.sample(nodes_number=40, jump_prob=0.15)
+
+# 4. Save the sampled ontology
+sampler.save_sample(kb=sampled_kb, filename="some_other_name")
+```
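For readers who want to reproduce the LPC snippet, the learning-problem file is assumed to map the two keys read by the code (`positive_examples`, `negative_examples`) to lists of individual IRIs. The IRIs below are hypothetical placeholders; the real `examples/uncle_lp2.json` may differ:

```json
{
  "positive_examples": ["http://www.example.org/family#person1"],
  "negative_examples": ["http://www.example.org/family#person2"]
}
```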
+
+> WARNING! Random Walker and Random Walker with Prioritization are two samplers that suffer
+> from non-termination in case the ontology contains nodes that point to each other and
+> form an inescapable loop for the "walker". In this scenario you can use their "jumping"
+> versions to make the "walker" escape these loops and ensure termination.
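To build intuition for why "jumping" ensures termination, here is a deliberately simplified, pure-Python sketch of a random walker with jumps on a toy adjacency dict. This is our own illustration, not OntoSample's implementation or API:

```python
import random

def random_walker_with_jumps(graph, start, nodes_number, jump_prob=0.15, seed=42):
    """Toy sketch: walk along edges, and with probability `jump_prob`
    (or at a dead end) jump to a random node of the whole graph, so the
    walker cannot stay trapped in a loop forever."""
    rng = random.Random(seed)
    sampled = {start}
    current = start
    while len(sampled) < min(nodes_number, len(graph)):
        neighbors = graph.get(current, [])
        if not neighbors or rng.random() < jump_prob:
            current = rng.choice(sorted(graph))  # the "jump" escapes loops
        else:
            current = rng.choice(neighbors)
        sampled.add(current)
    return sampled

# "a" <-> "b" form a loop a plain walker starting at "a" could never leave,
# yet 3 nodes are requested; only the jump branch can reach "c" or "d".
graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
print(random_walker_with_jumps(graph, "a", nodes_number=3))
```

Without the jump branch, the walker would bounce between `"a"` and `"b"` forever and never collect a third node.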
+
+To see how to use a sampled knowledge base for the task of concept learning, check
+the `sampling_example.py` script in the [examples](https://github.com/dice-group/Ontolearn/tree/develop/examples)
+folder. You will find descriptive comments in that script that will help you understand it better.
+
+For more details about OntoSample you can see [this paper](https://dl.acm.org/doi/10.1145/3583780.3615158).
+
 -----------------------------------------------------------------------------------------------------
 
 Since we cannot cover everything here in details, see [KnowledgeBase API documentation](ontolearn.knowledge_base.KnowledgeBase)

docs/usage/05_reasoner.md

Lines changed: 25 additions & 43 deletions
@@ -34,9 +34,6 @@ from. Currently, there are the following reasoners available:
 The structural reasoner requires an ontology ([OWLOntology](owlapy.model.OWLOntology)).
 The second argument is `isolate` argument which isolates the world (therefore the ontology) where the reasoner is
 performing the reasoning. More on that on _[Reasoning Details](07_reasoning_details.md#isolated-world)_.
-The remaining argument, `triplestore_address`, is used in case you want to
-retrieve instances from a triplestore (go to
-[_Using a Triplestore for Reasoning Tasks_](#using-a-triplestore-for-reasoning-tasks) for details).
 
 
 
@@ -60,8 +57,7 @@ from. Currently, there are the following reasoners available:
 which is just an enumeration with two possible values: `BaseReasoner_Owlready2.HERMIT` and `BaseReasoner_Owlready2.PELLET`.
 You can set the `infer_property_values` argument to `True` if you want the reasoner to infer
 property values. `infer_data_property_values` is an additional argument when the base reasoner is set to
-`BaseReasoner_Owlready2.PELLET`. The rest of the arguments `isolated` and `triplestore_address`
-are inherited from the base class.
+`BaseReasoner_Owlready2.PELLET`. The argument `isolated` is inherited from the base class.
 
 
 - [**OWLReasoner_FastInstanceChecker**](ontolearn.base.fast_instance_checker.OWLReasoner_FastInstanceChecker) **(FIC)**
@@ -87,6 +83,29 @@ from. Currently, there are the following reasoners available:
 `sub_properties` is another boolean argument to specify whether you want to take sub properties in consideration
 for `instances()` method.
 
+
+- [**TripleStoreReasoner**](ontolearn.triple_store.TripleStoreReasoner)
+
+Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
+Making this functionality available in Ontolearn makes it possible to use concept learners that
+fully operate on datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
+independently for reasoning tasks.
+
+In Ontolearn, we have implemented `TripleStoreReasoner` to query triplestore endpoints using SPARQL queries.
+It has only one required parameter:
+- `ontology` - a [TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology) that can be instantiated
+using a string that contains the URL of the triplestore host/server.
+
+This reasoner inherits from OWLReasoner, and therefore you can use it like any other reasoner.
+
+**Initialization:**
+
+```python
+from ontolearn.triple_store import TripleStoreReasoner, TripleStoreOntology
+
+reasoner = TripleStoreReasoner(TripleStoreOntology("http://some_domain/some_path/sparql"))
+```
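Under the hood, a triplestore-backed reasoner answers calls such as `instances()` by translating them into SPARQL against the endpoint. Purely as an illustration of that idea (this hypothetical helper is not Ontolearn's actual query or API), retrieving the instances of a named class could look like:

```python
def instances_query(class_iri: str) -> str:
    # Hypothetical sketch: the kind of SPARQL a triplestore-backed reasoner
    # could send to list individuals typed with the given class IRI.
    return f"SELECT DISTINCT ?ind WHERE {{ ?ind a <{class_iri}> . }}"

# hypothetical class IRI, for illustration only
print(instances_query("http://some_domain/some_path#SomeClass"))
```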
 ## Usage of the Reasoner
 All the reasoners available in the Ontolearn library inherit from the
 class: [OWLReasonerEx](ontolearn.base.ext.OWLReasonerEx). This class provides some
@@ -139,7 +158,7 @@ You can get all the types of a certain individual using `types` method:
 <!--pytest-codeblocks:cont-->
 
 ```python
-anna = list( onto.individuals_in_signature()).pop()
+anna = list(onto.individuals_in_signature()).pop()
 
 anna_types = ccei_reasoner.types(anna)
 ```
@@ -229,43 +248,6 @@ for ind in male_individuals:
     print(ind)
 ```
 
-### Using a Triplestore for Reasoning Tasks
-
-As we mentioned earlier, OWLReasoner has an argument for enabling triplestore querying:
-- `triplestore_address` - a string that contains the URL of the triplestore host/server. If specified, it tells
-the reasoner that for its operations it should query the triplestore hosted on the given address.
-
-Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
-Making this functionality available for reasoners in Ontolearn makes it possible to use concept learners that
-fully operates in datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
-independently for reasoning tasks. Therefore, you can initialize a reasoner to use triplestore as follows:
-
-```python
-from ontolearn.base import OWLReasoner_Owlready2
-
-reasoner = OWLReasoner_Owlready2(onto, triplestore_address="http://some_domain/some_path/sparql")
-```
-
-Now you can use the reasoner methods as you would normally do:
-
-```python
-# Retrieving the male instances using `male` variable that we declared earlier
-males = reasoner.instances(male, direct=False)
-```
-
-**Some important notice are given below:**
-
-> Not all the methods of the reasoner are implemented to use triplestore but the main methods
-> such as 'instance' and those used to get sub/super classes/properties will work just fine.
-
-> **You cannot pass the triplestore argument directly to FIC constructor.**
-> Because of the way it is implemented, if the base reasoner is set to use triplestore,
-> then FIC's is considered to using triplestore.
-
-> When using triplestore all methods, including `instances` method **will default to the base
-> implementation**. This means that no matter which type of reasoner you are using, the results will be always
-> the same.
 
 -----------------------------------------------------------------------
 
 In this guide we covered the main functionalities of the reasoners in Ontolearn. More

docs/usage/06_concept_learners.md

Lines changed: 15 additions & 13 deletions
@@ -323,23 +323,24 @@ Let's see what it takes to make use of it.
 First of all you need a server which should host the triplestore for your ontology. If you don't
 already have one, see [Loading and Launching a Triplestore](#loading-and-launching-a-triplestore) below.
 
-Now you can simply initialize the `KnowledgeBase` object that will server as an input for your desired
+Now you can simply initialize a `TripleStoreKnowledgeBase` object that will serve as an input for your desired
 concept learner as follows:
 
 ```python
-from ontolearn.knowledge_base import KnowledgeBase
+from ontolearn.triple_store import TripleStoreKnowledgeBase
 
-kb = KnowledgeBase(triplestore_address="http://your_domain/some_path/sparql")
+kb = TripleStoreKnowledgeBase("http://your_domain/some_path/sparql")
 ```
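One practical note: the endpoint must be live and reachable before the knowledge base can query it. As a quick sanity check you can build (and then fetch, e.g. with `urllib.request`) the HTTP GET URL of a trivial `ASK` query. The helper below is our own stdlib sketch, not part of Ontolearn:

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint: str, query: str = "ASK { ?s ?p ?o }") -> str:
    # Build the HTTP GET URL for a SPARQL query against the endpoint;
    # a live endpoint should answer the ASK query with a boolean result.
    return f"{endpoint}?{urlencode({'query': query})}"

# hypothetical endpoint, matching the snippet above
print(sparql_get_url("http://your_domain/some_path/sparql"))
```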
 
-Notice that we did not provide a value for the `path` argument. When using triplestore, it is not required. Keep
-in mind that the `kb` will create a default reasoner that uses the triplestore. Passing a custom
-reasoner will not make any difference, because they all behave the same when using the triplestore.
-You may wonder what happens to the `Ontology` object of the `kb` since no path was given. A default ontology
-object is created that will also use the triplestore for its processes. Basically every querying process concerning
-concept learning is now using the triplestore.
+Notice that the triplestore endpoint is the only argument that you need to pass.
+Also keep in mind that this knowledge base contains a
+[TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology)
+and a [TripleStoreReasoner](ontolearn.triple_store.TripleStoreReasoner), which means that
+every querying process concerning concept learning now uses the triplestore.
 
 > **Important notice:** The performance of a concept learner may differentiate when using triplestore.
+> This happens because some SPARQL queries may not yield exactly the same results as the local querying methods.
 
 ## Loading and Launching a Triplestore
 
@@ -401,14 +402,15 @@ you pass this url to `triplestore_address` argument, you have to add the
 `/sparql` sub-path indicating to the server that we are querying via SPARQL queries. Full path now should look like:
 `http://localhost:3030/father/sparql`.
 
-You can now create a knowledge base or a reasoner object that uses this URL for their
+You can now create a triplestore knowledge base or a reasoner that uses this URL for its
 operations:
 
 ```python
-from ontolearn.knowledge_base import KnowledgeBase
+from ontolearn.triple_store import TripleStoreKnowledgeBase
+
+father_kb = TripleStoreKnowledgeBase("http://localhost:3030/father/sparql")
 
-father_kb = KnowledgeBase(triplestore_address="http://localhost:3030/father/sparql")
-# ** Execute the learning algorithm as you normally would. ** .
+# ** Continue to execute the learning algorithm as you normally do. ** .
 ```
 
 -------------------------------------------------------------------

docs/usage/09_further_resources.md

Lines changed: 36 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 You can find more details in the related papers for each algorithm:
 
+Concept Learning:
+
 - **NCES2** &rarr; (soon) [Neural Class Expression Synthesis in ALCHIQ(D)](https://papers.dice-research.org/2023/ECML_NCES2/NCES2_public.pdf)
 - **Drill** &rarr; [Deep Reinforcement Learning for Refinement Operators in ALC](https://arxiv.org/pdf/2106.15373.pdf)
 - **NCES** &rarr; [Neural Class Expression Synthesis](https://link.springer.com/chapter/10.1007/978-3-031-33455-9_13)
@@ -10,12 +12,24 @@ You can find more details in the related papers for each algorithm:
 - **CLIP** &rarr; (soon) [Learning Concept Lengths Accelerates Concept Learning in ALC](https://link.springer.com/chapter/10.1007/978-3-031-06981-9_14)
 - **CELOE** &rarr; [Class Expression Learning for Ontology Engineering](https://www.sciencedirect.com/science/article/abs/pii/S1570826811000023)
 
+Sampling:
+- **OntoSample** &rarr; [Accelerating Concept Learning via Sampling](https://dl.acm.org/doi/10.1145/3583780.3615158)
+
 ## Citing
 
 Currently, we are working on our manuscript describing our framework.
 If you find our work useful in your research, please consider citing the respective paper:
 
 ```
+# DRILL
+@inproceedings{demir2023drill,
+  author = {Demir, Caglar and Ngomo, Axel-Cyrille Ngonga},
+  booktitle = {The 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023},
+  title = {Neuro-Symbolic Class Expression Learning},
+  url = {https://www.ijcai.org/proceedings/2023/0403.pdf},
+  year = {2023}
+}
+
 # NCES2
 @inproceedings{kouagou2023nces2,
 author={Kouagou, N'Dah Jean and Heindorf, Stefan and Demir, Caglar and Ngonga Ngomo, Axel-Cyrille},
@@ -56,12 +70,33 @@ address="Cham"
 year={2022},
 publisher={Springer Nature Switzerland}
 }
+
+# OntoSample
+@inproceedings{10.1145/3583780.3615158,
+  author = {Baci, Alkid and Heindorf, Stefan},
+  title = {Accelerating Concept Learning via Sampling},
+  year = {2023},
+  isbn = {9798400701245},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  url = {https://doi.org/10.1145/3583780.3615158},
+  doi = {10.1145/3583780.3615158},
+  abstract = {Node classification is an important task in many fields, e.g., predicting entity types in knowledge graphs, classifying papers in citation graphs, or classifying nodes in social networks. In many cases, it is crucial to explain why certain predictions are made. Towards this end, concept learning has been proposed as a means of interpretable node classification: given positive and negative examples in a knowledge base, concepts in description logics are learned that serve as classification models. However, state-of-the-art concept learners, including EvoLearner and CELOE exhibit long runtimes. In this paper, we propose to accelerate concept learning with graph sampling techniques. We experiment with seven techniques and tailor them to the setting of concept learning. In our experiments, we achieve a reduction in training size by over 90\% while maintaining a high predictive performance.},
+  booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
+  pages = {3733--3737},
+  numpages = {5},
+  keywords = {concept learning, graph sampling, knowledge bases},
+  location = {Birmingham, United Kingdom},
+  series = {CIKM '23}
+}
 ```
 
 ## More Inside the Project
 
 Examples and test cases provide a good starting point to get to know
-the project better. Find them in the folders `examples` and `tests`.
+the project better. Find them in the folders
+[examples](https://github.com/dice-group/Ontolearn/tree/develop/examples)
+and [tests](https://github.com/dice-group/Ontolearn/tree/develop/tests).
 
 ## Contribution
