Commit e39ac25

Merge pull request #348 from dice-group/develop
Develop
2 parents f4a47b7 + db83035 commit e39ac25

33 files changed

+1919
-1223
lines changed

docs/conf.py

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -36,7 +36,8 @@
3636
]
3737

3838
# autoapi for ontolearn and owlapy. for owlapy we need to refer to its path in GitHub Action environment
39-
autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy']
39+
autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy',
40+
'/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/ontosample']
4041

4142
# by default all are included but had to reinitialize this to remove private members from showing
4243
autoapi_options = ['members', 'undoc-members', 'show-inheritance', 'show-module-summary', 'special-members',

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Ontolearn is an open-source software library for explainable structured machine
1818
usage/09_further_resources
1919
autoapi/ontolearn/index
2020
autoapi/owlapy/index
21+
autoapi/ontosample/index
2122

2223

2324
.. raw:: latex

docs/usage/01_introduction.md

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
11
# Ontolearn
22

3-
**Version:** ontolearn 0.6.2
3+
**Version:** ontolearn 0.6.1
44

55
**GitHub repository:** [https://github.com/dice-group/Ontolearn](https://github.com/dice-group/Ontolearn)
66

@@ -31,12 +31,13 @@ Owlready2 library made it possible to build more complex algorithms.
3131

3232
---------------------------------------
3333

34-
**Ontolearn (including owlapy) can do the following:**
34+
**Ontolearn (including owlapy and ontosample) can do the following:**
3535

3636
- Load/save ontologies in RDF/XML, OWL/XML.
3737
- Modify ontologies by adding/removing axioms.
3838
- Access individuals/classes/properties of an ontology (and a lot more).
3939
- Define learning problems.
40+
- Sample ontologies.
4041
- Construct class expressions.
4142
- Use concept learning algorithms to classify positive examples in a learning problem.
4243
- Use local datasets or datasets that are hosted on a triplestore server, for the learning task.

docs/usage/04_knowledge_base.md

Lines changed: 133 additions & 4 deletions
@@ -49,7 +49,7 @@ kb = KnowledgeBase(path="file://KGs/father.owl")
4949
What happens in the background is that the ontology located in this path will be loaded
5050
in the `OWLOntology` object of `kb` as done [here](03_ontologies.md#loading-an-ontology).
5151

52-
In our recent version you can also initialize the KnowledgeBase using a dataset hosted in a triplestore.
52+
In our recent version you can also initialize a knowledge base using a dataset hosted in a triplestore.
5353
Since that knowledge base is mainly used for executing a concept learner, we cover that matter more in depth
5454
in _[Use Triplestore Knowledge Base](06_concept_learners.md#use-triplestore-knowledge-base)_
5555
section of _[Concept Learning](06_concept_learners.md)_.
@@ -253,11 +253,140 @@ You can now:
253253
print(evaluated_concept.ic) # 3
254254
```
255255

256+
## Obtaining axioms
257+
258+
You can retrieve Tbox and Abox axioms by using the `tbox` and `abox` methods, respectively.
259+
Let us take them one at a time. The `tbox` method has 2 parameters, `entities` and `mode`.
260+
`entities` specifies the owl entity from which we want to obtain the Tbox axioms. It can be
261+
a single entity, an `Iterable` of entities, or `None`.
262+
263+
The allowed types of entities are:
264+
- OWLClass
265+
- OWLObjectProperty
266+
- OWLDataProperty
267+
268+
Only the Tbox axioms related to the given entity or entities will be returned. If no entities are
269+
passed, then it returns all the Tbox axioms.
270+
The second parameter `mode` _(str)_ sets the return format type. It can have the
271+
following values:
272+
1) `'native'` -> triples are represented as tuples of owlapy objects.
273+
2) `'iri'` -> triples are represented as tuples of IRIs as strings.
274+
3) `'axiom'` -> triples are represented as owlapy axioms.
275+
276+
For the `abox` method the idea is similar. Instead of the parameter `entities`, there is the parameter
277+
`individuals`, which accepts an object of type `OWLNamedIndividual` or `Iterable[OWLNamedIndividual]`.
278+
279+
If you want to obtain all the axioms (Tbox + Abox) of the knowledge base, you can use the method `triples`. It
280+
requires only the `mode` parameter.
281+
282+
> **NOTE**: The results of these methods are limited only to named and direct entities.
283+
> That means that especially the axioms that contain anonymous owl objects (objects that don't have an IRI)
284+
> will not be part of the result set. For example, if there is a Tbox T={ C ⊑ (A ⊓ B), C ⊑ D },
285+
> only the latter subsumption axiom will be returned.
286+
287+
288+
## Sampling the Knowledge Base
289+
290+
Sometimes ontologies and therefore knowledge bases can get very large and our
291+
concept learners become inefficient in terms of runtime. Sampling is an approach
292+
to extract a portion of the whole knowledge base without changing its semantics and
293+
still being expressive enough to yield results with as little loss of quality as
294+
possible. [OntoSample](https://github.com/alkidbaci/OntoSample/tree/main) is
295+
a library that we use to perform the sampling process. It offers different sampling
296+
techniques which fall into the following categories:
297+
298+
- Node-based samplers
299+
- Edge-based samplers
300+
- Exploration-based samplers
301+
302+
and almost each sampler is offered in 3 modes:
303+
304+
- Classic
305+
- Learning problem first (LPF)
306+
- Learning problem centered (LPC)
307+
308+
You can check them [here](ontosample).
309+
310+
When operated on its own, OntoSample uses a light version of Ontolearn (`ontolearn_light`)
311+
to reason over ontologies, but when both packages are installed in the same environment
312+
it will use the `ontolearn` module instead. This is done for compatibility reasons.
313+
314+
OntoSample treats the knowledge base as a graph where nodes are individuals
315+
and edges are object properties. However, OntoSample also offers support for
316+
sampling data properties, although they are not considered _"edges"_.
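
To make the graph view concrete, here is a minimal, self-contained sketch of node-based sampling in plain Python. It is not OntoSample's implementation: the triple list, the individual names and the helper `random_node_sample` are hypothetical, and keeping only the edges whose two endpoints were both sampled is an assumption about how a node-based sampler could behave.

```python
import random

# Hypothetical object-property assertions: (subject, property, object).
TRIPLES = [
    ("anna", "hasChild", "heinz"),
    ("heinz", "hasChild", "stefan"),
    ("stefan", "hasSibling", "markus"),
    ("markus", "hasSibling", "stefan"),
    ("martin", "hasChild", "heinz"),
]


def random_node_sample(triples, nodes_number, seed=None):
    """Pick `nodes_number` individuals uniformly at random and keep the edges between them."""
    rng = random.Random(seed)
    # Nodes are all individuals that appear as subject or object of a triple.
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    if nodes_number > len(nodes):
        raise ValueError("cannot sample more nodes than the graph contains")
    kept = set(rng.sample(nodes, nodes_number))
    # Assumption: an edge survives only if both of its endpoints were sampled.
    edges = [(s, p, o) for s, p, o in triples if s in kept and o in kept]
    return kept, edges


sampled_nodes, sampled_edges = random_node_sample(TRIPLES, nodes_number=3, seed=0)
```

The fixed `seed` only makes the sketch reproducible. In the library itself the corresponding call is `sampler.sample(...)` on a `KnowledgeBase`, as described in the sampling steps.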
317+
318+
#### Sampling steps:
319+
1. Initialize the sampler using a `KnowledgeBase` object. If you are using an LPF or LPC
320+
sampler, then you also need to pass the set of learning problem individuals (`lp_nodes`).
321+
2. To perform the sampling use the `sample` method where you pass the number
322+
of nodes (`nodes_number`) that you want to sample, the amount of data properties in percentage
323+
(`data_properties_percentage`) that you want to sample which is represented by float values
324+
from 0 to 1, and the jump probability (`jump_prob`) for samplers that
325+
use "jumping", a technique to avoid infinite loops during sampling.
326+
3. The `sample` method returns the sampled knowledge base, which you can store in a
327+
variable, use directly in the code or save locally by using the static method
328+
`save_sample`.
329+
330+
Let's see an example where we use [RandomNodeSampler](ontosample.classic_samplers.RandomNodeSampler) to sample a
331+
knowledge base:
332+
333+
```python
334+
from ontosample.classic_samplers import RandomNodeSampler
335+
336+
# 1. Initialize KnowledgeBase object using the path of the ontology
337+
kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
338+
339+
# 2. Initialize the sampler and generate the sample
340+
sampler = RandomNodeSampler(kb)
341+
sampled_kb = sampler.sample(30) # will generate a sample with 30 nodes
342+
343+
# 3. Save the sampled ontology
344+
sampler.save_sample(kb=sampled_kb, filename="some_name")
345+
```
346+
347+
Here is another example where this time we use an LPC sampler:
348+
349+
```python
350+
from ontosample.lpc_samplers import RandomWalkerJumpsSamplerLPCentralized
351+
from owlapy.model import OWLNamedIndividual, IRI
352+
import json
353+
354+
# 0. Load json that stores the learning problem
355+
with open("examples/uncle_lp2.json") as json_file:
356+
    examples = json.load(json_file)
357+
358+
# 1. Initialize KnowledgeBase object using the path of the ontology
359+
kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
360+
361+
# 2. Initialize learning problem (required only for LPF and LPC samplers)
362+
pos = set(map(OWLNamedIndividual, map(IRI.create, set(examples['positive_examples']))))
363+
neg = set(map(OWLNamedIndividual, map(IRI.create, set(examples['negative_examples']))))
364+
lp = pos.union(neg)
365+
366+
# 3. Initialize the sampler and generate the sample
367+
sampler = RandomWalkerJumpsSamplerLPCentralized(graph=kb, lp_nodes=lp)
368+
sampled_kb = sampler.sample(nodes_number=40, jump_prob=0.15)
369+
370+
# 4. Save the sampled ontology
371+
sampler.save_sample(kb=sampled_kb, filename="some_other_name")
372+
```
373+
374+
> WARNING! Random Walker and Random Walker with Prioritization are two samplers that suffer
375+
> from non-termination in case that the ontology contains nodes that point to each other and
376+
> form an inescapable loop for the "walker". In this scenario you can use their "jumping"
377+
> version to make the "walker" escape these loops and ensure termination.
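
The effect of "jumping" can be illustrated with a small, self-contained random walk in plain Python. This is a generic sketch and not the code used by OntoSample: the toy graph, the function `random_walk_with_jumps` and the choice to also jump at dead ends are assumptions made for illustration. Nodes `a` and `b` point only at each other, so a walker without jumps that starts at `a` could never reach `c` or `d`.

```python
import random


def random_walk_with_jumps(graph, start, nodes_number, jump_prob=0.15, seed=None):
    """Collect `nodes_number` nodes by walking along edges, jumping with probability `jump_prob`."""
    if not 0 < jump_prob <= 1:
        # With jump_prob == 0 the walk below may never leave a closed loop.
        raise ValueError("jump_prob must be in (0, 1]")
    all_nodes = sorted(graph)
    if nodes_number > len(all_nodes):
        raise ValueError("cannot sample more nodes than the graph contains")
    rng = random.Random(seed)
    visited = {start}
    current = start
    while len(visited) < nodes_number:
        neighbours = graph[current]
        if not neighbours or rng.random() < jump_prob:
            # Jump: continue from a uniformly chosen node (assumed also at dead ends).
            current = rng.choice(all_nodes)
        else:
            # Regular step: follow one outgoing edge.
            current = rng.choice(neighbours)
        visited.add(current)
    return visited


# Toy graph as adjacency lists; "a" and "b" form an inescapable loop.
GRAPH = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
sample = random_walk_with_jumps(GRAPH, start="a", nodes_number=3, jump_prob=0.15, seed=42)
```

Because every step has a non-zero chance of jumping to any node, the walk reaches the requested number of nodes with probability 1, which is why the jumping variants avoid the non-termination described above.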
378+
379+
To see how to use a sampled knowledge base for the task of concept learning check
380+
the `sampling_example.py` in [examples](https://github.com/dice-group/Ontolearn/tree/develop/examples)
381+
folder. You will find descriptive comments in that script that will help you understand it better.
382+
383+
For more details about OntoSample you can see [this paper](https://dl.acm.org/doi/10.1145/3583780.3615158).
384+
256385
-----------------------------------------------------------------------------------------------------
257386

258-
See [KnowledgeBase API documentation](ontolearn.knowledge_base.KnowledgeBase)
259-
to check all the methods that this class has to offer. You will find methods to
260-
access the class/property hierarchy, convenient methods that use the reasoner indirectly and
387+
Since we cannot cover everything here in detail, see [KnowledgeBase API documentation](ontolearn.knowledge_base.KnowledgeBase)
388+
to check all the methods that this class has to offer. You will find convenient methods to
389+
access the class/property hierarchy, methods that use the reasoner indirectly and
261390
a lot more.
262391

263392
Speaking of the reasoner, it is important that an ontology

docs/usage/05_reasoner.md

Lines changed: 25 additions & 43 deletions
@@ -34,9 +34,6 @@ from. Currently, there are the following reasoners available:
3434
The structural reasoner requires an ontology ([OWLOntology](owlapy.model.OWLOntology)).
3535
The second argument is the `isolate` argument, which isolates the world (therefore the ontology) where the reasoner is
3636
performing the reasoning. More on that in _[Reasoning Details](07_reasoning_details.md#isolated-world)_.
37-
The remaining argument, `triplestore_address`, is used in case you want to
38-
retrieve instances from a triplestore (go to
39-
[_Using a Triplestore for Reasoning Tasks_](#using-a-triplestore-for-reasoning-tasks) for details).
4037

4138

4239

@@ -60,8 +57,7 @@ from. Currently, there are the following reasoners available:
6057
which is just an enumeration with two possible values: `BaseReasoner_Owlready2.HERMIT` and `BaseReasoner_Owlready2.PELLET`.
6158
You can set the `infer_property_values` argument to `True` if you want the reasoner to infer
6259
property values. `infer_data_property_values` is an additional argument when the base reasoner is set to
63-
`BaseReasoner_Owlready2.PELLET`. The rest of the arguments `isolated` and `triplestore_address`
64-
are inherited from the base class.
60+
`BaseReasoner_Owlready2.PELLET`. The argument `isolated` is inherited from the base class.
6561

6662

6763
- [**OWLReasoner_FastInstanceChecker**](ontolearn.base.fast_instance_checker.OWLReasoner_FastInstanceChecker) **(FIC)**
@@ -87,6 +83,29 @@ from. Currently, there are the following reasoners available:
8783
`sub_properties` is another boolean argument to specify whether you want to take sub properties in consideration
8884
for `instances()` method.
8985

86+
87+
- [**TripleStoreReasoner**](ontolearn.triple_store.TripleStoreReasoner)
88+
89+
Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
90+
Making this functionality available in Ontolearn makes it possible to use concept learners that
91+
fully operate on datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
92+
independently for reasoning tasks.
93+
94+
In Ontolearn, we have implemented `TripleStoreReasoner` to query triplestore endpoints using SPARQL queries.
95+
It has only one required parameter:
96+
- `ontology` - a [TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology) that can be instantiated
97+
using a string that contains the URL of the triplestore host/server.
98+
99+
This reasoner inherits from `OWLReasoner`, and therefore you can use it like any other reasoner.
100+
101+
**Initialization:**
102+
103+
```python
104+
from ontolearn.triple_store import TripleStoreReasoner, TripleStoreOntology
105+
106+
reasoner = TripleStoreReasoner(TripleStoreOntology("http://some_domain/some_path/sparql"))
107+
```
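
For intuition about what "querying a triplestore endpoint using SPARQL" can look like for instance retrieval, the sketch below builds a query string in plain Python. It is an assumption-laden illustration, not the query that `TripleStoreReasoner` actually sends: the helper `build_instances_query` and the example class IRI are hypothetical, and the use of the SPARQL 1.1 property path `rdfs:subClassOf*` for indirect instances is one possible design.

```python
def build_instances_query(class_iri: str, direct: bool = False) -> str:
    """Return a SPARQL SELECT query for the individuals of a named class."""
    if direct:
        # Only individuals asserted to be of exactly this class.
        pattern = f"?ind a <{class_iri}> ."
    else:
        # Also include individuals of any (transitive) subclass.
        pattern = f"?ind a ?cls . ?cls rdfs:subClassOf* <{class_iri}> ."
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT DISTINCT ?ind WHERE {{ {pattern} }}"
    )


# Hypothetical class IRI, used only for illustration.
query = build_instances_query("http://example.com/father#male")
direct_query = build_instances_query("http://example.com/father#male", direct=True)
```

A string like `query` could then be sent to the endpoint URL (for example one ending in `/sparql`) with any SPARQL client. The reasoner hides this step behind the usual `OWLReasoner` methods.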
108+
90109
## Usage of the Reasoner
91110
All the reasoners available in the Ontolearn library inherit from the
92111
class: [OWLReasonerEx](ontolearn.base.ext.OWLReasonerEx). This class provides some
@@ -139,7 +158,7 @@ You can get all the types of a certain individual using `types` method:
139158
<!--pytest-codeblocks:cont-->
140159

141160
```python
142-
anna = list( onto.individuals_in_signature()).pop()
161+
anna = list(onto.individuals_in_signature()).pop()
143162

144163
anna_types = ccei_reasoner.types(anna)
145164
```
@@ -229,43 +248,6 @@ for ind in male_individuals:
229248
    print(ind)
230249
```
231250

232-
### Using a Triplestore for Reasoning Tasks
233-
234-
As we mentioned earlier, OWLReasoner has an argument for enabling triplestore querying:
235-
- `triplestore_address` - a string that contains the URL of the triplestore host/server. If specified, it tells
236-
the reasoner that for its operations it should query the triplestore hosted on the given address.
237-
238-
Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
239-
Making this functionality available for reasoners in Ontolearn makes it possible to use concept learners that
240-
fully operates in datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
241-
independently for reasoning tasks. Therefore, you can initialize a reasoner to use triplestore as follows:
242-
243-
```python
244-
from ontolearn.base import OWLReasoner_Owlready2
245-
246-
reasoner = OWLReasoner_Owlready2(onto, triplestore_address="http://some_domain/some_path/sparql")
247-
```
248-
249-
Now you can use the reasoner methods as you would normally do:
250-
251-
```python
252-
# Retrieving the male instances using `male` variable that we declared earlier
253-
males = reasoner.instances(male, direct=False)
254-
```
255-
256-
**Some important notice are given below:**
257-
258-
> Not all the methods of the reasoner are implemented to use triplestore but the main methods
259-
> such as 'instance' and those used to get sub/super classes/properties will work just fine.
260-
261-
> **You cannot pass the triplestore argument directly to FIC constructor.**
262-
> Because of the way it is implemented, if the base reasoner is set to use triplestore,
263-
> then FIC's is considered to using triplestore.
264-
265-
> When using triplestore all methods, including `instances` method **will default to the base
266-
> implementation**. This means that no matter which type of reasoner you are using, the results will be always
267-
> the same.
268-
269251
-----------------------------------------------------------------------
270252

271253
In this guide we covered the main functionalities of the reasoners in Ontolearn. More

docs/usage/06_concept_learners.md

Lines changed: 15 additions & 13 deletions
@@ -323,23 +323,24 @@ Let's see what it takes to make use of it.
323323
First of all you need a server which should host the triplestore for your ontology. If you don't
324324
already have one, see [Loading and Launching a Triplestore](#loading-and-launching-a-triplestore) below.
325325

326-
Now you can simply initialize the `KnowledgeBase` object that will server as an input for your desired
326+
Now you can simply initialize a `TripleStoreKnowledgeBase` object that will serve as an input for your desired
327327
concept learner as follows:
328328

329329
```python
330-
from ontolearn.knowledge_base import KnowledgeBase
330+
from ontolearn.triple_store import TripleStoreKnowledgeBase
331331

332-
kb = KnowledgeBase(triplestore_address="http://your_domain/some_path/sparql")
332+
kb = TripleStoreKnowledgeBase("http://your_domain/some_path/sparql")
333333
```
334334

335-
Notice that we did not provide a value for the `path` argument. When using triplestore, it is not required. Keep
336-
in mind that the `kb` will create a default reasoner that uses the triplestore. Passing a custom
337-
reasoner will not make any difference, because they all behave the same when using the triplestore.
338-
You may wonder what happens to the `Ontology` object of the `kb` since no path was given. A default ontology
339-
object is created that will also use the triplestore for its processes. Basically every querying process concerning
340-
concept learning is now using the triplestore.
335+
Notice that the triplestore endpoint is the only argument that you need to pass.
336+
Also keep in mind that this knowledge base contains a
337+
[TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology)
338+
and [TripleStoreReasoner](ontolearn.triple_store.TripleStoreReasoner) which means that
339+
every querying process concerning concept learning is now using the triplestore.
341340

342341
> **Important notice:** The performance of a concept learner may differ when using a triplestore.
342+
> This happens because some SPARQL queries may not yield the exact same results as the local querying methods.
343+
343344

344345
## Loading and Launching a Triplestore
345346

@@ -401,14 +402,15 @@ you pass this url to `triplestore_address` argument, you have to add the
401402
`/sparql` sub-path indicating to the server that we are querying via SPARQL queries. Full path now should look like:
402403
`http://localhost:3030/father/sparql`.
403404

404-
You can now create a knowledge base or a reasoner object that uses this URL for their
405+
You can now create a triplestore knowledge base or a reasoner that use this URL for their
405406
operations:
406407

407408
```python
408-
from ontolearn.knowledge_base import KnowledgeBase
409+
from ontolearn.triple_store import TripleStoreKnowledgeBase
410+
411+
father_kb = TripleStoreKnowledgeBase("http://localhost:3030/father/sparql")
409412

410-
father_kb = KnowledgeBase(triplestore_address="http://localhost:3030/father/sparql")
411-
# ** Execute the learning algorithm as you normally would. ** .
413+
# ** Continue to execute the learning algorithm as you normally do. ** .
412414
```
413415

414416
-------------------------------------------------------------------
