
Commit db83035

Merge pull request #347 from dice-group/general_adjustments
Triplestore KB and Ontosample integration
2 parents ba15bbf + 13814b0 commit db83035

29 files changed: +1365 -1182 lines

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@
 ]
 
 # autoapi for ontolearn and owlapy. for owlapy we need to refer to its path in GitHub Action environment
-autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy']
+autoapi_dirs = ['../ontolearn', '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/owlapy',
+                '/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/ontosample']
 
 # by default all are included but had to reinitialize this to remove private members from shoing
 autoapi_options = ['members', 'undoc-members', 'show-inheritance', 'show-module-summary', 'special-members',

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Ontolearn is an open-source software library for explainable structured machine
    usage/09_further_resources
    autoapi/ontolearn/index
    autoapi/owlapy/index
+   autoapi/ontosample/index
 
 
 .. raw:: latex

docs/usage/01_introduction.md

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Ontolearn
 
-**Version:** ontolearn 0.6.2
+**Version:** ontolearn 0.6.1
 
 **GitHub repository:** [https://github.com/dice-group/Ontolearn](https://github.com/dice-group/Ontolearn)
 
@@ -31,12 +31,13 @@ Owlready2 library made it possible to build more complex algorithms.
 
 ---------------------------------------
 
-**Ontolearn (including owlapy) can do the following:**
+**Ontolearn (including owlapy and ontosample) can do the following:**
 
 - Load/save ontologies in RDF/XML, OWL/XML.
 - Modify ontologies by adding/removing axioms.
 - Access individuals/classes/properties of an ontology (and a lot more).
 - Define learning problems.
+- Sample ontologies.
 - Construct class expressions.
 - Use concept learning algorithms to classify positive examples in a learning problem.
 - Use local datasets or datasets that are hosted on a triplestore server, for the learning task.

docs/usage/04_knowledge_base.md

Lines changed: 98 additions & 1 deletion
@@ -49,7 +49,7 @@ kb = KnowledgeBase(path="file://KGs/father.owl")
 What happens in the background is that the ontology located in this path will be loaded
 in the `OWLOntology` object of `kb` as done [here](03_ontologies.md#loading-an-ontology).
 
-In our recent version you can also initialize the KnowledgeBase using a dataset hosted in a triplestore.
+In our recent version you can also initialize a knowledge base using a dataset hosted in a triplestore.
 Since that knowledge base is mainly used for executing a concept learner, we cover that matter more in depth
 in _[Use Triplestore Knowledge Base](06_concept_learners.md#use-triplestore-knowledge-base)_
 section of _[Concept Learning](06_concept_learners.md)_.
@@ -285,6 +285,103 @@ requires only the `mode` parameter.
 > only the latter subsumption axiom will be returned.
 
 
+## Sampling the Knowledge Base
+
+Sometimes ontologies, and therefore knowledge bases, can get very large and our
+concept learners become inefficient in terms of runtime. Sampling is an approach
+to extract a portion of the whole knowledge base without changing its semantics, while
+still being expressive enough to yield results with as little loss of quality as
+possible. [OntoSample](https://github.com/alkidbaci/OntoSample/tree/main) is
+a library that we use to perform the sampling process. It offers different sampling
+techniques which fall into the following categories:
+
+- Node-based samplers
+- Edge-based samplers
+- Exploration-based samplers
+
+and almost every sampler is offered in 3 modes:
+
+- Classic
+- Learning problem first (LPF)
+- Learning problem centered (LPC)
+
+You can check them [here](ontosample).
+
+When operated on its own, OntoSample uses a light version of Ontolearn (`ontolearn_light`)
+to reason over ontologies, but when both packages are installed in the same environment
+it will use the `ontolearn` module instead. This is done for compatibility reasons.
+
+OntoSample treats the knowledge base as a graph where nodes are individuals
+and edges are object properties. However, OntoSample also offers support for
+sampling data properties, although they are not considered as _"edges"_.
+
+#### Sampling steps:
+1. Initialize the sampler using a `KnowledgeBase` object. If you are using an LPF or LPC
+sampler then you also need to pass the set of learning problem individuals (`lp_nodes`).
+2. To perform the sampling use the `sample` method, where you pass the number
+of nodes (`nodes_number`) that you want to sample, the percentage of data properties
+(`data_properties_percentage`) that you want to sample, represented by a float value
+from 0 to 1, and the jump probability (`jump_prob`) for samplers that
+use "jumping", a technique to avoid infinite loops during sampling.
+3. The `sample` method returns the sampled knowledge base, which you can store in a
+variable, use directly in the code, or save locally by using the static method
+`save_sample`.
+
+Let's see an example where we use [RandomNodeSampler](ontosample.classic_samplers.RandomNodeSampler) to sample a
+knowledge base:
+
+```python
+from ontolearn.knowledge_base import KnowledgeBase
+from ontosample.classic_samplers import RandomNodeSampler
+
+# 1. Initialize KnowledgeBase object using the path of the ontology
+kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
+
+# 2. Initialize the sampler and generate the sample
+sampler = RandomNodeSampler(kb)
+sampled_kb = sampler.sample(30)  # will generate a sample with 30 nodes
+
+# 3. Save the sampled ontology
+sampler.save_sample(kb=sampled_kb, filename="some_name")
+```
+
+Here is another example where this time we use an LPC sampler:
+
+```python
+from ontolearn.knowledge_base import KnowledgeBase
+from ontosample.lpc_samplers import RandomWalkerJumpsSamplerLPCentralized
+from owlapy.model import OWLNamedIndividual, IRI
+import json
+
+# 0. Load json that stores the learning problem
+with open("examples/uncle_lp2.json") as json_file:
+    examples = json.load(json_file)
+
+# 1. Initialize KnowledgeBase object using the path of the ontology
+kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")
+
+# 2. Initialize learning problem (required only for LPF and LPC samplers)
+pos = set(map(OWLNamedIndividual, map(IRI.create, set(examples['positive_examples']))))
+neg = set(map(OWLNamedIndividual, map(IRI.create, set(examples['negative_examples']))))
+lp = pos.union(neg)
+
+# 3. Initialize the sampler and generate the sample
+sampler = RandomWalkerJumpsSamplerLPCentralized(graph=kb, lp_nodes=lp)
+sampled_kb = sampler.sample(nodes_number=40, jump_prob=0.15)
+
+# 4. Save the sampled ontology
+sampler.save_sample(kb=sampled_kb, filename="some_other_name")
+```
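For readers who want to reproduce the LPC snippet, the learning-problem file is assumed to map the two keys read by the code (`positive_examples`, `negative_examples`) to lists of individual IRIs. The IRIs below are hypothetical placeholders; the real `examples/uncle_lp2.json` may differ:

```json
{
  "positive_examples": ["http://www.example.org/family#person1"],
  "negative_examples": ["http://www.example.org/family#person2"]
}
```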
+
+> WARNING! Random Walker and Random Walker with Prioritization are two samplers that suffer
+> from non-termination in case the ontology contains nodes that point to each other and
+> form an inescapable loop for the "walker". In this scenario you can use their "jumping"
+> versions to make the "walker" escape these loops and ensure termination.
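To build intuition for why "jumping" ensures termination, here is a deliberately simplified, pure-Python sketch of a random walker with jumps on a toy adjacency dict. This is our own illustration, not OntoSample's implementation or API:

```python
import random

def random_walker_with_jumps(graph, start, nodes_number, jump_prob=0.15, seed=42):
    """Toy sketch: walk along edges, and with probability `jump_prob`
    (or at a dead end) jump to a random node of the whole graph, so the
    walker cannot stay trapped in a loop forever."""
    rng = random.Random(seed)
    sampled = {start}
    current = start
    while len(sampled) < min(nodes_number, len(graph)):
        neighbors = graph.get(current, [])
        if not neighbors or rng.random() < jump_prob:
            current = rng.choice(sorted(graph))  # the "jump" escapes loops
        else:
            current = rng.choice(neighbors)
        sampled.add(current)
    return sampled

# "a" <-> "b" form a loop a plain walker starting at "a" could never leave,
# yet 3 nodes are requested; only the jump branch can reach "c" or "d".
graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
print(random_walker_with_jumps(graph, "a", nodes_number=3))
```

Without the jump branch, the walker would bounce between `"a"` and `"b"` forever and never collect a third node.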
+
+To see how to use a sampled knowledge base for the task of concept learning, check
+the `sampling_example.py` script in the [examples](https://github.com/dice-group/Ontolearn/tree/develop/examples)
+folder. You will find descriptive comments in that script that will help you understand it better.
+
+For more details about OntoSample you can see [this paper](https://dl.acm.org/doi/10.1145/3583780.3615158).
+
 -----------------------------------------------------------------------------------------------------
 
 Since we cannot cover everything here in details, see [KnowledgeBase API documentation](ontolearn.knowledge_base.KnowledgeBase)

docs/usage/05_reasoner.md

Lines changed: 25 additions & 43 deletions
@@ -34,9 +34,6 @@ from. Currently, there are the following reasoners available:
 The structural reasoner requires an ontology ([OWLOntology](owlapy.model.OWLOntology)).
 The second argument is `isolate` argument which isolates the world (therefore the ontology) where the reasoner is
 performing the reasoning. More on that on _[Reasoning Details](07_reasoning_details.md#isolated-world)_.
-The remaining argument, `triplestore_address`, is used in case you want to
-retrieve instances from a triplestore (go to
-[_Using a Triplestore for Reasoning Tasks_](#using-a-triplestore-for-reasoning-tasks) for details).
 
 
 
@@ -60,8 +57,7 @@ from. Currently, there are the following reasoners available:
 which is just an enumeration with two possible values: `BaseReasoner_Owlready2.HERMIT` and `BaseReasoner_Owlready2.PELLET`.
 You can set the `infer_property_values` argument to `True` if you want the reasoner to infer
 property values. `infer_data_property_values` is an additional argument when the base reasoner is set to
-`BaseReasoner_Owlready2.PELLET`. The rest of the arguments `isolated` and `triplestore_address`
-are inherited from the base class.
+`BaseReasoner_Owlready2.PELLET`. The argument `isolated` is inherited from the base class.
 
 
 - [**OWLReasoner_FastInstanceChecker**](ontolearn.base.fast_instance_checker.OWLReasoner_FastInstanceChecker) **(FIC)**
@@ -87,6 +83,29 @@ from. Currently, there are the following reasoners available:
 `sub_properties` is another boolean argument to specify whether you want to take sub properties in consideration
 for `instances()` method.
 
+
+- [**TripleStoreReasoner**](ontolearn.triple_store.TripleStoreReasoner)
+
+Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
+Making this functionality available in Ontolearn makes it possible to use concept learners that
+fully operate on datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
+independently for reasoning tasks.
+
+In Ontolearn, we have implemented `TripleStoreReasoner` to query triplestore endpoints using SPARQL queries.
+It has only one required parameter:
+- `ontology` - a [TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology) that can be instantiated
+using a string that contains the URL of the triplestore host/server.
+
+This reasoner inherits from OWLReasoner, and therefore you can use it like any other reasoner.
+
+**Initialization:**
+
+```python
+from ontolearn.triple_store import TripleStoreReasoner, TripleStoreOntology
+
+reasoner = TripleStoreReasoner(TripleStoreOntology("http://some_domain/some_path/sparql"))
+```
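Under the hood, a triplestore-backed reasoner answers calls such as `instances()` by translating them into SPARQL against the endpoint. Purely as an illustration of that idea (this hypothetical helper is not Ontolearn's actual query or API), retrieving the instances of a named class could look like:

```python
def instances_query(class_iri: str) -> str:
    # Hypothetical sketch: the kind of SPARQL a triplestore-backed reasoner
    # could send to list individuals typed with the given class IRI.
    return f"SELECT DISTINCT ?ind WHERE {{ ?ind a <{class_iri}> . }}"

# hypothetical class IRI, for illustration only
print(instances_query("http://some_domain/some_path#SomeClass"))
```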
 ## Usage of the Reasoner
 All the reasoners available in the Ontolearn library inherit from the
 class: [OWLReasonerEx](ontolearn.base.ext.OWLReasonerEx). This class provides some
@@ -139,7 +158,7 @@ You can get all the types of a certain individual using `types` method:
 <!--pytest-codeblocks:cont-->
 
 ```python
-anna = list( onto.individuals_in_signature()).pop()
+anna = list(onto.individuals_in_signature()).pop()
 
 anna_types = ccei_reasoner.types(anna)
 ```
@@ -229,43 +248,6 @@ for ind in male_individuals:
     print(ind)
 ```
 
-### Using a Triplestore for Reasoning Tasks
-
-As we mentioned earlier, OWLReasoner has an argument for enabling triplestore querying:
-- `triplestore_address` - a string that contains the URL of the triplestore host/server. If specified, it tells
-the reasoner that for its operations it should query the triplestore hosted on the given address.
-
-Triplestores are known for their efficiency in retrieving data, and they can be queried using SPARQL.
-Making this functionality available for reasoners in Ontolearn makes it possible to use concept learners that
-fully operates in datasets hosted on triplestores. Although that is the main goal, the reasoner can be used
-independently for reasoning tasks. Therefore, you can initialize a reasoner to use triplestore as follows:
-
-```python
-from ontolearn.base import OWLReasoner_Owlready2
-
-reasoner = OWLReasoner_Owlready2(onto, triplestore_address="http://some_domain/some_path/sparql")
-```
-
-Now you can use the reasoner methods as you would normally do:
-
-```python
-# Retrieving the male instances using `male` variable that we declared earlier
-males = reasoner.instances(male, direct=False)
-```
-
-**Some important notice are given below:**
-
-> Not all the methods of the reasoner are implemented to use triplestore but the main methods
-> such as 'instance' and those used to get sub/super classes/properties will work just fine.
-
-> **You cannot pass the triplestore argument directly to FIC constructor.**
-> Because of the way it is implemented, if the base reasoner is set to use triplestore,
-> then FIC's is considered to using triplestore.
-
-> When using triplestore all methods, including `instances` method **will default to the base
-> implementation**. This means that no matter which type of reasoner you are using, the results will be always
-> the same.
 
 -----------------------------------------------------------------------
 
 In this guide we covered the main functionalities of the reasoners in Ontolearn. More

docs/usage/06_concept_learners.md

Lines changed: 15 additions & 13 deletions
@@ -323,23 +323,24 @@ Let's see what it takes to make use of it.
 First of all you need a server which should host the triplestore for your ontology. If you don't
 already have one, see [Loading and Launching a Triplestore](#loading-and-launching-a-triplestore) below.
 
-Now you can simply initialize the `KnowledgeBase` object that will server as an input for your desired
+Now you can simply initialize a `TripleStoreKnowledgeBase` object that will serve as an input for your desired
 concept learner as follows:
 
 ```python
-from ontolearn.knowledge_base import KnowledgeBase
+from ontolearn.triple_store import TripleStoreKnowledgeBase
 
-kb = KnowledgeBase(triplestore_address="http://your_domain/some_path/sparql")
+kb = TripleStoreKnowledgeBase("http://your_domain/some_path/sparql")
 ```
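One practical note: the endpoint must be live and reachable before the knowledge base can query it. As a quick sanity check you can build (and then fetch, e.g. with `urllib.request`) the HTTP GET URL of a trivial `ASK` query. The helper below is our own stdlib sketch, not part of Ontolearn:

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint: str, query: str = "ASK { ?s ?p ?o }") -> str:
    # Build the HTTP GET URL for a SPARQL query against the endpoint;
    # a live endpoint should answer the ASK query with a boolean result.
    return f"{endpoint}?{urlencode({'query': query})}"

# hypothetical endpoint, matching the snippet above
print(sparql_get_url("http://your_domain/some_path/sparql"))
```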
 
-Notice that we did not provide a value for the `path` argument. When using triplestore, it is not required. Keep
-in mind that the `kb` will create a default reasoner that uses the triplestore. Passing a custom
-reasoner will not make any difference, because they all behave the same when using the triplestore.
-You may wonder what happens to the `Ontology` object of the `kb` since no path was given. A default ontology
-object is created that will also use the triplestore for its processes. Basically every querying process concerning
-concept learning is now using the triplestore.
+Notice that the triplestore endpoint is the only argument that you need to pass.
+Also keep in mind that this knowledge base contains a
+[TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology)
+and a [TripleStoreReasoner](ontolearn.triple_store.TripleStoreReasoner), which means that
+every querying process concerning concept learning now uses the triplestore.
 
 > **Important notice:** The performance of a concept learner may differentiate when using triplestore.
+> This happens because some SPARQL queries may not yield exactly the same results as the local querying methods.
 
 ## Loading and Launching a Triplestore
 
@@ -401,14 +402,15 @@ you pass this url to `triplestore_address` argument, you have to add the
 `/sparql` sub-path indicating to the server that we are querying via SPARQL queries. Full path now should look like:
 `http://localhost:3030/father/sparql`.
 
-You can now create a knowledge base or a reasoner object that uses this URL for their
+You can now create a triplestore knowledge base or a reasoner that uses this URL for its
 operations:
 
 ```python
-from ontolearn.knowledge_base import KnowledgeBase
+from ontolearn.triple_store import TripleStoreKnowledgeBase
+
+father_kb = TripleStoreKnowledgeBase("http://localhost:3030/father/sparql")
 
-father_kb = KnowledgeBase(triplestore_address="http://localhost:3030/father/sparql")
-# ** Execute the learning algorithm as you normally would. ** .
+# ** Continue to execute the learning algorithm as you normally do. ** .
 ```
 
 -------------------------------------------------------------------

docs/usage/09_further_resources.md

Lines changed: 36 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 You can find more details in the related papers for each algorithm:
 
+Concept Learning:
+
 - **NCES2** &rarr; (soon) [Neural Class Expression Synthesis in ALCHIQ(D)](https://papers.dice-research.org/2023/ECML_NCES2/NCES2_public.pdf)
 - **Drill** &rarr; [Deep Reinforcement Learning for Refinement Operators in ALC](https://arxiv.org/pdf/2106.15373.pdf)
 - **NCES** &rarr; [Neural Class Expression Synthesis](https://link.springer.com/chapter/10.1007/978-3-031-33455-9_13)
@@ -10,12 +12,24 @@ You can find more details in the related papers for each algorithm:
 - **CLIP** &rarr; (soon) [Learning Concept Lengths Accelerates Concept Learning in ALC](https://link.springer.com/chapter/10.1007/978-3-031-06981-9_14)
 - **CELOE** &rarr; [Class Expression Learning for Ontology Engineering](https://www.sciencedirect.com/science/article/abs/pii/S1570826811000023)
 
+Sampling:
+- **OntoSample** &rarr; [Accelerating Concept Learning via Sampling](https://dl.acm.org/doi/10.1145/3583780.3615158)
+
 ## Citing
 
 Currently, we are working on our manuscript describing our framework.
 If you find our work useful in your research, please consider citing the respective paper:
 
 ```
+# DRILL
+@inproceedings{demir2023drill,
+  author = {Demir, Caglar and Ngomo, Axel-Cyrille Ngonga},
+  booktitle = {The 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023},
+  title = {Neuro-Symbolic Class Expression Learning},
+  url = {https://www.ijcai.org/proceedings/2023/0403.pdf},
+  year = {2023}
+}
+
 # NCES2
 @inproceedings{kouagou2023nces2,
 author={Kouagou, N'Dah Jean and Heindorf, Stefan and Demir, Caglar and Ngonga Ngomo, Axel-Cyrille},
@@ -56,12 +70,33 @@ address="Cham"
 year={2022},
 publisher={Springer Nature Switzerland}
 }
+
+# OntoSample
+@inproceedings{10.1145/3583780.3615158,
+  author = {Baci, Alkid and Heindorf, Stefan},
+  title = {Accelerating Concept Learning via Sampling},
+  year = {2023},
+  isbn = {9798400701245},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  url = {https://doi.org/10.1145/3583780.3615158},
+  doi = {10.1145/3583780.3615158},
+  abstract = {Node classification is an important task in many fields, e.g., predicting entity types in knowledge graphs, classifying papers in citation graphs, or classifying nodes in social networks. In many cases, it is crucial to explain why certain predictions are made. Towards this end, concept learning has been proposed as a means of interpretable node classification: given positive and negative examples in a knowledge base, concepts in description logics are learned that serve as classification models. However, state-of-the-art concept learners, including EvoLearner and CELOE exhibit long runtimes. In this paper, we propose to accelerate concept learning with graph sampling techniques. We experiment with seven techniques and tailor them to the setting of concept learning. In our experiments, we achieve a reduction in training size by over 90\% while maintaining a high predictive performance.},
+  booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
+  pages = {3733--3737},
+  numpages = {5},
+  keywords = {concept learning, graph sampling, knowledge bases},
+  location = {Birmingham, United Kingdom},
+  series = {CIKM '23}
+}
 ```
 
 ## More Inside the Project
 
 Examples and test cases provide a good starting point to get to know
-the project better. Find them in the folders `examples` and `tests`.
+the project better. Find them in the folders
+[examples](https://github.com/dice-group/Ontolearn/tree/develop/examples)
+and [tests](https://github.com/dice-group/Ontolearn/tree/develop/tests).
 
 ## Contribution
