Commit 1c78a6a

Merge pull request #267 from Mec-iS/issue-261-graph-algebra
Starting graph algebra
2 parents 3814e8a + 3687dfc commit 1c78a6a

8 files changed

+533
-24
lines changed

Diff for: examples/graph_algebra/gla_ex0_0.ipynb

+199
@@ -0,0 +1,199 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Graph Algebra with `kglab`\n",
    "\n",
    "## intro\n",
    "`kglab` provides tools to access graph data from multiple sources and build a `KnowledgeGraph` that data scientists can use easily. For a thorough explanation of how to work with data stored as triples and how to load it into `kglab`, please see the examples in the `examples/` directory. The examples in this directory (`examples/graph_algebra/`) introduce the graph algebra capabilities available on the graphs the user has loaded.\n",
    "\n",
    "## basic load and querying\n",
    "To begin, load your data into a `KnowledgeGraph` and define a query over it:\n",
    "\n",
    "1. Instantiate a graph from a dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<kglab.kglab.KnowledgeGraph at 0x7f283f3d3940>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# for use in tutorial and development; do not include this `sys.path` change in production:\n",
    "import sys ; sys.path.insert(0, \"../../\")\n",
    "from os.path import dirname\n",
    "import kglab\n",
    "import os\n",
    "\n",
    "namespaces = {\n",
    "    \"foaf\": \"http://xmlns.com/foaf/0.1/\",\n",
    "    \"gorm\": \"http://example.org/sagas#\",\n",
    "    \"rel\": \"http://purl.org/vocab/relationship/\",\n",
    "    }\n",
    "\n",
    "kg = kglab.KnowledgeGraph(\n",
    "    name = \"Happy Vikings KG example for SKOS/OWL inference\",\n",
    "    namespaces=namespaces,\n",
    "    )\n",
    "\n",
    "kg.load_rdf(dirname(dirname(os.getcwd())) + \"/dat/gorm.ttl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "2. Create a subgraph by providing a SPARQL query that defines a \"subject\" and an \"object\":\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"\"\"SELECT ?subject ?object\n",
    "WHERE {\n",
    "    ?subject rdf:type gorm:Viking .\n",
    "    ?subject gorm:childOf ?object .\n",
    "}\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## define a subgraph\n",
    "In this case we are looking for the network of parent-child relations among members of the Viking family.\n",
    "\n",
    "With this query we can define a **subgraph** to gain access to **graph algebra** capabilities: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from kglab.subg import SubgraphMatrix\n",
    "\n",
    "subgraph = SubgraphMatrix(kg=kg, sparql=query)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## compute adjacency matrices\n",
    "Let's compute the basic adjacency matrix (usually denoted `A`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 1., 1., 0., 0.],\n",
       "       [0., 0., 0., 1., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 1.],\n",
       "       [0., 0., 0., 0., 0.]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adj_matrix = subgraph.to_adjacency()\n",
    "adj_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, all the subjects and objects have been mapped to integer indices from 0 to the number of nodes minus one. Row 0 shows that the entity with index 0 is adjacent to (has a directed edge to) the entities with indices 1 and 2. This is a directed graph because the relationship `gorm:childOf` goes from child to parent. Let's turn it into an undirected graph to view the relation symmetrically (both child-parent and parent-child)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 1., 1., 0., 0.],\n",
       "       [1., 0., 0., 1., 0.],\n",
       "       [1., 0., 0., 0., 0.],\n",
       "       [0., 1., 0., 0., 1.],\n",
       "       [0., 0., 0., 1., 0.]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "undirected_adj_mtx = subgraph.to_undirected()\n",
    "undirected_adj_mtx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now see that the relationship is a generic, symmetric \"parenthood\" relation rather than just a directed child-parent relationship."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.10 64-bit ('.venv': venv)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "de68f9b565e1e230f4433adb1a318d8f3a0dfad0917fa0c696727472c8ddadbf"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
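
The indexing step described in the notebook can be sketched in plain Python. This is only an illustration of the idea: `kglab` delegates the real work to NetworkX, and the helper names (`build_adjacency`, `symmetrize`) and the node labels are hypothetical, chosen so that the result has the same shape as the matrices shown in the notebook output.

```python
from typing import Dict, Hashable, List, Tuple

Matrix = List[List[float]]


def build_adjacency(pairs: List[Tuple[Hashable, Hashable]]) -> Tuple[Dict[Hashable, int], Matrix]:
    """Map each node to an integer index (in order of first appearance)
    and build a dense, directed adjacency matrix."""
    index: Dict[Hashable, int] = {}
    for subj, obj in pairs:
        for node in (subj, obj):
            index.setdefault(node, len(index))

    n = len(index)
    matrix = [[0.0] * n for _ in range(n)]
    for subj, obj in pairs:
        # directed edge: subject -> object (e.g. child -> parent)
        matrix[index[subj]][index[obj]] = 1.0
    return index, matrix


def symmetrize(matrix: Matrix) -> Matrix:
    """Drop edge direction: i and j are connected if either direction exists."""
    n = len(matrix)
    return [
        [1.0 if matrix[i][j] or matrix[j][i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical (subject, object) pairs standing in for SPARQL query results.
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]
index, adj = build_adjacency(pairs)
undirected = symmetrize(adj)
```

Assigning indices in order of first appearance keeps the mapping deterministic for a given result order, which is what makes row and column positions interpretable afterwards.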

Diff for: kglab/algebra.py

+65
@@ -0,0 +1,65 @@
"""
Working with `SubgraphMatrix` as a vectorized representation.
Additions to the functionality present in `subg.py`.
Integrates `scipy` and `scikit-learn` functionality.

see license https://github.com/DerwenAI/kglab#license-and-copyright
"""
import typing

import networkx as nx
from networkx import DiGraph

class AlgebraMixin:
    """
    Provides methods to work with graph algebra using `SubgraphMatrix` data.

    NOTE: provide optional Oxigraph support for fast in-memory computation
    """
    nx_graph: typing.Optional[DiGraph] = None

    def to_undirected(self):
        """
        Return adjacency (dense) matrix for the KG, with the graph turned into undirected.

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        # validate state first, as the sibling methods do
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph.to_undirected())

    def to_adjacency(self):
        """
        Return adjacency (dense) matrix for the KG.
        [Relevant NetworkX interface](https://networkx.org/documentation/stable/reference/convert.html#id2)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph)

    def to_incidence(self):
        """
        Return incidence (dense) matrix for the KG.
        [Relevant scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.incidence_matrix(self.nx_graph).toarray()

    def to_laplacian(self):
        """
        Return Laplacian matrix for the KG. Graph is turned into undirected.
        [docs](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.laplacianmatrix.laplacian_matrix.html)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.laplacian_matrix(self.nx_graph.to_undirected()).toarray()

    def to_scipy_sparse(self):
        """
        Return graph in CSR format (optimized for matrix-matrix operations).

        returns:
            SciPy sparse matrix: Graph adjacency matrix.
        """
        self.check_attributes()
        return nx.to_scipy_sparse_array(self.nx_graph)
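
To make the outputs of `to_laplacian` and `to_incidence` more concrete, here is a minimal plain-Python sketch of the two matrices for a small undirected graph. It does not call NetworkX and is not how `AlgebraMixin` computes them; the function names `laplacian` and `incidence` are hypothetical, and the incidence matrix shown is the unoriented variant (a 1 at both endpoints of each edge).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def laplacian(undirected: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix,
    where D is the diagonal matrix of node degrees."""
    n = len(undirected)
    degrees = [sum(row) for row in undirected]
    return [
        [(degrees[i] if i == j else 0.0) - undirected[i][j] for j in range(n)]
        for i in range(n)
    ]


def incidence(n_nodes: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Unoriented incidence matrix: one row per node, one column per edge,
    with a 1 in the rows of both endpoints of the edge."""
    matrix = [[0.0] * len(edges) for _ in range(n_nodes)]
    for col, (u, v) in enumerate(edges):
        matrix[u][col] = 1.0
        matrix[v][col] = 1.0
    return matrix


# Hypothetical 5-node graph with edges 0-1, 0-2, 1-3, 3-4.
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
undirected = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

lap = laplacian(undirected)
inc = incidence(5, edges)
```

Every row of a Laplacian built this way sums to zero, which is a quick sanity check when comparing against a library result.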

Diff for: kglab/networks.py

+43
@@ -0,0 +1,43 @@
"""
Working with `SubgraphMatrix` as a vectorized representation.
Additions to the functionality present in `subg.py`.
Integrates `scikit-network` functionality.

see license https://github.com/DerwenAI/kglab#license-and-copyright
"""

import sknetwork as skn

class NetAnalysisMixin:
    """
    Provides methods for network analysis tools to work with `KnowledgeGraph`.
    """
    def get_distances(self, adj_mtx):
        """
        Compute distances according to an adjacency matrix.
        """
        self.check_attributes()
        return skn.path.get_distances(adj_mtx)

    def get_shortest_path(self, adj_mtx, src, dst):
        """
        Return shortest path from sources to destinations according to an adjacency matrix.

        adj_mtx:
            numpy.array: adjacency matrix for the graph.
        src:
            int or iterable: indices of source nodes
        dst:
            int or iterable: indices of destination nodes

        returns:
            list of int: a path of indices
        """
        self.check_attributes()
        return skn.path.get_shortest_path(adj_mtx, src, dst)


# number of nodes, number of edges
# density
# triangles
# reciprocity
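
As an illustration of what "distances" and "shortest path" mean on an unweighted adjacency matrix, the sketch below uses a breadth-first search in plain Python. It is not scikit-network's implementation, and the function names `bfs_distances` and `bfs_shortest_path`, as well as the use of `-1` for unreachable nodes, are assumptions made for this example only.

```python
from collections import deque
from typing import List, Optional

Matrix = List[List[float]]


def bfs_distances(adj: Matrix, src: int) -> List[int]:
    """Number of hops from `src` to every node; -1 marks unreachable nodes
    (a convention chosen for this sketch)."""
    n = len(adj)
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bfs_shortest_path(adj: Matrix, src: int, dst: int) -> List[int]:
    """One shortest path from `src` to `dst` as a list of node indices,
    or an empty list when `dst` cannot be reached."""
    n = len(adj)
    parent: List[Optional[int]] = [None] * n
    seen = [False] * n
    seen[src] = True
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in range(n):
            if adj[u][v] and not seen[v]:
                seen[v] = True
                parent[v] = u
                queue.append(v)
    if not seen[dst]:
        return []
    # walk back from dst to src through the recorded parents
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]


# Directed adjacency matrix with edges 0->1, 0->2, 1->3, 3->4.
adj = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

distances_from_0 = bfs_distances(adj, 0)
path_0_to_4 = bfs_shortest_path(adj, 0, 4)
path_2_to_0 = bfs_shortest_path(adj, 2, 0)
```

Because the matrix is directed, node 0 is unreachable from node 2; running the same search on the undirected matrix would find a path.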

Diff for: kglab/query/mixin.py

+7-3
@@ -81,7 +81,8 @@ def query_as_df (
 pythonify: bool = True,
 ) -> pd.DataFrame:
 """
-Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query) to perform a SPARQL query on the RDF graph.
+Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query)
+to perform a SPARQL query on the RDF graph.

 sparql:
 text for the SPARQL query
@@ -123,7 +124,8 @@ def visualize_query (
 notebook: bool = False,
 ) -> pyvis.network.Network:
 """
-Visualize the given SPARQL query as a [`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)
+Visualize the given SPARQL query as a
+[`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)

 sparql:
 input SPARQL query to be visualized
@@ -144,7 +146,9 @@ def n3fy (
 pythonify: bool = True,
 ) -> typing.Any:
 """
-Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3) and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython) to serialize a node into a human-readable representation using N3 format.
+Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3)
+and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython)
+to serialize a node into a human-readable representation using N3 format.

 node:
 must be a [`rdflib.term.Node`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=Node#rdflib.term.Node)
