
Commit 3dcfc20

Update knowledge_graph_embedding.md
in progress commit; checking with author
1 parent 8431d71 commit 3dcfc20

File tree: 1 file changed (+18 -15 lines changed)


docs/use_cases/knowledge_graph_embedding.md

Lines changed: 18 additions & 15 deletions
@@ -4,7 +4,9 @@

Large Language Models (LLMs) are everywhere, achieving impressive results in all sorts of language-related tasks. However, in specific domains involving non-text data representations - like Q&A tasks, in which the model needs to understand the context in order to respond accurately - LLMs may not offer the same level of performance.

We look at how Knowledge Graph Embedding (KGE) algorithms can improve performance on tasks that LLMs have difficulty with, explore some example code for training and evaluating a KGE model, and use the KGE model to perform Q&A tasks. We also compare KGE and LLM performance on a Q&A task.

Let's get started.

## Knowledge Graphs and missing edges

@@ -14,27 +16,27 @@ We use Knowledge Graphs (KGs) to describe how different entities, like people, p

In domains where understanding these specific connections is crucial - like recommendation systems, search engines, or information retrieval - KGs specialize in helping computers grasp the detailed relationships between things.

The problem with KGs is that they are usually incomplete. Edges that should be present are missing. These missing links can result from inaccuracies in the data collection process, or simply reflect that our data source is imperfect. In large open-source knowledge bases, [we can observe a _significant_ amount of incompleteness](https://towardsdatascience.com/neural-graph-databases-cc35c9e1d04f):

“… in Freebase, 93.8% of people have no place of birth, and [78.5% have no nationality](https://aclanthology.org/P09-1113.pdf), [about 68% of people do not have any profession](https://dl.acm.org/doi/abs/10.1145/2566486.2568032), while, in Wikidata, [about 50% of artists have no date of birth](https://arxiv.org/abs/2207.00143), and only [0.4% of known buildings have information about height](https://dl.acm.org/doi/abs/10.1145/3485447.3511932).”

The imperfections of KGs can lead to negative outcomes. For example, in recommendation systems, KG incompleteness can result in limited or biased recommendations; on Q&A tasks, KG incompleteness can result in substantively and contextually incomplete or inaccurate answers to queries.

Fortunately, KGEs can help solve the problems faced by KGs.

## Knowledge Graph Embeddings and how they work

Trained KGE algorithms can generalize and predict missing edges by calculating the likelihood of connections between entities.

KGE algorithms do this by taking tangled, complex webs of connections between entities and turning them into something AI systems can understand: vectors. Embedding entities in a vector space allows KGE algorithms to define a loss function that measures the discrepancy between embedding similarity and node similarity in the graph. If the loss is minimal, similar nodes in the graph have similar embeddings.

The KGE model is trained so that the similarities between embedding vectors align with the similarities of the corresponding nodes in the graph. The model adjusts its parameters during training to ensure that entities that are similar in the KG have similar embeddings. This ensures that vector representations capture the structural and relational aspects of entities in the graph.

KGE algorithms vary in the similarity functions they employ and how they define node similarity within a graph. A simple approach is to consider nodes that are connected by an edge as similar. Using this definition, learning node embeddings can be framed as a classification task. In this task, the goal is to determine how likely it is that any given pair of nodes has a specific type of relationship (i.e., shares a specific edge), given their embeddings.

## Demo using DistMult KGE

For our KGE model demo, we opted for the DistMult KGE algorithm. It works by representing the likelihood of relationships between entities (i.e., similarity) as a bilinear function. Essentially, DistMult assumes that the score of a given triple (comprised of a head entity $h$, a relationship $r$, and a tail entity $t$) can be computed as $h^T \text{diag}(r) t$.

![DistMult similarity function](../assets/use_cases/knowledge_graph_embedding/distmult.png)
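To make the bilinear form concrete, here is a minimal sketch of the DistMult scoring function in plain PyTorch (separate from the PyTorch Geometric implementation we use below); the embeddings are random stand-ins:

```python
import torch

def distmult_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # h^T diag(r) t reduces to the sum of the element-wise product of the three vectors
    return (h * r * t).sum(dim=-1)

# Random 64-dimensional stand-ins for a head, relation, and tail embedding
h, r, t = torch.randn(3, 64)
print(distmult_score(h, r, t))  # higher score = more plausible triple
```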

@@ -74,6 +76,7 @@ model = DistMult(
    hidden_channels=64
)
```

For additional configuration of the model, please refer to the [PyTorch Geometric documentation](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.kge.DistMult.html).
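The training step itself falls outside this hunk. For orientation, a rough sketch of what training looks like with PyTorch Geometric's KGE interface is shown below; `model` is the DistMult instance configured above, while the `train_data` tensors and the hyperparameters are illustrative assumptions rather than the demo's exact settings:

```python
import torch

# Mini-batch loader over (head, relation, tail) training triples
loader = model.loader(
    head_index=train_data.edge_index[0],
    rel_type=train_data.edge_type,
    tail_index=train_data.edge_index[1],
    batch_size=1000,
    shuffle=True,
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    model.train()
    for head_index, rel_type, tail_index in loader:
        optimizer.zero_grad()
        # KGEModel.loss() draws negative samples internally and returns a ranking loss
        loss = model.loss(head_index, rel_type, tail_index)
        loss.backward()
        optimizer.step()
```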

@@ -121,7 +124,7 @@ Now that we have a trained model, we can do **some experiments** to see how well

1. France contains Burgundy (which is true)
2. France contains Rio de Janeiro (which is not true)
3. France contains Bonnie and Clyde (which makes no sense)

```python
# Get node and relation IDs
```
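The remainder of that code cell lies outside this excerpt. As a rough sketch, checking the three statements with the trained model could look something like this, where `node_to_id` and `rel_to_id` are placeholders for whatever entity- and relation-to-index mappings the dataset provides:

```python
import torch

# Placeholder ID mappings; in the demo these come from the dataset itself
node_to_id = {'France': 0, 'Burgundy': 1, 'Rio de Janeiro': 2, 'Bonnie and Clyde': 3}
rel_to_id = {'contains': 0}

statements = [
    ('France', 'contains', 'Burgundy'),          # true
    ('France', 'contains', 'Rio de Janeiro'),    # not true
    ('France', 'contains', 'Bonnie and Clyde'),  # nonsensical
]

head = torch.tensor([node_to_id[h] for h, _, _ in statements])
rel = torch.tensor([rel_to_id[r] for _, r, _ in statements])
tail = torch.tensor([node_to_id[t] for _, _, t in statements])

# A trained PyG KGE model scores triples directly; higher means more plausible
scores = model(head, rel, tail)
print(scores)
```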
@@ -181,11 +184,11 @@ top_5_scores = scores[sorted_indices]

```
('artist', 2.522)]
```
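For reference, one way a ranking like the one above can be produced is to score every candidate tail entity for the query (Guy Ritchie, profession, ?) and keep the five best; a sketch, with `node_to_id` and `rel_to_id` again standing in for the dataset's ID mappings:

```python
import torch

num_nodes = model.num_nodes  # total number of entities in the graph

# Score (Guy Ritchie, profession, t) for every candidate tail entity t
head = torch.full((num_nodes,), node_to_id['Guy Ritchie'], dtype=torch.long)
rel = torch.full((num_nodes,), rel_to_id['profession'], dtype=torch.long)
tail = torch.arange(num_nodes)

scores = model(head, rel, tail)
sorted_indices = scores.argsort(descending=True)[:5]
top_5_scores = scores[sorted_indices]
```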
Impressively, the model correctly interprets and infers information that isn't explicitly included in the graph, and provides the right answer to our question. Our model aptly demonstrates KGE's ability to make up for graph incompleteness.

Furthermore, the fact that the top five relevant entities identified by the model are all professions suggests that the model has successfully learned and understood the concept of a "profession" - that is, the model has grasped the _broader context and implications_ of "profession," rather than just recognizing the term itself.

Moreover, these five professions are all closely related to the film industry, suggesting that the model has _not only_ understood the concept of a profession but _also_ narrowed this category to film industry professions specifically. In other words, KGE managed to capture the semantic meaning of the combination of the two query terms - the head entity (Guy Ritchie) and the relation (profession) - and was therefore able to link the general concept of a profession to the specific context of the film industry, a testament to its ability to capture and interpret semantic meaning.

In sum, the model's performance in this scenario demonstrates its potential for understanding concepts, interpreting context, and extracting semantic meaning.

@@ -196,7 +199,7 @@ Here is the [complete code for this demo](https://drive.google.com/file/d/1G3tJ6

Next, let's compare the performance of KGE and LLMs on the ogbl-wikikg2 dataset, drawn from Wikidata. This dataset includes 2.5 million unique entities, 535 types of relations, and 17.1 million fact triplets. We'll evaluate their performance using hit rates (ratio of correct answers), following the guidelines provided in [Stanford's Open Graph Benchmark](https://ogb.stanford.edu/docs/linkprop/#ogbl-wikikg2).
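As a quick reference, HitRate@k is simply the fraction of queries for which the correct entity appears among the top k ranked predictions; a minimal sketch, assuming `ranks` holds the 1-based rank of the correct answer for each query:

```python
import torch

def hit_rate_at_k(ranks: torch.Tensor, k: int) -> float:
    # Fraction of queries whose correct answer is ranked within the top k
    return (ranks <= k).float().mean().item()

ranks = torch.tensor([1, 4, 12, 2, 30])  # toy ranks for five queries
print(hit_rate_at_k(ranks, 3))   # 0.4
print(hit_rate_at_k(ranks, 10))  # 0.6
```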
First, we create textual representations for each node within the graph by crafting sentences that describe their connections, like this: "[node] [relation1] [neighbor1], [neighbor2]. [node] [relation2] [neighbor3], [neighbor4]. ..." We then feed these textual representations into an LLM – specifically, the `BAAI/bge-base-en-v1.5` model available on [HuggingFace](https://huggingface.co/BAAI/bge-base-en-v1.5). The embeddings that result from this process serve as our node embeddings.

For queries, we take a similar textual representation approach, creating descriptions of the query but omitting the specific entity in question. With these representations in hand, we utilize dot product similarity to find and rank relevant answers.
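A condensed sketch of this baseline is below, assuming the model is loaded through `sentence-transformers` (any equivalent HuggingFace loading route works); the node sentences are illustrative examples of the format described above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Illustrative node descriptions in the "[node] [relation] [neighbors]" format
node_texts = [
    "Guy Ritchie director Snatch, Sherlock Holmes. Guy Ritchie spouse Madonna.",
    "France contains Burgundy, Normandy. France capital Paris.",
]
# Query description, with the entity to be predicted left out
query_text = "Guy Ritchie profession"

node_embeddings = encoder.encode(node_texts)   # shape: (num_nodes, 768)
query_embedding = encoder.encode(query_text)   # shape: (768,)

# Rank candidate nodes by dot-product similarity to the query
scores = node_embeddings @ query_embedding
ranking = np.argsort(-scores)
```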

@@ -213,14 +216,14 @@ You can see the results on the Open Graph Benchmark query set in the table below

| HitRate@3 | 0.003 | 0.0154 | **0.150** |
| HitRate@10 | 0.010 | 0.0436 | **0.307** |

While the LLM performs three times better than when the nodes are randomly ordered, KGE really stands out as the superior option, with hit rates almost ten times higher than the LLM's. In addition, DistMult finds the correct answer on its first try more frequently than the LLM does in ten attempts. DistMult's performance is even more remarkable considering that it outperforms the LLM while using lower-dimensional (250) embeddings, compared with the LLM's 768-dimensional embeddings.

Our results unequivocally demonstrate KGE's superior suitability compared with LLMs for tasks where relational information is important.

## Not a panacea (limitations)

Because LLMs have trouble encoding intricate relation structures, their performance suffers when dealing with relational information. Creating a string representation of a node's connections tends to overload the LLM's input. LLMs' strength lies in processing more focused and specific textual information; they are typically not trained to handle broad and diverse information within a single context.

KGE algorithms, on the other hand, are specifically designed to handle relational data.
