Add PageRank example (#16)

matea16 · katarinasupe · web-flow · commit 3b4b365093dd · 2025-03-12T15:55:20.000+01:00
* add pagerank example

* update readme

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>

* add conclusion

---------

Co-authored-by: Katarina Supe <61758502+katarinasupe@users.noreply.github.com>
diff --git a/README.md b/README.md
@@ -123,11 +123,26 @@ knowledge graphs seamlessly within Memgraph.
     - Querying and analyzing the knowledge graph to generate insightful responses
 
 - **:bulb: Demo: [Multi-agent RAG System with LlamaIndex](./integrations/llamaindex/multi-agent-rag-system)**
-    - This demo extends the RAG framework by utilizing a multi-agent architecture with LlamaIndex and Memgraph. Multiple agents collaborate to retrieve, process, and refine knowledge from the graph, enhancing response accuracy and depth.
-    - **:mag_right: Key Features:**
-      - Multi-agent system for distributed retrieval
-      - Advanced knowledge graph construction and querying with Memgraph
-      - Improved contextual understanding through agent collaboration
+  - This demo extends the RAG framework by utilizing a multi-agent architecture with LlamaIndex and Memgraph. Multiple agents collaborate to retrieve, process, and refine knowledge from the graph, enhancing response accuracy and depth.
+  - **:mag_right: Key Features:**
+    - Multi-agent system for distributed retrieval
+    - Advanced knowledge graph construction and querying with Memgraph
+    - Improved contextual understanding through agent collaboration
+
+- **:bulb: Demo: [Multi-agent RAG with Memgraph tools](./integrations/llamaindex/agentic-rag-with-graph-tools/agentic_rag_with_pagerank.ipynb)**
+  - This demo shows how to integrate Memgraph procedures, such as PageRank,
+      as tools within a multi-agent architecture using LlamaIndex. The agents
+      work collaboratively to retrieve data from the graph, process it, and
+      perform calculations, such as summing the weight properties of the
+      nodes returned by the PageRank algorithm.
+  - **:mag_right: Key Features:**
+    - Integration of PageRank as a tool in a multi-agent system
+    - Execution of graph algorithms within agents for enhanced retrieval and
+      computation
+    - Multi-agent collaboration to process and analyze data retrieved from
+      Memgraph
+    - Dynamic query execution combining graph-based retrieval and computation
+      tasks
 
 **:book: Additional resources**
 - [LangChain & Memgraph](https://memgraph.com/docs/ai-ecosystem/graph-rag#langchain)
diff --git a/integrations/llamaindex/agentic-rag-with-graph-tools/agentic_rag_with_pagerank.ipynb b/integrations/llamaindex/agentic-rag-with-graph-tools/agentic_rag_with_pagerank.ipynb
@@ -0,0 +1,344 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Integrating PageRank as a tool in a multi-agent workflow\n",
+    "\n",
+    "In this example, we'll create a multi-agent workflow using LlamaIndex and\n",
+    "Memgraph to perform graph-based querying and computation. We'll explore how to:\n",
+    "\n",
+    "- Set up [**Memgraph**](https://memgraph.com/) as a graph store and create a\n",
+    "  sample dataset.\n",
+    "- Use [**LlamaIndex**](https://www.llamaindex.ai/) to define function agents for\n",
+    "  retrieval and arithmetic operations.\n",
+    "- Implement a **retriever agent** to run the\n",
+    "  [**PageRank**](https://memgraph.com/docs/advanced-algorithms/available-algorithms/pagerank)\n",
+    "  algorithm and extract ranked nodes.\n",
+    "- Use a **calculator agent** to process numerical data from retrieved nodes.\n",
+    "- Design an **AgentWorkflow** that integrates retrieval and computation for\n",
+    "  automated query execution.\n",
+    "\n",
+    "By the end, we'll have a system capable of retrieving graph-based data and\n",
+    "performing calculations dynamically.\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "1. Make sure you have [Docker](https://www.docker.com/) running in the\n",
+    "   background. \n",
+    "\n",
+    "2. Run Memgraph\n",
+    "\n",
+    "The easiest way to run Memgraph is using the following commands:\n",
+    "\n",
+    "For Linux/macOS: `curl https://install.memgraph.com | sh`\n",
+    "\n",
+    "For Windows: `iwr https://windows.memgraph.com | iex`\n",
+    "\n",
+    "3. Install the necessary dependencies:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install llama-index llama-index-graph-stores-memgraph python-dotenv neo4j"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "4. Create a vector index in Memgraph on the `__Entity__` label and `embedding`\n",
+    "   property. LlamaIndex creates embeddings and uses Memgraph's [vector\n",
+    "   search](https://memgraph.com/docs/querying/vector-search) for more accurate\n",
+    "   retrieval.\n",
+    "\n",
+    "`CREATE VECTOR INDEX entity ON :__Entity__(embedding) WITH CONFIG {\"dimension\":\n",
+    "1536, \"capacity\": 1000};`\n",
+    "\n",
+    "## Environment setup\n",
+    "\n",
+    "Create a `.env` file that contains your OpenAI API key and the environment\n",
+    "variables needed to connect to your Memgraph instance. If you haven't created\n",
+    "a Memgraph user, leave the username and password as empty strings:\n",
+    "\n",
+    "- `OPENAI_API_KEY=sk-proj-...`\n",
+    "- `URI=bolt://localhost:7687`\n",
+    "- `AUTH_USER=\"\"`\n",
+    "- `AUTH_PASS=\"\"`\n",
+    "\n",
+    "## Create the script\n",
+    "\n",
+    "Let's first load our `.env` file and set the LLM we want to use. In this\n",
+    "example, we're using OpenAI's GPT-4 model.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
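+    "from dotenv import load_dotenv\n",
+    "from llama_index.core import Settings\n",
+    "from llama_index.llms.openai import OpenAI\n",
+    "\n",
+    "load_dotenv()  # Load the OpenAI API key and Memgraph settings from .env\n",
+    "# Use GPT-4 as the default LLM; temperature=0 keeps responses close to deterministic\n",
+    "Settings.llm = OpenAI(model=\"gpt-4\", temperature=0)"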
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Connect to Memgraph\n",
+    "\n",
+    "In this section, we'll establish a connection to Memgraph using the environment\n",
+    "variables for authentication and connection details.\n",
+    "\n",
+    "1. **Retrieve Environment Variables**  \n",
+    "   The script fetches the `URI`, `AUTH_USER`, and `AUTH_PASS` values from the\n",
+    "   environment using `os.getenv()`. These values determine how the script\n",
+    "   connects to the Memgraph database.\n",
+    "\n",
+    "2. **Set Up Authentication**  \n",
+    "   The credentials (`AUTH_USER`, `AUTH_PASS`) are combined into a tuple (`AUTH`)\n",
+    "   to be used for authentication.\n",
+    "\n",
+    "3. **Create a Memgraph Connection**  \n",
+    "   A connection to Memgraph is established using `GraphDatabase.driver(URI,\n",
+    "   auth=AUTH)`.  \n",
+    "\n",
+    "\n",
+    "This setup ensures that the script can interact with your Memgraph instance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from neo4j import GraphDatabase\n",
+    "\n",
+    "# Fall back to a local, unauthenticated Memgraph instance if a variable is unset\n",
+    "URI = os.getenv(\"URI\", \"bolt://localhost:7687\")\n",
+    "AUTH_USER = os.getenv(\"AUTH_USER\", \"\")\n",
+    "AUTH_PASS = os.getenv(\"AUTH_PASS\", \"\")\n",
+    "\n",
+    "AUTH = (AUTH_USER, AUTH_PASS)\n",
+    "\n",
+    "driver = GraphDatabase.driver(URI, auth=AUTH)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Define calculator tools\n",
+    "\n",
+    "Next, define addition and subtraction tools, along with a calculator agent\n",
+    "that uses them. This agent's role is to perform basic arithmetic with the\n",
+    "defined tools."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core.tools import FunctionTool\n",
+    "from llama_index.core.agent.workflow import FunctionAgent\n",
+    "\n",
+    "def add(a: float, b: float) -> float:\n",
+    "    \"\"\"Add two numbers.\"\"\"\n",
+    "    return a + b\n",
+    "\n",
+    "def subtract(a: float, b: float) -> float:\n",
+    "    \"\"\"Subtract b from a.\"\"\"\n",
+    "    return a - b\n",
+    "\n",
+    "# Calculator agent with access to the arithmetic tools (node weights are floats)\n",
+    "calculator_agent = FunctionAgent(\n",
+    "    name=\"calculator\",\n",
+    "    description=\"Performs basic arithmetic operations\",\n",
+    "    system_prompt=\"You are a calculator assistant.\",\n",
+    "    tools=[\n",
+    "        FunctionTool.from_defaults(fn=add),\n",
+    "        FunctionTool.from_defaults(fn=subtract),\n",
+    "    ],\n",
+    "    llm=OpenAI(model=\"gpt-4\"),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, define a function to execute Cypher queries and implement a PageRank\n",
+    "retrieval tool. The retriever agent is responsible for running the PageRank\n",
+    "algorithm and retrieving ranked nodes using the defined tool."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def execute_query(query: str):\n",
+    "    \"\"\"Run a read-only Cypher query in a session and return the records.\"\"\"\n",
+    "    with driver.session() as session:\n",
+    "        return session.execute_read(lambda tx: list(tx.run(query)))\n",
+    "\n",
+    "def run_pagerank():\n",
+    "    \"\"\"Run PageRank and return the five highest-ranked nodes with their ranks.\"\"\"\n",
+    "    query = \"CALL pagerank.get() YIELD node, rank RETURN node, rank ORDER BY rank DESC LIMIT 5\"\n",
+    "    return execute_query(query)\n",
+    "\n",
+    "pagerank_tool = FunctionTool.from_defaults(\n",
+    "    fn=run_pagerank,\n",
+    "    name=\"pagerank_tool\",\n",
+    "    description=\"Runs the PageRank algorithm and retrieves ranked nodes.\"\n",
+    ")\n",
+    "\n",
+    "retriever_agent = FunctionAgent(\n",
+    "    name=\"retriever\",\n",
+    "    description=\"Manages data retrieval\",\n",
+    "    system_prompt=\"You have the ability to run the PageRank algorithm.\",\n",
+    "    tools=[\n",
+    "        pagerank_tool,\n",
+    "    ],\n",
+    "    llm=OpenAI(model=\"gpt-4\"),\n",
+    "    memory=None\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the dataset\n",
+    "\n",
+    "Now, let's create a small dataset in Memgraph consisting of 10 nodes, each with\n",
+    "a `weight` property. The nodes are connected by 15 directed `LINKS_TO`\n",
+    "relationships. To create the graph, run the following Cypher query in your\n",
+    "Memgraph instance:\n",
+    "\n",
+    "`CREATE (n1:Node {id: 1, weight: 1.2}), (n2:Node {id: 2, weight: 2.5}), (n3:Node\n",
+    "{id: 3, weight: 0.8}), (n4:Node {id: 4, weight: 1.7}), (n5:Node {id: 5, weight:\n",
+    "3.0}), (n6:Node {id: 6, weight: 2.2}), (n7:Node {id: 7, weight: 1.0}), (n8:Node\n",
+    "{id: 8, weight: 2.8}), (n9:Node {id: 9, weight: 1.5}), (n10:Node {id: 10,\n",
+    "weight: 2.0}), (n1)-[:LINKS_TO]->(n2), (n1)-[:LINKS_TO]->(n3),\n",
+    "(n2)-[:LINKS_TO]->(n4), (n3)-[:LINKS_TO]->(n4), (n4)-[:LINKS_TO]->(n5),\n",
+    "(n5)-[:LINKS_TO]->(n6), (n6)-[:LINKS_TO]->(n7), (n7)-[:LINKS_TO]->(n8),\n",
+    "(n8)-[:LINKS_TO]->(n9), (n9)-[:LINKS_TO]->(n10), (n10)-[:LINKS_TO]->(n1),\n",
+    "(n3)-[:LINKS_TO]->(n6), (n4)-[:LINKS_TO]->(n9), (n7)-[:LINKS_TO]->(n2),\n",
+    "(n8)-[:LINKS_TO]->(n5);`\n",
+    "\n",
+    "### Memgraph graph store\n",
+    "\n",
+    "We'll now establish a connection to **Memgraph**, using\n",
+    "`MemgraphPropertyGraphStore` from LlamaIndex. This allows us to store and\n",
+    "retrieve structured data efficiently, enabling **graph-based querying** for\n",
+    "retrieval-augmented generation (RAG) pipelines."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore\n",
+    "\n",
+    "graph_store = MemgraphPropertyGraphStore(\n",
+    "    username=\"\",  # Your Memgraph username, default is \"\"\n",
+    "    password=\"\",  # Your Memgraph password, default is \"\"\n",
+    "    url=\"bolt://localhost:7687\"  # Connection URL for Memgraph\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Creating and running the workflow\n",
+    "\n",
+    "Finally, let's create an **AgentWorkflow** that ties together the previously\n",
+    "defined agents, including the **calculator** and **retriever** agents. The\n",
+    "workflow runs the PageRank algorithm, retrieves nodes, and sums their weight\n",
+    "properties using the addition tool.\n",
+    "\n",
+    "We define an **async function** to execute the workflow. It sends a user query\n",
+    "that asks the agents to run the PageRank algorithm and then use the addition\n",
+    "tool to sum the weight properties of the returned nodes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core.agent.workflow import AgentWorkflow\n",
+    "\n",
+    "# Create the workflow. The root agent (\"retriever\") receives the user message\n",
+    "# first and can hand off to the calculator agent.\n",
+    "workflow = AgentWorkflow(\n",
+    "    agents=[calculator_agent, retriever_agent], root_agent=\"retriever\"\n",
+    ")\n",
+    "\n",
+    "# Define an async function to run the workflow\n",
+    "async def run_workflow():\n",
+    "    # Send the query and wait for the final response\n",
+    "    response = await workflow.run(\n",
+    "        user_msg=\"Run PageRank algorithm and using addition tool, add all of the weight properties of returned nodes.\"\n",
+    "    )\n",
+    "    print(response)\n",
+    "\n",
+    "# Jupyter already runs an event loop, so calling asyncio.run() here raises\n",
+    "# a RuntimeError. Await the coroutine directly instead. In a standalone\n",
+    "# script, use asyncio.run(run_workflow()).\n",
+    "await run_workflow()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "This notebook provides a simple example of how to create and use **Memgraph procedures as tools** when implementing an **Agentic RAG system** with LlamaIndex. By integrating graph algorithms like **PageRank** into agents, we enable more powerful and context-aware data retrieval and computation.\n",
+    "\n",
+    "This is just the beginning. Memgraph supports a wide range of graph algorithms and procedures that can be leveraged in multi-agent workflows. You can explore more built-in algorithms and create custom ones using [MAGE (Memgraph Advanced Graph Extensions)](https://memgraph.com/docs/advanced-algorithms/available-algorithms) to further enhance your system's capabilities. The possibilities are endless!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.10.16 ('llama_examples')",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.16"
+  },
+  "orig_nbformat": 4,
+  "vscode": {
+   "interpreter": {
+    "hash": "42d147008be9a222f6757cc3d1527f7d3e48d8ff31a8ceb9f319427f25b07d46"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}