
Getting warnings while doing data ingestion #220

Closed
Anindyadeep opened this issue Nov 27, 2024 · 1 comment
@Anindyadeep
First off, this is an amazing library; hats off for such a great community effort. While using graphiti, I encountered a lot of warnings during data ingestion.

Here is the code that I used for data ingestion:

import os
from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from datetime import datetime
from graphiti_core.llm_client import openai_client
from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig
from graphiti_core.llm_client import LLMConfig

import nest_asyncio
nest_asyncio.apply()

# I am using Neo4J AuraDB
username="neo4j"
password="xxxxxx"
url="neo4j+s://xxxxx.neo4j.io"

config = LLMConfig(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model="gpt-4o-mini"
)
client = openai_client.OpenAIClient(config=config)
embedding_config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-3-small",
    api_key=os.environ.get("OPENAI_API_KEY"),
    embedding_dim=1024
)
embedder = OpenAIEmbedder(config=embedding_config)

graphiti = Graphiti(
    uri=url,
    user=username,
    password=password,
    llm_client=client,
    embedder=embedder
)

await graphiti.build_indices_and_constraints() 


from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(input_dir="./documents").load_data()
splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=0
)
nodes = splitter.get_nodes_from_documents(documents=documents)
nodes = nodes[:30]

for node in nodes:
    id_ = node.id_
    filename = node.metadata.get("file_name")
    text = node.text
    
    # Now make episodes
    await graphiti.add_episode(
        name=id_,
        episode_body=text,
        source=EpisodeType.text,
        source_description=filename,
        reference_time=datetime.now()
    )

When I ran this, I encountered two types of errors:

First error (I received many warnings like this one, one after another):

Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownPropertyKeyWarning} {category: UNRECOGNIZED} {title: The provided property key is not in the database} {description: One of the property names in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing property name is: content)} {position: line: 4, column: 18, offset: 143} for query: '\n        MATCH (e:Episodic) WHERE e.valid_at <= $reference_time \n        AND ($group_ids IS NULL) OR e.group_id in $group_ids\n        RETURN e.content AS content,\n            e.created_at AS created_at,\n            e.valid_at AS valid_at,\n            e.uuid AS uuid,\n            e.group_id AS group_id,\n            e.name AS name,\n            e.source_description AS source_description,\n            e.source AS source\n        ORDER BY e.created_at DESC\n        LIMIT $num_episodes\n        '
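This `UnknownPropertyKeyWarning` is typically harmless on a fresh database: the retrieval query reads `e.content`, but Neo4j only registers a property key once at least one node has been written with it, so the server flags it as possibly misspelled until the first episode lands. If the log noise is a problem while ingesting, one option is to raise the logging threshold for the driver's notification messages. A minimal sketch, assuming the neo4j Python driver emits these through a logger named `neo4j.notifications` (raising the level on the parent `neo4j` logger would also work):

```python
import logging

# Assumption: the neo4j Python driver logs server notifications such as
# "Received notification from DBMS server: {severity: WARNING} ..." under
# the "neo4j.notifications" logger. Raising its level to ERROR hides
# WARNING-severity notifications without touching driver error logging.
logging.getLogger("neo4j.notifications").setLevel(logging.ERROR)
```

This only silences the messages; it does not change query behavior, and the warnings stop on their own once episodes with a `content` property exist in the database.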

and finally got this:

[#DC36]  _: <CONNECTION> error: Failed to read from defunct connection IPv4Address(('3c4c6c3f.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('54.216.115.14', 7687))): ConnectionResetError(104, 'Connection reset by peer')

Also, ingesting just 30 chunks took around 20 minutes. Let me know if I am missing anything here, or if there is any way I can improve this and avoid these errors. Thanks!

@prasmussen15
Collaborator

These issues should now be solved. The connection error is likely caused by a stale connection in the Neo4j client. The pool's connection lifetime can be set to a lower value (for example, 200 seconds) to prevent this issue.
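The suggestion above maps onto the Neo4j Python driver's `max_connection_lifetime` setting, which retires pooled connections after the given number of seconds so the pool never reuses a connection the server (e.g. AuraDB) has already dropped as idle. Whether Graphiti's constructor exposes this knob is not shown in this thread, so this sketch just builds the driver keyword arguments separately; the helper name and the 200-second value are illustrative:

```python
def neo4j_driver_kwargs(user: str, password: str,
                        max_lifetime_s: int = 200) -> dict:
    """Build keyword arguments for neo4j.GraphDatabase.driver().

    max_connection_lifetime (seconds) makes the pool close connections
    older than this instead of reusing them, avoiding the "defunct
    connection" / ConnectionResetError(104) seen in the report above.
    """
    return {
        "auth": (user, password),
        "max_connection_lifetime": max_lifetime_s,
    }

# Usage (requires the `neo4j` package and a reachable database):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("neo4j+s://xxxxx.neo4j.io",
#                               **neo4j_driver_kwargs("neo4j", "xxxxxx"))
```

The lifetime should be set below the server side's idle timeout; 200 seconds is just the maintainer's example value, not a documented constant.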
