
Data insertion #128

Open
alejandrods opened this issue Sep 19, 2024 · 3 comments

@alejandrods

alejandrods commented Sep 19, 2024

Hello,

I am playing with the runner notebook and just found that these lines take quite a long time:

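import json
from datetime import datetime
from pathlib import Path

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode


async def ingest_products_data(client: Graphiti):
    # Load the product catalog from ../data, relative to the notebook's working directory
    script_dir = Path.cwd().parent
    json_file_path = script_dir / 'data' / 'manybirds_products.json'

    with open(json_file_path) as file:
        products = json.load(file)['products']

    # One episode per product; the 'images' field is dropped from the content.
    # Note: str() yields a Python dict repr, not strict JSON.
    episodes: list[RawEpisode] = [
        RawEpisode(
            name=product.get('title', f'Product {i}'),
            content=str({k: v for k, v in product.items() if k != 'images'}),
            source_description='ManyBirds products',
            source=EpisodeType.json,
            reference_time=datetime.now(),
        )
        for i, product in enumerate(products)
    ]

    # Ingest all episodes in a single bulk call
    await client.add_episode_bulk(episodes)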

Also, after some time it gets stuck:
[Screenshot 2024-09-19 at 16:34:08]

I understand that under the hood it is embedding each category of each product, right? Could you share more about what this is doing?

Thank you!

@prasmussen15
Collaborator

Yes! We've found that large JSON files do take a long time to ingest, but everything should still complete within a few minutes. Under the hood, we use LLMs to extract entities and relations from each episode, then deduplicate them against the entities and relations already in the graph. We also try to extract the time period during which each relation holds, and whether new relations invalidate old ones.
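The pipeline described above (extract, deduplicate, then check for invalidation) can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not Graphiti's implementation: the LLM extraction call is replaced by a hypothetical `fake_extract` that parses `A | REL | B` lines, deduplication is plain name normalization where Graphiti uses LLM calls, and the invalidation rule (same source and relation, different target) is an assumption made for illustration. `ToyGraph`, `Edge`, and `ingest` are likewise hypothetical names. In the real system extraction and deduplication both involve LLM calls for every episode, which helps explain why a large product catalog takes a while.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    """A relation between two entities, valid from valid_at until invalid_at (None = still valid)."""
    source: str
    relation: str
    target: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None


def normalize(name: str) -> str:
    """Toy entity resolution: match names case- and whitespace-insensitively."""
    return " ".join(name.lower().split())


class ToyGraph:
    """Hypothetical in-memory graph; not part of Graphiti."""

    def __init__(self) -> None:
        self.entities: dict[str, str] = {}  # normalized name -> first-seen (canonical) name
        self.edges: list[Edge] = []

    def resolve(self, name: str) -> str:
        """Deduplicate an extracted entity against the ones already in the graph."""
        return self.entities.setdefault(normalize(name), name)

    def add_relation(self, source: str, relation: str, target: str, valid_at: datetime) -> Edge:
        src, tgt = self.resolve(source), self.resolve(target)
        # Assumed invalidation rule: a newer fact with the same source and relation
        # but a different target ends any older fact that is still valid.
        for old in self.edges:
            if (old.source == src and old.relation == relation and old.target != tgt
                    and old.invalid_at is None and old.valid_at <= valid_at):
                old.invalid_at = valid_at
        # Deduplicate: reuse an identical fact that is still valid.
        for old in self.edges:
            if (old.source, old.relation, old.target) == (src, relation, tgt) and old.invalid_at is None:
                return old
        edge = Edge(src, relation, tgt, valid_at)
        self.edges.append(edge)
        return edge


def fake_extract(text: str) -> list[tuple[str, str, str]]:
    """Stand-in for the LLM extraction call: parses one 'A | REL | B' triple per line."""
    triples = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def ingest(graph: ToyGraph, episode_text: str, reference_time: datetime) -> None:
    """Extract, deduplicate, and (possibly) invalidate, for one episode."""
    for source, relation, target in fake_extract(episode_text):
        graph.add_relation(source, relation, target, reference_time)
```

For example, ingesting `"Wool Runner | MADE_OF | merino wool"` at one time and `"wool runner | MADE_OF | recycled wool"` later resolves both mentions to the same `Wool Runner` entity and marks the first edge invalid as of the second episode's time.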

I go into a bit more depth on some of our LLM calls in this blog post: https://blog.getzep.com/llm-rag-knowledge-graphs-faster-and-more-dynamic/

prasmussen15 self-assigned this Sep 19, 2024
@alejandrods
Author

Thank you for the clarification!

I also received this error when running the "agent" example notebook:

# Test the tool node
await tool_node.ainvoke({'messages': [await llm.ainvoke('What are the different types of shoes')]})
[Screenshot 2024-09-20 at 10:56:19]

@prasmussen15
Collaborator

Make sure you are running Neo4j 5.21 or later, since that is our minimum supported version.
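A small sketch of checking that requirement before running the examples, assuming you already have the server's version string (for example from the Cypher procedure `CALL dbms.components() YIELD name, versions`). `parse_version` and `meets_minimum` are hypothetical helpers, not part of Graphiti or the Neo4j driver:

```python
import re

# Minimum Neo4j version stated in this thread
MIN_NEO4J_VERSION = (5, 21)


def parse_version(version: str) -> tuple[int, ...]:
    """Hypothetical helper: keep the leading numeric components, e.g. '5.21.0-aura' -> (5, 21, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_NEO4J_VERSION) -> bool:
    """Hypothetical helper: True if version >= minimum, compared as integer tuples.

    Tuple comparison treats (5, 21, 0) as greater than (5, 21), so patch
    releases of the minimum version pass.
    """
    return parse_version(version) >= minimum
```

Comparing integer tuples rather than raw strings matters here: as plain strings, `'5.9.0'` sorts after `'5.21'`, which would wrongly pass the check.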

Some history: Neo4j used to have a different syntax for finding shortest paths between nodes (the shortestPath() function), but ISO released GQL, a standard query language for graphs, in April 2024. Neo4j added the SHORTEST keyword as part of its work toward GQL compliance, which is why older versions fail on queries that use it.
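For context on what such a query returns: on an unweighted graph, a shortest path between two nodes is one with the fewest hops, which is exactly what breadth-first search finds. The sketch below illustrates that concept on an in-memory adjacency map; it is not how Neo4j evaluates SHORTEST or shortestPath(), and `shortest_path` is a hypothetical helper:

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional


def shortest_path(
    graph: Dict[Hashable, Iterable[Hashable]], start: Hashable, goal: Hashable
) -> Optional[List[Hashable]]:
    """Breadth-first search over an adjacency map (list both directions for undirected edges).

    Returns one path with the fewest hops from start to goal, or None if goal is unreachable.
    """
    if start == goal:
        return [start]
    parents: Dict[Hashable, Optional[Hashable]] = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            # BFS discovers nodes in order of hop count, so the first hit is a shortest path.
            if neighbor == goal:
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```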
