Adds a knowledge graph writer #83

alexthomas93 · 2024-07-22T15:21:09Z

Description

Adds an abstract class, KGWriter, which allows a knowledge graph to be written to an arbitrary data store.
Adds a Neo4jWriter class which allows a knowledge graph to be written to a Neo4j database.
Adds an upsert_vector_on_relationship function, which allows vectors to be added to Neo4j relationships.
Adds Pydantic models for a Neo4j graph.
Adds an end-to-end test for the Neo4jWriter class.
Fixes some broken copyright headers.
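To make the intent of the KGWriter abstraction concrete, here is a minimal sketch rather than the code in this PR. It assumes plain dataclasses as stand-ins for the Pydantic graph models, a synchronous run method, and a hypothetical InMemoryWriter playing the role that Neo4jWriter plays for a real database. Field names such as start_node_id and end_node_id are assumptions.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for the PR's Pydantic graph models.
@dataclass
class Neo4jNode:
    id: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Neo4jRelationship:
    start_node_id: str
    end_node_id: str
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Neo4jGraph:
    nodes: List[Neo4jNode] = field(default_factory=list)
    relationships: List[Neo4jRelationship] = field(default_factory=list)


class KGWriter(ABC):
    """Writes a knowledge graph to an arbitrary data store."""

    @abstractmethod
    def run(self, graph: Neo4jGraph) -> None:
        """Persist every node and relationship in the graph."""


class InMemoryWriter(KGWriter):
    """Hypothetical writer that keeps the graph in plain Python containers."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Neo4jNode] = {}
        self.relationships: List[Neo4jRelationship] = []

    def run(self, graph: Neo4jGraph) -> None:
        # Upsert nodes first so that relationships can refer to them.
        for node in graph.nodes:
            self.nodes[node.id] = node
        for rel in graph.relationships:
            if rel.start_node_id not in self.nodes or rel.end_node_id not in self.nodes:
                raise ValueError(f"{rel.type} refers to a node that was not written")
            self.relationships.append(rel)


# Usage: any KGWriter subclass can be swapped in here.
writer = InMemoryWriter()
writer.run(
    Neo4jGraph(
        nodes=[Neo4jNode("1", "Person", {"name": "Alice"}), Neo4jNode("2", "City", {"name": "London"})],
        relationships=[Neo4jRelationship("1", "2", "LIVES_IN")],
    )
)
```

Because KGWriter only fixes the run contract, the code that builds the graph does not need to know which store it ends up in.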

Type of Change

Complexity

Complexity: Medium

How Has This Been Tested?

Unit tests
E2E tests
Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

Documentation has been updated
Unit tests have been updated
E2E tests have been updated
Examples have been updated
New files have copyright header
CLA (https://neo4j.com/developer/cla/) has been signed
CHANGELOG.md updated if appropriate

src/neo4j_genai/indexes.py

willtai · 2024-07-25T10:14:49Z


-        RETURN n
-        """
+        query = (
+            "MATCH (n)"


Why change the multiline string here?

If you use """ to create multi-line strings, any indentation in your code is included in the string. This means that when you write your tests, you need to make sure the indentation of the multi-line strings in the tests matches the indentation in the code under test, which can make those strings look a bit odd. If you instead build the string from adjacent string literals inside parentheses, as in the diff above, you don't have to worry about it.
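A small illustration of the point above. The query text is shortened and the function names are made up for the example: the triple-quoted literal keeps the source indentation and the surrounding newlines, while adjacent literals inside parentheses are concatenated with no indentation at all.

```python
def query_triple_quoted() -> str:
    # Everything between the quotes, including the 8-space indentation, is kept.
    query = """
        MATCH (n)
        RETURN n
        """
    return query


def query_parenthesized() -> str:
    # Adjacent string literals are joined at compile time; the indentation of the
    # source lines sits outside the quotes, so it never reaches the string.
    query = (
        "MATCH (n) "
        "RETURN n"
    )
    return query


print(repr(query_triple_quoted()))
print(repr(query_parenthesized()))
```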

Have you tested these queries? I think with this approach you may need to add a space to the end of each line.
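The concern is easy to check in plain Python. In this sketch the WHERE clause and the $id parameter are illustrative, not taken from the PR. Leaving out the trailing spaces glues the clauses together, and "$idRETURN" would then most likely be read by Neo4j as a single parameter name, so the query would fail.

```python
broken = (
    "MATCH (n)"
    "WHERE elementId(n) = $id"
    "RETURN n"
)

fixed = (
    "MATCH (n) "
    "WHERE elementId(n) = $id "
    "RETURN n"
)

print(broken)  # MATCH (n)WHERE elementId(n) = $idRETURN n
print(fixed)   # MATCH (n) WHERE elementId(n) = $id RETURN n
```

Ending each literal with "\n" instead of a space works just as well and keeps the query readable when it is logged.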

Regardless, I think using """ is more readable. We had an issue previously where, with the proposed approach, someone forgot to add a space at the end of a line because it was hard to spot. I propose switching back to """ and, for the tests, using something like https://docs.python.org/3/library/textwrap.html. What do you think?
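A minimal sketch of the textwrap suggestion, assuming textwrap.dedent is the function meant; build_query is a hypothetical name. For the two strings to compare equal, the query side has to be dedented as well as the test side.

```python
import textwrap


def build_query() -> str:
    # The backslash after the opening quotes suppresses the leading newline;
    # dedent then removes the common 8-space margin.
    return textwrap.dedent(
        """\
        MATCH (n)
        WHERE elementId(n) = $id
        RETURN n
        """
    )


def test_build_query() -> None:
    # The test literal is indented further, but dedent normalizes it the same way.
    expected = textwrap.dedent(
        """\
            MATCH (n)
            WHERE elementId(n) = $id
            RETURN n
            """
    )
    assert build_query() == expected


test_build_query()
```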

I'm not sure how that would work for tests such as test_upsert_node_with_embedding. You'd need to call textwrap in both upsert_vector and the test to make sure the strings match. Maybe the way around this is to move the queries to the neo4j_queries.py file and import them into both the tests and the upsert functions.
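A sketch of the "move the queries to neo4j_queries.py" idea, with everything inlined so it runs on its own. The two constants stand in for that module. The Cypher (including the db.create.setNodeVectorProperty and db.create.setRelationshipVectorProperty calls) and the function signatures are assumptions rather than the code in this PR. RecordingDriver is a hypothetical test double that mimics the execute_query(query, parameters) call shape of the Neo4j Python driver.

```python
from typing import Any, Dict, List, Optional, Tuple

# --- would live in neo4j_queries.py (query text is illustrative) ---
UPSERT_VECTOR_QUERY = (
    "MATCH (n) "
    "WHERE elementId(n) = $id "
    "WITH n "
    "CALL db.create.setNodeVectorProperty(n, $embedding_property, $vector) "
    "RETURN n"
)
UPSERT_VECTOR_ON_RELATIONSHIP_QUERY = (
    "MATCH ()-[r]->() "
    "WHERE elementId(r) = $id "
    "WITH r "
    "CALL db.create.setRelationshipVectorProperty(r, $embedding_property, $vector) "
    "RETURN r"
)


# --- would live in indexes.py and import the constants above ---
def upsert_vector(driver: Any, node_id: str, embedding_property: str, vector: List[float]) -> None:
    parameters = {"id": node_id, "embedding_property": embedding_property, "vector": vector}
    driver.execute_query(UPSERT_VECTOR_QUERY, parameters)


def upsert_vector_on_relationship(
    driver: Any, rel_id: str, embedding_property: str, vector: List[float]
) -> None:
    parameters = {"id": rel_id, "embedding_property": embedding_property, "vector": vector}
    driver.execute_query(UPSERT_VECTOR_ON_RELATIONSHIP_QUERY, parameters)


# --- test side: imports the same constants, so formatting cannot drift ---
class RecordingDriver:
    """Hypothetical test double that records queries instead of running them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[Dict[str, Any]]]] = []

    def execute_query(self, query: str, parameters: Optional[Dict[str, Any]] = None) -> None:
        self.calls.append((query, parameters))


def test_upsert_node_with_embedding() -> None:
    driver = RecordingDriver()
    upsert_vector(driver, "node-1", "embedding", [0.1, 0.2])
    assert driver.calls == [
        (UPSERT_VECTOR_QUERY, {"id": "node-1", "embedding_property": "embedding", "vector": [0.1, 0.2]})
    ]


test_upsert_node_with_embedding()
```

Because both sides reference the same constant, the test no longer depends on whether the query is written with """ or with parenthesized literals.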

I think if it works, there's no need to overcomplicate it for now. I'd be happy with the original approach, provided the query runs.

src/neo4j_genai/kg_construction/kg_writer.py

src/neo4j_genai/kg_construction/types.py

src/neo4j_genai/kg_construction/kg_writer.py

tests/e2e/test_kg_construction_e2e.py

…f type hints

* Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... 
* Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. * Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Adds a Text Splitter (#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Added a TextChunkEmbedder (#87) * Added a TextChunkEmbedder * Added the copyright header to test_embedder.py * Updated test_text_chunk_embedder_run
* Adds a knowledge graph writer (#83) * Added copyright header to new files * Added copyright header to kg_writer.py * Added __future__ import to kg_writer.py for backwards compatibility of type hints * Added E2E test for Neo4jWriter * Added a copyright header to test_kg_builder_e2e.py * Added upsert_vector test for relationship embeddings * Moved KG writer and its tests * Moved Neo4jGraph and associated objects to a new file * Renamed KG builder fixture * Added unit tests for KG writer * Split upsert_vector into 2 functions * Fixed broken cypher query strings * Removed embedding creation from Neo4jWriter * Fixed setup_neo4j_for_kg_construction fixture * Added KGWriterModel class * Fixed minor mistake in test_weaviate_e2e.py * Renamed kg_construction folder to components * Updated unit tests with new folder structure * Fixed broken import * Fixed copyright headers * Added missing docstrings * Fixed typo * Add documentation for pipeline exceptions (#90) * Fixes and refactors the KG writer component (#92) * Fixes and refactors the KG writer component * Fixed mypy error * Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY * Add schema for kg builder (#88) * Add schema for kg builder and tests * Fixed mypy checks * Reverted kg builder example with schema * Revert to List and Dict due to Python3.8 issue with using get_type_hints * Added properties to Entity and Relation * Add test for missing properties * Fix type annotations in test * Add property types * Refactored entity, relation, and property types * Unused import * Moved schema to components/ (#96) * Add entity / Relation extraction component (#85) * Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline.
* More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Entity / Relation extraction component * Adds a Text Splitter (#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Add tests * Keep it simple: remove deps to jinja for now * Update example with existing components * log config in example * Fix tests * Rm unused import * Add copyright headers * Rm debug code * Try and fix tests * Unused import * get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions * Return model is also conditioned to the existence of the run method => should raise an error if run is not implemented?
* Log when we do not raise exception to keep track of the failure * Update prompt to match new KGwriter expected type * Fix test * Fix type for `examples` * Use SchemaConfig as input for the ER Extractor component * The "base" EntityRelationExtractor is an ABC that must be subclassed * Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp * Option to build lexical graph in the ERExtractor component * Fix one test * Fix some more tests * Fix some more tests * Remove "type: ignore" comments --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> * Update lock file after merge * Remove pipeline/components folder (again) * Updated component docs (#99) * Updated component docs * Removed weaviate test update * Updated pipeline user guide with link to components in the API section * Feature/kg builder e2e tests (#98) * End to end tests for KG builder pipeline * Adding chunk embedder to the pipeline and e2e tests * Fix how the chunk embedding is saved * Fix e2e tests * Fix mypy * mypy stuff :'( * WIP: update e2e tests * Check counts also here * Enable e2e tests on this PR only * Fix e2e tests (was not mocking the correct method for Embedder) * Revert CI to normal * Updated CHANGLOG and set max-parallel: 1 for E2E tests in pr-e2e-tests.yaml --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> Co-authored-by: willtai <william.tai@neo4j.com>

* Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... 
* Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. * Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Adds a Text Splitter (#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Added a TextChunkEmbedder (#87) * Added a TextChunkEmbedder * Added the copyright header to test_embedder.py * Updated test_text_chunk_embedder_run
* Adds a knowledge graph writer (#83) * Added copyright header to new files * Added copyright header to kg_writer.py * Added __future__ import to kg_writer.py for backwards compatibility of type hints * Added E2E test for Neo4jWriter * Added a copyright header to test_kg_builder_e2e.py * Added upsert_vector test for relationship embeddings * Moved KG writer and its tests * Moved Neo4jGraph and associated objects to a new file * Renamed KG builder fixture * Added unit tests for KG writer * Split upsert_vector into 2 functions * Fixed broken cypher query strings * Removed embedding creation from Neo4jWriter * Fixed setup_neo4j_for_kg_construction fixture * Added KGWriterModel class * Fixed minor mistake in test_weaviate_e2e.py * Renamed kg_construction folder to components * Updated unit tests with new folder structure * Fixed broken import * Fixed copyright headers * Added missing docstrings * Fixed typo * Add documentation for pipeline exceptions (#90) * Start documentation for KG construction pipeline * Fixes and refactors the KG writer component (#92) * Fixes and refactors the KG writer component * Fixed mypy error * Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY * Add schema for kg builder (#88) * Add schema for kg builder and tests * Fixed mypy checks * Reverted kg builder example with schema * Revert to List and Dict due to Python3.8 issue with using get_type_hints * Added properties to Entity and Relation * Add test for missing properties * Fix type annotations in test * Add property types * Refactored entity, relation, and property types * Unused import * Moved schema to components/ (#96) * Add entity / Relation extraction component (#85) * Pipeline (#81) * First draft of pipeline/component architecture. Example using the RAG pipeline.
* More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Entity / Relation extraction component * Adds a Text Splitter (#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Add tests * Keep it simple: remove deps to jinja for now * Update example with existing components * log config in example * Fix tests * Rm unused import * Add copyright headers * Rm debug code * Try and fix tests * Unused import * get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions * Return model is also conditioned to the existence of the run method => should raise an error if run is not implemented?
* Log when we do not raise exception to keep track of the failure * Update prompt to match new KGwriter expected type * Fix test * Fix type for `examples` * Use SchemaConfig as input for the ER Extractor component * The "base" EntityRelationExtractor is an ABC that must be subclassed * Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp * Option to build lexical graph in the ERExtractor component * Fix one test * Fix some more tests * Fix some more tests * Remove "type: ignore" comments --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> * Update lock file after merge * Remove pipeline/components folder (again) * Updated component docs (#99) * Updated component docs * Removed weaviate test update * Updated pipeline user guide with link to components in the API section * Feature/kg builder e2e tests (#98) * End to end tests for KG builder pipeline * Adding chunk embedder to the pipeline and e2e tests * Fix how the chunk embedding is saved * Fix e2e tests * Fix mypy * mypy stuff :'( * WIP: update e2e tests * Check counts also here * Enable e2e tests on this PR only * Fix e2e tests (was not mocking the correct method for Embedder) * Revert CI to normal * User guide for KG builder pipeline * Update line length * Review comments 1 * Address review comments - add missing file (image) * Nicer lists --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> Co-authored-by: willtai <william.tai@neo4j.com>

run is not implemented? * Log when we do not raise exception to keep track of the failure * Update prompt to match new KGWriter expected type * Fix test * Fix type for `examples` * Use SchemaConfig as input for the ER Extractor component * The "base" EntityRelationExtractor is an ABC that must be subclassed * Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp * Option to build lexical graph in the ERExtractor component * Fix one test * Fix some more tests * Fix some more tests * Remove "type: ignore" comments --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> * Update lock file after merge * Remove pipeline/components folder (again) * Updated component docs (neo4j#99) * Updated component docs * Removed weaviate test update * Updated pipeline user guide with link to components in the API section * Feature/kg builder e2e tests (neo4j#98) * End to end tests for KG builder pipeline * Adding chunk embedder to the pipeline and e2e tests * Fix how the chunk embedding is saved * Fix e2e tests * Fix mypy * mypy stuff :'( * WIP: update e2e tests * Check counts also here * Enable e2e tests on this PR only * Fix e2e tests (was not mocking the correct method for Embedder) * Revert CI to normal * Updated CHANGELOG and set max-parallel: 1 for E2E tests in pr-e2e-tests.yaml --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> Co-authored-by: willtai <william.tai@neo4j.com>
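The "get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations" note can be reproduced with the standard library alone. A minimal sketch: the string annotations below stand in for what `from __future__ import annotations` does (every annotation is stored as a string and nothing is evaluated at class-definition time), and the three model classes are hypothetical. `typing.get_type_hints` evaluates those strings on the running interpreter, so builtin generics such as `dict[str, int]` fail on 3.8 and `int | None` unions fail on 3.9, while `typing.Dict`/`typing.Optional` resolve on every version, which is consistent with the revert to `List` and `Dict` recorded in these commits.

```python
import sys
from typing import Dict, Optional, get_type_hints


# Hypothetical models. String annotations mimic `from __future__ import annotations`:
# they are only evaluated later, when get_type_hints() is called.
class LegacyModel:
    params: "Dict[str, Optional[int]]"  # typing.Dict / Optional: resolves on 3.8+


class BuiltinGenericModel:
    params: "dict[str, int]"  # builtin generics (PEP 585): need Python 3.9+


class UnionSyntaxModel:
    params: "int | None"  # X | Y unions (PEP 604): need Python 3.10+


def resolves(cls: type) -> bool:
    """Return True if get_type_hints() can evaluate every annotation on cls."""
    try:
        # globalns is passed explicitly so Dict/Optional resolve from this module.
        get_type_hints(cls, globalns=globals())
        return True
    except TypeError:
        # 3.8: "'type' object is not subscriptable"
        # 3.9: "unsupported operand type(s) for |"
        return False


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    for model in (LegacyModel, BuiltinGenericModel, UnionSyntaxModel):
        print(model.__name__, resolves(model))
```

Note that the failure only surfaces when something (for example a pipeline that inspects a component's `run` signature) actually calls `get_type_hints`, which is why the postponed-annotations import alone does not avoid it.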

* Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... 
* Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. * Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Adds a Text Splitter (neo4j#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Added a TextChunkEmbedder (neo4j#87) * Added a TextChunkEmbedder * Added the 
copyright header to test_embedder.py * Updated test_text_chunk_embedder_run * Adds a knowledge graph writer (neo4j#83) * Added copyright header to new files * Added copyright header to kg_writer.py * Added __future__ import to kg_writer.py for backwards compatibility of type hints * Added E2E test for Neo4jWriter * Added a copyright header to test_kg_builder_e2e.py * Added upsert_vector test for relationship embeddings * Moved KG writer and its tests * Moved Neo4jGraph and associated objects to a new file * Renamed KG builder fixture * Added unit tests for KG writer * Split upsert_vector into 2 functions * Fixed broken cypher query strings * Removed embedding creation from Neo4jWriter * Fixed setup_neo4j_for_kg_construction fixture * Added KGWriterModel class * Fixed minor mistake in test_weaviate_e2e.py * Renamed kg_construction folder to components * Updated unit tests with new folder structure * Fixed broken import * Fixed copyright headers * Added missing docstrings * Fixed typo * Add documentation for pipeline exceptions (neo4j#90) * Start documentation for KG construction pipeline * Fixes and refactors the KG writer component (neo4j#92) * Fixes and refactors the KG writer component * Fixed mypy error * Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY * Add schema for kg builder (neo4j#88) * Add schema for kg builder and tests * Fixed mypy checks * Reverted kg builder example with schema * Revert to List and Dict due to Python3.8 issue with using get_type_hints * Added properties to Entity and Relation * Add test for missing properties * Fix type annotations in test * Add property types * Refactored entity, relation, and property types * Unused import * Moved schema to components/ (neo4j#96) * Add entity / Relation extraction component (neo4j#85) * Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. 
* More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Entity / Relation extraction component * Adds a Text Splitter (neo4j#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Add tests * Keep it simple: remove deps to jinja for now * Update example with existing components * log config in example * Fix tests * Rm unused import * Add copyright headers * Rm debug code * Try and fix tests * Unused import * get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions * Return model is also conditioned to the existence of the run method => should raise an error if 
run is not implemented? * Log when we do not raise exception to keep track of the failure * Update prompt to match new KGWriter expected type * Fix test * Fix type for `examples` * Use SchemaConfig as input for the ER Extractor component * The "base" EntityRelationExtractor is an ABC that must be subclassed * Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp * Option to build lexical graph in the ERExtractor component * Fix one test * Fix some more tests * Fix some more tests * Remove "type: ignore" comments --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> * Update lock file after merge * Remove pipeline/components folder (again) * Updated component docs (neo4j#99) * Updated component docs * Removed weaviate test update * Updated pipeline user guide with link to components in the API section * Feature/kg builder e2e tests (neo4j#98) * End to end tests for KG builder pipeline * Adding chunk embedder to the pipeline and e2e tests * Fix how the chunk embedding is saved * Fix e2e tests * Fix mypy * mypy stuff :'( * WIP: update e2e tests * Check counts also here * Enable e2e tests on this PR only * Fix e2e tests (was not mocking the correct method for Embedder) * Revert CI to normal * User guide for KG builder pipeline * Update line length * Review comments 1 * Address review comments - add missing file (image) * Nicer lists --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> Co-authored-by: willtai <william.tai@neo4j.com>
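The "Pipeline rerun + exception on cyclic graph" and "Document and test is_cyclic method" entries describe refusing to run a pipeline whose component graph contains a cycle. A minimal sketch of such a check, assuming the pipeline is a hypothetical adjacency dict from component name to downstream component names; `is_cyclic`, `validate_pipeline` and `PipelineCycleError` here are illustrative and not the repository's implementation. It runs an iterative depth-first search with three node states: reaching a node that is still on the current path (a back edge) means there is a cycle.

```python
from typing import Dict, Iterator, List, Tuple


class PipelineCycleError(Exception):
    """Hypothetical error type; the real pipeline exception may differ."""


def is_cyclic(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph contains at least one cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / fully explored
    colour: Dict[str, int] = {}
    # Include components that only appear as targets (no outgoing edges).
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    for start in nodes:
        if colour.get(start, WHITE) != WHITE:
            continue
        colour[start] = GREY
        stack: List[Tuple[str, Iterator[str]]] = [(start, iter(graph.get(start, [])))]
        while stack:
            node, children = stack[-1]
            for child in children:
                state = colour.get(child, WHITE)
                if state == GREY:
                    return True  # back edge: child is still on the current path
                if state == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(graph.get(child, []))))
                    break  # descend; this node's iterator resumes later
            else:
                colour[node] = BLACK  # all descendants explored
                stack.pop()
    return False


def validate_pipeline(graph: Dict[str, List[str]]) -> None:
    """Raise before running anything if the component graph has a cycle."""
    if is_cyclic(graph):
        raise PipelineCycleError("Pipeline graph contains a cycle")
```

An explicit stack avoids Python's recursion limit on long pipelines. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) raises `graphlib.CycleError` for the same condition and is a reasonable alternative when a run order is needed anyway.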

* Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples
* Add missing header
* Update doc
* Fix import in doc
* Update changelog
* Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com>
* Refactor tests folder to match src structure
* Move exceptions to separate file and rename them to make it clearer they are related to pipeline
* Mypy
* Rename def => config
* Introduce generic type to remove most of the "type: ignore" comments
* Remove unnecessary comment
* Ruff
* Document and test is_cyclic method
* Remove find_all method from store (simplify data retrieval)
* value is not a list anymore (or, if it is, it's on purpose)
* Remove comments, fix example in doc
* Remove core directory - move files to /pipeline
* Expose stores from pipeline subpackage
* Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input
* Component subclasses can return DataModel
* Add note on async + schema to illustrate parameter propagation

---------

Co-authored-by: willtai <wtaisen@gmail.com>

* Pipeline (neo4j#81)
* First draft of pipeline/component architecture. Example using the RAG pipeline.
* More complex implementation of pipeline to deal with branching and aggregations - no async yet
* Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default
* Test RAG with new Pipeline implementation
* File refactoring
* Pipeline orchestration with async support
* Import sorting
* Pipeline rerun + exception on cyclic graph (for now)
* Mypy
* Python version compat
* Rename process->run for Components for consistency with Pipeline
* Move components test in the example folder - add some tests
* Race condition fix - documentation - ruff
* Fix import sorting
* mypy on tests
* Mark test as async
* Tests were not testing...
* Ability to create Pipeline templates
* Ruff
* Future + header
* Renaming + update import structure to make it more compatible with rest of the repo
* Check input parameters before starting the pipeline
* Introduce output model for component - Validate pipeline before running - More unit tests
* Import..
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks... and struggling with pydantic..
* Mypy on examples
* Add missing header
* Update doc
* Fix import in doc
* Update changelog
* Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com>
* Refactor tests folder to match src structure
* Move exceptions to separate file and rename them to make it clearer they are related to pipeline
* Mypy
* Rename def => config
* Introduce generic type to remove most of the "type: ignore" comments
* Remove unnecessary comment
* Ruff
* Document and test is_cyclic method
* Remove find_all method from store (simplify data retrieval)
* value is not a list anymore (or, if it is, it's on purpose)
* Remove comments, fix example in doc
* Remove core directory - move files to /pipeline
* Expose stores from pipeline subpackage
* Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input
* Component subclasses can return DataModel
* Add note on async + schema to illustrate parameter propagation

---------

Co-authored-by: willtai <wtaisen@gmail.com>

* Adds a Text Splitter (neo4j#82)
* Added text splitter adapter class
* Added copyright header to new files
* Added __future__ import to text_splitters.py for backwards compatibility of type hints
* Moved text splitter file and tests
* Split text splitter adapter into 2 adapters
* Added optional metadata to text chunks
* Fixed typos
* Moved text splitters inside of the components folder
* Fixed Component import
* Added a TextChunkEmbedder (neo4j#87)
* Added a TextChunkEmbedder
* Added the copyright header to test_embedder.py
* Updated test_text_chunk_embedder_run
* Adds a knowledge graph writer (neo4j#83)
* Added copyright header to new files
* Added copyright header to kg_writer.py
* Added __future__ import to kg_writer.py for backwards compatibility of type hints
* Added E2E test for Neo4jWriter
* Added a copyright header to test_kg_builder_e2e.py
* Added upsert_vector test for relationship embeddings
* Moved KG writer and its tests
* Moved Neo4jGraph and associated objects to a new file
* Renamed KG builder fixture
* Added unit tests for KG writer
* Split upsert_vector into 2 functions
* Fixed broken cypher query strings
* Removed embedding creation from Neo4jWriter
* Fixed setup_neo4j_for_kg_construction fixture
* Added KGWriterModel class
* Fixed minor mistake in test_weaviate_e2e.py
* Renamed kg_construction folder to components
* Updated unit tests with new folder structure
* Fixed broken import
* Fixed copyright headers
* Added missing docstrings
* Fixed typo
* Add documentation for pipeline exceptions (neo4j#90)
* Start documentation for KG construction pipeline
* Fixes and refactors the KG writer component (neo4j#92)
* Fixes and refactors the KG writer component
* Fixed mypy error
* Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY
* Add schema for kg builder (neo4j#88)
* Add schema for kg builder and tests
* Fixed mypy checks
* Reverted kg builder example with schema
* Revert to List and Dict due to Python3.8 issue with using get_type_hints
* Added properties to Entity and Relation
* Add test for missing properties
* Fix type annotations in test
* Add property types
* Refactored entity, relation, and property types
* Unused import
* Moved schema to components/ (neo4j#96)
* Add entity / Relation extraction component (neo4j#85)
* Pipeline (neo4j#81)
* First draft of pipeline/component architecture. Example using the RAG pipeline.
* More complex implementation of pipeline to deal with branching and aggregations - no async yet
* Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default
* Test RAG with new Pipeline implementation
* File refactoring
* Pipeline orchestration with async support
* Import sorting
* Pipeline rerun + exception on cyclic graph (for now)
* Mypy
* Python version compat
* Rename process->run for Components for consistency with Pipeline
* Move components test in the example folder - add some tests
* Race condition fix - documentation - ruff
* Fix import sorting
* mypy on tests
* Mark test as async
* Tests were not testing...
* Ability to create Pipeline templates
* Ruff
* Future + header
* Renaming + update import structure to make it more compatible with rest of the repo
* Check input parameters before starting the pipeline
* Introduce output model for component - Validate pipeline before running - More unit tests
* Import..
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks... and struggling with pydantic..
* Mypy on examples
* Add missing header
* Update doc
* Fix import in doc
* Update changelog
* Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com>
* Refactor tests folder to match src structure
* Move exceptions to separate file and rename them to make it clearer they are related to pipeline
* Mypy
* Rename def => config
* Introduce generic type to remove most of the "type: ignore" comments
* Remove unnecessary comment
* Ruff
* Document and test is_cyclic method
* Remove find_all method from store (simplify data retrieval)
* value is not a list anymore (or, if it is, it's on purpose)
* Remove comments, fix example in doc
* Remove core directory - move files to /pipeline
* Expose stores from pipeline subpackage
* Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input
* Component subclasses can return DataModel
* Add note on async + schema to illustrate parameter propagation

---------

Co-authored-by: willtai <wtaisen@gmail.com>

* Entity / Relation extraction component
* Adds a Text Splitter (neo4j#82)
* Added text splitter adapter class
* Added copyright header to new files
* Added __future__ import to text_splitters.py for backwards compatibility of type hints
* Moved text splitter file and tests
* Split text splitter adapter into 2 adapters
* Added optional metadata to text chunks
* Fixed typos
* Moved text splitters inside of the components folder
* Fixed Component import
* Add tests
* Keep it simple: remove deps to jinja for now
* Update example with existing components
* log config in example
* Fix tests
* Rm unused import
* Add copyright headers
* Rm debug code
* Try and fix tests
* Unused import
* get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions
* Return model is also conditioned to the existence of the run method => should raise an error if run is not implemented?
* Log when we do not raise exception to keep track of the failure
* Update prompt to match new KGwriter expected type
* Fix test
* Fix type for `examples`
* Use SchemaConfig as input for the ER Extractor component
* The "base" EntityRelationExtractor is an ABC that must be subclassed
* Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp
* Option to build lexical graph in the ERExtractor component
* Fix one test
* Fix some more tests
* Fix some more tests
* Remove "type: ignore" comments

---------

Co-authored-by: willtai <wtaisen@gmail.com>
Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com>

* Update lock file after merge
* Remove pipeline/components folder (again)
* Updated component docs (neo4j#99)
* Updated component docs
* Removed weaviate test update
* Updated pipeline user guide with link to components in the API section
* Feature/kg builder e2e tests (neo4j#98)
* End to end tests for KG builder pipeline
* Adding chunk embedder to the pipeline and e2e tests
* Fix how the chunk embedding is saved
* Fix e2e tests
* Fix mypy
* mypy stuff :'(
* WIP: update e2e tests
* Check counts also here
* Enable e2e tests on this PR only
* Fix e2e tests (was not mocking the correct method for Embedder)
* Revert CI to normal
* User guide for KG builder pipeline
* Update line length
* Review comments 1
* Address review comments - add missing file (image)
* Nicer lists

---------

Co-authored-by: willtai <wtaisen@gmail.com>
Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com>
Co-authored-by: willtai <william.tai@neo4j.com>
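One commit note above says `get_type_hints` failed on Python 3.8/3.9 even with `from __future__ import annotations`, so the code went back to `typing.Dict`. The note does not say which annotation caused the failure. The sketch below shows only the general mechanism. Postponed annotations are stored as strings and are evaluated only when `get_type_hints` runs, so syntax the running interpreter does not support still raises at that point. PEP 585 builtin generics such as `dict[str, str]` need 3.9, and PEP 604 unions such as `str | None` need 3.10. The class names and the `resolves` helper are hypothetical, and the annotations are written as explicit strings to stand in for what the `__future__` import produces.

```python
import sys
from typing import Dict, get_type_hints


# String annotations mimic what `from __future__ import annotations` stores:
# nothing is evaluated until get_type_hints() is called.
class TypingDictModel:
    params: "Dict[str, str]"  # typing.Dict resolves on every supported version


class BuiltinDictModel:
    params: "dict[str, str]"  # PEP 585 builtin generic, needs Python 3.9+


class UnionModel:
    name: "str | None"  # PEP 604 union, needs Python 3.10+


def resolves(cls: type) -> bool:
    """Return True if get_type_hints() can evaluate every annotation on cls."""
    try:
        # Pass the module globals explicitly so that `Dict` is found by name.
        get_type_hints(cls, globalns=globals())
        return True
    except TypeError:
        # e.g. "'type' object is not subscriptable" on 3.8, or
        # "unsupported operand type(s) for |" before 3.10
        return False


results = {
    cls.__name__: resolves(cls)
    for cls in (TypingDictModel, BuiltinDictModel, UnionModel)
}
print(sys.version_info[:2], results)
```

With postponed evaluation, annotation syntax is only checked by the interpreter that evaluates it. For that reason `typing.Dict` and `typing.Optional` remain the portable choice while Python 3.8 is supported.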

willtai reviewed Jul 23, 2024

src/neo4j_genai/indexes.py (outdated, resolved)

willtai reviewed Jul 23, 2024

src/neo4j_genai/indexes.py (resolved)

willtai reviewed Jul 25, 2024

stellasia reviewed Jul 25, 2024

src/neo4j_genai/kg_construction/kg_writer.py (outdated, resolved)

src/neo4j_genai/kg_construction/kg_writer.py (outdated, resolved)

src/neo4j_genai/kg_construction/types.py (outdated, resolved)

willtai reviewed Jul 30, 2024

src/neo4j_genai/kg_construction/kg_writer.py (resolved)

willtai reviewed Jul 30, 2024

tests/e2e/test_kg_construction_e2e.py (outdated, resolved)

alexthomas93 force-pushed the feature/kg_writer branch from 1a33dfe to 039c964 July 31, 2024 13:36

willtai force-pushed the feature/kg_builder branch from 4171f65 to 83b2f56 July 31, 2024 15:17

alexthomas93 added 19 commits August 1, 2024 09:36

Added copyright header to new files

fc86a84

Added copyright header to kg_writer.py

dc367ec

Added __future__ import to kg_writer.py for backwards compatibility of type hints

a4e9c3d

Added E2E test for Neo4jWriter

f232af8

Added a copyright header to test_kg_builder_e2e.py

7e9f779

Added upsert_vector test for relationship embeddings

ff944e5

Moved KG writer and its tests

ab9947b

Moved Neo4jGraph and associated objects to a new file

9c30284

Renamed KG builder fixture

5d92dba

Added unit tests for KG writer

d926b26

Split upsert_vector into 2 functions

e324015

Fixed broken cypher query strings

705ab44

Removed embedding creation from Neo4jWriter

f678ba3

Fixed setup_neo4j_for_kg_construction fixture

5aa4722

Added KGWriterModel class

a0086e9

Fixed minor mistake in test_weaviate_e2e.py

d82ba09

Renamed kg_construction folder to components

2b88fef

Updated unit tests with new folder structure

442a4e8

Fixed broken import

78e6d93
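The commit "Fixed broken cypher query strings" above appears to relate to the review thread on src/neo4j_genai/indexes.py. That thread questioned building queries from adjacent string literals, because a missing trailing space silently merges two Cypher clauses. It also proposed keeping `"""` strings in the code and normalizing them with `textwrap` in tests. The sketch below shows both points under stated assumptions: the query text and the helpers `get_query` and `normalize` are hypothetical and are not taken from the library.

```python
import textwrap


def get_query() -> str:
    # Hypothetical query kept as a triple-quoted string; the indentation of
    # the surrounding code ends up inside the string itself.
    query = """
        MATCH (n)
        WHERE n.id = $id
        RETURN n
        """
    return query


# Adjacent literals are concatenated with no separator, so each fragment
# needs its own trailing space or the clauses run together.
broken = (
    "MATCH (n)"
    "WHERE n.id = $id"
)
print(broken)  # MATCH (n)WHERE n.id = $id


def normalize(text: str) -> str:
    # Drop the common leading indentation, then the surrounding blank lines.
    return textwrap.dedent(text).strip()


# Expected value written at a different indentation, as it might be in a test.
expected = """
    MATCH (n)
    WHERE n.id = $id
    RETURN n
"""

print(normalize(get_query()) == normalize(expected))  # True
```

Because `textwrap.dedent` removes only the whitespace prefix common to all non-blank lines, the source and the test can indent their strings differently and still compare equal after normalization.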

alexthomas93 force-pushed the feature/kg_writer branch from 78f86ea to 78e6d93 August 1, 2024 08:38

alexthomas93 added 2 commits August 1, 2024 09:41

Fixed copyright headers

c78d9db

Added missing docstrings

64501e8

Fixed typo

609feaf

stellasia approved these changes Aug 1, 2024

alexthomas93 merged commit a688a2a into feature/kg_builder Aug 1, 2024

alexthomas93 deleted the feature/kg_writer branch August 1, 2024 09:57

Comments: willtai (Jul 25, 2024), alexthomas93 (Jul 25, 2024), willtai (Jul 25, 2024), alexthomas93 (Jul 26, 2024), willtai (Jul 26, 2024)

3 participants
