Skip to content

Allow external_id to be set during ingestion #127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ian
Copy link
Contributor

@ian ian commented May 2, 2025

We have a need for our local DB's document IDs to match those in Morphik.

This allows for a metadata external_id property to be set which overrides the default document UUID generator.

Copy link

prophet-code-review-bot bot commented May 2, 2025

Bug Report

Name Severity Example test case Description
Inconsistent Handling of external_id after pop High Upload a file with external_id in metadata and verify it is not present in the document's metadata but is present as a separate field. Check if the system relies on external_id being present in both places. The code uses metadata_dict.pop("external_id", None) which removes the external_id from the metadata_dict. This might be incorrect if the system relies on the external_id being present in both the dedicated external_id field and within the metadata dictionary.
Missing Handling of Empty Metadata Low Upload a file with only external_id in the metadata. If the metadata_dict is empty after popping the external_id, the code might behave unexpectedly in downstream processes that rely on the metadata.

Comments? Email us.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mrge reviewed 1 file and found no issues. Review this PR in mrge.io.

Copy link

prophet-code-review-bot bot commented May 2, 2025

Bug Report

Name: external_id parameter in ingest_text function call incorrect.
Severity: High
Example test case: Call the /ingest_text endpoint, providing a value for the external_id parameter. The document created will not have that external_id.
Description: In core/services/document_service.py, the ingest_text function definition includes the external_id parameter, however, when calling the Document constructor, the external_id argument is being passed in incorrectly. external_id=external_id if external_id else None, does not pass the value, so it is being passed as a boolean expression instead of the value of external_id. This means the external_id is never set.

Comments? Email us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant