Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document: Explicitly check that pageContents is not an empty string. #2836

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

skarard
Copy link
Contributor

@skarard skarard commented Oct 8, 2023

This PR is related to a discussion in #2807, the current behaviour of the Document constructor field.pageContent when given an empty string is to set pageContent as undefined. This broke HNSWLib.addDocuments() as it attempts a string.replace() on undefined rather than the typed string expected.

I suggested allowing pageContents to be set as an empty string. @jacoblee93 suggested that a Document with an empty pageContents should not be created at all.

This implementation requires pageContents to be a string that is not empty, otherwise it throws Error("pageContents must not be empty").

@vercel
Copy link

vercel bot commented Oct 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ❌ Failed (Inspect) Dec 11, 2023 6:01pm
langchainjs-docs ✅ Ready (Inspect) Visit Preview Dec 11, 2023 6:01pm

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 8, 2023
@skarard skarard changed the title Explicitly check that pageContents is not an empty string. Document: Explicitly check that pageContents is not an empty string. Oct 8, 2023
@jacoblee93
Copy link
Collaborator

Ideally we shouldn't need this with TypeScript, but an empty string should be permissible, just not a hard undefined, null, or non-string value - can you change?

@skarard
Copy link
Contributor Author

skarard commented Oct 9, 2023 via email

@skarard
Copy link
Contributor Author

skarard commented Oct 26, 2023

@jacoblee93 I added the proposed change and threw in a couple of unit tests for good measure.

Copy link
Collaborator

@bracesproul bracesproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @skarard sorry for taking so long to review, these changes look good! We've moved document.ts into langchain-core/src/documents/document.ts, could you push up a change moving your changes into that updated file then re-request my review?

Thanks again!

…cument-constructor-throw-on-empty-pageContent
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants