fix: skip empty documents before vector embedding #35763
Merged
fatelei merged 4 commits on May 4, 2026
fatelei
reviewed
May 3, 2026
Hi, Vector.add_texts() runs duplicate_check before removing empty documents, causing unnecessary text_exists() calls for documents that will be skipped anyway.
Severity: remediation recommended | Category: performance
How to fix: Filter empty documents before duplicate_check
Found by Qodo code review
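To illustrate the ordering point in this review comment, here is a minimal sketch. It does not use the project's actual classes: Doc, CountingStore, drop_empty, and drop_duplicates are hypothetical stand-ins, and only the method name text_exists() is borrowed from the comment. It counts how many existence lookups each ordering performs.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document chunk."""
    doc_id: str
    page_content: str


class CountingStore:
    """Hypothetical store that counts existence lookups."""

    def __init__(self, existing_ids):
        self._existing = set(existing_ids)
        self.lookups = 0

    def text_exists(self, doc_id: str) -> bool:
        self.lookups += 1
        return doc_id in self._existing


def drop_empty(docs):
    # Keep only chunks with non-whitespace content.
    return [d for d in docs if d.page_content and d.page_content.strip()]


def drop_duplicates(docs, store):
    # One lookup per document still in the list.
    return [d for d in docs if not store.text_exists(d.doc_id)]


docs = [Doc("a", "hello"), Doc("b", ""), Doc("c", "   "), Doc("d", "world")]

# Duplicate check first: every document costs a lookup, including blanks.
store_before = CountingStore(existing_ids={"d"})
kept_before = drop_empty(drop_duplicates(docs, store_before))

# Empty filter first: only non-empty documents are looked up.
store_after = CountingStore(existing_ids={"d"})
kept_after = drop_duplicates(drop_empty(docs), store_after)

print(store_before.lookups, store_after.lookups)  # 4 2
print([d.doc_id for d in kept_after])  # ['a']
```

Both orderings keep the same documents; only the number of lookups differs, which is why the comment is categorized as performance rather than correctness.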
fatelei
approved these changes
May 4, 2026
Summary
Fixes #35737.
This adds a defensive filter before vector embedding so blank text chunks are skipped instead of being sent to the embedding provider. It covers both Vector.create() and Vector.add_texts() so malformed chunker output cannot cause MissingParameter: input[n].text failures during indexing.
Added unit coverage for mixed non-empty/empty inputs and all-empty inputs.
From Codex
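The defensive filter described in the summary might look roughly like the sketch below. This is not the code from vector_factory.py: Chunk, filter_non_empty, embed_chunks, and fake_embed are hypothetical names chosen for illustration, and the fake embedder only imitates a provider that rejects blank input.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk type; the real document class may differ."""
    page_content: str


def filter_non_empty(chunks):
    """Return chunks whose content is not None, empty, or whitespace-only."""
    return [c for c in chunks if c.page_content and c.page_content.strip()]


def embed_chunks(chunks, embed_fn):
    """Embed only non-empty chunks; skip the provider call if none remain."""
    usable = filter_non_empty(chunks)
    if not usable:
        return [], []
    vectors = embed_fn([c.page_content for c in usable])
    return usable, vectors


def fake_embed(texts):
    # Hypothetical embedder: rejects blank input like a strict provider might.
    if any(not t.strip() for t in texts):
        raise ValueError("blank text sent to embedder")
    return [[float(len(t))] for t in texts]


# Mixed non-empty/empty input: blanks are dropped before embedding.
mixed = [Chunk("alpha"), Chunk(""), Chunk("  \n"), Chunk("beta")]
usable, vectors = embed_chunks(mixed, fake_embed)
print([c.page_content for c in usable])  # ['alpha', 'beta']
print(vectors)  # [[5.0], [4.0]]

# All-empty input: the embedder is never called.
print(embed_chunks([Chunk(""), Chunk(" ")], fake_embed))  # ([], [])
```

The two calls at the end mirror the two cases named in the summary: mixed non-empty/empty inputs and all-empty inputs.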
Checklist
make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods
Validation run:
git diff --check
python3 -m py_compile core/rag/datasource/vdb/vector_factory.py tests/unit_tests/core/rag/datasource/vdb/test_vector_factory.py
python3 -m ruff check core/rag/datasource/vdb/vector_factory.py tests/unit_tests/core/rag/datasource/vdb/test_vector_factory.py
Targeted pytest was attempted but the local environment could not finish dependency setup because the runner disk is at 99% and uv failed extracting mysql-connector-python with No space left on device; running with system Python then failed on missing graphon dependency.