improvement(knowledge): remove innerJoin and add id identifiers to results, updated docs by waleedlatif1 · Pull Request #1170 · simstudioai/sim

waleedlatif1 · 2025-08-28T23:40:42Z

Summary

Split document name retrieval out of vector search and removed the innerJoin, a costly DB operation that caused us to transform the document record for every query over the embeddings table.

Type of Change

Other: Performance

Testing

Tested manually.

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel · 2025-08-28T23:40:48Z

Project	Deployment	Preview	Comments	Updated (UTC)
sim	Ready	Preview	Comment	Aug 28, 2025 11:45pm

1 Skipped Deployment

Project	Deployment	Preview	Comments	Updated (UTC)
docs	Skipped			Aug 28, 2025 11:45pm

greptile-apps

Greptile Summary

This PR implements a performance optimization for the knowledge base search functionality by separating document name retrieval from vector search operations. The key architectural change removes the expensive innerJoin between the embeddings and documents tables, which was causing the system to transform document records for every query over the embeddings table.

The optimization splits the search process into two phases: first it performs the vector/tag search to get the relevant chunks, then it separately fetches document names for only the returned results using a new getDocumentNamesByIds utility function. This approach trades a small amount of additional complexity for performance gains that matter most for large knowledge bases.
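The two-phase flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: the row shape, the `searchChunks` and `searchKnowledgeBase` names, and the synchronous `Map`-based lookup are all assumptions, and the real getDocumentNamesByIds presumably queries the database and may have a different signature.

```typescript
// Hypothetical row shape; the real embeddings schema is not shown here.
interface ChunkRow {
  chunkId: string
  documentId: string
  content: string
  distance: number // smaller distance means a closer match
}

interface SearchResult extends ChunkRow {
  documentName: string
}

// Phase 1: rank chunks using embeddings data only, with no join.
function searchChunks(rows: ChunkRow[], topK: number): ChunkRow[] {
  return [...rows].sort((a, b) => a.distance - b.distance).slice(0, topK)
}

// Phase 2: resolve names only for documents that appear in the results.
// Stand-in for the PR's getDocumentNamesByIds; the real signature may differ.
function getDocumentNamesByIds(
  documents: Map<string, string>,
  documentIds: string[]
): Record<string, string> {
  const names: Record<string, string> = {}
  for (const id of documentIds) {
    const name = documents.get(id)
    if (name !== undefined) names[id] = name
  }
  return names
}

// Hypothetical entry point that stitches the two phases together.
function searchKnowledgeBase(
  rows: ChunkRow[],
  documents: Map<string, string>,
  topK: number
): SearchResult[] {
  const chunks = searchChunks(rows, topK)
  const names = getDocumentNamesByIds(
    documents,
    chunks.map((c) => c.documentId)
  )
  // Fall back to an empty name if the document row is missing.
  return chunks.map((c) => ({ ...c, documentName: names[c.documentId] ?? '' }))
}
```

The point of the split is that the name lookup touches at most `topK` documents, instead of joining document rows into every candidate row the vector search scans.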

Additional improvements include standardizing field naming conventions throughout the knowledge base API (id → chunkId, id/name → documentId/documentName) and updating response structures to be more consistent. The changes span multiple files including the core search utilities, API routes, tool configurations, type definitions, and corresponding tests.
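As an illustration of the renaming, a result shape under the new convention might look like the sketch below. Only the `chunkId`, `documentId`, and `documentName` field names come from this PR; the `content` and `similarity` fields, the type names, and the `toKnowledgeSearchResult` helper are hypothetical.

```typescript
// Hypothetical "before" shapes, where both records expose a bare `id`.
interface LegacyChunk {
  id: string
  content: string
  similarity: number
}

interface LegacyDocument {
  id: string
  name: string
}

// "After" shape: every identifier says which entity it belongs to.
interface KnowledgeSearchResult {
  chunkId: string
  documentId: string
  documentName: string
  content: string
  similarity: number
}

// Hypothetical mapper from the old field names to the new ones.
function toKnowledgeSearchResult(
  chunk: LegacyChunk,
  document: LegacyDocument
): KnowledgeSearchResult {
  return {
    chunkId: chunk.id,
    documentId: document.id,
    documentName: document.name,
    content: chunk.content,
    similarity: chunk.similarity,
  }
}
```

With explicit names, a consumer that receives a flattened result no longer has to guess whether `id` refers to the chunk or to its parent document.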

The refactoring maintains the same external API contract while fundamentally improving the underlying query efficiency. Database queries now focus on embeddings data without expensive joins, and document metadata is retrieved through optimized batch lookups with deduplication.
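A batch lookup with deduplication could be sketched as below. The `lookupDocumentNames` helper and the injected `FetchNames` callback are hypothetical stand-ins for whatever query the real utility runs, and the sketch is synchronous for brevity where a database-backed version would most likely be async.

```typescript
// Hypothetical callback standing in for one batched database query.
type FetchNames = (ids: string[]) => Array<{ id: string; name: string }>

function lookupDocumentNames(
  documentIds: string[],
  fetchNames: FetchNames
): Record<string, string> {
  // Several chunks often share a document, so collapse repeated IDs first.
  const uniqueIds = Array.from(new Set(documentIds))

  // Nothing to look up: skip the query entirely.
  if (uniqueIds.length === 0) return {}

  // One batched call for all unique IDs instead of one call per chunk.
  const names: Record<string, string> = {}
  for (const row of fetchNames(uniqueIds)) {
    names[row.id] = row.name
  }
  return names
}
```

IDs with no matching document are simply absent from the returned record, so callers need to decide on a fallback name for that case.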

Confidence score: 4/5

This PR is safe to merge with good confidence as it maintains API compatibility while improving performance
Score reflects well-structured performance optimization with proper test coverage and consistent type updates
Pay close attention to apps/sim/app/api/knowledge/search/utils.ts for the core database query changes and ensure the new getDocumentNamesByIds function handles edge cases properly

8 files reviewed, no comments

improvement(knowledge): remove innerJoin and add id identifiers to results, updated docs (#1170)
* improvement(knowledge): remove innerJoin and add id identifiers to results, updated docs
* cleanup
* add documentName to upload chunk op as well

waleedlatif1 added 3 commits August 28, 2025 16:27

8524a4d	improvement(knowledge): remove innerJoin and add id identifiers to results, updated docs
ab9421c	cleanup
4c1c1c0	add documentName to upload chunk op as well

greptile-apps bot reviewed Aug 28, 2025

vercel bot deployed to Preview – sim August 28, 2025 23:45

waleedlatif1 merged commit fcf128f into staging Aug 29, 2025
5 checks passed

waleedlatif1 mentioned this pull request Aug 29, 2025

v0.3.40: copilot improvements, knowledgebase improvements, security improvements, billing fixes #1158

Merged

waleedlatif1 deleted the improvement/kb-search branch August 29, 2025 05:01
