@dfordp dfordp commented Oct 25, 2025

🧠 Summary

Closes #282

This PR upgrades the documentation search experience by introducing a hybrid ranking system that combines Fuse.js fuzzy search (local metadata-based) with backend embedding similarity scores (semantic relevance).
The result is a much more accurate and natural search experience, balancing instant responsiveness with deeper semantic understanding.


✨ What’s Changed

  • Fuse.js configuration tuned

    • Looser threshold (0.25) and ignoreLocation: true for flexible word order
    • Added support for excerpt field to provide lightweight context
    • Enabled includeScore for hybrid ranking
  • Backend integration upgraded

    • Backend /searchDocUtil now returns both fileKey and similarityScore (cosine similarity)
    • Frontend uses these scores to rank results more intelligently
  • Hybrid ranking system added

    • Merges Fuse.js and embedding results using a weighted blend (0.7 * backend + 0.3 * fuse)
    • Deduplicates and sorts final results by combined relevance score
  • Improved user experience

    • Faster initial results from Fuse.js, enhanced later by semantic backend results
    • More accurate top matches, even with word order or phrasing variations
    • Debug info (rank indicators) added for easier tuning
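The blend described in the list above can be sketched roughly as follows. This is an illustration, not the code merged in this PR: the `FuseHit` and `RankedDoc` shapes, the constant names, and the choice to score a missing side as 0 are all assumptions, and the Fuse score is simply inverted here rather than rescaled. Only `fileKey`, `similarityScore`, the 0.7 / 0.3 weights, and the Fuse options come from the description.

```typescript
// Hypothetical shapes: only `fileKey` and `similarityScore` are named in this PR.
type FuseHit = { docKey: string; score?: number }; // Fuse score: 0 is a perfect match
type BackendHit = { fileKey: string; similarityScore: number };
type RankedDoc = { docKey: string; combinedScore: number };

// Options listed in the description; they would be passed as `new Fuse(docs, fuseOptions)`.
const fuseOptions = {
  keys: ['title', 'description', 'keywords', 'excerpt'],
  threshold: 0.25,
  ignoreLocation: true,
  includeScore: true,
};

const BACKEND_WEIGHT = 0.7;
const FUSE_WEIGHT = 0.3;

const mergeHybridResults = (
  fuseHits: FuseHit[],
  backendHits: BackendHit[]
): RankedDoc[] => {
  // Fuse scores are "lower is better", so flip them: 1 becomes the best match.
  const fuseRelevance = new Map<string, number>();
  for (const hit of fuseHits) {
    const clamped = Math.min(Math.max(hit.score ?? 1, 0), 1);
    fuseRelevance.set(hit.docKey, 1 - clamped);
  }

  // Keep the best similarity per file when the backend returns several hits for it.
  const backendRelevance = new Map<string, number>();
  for (const hit of backendHits) {
    const previous = backendRelevance.get(hit.fileKey) ?? 0;
    backendRelevance.set(hit.fileKey, Math.max(previous, hit.similarityScore));
  }

  // The union of both key sets deduplicates documents found by both sources.
  const allKeys = new Set<string>([
    ...Array.from(fuseRelevance.keys()),
    ...Array.from(backendRelevance.keys()),
  ]);

  return Array.from(allKeys)
    .map((docKey) => ({
      docKey,
      // Assumption: a source that did not return the document contributes 0.
      combinedScore:
        BACKEND_WEIGHT * (backendRelevance.get(docKey) ?? 0) +
        FUSE_WEIGHT * (fuseRelevance.get(docKey) ?? 0),
    }))
    .sort((first, second) => second.combinedScore - first.combinedScore);
};
```

Because the merge iterates over the union of keys, a document returned only by the backend still receives a score and appears in the list.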

🔍 Why This Matters

  • Previous search relied heavily on metadata-only matching (title, description, keywords).
  • Word order and token matching limited recall, and semantic meaning wasn’t considered.
  • The new system integrates vector-based semantic similarity to better understand user intent while preserving fast local results.
  • No external dependencies (like Algolia) — still fully self-hosted and open.

🧪 Testing Instructions

  1. Run the website locally (pnpm dev or yarn dev).

  2. Open the search modal and try queries such as:

    • “authentication guide”
    • “get started with API”
    • “token expiration issue”
  3. Observe that:

    • Results appear instantly (Fuse.js).
    • After a short delay, more relevant results rise to the top (backend embeddings).
    • Rankings feel semantically aware (not just keyword-based).

🧩 Affected Areas

  • apps/website/src/components/DocPage/Search/SearchView.tsx
  • apps/backend/src/controllers/search.controller.ts
  • @utils/AI/askDocQuestion/askDocQuestion (used indirectly)

🚀 Future Improvements

  • Cache embedding results for frequent queries
  • Experiment with different weight ratios (e.g., 0.6 / 0.4)
  • Add a contributor debug panel to visualize Fuse vs embedding scores
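As an illustration of the first idea in the list above, a small in-memory cache keyed by the normalized query could look like the sketch below. Everything here is hypothetical (the class name, the 50-entry default, the least-recently-used eviction policy); the PR does not specify where or how caching would be implemented.

```typescript
type BackendDocResult = { fileKey: string; similarityScore: number };

// Hypothetical LRU cache for backend search results, keyed by normalized query text.
class QueryResultCache {
  private readonly entries = new Map<string, BackendDocResult[]>();
  private readonly maxEntries: number;

  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
  }

  // Treat "Auth guide" and " auth guide " as the same query.
  private static normalize(query: string): string {
    return query.trim().toLowerCase();
  }

  get(query: string): BackendDocResult[] | undefined {
    const key = QueryResultCache.normalize(query);
    const cached = this.entries.get(key);
    if (cached === undefined) return undefined;
    // Re-insert so this entry becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, cached);
    return cached;
  }

  set(query: string, results: BackendDocResult[]): void {
    const key = QueryResultCache.normalize(query);
    this.entries.delete(key);
    this.entries.set(key, results);
    if (this.entries.size > this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldestKey = this.entries.keys().next().value;
      if (oldestKey !== undefined) this.entries.delete(oldestKey);
    }
  }
}
```

A caller would check `cache.get(query)` before hitting the backend and call `cache.set(query, results)` once a response arrives.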

xenophobic-xenomorph-ts0o88484 bot commented Oct 26, 2025

The preview deployment failed. 🔴

Last updated at: 2025-10-26 23:19:48 CET

aymericzip (Owner) commented:

The build does not pass:

@intlayer/backend:build:ci: src/controllers/search.controller.ts(33,36): error TS2339: Property 'similarity' does not exist on type 'VectorStoreEl'.

aymericzip (Owner) left a comment

The code looks good to me; it can be merged once the build passes.

const fuseScore = normalizeFuse(fuseItem.score);
const backendScore = backendMap.get(doc.docKey);
const combinedScore = backendScore
  ? 0.7 * backendScore + 0.3 * fuseScore
aymericzip (Owner) commented:

avoid magic numbers
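One possible way to address this comment is to give the weights names and keep them in a single place. The sketch below is an assumption about how that could look, not the change that was actually pushed: the identifier names are made up, and because the fallback branch of the ternary is cut off in the snippet above, the Fuse-only fallback shown here is a guess.

```typescript
// Hypothetical names; the 0.7 / 0.3 values come from the PR description.
const HYBRID_RANKING_WEIGHTS = {
  backend: 0.7,
  fuse: 0.3,
} as const;

// Both inputs are assumed to be normalized to [0, 1], where 1 is the best match.
const combineScores = (fuseScore: number, backendScore?: number): number =>
  backendScore === undefined
    ? // Assumed fallback: only the Fuse part contributes when the backend has no score.
      HYBRID_RANKING_WEIGHTS.fuse * fuseScore
    : HYBRID_RANKING_WEIGHTS.backend * backendScore +
      HYBRID_RANKING_WEIGHTS.fuse * fuseScore;
```

Keeping both weights in one object also makes it easy to try other ratios, such as the 0.6 / 0.4 split mentioned under Future Improvements.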

const backendResultUrls = new Set(
backendResults.map((doc) => doc.docKey)
const backendDocsWithScore =
(searchDocData?.data ?? []).map((d: any) => ({
aymericzip (Owner) commented:

Replace abbreviations with full variable names.

aymericzip (Owner) commented:

avoid any
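A common way to drop the `any` is to treat the response as `unknown` and narrow it with a type guard. The sketch below assumes the backend items have the `BackendDocResult` shape quoted later in this thread; the helper names and the `responseData` parameter (standing in for `searchDocData?.data`) are hypothetical.

```typescript
type BackendDocResult = { fileKey: string; similarityScore: number };

// Runtime check that narrows an unknown value to BackendDocResult.
const isBackendDocResult = (value: unknown): value is BackendDocResult => {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.fileKey === 'string' &&
    typeof candidate.similarityScore === 'number'
  );
};

// `responseData` stands in for `searchDocData?.data`; its real type is not shown here.
const toBackendDocResults = (responseData: unknown): BackendDocResult[] =>
  Array.isArray(responseData) ? responseData.filter(isBackendDocResult) : [];
```

If the API client already exposes a typed response, annotating the `map` callback parameter with that type would be simpler than a runtime guard.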

{results.map((result, i) => (
  <li key={result.url}>
    <SearchResultItem doc={result} onClickLink={onClickLink} />
    <p className="text-gray-400 text-xs">Rank #{i + 1}</p>
aymericzip (Owner) commented:

Debug only; to be removed.

  allDocs: DocMetadata[]
): DocMetadata[] {
  const normalizeFuse = (score?: number) => 1 - Math.min((score ?? 1) / 0.5, 1); // invert Fuse score
  const normalizeBackend = (score: number) => Math.min(score / 1.0, 1); // already cosine-like
aymericzip (Owner) commented:

no need to divide by 1
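A minimal sketch of the two normalizers with the redundant division removed is shown below. The constant name and the lower clamp on the backend score are assumptions added for illustration (cosine similarity can in principle be negative); the 0.5 divisor is taken from the snippet above.

```typescript
// 0.5 is the divisor used in the reviewed snippet; the constant name is hypothetical.
const FUSE_SCORE_CEILING = 0.5;

// Fuse reports 0 for a perfect match, so invert it: 0 -> 1, and anything >= ceiling -> 0.
const normalizeFuseScore = (score?: number): number =>
  1 - Math.min((score ?? 1) / FUSE_SCORE_CEILING, 1);

// The backend score is already cosine-like, so clamping to [0, 1] is enough.
const normalizeBackendScore = (score: number): number =>
  Math.min(Math.max(score, 0), 1);
```

With these definitions a Fuse score of 0.25 maps to 0.5, and a missing Fuse score is treated as the worst possible match.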

dfordp (Author) commented Oct 27, 2025

Fixed the build errors and addressed the code review requests.

aymericzip (Owner) left a comment

Looks good to me.

aymericzip (Owner) commented:

[screenshot]

Hey @dfordp, thanks, it works well! Searching for a doc with Fuse is much more precise.

However, as shown in the screenshot, when Fuse.js does not return any results, the backend results are no longer displayed.
The search shows no results, but it should show the docs sent by the backend.

dfordp (Author) commented Oct 31, 2025

Hey @aymericzip, good catch.
I’ve added a fallback so that when Fuse returns no matches, we now display backend-only semantic results instead.
This ensures relevant docs from embeddings still show up even if there’s no local fuzzy match.

Thanks for spotting this edge case 🙌
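The fallback described in this comment could be sketched as follows. This is a guess at the behavior rather than the committed code: the `DocMetadata` shape is reduced to two fields, the function names are hypothetical, and the sketch assumes that the backend `fileKey` matches the frontend `docKey`.

```typescript
type DocMetadata = { docKey: string; title: string }; // hypothetical minimal shape
type BackendDocResult = { fileKey: string; similarityScore: number };

// Map backend hits to known docs, best similarity first, without duplicates.
const selectBackendOnlyResults = (
  backendResults: BackendDocResult[],
  allDocs: DocMetadata[]
): DocMetadata[] => {
  const docsByKey = new Map(allDocs.map((doc) => [doc.docKey, doc] as const));
  const seenKeys = new Set<string>();
  return [...backendResults]
    .sort((first, second) => second.similarityScore - first.similarityScore)
    .map((result) => docsByKey.get(result.fileKey))
    .filter((doc): doc is DocMetadata => {
      // Drop unknown keys and repeated files (e.g. several chunks of one doc).
      if (doc === undefined || seenKeys.has(doc.docKey)) return false;
      seenKeys.add(doc.docKey);
      return true;
    });
};

// When the Fuse-based ranking is empty, fall back to the backend-only list.
const selectDisplayedResults = (
  fuseRankedResults: DocMetadata[],
  backendResults: BackendDocResult[],
  allDocs: DocMetadata[]
): DocMetadata[] =>
  fuseRankedResults.length > 0
    ? fuseRankedResults
    : selectBackendOnlyResults(backendResults, allDocs);
```

Backend hits whose key does not correspond to a known doc are dropped, so the UI never renders an entry without metadata.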

@dfordp dfordp reopened this Oct 31, 2025

dfordp (Author) left a comment

I believe the changes now meet all the requirements.

aymericzip (Owner) commented:

[image]

{results.map((result) => (
<li key={result.url}>
<SearchResultItem doc={result} onClickLink={onClickLink} />
{results.map((r) => (
aymericzip (Owner) commented:

No abbreviations.

...r,
similarity:
  selection.find(
    (s) => s.fileKey === r.fileKey && s.chunkNumber === r.chunkNumber
aymericzip (Owner) commented:

No abbreviations.
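With descriptive names, the lookup in the fragment above might read like the sketch below. The types are hypothetical stand-ins (the real `VectorStoreEl` is not shown in this thread), and since the fragment is cut off before its fallback, the default of 0 is an assumption.

```typescript
// Hypothetical minimal shapes; the real `VectorStoreEl` type is not shown in this thread.
type VectorStoreChunk = { fileKey: string; chunkNumber: number; content: string };
type ScoredSelection = { fileKey: string; chunkNumber: number; similarity: number };
type ChunkWithSimilarity = VectorStoreChunk & { similarity: number };

const attachSimilarity = (
  results: VectorStoreChunk[],
  selection: ScoredSelection[]
): ChunkWithSimilarity[] =>
  results.map((result) => {
    // Find the scored entry that refers to the same chunk of the same file.
    const matchingSelection = selection.find(
      (selected) =>
        selected.fileKey === result.fileKey &&
        selected.chunkNumber === result.chunkNumber
    );
    // Assumed default: chunks without a scored match get a similarity of 0.
    return { ...result, similarity: matchingSelection?.similarity ?? 0 };
  });
```

Declaring the intersection type explicitly is also one way to avoid reading a `similarity` property that the base element type does not declare.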


type BackendDocResult = { fileKey: string; similarityScore: number };

function mergeHybridResults(
aymericzip (Owner) commented:

prefer arrow function

const inputRef = useRef<HTMLInputElement>(null);
const searchQueryParam = useSearchParams().get('search');
const [results, setResults] = useState<DocMetadata[]>([]);
const [currentQuery, setCurrentQuery] = useState<string | null>(
aymericzip (Owner) commented:

probably not necessary

use search

aymericzip (Owner) commented:

Hey @dfordp,

I did another review.

Please make sure the flow has been tested and works before the next submission.

@aymericzip aymericzip marked this pull request as draft October 31, 2025 11:47

dfordp (Author) commented Nov 1, 2025

gwg

This is an update on the same.

I also included the workspace changes.

@dfordp dfordp marked this pull request as ready for review November 1, 2025 10:09

dfordp (Author) left a comment

I think this addresses your requested changes.

@dfordp dfordp reopened this Nov 1, 2025

xenophobic-xenomorph-ts0o88484 bot commented Nov 1, 2025

The preview deployment failed. 🔴

Last updated at: 2025-11-04 01:38:52 CET

Development

Successfully merging this pull request may close these issues.

Improve Documentation Search System