Background
The current HybridContentRetriever.reciprocalRankFusion() applies a uniform RRF formula across all retrieved content:
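score(doc) = Σ_i 1 / (k + rank_i(doc))
Here rank_i(doc) is the position of doc in the i-th ranked result list, k is a smoothing constant, and lists that do not contain doc contribute nothing to the sum.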
This treats results from all knowledge sources equally, regardless of their type (local_folders, local_files, urls) or any source-specific signals such as recency, trust level, or proximity to the query context.
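For reference, the uniform formula above can be sketched in a few lines of Kotlin. This is a minimal illustration over plain strings, not the actual body of HybridContentRetriever.reciprocalRankFusion(); the function name uniformRrf, the 1-based ranks, and the default k = 60 are assumptions made for the example.

```kotlin
// Minimal sketch of uniform RRF over plain strings.
// Assumptions (not taken from the codebase): 1-based ranks and k = 60.
fun uniformRrf(rankedLists: List<List<String>>, k: Int = 60): Map<String, Double> {
    val scores = mutableMapOf<String, Double>()
    for (list in rankedLists) {
        list.forEachIndexed { index, doc ->
            val rank = index + 1
            // Each list containing doc contributes 1 / (k + rank).
            scores[doc] = (scores[doc] ?: 0.0) + 1.0 / (k + rank)
        }
    }
    return scores
}

fun main() {
    val vectorHits = listOf("a", "b", "c")
    val keywordHits = listOf("b", "d")
    val fused = uniformRrf(listOf(vectorHits, keywordHits))
    // Print documents from highest to lowest fused score.
    fused.entries
        .sortedByDescending { it.value }
        .forEach { println("${it.key} -> ${it.value}") }
}
```

Nothing in this computation knows where a document came from, which is the limitation described next.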
Problem
KnowledgeSourceConfig already models distinct source types via its sealed subclasses:
- LocalFoldersKnowledgeSourceConfig — watched directories (potentially stale if indexing is delayed)
- LocalFilesKnowledgeSourceConfig — individual static files
- UrlKnowledgeSourceConfig — crawled web pages (configurable depth/page count)
However, reciprocalRankFusion() in HybridContentRetriever.kt discards source metadata entirely. All Content objects are keyed only by their text segment, and RRF scores are computed without distinguishing which source type produced them.
Proposed Improvement
Introduce source-aware score boosting into the RRF algorithm. Concretely:
- Attach the originating KnowledgeSourceConfig type (or a normalized source weight) to each Content result at retrieval time (e.g., via TextSegment metadata).
- In reciprocalRankFusion(), apply a configurable multiplier per source type:
adjusted_score = rrf_score * source_weight(sourceType)
Allow source weights to be defined in AppConfig.rag (e.g., sourceWeights: { local_files: 1.2, local_folders: 1.0, urls: 0.8 }).
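One way the proposal could look in code is sketched below. All names here (SourceType, RankedHit, weightedRrf) are hypothetical and chosen for illustration; the real change would carry the source type through TextSegment metadata and read the weights from AppConfig.rag. The sketch also assumes a missing weight falls back to 1.0, which the proposal does not specify.

```kotlin
// Hypothetical types for illustration; none of these exist in the codebase.
enum class SourceType { LOCAL_FILES, LOCAL_FOLDERS, URLS }

// A retrieved segment tagged with the source type that produced it.
data class RankedHit(val text: String, val sourceType: SourceType)

fun weightedRrf(
    rankedLists: List<List<RankedHit>>,
    sourceWeights: Map<SourceType, Double>,
    k: Int = 60, // assumed default, not taken from the codebase
): List<Pair<RankedHit, Double>> {
    val rrf = mutableMapOf<RankedHit, Double>()
    for (list in rankedLists) {
        list.forEachIndexed { index, hit ->
            // Plain RRF contribution with 1-based ranks.
            rrf[hit] = (rrf[hit] ?: 0.0) + 1.0 / (k + index + 1)
        }
    }
    // adjusted_score = rrf_score * source_weight(sourceType);
    // an unconfigured source type is assumed to keep its score (weight 1.0).
    return rrf
        .map { (hit, score) -> hit to score * (sourceWeights[hit.sourceType] ?: 1.0) }
        .sortedByDescending { it.second }
}

fun main() {
    val weights = mapOf(
        SourceType.LOCAL_FILES to 1.2,
        SourceType.LOCAL_FOLDERS to 1.0,
        SourceType.URLS to 0.8,
    )
    val urlHit = RankedHit("crawled page text", SourceType.URLS)
    val fileHit = RankedHit("text from a static file", SourceType.LOCAL_FILES)
    // The URL hit is ranked first by both retrievers, the file hit second.
    val fused = weightedRrf(listOf(listOf(urlHit, fileHit), listOf(urlHit, fileHit)), weights)
    fused.forEach { (hit, score) -> println("${hit.sourceType}: $score") }
}
```

With the example weights, the file hit overtakes the URL hit even though the URL hit ranks first in both lists, so the multipliers can reorder results and should be tuned conservatively. Because the sketch keys on both text and source type, identical text from two different sources is scored separately; whether such duplicates should be merged is a design choice left open here.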