-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[BUG] episode_mentions_reranker sorts nodes ascending (fewest mentions first) #1342
Description
Bug Description
episode_mentions_reranker in graphiti_core/search/search_utils.py sorts nodes in ascending order of mention count, so nodes mentioned in the fewest episodes rank highest. This is the opposite of what the reranker name implies and what the equivalent edge reranker does.
Steps to Reproduce
Provide a minimal code example that reproduces the issue:
from collections import defaultdict
def rrf(results, rank_const=1, min_score=0):
scores = defaultdict(float)
for result in results:
for i, uuid in enumerate(result):
scores[uuid] += 1 / (i + rank_const)
sorted_uuids = sorted(scores.keys(), key=lambda u: scores[u], reverse=True)
return (
[u for u in sorted_uuids if scores[u] >= min_score],
[scores[u] for u in sorted_uuids if scores[u] >= min_score],
)
# Simulate episode_mentions_reranker with mocked DB counts
def simulate_reranker(node_uuid_lists, db_mention_counts):
sorted_uuids, _ = rrf(node_uuid_lists)
scores = {}
for uuid in sorted_uuids:
if uuid in db_mention_counts:
scores[uuid] = db_mention_counts[uuid]
for uuid in sorted_uuids:
if uuid not in scores:
scores[uuid] = float('inf')
sorted_uuids.sort(key=lambda u: scores[u]) # line 1894 — ascending
return [(u, scores[u]) for u in sorted_uuids]
result = simulate_reranker(
[["node-A", "node-B", "node-C"]],
{"node-A": 20, "node-B": 5, "node-C": 1},
)
for rank, (uuid, score) in enumerate(result, 1):
print(f"rank {rank}: {uuid} mentions={score}")Expected Behavior
rank 1: node-A mentions=20
rank 2: node-B mentions=5
rank 3: node-C mentions=1
Actual Behavior
rank 1: node-C mentions=1
rank 2: node-B mentions=5
rank 3: node-A mentions=20
Environment
- Graphiti Version: current
main - Python Version: any
- Operating System: any
- Database Backend: any
- LLM Provider & Model: any
Installation Method
- Development installation (git clone)
Error Messages/Traceback
No exception is raised. The output is silently wrong.
Additional Context
This is a copy-paste bug from node_distance_reranker. A stray comment on line 1875 is the tell:
async def episode_mentions_reranker(...):
# Find the shortest path to center node <- copy-pasted from node_distance_reranker
sorted_uuids.sort(key=lambda cur_uuid: scores[cur_uuid]) Ascending sort is correct in node_distance_reranker because the score is a graph distance (smaller =
closer = better, then inverted before returning). It is wrong here because the score is a raw mention
count (larger = more prominent = better).
The edge episode_mentions path in search.py shows the correct intent - it explicitly adds reverse=True
after sorting:
# search.py:304-305
if config.reranker == EdgeReranker.episode_mentions:
reranked_edges.sort(reverse=True, key=lambda edge: len(edge.episodes))The node path has no equivalent correction at the call site in node_search:
# search.py:401-417
elif config.reranker == NodeReranker.episode_mentions:
reranked_uuids, node_scores = await episode_mentions_reranker(
driver, search_result_uuids, min_score=reranker_min_score
)
elif config.reranker == NodeReranker.node_distance:
if center_node_uuid is None:
raise SearchRerankerError('No center node provided for Node Distance reranker')
reranked_uuids, node_scores = await node_distance_reranker(
driver,
rrf(search_result_uuids, min_score=reranker_min_score)[0],
center_node_uuid,
min_score=reranker_min_score,
)
reranked_nodes = [node_uuid_map[uuid] for uuid in reranked_uuids]
return reranked_nodes[:limit], node_scores[:limit]The existing test does not catch this (tests/utils/search/search_utils_test.py::test_episode_mentions_reranker) because it only compares a node with 1 mention against a node with 0 mentions. The zero-mention node is absent from the DB result and receives the sentinel float('inf'). Under ascending sort 1 < inf, so the mentioned node ranks first by accident. The bug only manifests when two nodes both have positive mention counts.
The float('inf') sentinel interacts with the fix. A pure reverse=True patch moves zero-mention nodes
(inf) to rank first. The correct fix changes both the sentinel and the sort direction.
- Affects: core library, NodeReranker.episode_mentions search config
- Occurs consistently on any graph where multiple nodes each appear in at least one episode
Possible Solution
# search_utils.py:1889-1898
for uuid in sorted_uuids:
if uuid not in scores:
scores[uuid] = 0 # was float('inf') - keeps zero-mention nodes at the bottom after fix
sorted_uuids.sort(key=lambda cur_uuid: scores[cur_uuid], reverse=True) # was ascending
return [uuid for uuid in sorted_uuids if scores[uuid] >= min_score], [
scores[uuid] for uuid in sorted_uuids if scores[uuid] >= min_score
]The test should also be extended with a second episodic node and edge so that both entity nodes have a
positive mention count, then assert the more-mentioned node ranks first.