Skip to content

[BUG] episode_mentions_reranker sorts nodes ascending (fewest mentions first) #1342

@jake158

Description

@jake158

Bug Description

episode_mentions_reranker in graphiti_core/search/search_utils.py sorts nodes in ascending order of mention count, so nodes mentioned in the fewest episodes rank highest. This is the opposite of what the reranker name implies and what the equivalent edge reranker does.

Steps to Reproduce

Provide a minimal code example that reproduces the issue:

from collections import defaultdict                                                        

def rrf(results, rank_const=1, min_score=0):
    scores = defaultdict(float)
    for result in results:
        for i, uuid in enumerate(result):
            scores[uuid] += 1 / (i + rank_const)                                                        
    sorted_uuids = sorted(scores.keys(), key=lambda u: scores[u], reverse=True)
    return (                                                                                            
        [u for u in sorted_uuids if scores[u] >= min_score],                                          
        [scores[u] for u in sorted_uuids if scores[u] >= min_score],                                    
    )                                                                                                 

# Simulate episode_mentions_reranker with mocked DB counts                                              
def simulate_reranker(node_uuid_lists, db_mention_counts):
    sorted_uuids, _ = rrf(node_uuid_lists)                                                              
    scores = {}                                                                                         
    for uuid in sorted_uuids:                                                                           
        if uuid in db_mention_counts:                                                                   
            scores[uuid] = db_mention_counts[uuid]                                                    
    for uuid in sorted_uuids:                                                                           
        if uuid not in scores:                                                                          
            scores[uuid] = float('inf')                                                                 
    sorted_uuids.sort(key=lambda u: scores[u])  # line 1894 — ascending                                 
    return [(u, scores[u]) for u in sorted_uuids]                                                       
                                                                                                          
result = simulate_reranker(                                                                             
    [["node-A", "node-B", "node-C"]],                                                                   
    {"node-A": 20, "node-B": 5, "node-C": 1},                                                           
)                                                                                                       
for rank, (uuid, score) in enumerate(result, 1):                                                        
    print(f"rank {rank}: {uuid}  mentions={score}")

Expected Behavior

rank 1: node-A  mentions=20                                                                           
rank 2: node-B  mentions=5                                                                              
rank 3: node-C  mentions=1

Actual Behavior

rank 1: node-C  mentions=1                                                                           
rank 2: node-B  mentions=5                                                                           
rank 3: node-A  mentions=20

Environment

  • Graphiti Version: current main
  • Python Version: any
  • Operating System: any
  • Database Backend: any
  • LLM Provider & Model: any

Installation Method

  • Development installation (git clone)

Error Messages/Traceback

No exception is raised. The output is silently wrong.

Additional Context

This is a copy-paste bug from node_distance_reranker. A stray comment on line 1875 is the tell:

async def episode_mentions_reranker(...):

# Find the shortest path to center node   <- copy-pasted from node_distance_reranker                 
sorted_uuids.sort(key=lambda cur_uuid: scores[cur_uuid])   

Ascending sort is correct in node_distance_reranker because the score is a graph distance (smaller =
closer = better, then inverted before returning). It is wrong here because the score is a raw mention
count (larger = more prominent = better).

The edge episode_mentions path in search.py shows the correct intent - it explicitly adds reverse=True
after sorting:

# search.py:304-305                                                                                   
if config.reranker == EdgeReranker.episode_mentions:                                                    
    reranked_edges.sort(reverse=True, key=lambda edge: len(edge.episodes))

The node path has no equivalent correction at the call site in node_search:

# search.py:401-417
elif config.reranker == NodeReranker.episode_mentions:
    reranked_uuids, node_scores = await episode_mentions_reranker(
        driver, search_result_uuids, min_score=reranker_min_score
    )
elif config.reranker == NodeReranker.node_distance:
    if center_node_uuid is None:
        raise SearchRerankerError('No center node provided for Node Distance reranker')
    reranked_uuids, node_scores = await node_distance_reranker(
        driver,
        rrf(search_result_uuids, min_score=reranker_min_score)[0],
        center_node_uuid,
        min_score=reranker_min_score,
    )

reranked_nodes = [node_uuid_map[uuid] for uuid in reranked_uuids]

return reranked_nodes[:limit], node_scores[:limit]

The existing test does not catch this (tests/utils/search/search_utils_test.py::test_episode_mentions_reranker) because it only compares a node with 1 mention against a node with 0 mentions. The zero-mention node is absent from the DB result and receives the sentinel float('inf'). Under ascending sort 1 < inf, so the mentioned node ranks first by accident. The bug only manifests when two nodes both have positive mention counts.

The float('inf') sentinel interacts with the fix. A pure reverse=True patch moves zero-mention nodes
(inf) to rank first. The correct fix changes both the sentinel and the sort direction.

  • Affects: core library, NodeReranker.episode_mentions search config
  • Occurs consistently on any graph where multiple nodes each appear in at least one episode

Possible Solution

# search_utils.py:1889-1898
for uuid in sorted_uuids:                                                                             
    if uuid not in scores:
        scores[uuid] = 0  # was float('inf') - keeps zero-mention nodes at the bottom after fix
                                                                                                        
sorted_uuids.sort(key=lambda cur_uuid: scores[cur_uuid], reverse=True)  # was ascending                 
                                                                                                        
return [uuid for uuid in sorted_uuids if scores[uuid] >= min_score], [                                  
    scores[uuid] for uuid in sorted_uuids if scores[uuid] >= min_score                                
]

The test should also be extended with a second episodic node and edge so that both entity nodes have a
positive mention count, then assert the more-mentioned node ranks first.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions