Improve relevance scoring with path-based signals #16

@MarkSpectarium

Context

Part of #13 (MCP server improvements). The current relevance scoring in find_relevant_files is purely keyword-based. Files in SDK paths score high just because they contain common terms like "Player". The scoring should consider file paths as relevance signals.

Current State

  • src/query/context.ts:71-99 scoreFiles() scores based on symbol/member name matches only
  • src/query/context.ts:101-138 scoreSymbol() scores symbol names, namespaces, members
  • No path-based scoring exists
  • SDK files like MetaplaySDK/PlayerMessagesInternalCore.cs rank highly for "player" queries

Deliverables

1. Add path-based scoring in src/query/context.ts

Add SDK path penalty patterns:

const SDK_PATH_PATTERNS = [
  'SDK', 'Packages/', 'ThirdParty/', 'Plugins/', 'External/',
  'Vendor/', 'Dependencies/', 'node_modules/'
];

function getPathScore(relativePath: string, tokens: string[]): { score: number; reasons: string[] } {
  let score = 0;
  const reasons: string[] = [];
  const pathLower = relativePath.toLowerCase();
  
  // Penalize SDK/vendor paths
  for (const pattern of SDK_PATH_PATTERNS) {
    if (pathLower.includes(pattern.toLowerCase())) {
      score -= 10;
      reasons.push(`SDK/vendor path penalty`);
      break;
    }
  }
  
  // Boost paths that match task tokens (tokens are lowercased so matching stays case-insensitive)
  for (const token of tokens) {
    if (pathLower.includes(token.toLowerCase())) {
      score += 3;
      reasons.push(`Path contains "${token}"`);
    }
  }
  
  return { score, reasons };
}
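
To make the intended behaviour concrete, here is a minimal standalone sketch that runs the proposed helper against a few sample paths. The two Assets/... paths and the token list are made up for illustration; only the MetaplaySDK path comes from this issue. Tokens are lowercased inside the helper so the match is case-insensitive regardless of how the caller tokenizes.

```typescript
// Standalone copy of the proposed helper, for illustration only.
const SDK_PATH_PATTERNS = [
  'SDK', 'Packages/', 'ThirdParty/', 'Plugins/', 'External/',
  'Vendor/', 'Dependencies/', 'node_modules/'
];

function getPathScore(relativePath: string, tokens: string[]): { score: number; reasons: string[] } {
  let score = 0;
  const reasons: string[] = [];
  const pathLower = relativePath.toLowerCase();

  // One flat penalty, applied at most once per file.
  if (SDK_PATH_PATTERNS.some((p) => pathLower.includes(p.toLowerCase()))) {
    score -= 10;
    reasons.push('SDK/vendor path penalty');
  }

  // +3 for every task token that appears anywhere in the path.
  for (const token of tokens) {
    if (pathLower.includes(token.toLowerCase())) {
      score += 3;
      reasons.push(`Path contains "${token}"`);
    }
  }

  return { score, reasons };
}

// Hypothetical inputs: only the first path is taken from the issue text.
const samplePaths = [
  'MetaplaySDK/PlayerMessagesInternalCore.cs',
  'Assets/Game/Player/PlayerDeathHandler.cs',
  'Assets/Game/UI/MainMenu.cs',
];
const sampleTokens = ['player', 'death'];

const results = samplePaths.map((path) => ({ path, ...getPathScore(path, sampleTokens) }));
for (const r of results) {
  console.log(r.path, r.score, r.reasons);
}
// MetaplaySDK/PlayerMessagesInternalCore.cs  -7  (penalty -10, "player" +3)
// Assets/Game/Player/PlayerDeathHandler.cs    6  ("player" +3, "death" +3)
// Assets/Game/UI/MainMenu.cs                  0  (no signal either way)
```

Note that the 'SDK' pattern is a plain substring match, so any path containing "sdk" in any casing is penalized, not only directories named SDK.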

2. Integrate path scoring into scoreFiles()

Update the scoring loop:

function scoreFiles(allFileSymbols: FileSymbols[], tokens: string[]): ScoredFile[] {
  const scored: ScoredFile[] = [];

  for (const fileSymbols of allFileSymbols) {
    let score = 0;
    const matchedSymbols: string[] = [];
    const relevance: string[] = [];

    // Add path-based scoring
    const pathScore = getPathScore(fileSymbols.relativePath, tokens);
    score += pathScore.score;
    relevance.push(...pathScore.reasons);

    // Existing symbol scoring...
    for (const symbol of fileSymbols.symbols) {
      const symbolScore = scoreSymbol(symbol, tokens);
      if (symbolScore.score > 0) {
        score += symbolScore.score;
        matchedSymbols.push(symbol.name);
        relevance.push(...symbolScore.reasons);
      }
    }

    // Only include if net positive score
    if (score > 0) {
      scored.push({ fileSymbols, score, matchedSymbols, relevance: [...new Set(relevance)] });
    }
  }

  return scored.sort((a, b) => b.score - a.score);
}
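
A rough end-to-end sketch of how the combined scoring could reorder results. Everything outside scoreFiles() and getPathScore() here is an assumption: the type shapes are minimal stand-ins for the real FileSymbols / ScoredFile types, scoreSymbol() is a stub that awards +5 per token found in the symbol name (the real weights in src/query/context.ts are not reproduced), and the sample files and symbol names are hypothetical.

```typescript
// Hypothetical minimal shapes; the real types in the repo may differ.
interface SymbolInfo { name: string }
interface FileSymbols { relativePath: string; symbols: SymbolInfo[] }
interface ScoredFile { fileSymbols: FileSymbols; score: number; matchedSymbols: string[]; relevance: string[] }
type Scored = { score: number; reasons: string[] };

// Stub: +5 per token contained in the symbol name (assumed weight).
function scoreSymbol(symbol: SymbolInfo, tokens: string[]): Scored {
  const hits = tokens.filter((t) => symbol.name.toLowerCase().includes(t.toLowerCase()));
  return { score: hits.length * 5, reasons: hits.map((t) => `Symbol matches "${t}"`) };
}

// Compact version of the proposed path scoring (pattern list shortened).
function getPathScore(relativePath: string, tokens: string[]): Scored {
  const pathLower = relativePath.toLowerCase();
  const isVendor = ['sdk', 'packages/', 'thirdparty/', 'plugins/'].some((p) => pathLower.includes(p));
  const hits = tokens.filter((t) => pathLower.includes(t.toLowerCase()));
  return {
    score: (isVendor ? -10 : 0) + hits.length * 3,
    reasons: [...(isVendor ? ['SDK/vendor path penalty'] : []), ...hits.map((t) => `Path contains "${t}"`)],
  };
}

function scoreFiles(allFileSymbols: FileSymbols[], tokens: string[]): ScoredFile[] {
  const scored: ScoredFile[] = [];
  for (const fileSymbols of allFileSymbols) {
    const pathScore = getPathScore(fileSymbols.relativePath, tokens);
    let score = pathScore.score;
    const matchedSymbols: string[] = [];
    const relevance: string[] = [...pathScore.reasons];
    for (const symbol of fileSymbols.symbols) {
      const s = scoreSymbol(symbol, tokens);
      if (s.score > 0) {
        score += s.score;
        matchedSymbols.push(symbol.name);
        relevance.push(...s.reasons);
      }
    }
    // Files whose net score is zero or negative are dropped.
    if (score > 0) {
      scored.push({ fileSymbols, score, matchedSymbols, relevance: [...new Set(relevance)] });
    }
  }
  return scored.sort((a, b) => b.score - a.score);
}

// Hypothetical index contents.
const files: FileSymbols[] = [
  { relativePath: 'MetaplaySDK/PlayerMessagesInternalCore.cs',
    symbols: [{ name: 'PlayerMessagesInternalCore' }, { name: 'PlayerMessage' }, { name: 'PlayerSession' }] },
  { relativePath: 'Assets/Game/Player/PlayerDeathHandler.cs', symbols: [{ name: 'PlayerDeathHandler' }] },
  { relativePath: 'MetaplaySDK/Core/Logging.cs', symbols: [{ name: 'LogChannel' }] },
];

const ranked = scoreFiles(files, ['player', 'death']);
for (const f of ranked) console.log(f.score, f.fileSymbols.relativePath);
// 16 Assets/Game/Player/PlayerDeathHandler.cs   (path +6, symbols +10)
//  8 MetaplaySDK/PlayerMessagesInternalCore.cs  (path -7, symbols +15)
// MetaplaySDK/Core/Logging.cs is excluded (net -10)
```

Under these assumed weights the SDK file would have ranked first on symbol score alone (15 vs 10); the path signal flips the order while still keeping the SDK file in the results.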

Technical Constraints

  • Path penalties should be significant enough to demote SDK files but not completely exclude them
  • Path matching should be case-insensitive
  • Existing scoring weights should remain unchanged for backward compatibility
  • Files with only SDK penalty and no symbol matches should be excluded (score <= 0)
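
The interaction between the -10 penalty, the +3 path boost, and the score > 0 filter can be worked out directly. The sketch below uses the two weights from the proposed getPathScore(); the helper name is hypothetical and exists only to show the arithmetic.

```typescript
// Weights taken from the proposed getPathScore().
const SDK_PENALTY = 10;
const TOKEN_BOOST = 3;

// For an SDK file whose path matches `matchingPathTokens` task tokens,
// net = symbolScore - 10 + 3 * k, and the file is kept when net > 0.
// So the file survives when symbolScore is strictly greater than 10 - 3 * k.
function sdkSurvivalThreshold(matchingPathTokens: number): number {
  return SDK_PENALTY - TOKEN_BOOST * matchingPathTokens;
}

console.log(sdkSurvivalThreshold(0)); // 10: needs a symbol score above 10
console.log(sdkSurvivalThreshold(2)); // 4:  needs a symbol score above 4
console.log(sdkSurvivalThreshold(4)); // -2: survives even with no symbol matches
```

The last case is worth noting: with four or more task tokens matching the path, an SDK file would pass the filter with a symbol score of 0, which may or may not be acceptable given the exclusion constraint above.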

Success Criteria

  • SDK/vendor path files score lower than game code files
  • Files with path segments matching task keywords get boosted
  • find_relevant_files "player death" returns game code before SDK infrastructure
  • pnpm typecheck passes
  • pnpm build succeeds
