Skip to content

Proposal: Use Bucketed Symbol Index for Fast "Find All References" #19951

Closed
@teraoka-k

Description

@teraoka-k

Motivation

Currently, rust-analyzer’s "Find All References" can become slow on large projects due to scanning many files and symbols repeatedly. I propose a bucketed symbol-to-files index to dramatically speed up reference searches.

Algorithm Overview

Initialization (Run once on analyzer startup)

  1. Traverse all files (N files) and collect a unique set of symbols.
    • Complexity: O(N) assuming bounded symbols per file.
  2. Traverse all files again to create a map from symbol to the files that contain it (M symbols).
    • Data structure: HashMap<SymbolID, HashSet>
    • Complexity: O(M)

Since these two steps run sequentially, total cost is O(max(N, M)) — linear time, which is efficient for large codebases.

Finding References (find_references_by_bucket)

When a user requests all references for symbol S:

  1. Lookup the list of files from the bucket index in O(1) time.
  2. Search for references to S only in these files, rather than the whole project.
  3. Return the found references.

This search takes O(c) or at worst O(cN), where c is the cost to search a single file. This results in a very fast lookup for most cases, and remains performant even in worst-case scenarios.

Handling New or Modified Files

To maintain correctness incrementally:

  1. On new or edited files, track their paths in a Vec (or similar structure).
  2. When running find_references_by_bucket, include these new/modified files in the search along with the precomputed bucket files for the symbol.

This approach avoids the need for full reindexing on every file change and keeps the system responsive.

Benefits

  • Significant speedup for "Find All References" due to drastically reduced search space (O(1) in most cases and O(N) in the worst case).
  • Linear-time indexing on startup is practical for large projects.
  • Incremental updates keep the index accurate without full reprocessing.
  • Simple data structures and algorithms make it feasible to implement and maintain.

Next Steps

I would like feedback on this approach and am happy to work on a prototype implementation if the maintainers find this direction promising.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-featureCategory: feature request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions