Fast Code Search via BM25
Shebe is a fast, simple local code-search tool powered by BM25. No embeddings, no GPU, no cloud.
Research shows 70-85% of developer code search value comes from keyword-based queries. Developers search with exact terms they know: function names, API calls, error messages. BM25 excels at this.
Trade-offs:
- Repositories must be cloned locally before indexing (no remote URL support)
- No semantic similarity: "login" does not match "authenticate". However, BM25 supports multi-term queries without performance degradation; agents quickly learn to include synonyms (e.g., login OR authenticate OR sign-in). For true semantic search, pair with vector tools. See detailed analysis.
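The synonym workaround above can be illustrated with a minimal BM25 sketch. This is not Shebe's implementation: the tokenizer, the `k1` and `b` defaults, the Lucene-style IDF formula, and the handling of `OR` as a keyword are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query; terms are OR-ed, so per-term scores add up."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: "OR" is a query keyword, not a search term.
    query_terms = [t for t in tokenize(query) if t != "or"]

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # A term absent from the document contributes nothing.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "fn login(user: &str) -> Session",
    "fn authenticate(token: &str) -> bool",
    "fn render_page(template: &str) -> String",
]

single = bm25_scores("login", docs)
multi = bm25_scores("login OR authenticate OR sign-in", docs)

print([s > 0 for s in single])  # [True, False, False]
print([s > 0 for s in multi])   # [True, True, False]
```

The single-term query misses the `authenticate` function entirely, while the OR query surfaces both, which is the behavior the trade-off describes: BM25 matches terms, not meanings, so synonyms have to be supplied in the query.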
Capabilities:
- 2ms query latency
- 2k-12k files/sec indexing (6k files in 0.5s)
- 200-700 tokens/query
- Full UTF-8 support (emoji, CJK, special characters)
- 14 MCP tools for coding agents (Claude, Codex, etc.) (reference)
Size:
- ~10k lines of Rust source code (and another ~10k LoC test code).
- 2 binaries (cli and mcp) each at ~8MB.
Positioning: Complements structural tools (Serena MCP) with content search. Coding agents learn tool selection quickly:
- grep/ripgrep - Exact regex patterns, exhaustive matches, small codebases
- Shebe - Ranked results, large codebases (1k+ files), polyglot search, boolean queries
- Serena - Symbol refactoring, AST-aware edits, type-safe renaming
Alternatives: Cloud solutions like turbopuffer and nia come at a premium. Shebe is a free, local-only alternative. See WHY_SHEBE.md for benchmarks.
- Quick Start
- Common Tasks
- Configuration
- Documentation
- Performance
- Architecture
- Troubleshooting
- Project Status
- License
- Contributing
# Download the latest release
export SHEBE_VERSION=0.5.6-rc3
curl -LO "https://gitlab.com/api/v4/projects/75748935/packages/generic/shebe/${SHEBE_VERSION}/shebe-v${SHEBE_VERSION}-linux-x86_64.tar.gz"
curl -LO "https://gitlab.com/api/v4/projects/75748935/packages/generic/shebe/${SHEBE_VERSION}/shebe-v${SHEBE_VERSION}-linux-x86_64.tar.gz.sha256"
# Verify checksum
sha256sum -c shebe-v${SHEBE_VERSION}-linux-x86_64.tar.gz.sha256
# Extract and install
tar -xzf shebe-v${SHEBE_VERSION}-linux-x86_64.tar.gz
sudo mv shebe shebe-mcp /usr/local/bin/
# Check version
shebe --version

# Clone a test repository
git clone --depth 1 https://github.com/envoyproxy/envoy.git ~/envoy
# Index it (creates session "envoy-v1")
shebe index-repository ~/envoy envoy-v1
# Output: Indexed 8,234 files (12,847 chunks) in 2.1s

# Search for access log formatting
shebe search-code envoy-v1 "accesslog format"

Results for "accesslog format" in envoy-v1 (top 10):
1. source/extensions/access_loggers/common/access_log_base.h [0.847]
class AccessLogBase : public AccessLog::Instance {
void formatAccessLog(...);
2. source/common/formatter/substitution_formatter.cc [0.823]
SubstitutionFormatter::format(const StreamInfo& info) {
# Find all references to SubstitutionFormatter
shebe find-references envoy-v1 SubstitutionFormatter --symbol-type type

References to "SubstitutionFormatter" (type) - 23 found:
HIGH CONFIDENCE (18):
source/common/formatter/substitution_formatter.h:45
class SubstitutionFormatter : public Formatter {
source/extensions/access_loggers/file/file_access_log.cc:28
std::unique_ptr<SubstitutionFormatter> formatter_;
...
For detailed setup, see INSTALLATION.md.
Quick links to accomplish specific goals:
| Task | Tool | Guide |
|---|---|---|
| Rename a symbol safely | find_references | Reference |
| Search polyglot codebase | search_code | Reference |
| Explore unfamiliar repo | index_repository + search_code | Quick Start |
| Find files by pattern | find_file | Reference |
| View file with context | read_file or preview_chunk | Reference |
| Update stale index | reindex_session | Reference |

| Variable | Default | Description |
|---|---|---|
| SHEBE_INDEX_DIR | ~/.local/state/shebe | Session storage location |
| SHEBE_CHUNK_SIZE | 512 | Characters per chunk (100-2000) |
| SHEBE_OVERLAP | 64 | Overlap between chunks |
| SHEBE_DEFAULT_K | 10 | Default search results count |
| SHEBE_MAX_K | 100 | Maximum search results allowed |
Create shebe.toml in your working directory or ~/.config/shebe/shebe.toml:
[indexing]
chunk_size = 512
overlap = 64
max_file_size = 10485760 # 10MB
[search]
default_k = 10
max_k = 100

See CONFIGURATION.md for complete reference.
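To make the `chunk_size` and `overlap` settings concrete, here is a minimal sketch of fixed-size character chunking with overlap. It assumes a plain sliding window that advances by `chunk_size - overlap` characters; Shebe's actual chunker is not described here and may differ (for example, in how it treats file or line boundaries). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into (start_offset, chunk) pairs using a sliding character window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # Each new chunk starts this many characters later.
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # This chunk already reaches the end of the text.
    return chunks


# A 1,000-character sample with the default settings (512 / 64).
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
chunks = chunk_text(sample)

print([start for start, _ in chunks])       # [0, 448, 896]
print([len(chunk) for _, chunk in chunks])  # [512, 512, 104]
```

Under this assumption, the last 64 characters of each chunk reappear at the start of the next one, so a match that straddles a chunk boundary still falls entirely inside at least one chunk. A larger overlap lowers the chance of splitting a relevant passage, at the cost of more chunks to index.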
- INSTALLATION.md - Installation and setup guide
- Quick Start Guide - 5-minute setup for Claude Code
- MCP Tools Reference - Complete API for all 14 tools
- CONFIGURATION.md - All configuration options
- Performance Benchmarks - Detailed performance data
- ARCHITECTURE.md - Developer guide (where/how to change code)
- CONTRIBUTING.md - How to contribute
- CODE_OF_CONDUCT.md - Community guidelines
- SECURITY.md - Security policy and reporting
Validated on Istio (5,605 files, Go-heavy) and OpenEMR (6,364 files, PHP polyglot):
| Metric | Result |
|---|---|
| Query latency | 2ms (consistent across all query types) |
| Indexing (Istio) | 11,210 files/sec (0.5s for 5,605 files) |
| Indexing (OpenEMR) | 1,928 files/sec (3.3s for 6,364 files) |
| Token usage | 210-650 tokens/query |
| Polyglot coverage | 11 file types in single query |
See docs/Performance.md for detailed benchmarks.
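The indexing rates in the table follow directly from the file counts and wall-clock times listed alongside them. A short check of that arithmetic, using only the figures from the table (the helper name `files_per_second` is hypothetical):

```python
def files_per_second(files, seconds):
    """Indexing throughput: files indexed divided by elapsed seconds."""
    return files / seconds


# (file count, indexing time in seconds) as reported in the table above.
runs = {"Istio": (5605, 0.5), "OpenEMR": (6364, 3.3)}

for name, (files, seconds) in runs.items():
    print(f"{name}: {files_per_second(files, seconds):,.0f} files/sec")
# Istio: 11,210 files/sec
# OpenEMR: 1,928 files/sec
```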
See ARCHITECTURE.md for developer guide.
| Issue | Cause | Solution |
|---|---|---|
| "Session not found" | Session doesn't exist or typo | Run list_sessions to see available sessions |
| "Schema version mismatch" | Session from older Shebe version | Run upgrade_session to migrate |
| Slow indexing | Disk I/O or large files | Exclude node_modules/, target/, check disk |
| No search results | Empty session or wrong query | Verify with get_session_info, check query syntax |
| "File not found" in read_file | File deleted since indexing | Run reindex_session to update |
| High token usage | Too many results | Reduce k parameter (default: 10) |
For detailed troubleshooting, see docs/guides/mcp-setup-guide.md.
Version: v0.5.X
Status: Release Candidate
Testing: 76% coverage
Next: Pagination for list_dir and read_file when more than 500 files match a search term
See CHANGELOG.md for version history.
See LICENSE.
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.