Sourcebot MCP #292
Walkthrough
This update introduces a new MCP server package (
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant MCPServer as MCP Server (@sourcebot/mcp)
participant ClientModule as Client Module
participant RemoteAPI as Remote API
User->>MCPServer: search_code / list_repos / get_file_source request
MCPServer->>ClientModule: Call search / listRepos / getFileSource
ClientModule->>RemoteAPI: HTTP request with headers and payload
RemoteAPI-->>ClientModule: JSON response
ClientModule-->>MCPServer: Validated response or ServiceError
MCPServer-->>User: Formatted result or error message
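The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the `ServiceError` fields, the `ListReposResponse` shape, and the synchronous `RemoteCall` stand-in for the HTTP request are all assumptions, since the review does not show them.

```typescript
// Assumed shape; the real ServiceError fields are not shown in this review.
interface ServiceError {
  statusCode: number;
  errorCode: string;
  message: string;
}

// Assumed response shape for the list_repos tool.
interface ListReposResponse {
  repos: { name: string; url: string }[];
}

// Type guard: true only when every assumed ServiceError field is present.
function isServiceError(data: unknown): data is ServiceError {
  return (
    typeof data === "object" &&
    data !== null &&
    "statusCode" in data &&
    "errorCode" in data &&
    "message" in data
  );
}

// Stand-in for the HTTP request; returns already-parsed JSON.
type RemoteCall = () => unknown;

// Client module step: return either the response or the ServiceError.
function listRepos(remote: RemoteCall): ListReposResponse | ServiceError {
  const json = remote();
  if (isServiceError(json)) return json;
  // Real code would validate against a schema instead of casting.
  return json as ListReposResponse;
}

// Server step: format the result or the error message for the user.
function handleListRepos(remote: RemoteCall): string {
  const response = listRepos(remote);
  if (isServiceError(response)) {
    return `Error: ${response.message}`;
  }
  return response.repos.map(r => `id: ${r.name}\nurl: ${r.url}`).join("\n\n");
}
```

Keeping the error as a returned value rather than a thrown exception lets the server turn it into a readable tool response.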
Actionable comments posted: 10
🔭 Outside diff range comments (1)
packages/mcp/README.md (1)
1-5: 🛠️ Refactor suggestion
Enhance README with installation and usage examples.
The current README is very minimal. Consider adding sections such as:
- Installation instructions (`npm install @sourcebot/mcp`)
- Quickstart example demonstrating how to import and start the MCP server
- Environment variable configuration
- API reference for the provided tools (`search_code`, `list_repos`, `get_file_source`)
Would you like me to draft a more detailed README template?
🧹 Nitpick comments (13)
packages/web/src/features/search/types.ts (1)
1-1: Uniform comment style.
The `// @NOTE :` prefix deviates from common comment conventions. Consider standardizing it:
-// @NOTE : Please keep this file in sync with @sourcebot/mcp/src/types.ts
+// NOTE: Keep this file in sync with @sourcebot/mcp/src/types.ts
This minor nit improves readability.
packages/mcp/.gitignore (1)
1-2: Remove trailing whitespace and consider additional ignores.
There's an extra space after `node_modules/`. Also, you may want to ignore environment files or logs:
-dist/
-node_modules/
+dist/
+node_modules/
+# Ignore environment config
+.env
+# Ignore local logs
+*.log
packages/web/src/features/search/schemas.ts (1)
1-1: Consider a more sustainable approach to schema synchronization.
While the comment is helpful, relying on manual synchronization between schema files can lead to drift over time. Consider extracting shared schemas into a common location or implementing an automated validation process to ensure consistency.
packages/mcp/src/env.ts (2)
4-4: Exported schema could be more restrictive.
The `numberSchema` is currently exported, but it seems to be an implementation detail that could be kept private to this module.
Consider making this schema private to the module if it's not intended to be used externally:
-export const numberSchema = z.coerce.number();
+const numberSchema = z.coerce.number();
11-11: Consider using more descriptive variable names.
The name `DEFAULT_MINIMUM_TOKENS` doesn't clearly convey what these tokens represent in the context of Sourcebot.
Consider renaming to something more specific, such as `DEFAULT_MINIMUM_TOKENS_PER_SEARCH`, or adding a more detailed comment explaining what these tokens represent and how they affect the search functionality.
packages/mcp/tsconfig.json (1)
27-27: Consider using a broader include pattern.
The current include pattern only targets `src/index.ts`, which might become limiting as the package grows.
Consider using a broader pattern to include all TypeScript files in the src directory:
- "include": ["src/index.ts"]
+ "include": ["src/**/*.ts"]
packages/mcp/package.json (2)
7-11: Add a start script for production usage.
The package has build and development scripts, but no explicit start script for production usage.
Add a start script that can be used in production environments:
 "scripts": {
   "build": "tsc",
   "dev": "node ./dist/index.js",
-  "build:watch": "tsc-watch --preserveWatchOutput"
+  "build:watch": "tsc-watch --preserveWatchOutput",
+  "start": "node ./dist/index.js"
 },
3-3: Add TODO for package versioning strategy.
According to the PR objectives, this package will be published to npm. It's important to have a clear versioning strategy.
Consider adding a TODO comment about versioning strategy for npm publication. Would you like me to help create a GitHub Action workflow for automating the publication process to npm?
packages/mcp/src/index.ts (2)
75-75: Redundant log noise – consider removing one of the “Executing search request” prints
`client.search()` already prints the same line; keeping both doubles the noise for every call and makes log parsing harder.
117-118: Token count should be rounded up to prevent silent overflow
`text.length / 4` yields a float; rounding up ensures the estimate never undercounts, so the response cannot silently exceed the token budget.
- const tokens = text.length / 4;
+ const tokens = Math.ceil(text.length / 4);
packages/mcp/src/types.ts (1)
1-3: Types duplicated across packages – extract to a shared module
Both `@sourcebot/mcp` and `@sourcebot/web` declare identical schemas/types. Drift is inevitable; consider publishing a small shared package (e.g. `@sourcebot/shared-schemas`) and importing from there instead of relying on comments.
packages/mcp/src/schemas.ts (2)
1-3: Schema duplication mirrors the issue highlighted in `types.ts`
Keeping this file “in sync” via a comment is brittle. A shared schema package (or generating types from a single source) would remove manual effort and avoid accidental mismatches.
85-90: `branches` could be optional for monorepos or bare repositories
Requiring `branches: string[]` for every repository will reject APIs that omit branch data.
If the backend treats branches as optional, change the field to `.array(z.string()).optional()`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (21)
- Makefile (1 hunks)
- package.json (1 hunks)
- packages/backend/package.json (1 hunks)
- packages/mcp/.gitignore (1 hunks)
- packages/mcp/README.md (1 hunks)
- packages/mcp/package.json (1 hunks)
- packages/mcp/src/client.ts (1 hunks)
- packages/mcp/src/env.ts (1 hunks)
- packages/mcp/src/index.ts (1 hunks)
- packages/mcp/src/schemas.ts (1 hunks)
- packages/mcp/src/types.ts (1 hunks)
- packages/mcp/src/utils.ts (1 hunks)
- packages/mcp/tsconfig.json (1 hunks)
- packages/schemas/package.json (1 hunks)
- packages/web/package.json (1 hunks)
- packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
- packages/web/src/app/[domain]/search/page.tsx (1 hunks)
- packages/web/src/features/search/listReposApi.ts (0 hunks)
- packages/web/src/features/search/schemas.ts (2 hunks)
- packages/web/src/features/search/searchApi.ts (5 hunks)
- packages/web/src/features/search/types.ts (1 hunks)
💤 Files with no reviewable changes (1)
- packages/web/src/features/search/listReposApi.ts
🧰 Additional context used
🧬 Code Graph Analysis (4)
packages/mcp/src/utils.ts (1)
- packages/mcp/src/types.ts (1): ServiceError (32-32)
packages/mcp/src/index.ts (4)
- packages/mcp/src/env.ts (2): numberSchema (4-4), env (6-21)
- packages/mcp/src/client.ts (3): search (6-22), listRepos (24-38), getFileSource (40-55)
- packages/mcp/src/utils.ts (2): isServiceError (9-15), base64Decode (4-7)
- packages/mcp/src/types.ts (1): TextContent (30-30)
packages/mcp/src/schemas.ts (1)
- packages/web/src/features/search/schemas.ts (9): locationSchema (4-11), rangeSchema (13-16), symbolSchema (18-21), searchRequestSchema (23-32), searchResponseSchema (34-82), repositorySchema (84-89), listRepositoriesResponseSchema (91-93), fileSourceRequestSchema (95-99), fileSourceResponseSchema (101-104)
packages/mcp/src/types.ts (2)
- packages/web/src/features/search/schemas.ts (8): searchRequestSchema (23-32), searchResponseSchema (34-82), rangeSchema (13-16), locationSchema (4-11), symbolSchema (18-21), listRepositoriesResponseSchema (91-93), fileSourceRequestSchema (95-99), fileSourceResponseSchema (101-104)
- packages/mcp/src/schemas.ts (9): searchRequestSchema (24-33), searchResponseSchema (35-83), rangeSchema (14-17), locationSchema (5-12), symbolSchema (19-22), listRepositoriesResponseSchema (92-94), fileSourceRequestSchema (96-100), fileSourceResponseSchema (102-105), serviceErrorSchema (107-111)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (13)
packages/backend/package.json (1)
54-54: Dependency version bump looks good.
Updating `zod` to `^3.24.3` aligns this package with others in the monorepo and ensures consistent schema validation behavior.
packages/web/package.json (1)
138-138: Consistent `zod` version update.
Bumping `zod` to `^3.24.3` maintains uniformity across packages. Ensure you regenerate the lockfile to capture this change.
Makefile (1)
32-33: LGTM! New package cleanup added.
Added removal of node_modules and dist directories for the new MCP package during cleanup - this keeps the `clean` target consistent with other packages.
packages/schemas/package.json (1)
8-8: Good developer experience improvement!
Adding a watch script that automatically regenerates TypeScript schemas when JSON schema files change is a great developer experience enhancement. The nodemon configuration watching the `../../schemas` directory with proper file extension filtering is well implemented.
Also applies to: 15-15
package.json (1)
9-9: Excellent workflow integration for the new packages.
The updated dev script now properly includes the new MCP package and schemas package in the development workflow, running them in parallel with the existing services. This ensures that all packages are rebuilt automatically when changes are made.
Also applies to: 14-15
packages/web/src/features/search/schemas.ts (1)
65-65: Good improvement to simplify URL handling.
Adding a direct `url` property to each file object streamlines the search result handling by eliminating the need for client-side URL construction from templates. This is a more straightforward approach that reduces complexity.
packages/mcp/src/env.ts (1)
6-21: Good implementation of environment configuration.
The environment configuration is well-structured with proper validation using zod schemas. Default values are provided for all variables, making the configuration robust even without explicit environment variables.
packages/web/src/app/[domain]/search/page.tsx (1)
144-144: ✅ Verification successful
Removal of repoUrlTemplates is consistent with architecture changes.
The removal of `repoUrlTemplates` from the destructured object aligns with the broader architectural change to use direct URLs instead of template-based URLs.
To verify this change doesn't break functionality, run:
🏁 Script executed:
#!/bin/bash
# Check if fileMatch.url is correctly used in components
rg -A 1 -B 1 "fileMatch\.url" packages/web/src/
Length of output: 424
🏁 Script executed:
#!/bin/bash
# Locate the search page
fd "page.tsx" packages/web/src/app | grep "/search/page.tsx$"
Length of output: 107
🏁 Script executed:
#!/bin/bash
echo "🔍 Checking for any remaining references to repoUrlTemplates:"
rg -n repoUrlTemplates packages/web/src || echo "→ No occurrences found"
echo
echo "🔍 Viewing the useMemo return signature in search/page.tsx:"
file=$(fd page.tsx packages/web/src/app | grep "/search/page.tsx$")
rg -n "useMemo" -A 20 -B 5 "$file"
Length of output: 3750
🏁 Script executed:
#!/bin/bash
# Verify that SearchResultFile includes a `url` property
rg -n "export.*SearchResultFile" -A 10 -B 0 packages/web/src
Length of output: 961
🏁 Script executed:
#!/bin/bash
echo "🔍 Searching for searchResponseSchema definition and its properties:"
rg -n "searchResponseSchema" -A 20 -B 0 packages/web/src
Length of output: 6739
🏁 Script executed:
#!/bin/bash
echo "🔍 Full searchResponseSchema definition:"
rg -n "export const searchResponseSchema" -A 200 packages/web/src/features/search/schemas.ts
Length of output: 2593
Removal of `repoUrlTemplates` verified – no breaking changes detected
- No remaining references to `repoUrlTemplates` in the codebase.
- `searchResponseSchema` defines a `url` property on each file, so `fileMatch.url` is guaranteed.
- `CodePreviewPanel` correctly uses `fileMatch.url` to build direct links.
All usages align with the updated schema; no further changes required.
packages/mcp/tsconfig.json (1)
1-28: Well-configured TypeScript setup for Node.js package.
The TypeScript configuration is well-structured for a modern Node.js package with appropriate target and module settings. Good job enabling strict type checking and declaration file generation.
packages/mcp/package.json (1)
12-25: Dependencies look appropriate for an MCP server.
The dependencies chosen are appropriate for implementing an MCP server with Express. Using `@t3-oss/env-core` with `zod` for environment validation is a good practice.
packages/mcp/src/utils.ts (1)
9-15: Type guard implementation is correct
The `isServiceError` type guard properly checks all required properties of a `ServiceError` object.
packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1)
49-49: URL handling simplification looks good
The change to directly use `fileMatch.url` for the link property simplifies the component by removing complex URL template handling logic.
packages/mcp/src/index.ts (1)
8-10: `base64Decode` relies on `atob`, which is not guaranteed in all Node versions
Node versions before 16 lack the global `atob` API, and the Node.js documentation marks it as legacy in favor of `Buffer`. Consider a pure-Node implementation:
export const base64Decode = (base64: string): string => {
    return Buffer.from(base64, 'base64').toString('utf8');
};
Please verify the minimal Node runtime you intend to support.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/mcp/src/index.ts (2)
65-67: Escape language filters before interpolating into the zoekt query
`languages.join(' or lang:')` directly injects user-supplied language strings into the regex query.
If a caller passes something containing regex metacharacters or whitespace (e.g. `C++`, `F#`, `Go (mod)`), the final query can break or behave unexpectedly.
- if (languages.length > 0) {
-     query += ` ( lang:${languages.join(' or lang:')} )`;
- }
+ if (languages.length > 0) {
+     const escapedLangs = languages.map(lang => escapeStringRegexp(lang));
+     query += ` ( lang:${escapedLangs.join(' or lang:')} )`;
+ }
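The escaping step can be sketched on its own as below. `escapeRegExp` is a local stand-in written for this example (the PR depends on the `escape-string-regexp` package instead), and `appendLanguageFilter` is a hypothetical helper name. Escaping handles regex metacharacters only; whitespace inside a language name, as in `Go (mod)`, would still need separate handling such as quoting.

```typescript
// Local stand-in for the escape-string-regexp package (assumption for this sketch).
function escapeRegExp(value: string): string {
  return value.replace(/[|\\{}()[\]^$+*?.]/g, "\\$&").replace(/-/g, "\\x2d");
}

// Hypothetical helper: append an escaped language filter to a query string.
function appendLanguageFilter(query: string, languages: string[]): string {
  if (languages.length === 0) return query;
  const escaped = languages.map(escapeRegExp);
  return `${query} ( lang:${escaped.join(" or lang:")} )`;
}

console.log(appendLanguageFilter("foo", ["C++", "Go"]));
```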
112-118: 🛠️ Refactor suggestion
Avoid unnecessary base-64 decoding when snippets are not requested
Correction: this suggestion does not apply. The concern was that the loop decodes every chunk even when `includeCodeSnippets === false`, but the current implementation already checks `includeCodeSnippets` before processing snippets, so decoding happens only when snippets are needed and no change is required:
if (includeCodeSnippets) {
    const snippets = file.chunks.map(chunk => {
        const content = base64Decode(chunk.content);
        return `\`\`\`\n${content}\n\`\`\``
    }).join('\n');
    text += `\n\n${snippets}`;
}
My apologies for the confusion.
🧹 Nitpick comments (1)
packages/mcp/src/index.ts (1)
122-124: Consider a more accurate token estimation method
The current token estimation uses a very approximate method (dividing by 4). While this might work as a rough estimate for some models, different tokenizers can produce significantly different token counts.
Consider implementing a more precise tokenization method or using a library specifically designed for token counting. Alternatively, you could add a comment indicating that this is a conservative estimate and might vary across different models.
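As a small illustration of the heuristic under discussion, the sketch below rounds the estimate up and documents the assumption in a comment. The 4-characters-per-token ratio is the rough figure used in the PR, not a property of any particular tokenizer, and `estimateTokens` is a hypothetical name.

```typescript
// Rough heuristic: assumes about 4 characters per token.
// Real tokenizers differ, so treat the result as an estimate only.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  // Round up so a partial token still counts as one.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

console.log(estimateTokens("abcde"));
```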
📜 Review details
📒 Files selected for processing (1)
- packages/mcp/src/index.ts (1 hunks)
🔇 Additional comments (5)
packages/mcp/src/index.ts (5)
1-16: Good server initialization and imports
The server setup is clean and well-structured, with appropriate imports and proper initialization of the MCP server with version information.
18-52: Well-documented tool description and parameters
The `search_code` tool has excellent documentation that clearly explains its purpose, usage rules, and parameter constraints. The parameter descriptions are comprehensive and include examples and constraints that will help consumers use the tool correctly.
150-175: Well-implemented repository listing tool
The `list_repos` tool provides a clean implementation with proper error handling and a consistent response format.
177-208: Well-structured file source retrieval tool
The `get_file_source` tool properly validates inputs, handles errors, and returns file content with relevant metadata.
212-221: Robust server initialization with error handling
The server initialization is well-implemented with proper error handling that logs failures and exits with a non-zero status code.
Actionable comments posted: 0
🧹 Nitpick comments (3)
packages/mcp/package.json (3)
7-11: Enhance the development workflow
The current `"dev"` script (`node ./dist/index.js`) requires a prior build. Consider using `tsx` (already in devDependencies) to run TypeScript source directly and speed up iteration.
Proposed diff:
 "scripts": {
-  "build": "tsc",
-  "dev": "node ./dist/index.js",
+  "build": "tsc",
+  "dev": "tsx src/index.ts",
   "build:watch": "tsc-watch --preserveWatchOutput"
 }
12-18: Review devDependencies for usage
You’ve added `tsx` but it isn’t referenced in any script yet (unless you adopt it in `"dev"`). If you don’t plan to use it, remove it to avoid unused dependencies.
19-25: Consider adding package metadata and publish settings
Currently, metadata fields like `license`, `description`, `keywords` are missing, and scoped packages default to restricted access. It’s best practice to include:
 {
+  "description": "MCP server for Sourcebot providing search_code, list_repos, and get_file_source tools.",
+  "license": "MIT",
+  "keywords": ["sourcebot","mcp","modelcontextprotocol","ai","agentic"],
+  "engines": { "node": ">=18" },
+  "files": ["dist"],
+  "publishConfig": { "access": "public" },
   "dependencies": {
     "@modelcontextprotocol/sdk": "^1.10.2",
     "@t3-oss/env-core": "^0.13.4",
     "escape-string-regexp": "^5.0.0",
     "express": "^5.1.0",
     "zod": "^3.24.3"
   },
This will ensure correct npm publication settings and richer metadata.
📜 Review details
📒 Files selected for processing (3)
- packages/mcp/.npmignore (1 hunks)
- packages/mcp/CHANGELOG.md (1 hunks)
- packages/mcp/package.json (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- packages/mcp/CHANGELOG.md
- packages/mcp/.npmignore
🔇 Additional comments (4)
packages/mcp/package.json (4)
2-3: Package name and version look appropriate
The scoped package name (`@sourcebot/mcp`) aligns with the monorepo structure, and the pre-release version (`0.1.1-rc.1`) follows semver conventions.
4-4: ES Module format enabled
Specifying `"type": "module"` is correct for an ESM-only package.
5-6: Entry points configured correctly
The `main` and `types` fields point to the expected build artifacts in `dist/`.
26-30: Repository configuration is correct
The `directory` field accurately targets the `packages/mcp` subfolder in the monorepo.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/mcp/src/index.ts (2)
67-69: Escape language filters before interpolating into the zoekt query
Language strings containing regex metacharacters or whitespace (e.g., `C++`, `F#`, `Go (mod)`) could break the final query or cause unexpected behavior when directly interpolated.
 if (languages.length > 0) {
-    query += ` ( lang:${languages.join(' or lang:')} )`;
+    const escapedLangs = languages.map(lang => escapeStringRegexp(lang));
+    query += ` ( lang:${escapedLangs.join(' or lang:')} )`;
 }
114-120: Avoid unnecessary base-64 decoding when snippets are not requested
The comment states that base64 decoding runs for every chunk even when code snippets are not included in the response, which can be expensive for large result sets. In the code shown, however, decoding already sits inside the `includeCodeSnippets` check, and the proposed diff changes indentation only:
 if (includeCodeSnippets) {
-    const snippets = file.chunks.map(chunk => {
-        const content = base64Decode(chunk.content);
-        return `\`\`\`\n${content}\n\`\`\``
-    }).join('\n');
-    text += `\n\n${snippets}`;
+    const snippets = file.chunks.map(chunk => {
+        const content = base64Decode(chunk.content);
+        return `\`\`\`\n${content}\n\`\`\``
+    }).join('\n');
+    text += `\n\n${snippets}`;
 }
🧹 Nitpick comments (2)
packages/mcp/src/index.ts (2)
125-125: Improve token estimation accuracy
The current token estimation (dividing by 4) is very rough and might not be accurate for all content types, especially code which can have different token densities.
Consider using a more sophisticated token counting approach, especially for code:
- const tokens = text.length / 4;
+ // More accurate token estimation, considering code has different token density
+ // This is still an approximation but accounts for code specifics better
+ const codeMultiplier = includeCodeSnippets ? 3.5 : 4; // Code tends to have more tokens per character
+ const tokens = Math.ceil(text.length / codeMultiplier);
For production use, you might want to consider using a dedicated tokenizer library that matches your LLM's tokenization algorithm.
166-171: Consider providing a more detailed response for repositories
The repository listing only includes ID and URL, which might be insufficient for users to make informed decisions.
 const content: TextContent[] = response.repos.map(repo => {
     return {
         type: "text",
-        text: `id: ${repo.name}\nurl: ${repo.url}`,
+        text: `id: ${repo.name}\nurl: ${repo.url}${repo.description ? `\ndescription: ${repo.description}` : ''}${repo.defaultBranch ? `\ndefault_branch: ${repo.defaultBranch}` : ''}`,
     }
 });
This assumes the repo objects might have additional properties like `description` and `defaultBranch` that could be useful to display. If these properties don't exist in your data model, you can omit them.
📜 Review details
📒 Files selected for processing (2)
- packages/mcp/package.json (1 hunks)
- packages/mcp/src/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/mcp/package.json
🔇 Additional comments (4)
packages/mcp/src/index.ts (4)
1-2: LGTM! Correctly implemented shebang for CLI execution
The shebang line enables this script to be directly executed when installed as a binary.
63-65: LGTM! Properly escaping repository IDs in the search query
Good practice to escape repository IDs with `escapeStringRegexp` to prevent query syntax issues.
127-132: LGTM! Appropriate token limit enforcement with truncation indication
The code properly tracks token usage and truncates the response when the limit is approached, while also informing the user that truncation has occurred.
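The pattern described here, accumulating results until a token budget is reached and then appending a notice, might look like the sketch below. It is not the PR's code: `collectWithinBudget`, the notice wording, and the 4-characters-per-token estimate are assumptions for illustration.

```typescript
interface TextContent {
  type: "text";
  text: string;
}

// Add entries until the next one would exceed the budget, then flag truncation.
function collectWithinBudget(texts: string[], maxTokens: number): TextContent[] {
  const content: TextContent[] = [];
  let usedTokens = 0;
  let truncated = false;

  for (const text of texts) {
    const tokens = Math.ceil(text.length / 4); // rough estimate, assumed ratio
    if (usedTokens + tokens > maxTokens) {
      truncated = true;
      break;
    }
    usedTokens += tokens;
    content.push({ type: "text", text });
  }

  if (truncated) {
    // Tell the caller that results were cut off rather than failing silently.
    content.push({
      type: "text",
      text: `Results truncated to stay under ${maxTokens} tokens.`,
    });
  }
  return content;
}
```

Appending the notice as its own entry keeps the individual results intact while still signalling that more matches exist.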
214-223: LGTM! Well-structured server startup with proper error handling
The server startup code is well-organized with appropriate error handling and process exit on failure.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/docs/more/mcp-server.mdx (1)
10-10: Fix indefinite articles and client naming consistency.
The sentence on line 10 uses the wrong article before vowel sounds and misnames VS Code. Update as follows:
-The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is a open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, Vscode, and others to have context over your entire codebase.
+The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is an MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, VS Code, and others to have context over your entire codebase.
🧰 Tools
🪛 LanguageTool
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...ntextprotocol.io/introduction) (MCP) is a open standard for providing context to ...(EN_A_VS_AN)
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ....com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interfa...(EN_A_VS_AN)
📜 Review details
📒 Files selected for processing (4)
- docs/docs.json (2 hunks)
- docs/docs/more/mcp-server.mdx (1 hunks)
- docs/docs/more/search-contexts.mdx (1 hunks)
- packages/mcp/src/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/docs/more/search-contexts.mdx
- docs/docs.json
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/mcp/src/index.ts
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/mcp/README.md (1)
1-1: Correct Bitbucket branding.
"BitBucket" in the header should use the official capitalization "Bitbucket" for consistency with product branding and other mentions.
Apply this diff:
-# Sourcebot MCP - search code on GitHub, GitLab, BitBucket, and more
+# Sourcebot MCP - search code on GitHub, GitLab, Bitbucket, and more
📜 Review details
📒 Files selected for processing (1)
- packages/mcp/README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
packages/mcp/README.md
[typographical] ~11-~11: Consider adding a comma here.
Context: ...'t have checked out. ## Getting Started Please follow [these docs](https://docs.source...
(PLEASE_COMMA)
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/docs/more/mcp-server.mdx (1)
83-90: Misplaced `env` in VS Code configuration.
The `env` block should be nested inside the `"sourcebot"` server object, not alongside it under `"servers"`. Otherwise VS Code will not pick up the `SOURCEBOT_HOST` setting.
 "servers": {
-  "sourcebot": {
+  "sourcebot": {
     "type": "stdio",
     "command": "npx",
     "args": ["-y", "@sourcebot/mcp@latest"],
+    "env": {
+      "SOURCEBOT_HOST": "http://localhost:3000"
+    }
   }
-  },
-  "env": {
-    "SOURCEBOT_HOST": "http://localhost:3000"
-  }
 }
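To make the intended nesting explicit, the corrected configuration can be modelled as a typed object. The `StdioServer` and `McpConfig` interfaces are hypothetical types written for this sketch, not VS Code's official schema; the values are the ones from the diff above.

```typescript
// Hypothetical types for illustration; not VS Code's official schema.
interface StdioServer {
  type: "stdio";
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  servers: Record<string, StdioServer>;
}

// env lives inside the "sourcebot" server entry, not beside it.
const config: McpConfig = {
  servers: {
    sourcebot: {
      type: "stdio",
      command: "npx",
      args: ["-y", "@sourcebot/mcp@latest"],
      env: { SOURCEBOT_HOST: "http://localhost:3000" },
    },
  },
};

console.log(JSON.stringify(config, null, 2));
```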
🧹 Nitpick comments (5)
docs/docs/more/mcp-server.mdx (1)
10-11: Grammar: Use "an" before vowel sounds.
In the introduction, replace "a open standard" with "an open standard" and "a MCP server" with "an MCP server" for correct usage.
- The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is a open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, Vscode, and others to have context over your entire codebase.
+ The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is an MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, VS Code, and others to have context over your entire codebase.
🧰 Tools
🪛 LanguageTool
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...ntextprotocol.io/introduction) (MCP) is a open standard for providing context to ...(EN_A_VS_AN)
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ....com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interfa...(EN_A_VS_AN)
packages/mcp/README.md (4)
9-9: Preposition correction: replace "on" with "in".
Use "in a wider code context" instead of "on a wider code context" for smoother phrasing.
- Some use cases where precise search on a wider code context can help:
+ Some use cases where precise search in a wider code context can help:
🧰 Tools
🪛 LanguageTool
[uncategorized] ~9-~9: The preposition “in” seems more likely in this position than the preposition “on”.
Context: ...ly. Some use cases where precise search on a wider code context can help: - Enric...(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_ON_IN)
21-21: Typo: remove duplicate word.
There's a repeated "like" in this bullet.
- - Building custom LLM horizontal agents like like compliance auditing agents, migration agents, etc.
+ - Building custom LLM horizontal agents like compliance auditing agents, migration agents, etc.
🧰 Tools
🪛 LanguageTool
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...- Building custom LLM horizontal agents like like compliance auditing agents, migration a...(ENGLISH_WORD_REPEAT_RULE)
28-28: Style: Standardize spelling to "Node.js".
Replace "Node.JS" with "Node.js" for consistency with official branding.
- 1. Install Node.JS >= v18.0.0.
+ 1. Install Node.js >= v18.0.0.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: The official spelling of this programming framework is “Node.js”.
Context: ... api"_ ## Getting Started 1. Install Node.JS >= v18.0.0. 2. (optional) Spin up a So...(NODE_JS)
32-32: Grammar: correct verb usage "fall back".
Use the verb phrase "fall back" instead of the noun "fallback".
- If a host is not provided, then the server will fallback to using the demo instance hosted at https://demo.sourcebot.dev.
+ If a host is not provided, then the server will fall back to using the demo instance hosted at https://demo.sourcebot.dev.
🧰 Tools
🪛 LanguageTool
[grammar] ~32-~32: The word “fallback” is a noun. The verb is spelled with a space.
Context: ...t is not provided, then the server will fallback to using the demo instance hosted at ht...(NOUN_VERB_CONFUSION)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/docs/more/mcp-server.mdx (1 hunks)
- packages/mcp/README.md (1 hunks)
- packages/mcp/package.json (1 hunks)
- packages/mcp/src/env.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/mcp/src/env.ts
- packages/mcp/package.json
🧰 Additional context used
🪛 LanguageTool
packages/mcp/README.md
[style] ~211-~211: ‘is something that is’ might be wordy. Consider a shorter alternative.
Context: ...enarios when the agent is searching for is something that is super precise and well-represented in t...
(EN_WORDINESS_PREMIUM_IS_SOMETHING_THAT_IS)
[misspelling] ~211-~211: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...e code (e.g., a specific function name, a error string, etc.). It is not-so-great...
(EN_A_VS_AN)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ..., etc.). It is not-so-great for fuzzy searches where the objective is to find some loo...
(AI_HYDRA_LEO_MISSING_COMMA)
[misspelling] ~211-~211: This word is normally spelled as one.
Context: ...date).*(jwt|token)"`), but often yields sub-optimal search results that aren't related. Too...
(EN_COMPOUNDS_SUB_OPTIMAL)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
Actionable comments posted: 0
🧹 Nitpick comments (8)
CHANGELOG.md (1)
12-12: Changelog: Correct verb usage ('Check out' vs 'Checkout')
The imperative verb form should be "Check out" rather than the noun "Checkout" for better grammar and clarity.
- Added the Sourcebot Model Context Protocol (MCP) server in [packages/mcp](./packages/mcp/README.md) to allow LLMs to interface with Sourcebot. Checkout the npm package [here](https://www.npmjs.com/package/@sourcebot/mcp). [#292]
+ Added the Sourcebot Model Context Protocol (MCP) server in [packages/mcp](./packages/mcp/README.md) to allow LLMs to interface with Sourcebot. Check out the npm package [here](https://www.npmjs.com/package/@sourcebot/mcp). [#292]
🧰 Tools
🪛 LanguageTool
[grammar] ~12-~12: This sentence should probably be started with a verb instead of the noun ‘Checkout’. If not, consider inserting a comma for better clarity.
Context: ...allow LLMs to interface with Sourcebot. Checkout the npm package [here](https://www.npmj...(SENT_START_NN_DT)
packages/mcp/README.md (7)
1-1: Consistent brand casing for Bitbucket
The title uses "BitBucket", whereas later sections list "Bitbucket Cloud" and "Bitbucket Data Center". For consistency and correct branding, use "Bitbucket".
- # Sourcebot MCP - Blazingly fast agentic code search for GitHub, GitLab, BitBucket, and more
+ # Sourcebot MCP - Blazingly fast agentic code search for GitHub, GitLab, Bitbucket, and more
9-9: Use 'in' instead of 'on' for proper phrasing
The phrase "precise search on a wider code context" is more idiomatically expressed as "precise search in a wider code context".
- Some use cases where precise search on a wider code context can help:
+ Some use cases where precise search in a wider code context can help:
🧰 Tools
🪛 LanguageTool
[uncategorized] ~9-~9: The preposition “in” seems more likely in this position than the preposition “on”.
Context: ...ly. Some use cases where precise search on a wider code context can help: - Enric...(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_ON_IN)
21-21: Remove duplicate word 'like'
There's a duplicated "like" in this line.
- - Building custom LLM horizontal agents like like compliance auditing agents, migration agents, etc.
+ - Building custom LLM horizontal agents like compliance auditing agents, migration agents, etc.
🧰 Tools
🪛 LanguageTool
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...- Building custom LLM horizontal agents like like compliance auditing agents, migration a...(ENGLISH_WORD_REPEAT_RULE)
28-28: Correct casing for Node.js
The official spelling is "Node.js" (not "Node.JS").
- 1. Install Node.JS >= v18.0.0.
+ 1. Install Node.js >= v18.0.0.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: The official spelling of this programming framework is “Node.js”.
Context: ... api"_ ## Getting Started 1. Install Node.JS >= v18.0.0. 2. (optional) Spin up a So...(NODE_JS)
30-33: Clarify environment variable usage and correct 'fallback' verb
- "fallback" should be "fall back" when used as a verb.
- Specify that `SOURCEBOT_HOST` is an environment variable.
- 2. (optional) Spin up a Sourcebot instance by following [this guide](https://docs.sourcebot.dev/self-hosting/overview). The host url of your instance (e.g., `http://localhost:3000`) is passed to the MCP server via the `SOURCEBOT_HOST` url.
-
- If a host is not provided, then the server will fallback to using the demo instance hosted at https://demo.sourcebot.dev. You can see the list of repositories indexed [here](https://demo.sourcebot.dev/~/repos). Add additional repositories by [opening a PR](https://github.com/sourcebot-dev/sourcebot/blob/main/demo-site-config.json).
+ 2. (optional) Spin up a Sourcebot instance by following [this guide](https://docs.sourcebot.dev/self-hosting/overview). The host of your instance (e.g., `http://localhost:3000`) is configured via the `SOURCEBOT_HOST` environment variable.
+
+ If a host is not provided, the server will fall back to the demo instance at https://demo.sourcebot.dev. You can see indexed repositories [here](https://demo.sourcebot.dev/~/repos), and add more via [PR](https://github.com/sourcebot-dev/sourcebot/blob/main/demo-site-config.json).
🧰 Tools
🪛 LanguageTool
[grammar] ~32-~32: The word “fallback” is a noun. The verb is spelled with a space.
Context: ...t is not provided, then the server will fallback to using the demo instance hosted at ht...(NOUN_VERB_CONFUSION)
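The fallback behaviour discussed in this comment can be sketched in a few lines of TypeScript. This is only an illustration under assumptions: `resolveSourcebotHost` and `DEMO_HOST` are hypothetical names, and the actual logic in `packages/mcp/src/env.ts` may well differ.

```typescript
// Hypothetical sketch: these names are illustrative and not taken from the package.
const DEMO_HOST = "https://demo.sourcebot.dev";

// Returns the configured host, or the public demo instance when
// SOURCEBOT_HOST is unset or blank.
function resolveSourcebotHost(env: Record<string, string | undefined>): string {
  const configured = env["SOURCEBOT_HOST"];
  const host = configured === undefined ? "" : configured.trim();
  return host.length > 0 ? host : DEMO_HOST;
}
```

Passing the environment in as a plain object (rather than reading a global) keeps the sketch easy to test: `resolveSourcebotHost({ SOURCEBOT_HOST: "http://localhost:3000" })` returns the local instance, while an empty object yields the demo URL.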
34-35: Add a generic installation step
Before diving into per-client configurations, consider showing users how to install the package globally (or locally) via npm/yarn.
  ## Getting Started
- 3. Install `@sourcebot/mcp` into your MCP client:
+ 3. Install the MCP server package:
+    ```bash
+    npm install -g @sourcebot/mcp
+    # or
+    yarn global add @sourcebot/mcp
+    ```
+ 4. Integrate into your MCP client (e.g., see Cursor, Windsurf, VS Code below).
211-216: Fix grammatical issues in the Semantic Search section
This paragraph contains several small errors:
- Remove the extra "is" in "searching for is something that is".
- Change "a error string" to "an error string".
- Use "suboptimal" instead of "sub-optimal".
- Add a missing comma after "etc.)".
- Currently, Sourcebot only supports regex-based code search (powered by [zoekt](https://github.com/sourcegraph/zoekt) under the hood). It is great for scenarios when the agent is searching for is something that is super precise and well-represented in the source code (e.g., a specific function name, a error string, etc.). It is not-so-great for _fuzzy_ searches where the objective is to find some loosely defined _category_ or _concept_ in the code (e.g., find code that verifies JWT tokens).
+ Currently, Sourcebot only supports regex-based code search (powered by [zoekt](https://github.com/sourcegraph/zoekt) under the hood). It works well for scenarios when the agent is searching for something super precise and well represented in the source code (e.g., a specific function name, an error string, etc.), but it is not so great for _fuzzy_ searches where the objective is to find some loosely defined _category_ or _concept_ (e.g., code that verifies JWT tokens).
🧰 Tools
🪛 LanguageTool
[style] ~211-~211: ‘is something that is’ might be wordy. Consider a shorter alternative.
Context: ...enarios when the agent is searching for is something that is super precise and well-represented in t...(EN_WORDINESS_PREMIUM_IS_SOMETHING_THAT_IS)
[misspelling] ~211-~211: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...e code (e.g., a specific function name, a error string, etc.). It is not-so-great...(EN_A_VS_AN)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ..., etc.). It is not-so-great for fuzzy searches where the objective is to find some loo...(AI_HYDRA_LEO_MISSING_COMMA)
[misspelling] ~211-~211: This word is normally spelled as one.
Context: ...date).*(jwt|token)"`), but often yields sub-optimal search results that aren't related. Too...(EN_COMPOUNDS_SUB_OPTIMAL)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.md (1 hunks)
- packages/mcp/README.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
This reverts commit 9f8cf8d.
66e62ce to cc85e82
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/mcp/README.md (1)
86-112: ⚠️ Potential issue
Fix VS Code configuration format - ensure consistency with other clients.
The VS Code configuration in the README matches the one in the documentation file, which has a misplaced `env` property. Update to ensure consistency with other client configurations.
  {
    "mcp": {
      "servers": {
        "sourcebot": {
          "type": "stdio",
          "command": "npx",
-         "args": ["-y", "@sourcebot/mcp@latest"]
+         "args": ["-y", "@sourcebot/mcp@latest"],
+         // Optional - if not specified, https://demo.sourcebot.dev is used
+         "env": {
+           "SOURCEBOT_HOST": "http://localhost:3000"
+         }
        }
-       // Optional - if not specified, https://demo.sourcebot.dev is used
-       "env": {
-         "SOURCEBOT_HOST": "http://localhost:3000"
-       }
      }
    }
  }
🧹 Nitpick comments (7)
packages/mcp/src/schemas.ts (1)
35-83: Comprehensive search response schema with good organization.
The search response schema is detailed and captures all the necessary information from the search engine. The nested structure for files, chunks, and symbols is well-organized.
Note that there's a slight reordering of properties compared to the web schema - in this schema, `url` comes after `language`, while in the web schema it appears in a different position. While this doesn't affect functionality, maintaining the same field order could help when comparing the two files.
CHANGELOG.md (1)
15-15: Fix grammatical error in the changelog entry.
The sentence has a small grammatical error in the phrasing.
-Added the Sourcebot Model Context Protocol (MCP) server in [packages/mcp](./packages/mcp/README.md) to allow LLMs to interface with Sourcebot. Checkout the npm package [here](https://www.npmjs.com/package/@sourcebot/mcp). [#292](https://github.com/sourcebot-dev/sourcebot/pull/292)
+Added the Sourcebot Model Context Protocol (MCP) server in [packages/mcp](./packages/mcp/README.md) to allow LLMs to interface with Sourcebot. Check out the npm package [here](https://www.npmjs.com/package/@sourcebot/mcp). [#292](https://github.com/sourcebot-dev/sourcebot/pull/292)
🧰 Tools
🪛 LanguageTool
[grammar] ~15-~15: This sentence should probably be started with a verb instead of the noun ‘Checkout’. If not, consider inserting a comma for better clarity.
Context: ...allow LLMs to interface with Sourcebot. Checkout the npm package [here](https://www.npmj...(SENT_START_NN_DT)
docs/docs/more/mcp-server.mdx (1)
10-10: Fix grammar: "a open standard" should be "an open standard".
The article "a" should be "an" when it precedes a word starting with a vowel sound.
-The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is a open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, Vscode, and others to have context over your entire codebase.
+The [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open standard for providing context to LLMs. The [@sourcebot/mcp](https://www.npmjs.com/package/@sourcebot/mcp) package is an MCP server that enables LLMs to interface with your Sourcebot instance, enabling MCP clients like Cursor, Vscode, and others to have context over your entire codebase.
🧰 Tools
🪛 LanguageTool
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...ntextprotocol.io/introduction) (MCP) is a open standard for providing context to ...(EN_A_VS_AN)
[misspelling] ~10-~10: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ....com/package/@sourcebot/mcp) package is a MCP server that enables LLMs to interfa...(EN_A_VS_AN)
packages/mcp/README.md (4)
21-21: Remove duplicate word "like".
There's a duplicate word in this sentence.
-Building custom LLM horizontal agents like like compliance auditing agents, migration agents, etc.
+Building custom LLM horizontal agents like compliance auditing agents, migration agents, etc.
🧰 Tools
🪛 LanguageTool
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...- Building custom LLM horizontal agents like like compliance auditing agents, migration a...(ENGLISH_WORD_REPEAT_RULE)
28-28: Fix Node.js capitalization.
The official spelling of the JavaScript runtime is "Node.js" (not "Node.JS").
-1. Install Node.JS >= v18.0.0.
+1. Install Node.js >= v18.0.0.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: The official spelling of this programming framework is “Node.js”.
Context: ... api"_ ## Getting Started 1. Install Node.JS >= v18.0.0. 2. (optional) Spin up a So...(NODE_JS)
32-32: Correct verb form: "fallback" should be "fall back".
"Fallback" is a noun, but you're using it as a verb here, which should be written as "fall back".
-If a host is not provided, then the server will fallback to using the demo instance hosted at https://demo.sourcebot.dev. You can see the list of repositories indexed [here](https://demo.sourcebot.dev/~/repos). Add additional repositories by [opening a PR](https://github.com/sourcebot-dev/sourcebot/blob/main/demo-site-config.json).
+If a host is not provided, then the server will fall back to using the demo instance hosted at https://demo.sourcebot.dev. You can see the list of repositories indexed [here](https://demo.sourcebot.dev/~/repos). Add additional repositories by [opening a PR](https://github.com/sourcebot-dev/sourcebot/blob/main/demo-site-config.json).
🧰 Tools
🪛 LanguageTool
[grammar] ~32-~32: The word “fallback” is a noun. The verb is spelled with a space.
Context: ...t is not provided, then the server will fallback to using the demo instance hosted at ht...(NOUN_VERB_CONFUSION)
211-211: Fix grammar and improve readability in the Future Work section.
There are several grammatical issues in this paragraph, including wordiness, missing articles, and hyphenation issues.
-Currently, Sourcebot only supports regex-based code search (powered by [zoekt](https://github.com/sourcegraph/zoekt) under the hood). It is great for scenarios when the agent is searching for is something that is super precise and well-represented in the source code (e.g., a specific function name, a error string, etc.). It is not-so-great for _fuzzy_ searches where the objective is to find some loosely defined _category_ or _concept_ in the code (e.g., find code that verifies JWT tokens). The LLM can approximate this by crafting regex searches that attempt to capture a concept (e.g., it might try a query like `"jwt|token|(verify|validate).*(jwt|token)"`), but often yields sub-optimal search results that aren't related. Tools like Cursor solve this with [embedding models](https://docs.cursor.com/context/codebase-indexing) to capture the semantic meaning of code, allowing for LLMs to search using natural language. We would like to extend Sourcebot to support semantic search and expose this capability over MCP as a tool (e.g., `semantic_search_code` tool). [GitHub Discussion](https://github.com/sourcebot-dev/sourcebot/discussions/297)
+Currently, Sourcebot only supports regex-based code search (powered by [zoekt](https://github.com/sourcegraph/zoekt) under the hood). It is great for scenarios when the agent is searching for something precise and well-represented in the source code (e.g., a specific function name, an error string, etc.). It is not so great for _fuzzy_ searches, where the objective is to find some loosely defined _category_ or _concept_ in the code (e.g., find code that verifies JWT tokens). The LLM can approximate this by crafting regex searches that attempt to capture a concept (e.g., it might try a query like `"jwt|token|(verify|validate).*(jwt|token)"`), but often yields suboptimal search results that aren't related.
+Tools like Cursor solve this with [embedding models](https://docs.cursor.com/context/codebase-indexing) to capture the semantic meaning of code, allowing for LLMs to search using natural language. We would like to extend Sourcebot to support semantic search and expose this capability over MCP as a tool (e.g., `semantic_search_code` tool). [GitHub Discussion](https://github.com/sourcebot-dev/sourcebot/discussions/297)
🧰 Tools
🪛 LanguageTool
[style] ~211-~211: ‘is something that is’ might be wordy. Consider a shorter alternative.
Context: ...enarios when the agent is searching for is something that is super precise and well-represented in t...(EN_WORDINESS_PREMIUM_IS_SOMETHING_THAT_IS)
[misspelling] ~211-~211: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...e code (e.g., a specific function name, a error string, etc.). It is not-so-great...(EN_A_VS_AN)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ..., etc.). It is not-so-great for fuzzy searches where the objective is to find some loo...(AI_HYDRA_LEO_MISSING_COMMA)
[misspelling] ~211-~211: This word is normally spelled as one.
Context: ...date).*(jwt|token)"`), but often yields sub-optimal search results that aren't related. Too...(EN_COMPOUNDS_SUB_OPTIMAL)
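The limitation described in the Future Work paragraph is easy to reproduce locally. The sketch below applies the quoted query as a JavaScript `RegExp` to a few made-up source lines. Two assumptions are baked in: that zoekt's regex dialect agrees with JavaScript for this simple pattern, and that matching is case-insensitive (the `i` flag); the sample lines are invented for illustration.

```typescript
// The query quoted in the README paragraph, expressed as a JavaScript RegExp.
// Assumption: case-insensitive matching, which may not mirror the search backend exactly.
const conceptQuery = /jwt|token|(verify|validate).*(jwt|token)/i;

// Invented sample lines; only the first one is about verifying JWTs.
const sampleLines: string[] = [
  "function verifyJwt(token: string) { /* ... */ }",
  "const csrfToken = readCookie('csrf');",
  "tokenize(input).forEach(emit);",
  "return validateEmail(address);",
];

// Keep every line the regex matches, relevant or not.
const hits = sampleLines.filter((line) => conceptQuery.test(line));
```

Here `hits` contains the JWT verification line but also the CSRF cookie line and the tokenizer line, which is the kind of unrelated result the paragraph refers to and the gap that semantic search is meant to close.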
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (27)
- CHANGELOG.md (1 hunks)
- Makefile (1 hunks)
- docs/docs.json (2 hunks)
- docs/docs/more/mcp-server.mdx (1 hunks)
- docs/docs/more/search-contexts.mdx (1 hunks)
- package.json (1 hunks)
- packages/backend/package.json (1 hunks)
- packages/mcp/.gitignore (1 hunks)
- packages/mcp/.npmignore (1 hunks)
- packages/mcp/CHANGELOG.md (1 hunks)
- packages/mcp/README.md (1 hunks)
- packages/mcp/package.json (1 hunks)
- packages/mcp/src/client.ts (1 hunks)
- packages/mcp/src/env.ts (1 hunks)
- packages/mcp/src/index.ts (1 hunks)
- packages/mcp/src/schemas.ts (1 hunks)
- packages/mcp/src/types.ts (1 hunks)
- packages/mcp/src/utils.ts (1 hunks)
- packages/mcp/tsconfig.json (1 hunks)
- packages/schemas/package.json (1 hunks)
- packages/web/package.json (1 hunks)
- packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
- packages/web/src/app/[domain]/search/page.tsx (1 hunks)
- packages/web/src/features/search/listReposApi.ts (0 hunks)
- packages/web/src/features/search/schemas.ts (2 hunks)
- packages/web/src/features/search/searchApi.ts (5 hunks)
- packages/web/src/features/search/types.ts (1 hunks)
💤 Files with no reviewable changes (1)
- packages/web/src/features/search/listReposApi.ts
✅ Files skipped from review due to trivial changes (2)
- packages/web/src/features/search/types.ts
- packages/backend/package.json
🚧 Files skipped from review as they are similar to previous changes (20)
- packages/mcp/.gitignore
- packages/mcp/.npmignore
- packages/web/package.json
- packages/mcp/CHANGELOG.md
- docs/docs/more/search-contexts.mdx
- Makefile
- packages/schemas/package.json
- docs/docs.json
- packages/mcp/src/env.ts
- packages/web/src/features/search/schemas.ts
- packages/web/src/app/[domain]/search/page.tsx
- packages/mcp/tsconfig.json
- packages/web/src/features/search/searchApi.ts
- packages/mcp/src/client.ts
- packages/mcp/src/utils.ts
- package.json
- packages/mcp/package.json
- packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx
- packages/mcp/src/index.ts
- packages/mcp/src/types.ts
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/mcp/src/schemas.ts (1)
packages/web/src/features/search/schemas.ts (9)
- locationSchema (4-11)
- rangeSchema (13-16)
- symbolSchema (18-21)
- searchRequestSchema (23-32)
- searchResponseSchema (34-82)
- repositorySchema (84-89)
- listRepositoriesResponseSchema (91-93)
- fileSourceRequestSchema (95-99)
- fileSourceResponseSchema (101-104)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (7)
packages/mcp/src/schemas.ts (7)
1-2: Good documentation note about keeping schemas in sync.
This note is helpful for maintainers to understand that this file should be kept synchronized with the web package schemas. The comment about eventually moving these to a shared package is also valuable for future planning.
5-12: Schema definitions look thorough and well-documented.
The location schema is properly defined with clear comments explaining each field's purpose, including the base offset systems (0-based for byte offset, 1-based for line and column numbers).
14-22: Well-structured range and symbol schemas.
These schema definitions are concise and build appropriately on each other, with the range schema using the location schema for start and end points.
24-33: Search request schema properly handles optional parameters.
The search request schema correctly defines required fields (query, matches) and optional fields (contextLines, whole), with clear comments explaining their purpose.
85-94: Repository schemas are clear and concise.
Both the repository schema and list repositories response schema are well-structured with all necessary fields.
96-105: File source request and response schemas look good.
These schemas appropriately define the fields needed for requesting and receiving file source content, including making the branch field optional in the request schema.
107-111: Good standardized error schema.
The service error schema provides a consistent structure for error responses with status code, error code, and message - this will help with client-side error handling.
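To make the offset conventions noted for the location schema concrete (0-based byte offset, 1-based line and column), here is a minimal sketch that derives all three values for a position in a file. The `SourceLocation` interface and `locationAt` function are hypothetical and do not come from `schemas.ts`; counting the column in bytes is also an assumption, since the review comment does not say which unit the column uses.

```typescript
// Hypothetical shape mirroring the fields described in the review comment.
interface SourceLocation {
  byteOffset: number; // 0-based offset into the UTF-8 encoded file
  lineNumber: number; // 1-based
  column: number;     // 1-based; counted in bytes here (an assumption)
}

// Walks the UTF-8 bytes up to byteOffset, tracking where the current line starts.
function locationAt(source: string, byteOffset: number): SourceLocation {
  const bytes = new TextEncoder().encode(source);
  if (byteOffset < 0 || byteOffset > bytes.length) {
    throw new RangeError("byteOffset " + byteOffset + " is outside the file");
  }
  let lineNumber = 1;
  let lineStart = 0; // byte offset at which the current line begins
  for (let i = 0; i < byteOffset; i++) {
    if (bytes[i] === 0x0a) { // newline byte
      lineNumber++;
      lineStart = i + 1;
    }
  }
  return { byteOffset, lineNumber, column: byteOffset - lineStart + 1 };
}
```

For the two-line input `"ab\ncd"`, byte offset 0 maps to line 1, column 1, and byte offset 3 (the `c`) maps to line 2, column 1, which shows the mix of 0-based and 1-based numbering side by side.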
```json
{
  "mcp": {
    "servers": {
      "sourcebot": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@sourcebot/mcp@latest"]
      },
      "env": {
        "SOURCEBOT_HOST": "http://localhost:3000"
      }
    }
  }
}
```
Fix VS Code configuration format - misplaced `env` property.
The `env` object should be nested under the `sourcebot` object, not alongside it within the `servers` object. This matches the format used in other client configurations and follows VS Code's MCP configuration structure.
  {
    "mcp": {
      "servers": {
        "sourcebot": {
          "type": "stdio",
          "command": "npx",
-         "args": ["-y", "@sourcebot/mcp@latest"]
+         "args": ["-y", "@sourcebot/mcp@latest"],
+         "env": {
+           "SOURCEBOT_HOST": "http://localhost:3000"
+         }
-       },
-       "env": {
-         "SOURCEBOT_HOST": "http://localhost:3000"
-       }
+       }
      }
    }
  }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```json
{
  "mcp": {
    "servers": {
      "sourcebot": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@sourcebot/mcp@latest"]
      },
      "env": {
        "SOURCEBOT_HOST": "http://localhost:3000"
      }
    }
  }
}
```
{
  "mcp": {
    "servers": {
      "sourcebot": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@sourcebot/mcp@latest"],
        "env": {
          "SOURCEBOT_HOST": "http://localhost:3000"
        }
      }
    }
  }
}
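The difference between the two configurations above can also be checked mechanically. The following is a rough sketch, not how VS Code validates its settings: `findMisplacedKeys` is a hypothetical helper that flags any key under `servers` whose value has no `command` field, on the assumption that every stdio server entry defines one.

```typescript
// Hypothetical helper: returns the keys under "servers" that do not look like
// server definitions (assumed here to mean "has no command field").
function findMisplacedKeys(servers: Record<string, unknown>): string[] {
  return Object.keys(servers).filter((name) => {
    const entry = servers[name];
    const hasCommand =
      typeof entry === "object" && entry !== null && "command" in entry;
    return !hasCommand;
  });
}

// The "servers" object from the original snippet: env sits next to sourcebot.
const serversBefore: Record<string, unknown> = {
  sourcebot: { type: "stdio", command: "npx", args: ["-y", "@sourcebot/mcp@latest"] },
  env: { SOURCEBOT_HOST: "http://localhost:3000" },
};

// The "servers" object from the suggested fix: env is nested inside sourcebot.
const serversAfter: Record<string, unknown> = {
  sourcebot: {
    type: "stdio",
    command: "npx",
    args: ["-y", "@sourcebot/mcp@latest"],
    env: { SOURCEBOT_HOST: "http://localhost:3000" },
  },
};
```

With these inputs, `findMisplacedKeys(serversBefore)` reports the stray `env` key, while `findMisplacedKeys(serversAfter)` returns an empty list.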
This PR adds a basic MCP server for interacting with Sourcebot in agentic workflows. The server exposes three tools: `search_code`, `list_repos`, and `get_file_source`. Blog post coming soon for this 🔜.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Documentation
Chores