-
Couldn't load subscription status.
- Fork 149
Add MongoDB Assistant Tools #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces MongoDB Assistant tools for the MCP server, enabling natural language search and knowledge source discovery. It adds integration with the MongoDB Assistant AI chatbot and search service.
- Scaffolds a new
AssistantToolBaseclass for MongoDB Assistant tool interactions - Implements two initial tools:
list_knowledge_sourcesandsearch_knowledge - Adds comprehensive test coverage with mocked API responses
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/common/config.ts | Adds assistantBaseUrl configuration option |
| src/tools/tool.ts | Extends ToolCategory type to include "assistant" |
| src/tools/assistant/assistantTool.ts | Base class for Assistant tools with common functionality |
| src/tools/assistant/list_knowledge_sources.ts | Tool for listing available knowledge sources |
| src/tools/assistant/search_knowledge.ts | Tool for searching the knowledge base |
| src/tools/assistant/tools.ts | Exports array of Assistant tools |
| src/server.ts | Registers Assistant tools with the server |
| tests/integration/helpers.ts | Updates response element types to support metadata |
| tests/integration/tools/assistant/assistantHelpers.ts | Test utilities for Assistant tool testing |
| tests/integration/tools/assistant/listKnowledgeSources.test.ts | Integration tests for list knowledge sources tool |
| tests/integration/tools/assistant/searchKnowledge.test.ts | Integration tests for search knowledge tool |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/tools/assistant/assistantTool.ts
Outdated
| if (args.method === "POST") { | ||
| headers.set("Content-Type", "application/json"); | ||
| } | ||
| return await fetch(endpoint, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use Node.js fetch, but customFetch configured like in the Atlas Client:
mongodb-mcp-server/src/common/atlas/apiClient.ts
Lines 41 to 47 in 04a5ddc
| // createFetch assumes that the first parameter of fetch is always a string | |
| // with the URL. However, fetch can also receive a Request object. While | |
| // the typechecking complains, createFetch does passthrough the parameters | |
| // so it works fine. | |
| private static customFetch: typeof fetch = createFetch({ | |
| useEnvironmentVariableProxies: true, | |
| }) as unknown as typeof fetch; |
The reason is that the customFetch supports enterprise proxies, which are relevant for customers with specific security requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this into apiClient? we also have telemetry events logic there, it would be good to keep all the api related logic there for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no problem - I will create a new customFetch instance in the assistant tool base because we don't need all of the extra baggage from the Atlas ApiClient class when working with this non-Atlas API.
| return { | ||
| content: dataSources.map(({ id, type, versions }) => ({ | ||
| type: "text", | ||
| text: id, | ||
| _meta: { | ||
| type, | ||
| versions, | ||
| }, | ||
| })), | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should convert this into a string, similar to what we do in:
mongodb-mcp-server/src/tools/mongodb/read/find.ts
Lines 93 to 103 in 04a5ddc
| return { | |
| content: formatUntrustedData( | |
| this.generateMessage({ | |
| collection, | |
| queryResultsCount, | |
| documents: cursorResults.documents, | |
| appliedLimits: [limitOnFindCursor.cappedBy, cursorResults.cappedBy].filter((limit) => !!limit), | |
| }), | |
| cursorResults.documents.length > 0 ? EJSON.stringify(cursorResults.documents) : undefined | |
| ), | |
| }; |
And also we should consider the information in the knowledge list untrusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like formatUntrustedData maps to a single CallToolResult["content"] object so I will need to convert the array of results I get from each API call into a single formatted string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to wrap API error text in formatUntrustedData as well? Not sure if that's already mitigated by the isError: true or not.
i.e. instead of
if (!response.ok) {
return {
content: [
{
type: "text",
text: `Failed to list knowledge sources: ${response.statusText}`,
},
],
isError: true,
};
}we would have
if (!response.ok) {
return {
content: formatUntrustedData("Failed to list knowledge sources", response.statusText),
isError: true,
};
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the LLM will employ any injection protection measures regardless of whether the response is marked as error-ed or not. That being said, do we expect that statusText would contain any instructions - this should be just the textual representation of the status code, shouldn't it? I wouldn't expect there to be a way to inject anything malicious in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think statusCode should only ever be something reasonable like Ok, Not Found, etc.
| import { LogId } from "../../common/logger.js"; | ||
|
|
||
| export const SearchKnowledgeToolArgs = { | ||
| query: z.string().describe("A natural language query to search for in the knowledge base"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we add a bit more detail here? this is what the LLM will use to understand that you want it to make a generic natural question that the user haave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
| this.requiredHeaders = new Headers({ | ||
| "x-request-origin": "mongodb-mcp-server", | ||
| "user-agent": serverVersion ? `mongodb-mcp-server/v${serverVersion}` : "mongodb-mcp-server", | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[q] why does this need to be initialized here at the tool level? can we also move this logic into the api layer?
also, we should probably use the userAgent from the apiClient and make sure we use the same name everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be initialized here at the tool level? can we also move this logic into the api layer?
The "API layer" as it exists seems to be tailored to Atlas - this is a totally separate server/API. If we prefer I could create an equivalent assistant/apiClient.ts file but personally that seems unnecessary at this point.
we should probably use the userAgent from the apiClient
- Our server firewall allowlists specific user agents and right now that's configured to be
mongodb-mcp-server - nit: The
apiClientusesUser-Agent: AtlasMCPwhich again seems to refer to Atlas specifically
| /** | ||
| * Mocks fetch for assistant API calls | ||
| */ | ||
| interface MockedAssistantAPI { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we're mocking the api calls i'd rather call these unit tests, could you please move the tests to the unit folder and the helpers?
if you have any staging / cloud-dev environment, we could configure integration tests as well, let me know on slack if you do as I can help setting up a GH variable for the staging environment, otherwise we could keep it with unit tests only which is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha that's totally reasonable. I'll move these mocked tests to tests/unit. It's a bit awkward because I think I still want the integration test flow (i.e. mcpClient.callTool({ ... })) but I'd like to use the mocks to control the output to confirm that everything is formatted/handled appropriately.
For integration tests, we could try hitting https://knowledge-dev.mongodb.com. What GH variables do we need to support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an env variable called TEST_ASSISTANT_BASE_URL in GH actions, you can make it so your integration tests get configured to that if the env is set.
here's an atlas example. you'd need to add the logic for using it as well
- name: Run tests
env:
MDB_MCP_API_CLIENT_ID: ${{ secrets.TEST_ATLAS_CLIENT_ID }}
MDB_MCP_API_CLIENT_SECRET: ${{ secrets.TEST_ATLAS_CLIENT_SECRET }}
MDB_MCP_API_BASE_URL: ${{ vars.TEST_ATLAS_BASE_URL }}
run: npm test -- tests/integration/tools/atlas
|
The only thing missing from my side in this PR is to add some accuracy tests, to make sure that the models we test with do know when to call this tool. In the tests/accuracy folder there are a few examples that can be used as inspiration. The idea is to write a few prompts that we expect to trigger our tool and ensure that the tool is called with the specific parameters. |
|
Good to see there is work being done on this area. As I mentioned on the issue I created, I have always found myself going through the documentation, but also making sure that what I'm reading is available on the version that I have installed. It is not always the case that people are on the latest of the latest, so would there be a way to "hint" the MCP about the version you need the documentation to be? |
|
|
||
| export class ListKnowledgeSourcesTool extends AssistantToolBase { | ||
| public name = ListKnowledgeSourcesToolName; | ||
| protected description = "List available data sources in the MongoDB Assistant knowledge base"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing locally I was able to trigger it explicitly by "listing knowledge topics.
After extending description a little bit I was able to call this tool more accurately.
"list mongodb topics for me"
| protected description = "List available data sources in the MongoDB Assistant knowledge base"; | |
| protected description = "List available data sources in the MongoDB Assistant knowledge base. Covers t knowledge topics for MongoDB database"; |
Proposed changes
(EAI-1266) Knowledge Search on MCP
Adds a new class of
Assistanttools for interacting with the MongoDB Assistant (AI Chatbot + Search).This PR scaffolds the generic
AssistantToolBaseclass as well as two initial tools:list_knowledge_sources- pulls a list of all available data sources. Useful for knowing what data is available and for filtering thesearch_knowledgesearch_knowledge- runs a natural language search against the MongoDB Assistant knowledge base. Returns content chunks for the most relevant results.Checklist