feat: add ability to create atlas search indexes MCP-275 #692
e9cef48 to 3fab3b7
Pull Request Overview
This PR adds support for creating Atlas Search (lexical) indexes through the MCP server. The changes enable users to create Atlas Search indexes with both dynamic and explicit field mappings, complementing the existing support for listing and dropping search indexes.
Key changes:
- Added Atlas Search index creation support to the `create-index` tool with comprehensive schema validation
- Enhanced testing to differentiate between search and vector search indexes
- Added accuracy tests for various Atlas Search index creation scenarios
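For orientation, the two mapping styles mentioned in the overview could look roughly like this. This is a minimal sketch, not code from the PR: the `SearchIndexArgs` type, the `describeMappings` helper, and the field names `title` and `year` are hypothetical. Only `type: "search"`, `analyzer`, and `mappings.dynamic` appear in the diff; `mappings.fields` follows the Atlas Search index definition format.

```typescript
// Hypothetical, simplified shape of the arguments for a "search" index.
interface SearchIndexArgs {
    type: "search";
    analyzer?: string;
    mappings: {
        dynamic?: boolean;
        fields?: Record<string, { type: string }>;
    };
}

// Dynamic mapping: every supported field is indexed automatically.
const dynamicIndex: SearchIndexArgs = {
    type: "search",
    mappings: { dynamic: true },
};

// Explicit mapping: only the listed fields are indexed.
const explicitIndex: SearchIndexArgs = {
    type: "search",
    analyzer: "lucene.standard",
    mappings: {
        dynamic: false,
        fields: {
            title: { type: "string" },
            year: { type: "number" },
        },
    },
};

// Summarizes which mapping style a definition uses.
function describeMappings(args: SearchIndexArgs): string {
    if (args.mappings.dynamic) {
        return "dynamic";
    }
    const fieldNames = Object.keys(args.mappings.fields ?? {});
    return `explicit: ${fieldNames.join(", ")}`;
}

console.log(describeMappings(dynamicIndex));
console.log(describeMappings(explicitIndex));
```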
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description | 
|---|---|
| src/tools/mongodb/create/createIndex.ts | Adds `atlasSearchIndexDefinition` schema and implements search index creation logic | 
| tests/integration/tools/mongodb/create/createIndex.test.ts | Adds comprehensive integration tests for Atlas Search index creation scenarios | 
| tests/integration/tools/mongodb/delete/dropIndex.test.ts | Refactors tests to handle both search and vector search indexes separately | 
| tests/accuracy/createIndex.test.ts | Adds accuracy test cases for Atlas Search index creation with various configurations | 
| tests/accuracy/sdk/accuracyTestingClient.ts | Removes deprecated `--connectionString` flag from CLI arguments | 
| README.md | Updates documentation to remove deprecated `--connectionString` flag | 
Comments suppressed due to low confidence (1)
tests/integration/tools/mongodb/create/createIndex.test.ts:1
- The index names are evaluated at test definition time rather than at test execution time. If `getSearchIndexName()` and `getVectorIndexName()` depend on `beforeEach` setup, these calls will execute before the setup runs, potentially returning undefined or stale values. Wrap them in functions, e.g. `{ description: "search", indexName: () => getSearchIndexName() }`, and update the test to call the function.
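The difference between eager and lazy evaluation described in this comment can be reproduced without a test runner. The sketch below is an illustration under assumptions: `searchIndexName` stands in for state that a `beforeEach` hook would populate, and the index name value is made up. It is not the PR's test code.

```typescript
// Stand-in for state that a beforeEach hook would populate later.
let searchIndexName: string | undefined;
const getSearchIndexName = (): string | undefined => searchIndexName;

// Eager: the getter runs while the test case is being defined, before setup.
const eagerCase = { description: "search", indexName: getSearchIndexName() };

// Lazy: the getter is stored and only called when the test body runs.
const lazyCase = { description: "search", indexName: () => getSearchIndexName() };

// Simulate the beforeEach hook running after the cases were defined.
searchIndexName = "search-index-1";

// The eager case captured the value from before setup; the lazy case sees the current value.
const eagerValue = eagerCase.indexName;
const lazyValue = lazyCase.indexName();
console.log({ eagerValue, lazyValue });
```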
import { describeWithMongoDB, validateAutoConnectBehavior, waitUntilSearchIsReady } from "../mongodbHelpers.js";
Pull Request Test Coverage Report for Build 18909991547
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
| import { quantizationEnum, similarityEnum } from "../../../common/search/vectorSearchEmbeddingsManager.js"; | ||
|  | ||
| export class CreateIndexTool extends MongoDBToolBase { | ||
| private vectorSearchIndexDefinition = z.object({ | 
This is best viewed with the "Hide whitespace" option - it's just prettier reformatting the indents.
| }); | ||
|  | ||
| const args = [MCP_SERVER_CLI_SCRIPT, "--connectionString", mdbConnectionString, ...additionalArgs]; | ||
| const args = [MCP_SERVER_CLI_SCRIPT, mdbConnectionString, ...additionalArgs]; | 
--connectionString is deprecated so using the positional argument.
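As a small illustration of the positional form, the argument list could be assembled as below. This is a sketch only: the value of `MCP_SERVER_CLI_SCRIPT`, the `buildArgs` helper, the connection string, and the extra flag are placeholders chosen for the example, not values from the repository.

```typescript
// Placeholder path; the real script location comes from the test setup.
const MCP_SERVER_CLI_SCRIPT = "dist/index.js";

// Builds the CLI argument list with the connection string as a positional argument.
function buildArgs(mdbConnectionString: string, additionalArgs: string[] = []): string[] {
    // No "--connectionString" flag: the connection string directly follows the script path.
    return [MCP_SERVER_CLI_SCRIPT, mdbConnectionString, ...additionalArgs];
}

// Example values are assumptions for illustration.
const exampleArgs = buildArgs("mongodb://localhost:27017", ["--readOnly"]);
console.log(exampleArgs);
```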
| }) | ||
| .describe("Definition for a Vector Search index."); | ||
|  | ||
| private atlasSearchIndexDefinition = z | 
Why aren't we supporting custom analyzers?
https://www.mongodb.com/docs/atlas/atlas-search/analyzers/custom/
| private atlasSearchIndexDefinition = z | ||
| .object({ | ||
| type: z.literal("search"), | ||
| analyzer: z | 
Probably this should be an enum of the analyzers.
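One way to weigh the enum suggestion: the schema description also allows language-specific analyzers (such as `lucene.cjk`) and custom analyzers, which a closed enum of the four built-in names would reject. The sketch below is a dependency-free stand-in for the zod schema, and `classifyAnalyzer` is a hypothetical helper, not part of the PR. It keeps the field a free-form string while still recognizing the built-in names.

```typescript
// The four built-in analyzers named in the schema description.
const BUILT_IN_ANALYZERS = [
    "lucene.standard",
    "lucene.simple",
    "lucene.whitespace",
    "lucene.keyword",
] as const;
type BuiltInAnalyzer = (typeof BUILT_IN_ANALYZERS)[number];

// Type guard: true only for the four built-in names.
function isBuiltInAnalyzer(value: string): value is BuiltInAnalyzer {
    return (BUILT_IN_ANALYZERS as readonly string[]).includes(value);
}

// A strict enum would reject everything in the "other" bucket,
// including language-specific and custom analyzers.
function classifyAnalyzer(value: string): "built-in" | "other" {
    return isBuiltInAnalyzer(value) ? "built-in" : "other";
}

console.log(classifyAnalyzer("lucene.standard"));
console.log(classifyAnalyzer("lucene.czech"));
```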
| "The analyzer to use for the index. Can be one of the built-in lucene analyzers (`lucene.standard`, `lucene.simple`, `lucene.whitespace`, `lucene.keyword`), a language-specific analyzer, such as `lucene.cjk` or `lucene.czech`, or a custom analyzer defined in the Atlas UI." | ||
| ), | ||
| mappings: z | ||
| .object({ | 
Lacks support for:
- numPartitions
- searchAnalyzer vs analyzer
- custom analyzers
- storedSources
- synonyms
- typeSets
We could say that custom analyzers are not that important, but storedSources is actually relevant most of the time.
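To make the stored-source point concrete, here is a rough sketch of what exposing it could involve. It assumes the option is spelled `storedSource` in the Atlas Search index definition and accepts a boolean or an `include` / `exclude` list of field paths; the types, the `describeStoredSource` helper, and the field names are hypothetical and not part of the PR.

```typescript
// Assumed shapes for the stored-source option.
type StoredSource = boolean | { include: string[] } | { exclude: string[] };

// Hypothetical, trimmed-down index definition.
interface SearchIndexDefinition {
    mappings: { dynamic: boolean };
    storedSource?: StoredSource;
}

const definition: SearchIndexDefinition = {
    mappings: { dynamic: true },
    // Store only these fields alongside the index (field names are made up).
    storedSource: { include: ["title", "year"] },
};

// Describes what a definition would store.
function describeStoredSource(def: SearchIndexDefinition): string {
    const stored = def.storedSource;
    if (stored === undefined || stored === false) return "nothing stored";
    if (stored === true) return "all fields stored";
    if ("include" in stored) return `stored: ${stored.include.join(", ")}`;
    return `stored all except: ${stored.exclude.join(", ")}`;
}

console.log(describeStoredSource(definition));
```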
The args shape is based on the POC the search team did for index support and I was going off of the assumption that they've selected the fields that they see the most value in exposing to LLMs. I realize there's a lot more configuration that's possible, I'm just not sure how much of that is stuff we expect agents to configure vs an actual human who wants to fine-tune the index.
A POC that checks the feasibility of creating search indexes and production code are likely to have different requirements.
| mappings: z | ||
| .object({ | ||
| dynamic: z | ||
| .boolean() | 
Dynamic can be an object of typeSets.
This is in preview though, so I don't expect there's sufficient docs or training data for general-purpose models to accurately choose which one to use.
The preview is for vector search though, FTS is explicitly out of scope of the vector search project.
The typeSets functionality is in preview.
The current implementation lacks support for multiple important fields, and it should be discussed if we want to support them or not.
| z.string().describe("The field name"), | ||
| z | ||
| .object({ | ||
| type: z | 
Objects will require additional fields depending on the type. I know passthrough will keep them, but we should document them so the agent knows which ones to use and how. For example, autocomplete supports defining a custom analyzer, how to tokenize (which is really important) and similarity functions.
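As an illustration of type-dependent options, a documented field mapping could be modeled as a discriminated union. The sketch below covers only two field types and is an assumption-laden outline, not the PR's schema: the interface names, the `typeSpecificOptions` helper, and the example field names are hypothetical, while the `autocomplete` option names (`tokenization`, `minGrams`, `maxGrams`, `foldDiacritics`, `analyzer`) follow the Atlas Search documentation as I understand it.

```typescript
// Plain string field: only an optional analyzer.
interface StringField {
    type: "string";
    analyzer?: string;
}

// Autocomplete field: tokenization strategy and gram sizes matter here.
interface AutocompleteField {
    type: "autocomplete";
    analyzer?: string;
    tokenization?: "edgeGram" | "rightEdgeGram" | "nGram";
    minGrams?: number;
    maxGrams?: number;
    foldDiacritics?: boolean;
}

type FieldMapping = StringField | AutocompleteField;

// Example fields; names and values are made up.
const titleField: FieldMapping = {
    type: "autocomplete",
    tokenization: "edgeGram",
    minGrams: 2,
    maxGrams: 15,
};
const plotField: FieldMapping = { type: "string", analyzer: "lucene.english" };
const fields: Record<string, FieldMapping> = { title: titleField, plot: plotField };

// Lists the options a mapping sets beyond its type.
function typeSpecificOptions(field: FieldMapping): string[] {
    return Object.keys(field).filter((key) => key !== "type");
}

console.log(Object.keys(fields), typeSpecificOptions(titleField));
```

With a union like this, the compiler rejects `tokenization` on a `string` field, which is the kind of guidance the comment asks the schema to give the agent.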
The exact shape is extremely complex to represent in a JSON schema. I'm worried that being overly specific will result in this being more harmful than helpful, especially if we expect the majority of the use cases to revolve around just specifying the type.
Yep, the schema is complicated: it has a lot of options, some of which are not even compatible with each other. We should have proper documentation of which ones we want to expose and which ones we don't, something that we haven't discussed yet because supporting the most used bits of Atlas Search is already a substantial effort.
Proposed changes
This adds support for creating Atlas Search indexes. Dropping and listing are already supported, so they needed no changes.