chore(tests): accuracy tests for MongoDB tools exposed by MCP server MCP-39 #341
Open
himanshusinghs wants to merge 47 commits into main from chore/issue-307-proposal-2
+5,089
−6
45abd9f chore: LangChain based accuracy tests (himanshusinghs)
af67d6c chore: use vercel AI SDK instead of langchain (himanshusinghs)
dffeabf chore: integrate capturing accuracy snapshots (himanshusinghs)
2e89f7a chore: correct env names (himanshusinghs)
2345c27 chore: more consolidated prompt tests (himanshusinghs)
0cdfe2e chore: add a few more tests and some more models (himanshusinghs)
6e69fd6 chore: add AzureOpenAI model in the model list (himanshusinghs)
ea099c2 chore: use ListDatabasesTool response creator for tests (himanshusinghs)
8ae3d3d chore: use ListCollectionsTool response creators in tests (himanshusinghs)
1f5b246 chore: tests for collection-indexes tool (himanshusinghs)
330b9e5 modify prompt for list-collections prompt and log tools provided (himanshusinghs)
127fee0 chore: have mock generators return Promise of ToolResult as well (himanshusinghs)
d8c79b8 chore: tests for collection-schema tool (himanshusinghs)
f430780 chore: do not fail tests on dropped accuracy (himanshusinghs)
a09c725 chore: added tests for find tool (himanshusinghs)
1aa80eb chore: tests for insert-many tool (himanshusinghs)
b0c3df6 chore: tests for delete-many tool (himanshusinghs)
c5365ac chore: add oepnai provider (himanshusinghs)
f79faca chore: fixes accuracy scorer for position independent matching (himanshusinghs)
e0470bc chore: replace mock mcp client with real (mockable) mcp client (himanshusinghs)
b961916 chore: moved all existing tests to vercel mcp client (himanshusinghs)
5ffee02 chore: adds tests for the rest of the tools (himanshusinghs)
abec91a chore: adds missed out tests for tools (himanshusinghs)
047da6a chore: MongoDB based snapshot storage for accuracy runs (himanshusinghs)
94a0fe3 chore: remove file based snapshot (himanshusinghs)
5bc21aa wip: snapshot summary generator (himanshusinghs)
6abc324 chore: single entry point for running accuracy tests with different c… (himanshusinghs)
c9c3b36 chore: reformat (himanshusinghs)
746d7eb chore: lint fixes (himanshusinghs)
f84bf43 chore: simplified toolCallingAccuracy calculation (himanshusinghs)
496acc7 chore: account for types moved around (himanshusinghs)
c5ead9d chore: adds accuracyRunStatus to snapshot entries (himanshusinghs)
b54cf14 chore: add disk based accuracy storage for local runs (himanshusinghs)
188aebc chore: revert changes done to any of the src files (himanshusinghs)
b309fb4 chore: handle test failures and appropriately mark them as failed in … (himanshusinghs)
43493f3 chore: make snapshot storage independent of accuracyRunId and commitSHA (himanshusinghs)
cb46c43 chore: bail on first failure and add some explanation for update-accu… (himanshusinghs)
9db296e chore: refactor to make tests writing simpler and other QOL improveme… (himanshusinghs)
d7b1c57 chore: generate accuracy test summary post test (himanshusinghs)
6c25f1b chore: add Github workflow to trigger test runs (himanshusinghs)
6da9538 chore: fix permissions issue (himanshusinghs)
6ccaa11 chore: bring back packages post merge (himanshusinghs)
865dbfe chore: update report generation to include comparison with baseline a… (himanshusinghs)
055628d Update .github/workflows/accuracy-tests.yml (himanshusinghs)
1933f1f Update .github/workflows/accuracy-tests.yml (himanshusinghs)
5c97ca8 Update .github/workflows/accuracy-tests.yml (himanshusinghs)
f666014 Update .github/workflows/accuracy-tests.yml (himanshusinghs)
@@ -0,0 +1,48 @@
+name: Accuracy Tests
+
+on:
+  workflow_dispatch:
+  pull_request:
+    types: [labeled]
+
+jobs:
+  run-accuracy-tests:
+    name: Run Accuracy Tests
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    if: |
+      github.event_name == 'workflow_dispatch' ||
+      (github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests')
+    env:
+      MDB_OPEN_AI_API_KEY: ${{ secrets.ACCURACY_OPEN_AI_API_KEY }}
+      MDB_GEMINI_API_KEY: ${{ secrets.MDB_GEMINI_API_KEY }}
+      MDB_AZURE_OPEN_AI_API_KEY: ${{ secrets.MDB_AZURE_OPEN_AI_API_KEY }}
+      MDB_AZURE_OPEN_AI_API_URL: ${{ secrets.MDB_AZURE_OPEN_AI_API_URL }}
+      MDB_ACCURACY_MDB_URL: ${{ secrets.ACCURACY_MDB_CONNECTION_STRING }}
+      MDB_ACCURACY_MDB_DB: ${{ vars.ACCURACY_MDB_DB }}
+      MDB_ACCURACY_MDB_COLLECTION: ${{ vars.ACCURACY_MDB_COLLECTION }}
+      MDB_ACCURACY_BASELINE_COMMIT: ${{ github.event.pull_request.base.sha || '' }}
+    steps:
+      - uses: GitHubSecurityLab/actions-permissions/monitor@v1
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version-file: package.json
+          cache: "npm"
+      - name: Install dependencies
+        run: npm ci
+      - name: Run accuracy tests
+        run: ./scripts/run-accuracy-tests.sh
+      - name: Upload accuracy test summary
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: accuracy-test-summary
+          path: .accuracy/tests-summary.html
+      - name: Comment summary on PR
+        if: github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests'
+        uses: marocchino/sticky-pull-request-comment@v2
+        with:
+          path: .accuracy/tests-summary.html

Review comment (on `uses: marocchino/sticky-pull-request-comment@v2`): This needs to be a commit sha
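The review comment on the `marocchino/sticky-pull-request-comment@v2` step asks for the action to be pinned to a commit SHA rather than a mutable tag. A minimal sketch of how the step might look once pinned is shown below. The `<full-commit-sha>` value is a placeholder, not a real commit, and would need to be replaced with the full 40-character SHA of the release that is actually intended.

```yaml
- name: Comment summary on PR
  if: github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests'
  # Placeholder only: substitute the full commit SHA of the chosen release.
  uses: marocchino/sticky-pull-request-comment@<full-commit-sha> # v2
  with:
    path: .accuracy/tests-summary.html
```

Keeping the tag name in a trailing comment is a common convention so the pinned version stays readable.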
@@ -11,3 +11,5 @@ state.json
 
 tests/tmp
 coverage
+# Generated assets by accuracy runs
+.accuracy
Guessing we want to eventually also run them on merges to main, right?
Yea that's right - I will update this as well.
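The exchange above suggests also running the accuracy tests on merges to main. One way this could be sketched, assuming a plain `push` trigger on the `main` branch is what is wanted, is shown below. It is not the change the author actually made. Note that the job-level `if` in the workflow only admits `workflow_dispatch` and labeled `pull_request` events, so it would also need a `push` clause or the job would be skipped.

```yaml
on:
  workflow_dispatch:
  # Assumed addition: run on every push (including merges) to main.
  push:
    branches:
      - main
  pull_request:
    types: [labeled]

jobs:
  run-accuracy-tests:
    # Extended so that push events are not filtered out by the job condition.
    if: |
      github.event_name == 'workflow_dispatch' ||
      github.event_name == 'push' ||
      (github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests')
```

On a `push` event there is no `github.event.pull_request` context, so `MDB_ACCURACY_BASELINE_COMMIT` would fall back to the empty string via the existing `|| ''` expression.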