
chore(tests): accuracy tests for MongoDB tools exposed by MCP server MCP-39 #341

Open: wants to merge 47 commits into main
47 commits
45abd9f
chore: LangChain based accuracy tests
himanshusinghs Jun 28, 2025
af67d6c
chore: use vercel AI SDK instead of langchain
himanshusinghs Jun 30, 2025
dffeabf
chore: integrate capturing accuracy snapshots
himanshusinghs Jun 30, 2025
2e89f7a
chore: correct env names
himanshusinghs Jun 30, 2025
2345c27
chore: more consolidated prompt tests
himanshusinghs Jun 30, 2025
0cdfe2e
chore: add a few more tests and some more models
himanshusinghs Jun 30, 2025
6e69fd6
chore: add AzureOpenAI model in the model list
himanshusinghs Jul 1, 2025
ea099c2
chore: use ListDatabasesTool response creator for tests
himanshusinghs Jul 1, 2025
8ae3d3d
chore: use ListCollectionsTool response creators in tests
himanshusinghs Jul 1, 2025
1f5b246
chore: tests for collection-indexes tool
himanshusinghs Jul 1, 2025
330b9e5
modify prompt for list-collections prompt and log tools provided
himanshusinghs Jul 1, 2025
127fee0
chore: have mock generators return Promise of ToolResult as well
himanshusinghs Jul 1, 2025
d8c79b8
chore: tests for collection-schema tool
himanshusinghs Jul 1, 2025
f430780
chore: do not fail tests on dropped accuracy
himanshusinghs Jul 1, 2025
a09c725
chore: added tests for find tool
himanshusinghs Jul 1, 2025
1aa80eb
chore: tests for insert-many tool
himanshusinghs Jul 3, 2025
b0c3df6
chore: tests for delete-many tool
himanshusinghs Jul 3, 2025
c5365ac
chore: add openai provider
himanshusinghs Jul 3, 2025
f79faca
chore: fixes accuracy scorer for position independent matching
himanshusinghs Jul 4, 2025
e0470bc
chore: replace mock mcp client with real (mockable) mcp client
himanshusinghs Jul 4, 2025
b961916
chore: moved all existing tests to vercel mcp client
himanshusinghs Jul 6, 2025
5ffee02
chore: adds tests for the rest of the tools
himanshusinghs Jul 7, 2025
abec91a
chore: adds missed out tests for tools
himanshusinghs Jul 7, 2025
047da6a
chore: MongoDB based snapshot storage for accuracy runs
himanshusinghs Jul 8, 2025
94a0fe3
chore: remove file based snapshot
himanshusinghs Jul 8, 2025
5bc21aa
wip: snapshot summary generator
himanshusinghs Jul 8, 2025
6abc324
chore: single entry point for running accuracy tests with different c…
himanshusinghs Jul 8, 2025
c9c3b36
chore: reformat
himanshusinghs Jul 8, 2025
746d7eb
chore: lint fixes
himanshusinghs Jul 8, 2025
f84bf43
chore: simplified toolCallingAccuracy calculation
himanshusinghs Jul 8, 2025
496acc7
chore: account for types moved around
himanshusinghs Jul 8, 2025
c5ead9d
chore: adds accuracyRunStatus to snapshot entries
himanshusinghs Jul 8, 2025
b54cf14
chore: add disk based accuracy storage for local runs
himanshusinghs Jul 8, 2025
188aebc
chore: revert changes done to any of the src files
himanshusinghs Jul 8, 2025
b309fb4
chore: handle test failures and appropriately mark them as failed in …
himanshusinghs Jul 8, 2025
43493f3
chore: make snapshot storage independent of accuracyRunId and commitSHA
himanshusinghs Jul 9, 2025
cb46c43
chore: bail on first failure and add some explanation for update-accu…
himanshusinghs Jul 9, 2025
9db296e
chore: refactor to make tests writing simpler and other QOL improveme…
himanshusinghs Jul 9, 2025
d7b1c57
chore: generate accuracy test summary post test
himanshusinghs Jul 10, 2025
6c25f1b
chore: add Github workflow to trigger test runs
himanshusinghs Jul 10, 2025
6da9538
chore: fix permissions issue
himanshusinghs Jul 10, 2025
6ccaa11
chore: bring back packages post merge
himanshusinghs Jul 10, 2025
865dbfe
chore: update report generation to include comparison with baseline a…
himanshusinghs Jul 10, 2025
055628d
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
1933f1f
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
5c97ca8
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
f666014
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
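Two of the commits above mention the scoring logic ("fixes accuracy scorer for position independent matching" and "simplified toolCallingAccuracy calculation"), but the scorer itself is not part of this excerpt. As a rough illustration of what position-independent matching can mean, the TypeScript sketch below scores a model's tool calls against the expected ones without caring about the order in which they were made. The ToolCall shape, the deepEqual helper, the function name positionIndependentAccuracy, and the scoring formula (matched calls divided by the larger of the two counts) are all assumptions made for this example, not the PR's actual implementation.

```typescript
// Hypothetical shape of an expected or recorded tool call; the PR's real types may differ.
interface ToolCall {
  toolName: string;
  parameters: Record<string, unknown>;
}

// Structural equality for JSON-like values; object key order is ignored.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b)) return false;
    const arrA: unknown[] = a;
    const arrB: unknown[] = b;
    return arrA.length === arrB.length && arrA.every((item, i) => deepEqual(item, arrB[i]));
  }
  const objA = a as Record<string, unknown>;
  const objB = b as Record<string, unknown>;
  const keysA = Object.keys(objA);
  const keysB = Object.keys(objB);
  return (
    keysA.length === keysB.length &&
    keysA.every((key) => Object.prototype.hasOwnProperty.call(objB, key) && deepEqual(objA[key], objB[key]))
  );
}

// Hypothetical scorer: each expected call may be matched by any not-yet-used
// actual call, regardless of position in the model's output.
function positionIndependentAccuracy(expected: ToolCall[], actual: ToolCall[]): number {
  if (expected.length === 0 && actual.length === 0) return 1;
  const used: boolean[] = actual.map(() => false);
  let matched = 0;
  for (const exp of expected) {
    const idx = actual.findIndex(
      (call, i) => !used[i] && call.toolName === exp.toolName && deepEqual(call.parameters, exp.parameters),
    );
    if (idx !== -1) {
      used[idx] = true;
      matched += 1;
    }
  }
  // Dividing by the larger count penalises both missing and extra calls.
  return matched / Math.max(expected.length, actual.length);
}

const expected: ToolCall[] = [
  { toolName: "list-databases", parameters: {} },
  { toolName: "find", parameters: { database: "mflix", collection: "movies" } },
];
// Same calls in the opposite order, with parameter keys reordered.
const actual: ToolCall[] = [
  { toolName: "find", parameters: { collection: "movies", database: "mflix" } },
  { toolName: "list-databases", parameters: {} },
];
console.log(positionIndependentAccuracy(expected, actual)); // 1
```

A greedy first-match is enough here because the match test is exact equality: two actual calls that both satisfy an expected call are interchangeable, so no expected call can take a partner that another one needed.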

.github/workflows/accuracy-tests.yml (new file: 48 additions, 0 deletions)
@@ -0,0 +1,48 @@
name: Accuracy Tests

on:
  workflow_dispatch:

[Review thread on "workflow_dispatch:"]
Collaborator: Guessing we want to eventually also run them on merges to main, right?
Collaborator Author: Yeah, that's right. I will update this as well.

  pull_request:
    types: [labeled]

jobs:
  run-accuracy-tests:
    name: Run Accuracy Tests
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    if: |
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests')
    env:
      MDB_OPEN_AI_API_KEY: ${{ secrets.ACCURACY_OPEN_AI_API_KEY }}
      MDB_GEMINI_API_KEY: ${{ secrets.MDB_GEMINI_API_KEY }}
      MDB_AZURE_OPEN_AI_API_KEY: ${{ secrets.MDB_AZURE_OPEN_AI_API_KEY }}
      MDB_AZURE_OPEN_AI_API_URL: ${{ secrets.MDB_AZURE_OPEN_AI_API_URL }}
      MDB_ACCURACY_MDB_URL: ${{ secrets.ACCURACY_MDB_CONNECTION_STRING }}
      MDB_ACCURACY_MDB_DB: ${{ vars.ACCURACY_MDB_DB }}
      MDB_ACCURACY_MDB_COLLECTION: ${{ vars.ACCURACY_MDB_COLLECTION }}
      MDB_ACCURACY_BASELINE_COMMIT: ${{ github.event.pull_request.base.sha || '' }}
    steps:
      - uses: GitHubSecurityLab/actions-permissions/monitor@v1
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: package.json
          cache: "npm"
      - name: Install dependencies
        run: npm ci
      - name: Run accuracy tests
        run: ./scripts/run-accuracy-tests.sh
      - name: Upload accuracy test summary
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: accuracy-test-summary
          path: .accuracy/tests-summary.html
      - name: Comment summary on PR
        if: github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests'
        uses: marocchino/sticky-pull-request-comment@v2

[Review thread on "uses: marocchino/sticky-pull-request-comment@v2"]
Collaborator: This needs to be a commit SHA.

        with:
          path: .accuracy/tests-summary.html
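The review comment on the sticky-comment step asks for the action to be referenced by a commit SHA rather than the movable v2 tag: GitHub's security hardening guidance treats a full-length commit SHA as the only immutable way to reference an action, whereas a tag can be re-pointed by the action's owner. The TypeScript sketch below is a hypothetical lint check for that rule, not part of this PR. The helper names isPinnedToCommitSha and findUnpinnedActions are invented for illustration, and the sketch assumes the uses: values have already been extracted from the workflow as plain strings. The SHA to pin to is not given in this thread, so none is shown.

```typescript
// A full-length Git commit SHA-1 is 40 hexadecimal characters (GitHub displays them in lowercase).
const FULL_COMMIT_SHA = /^[0-9a-f]{40}$/;

// Hypothetical helper: true when a `uses:` value such as "owner/repo@ref"
// or "owner/repo/path@ref" ends in a full-length commit SHA.
function isPinnedToCommitSha(uses: string): boolean {
  const at = uses.lastIndexOf("@");
  if (at === -1) return false; // no ref at all
  return FULL_COMMIT_SHA.test(uses.slice(at + 1));
}

// Hypothetical helper: returns the remote action references that are not SHA-pinned.
// Local actions ("./...") and Docker references ("docker://...") are out of scope for this sketch.
function findUnpinnedActions(usesValues: string[]): string[] {
  return usesValues.filter(
    (uses) => !uses.startsWith("./") && !uses.startsWith("docker://") && !isPinnedToCommitSha(uses),
  );
}

// The `uses:` values from the workflow in this PR.
const refs: string[] = [
  "GitHubSecurityLab/actions-permissions/monitor@v1",
  "actions/checkout@v4",
  "actions/setup-node@v4",
  "actions/upload-artifact@v4",
  "marocchino/sticky-pull-request-comment@v2",
];

// Reports all five, since each is pinned to a version tag rather than a commit SHA.
console.log(findUnpinnedActions(refs));
```

The reviewer only flagged the third-party marocchino/sticky-pull-request-comment step, so applying the same rule to the actions/* steps as well may be stricter than the project intends.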
.gitignore (2 additions, 0 deletions)
@@ -11,3 +11,5 @@ state.json

tests/tmp
coverage
# Generated assets by accuracy runs
.accuracy