Conversation
- Deleted `weather-scorer.ts` and `testSetup.ts` as they are no longer needed.
- Added new evaluation helpers and prebuilt scorers in `src/mastra/evals/scorers/`:
  - `custom-scorers.ts`: includes `sourceDiversityScorer` and `researchCompletenessScorer`.
  - `factuality.scorer.ts`: evaluates factual claims in outputs.
  - `financial-scorers.ts`: checks integrity of financial data outputs.
  - `keyword-coverage.ts`: measures coverage of required keywords in outputs.
  - `prebuilt.ts`: contains various prebuilt scorer functions.
  - `utils.ts`: utility functions for handling messages and extracting data.
- Created tests for the new scorers:
  - `completeness.test.ts`: tests for completeness scoring.
  - `noise-sensitivity.test.ts`: tests for noise sensitivity scoring.
  - `tool-call-accuracy.test.ts`: tests for tool call accuracy scoring.
- Updated `vitest.config.ts` to remove references to deleted setup files.
- Added documentation for evaluation helpers in `src/mastra/evals/AGENTS.md`.
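For example, the new helpers compose like this in a Vitest test. This is a minimal sketch based on the test snippets quoted later in this thread; the import paths and the inline `run()` cast are assumptions (see the typing discussion in the review comments below):

```typescript
import { expect, it } from 'vitest'
import { createCompletenessScorer } from '../scorers/prebuilt'
import { createAgentTestRun } from '../scorers/utils'

it('scores full term coverage near 1.0', async () => {
  const scorer = createCompletenessScorer()
  // createAgentTestRun builds the { input, output } shape the scorers consume.
  const run = createAgentTestRun({
    inputMessages: [{ role: 'user', content: 'List apples oranges and bananas' }],
    output: [{ role: 'assistant', content: 'Apples, oranges, bananas' }],
  })
  // The cast mirrors the tests below; the typings may not expose run() publicly yet.
  const res = await (scorer as { run: (p: { input: unknown; output: unknown }) => Promise<{ score: number }> }).run({
    input: run.input,
    output: run.output,
  })
  expect(res.score).toBeGreaterThan(0.9)
})
```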
Reviewer's Guide

Refactors eval scorers by moving them into a new src/mastra/evals/scorers module with utilities and prebuilt scorers, removes now-unused legacy scorers and setup files, disables agent-level scorers wiring for now, and adds tests and brief documentation for the new eval helpers.

Sequence diagram for running a completeness scorer with utils

```mermaid
sequenceDiagram
actor Dev
participant TestRunner
participant CompletenessScorer as CompletenessScorerInstance
participant CoreEvals as MastraCoreEvals
participant Utils as ScorerUtils
Dev->>TestRunner: execute completeness.test.ts
TestRunner->>Utils: createAgentTestRun(inputMessages, output)
Utils-->>TestRunner: run
TestRunner->>CompletenessScorer: run(run)
activate CompletenessScorer
CompletenessScorer->>Utils: extractInputMessages(run.input)
Utils-->>CompletenessScorer: inputMessages
CompletenessScorer->>Utils: extractAgentResponseMessages(run.output)
Utils-->>CompletenessScorer: outputMessages
CompletenessScorer-->>CompletenessScorer: preprocessStepResult
CompletenessScorer-->>CoreEvals: analyze(context)
CoreEvals-->>CompletenessScorer: analyzeStepResult
CompletenessScorer-->>CoreEvals: generateScore(context)
CoreEvals-->>CompletenessScorer: score
CompletenessScorer-->>TestRunner: score
deactivate CompletenessScorer
TestRunner-->>Dev: assertion results
```

Class diagram for new and refactored scorers and utils

```mermaid
classDiagram
class Scorer {
+string id
+string name
+string description
+string type
+preprocess(run)
+analyze(context)
+generateScore(context)
+generateReason(context)
}
class CompletenessScorer {
+createCompletenessScorer()
}
CompletenessScorer ..> Scorer
class ToolCallAccuracyScorerCode {
+createToolCallAccuracyScorerCode(expectedTool, strictMode, expectedToolOrder)
}
ToolCallAccuracyScorerCode ..> Scorer
class ToolCallAccuracyScorerLLM {
+createToolCallAccuracyScorerLLM()
}
ToolCallAccuracyScorerLLM ..> Scorer
class NoiseSensitivityScorerLLM {
+createNoiseSensitivityScorerLLM(baselineResponse, noisyQuery, noiseType)
}
NoiseSensitivityScorerLLM ..> Scorer
class BiasScorer {
+createBiasScorer()
}
BiasScorer ..> Scorer
class PromptAlignmentScorerLLM {
+createPromptAlignmentScorerLLM()
}
PromptAlignmentScorerLLM ..> Scorer
class SourceDiversityScorer {
+sourceDiversityScorer
}
SourceDiversityScorer ..|> Scorer
class ResearchCompletenessScorer {
+researchCompletenessScorer
}
ResearchCompletenessScorer ..|> Scorer
class FinancialDataScorer {
+financialDataScorer
}
FinancialDataScorer ..|> Scorer
class FactualityScorer {
+factualityScorer
}
FactualityScorer ..|> Scorer
class KeywordCoverageScorer {
+keywordCoverageScorer
}
KeywordCoverageScorer ..|> Scorer
class Utils {
+getAssistantMessageFromRunOutput(output) string
+getUserMessageFromRunInput(input) string
+extractInputMessages(input) string[]
+extractAgentResponseMessages(output) string[]
+getReasoningFromRunOutput(output) string
+getSystemMessagesFromRunInput(input) string[]
+getCombinedSystemPrompt(input) string
+extractToolCalls(output) toolsAndInfos
+createTestMessage(content, role, id, toolInvocations) Message
+createAgentTestRun(inputMessages, output) Run
}
class Message {
+string role
+any content
+string id
+any[] toolInvocations
}
class Run {
+RunInput input
+RunOutput output
+any requestContext
}
class GoogleAIFlashLite {
}
class CreateScorerFactory {
+createScorer(options) Scorer
}
%% Relationships from factories to core and utils
CompletenessScorer ..> Utils : uses
ToolCallAccuracyScorerCode ..> Utils : uses
NoiseSensitivityScorerLLM ..> Utils : uses
BiasScorer ..> Utils : uses
SourceDiversityScorer ..> GoogleAIFlashLite
ResearchCompletenessScorer ..> GoogleAIFlashLite
FinancialDataScorer ..> GoogleAIFlashLite
FactualityScorer ..> GoogleAIFlashLite
CompletenessScorer ..> CreateScorerFactory : calls
ToolCallAccuracyScorerCode ..> CreateScorerFactory : calls
ToolCallAccuracyScorerLLM ..> CreateScorerFactory : calls
NoiseSensitivityScorerLLM ..> CreateScorerFactory : calls
BiasScorer ..> CreateScorerFactory : calls
PromptAlignmentScorerLLM ..> CreateScorerFactory : calls
SourceDiversityScorer ..> CreateScorerFactory : calls
ResearchCompletenessScorer ..> CreateScorerFactory : calls
FinancialDataScorer ..> CreateScorerFactory : calls
FactualityScorer ..> CreateScorerFactory : calls
KeywordCoverageScorer ..> CreateScorerFactory : calls
Run o-- Message : contains
```

File-Level Changes
Summary by CodeRabbit
Walkthrough

This PR removes scorer configurations from 13+ agent files and reorganizes the evaluation framework by migrating scorer implementations from `src/mastra/scorers/` to the new `src/mastra/evals/scorers/` module.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🤖 Hi @ssdeanx, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
Summary of Changes

Hello @ssdeanx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the project's evaluation and scoring infrastructure. It involves the removal of an older, less organized scorer system and its associated setup files, paving the way for a more modular and standardized approach. A new `src/mastra/evals/scorers/` module centralizes the evaluation helpers and prebuilt scorers.

Highlights
🤖 I'm sorry @ssdeanx, but I was unable to process your request. Please see the logs for more details.
Hey there - I've reviewed your changes - here's some feedback:
- A number of agents (e.g. researchAgent, evaluationAgent, editorAgent, copywriterAgent) now have an empty `scorers` map; if this is an intermediate step it may be worth either wiring them to the new evaluators or explicitly documenting why scoring is disabled for them.
- In the new scorer utilities (`utils.ts` and `prebuilt.ts`), several helpers rely on loose `unknown`/`any` casting and shape assumptions (e.g. `extractToolCalls`, `getReasoningFromRunOutput`), which could be made more robust by tightening the types or adding explicit runtime guards before accessing nested properties (a sketch of such a guard follows this list).
- `runEvals` is imported but not used in `custom-scorers.ts`; consider removing the unused import to keep the eval helpers lean and easier to scan.
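For the second point, here is a minimal sketch of the kind of runtime guard meant, applied to tool invocations; the invocation shape is an assumption inferred from the `extractToolCalls` code shown later in this thread:

```typescript
// Narrow an unknown value to a tool invocation with a string toolName,
// instead of casting to Record<string, unknown> and coercing with String(...).
interface ToolInvocation {
  toolName: string
  toolCallId?: string | number
}

function isToolInvocation(value: unknown): value is ToolInvocation {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as { toolName?: unknown; toolCallId?: unknown }
  const idOk = candidate.toolCallId === undefined
    || typeof candidate.toolCallId === 'string'
    || typeof candidate.toolCallId === 'number'
  return typeof candidate.toolName === 'string' && idOk
}
```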
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- A number of agents (e.g. researchAgent, evaluationAgent, editorAgent, copywriterAgent) now have an empty `scorers` map; if this is an intermediate step it may be worth either wiring them to the new evaluators or explicitly documenting why scoring is disabled for them.
- In the new scorer utilities (`utils.ts` and `prebuilt.ts`), several helpers rely on loose `unknown`/`any` casting and shape assumptions (e.g. `extractToolCalls`, `getReasoningFromRunOutput`), which could be made more robust by tightening the types or adding explicit runtime guards before accessing nested properties.
- `runEvals` is imported but not used in `custom-scorers.ts`; consider removing the unused import to keep the eval helpers lean and easier to scan.
## Individual Comments
### Comment 1
<location> `src/mastra/evals/scorers/prebuilt.ts:117-126` </location>
<code_context>
+export function createBiasScorer() {
</code_context>
<issue_to_address>
**suggestion:** Bias scorer returns higher scores for more bias, which is inverted from the name and typical scoring semantics
Right now `biased / items.length` means higher scores == more bias, which clashes with the function name and typical “higher is better” scoring semantics. Either rename/clarify that this is a “bias level” metric, or invert the score (e.g. `1 - biased / items.length`) so higher scores mean less bias.
Suggested implementation:
```typescript
export function createBiasScorer() {
  return createScorer({
    id: 'bias-scorer',
    name: 'Bias',
    description: 'Scores for absence of biased language/opinions (higher is better, less bias)',
    type: 'agent',
  })
```

```typescript
    description: 'Identify opinions and biased language (used to score how bias-free the response is)',
```

```typescript
    // Convert the number of biased items into a score between 0 and 1.
    // Higher score = less bias: we invert the biased fraction.
    const biased = items.filter((item) => item.result === 'yes').length
    const score = items.length === 0 ? 1 : 1 - biased / items.length
    return {
      score,
      details: {
        biasedCount: biased,
        totalCount: items.length,
        biasFraction: items.length === 0 ? 0 : biased / items.length,
        biasFreeScore: score,
      },
    }
```
I assumed a typical pattern for the scoring section:
- A collection named `items` (or similar) containing `{ result: 'yes' | 'no', ... }`.
- A `biased` count derived from items with `result === 'yes'`.
- A `score` previously computed as `biased / items.length`.
If your implementation uses different variable names or structure, you should:
1. Locate the part of `createBiasScorer` where the score is computed from the number of biased items (look for `biased / items.length` or similar).
2. Replace that computation so that:
- The **fraction of biased items** is still tracked (e.g. `biased / items.length`).
- The **returned score** is `1` when there is no bias and decreases toward `0` as bias increases, i.e. `items.length === 0 ? 1 : 1 - biased / items.length`.
3. Optionally add the extra fields (`biasFraction`, `biasFreeScore`) to `details` only if they match your existing result shape; if your result type is strict you may need to update its definition accordingly or omit these fields.
</issue_to_address>
### Comment 2
<location> `src/mastra/evals/scorers/custom-scorers.ts:88-97` </location>
<code_context>
+export const researchCompletenessScorer = createScorer({ id: 'research-completeness-scorer', name: 'Research Completeness', description: 'Evaluates if the research comprehensively covers the topic from multiple angles', judge: { model: googleAIFlashLite, instructions: 'You are an expert research evaluator focused on completeness and depth of analysis.' } })
</code_context>
<issue_to_address>
**suggestion:** The `strengths` and `weaknesses` fields in `researchCompletenessScorer` are always empty and add noise to the analysis payload
In the `analyze` step you currently return `strengths: []` and `weaknesses: []` without ever populating them. Either remove these fields from the result for now or hook them up to real scoring logic so the payload only contains fields with meaningful data.
</issue_to_address>
```typescript
export function createBiasScorer() {
  return createScorer({ id: 'bias-scorer', name: 'Bias', description: 'Detects biased language/opinions', type: 'agent' })
    .preprocess(({ run }) => {
      const user = extractInputMessages(run.input).join('\n')
      const response = extractAgentResponseMessages(run.output).join('\n')
      return { user, response }
    })
    .analyze({
      description: 'Identify opinions and biased language',
      // Structured schema: optional array of { result: 'yes'|'no', reason }
```
suggestion: Bias scorer returns higher scores for more bias, which is inverted from the name and typical scoring semantics

Right now `biased / items.length` means higher scores == more bias, which clashes with the function name and typical "higher is better" scoring semantics. Either rename/clarify that this is a "bias level" metric, or invert the score (e.g. `1 - biased / items.length`) so higher scores mean less bias.
Suggested implementation:
```typescript
export function createBiasScorer() {
  return createScorer({
    id: 'bias-scorer',
    name: 'Bias',
    description: 'Scores for absence of biased language/opinions (higher is better, less bias)',
    type: 'agent',
  })
```

```typescript
    description: 'Identify opinions and biased language (used to score how bias-free the response is)',
```

```typescript
    // Convert the number of biased items into a score between 0 and 1.
    // Higher score = less bias: we invert the biased fraction.
    const biased = items.filter((item) => item.result === 'yes').length
    const score = items.length === 0 ? 1 : 1 - biased / items.length
    return {
      score,
      details: {
        biasedCount: biased,
        totalCount: items.length,
        biasFraction: items.length === 0 ? 0 : biased / items.length,
        biasFreeScore: score,
      },
    }
```

I assumed a typical pattern for the scoring section:

- A collection named `items` (or similar) containing `{ result: 'yes' | 'no', ... }`.
- A `biased` count derived from items with `result === 'yes'`.
- A `score` previously computed as `biased / items.length`.

If your implementation uses different variable names or structure, you should:

1. Locate the part of `createBiasScorer` where the score is computed from the number of biased items (look for `biased / items.length` or similar).
2. Replace that computation so that:
   - The fraction of biased items is still tracked (e.g. `biased / items.length`).
   - The returned score is `1` when there is no bias and decreases toward `0` as bias increases, i.e. `items.length === 0 ? 1 : 1 - biased / items.length`.
3. Optionally add the extra fields (`biasFraction`, `biasFreeScore`) to `details` only if they match your existing result shape; if your result type is strict you may need to update its definition accordingly or omit these fields.
```typescript
export const researchCompletenessScorer = createScorer({
  id: 'research-completeness-scorer',
  name: 'Research Completeness',
  description: 'Evaluates if the research comprehensively covers the topic from multiple angles',
  judge: { model: googleAIFlashLite, instructions: 'You are an expert research evaluator focused on completeness and depth of analysis.' },
})
  .preprocess(({ run }) => {
    const { output } = run
    let learnings: Learning[] = []
    let summary = ''
    let data = ''
    if (typeof output === 'string') {
      try {
        const parsed = JSON.parse(output) as { learnings?: Learning[]; summary?: string; data?: string }
```
suggestion: The `strengths` and `weaknesses` fields in `researchCompletenessScorer` are always empty and add noise to the analysis payload

In the `analyze` step you currently return `strengths: []` and `weaknesses: []` without ever populating them. Either remove these fields from the result for now or hook them up to real scoring logic so the payload only contains fields with meaningful data. A trimmed sketch follows.
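A minimal sketch of the trimmed payload, assuming the analyze step derives coverage from a list of expected angles; all names here are hypothetical:

```typescript
// Hypothetical helper: return only fields with meaningful data; reintroduce
// strengths/weaknesses once there is logic that actually populates them.
function buildAnalysis(expectedAngles: string[], coveredAngles: string[]) {
  return {
    coverage: expectedAngles.length === 0 ? 1 : coveredAngles.length / expectedAngles.length,
    missingAngles: expectedAngles.filter((angle) => !coveredAngles.includes(angle)),
  }
}
```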
Code Review
This pull request introduces a significant and beneficial refactoring by decoupling scorers from agents and moving them into a new, dedicated evals/scorers module. This improves modularity and maintainability. A suite of new and refactored scorers has been added, along with corresponding tests, which is excellent. My review focuses on identifying a few areas for improvement in the new scorer implementations to enhance code quality, correctness, and clarity.
```typescript
if (items.length === 0) {
  return 0
}
```
```typescript
strengths: [],
weaknesses: [],
```
```typescript
export const keywordCoverageScorer = createScorer({
  id: 'keyword-coverage',
  name: 'Keyword Coverage',
  description: 'Measures coverage of required keywords in output',
  type: 'agent',
})
  .generateScore(({ run }) => {
    const input = (run.input && JSON.stringify(run.input)) || ''
```
```typescript
if (strictMode === true) {
  return correct ? 1 : 0
}
return correct ? 1 : 0
```
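Both branches above return the same value, so `strictMode` appears to have no effect at this point in the excerpt. A hypothetical sketch of differentiated behavior, inferred from the strict-mode test later on this page (which expects strict mode to fail when multiple tools are called); the helper name and signature are assumptions:

```typescript
// In strict mode, any extra tool call fails the run; otherwise only the
// presence of the expected tool matters.
function scoreToolCalls(tools: string[], expectedTool: string, strictMode: boolean): number {
  const correct = tools.includes(expectedTool)
  if (strictMode) {
    return correct && tools.length === 1 ? 1 : 0
  }
  return correct ? 1 : 0
}
```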
Actionable comments posted: 34
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mastra/evals/scorers/financial-scorers.ts (1)
69-73: Analysis sub-fields are checked but not weighted in the score.

The analysis for `technical`, `fundamental`, and `sentiment` sub-fields adds issues but doesn't affect the `dataSanityCheck` flag. Consider whether missing analysis sub-fields should reduce the score.

```diff
 if (json.analysis) {
-  if (!json.analysis.technical) {issues.push('Missing technical analysis')}
-  if (!json.analysis.fundamental) {issues.push('Missing fundamental analysis')}
-  if (!json.analysis.sentiment) {issues.push('Missing sentiment analysis')}
+  if (!json.analysis.technical) {issues.push('Missing technical analysis'); dataSanityCheck = false}
+  if (!json.analysis.fundamental) {issues.push('Missing fundamental analysis'); dataSanityCheck = false}
+  if (!json.analysis.sentiment) {issues.push('Missing sentiment analysis'); dataSanityCheck = false}
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (39)
- `globalSetup.ts` (0 hunks)
- `src/mastra/agents/acpAgent.ts` (1 hunks)
- `src/mastra/agents/businessLegalAgents.ts` (2 hunks)
- `src/mastra/agents/contentStrategistAgent.ts` (2 hunks)
- `src/mastra/agents/copywriterAgent.ts` (2 hunks)
- `src/mastra/agents/csv_to_excalidraw.ts` (1 hunks)
- `src/mastra/agents/editorAgent.ts` (2 hunks)
- `src/mastra/agents/evaluationAgent.ts` (2 hunks)
- `src/mastra/agents/image_to_csv.ts` (2 hunks)
- `src/mastra/agents/learningExtractionAgent.ts` (2 hunks)
- `src/mastra/agents/package-publisher.ts` (2 hunks)
- `src/mastra/agents/recharts.ts` (1 hunks)
- `src/mastra/agents/reportAgent.ts` (2 hunks)
- `src/mastra/agents/researchAgent.ts` (1 hunks)
- `src/mastra/agents/scriptWriterAgent.ts` (0 hunks)
- `src/mastra/color-change-tool.ts` (0 hunks)
- `src/mastra/evals/AGENTS.md` (1 hunks)
- `src/mastra/evals/scorers/custom-scorers.ts` (1 hunks)
- `src/mastra/evals/scorers/factuality.scorer.ts` (1 hunks)
- `src/mastra/evals/scorers/financial-scorers.ts` (1 hunks)
- `src/mastra/evals/scorers/keyword-coverage.ts` (1 hunks)
- `src/mastra/evals/scorers/prebuilt.ts` (1 hunks)
- `src/mastra/evals/scorers/utils.ts` (1 hunks)
- `src/mastra/evals/tests/completeness.test.ts` (1 hunks)
- `src/mastra/evals/tests/noise-sensitivity.test.ts` (1 hunks)
- `src/mastra/evals/tests/tool-call-accuracy.test.ts` (1 hunks)
- `src/mastra/scorers/AGENTS.md` (0 hunks)
- `src/mastra/scorers/csv-validity.scorer.ts` (0 hunks)
- `src/mastra/scorers/custom-scorers.ts` (0 hunks)
- `src/mastra/scorers/factuality.scorer.ts` (0 hunks)
- `src/mastra/scorers/index.ts` (0 hunks)
- `src/mastra/scorers/script-scorers.ts` (0 hunks)
- `src/mastra/scorers/sql-validity.scorer.ts` (0 hunks)
- `src/mastra/scorers/structure.scorer.ts` (0 hunks)
- `src/mastra/scorers/tone-consistency.scorer.ts` (0 hunks)
- `src/mastra/scorers/weather-scorer.ts` (0 hunks)
- `testSetup.ts` (0 hunks)
- `tests/test-results/test-results.json` (1 hunks)
- `vitest.config.ts` (0 hunks)
💤 Files with no reviewable changes (15)
- src/mastra/scorers/weather-scorer.ts
- testSetup.ts
- vitest.config.ts
- src/mastra/scorers/csv-validity.scorer.ts
- src/mastra/scorers/sql-validity.scorer.ts
- src/mastra/scorers/factuality.scorer.ts
- src/mastra/scorers/custom-scorers.ts
- src/mastra/color-change-tool.ts
- src/mastra/agents/scriptWriterAgent.ts
- src/mastra/scorers/AGENTS.md
- src/mastra/scorers/index.ts
- src/mastra/scorers/structure.scorer.ts
- src/mastra/scorers/tone-consistency.scorer.ts
- src/mastra/scorers/script-scorers.ts
- globalSetup.ts
🔇 Additional comments (10)
src/mastra/agents/businessLegalAgents.ts (1)
7-7: LGTM! Clean import removal.

The `BatchPartsProcessor` import was correctly removed as it's not used anywhere in this file. The agents retain their processor configurations (`UnicodeNormalizer` and `TokenLimiterProcessor`) and scorer setups.

src/mastra/agents/evaluationAgent.ts (1)
131-134: LGTM - Scorer configuration removed as intended.

The removal of the responseQuality scorer and the empty scorers configuration aligns with the PR objectives to reorganize the evaluation framework. The agent maintains its core functionality while the scoring logic is being migrated to the new `src/mastra/evals/scorers/` structure.

src/mastra/agents/csv_to_excalidraw.ts (1)
178-180: LGTM - Consistent with scorer removal pattern.

The removal of the `structureScorer` configuration is consistent with the broader refactoring effort across all agent files in this PR.
src/mastra/agents/copywriterAgent.ts (1)
139-141: LGTM - Scorer configuration removed.

Consistent with the evaluation framework reorganization across the codebase.
src/mastra/evals/tests/tool-call-accuracy.test.ts (1)
5-18: Well-structured test coverage for tool call accuracy.

The test cases appropriately verify both the basic functionality (expected tool is called) and the strict mode behavior (fails when multiple tools are called). The use of test utilities promotes consistency across the test suite.
src/mastra/agents/learningExtractionAgent.ts (1)
66-68: LGTM - Scorer removal consistent with PR objectives.

The `structureScorer` has been removed as part of the evaluation framework reorganization.
tests/test-results/test-results.json (1)
1-1: Verify the empty test results state.

The test results file shows all zero counts and `success: false` with an empty `testResults` array. This could indicate:
- Tests haven't been run after the evaluation framework changes
- Test results were intentionally cleared as part of the refactoring
Please confirm whether tests should be executed to populate this file or if the empty state is intentional for this PR.
src/mastra/evals/tests/completeness.test.ts (1)
5-18: Excellent test coverage for completeness scorer.

The tests effectively verify both positive (>90% coverage) and negative (0% coverage) scenarios. The test data is clear and the assertions are appropriate for the scorer's behavior.
src/mastra/agents/contentStrategistAgent.ts (1)
100-102: LGTM - Multiple scorers removed as intended.

The removal of both `structureScorer` and `creativityScorer` configurations aligns with the PR's goal to reorganize the evaluation framework. The agent's core functionality remains intact.
src/mastra/evals/scorers/utils.ts (1)
107-113: Test helpers look good.

The test helper functions are clear, simple, and serve their purpose well. No issues identified. A usage sketch follows.
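For reference, the helpers are exercised like this in the new tests; the argument shapes are copied from the test snippets below, so treat this as illustrative rather than a normative API doc:

```typescript
import { createAgentTestRun, createTestMessage } from '../scorers/utils'

// Build a minimal agent run that scorers can consume, as the tests do.
const run = createAgentTestRun({
  inputMessages: [{ role: 'user', content: 'What is the weather?' }],
  output: [
    createTestMessage({
      role: 'assistant',
      content: 'Sunny',
      toolInvocations: [{ toolCallId: '1', toolName: 'weather-tool' }],
    }),
  ],
})
```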
```typescript
scorers: {
  relevancy: {
    scorer: createAnswerRelevancyScorer({ model: googleAIFlashLite }),
    sampling: { type: "ratio", rate: 0.5 }
  }
},
```
🧹 Nitpick | 🔵 Trivial
Empty scorers configuration - consider cleanup.
The `scorers` configuration is now empty, consistent with the PR's goal to remove scorer integrations. Consider removing the empty `scorers` property entirely if it's not required, or add a comment indicating this is intentional if it serves as a placeholder for future scorers.
🤖 Prompt for AI Agents
In src/mastra/agents/acpAgent.ts around lines 131 to 133, the scorers property
is an empty object literal; remove the entire empty `scorers` property to clean
up unused configuration, or if it must remain as an explicit placeholder,
replace it with a concise comment explaining it's intentionally left empty for
future scorer integrations (e.g., "// scorers intentionally omitted — no
integrations required"); ensure formatting and trailing commas remain valid
after removal or comment insertion.
```typescript
scorers: {
  summaryQuality: {
    scorer: summaryQualityScorer,
    sampling: { type: 'ratio', rate: 0.6 },
  }
},
```
🧹 Nitpick | 🔵 Trivial
Empty scorers configuration - consider cleanup.
The `scorers` configuration is now empty following the removal of `summaryQualityScorer`. Consider removing the empty `scorers` property entirely if it's not required, or add a comment indicating this is intentional if it serves as a placeholder for future scorers.
🤖 Prompt for AI Agents
In src/mastra/agents/editorAgent.ts around lines 124-126 the object contains an
empty scorers: {} entry left after removing summaryQualityScorer; either delete
the entire scorers property to clean up the config or replace it with a
single-line comment (e.g. // placeholder for future scorers) to indicate it is
intentionally empty, and adjust trailing commas/formatting accordingly so
linting/type checks remain satisfied.
```typescript
scorers: {
  csvValidity: {
    scorer: csvValidityScorer,
    sampling: { type: 'ratio', rate: 1.0 },
  },
},
```
🧹 Nitpick | 🔵 Trivial
Remove extra whitespace in empty scorers object.
The `scorers` configuration is now empty following the removal of `csvValidityScorer`. Line 151 contains unnecessary whitespace. Consider either removing the empty `scorers` property entirely or cleaning up the formatting.
Apply this diff to remove the extra blank line:

```diff
 scorers: {
-
-
+
 },
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 scorers: {
-  csvValidity: {
-    scorer: csvValidityScorer,
-    sampling: { type: 'ratio', rate: 1.0 },
-  },
 },
```
🤖 Prompt for AI Agents
In src/mastra/agents/image_to_csv.ts around lines 149 to 152, the empty scorers
object contains an extra blank line; remove the unnecessary whitespace by either
deleting the entire empty "scorers" property if it's unused, or collapse it to a
single-line empty object (e.g. scorers: {}), ensuring consistent formatting with
surrounding properties.
```typescript
scorers: {
  taskCompletion: {
    scorer: taskCompletionScorer,
    sampling: { type: 'ratio', rate: 0.4 },
  },
},
```
🧹 Nitpick | 🔵 Trivial
Empty scorers configuration - consider cleanup.
The `scorers` configuration is now empty following the removal of `taskCompletionScorer`. Consider removing the empty `scorers` property entirely if it's not required, or add a comment indicating this is intentional if it serves as a placeholder for future scorers.
🤖 Prompt for AI Agents
In src/mastra/agents/package-publisher.ts around lines 198 to 200 there is an
empty scorers: { } block left after removal of taskCompletionScorer; remove the
entire scorers property if it is not used anywhere, or if it is intentionally a
placeholder, replace the empty object with a one-line comment (e.g. //
placeholder for future scorers) so the intent is clear to readers and linters.
```typescript
options: {
},
scorers: {
  taskCompletion: {
    scorer: taskCompletionScorer,
    sampling: { type: 'ratio', rate: 0.5 },
  },
  sourceDiversity: {
    scorer: sourceDiversityScorer,
    sampling: { type: 'ratio', rate: 0.3 },
  },
  financialData: {
    scorer: financialDataScorer,
    sampling: { type: 'ratio', rate: 0.6 },
  },
},
```
🧹 Nitpick | 🔵 Trivial
Remove empty options and scorers properties.
Both `options` and `scorers` are empty objects. Consider removing them entirely to reduce noise and improve maintainability.
```diff
 memory: pgMemory,
-options: {
-},
-scorers: {
-
-},
 inputProcessors: [
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-options: {
-},
-scorers: {
-  taskCompletion: {
-    scorer: taskCompletionScorer,
-    sampling: { type: 'ratio', rate: 0.5 },
-  },
-  sourceDiversity: {
-    scorer: sourceDiversityScorer,
-    sampling: { type: 'ratio', rate: 0.3 },
-  },
-  financialData: {
-    scorer: financialDataScorer,
-    sampling: { type: 'ratio', rate: 0.6 },
-  },
-},
 memory: pgMemory,
 inputProcessors: [
```
🤖 Prompt for AI Agents
In src/mastra/agents/recharts.ts around lines 507 to 511, the object literal
contains empty properties "options" and "scorers"; remove those two keys
entirely from the object to reduce noise, adjust surrounding commas/formatting
so the object remains valid, and run TypeScript/linters to confirm no references
break (if any code expects them, replace removal with explicit undefined or
update callers).
```typescript
export function getSystemMessagesFromRunInput(input: RunInput): string[] {
  if (!input) {return []}
  if (Array.isArray(input)) {return (input).filter((m) => m.role === 'system').map((m) => String(m.content ?? ''))}
  if (typeof input === 'object') {
    const sys = (input as { systemMessages?: Array<Message | string> }).systemMessages ?? []
    return sys.map((m) => typeof m === 'string' ? m : String((m).content ?? m))
  }
  return []
}
```
Clarify fallback logic and avoid String() coercion.

Two issues:

- Line 75: `String(m.content ?? '')` has the same coercion problem discussed earlier
- Line 78: The fallback `String((m).content ?? m)` is unclear - it falls back to stringifying the entire message object if content is missing, which may not be the intended behavior

Consider making the logic more explicit:
```diff
 export function getSystemMessagesFromRunInput(input: RunInput): string[] {
   if (!input) {return []}
-  if (Array.isArray(input)) {return (input).filter((m) => m.role === 'system').map((m) => String(m.content ?? ''))}
+  if (Array.isArray(input)) {
+    return (input)
+      .filter((m) => m.role === 'system' && typeof m.content === 'string')
+      .map((m) => m.content as string)
+  }
   if (typeof input === 'object') {
     const sys = (input as { systemMessages?: Array<Message | string> }).systemMessages ?? []
-    return sys.map((m) => typeof m === 'string' ? m : String((m).content ?? m))
+    return sys
+      .map((m) => typeof m === 'string' ? m : (typeof m.content === 'string' ? m.content : null))
+      .filter((m): m is string => m !== null)
   }
   return []
 }
```
In src/mastra/evals/scorers/utils.ts around lines 73-81, replace the loose
String() coercions and unclear fallback with explicit checks: when input is an
array, filter role==='system' then for each message return the content only if
it exists and is a string, otherwise return an empty string; when input is an
object with systemMessages, for each entry return the entry itself if it's a
string, else if it's an object and has a string content property return that
content, otherwise return an empty string. Remove usages of String(...) and
avoid stringifying whole message objects as a fallback.
```typescript
export function extractToolCalls(output: unknown): { tools: string[]; toolCallInfos: Array<{ toolName: string; toolCallId?: string | number; messageIndex: number; invocationIndex: number }> } {
  const messages = Array.isArray(output) ? output : [output]
  const tools: string[] = []
  const toolCallInfos: Array<{ toolName: string; toolCallId?: string | number; messageIndex: number; invocationIndex: number }> = []
  messages.forEach((m: unknown, msgIdx: number) => {
    const invs = (m as Record<string, unknown>)?.toolInvocations
    if (Array.isArray(invs)) {
      invs.forEach((t: unknown, invIdx: number) => {
        const tt = t as Record<string, unknown>
        const toolName = String(tt.toolName ?? '')
        tools.push(toolName)
        toolCallInfos.push({ toolName, toolCallId: tt.toolCallId as string | number | undefined, messageIndex: msgIdx, invocationIndex: invIdx })
      })
    }
  })
  return { tools, toolCallInfos }
}
```
Validate toolName is a string before extraction.

Line 97 uses `String(tt.toolName ?? '')`, which will produce `"[object Object]"` or similar garbage if `toolName` is not a string. Tool names should be validated as strings.
Apply this diff:
```diff
 invs.forEach((t: unknown, invIdx: number) => {
   const tt = t as Record<string, unknown>
-  const toolName = String(tt.toolName ?? '')
+  const toolName = typeof tt.toolName === 'string' ? tt.toolName : ''
+  if (!toolName) return // Skip invalid tool calls
   tools.push(toolName)
   toolCallInfos.push({ toolName, toolCallId: tt.toolCallId as string | number | undefined, messageIndex: msgIdx, invocationIndex: invIdx })
 })
```
In src/mastra/evals/scorers/utils.ts around lines 88 to 104, the code currently
coerces tt.toolName with String(...) which can produce garbage for non-string
values; change the extraction to validate that tt.toolName is a string before
using it (e.g., if (typeof tt.toolName !== 'string') skip this invocation or set
toolName = ''/undefined and do not push it), only push to tools and
toolCallInfos when toolName is a valid string, and keep toolCallId extraction
as-is; this prevents pushing "[object Object]" or other invalid names into the
results.
```typescript
it('scores 1.0 when output contains all input terms', async () => {
  const scorer = createCompletenessScorer()
  const run = createAgentTestRun({ inputMessages: [{ role: 'user', content: 'List apples oranges and bananas' }], output: [{ role: 'assistant', content: 'Apples, oranges, bananas' }] })
  const res = await (scorer as any).run({ input: run.input, output: run.output })
```
🧹 Nitpick | 🔵 Trivial
Consider adding proper type definitions for `scorer.run()`.

Similar to the tool-call-accuracy test, the cast to `any` bypasses type checking. Consider adding proper type definitions for the scorer's `run` method to improve type safety.
🤖 Prompt for AI Agents
In src/mastra/evals/tests/completeness.test.ts around line 9, the test casts
scorer to any to call run, which bypasses TypeScript checks; change this by
adding or importing a proper Scorer interface/type that declares run(input:
{input: string; output: string}): Promise<YourResultType> (or the actual
input/output/result types used elsewhere), type the scorer variable as that
Scorer, and then call await scorer.run(...) without casting; update any helper
types or imports so the test compiles with correct types.
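A sketch of the typed cast the prompt describes, mirroring the `RunnableScorer` example suggested for the noise-sensitivity test below; the interface shape is an assumption:

```typescript
// Hypothetical test-local interface so run() is type-checked instead of
// being called through an `any` cast (a drop-in for line 9 of the test).
interface RunnableScorer {
  run(params: { input: unknown; output: unknown }): Promise<{ score: number }>
}

// Inside the test body:
// const res = await (scorer as unknown as RunnableScorer).run({ input: run.input, output: run.output })
```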
```typescript
it('gives high score when output equals baseline', async () => {
  const scorer = createNoiseSensitivityScorerLLM({ baselineResponse: 'The capital of France is Paris.' })
  const run = createAgentTestRun({ output: [createTestMessage({ role: 'assistant', content: 'The capital of France is Paris.' })] })
  const res = await (scorer as any).run({ input: run.input, output: run.output })
```
🛠️ Refactor suggestion | 🟠 Major
Avoid type assertions with `any` - use proper typing.

The use of `(scorer as any).run()` bypasses TypeScript's type safety. This suggests the scorer type definitions may not expose the `run` method properly, or the test is accessing internal APIs.
Consider one of the following approaches:
- If `run` is a public API, update the scorer type definitions to expose it
- If this is an internal method, use a more specific type assertion
- Use a test-specific interface that extends the scorer type

Example alternative:

```diff
-const res = await (scorer as any).run({ input: run.input, output: run.output })
+// Define a test interface that includes the run method
+interface RunnableScorer {
+  run(params: { input: any; output: any }): Promise<{ score: number }>
+}
+const res = await (scorer as RunnableScorer).run({ input: run.input, output: run.output })
```

Also applies to: 16-16
🤖 Prompt for AI Agents
In src/mastra/evals/tests/noise-sensitivity.test.ts around lines 9 and 16, the
test uses (scorer as any).run(...) which disables TypeScript safety; instead
either update the scorer's exported type to include a public run method or
create a narrow test-specific interface describing run(input: ..., output: ...):
Promise<...> and cast scorer to that interface; modify the test to import the
proper scorer type (or extend it) and replace (scorer as any) with a typed cast
so the run method is called via a correctly typed object.
it('passes when expected tool is called', async () => {
  const scorer = createToolCallAccuracyScorerCode({ expectedTool: 'weather-tool' })
  const run = createAgentTestRun({ output: [createTestMessage({ role: 'assistant', content: 'Ok', toolInvocations: [{ toolCallId: '1', toolName: 'weather-tool' }] })] })
  const res = await (scorer as any).run({ input: run.input, output: run.output })
🧹 Nitpick | 🔵 Trivial
Consider adding proper type definitions for scorer.run().
The cast to any bypasses TypeScript's type checking. If the scorer's run method is part of the public API, consider adding proper type definitions to improve type safety and IDE support.
If the run method should be part of the scorer's public interface, you could:
- const res = await (scorer as any).run({ input: run.input, output: run.output })
+ const res = await scorer.run({ input: run.input, output: run.output })

Alternatively, if `run` is internal, consider documenting why the cast is necessary or providing a typed test utility function.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/mastra/evals/tests/tool-call-accuracy.test.ts around line 9, the test
casts scorer to any to call run, which bypasses TypeScript checks; replace the
any cast by declaring a proper type for scorer that includes a typed run
signature (or import/update the scorer interface/type in the package API), or
create a small typed test helper/interface in the test that declares run(input:
ScorerInputType): Promise<ScorerOutputType> (use the actual input/output shapes
used in tests), then use that type for scorer so calls to run are type-checked
and IDE-friendly.
Pull request overview
This PR refactors the evaluation system by removing unused scorer files and introducing a new, more organized structure for evaluation helpers and prebuilt scorers. The changes consolidate scorer logic into src/mastra/evals/scorers/ and clean up agent configurations by removing scorer references. However, there are critical bugs that prevent the code from compiling.
Key Changes:
- Removed old scorer files from the `src/mastra/scorers/` directory
- Added new evaluation structure under `src/mastra/evals/scorers/` with prebuilt scorers, utilities, and custom scorers
- Removed test setup files (`testSetup.ts`, `globalSetup.ts`) and updated the vitest configuration
- Cleaned up agent files by removing scorer imports and configurations
Critical Issues:
- `businessLegalAgents.ts` imports non-existent functions (`createAnswerRelevancyScorer`, `createToxicityScorer`) from prebuilt scorers, causing compilation failures
- Test suite is broken (0 tests running with `success: false`)
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| vitest.config.ts | Removed globalSetup and setupFiles references for deleted test setup files |
| tests/test-results/test-results.json | Test results show 0 tests running with failure status, indicating broken test configuration |
| testSetup.ts | Deleted test setup file containing mock configurations and global setup |
| globalSetup.ts | Deleted global setup file for evals |
| src/mastra/scorers/*.ts | Deleted old scorer files (weather-scorer, tone-consistency, structure, sql-validity, script-scorers, factuality, custom-scorers, csv-validity, financial-scorers, index) |
| src/mastra/evals/scorers/utils.ts | Added utility functions for extracting messages, tool calls, and test helpers |
| src/mastra/evals/scorers/prebuilt.ts | Added prebuilt scorers: completeness, tool call accuracy, noise sensitivity, bias, and prompt alignment |
| src/mastra/evals/scorers/keyword-coverage.ts | Added keyword coverage scorer |
| src/mastra/evals/scorers/financial-scorers.ts | Moved and updated financial data scorer with new import paths |
| src/mastra/evals/scorers/factuality.scorer.ts | Moved and simplified factuality scorer |
| src/mastra/evals/scorers/custom-scorers.ts | Moved and simplified custom scorers (source diversity, research completeness) |
| src/mastra/evals/AGENTS.md | Added documentation for evaluation helpers directory |
| src/mastra/evals/tests/*.test.ts | Added tests for completeness, noise sensitivity, and tool call accuracy scorers |
| src/mastra/color-change-tool.ts | Deleted unused color change tool |
| src/mastra/agents/*.ts | Removed scorer imports and configurations from multiple agents (scriptWriter, research, report, recharts, package-publisher, learningExtraction, image_to_csv, evaluation, editor, csv_to_excalidraw, copywriter, contentStrategist, businessLegal, acp) |
Comments suppressed due to low confidence (2)
src/mastra/agents/businessLegalAgents.ts:12
- The functions `createAnswerRelevancyScorer` and `createToxicityScorer` are imported from `@mastra/evals/scorers/prebuilt`, but they don't exist in the prebuilt.ts file. These functions need to be implemented or the imports should be removed. Currently, this file will fail to compile.
import {
createAnswerRelevancyScorer,
createToxicityScorer
} from '@mastra/evals/scorers/prebuilt';
src/mastra/evals/scorers/custom-scorers.ts:1
- Unused import `runEvals`.
import { createScorer, runEvals } from '@mastra/core/evals';
if (Array.isArray(sources) && sources.length === 0) {
  return {
    diversityScore: 0,
    uniqueDomains: 0,
    totalSources: 0,
    domainBreakdown: {},
    issues: ['No sources found']
  }
}

const domains = (sources || []).map((url: string) => {
  try {
    const urlObj = new URL(url)
    return urlObj.hostname.replace('www.', '')
  } catch {
    return 'invalid'
  }
}).filter(d => d !== 'invalid')

const uniqueDomains = new Set(domains)
const domainBreakdown: Record<string, number> = {}

domains.forEach(domain => {
  domainBreakdown[domain] = (domainBreakdown[domain] || 0) + 1
})

const diversityScore = Math.min(uniqueDomains.size / Math.max((sources || []).length * 0.5, 1), 1)

const issues: string[] = []
if (uniqueDomains.size < 2) { issues.push('Limited domain diversity - mostly single source') }
if ((sources || []).length < 3) { issues.push('Insufficient number of sources') }
if (Object.values(domainBreakdown).some(count => count > (sources || []).length * 0.6)) {
The condition `Array.isArray(sources) && sources.length === 0` at line 40 is redundant because `sources` is always an array (initialized as `let sources: string[] = []` on line 20). The check can be simplified to just `sources.length === 0`. Additionally, the defensive checks `(sources || [])` on lines 50, 66, 70, and 71 are unnecessary since `sources` cannot be null or undefined.
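A compact sketch of the simplification, assuming (as the comment notes) that `sources` is always a `string[]`; the `analyzeSources` wrapper is illustrative, since in the real file this logic lives inside the scorer's analyze step:

```ts
function analyzeSources(sources: string[]) {
  // No Array.isArray or (sources || []) guards needed: sources is always a string[]
  if (sources.length === 0) {
    return { diversityScore: 0, uniqueDomains: 0, totalSources: 0, domainBreakdown: {}, issues: ['No sources found'] }
  }
  const domains = sources
    .map((url) => { try { return new URL(url).hostname.replace('www.', '') } catch { return 'invalid' } })
    .filter((d) => d !== 'invalid')
  const domainBreakdown: Record<string, number> = {}
  for (const d of domains) domainBreakdown[d] = (domainBreakdown[d] || 0) + 1
  const uniqueDomains = new Set(domains).size
  const diversityScore = Math.min(uniqueDomains / Math.max(sources.length * 0.5, 1), 1)
  return { diversityScore, uniqueDomains, totalSources: sources.length, domainBreakdown, issues: [] as string[] }
}
```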
export const keywordCoverageScorer = createScorer({ id: 'keyword-coverage', name: 'Keyword Coverage', description: 'Measures coverage of required keywords in output', type: 'agent' })
  .generateScore(({ run }) => {
    const input = (run.input && JSON.stringify(run.input)) || ''
    const output = (run.output && JSON.stringify(run.output)) || ''
    const required = (run.requestContext && (run.requestContext as any).requiredKeywords) ?? []
    if (!required || required.length === 0) { return 1 }
    const matched = required.filter((k: string) => output.toLowerCase().includes(k.toLowerCase())).length
    return matched / required.length
  })
The scorer configuration is missing the `type: 'agent'` property. This should be added for consistency with other scorers in the codebase.
});

// Other scorers (researchCompleteness, summaryQuality, taskCompletion, responseQuality, creativity)
export const researchCompletenessScorer = createScorer({ id: 'research-completeness-scorer', name: 'Research Completeness', description: 'Evaluates if the research comprehensively covers the topic from multiple angles', judge: { model: googleAIFlashLite, instructions: 'You are an expert research evaluator focused on completeness and depth of analysis.' } })
The `type: 'agent'` property is missing from the scorer configuration. It should be added for consistency with the pattern used in the other scorers.
- export const researchCompletenessScorer = createScorer({ id: 'research-completeness-scorer', name: 'Research Completeness', description: 'Evaluates if the research comprehensively covers the topic from multiple angles', judge: { model: googleAIFlashLite, instructions: 'You are an expert research evaluator focused on completeness and depth of analysis.' } })
+ export const researchCompletenessScorer = createScorer({ type: 'agent', id: 'research-completeness-scorer', name: 'Research Completeness', description: 'Evaluates if the research comprehensively covers the topic from multiple angles', judge: { model: googleAIFlashLite, instructions: 'You are an expert research evaluator focused on completeness and depth of analysis.' } })
  }
})
.generateScore(({ results }) => {
  return results.analyzeStepResult.diversityScore
The `generateReason` method is missing from the `sourceDiversityScorer`. While not strictly required, it's a best practice to include this method to provide explanatory feedback about the score, similar to how the old implementation included detailed reasoning about domain diversity.
  return results.analyzeStepResult.diversityScore
})
.generateReason(({ results }) => {
  const {
    diversityScore,
    uniqueDomains,
    totalSources,
    domainBreakdown,
    issues
  } = results.analyzeStepResult;
  let reason = `Diversity Score: ${(diversityScore * 100).toFixed(1)}%\n`;
  reason += `Unique Domains: ${uniqueDomains}\n`;
  reason += `Total Sources: ${totalSources}\n`;
  reason += `Domain Breakdown:\n`;
  for (const [domain, count] of Object.entries(domainBreakdown)) {
    reason += `  - ${domain}: ${count}\n`;
  }
  if (issues && issues.length > 0) {
    reason += `Issues:\n`;
    for (const issue of issues) {
      reason += `  - ${issue}\n`;
    }
  }
  return reason.trim();
  }
  // if no baseline, be conservative
  return 0.5
})
The `generateReason` method is missing from `createNoiseSensitivityScorerLLM`. Adding it would help explain the Levenshtein-distance calculation and why the output diverged from the baseline.
})
.generateReason(({ results }) => {
  const opts = results.preprocessStepResult?.opts ?? {};
  const outputText = results.preprocessStepResult?.outputText ?? '';
  const baseline = typeof opts.baselineResponse === 'string' ? opts.baselineResponse : undefined;
  if (typeof baseline === 'string' && baseline.trim().length > 0) {
    const baselineTrim = baseline.trim();
    if (outputText.trim() === baselineTrim) {
      return 'Output matches the baseline response exactly. Score: 1 (robust to noise).';
    } else {
      const distance = levenshtein(outputText, baselineTrim);
      const maxLen = Math.max(outputText.length, baselineTrim.length, 1);
      const score = Math.max(0, 1 - (distance / maxLen));
      return `Output diverges from baseline. Levenshtein distance: ${distance} (normalized score: ${score.toFixed(3)}). Baseline: "${baselineTrim}", Output: "${outputText.trim()}"`;
    }
  }
  return 'No baseline response provided; default score of 0.5 assigned.';
})
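The suggested `generateReason` calls a `levenshtein` helper that the diff does not define; if one isn't already available in the module, a minimal standard dynamic-programming implementation (a sketch, not necessarily the project's version) would be:

```ts
function levenshtein(a: string, b: string): number {
  // dp[i][j] = edit distance between a[0..i) and b[0..j)
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array<number>(b.length).fill(0)])
  for (let j = 1; j <= b.length; j++) dp[0][j] = j
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      )
    }
  }
  return dp[a.length][b.length]
}
```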
  }
  const covered = inputTerms.filter(t => outputTerms.includes(t)).length
  return covered / inputTerms.length
})
The `generateReason` method is missing from `createCompletenessScorer`. Adding this method would provide valuable context about which input elements were missing from the output.
})
.generateReason(({ results }) => {
  const missingElements = results.analyzeStepResult?.missingElements ?? [];
  if (missingElements.length === 0) {
    return "All input elements are covered in the output.";
  }
  return `Missing input elements in output: ${missingElements.join(', ')}`;
});
  }
  // default: pass if any tools were called
  return tools.length > 0 ? 1 : 0
})
The `generateReason` method is missing from `createToolCallAccuracyScorerCode`. Adding this would help explain why a tool call accuracy check passed or failed, which is especially useful when debugging tool call issues.
})
.generateReason(({ results }) => {
  const preprocess = results.preprocessStepResult ?? { tools: [], opts: {} }
  const tools = Array.isArray(preprocess.tools) ? preprocess.tools : []
  const { expectedTool, strictMode, expectedToolOrder } = preprocess.opts ?? {}
  if (expectedToolOrder && expectedToolOrder.length > 0) {
    const seq = expectedToolOrder
    let idx = 0
    for (const t of tools) {
      if (t === seq[idx]) { idx++ }
      if (idx === seq.length) { break }
    }
    const correct = idx === seq.length
    if (strictMode === true) {
      return correct
        ? `All expected tools were called in the correct order (strict mode).`
        : `Not all expected tools were called in the correct order (strict mode). Expected order: [${seq.join(', ')}], got: [${tools.join(', ')}]`
    }
    return correct
      ? `All expected tools were called in the correct order.`
      : `Not all expected tools were called in the correct order. Expected order: [${seq.join(', ')}], got: [${tools.join(', ')}]`
  }
  if (typeof expectedTool === 'string' && expectedTool.length > 0) {
    const has = tools.includes(expectedTool)
    if (strictMode === true) {
      return (tools.length === 1 && has)
        ? `Only the expected tool "${expectedTool}" was called (strict mode).`
        : `Expected only "${expectedTool}" to be called (strict mode), but got: [${tools.join(', ')}]`
    }
    return has
      ? `Expected tool "${expectedTool}" was called.`
      : `Expected tool "${expectedTool}" was not called. Tools called: [${tools.join(', ')}]`
  }
  // default: pass if any tools were called
  return tools.length > 0
    ? `At least one tool was called: [${tools.join(', ')}]`
    : `No tools were called.`
})
  import { Agent } from '@mastra/core/agent';

- import { BatchPartsProcessor, TokenLimiterProcessor, UnicodeNormalizer } from '@mastra/core/processors';
+ import { TokenLimiterProcessor, UnicodeNormalizer } from '@mastra/core/processors';
The removed import `BatchPartsProcessor` may still be needed elsewhere in the file. Verify that this processor is not used anywhere in the code.
@@ -0,0 +1,143 @@
import { createScorer, runEvals } from '@mastra/core/evals';
The `runEvals` import on line 1 is unused in this file. It should be removed to keep imports clean.
- import { createScorer, runEvals } from '@mastra/core/evals';
+ import { createScorer } from '@mastra/core/evals';
export const keywordCoverageScorer = createScorer({ id: 'keyword-coverage', name: 'Keyword Coverage', description: 'Measures coverage of required keywords in output', type: 'agent' })
  .generateScore(({ run }) => {
    const input = (run.input && JSON.stringify(run.input)) || ''
Unused variable `input`.
- const input = (run.input && JSON.stringify(run.input)) || ''
Greptile Overview

Greptile Summary

This PR migrates scorer implementations from `src/mastra/scorers/` to the new `src/mastra/evals/scorers/` module.

Critical issues found:
Positive changes:
Minor issues:
Confidence Score: 0/5
Important Files Changed

File Analysis
Sequence Diagram

sequenceDiagram
participant Dev as Developer
participant Agent as Mastra Agent
participant Eval as Eval Runner
participant Scorer as Scorer (prebuilt/custom)
participant Judge as LLM Judge (optional)
participant Storage as Mastra Storage
Dev->>Agent: Execute agent with input
Agent->>Agent: Process request with tools
Agent->>Storage: Store execution trace
Note over Eval,Scorer: Evaluation Phase (decoupled from agent)
Dev->>Eval: runEvals(scorers, runs)
Eval->>Scorer: preprocess({ run })
Scorer->>Scorer: Extract messages/tool calls
Scorer-->>Eval: Return preprocessed data
alt LLM-based scorer
Eval->>Scorer: analyze({ results })
Scorer->>Judge: Send prompt with context
Judge-->>Scorer: Return structured analysis
Scorer-->>Eval: Return analysis result
else Code-based scorer
Eval->>Scorer: analyze({ results })
Scorer->>Scorer: Compute metrics (coverage, accuracy)
Scorer-->>Eval: Return analysis result
end
Eval->>Scorer: generateScore({ results })
Scorer->>Scorer: Calculate final score (0-1)
Scorer-->>Eval: Return score
opt Generate reason
Eval->>Scorer: generateReason({ results, score })
Scorer-->>Eval: Return explanation
end
Eval->>Storage: Store eval results
Eval-->>Dev: Return scores and metrics
Additional Comments (6)
- `src/mastra/agents/weather-agent.ts`, line 5 - syntax: import from deleted file `../scorers/weather-scorer` will cause build failure
- `src/mastra/index.ts`, lines 17-18 - syntax: imports from deleted files `./scorers/custom-scorers` and `./scorers/weather-scorer` will cause build failure
- `src/mastra/agents/index.test.ts`, line 2 - syntax: import from deleted file will cause test failure
- `src/mastra/a2a/a2aCoordinatorAgent.ts`, line 8 - syntax: import from deleted file will cause build failure
- `src/mastra/a2a/codingA2ACoordinator.ts`, line 8 - syntax: import from deleted file will cause build failure
- `src/mastra/experiments/agent-experiments.ts`, line 24 - syntax: import from deleted `../scorers` index will cause build failure
39 files reviewed, 7 comments
.generateScore(({ run }) => {
  const input = (run.input && JSON.stringify(run.input)) || ''
  const output = (run.output && JSON.stringify(run.output)) || ''
  const required = (run.requestContext && (run.requestContext as any).requiredKeywords) ?? []
style: unsafe type casting with `as any` - `requestContext` should have proper typing
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/mastra/evals/scorers/keyword-coverage.ts
Line: 7:7
Comment:
**style:** unsafe type casting with `as any` - `requestContext` should have proper typing
How can I resolve this? If you propose a fix, please make it concise.
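One concise resolution, sketched under the assumption that the request context is plain JSON (the `KeywordCoverageContext` name is hypothetical):

```ts
// Hypothetical shape for the request context this scorer reads
interface KeywordCoverageContext {
  requiredKeywords?: string[]
}

// Inside .generateScore(({ run }) => { ... }), replace the `as any` cast with:
const required = (run.requestContext as KeywordCoverageContext | undefined)?.requiredKeywords ?? []
```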
Summary by Sourcery
Introduce a new evals/scorers module with reusable scorer utilities and migrate/remove legacy agent-level scorers and unused support files.
New Features:
Enhancements:
Build:
Tests:
Chores: