Conversation
- Removed the Qdrant vector configuration file (`qdrant.ts`) and its associated logic.
- Introduced a new LanceDB vector configuration file (`lance.ts`) to support vector storage and similarity search using LanceDB.
- Implemented comprehensive settings for LanceDB, including memory management, embedding generation, and advanced filtering capabilities.
- Added functions for generating embeddings, validating filters, and performing storage operations with detailed logging.
- Created new JSON files for crawler statistics and session pool state to track request metrics and session management.
- Added a request queue JSON file to manage requests with unique identifiers and metadata.

chore: update storage and request queue structures

- Created `SDK_CRAWLER_STATISTICS_0.json` to log crawler performance metrics.
- Created `SDK_SESSION_POOL_STATE.json` to manage session states and usability.
- Created a request queue JSON file to handle requests with relevant metadata and error tracking.
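For orientation, here is a hypothetical usage sketch of the helpers this PR describes; the import path, exact signatures, and return shapes are assumptions based on the description and the class diagram below, not code copied from the actual `lance.ts`.

```typescript
// Sketch only: './lance' exports and signatures are assumptions inferred from this PR's description.
import { generateEmbeddings, validateLanceFilter, performStorageOperation } from './lance'

// Generate embeddings for a few documents.
const embeddings = await generateEmbeddings(['release notes', 'changelog entry'])

// Validate a metadata filter before using it in a similarity search.
const filter = validateLanceFilter({ source: 'docs', version: 2 })

// Wrap a storage call so it gets the detailed logging mentioned above.
await performStorageOperation(
  async () => ({ stored: embeddings.length, filter }),
  'store-release-notes',
  { table: 'notes' }
)
```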
Reviewer's Guide

Refactors vector storage to introduce a new LanceDB-based configuration and memory setup, updates several agents’ output-processing and instructions, adjusts pg and Upstash vector configs, tweaks dependencies, and adds JSON-backed crawler/request/session state files.

Sequence diagram for updated agent output processing pipeline

```mermaid
sequenceDiagram
actor User
participant DaneAgent as DaneAgent
participant Model as LLMModel
participant TokenLimiter as TokenLimiterProcessor
participant BatchParts as BatchPartsProcessor
User->>DaneAgent: Send request
DaneAgent->>Model: Generate response
Model-->>DaneAgent: Streamed model output
DaneAgent->>TokenLimiter: Process output (limit to 128576 tokens)
TokenLimiter-->>DaneAgent: Truncated or original output
DaneAgent->>BatchParts: Batch parts
BatchParts-->>DaneAgent: Batched chunks
DaneAgent-->>User: Final processed response
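```

As a concrete illustration of the pipeline above, a minimal agent definition might wire the two processors as sketched below. The processor options mirror the values that appear in the dane.ts diffs later in this thread, but the import paths, model reference, and other field values here are assumptions rather than the repository's actual code.

```typescript
// Sketch only: import paths and the model reference are assumptions.
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor, BatchPartsProcessor } from '@mastra/core/processors'

export const exampleAgent = new Agent({
  id: 'example-agent',
  name: 'Example Agent',
  description: 'Illustrates the output-processing pipeline shown in the sequence diagram.',
  instructions: 'Answer concisely.',
  model: 'google/gemini-2.5-flash', // hypothetical model reference
  outputProcessors: [
    // Cap the streamed output at 128576 tokens first...
    new TokenLimiterProcessor(128576),
    // ...then batch streamed parts before they reach the caller.
    new BatchPartsProcessor({ batchSize: 10, maxWaitTime: 75, emitOnNonText: true }),
  ],
})
```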
Class diagram for updated LanceDB memory and vector configuration

```mermaid
classDiagram
class LanceStorage {
+create(dbUri, storageName, tablePrefix) LanceStorage
}
class LanceVectorStore {
+create(dbPath) LanceVectorStore
}
class Memory {
+storage
+vector
+embedder
+options
}
class LanceConfig {
+dbPath
+tableName
+embeddingDimension
+embeddingModel
}
class LanceStorageConfig {
+storageName
+dbUri
+storageOptions
+tablePrefix
}
class LanceMetadataFilter
<<interface>> LanceMetadataFilter
class LanceRawFilter
<<type>> LanceRawFilter
class LanceTools {
+lanceGraphTool
+lanceQueryTool
}
class EmbeddingUtils {
+generateEmbeddings(texts, options) number[][]
}
class StorageUtils {
+formatStorageMessages(messages) Message[]
+performStorageOperation(operation, operationName, metadata) T
}
class Message {
+id string
+content string
+role string
+createdAt Date
+metadata Record
}
LanceConfig <.. LanceTools : uses
LanceConfig <.. EmbeddingUtils : uses
LanceStorageConfig <.. LanceStorage : configuredBy
LanceStorage ..> Memory : storage
LanceVectorStore ..> Memory : vector
Memory o-- Message : manages
EmbeddingUtils ..> LanceConfig : reads
StorageUtils ..> LanceStorage : operatesOn
StorageUtils ..> LanceVectorStore : operatesOn
LanceMetadataFilter <.. LanceRawFilter : transformedTo
LanceTools ..> LanceVectorStore : vectorStoreName
LanceTools ..> Memory : usesEmbeddings
```

File-Level Changes
Tips and commands

Interacting with Sourcery
Customizing Your Experience

Access your dashboard to:
Getting Help
🤖 Hi @ssdeanx, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This PR consolidates LanceDB configuration from a vector-specific module into a unified config file, updates agent output processors to include BatchPartsProcessor alongside TokenLimiterProcessor, changes vector storage indexing from HNSW to IVFFlat, updates dependencies, and adjusts tool references in agents.

Changes
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Summary of Changes

Hello @ssdeanx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily focuses on refactoring the application's vector storage configuration by introducing a new LanceDB implementation for enhanced vector storage and similarity search. It replaces an older LanceDB configuration and updates various AI agent settings, including their processing capabilities and available tools. The PR also adds new JSON files for tracking crawler statistics, managing session states, and handling request queues, improving overall system observability. Notably, while the PR description indicates the removal of Qdrant, the changes show a renaming of the Qdrant configuration file and the addition of its package dependency.

Highlights
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either
Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes
🤖 I'm sorry @ssdeanx, but I was unable to process your request. Please see the logs for more details.
Hey - I've found 4 issues and left some high-level feedback:
- The LanceDB vector store is created with a hardcoded path (`"/path/to/db"`); consider wiring this up to `LANCE_CONFIG.dbPath` or an env var to avoid mismatches between configuration and actual storage location.
- Both `lanceStorage` and `vectorStore` are initialized at module load time with `await`, which can introduce startup latency and coupling; consider lazy-initializing these in a factory or init function so callers can better control when the cost is paid and how errors are handled.
- The `BatchPartsProcessor` configuration (batchSize, maxWaitTime, emitOnNonText) is duplicated across multiple agents in `dane.ts`; extracting this into a shared constant or helper would reduce repetition and keep these parameters consistent.
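One possible shape for the lazy-initialization suggestion above, sketched with the `create` signatures shown in this PR's class diagram; the module path for the Lance classes and the config property names are assumptions.

```typescript
// Sketch only: the '@mastra/lance' import path and config shapes are assumptions.
import { LanceStorage, LanceVectorStore } from '@mastra/lance'
import { LANCE_CONFIG, LANCE_STORAGE_CONFIG } from './lance'

let storagePromise: ReturnType<typeof LanceStorage.create> | undefined
let vectorStorePromise: ReturnType<typeof LanceVectorStore.create> | undefined

// Callers pay the connection cost on first use instead of at module load time,
// and can handle initialization errors where these helpers are awaited.
export function getLanceStorage() {
  storagePromise ??= LanceStorage.create(
    LANCE_STORAGE_CONFIG.dbUri,
    LANCE_STORAGE_CONFIG.storageName,
    LANCE_STORAGE_CONFIG.tablePrefix
  )
  return storagePromise
}

export function getLanceVectorStore() {
  vectorStorePromise ??= LanceVectorStore.create(LANCE_CONFIG.dbPath)
  return vectorStorePromise
}
```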
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The LanceDB vector store is created with a hardcoded path (`"/path/to/db"`); consider wiring this up to `LANCE_CONFIG.dbPath` or an env var to avoid mismatches between configuration and actual storage location.
- Both `lanceStorage` and `vectorStore` are initialized at module load time with `await`, which can introduce startup latency and coupling; consider lazy-initializing these in a factory or init function so callers can better control when the cost is paid and how errors are handled.
- The `BatchPartsProcessor` configuration (batchSize, maxWaitTime, emitOnNonText) is duplicated across multiple agents in `dane.ts`; extracting this into a shared constant or helper would reduce repetition and keep these parameters consistent.
## Individual Comments
### Comment 1
<location> `src/mastra/config/lance.ts:5` </location>
<code_context>
+import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
+import { google } from '@ai-sdk/google'
+import { embedMany } from 'ai'
+import { log } from './logger
+import { Memory } from '@mastra/memory'
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The `log` import is syntactically invalid and will prevent this file from compiling.
The line is missing the closing quote (and semicolon, plus `}` if it was originally there). It should be a valid import such as `import { log } from './logger'` so the module can parse and compile correctly.
</issue_to_address>
### Comment 2
<location> `src/mastra/config/pg-storage.ts:96-102` </location>
<code_context>
// HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
indexConfig: {
- type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+ type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
metric: 'cosine', // Distance metric for normalized embeddings
- hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
</code_context>
<issue_to_address>
**suggestion:** The index type comment is now misleading after switching from HNSW to IVFFlat.
The inline comment still references HNSW constraints (`unlike HNSW limit of 2000`) even though the index type is now `'ivfflat'`, which can mislead future maintainers about what’s actually in use. Please update the comment to accurately describe IVFFlat (and any relevant tradeoffs) or remove the HNSW-specific wording.
```suggestion
scope: 'resource', // 'resource' | 'thread'
// IVFFlat index configuration for high-dimensional embeddings and scalable approximate search
indexConfig: {
type: 'ivfflat', // IVFFlat index type (approximate search over high-dimensional embeddings; tuned via ivf.lists)
metric: 'cosine', // Distance metric for normalized embeddings
ivf: { lists: 1080 }, // IVFFlat configuration (number of inverted lists; higher = better recall, more memory)
},
```
</issue_to_address>
### Comment 3
<location> `src/mastra/config/lance.ts:46` </location>
<code_context>
+ LANCE_STORAGE_CONFIG.tablePrefix
+)
+
+const vectorStore = await LanceVectorStore.create("/path/to/db");
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The Lance vector store is created with a hard-coded path instead of the defined `LANCE_CONFIG` settings.
`LANCE_CONFIG.dbPath` / `LANCE_CONFIG.tableName` are defined but `LanceVectorStore.create` uses the literal `"/path/to/db"`. Please derive this path (and table/collection name if applicable) from `LANCE_CONFIG` so Lance storage is configured from a single source of truth and doesn’t drift from the declared settings.
Suggested implementation:
```typescript
const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);
```
If `LanceVectorStore.create` in your codebase supports specifying a table/collection name, you should also update the call to include `LANCE_CONFIG.tableName`, for example:
- `LanceVectorStore.create(LANCE_CONFIG.dbPath, LANCE_CONFIG.tableName)`
or
- `LanceVectorStore.create({ dbPath: LANCE_CONFIG.dbPath, tableName: LANCE_CONFIG.tableName })`
depending on the actual function signature used elsewhere in the project. Ensure any other places instantiating `LanceVectorStore` are updated to use `LANCE_CONFIG` so there is a single source of truth for LanceDB configuration.
</issue_to_address>
### Comment 4
<location> `src/mastra/config/lance.ts:56-65` </location>
<code_context>
+ */
+export async function generateEmbeddings(
+ texts: string[],
+ options: {
+ model?: string
+ dimensions?: number
+ } = {}
+): Promise<number[][]> {
+ try {
</code_context>
<issue_to_address>
**suggestion:** The `generateEmbeddings` options type suggests a string model, but the implementation uses the `google.textEmbedding` model object.
Here `options.model` is typed as `string`, but `LANCE_CONFIG.embeddingModel` is a `google.textEmbedding('gemini-embedding-001')` model object, so `embedMany` receives a `model` that may be either a string or that object while the type only declares `string`. Please update the type to match what `embedMany` actually accepts (e.g., a union of string/model or the concrete model type), or separate the model name from the model instance passed to `embedMany` to avoid type confusion.
</issue_to_address>
```typescript
  options: {
    generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
    // Message management
    lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
    // Advanced semantic recall with LanceDB configuration
    semanticRecall: {
      topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
      messageRange: {
        before: parseInt(
          process.env.LANCE_SEMANTIC_RANGE_BEFORE ?? '3'
```
suggestion: The `generateEmbeddings` options type suggests a string model, but the implementation uses the `google.textEmbedding` model object.

Here `options.model` is typed as `string`, but `LANCE_CONFIG.embeddingModel` is a `google.textEmbedding('gemini-embedding-001')` model object, so `embedMany` receives a `model` that may be either a string or that object while the type only declares `string`. Please update the type to match what `embedMany` actually accepts (e.g., a union of string/model or the concrete model type), or separate the model name from the model instance passed to `embedMany` to avoid type confusion.
Code Review
This pull request introduces a major refactoring of the vector storage configuration by replacing the previous implementation with LanceDB. It includes a new, comprehensive configuration file for LanceDB, updates to agents to use new processing capabilities, and adjustments to other storage configurations like PostgreSQL and Upstash.
My review focuses on the new LanceDB implementation, where I've found a few critical issues related to a syntax error and a hardcoded path that need to be addressed. I've also pointed out a potential type mismatch and opportunities to improve configuration clarity and logging. Additionally, there's a minor correction for a comment in the PostgreSQL storage configuration.
Overall, this is a significant and well-structured update. Addressing these points will help ensure the new LanceDB integration is robust and maintainable.
```typescript
import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger
```
```typescript
  LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");
```
The path "/path/to/db" is hardcoded for creating the LanceVectorStore. This appears to be a placeholder and will fail in any environment. This should use a configurable value, similar to how LANCE_CONFIG.dbPath is used elsewhere in this file.
const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);| } = {} | ||
| ): Promise<number[][]> { | ||
| try { | ||
| const model = options.model ?? LANCE_CONFIG.embeddingModel |
There is a potential type mismatch here. The `options.model` is typed as a string, but it's being assigned to the `model` variable which is then passed to `embedMany`. The `embedMany` function and `LANCE_CONFIG.embeddingModel` expect a GoogleTextEmbeddingModel object, not a string. If a string is passed via `options.model`, it will likely cause a runtime error. The type for `options.model` in the `generateEmbeddings` function signature should be updated to match the expected model object type.
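One hedged way to express the intended contract is sketched below; it assumes the `ai` package exports an `EmbeddingModel` type compatible with what `embedMany` accepts, and inlines the default model instead of referencing `LANCE_CONFIG` so the snippet stays self-contained.

```typescript
// Sketch only: the EmbeddingModel type usage and default model id are assumptions.
import { embedMany, type EmbeddingModel } from 'ai'
import { google } from '@ai-sdk/google'

const DEFAULT_EMBEDDING_MODEL = google.textEmbedding('gemini-embedding-001')

export async function generateEmbeddings(
  texts: string[],
  options: { model?: string | EmbeddingModel<string>; dimensions?: number } = {}
): Promise<number[][]> {
  // Accept either a model name or a model instance and normalize to an instance.
  const model =
    typeof options.model === 'string'
      ? google.textEmbedding(options.model)
      : options.model ?? DEFAULT_EMBEDDING_MODEL

  const { embeddings } = await embedMany({ model, values: texts })
  return embeddings
}
```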
```typescript
      },
      scope: 'resource', // 'resource' | 'thread'
      // LanceDB-specific index configuration
      indexConfig: {},
```
The indexConfig for semanticRecall is an empty object, but the preceding comment suggests that "LanceDB-specific index configuration" is expected. If you are intentionally using default settings, it would be clearer to add a comment to that effect. Otherwise, the necessary index configuration appears to be missing.
```typescript
    lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
    semanticRecall: {
      topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
    },
```
This log statement re-parses the environment variables, which is redundant and can lead to inconsistencies if the parsing logic differs from what's used in the lanceMemory configuration. It's better to log the values directly from the lanceMemory.options object to ensure the logs accurately reflect the running configuration.
```typescript
lastMessages: lanceMemory.options.lastMessages,
semanticRecall: {
  topK: lanceMemory.options.semanticRecall.topK,
},
```

```diff
   // HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
   indexConfig: {
-    type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+    type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
```
The comment here is inaccurate. ivfflat is an 'Inverted File with Flat compression' index, not a 'flat index type'. A flat index performs an exhaustive search, while ivfflat partitions data for faster searching. Updating the comment will improve clarity for future maintenance.
```diff
-    type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+    type: 'ivfflat', // IVFFlat index type, which is suitable for high-dimensional embeddings.
```
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/mastra/agents/codingAgents.ts (3)
364-364: Inconsistent `projectRoot` usage across agents.

The `refactoringAgent` still computes and references `projectRoot` (lines 364, 373), while `codeArchitectAgent` has commented it out (line 49) and the other agents don't use it at all. This creates inconsistency across the agent configurations.

If the PR intends to phase out `projectRoot` references (as suggested by the changes in `codeArchitectAgent`), this agent should be updated similarly. Otherwise, if `projectRoot` is still needed for `refactoringAgent`, consider why it was removed from `codeArchitectAgent`.

♻️ Proposed fix to align with other agents
```diff
     const userTier = requestContext.get('user-tier') ?? 'free'
     const language = requestContext.get('language') ?? 'en'
-    const projectRoot = requestContext.get('projectRoot') ?? process.cwd()

     return {
       role: 'system',
       content: `You are a Senior Refactoring Specialist. Your role is to improve code quality through safe, incremental refactoring.

-**Context:**
-- User Tier: ${userTier}
-- Language: ${language}
-- Project Root: ${projectRoot}
+**Context:**
+- User Tier: ${userTier}
+- Language: ${language}
```

Also applies to: 373-373
125-125: Consider extracting shared output processor configuration.

All four agents use identical `outputProcessors` configuration. Consider extracting this to a shared constant to improve maintainability and reduce duplication.

♻️ Proposed refactor
At the top of the file after imports:
```typescript
const DEFAULT_OUTPUT_PROCESSORS = [
  new TokenLimiterProcessor(128000),
  new BatchPartsProcessor({ batchSize: 20, maxWaitTime: 100, emitOnNonText: true })
];
```

Then in each agent:
```diff
- outputProcessors: [new TokenLimiterProcessor(128000), new BatchPartsProcessor({ batchSize: 20, maxWaitTime: 100, emitOnNonText: true })]
+ outputProcessors: DEFAULT_OUTPUT_PROCESSORS
```

Note: Only apply this if agents are expected to share the same configuration long-term. If per-agent customization is anticipated, keep them separate.
Also applies to: 238-238, 350-350, 468-468
33-33: Remove commented-out code and ensure consistent `projectRoot` usage across agents.

The `projectRoot` field is declared in `CodingRuntimeContext` (line 33) but is commented out in `codeArchitectAgent` (line 49) while actively used in `refactoringAgent` (lines 364, 373). This inconsistency violates the guideline against maintaining commented-out code. Either remove the commented code and keep `projectRoot` only where needed, or implement it consistently across all agents that declare it in the interface.

src/mastra/config/upstash.ts (3)
4-4: Unused import after removing type assertion.

The `GoogleGenerativeAIProviderOptions` type is imported but no longer used since the `satisfies` assertion was removed. This creates dead code.

♻️ Remove unused import or restore type safety
Option 1: Remove unused import
```diff
-import type { GoogleGenerativeAIProviderOptions } from '@ai-sdk/google';
```

Option 2: Restore type assertion for type safety
```diff
   providerOptions: {
     google: {
       retrievalConfig: { dimensions: 1536 }
     },
-  },
+  } satisfies { google: GoogleGenerativeAIProviderOptions },
```

Also applies to: 145-147
150-151: Empty `databaseConfig` object serves no purpose.

The empty object `databaseConfig: {}` adds no value. Either populate it with the necessary Upstash configuration or remove it entirely.

♻️ Remove or populate databaseConfig
```diff
-  databaseConfig: {
-  },
```

Or add relevant configuration:
```typescript
databaseConfig: {
  upstash: {
    // Add Upstash-specific configuration
  },
},
```
31-55: Documentation comments reference incorrect vector store.

The JSDoc comments mention "Pinecone" in multiple places (lines 35, 54), but the implementation uses `UpstashVector`. This inconsistency can confuse maintainers.

📝 Update documentation to match implementation
```diff
-* - Uses PineconeVector for semantic search with cloud-based vectors (768-dim gemini embeddings)
+* - Uses UpstashVector for semantic search with cloud-based vectors (1536-dim gemini embeddings)

-* @classdesc Shared memory instance for all agents using Upstash for storage and [Pinecone] for vector search
+* @classdesc Shared memory instance for all agents using Upstash for storage and UpstashVector for vector search
```
🤖 Fix all issues with AI agents
In @src/mastra/agents/codingAgents.ts:
- Line 49: Remove the dead commented-out line containing "const projectRoot =
requestContext.get('projectRoot') ?? process.cwd()" from the codingAgents
module; simply delete that commented code (the commented variable/projectRoot
reference) so the file no longer contains commented-out dead code and rely on
version control if the line is ever needed again.
In @src/mastra/agents/dane.ts:
- Around line 225-231: Extract the duplicated outputProcessors array into a
shared constant (e.g., defaultOutputProcessors) and replace each agent's inline
outputProcessors with that constant; specifically, define a top-level constant
that contains new TokenLimiterProcessor(128576) and new BatchPartsProcessor({
batchSize: 10, maxWaitTime: 75, emitOnNonText: true }) and then update each
agent's outputProcessors property to reference defaultOutputProcessors instead
of repeating the array.
- Around line 44-50: The outputProcessors array has inconsistent indentation and
line breaks; refactor the array so each processor entry is on its own line with
consistent indentation and commas (e.g., start the array on one line, place new
TokenLimiterProcessor(128576) and new BatchPartsProcessor({...}) each on their
own indented lines, ensure the BatchPartsProcessor object keys (batchSize,
maxWaitTime, emitOnNonText) are consistently indented and the closing brackets
and commas line up). Locate the outputProcessors declaration where
TokenLimiterProcessor and BatchPartsProcessor are used and apply the same
spacing/indentation style as other agent definitions for readability.
In @src/mastra/config/lance.ts:
- Line 5: The import statement for log is missing its closing quote causing a
syntax error; update the import of symbol "log" from module './logger' to a
properly terminated string (e.g., add the closing quote and end the statement)
so the file parses correctly.
- Around line 218-227: The validateLanceFilter function currently only rejects
null/non-object values; enhance it to recursively validate the structure of
LanceMetadataFilter by ensuring top-level entries are plain objects or allowed
primitive types, arrays contain only valid filter objects or primitives, and
nested objects follow the same rules; update validateLanceFilter to traverse
keys (e.g., field names) and for each value verify it is a
string/number/boolean, an array of those, or an object that itself passes the
same validation, and throw a descriptive Error when encountering unexpected
types or empty objects/arrays so malformed nested filters and array elements are
rejected.
- Line 46: The code currently calls LanceVectorStore.create with a hardcoded
placeholder path ("/path/to/db"); replace this with a configured value (e.g.,
read from process.env.LANCE_DB_PATH or a LANCE_CONFIG value) and pass that
variable into LanceVectorStore.create instead; also add a short runtime check
that the env/config value exists (throw or log a clear error and exit if
missing) so the vector store is not initialized with an unintended path.
- Around line 232-257: The model fallback in generateEmbeddings assigns
LANCE_CONFIG.embeddingModel (a TextEmbeddingModel) to a variable that can also
be a string, causing a type mismatch when passed to embedMany; change the
assignment so if options.model is provided (string) you convert it to a
TextEmbeddingModel (e.g., via google.textEmbedding(options.model)) and otherwise
use LANCE_CONFIG.embeddingModel, then pass that TextEmbeddingModel to embedMany
and update the log to report the model name accordingly; references: function
generateEmbeddings, options.model, LANCE_CONFIG.embeddingModel, and the
embedMany call.
In @src/mastra/config/pg-storage.ts:
- Around line 99-101: The log output still prints the old HNSW fields while the
actual index configuration uses type: 'ivfflat' and ivf: { lists: 1080 }; update
the logging block that prints the index config so it reflects the real keys and
values (log type: 'ivfflat' and the ivf.lists value) instead of the hnsw object,
and remove or conditionalize any code that prints hnsw.m / hnsw.efConstruction
so logs always match the active config (check the code that constructs/prints
the index config object and adjust the properties it reads to use ivf.lists and
type).
- Line 99: Update the misleading inline comment that sits next to the "type:
'ivfflat'" setting: replace the phrase "flat index type" with a note that
ivfflat is an approximate inverted-file index using flat quantization (inverted
lists with clustering), not a true brute-force flat index, so it performs
approximate ANN search rather than exact brute-force search.
In
@src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json:
- Around line 1-27: The committed JSON files like SDK_CRAWLER_STATISTICS_0.json,
SDK_SESSION_POOL_STATE.json and runtime request queue files in
src/mastra/public/storage are ephemeral and must be ignored: update .gitignore
to exclude src/mastra/public/storage/ (or add patterns
**/SDK_CRAWLER_STATISTICS_*.json, **/SDK_SESSION_POOL_STATE.json and
src/mastra/public/storage/request_queues/) and remove the already committed
files from the index (use git rm --cached <file> or git rm -r --cached
src/mastra/public/storage/) then commit the .gitignore change and the removal so
these runtime artifacts stop being tracked.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (12)
- .github/agents/gpt-5-beast-mode.agent.md
- package.json
- src/mastra/agents/codingAgents.ts
- src/mastra/agents/dane.ts
- src/mastra/config/lance.ts
- src/mastra/config/pg-storage.ts
- src/mastra/config/qdrant.ts
- src/mastra/config/upstash.ts
- src/mastra/config/vector/lance.ts
- src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
- src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json
- src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json
💤 Files with no reviewable changes (1)
- src/mastra/config/vector/lance.ts
🧰 Additional context used
📓 Path-based instructions (15)
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (.github/instructions/next-js.instructions.md)
**/*.{js,jsx,ts,tsx}: Use `next/dynamic` for dynamic imports to load components only when needed, improving initial load time.
Use `next/image` component for automatic image optimization, including lazy loading and responsive images.
Use React.memo to prevent unnecessary re-renders of components.
Use the `<Link prefetch>` tag to prefetch pages that are likely to be visited.
Use getServerSideProps, getStaticProps, or server components for fetching data on the server-side.
Use SWR or React Query for client-side data fetching and caching.
Use CSS Modules, Styled Components, or Tailwind CSS for component-level styling. Prefer Tailwind CSS for rapid development.
Use React Context, Zustand, Jotai, or Recoil for managing global state. Avoid Redux unless necessary.
Use `react-hook-form` for managing forms and validation.
Only fetch the data that is needed by the component to avoid over-fetching.
Avoid long-running synchronous operations in the main thread to prevent blocking.
Always use `setState` or hooks to update state instead of mutating state directly.
Include a complete dependency array in `useEffect` hooks to prevent unexpected behavior.
Avoid writing server-side code in client components to prevent exposing secrets or causing unexpected behavior.
Use `try...catch` blocks for handling errors in asynchronous operations.
Implement error boundary components using `getDerivedStateFromError` or `componentDidCatch` lifecycle methods.
Sanitize user input to prevent Cross-Site Scripting (XSS) attacks. Be especially careful when rendering HTML directly from user input.
Store authentication tokens in HTTP-only cookies or local storage securely.
Implement role-based access control to restrict access to sensitive resources.
Clean up event listeners and timers in `useEffect` hooks to avoid memory leaks.
Only update state when necessary to reduce the number of re-renders and improve performance.
Use immutable data structures and avoid mutating data directly to prevent unexpected...
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{js,ts}
📄 CodeRabbit inference engine (.github/instructions/next-js.instructions.md)
Use parameterized queries or an ORM to prevent SQL injection attacks.
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{ts,tsx,js,jsx,py,java,cs,rb,go,rs,cpp,c,h,hpp,swift,kotlin,php,scala,clj,groovy,lua,sh,bash}
📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)
**/*.{ts,tsx,js,jsx,py,java,cs,rb,go,rs,cpp,c,h,hpp,swift,kotlin,php,scala,clj,groovy,lua,sh,bash}: Write code that speaks for itself. Comment only when necessary to explain WHY, not WHAT. Avoid obvious comments that state what the code literally does.
Avoid redundant comments that simply repeat what the code is doing
Keep comments accurate and up-to-date with code changes. Remove or update outdated comments that no longer match the implementation.
Write comments for complex business logic that explain the WHY behind specific calculations or business rules
Document non-obvious algorithms with comments explaining the algorithm choice and its reasoning
Add comments explaining what regex patterns match, especially for complex patterns
Document API constraints, rate limits, gotchas, and external dependencies with explanatory comments
Avoid commenting out dead code. Use version control instead of maintaining commented code blocks.
Do not maintain code change history or modification logs as comments. Rely on git history and commit messages instead.
Avoid decorative divider comments (e.g., lines of equals signs or asterisks) for section separation
Ensure comments are placed appropriately above or adjacent to the code they describe
Write comments using proper grammar, spelling, and professional language
Prefer self-documenting code with clear variable/function names over adding comments to explain unclear code
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)
**/*.{ts,tsx,js,jsx}: Document public APIs with TSDoc/JSDoc comments including parameter descriptions, return types, examples, and thrown exceptions
Add TSDoc comments to configuration constants and environment variables explaining their source, reasoning, or constraints
Use TSDoc annotation tags (TODO, FIXME, HACK, NOTE, WARNING, PERF, SECURITY, BUG, REFACTOR, DEPRECATED) to mark special comments
Include file headers with @fileoverview, @author, @copyright, and @license tags to document file purpose and ownership
Document function parameters with @param tags, return values with @returns tags, and exceptions with @throws tags in TSDoc comments
Use @see tags in TSDoc comments to reference related functions, methods, or documentation
Include @example tags in public API documentation with code examples showing typical usage
**/*.{ts,tsx,js,jsx}: Organize imports in the following order: (1) external framework imports (React, Next.js, Mastra), (2) type imports, (3) internal imports (config, tools, utils)
Use camelCase for functions and variables
Use PascalCase for classes, types, and interfaces
Use UPPER_SNAKE_CASE for constants
Use kebab-case for file names (e.g., weather-tool.ts, user-profile.tsx)
Enforce strict equality with `===` instead of `==`
Always use curly braces for control flow statements
Prefer arrow functions over function declarations in callbacks and higher-order functions
Use `const` for variables that are not reassigned; use `let` only when necessary
Use object shorthand syntax (e.g., `{ name, age }` instead of `{ name: name, age: age }`)
Implement structured error handling with try-catch blocks that return error objects or throw custom errors with context
Enforce ESLint rules for strict equality, curly braces, no unused variables, and no explicit any types
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)
**/*.{ts,tsx}: Document interface and type definitions with TSDoc comments explaining their purpose and usage context
Document interface properties with /** */ comments explaining each field's purpose and constraints
Document generic type parameters with @template tags explaining what each type parameter represents
Use type guards with comments explaining the runtime validation logic being performed
Document advanced/complex TypeScript types with explanatory comments about their purpose and use cases
**/*.{ts,tsx}: Use `interface` for public APIs and `type` for internal definitions
Always use explicit return types for public functions
Never use `any` type; use `unknown` or proper type definitions instead
Use type-only imports with `import type` for TypeScript types
Use optional chaining (?.) for nullable access
Use nullish coalescing (??) for default values
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
src/mastra/**/*
📄 CodeRabbit inference engine (src/AGENTS.md)
`mastra` modules can import from `utils`, but must not import from `app` or `cli` (except `types`)
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json, src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json, src/mastra/config/pg-storage.ts, src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h}
📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)
**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h}: Use descriptive and meaningful names for variables, functions, and classes
Apply Single Responsibility Principle: each function/class does one thing well
Follow DRY (Don't Repeat Yourself): eliminate code duplication
Keep functions small and focused (ideally < 20-30 lines)
Avoid deeply nested code (max 3-4 levels)
Avoid magic numbers and strings; use named constants instead
Code should be self-documenting; use comments only when necessary
Implement proper error handling at appropriate levels with meaningful error messages
Avoid silent failures or ignored exceptions; fail fast and validate inputs early
Use appropriate error types/exceptions with meaningful context
Validate and sanitize all user inputs
Use parameterized queries for database access; never use string concatenation for SQL queries
Implement proper authentication checks before accessing resources
Verify user has permission to perform actions; implement proper authorization
Use established cryptographic libraries; never roll your own crypto implementation
Avoid N+1 query problems; use proper indexing and eager loading for database queries
Use appropriate algorithms with suitable time/space complexity for the use case
Utilize caching for expensive or repeated operations
Ensure proper cleanup of connections, files, and streams to prevent resource leaks
Implement pagination for large result sets
Load data only when needed (lazy loading pattern)
Document all public APIs with purpose, parameters, and return values
Add explanatory comments for non-obvious logic
No commented-out code or unresolved TODO comments without associated tickets should remain in commits
Ensure code follows consistent style and conventions with the rest of the codebase
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h,json,yaml,yml,env,config}
📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)
Never include passwords, API keys, tokens, or PII in code or logs
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json, package.json, src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json, src/mastra/config/pg-storage.ts, src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{ts,tsx,java,cs,go,php,swift,kt,scala}
📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)
Prefer small, focused interfaces (Interface Segregation Principle)
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
**/*.{js,mjs,cjs,ts,tsx,jsx,py,java,cs,go,rb,php,rs,cpp,c,h,hpp}
📄 CodeRabbit inference engine (.github/instructions/update-docs-on-code-change.instructions.md)
Use automated documentation generators for code documentation - JSDoc/TSDoc for JavaScript/TypeScript, Sphinx/pdoc for Python, Javadoc for Java, xmldoc for C#, godoc for Go, rustdoc for Rust
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
src/mastra/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use maskSensitiveMessageData() helper from src/mastra/config/pg-storage.ts to mask secrets in logs
Files:
src/mastra/config/qdrant.ts, src/mastra/agents/codingAgents.ts, src/mastra/config/pg-storage.ts, src/mastra/config/lance.ts, src/mastra/config/upstash.ts, src/mastra/agents/dane.ts
src/mastra/agents/**/*.ts
📄 CodeRabbit inference engine (src/mastra/AGENTS.md)
Add agents under `src/mastra/agents` that wire tools together into higher-level behaviors
Agent implementations must use Agent constructor with id, name, description, instructions, model, tools, and memory
Files:
src/mastra/agents/codingAgents.ts, src/mastra/agents/dane.ts
**/{package.json,package-lock.json,yarn.lock,pom.xml,build.gradle,Pipfile,Pipfile.lock,requirements.txt,composer.json,composer.lock,pubspec.yaml,Gemfile,Gemfile.lock,go.mod,go.sum,Cargo.toml,Cargo.lock}
📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)
Check dependencies for known vulnerabilities and keep them up-to-date
Files:
package.json
src/mastra/config/pg-storage.ts
📄 CodeRabbit inference engine (src/mastra/config/AGENTS.md)
src/mastra/config/pg-storage.ts: PostgreSQL storage configuration must include PgVector support for embeddings, memory, and thread persistence
Database connection must use the SUPABASE environment variable with optional performance tuning via DB_MAX_CONNECTIONS, DB_IDLE_TIMEOUT, and DB_CONNECTION_TIMEOUT
Memory configuration must support MEMORY_LAST_MESSAGES, SEMANTIC_TOP_K, SEMANTIC_RANGE_BEFORE, SEMANTIC_RANGE_AFTER, and LISTS environment variables
RAG configuration must support GRAPH_THRESHOLD, GRAPH_RANDOM_WALK_STEPS, GRAPH_RESTART_PROB, PG_MIN_SCORE, and PG_EF environment variables
Files:
src/mastra/config/pg-storage.ts
src/mastra/config/upstash.ts
📄 CodeRabbit inference engine (src/mastra/config/AGENTS.md)
Upstash Redis configuration must support caching and session storage functionality
Files:
src/mastra/config/upstash.ts
🪛 Biome (2.1.2)
src/mastra/config/lance.ts
[error] 5-5: unterminated string literal
The closing quote must be on the same line.
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: CodeQL analysis (javascript-typescript)
- GitHub Check: Agent
- GitHub Check: Sourcery review
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json (1)
1-9: Verify that this runtime artifact should be committed to version control.

This file appears to be a request queue state entry, a runtime artifact generated during crawling/processing. Such files are typically ephemeral and not suitable for version control. Confirm whether:
- This is intentionally committed as a test fixture or example.
- Request queue files should be added to `.gitignore` to prevent accidental commits of runtime state.

If this is meant to remain, consider moving it to a dedicated `fixtures/` or `test/` directory with clear documentation on its purpose.

.github/agents/gpt-5-beast-mode.agent.md (1)
3-3: LGTM!

The tools list configuration appears correct and properly formatted.
package.json (2)
131-131: LGTM on minor version updates.

The minor version bumps for `framer-motion` and `motion` (from 12.24.11 to 12.24.12) are reasonable patch-level updates.

Also applies to: 145-145
69-69: @mastra/qdrant is using the latest available beta version and is consistent with the project's dependency strategy.

Version `1.0.0-beta.3` is the current beta release on npm and is not deprecated. All @mastra packages in this project are beta versions (e.g., @mastra/core@^1.0.0-beta.20, @mastra/deployer@^1.0.0-beta.20, mastra@^1.0.0-beta.13), indicating this is an intentional and consistent approach to the @mastra ecosystem. No known security vulnerabilities were found.
1-26: No action needed: this file is intentionally tracked in version control.

The JSON files in `src/mastra/public/storage/` (including `SDK_SESSION_POOL_STATE.json` and similar files like `SDK_CRAWLER_STATISTICS_0.json`) are committed to git and have no `.gitignore` exclusion patterns. This is consistent with them being seed data, test fixtures, or expected state files required by the SDK, not accidental runtime artifacts.

Likely an incorrect or invalid review comment.
src/mastra/config/qdrant.ts (1)
5-5: The import path change is correct and the logger module exists at the new location with the expected exports.

src/mastra/agents/dane.ts (1)
197-202: LGTM on tool and configuration updates.

The switch to `webScraperTool` with updated instructions and `thinkingLevel: 'low'` aligns with the PR objectives. The tool reference in the instructions matches the actual tool in the tools object.

Also applies to: 210-210
src/mastra/config/lance.ts (3)
52-111: Memory configuration looks comprehensive.

The `lanceMemory` configuration follows the same pattern as `pgMemory` with proper storage, vector, and embedder setup. The working memory template is well-structured for user context tracking.
289-322: Good error handling pattern with timing metrics.

The `performStorageOperation` wrapper provides consistent logging, timing, and error handling for storage operations. This is a solid observability pattern.
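For readers who haven't opened the file, such a wrapper roughly follows the shape below. This is a sketch based on the `performStorageOperation(operation, operationName, metadata)` signature in the class diagram earlier in this thread, not a copy of the actual implementation; the `log` API mirrors the `log.info`/`log.error` calls quoted elsewhere in this review.

```typescript
// Sketch: mirrors the signature from the class diagram; not the repository's actual code.
import { log } from './logger'

export async function performStorageOperation<T>(
  operation: () => Promise<T>,
  operationName: string,
  metadata: Record<string, unknown> = {}
): Promise<T> {
  const startedAt = Date.now()
  try {
    const result = await operation()
    log.info(`${operationName} completed`, { ...metadata, durationMs: Date.now() - startedAt })
    return result
  } catch (error) {
    log.error(`${operationName} failed`, {
      ...metadata,
      durationMs: Date.now() - startedAt,
      error: String(error),
    })
    throw error
  }
}
```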
40-46: No issues found. The project is already configured for ES modules with `"type": "module"` in package.json and `"module": "esnext"` in tsconfig.json, which properly supports the top-level await pattern used in this file.
```diff
     const userTier = requestContext.get('user-tier') ?? 'free'
     const language = requestContext.get('language') ?? 'en'
-    const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
+    // const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
```
Remove commented-out code.
Commented-out code should be removed rather than left in the codebase. Version control preserves the history if needed.
As per coding guidelines: "Avoid commenting out dead code. Use version control instead of maintaining commented code blocks."
🧹 Proposed fix
```diff
-    // const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    // const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
```
🤖 Prompt for AI Agents
In @src/mastra/agents/codingAgents.ts at line 49, Remove the dead commented-out
line containing "const projectRoot = requestContext.get('projectRoot') ??
process.cwd()" from the codingAgents module; simply delete that commented code
(the commented variable/projectRoot reference) so the file no longer contains
commented-out dead code and rely on version control if the line is ever needed
again.
```typescript
  outputProcessors: [new TokenLimiterProcessor(128576),
  new BatchPartsProcessor({
    batchSize: 10,
    maxWaitTime: 75,
    emitOnNonText: true
  }),
  ]
```
🧹 Nitpick | 🔵 Trivial
Inconsistent formatting in outputProcessors array.
The array formatting and indentation is inconsistent across agent definitions. Consider aligning the formatting for better readability and maintainability.
♻️ Suggested formatting
```diff
-  outputProcessors: [new TokenLimiterProcessor(128576),
-  new BatchPartsProcessor({
-    batchSize: 10,
-    maxWaitTime: 75,
-    emitOnNonText: true
-  }),
-  ]
+  outputProcessors: [
+    new TokenLimiterProcessor(128576),
+    new BatchPartsProcessor({
+      batchSize: 10,
+      maxWaitTime: 75,
+      emitOnNonText: true,
+    }),
+  ],
```

📝 Committable suggestion
+ ],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-  outputProcessors: [new TokenLimiterProcessor(128576),
-  new BatchPartsProcessor({
-    batchSize: 10,
-    maxWaitTime: 75,
-    emitOnNonText: true
-  }),
-  ]
+  outputProcessors: [
+    new TokenLimiterProcessor(128576),
+    new BatchPartsProcessor({
+      batchSize: 10,
+      maxWaitTime: 75,
+      emitOnNonText: true,
+    }),
+  ],
```
🤖 Prompt for AI Agents
In @src/mastra/agents/dane.ts around lines 44 - 50, The outputProcessors array
has inconsistent indentation and line breaks; refactor the array so each
processor entry is on its own line with consistent indentation and commas (e.g.,
start the array on one line, place new TokenLimiterProcessor(128576) and new
BatchPartsProcessor({...}) each on their own indented lines, ensure the
BatchPartsProcessor object keys (batchSize, maxWaitTime, emitOnNonText) are
consistently indented and the closing brackets and commas line up). Locate the
outputProcessors declaration where TokenLimiterProcessor and BatchPartsProcessor
are used and apply the same spacing/indentation style as other agent definitions
for readability.
```typescript
  outputProcessors: [new TokenLimiterProcessor(128576),
  new BatchPartsProcessor({
    batchSize: 10,
    maxWaitTime: 75,
    emitOnNonText: true
  }),
  ]
```
🧹 Nitpick | 🔵 Trivial
Consider extracting shared outputProcessors configuration.
The same outputProcessors configuration is duplicated across all 5 agents. Consider extracting to a shared constant to improve maintainability and ensure consistency.
♻️ Extract shared configuration
```typescript
// At the top of the file, after imports
const defaultOutputProcessors = [
  new TokenLimiterProcessor(128576),
  new BatchPartsProcessor({
    batchSize: 10,
    maxWaitTime: 75,
    emitOnNonText: true,
  }),
];

// Then in each agent:
outputProcessors: defaultOutputProcessors,
```

🤖 Prompt for AI Agents
In @src/mastra/agents/dane.ts around lines 225 - 231, Extract the duplicated
outputProcessors array into a shared constant (e.g., defaultOutputProcessors)
and replace each agent's inline outputProcessors with that constant;
specifically, define a top-level constant that contains new
TokenLimiterProcessor(128576) and new BatchPartsProcessor({ batchSize: 10,
maxWaitTime: 75, emitOnNonText: true }) and then update each agent's
outputProcessors property to reference defaultOutputProcessors instead of
repeating the array.
```typescript
import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger
```
Critical: Unterminated string literal causes syntax error.
The import statement is missing a closing quote, which will cause a parse error and prevent the module from loading.
🐛 Fix the import statement
```diff
-import { log } from './logger
+import { log } from './logger'
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-import { log } from './logger
+import { log } from './logger'
```
🧰 Tools
🪛 Biome (2.1.2)
[error] 5-5: unterminated string literal
The closing quote must be on the same line.
(parse)
🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts at line 5, The import statement for log is
missing its closing quote causing a syntax error; update the import of symbol
"log" from module './logger' to a properly terminated string (e.g., add the
closing quote and end the statement) so the file parses correctly.
```typescript
  LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");
```
Critical: Hardcoded placeholder path will cause incorrect behavior.
The path "/path/to/db" appears to be a placeholder that was not updated. This will create the vector store in an unintended location and likely cause runtime failures or data loss.
🐛 Use environment variable or LANCE_CONFIG
```diff
-const vectorStore = await LanceVectorStore.create("/path/to/db");
+const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts at line 46, The code currently calls
LanceVectorStore.create with a hardcoded placeholder path ("/path/to/db");
replace this with a configured value (e.g., read from process.env.LANCE_DB_PATH
or a LANCE_CONFIG value) and pass that variable into LanceVectorStore.create
instead; also add a short runtime check that the env/config value exists (throw
or log a clear error and exit if missing) so the vector store is not initialized
with an unintended path.
```typescript
export function validateLanceFilter(
  filter: LanceMetadataFilter
): LanceMetadataFilter {
  // Basic validation - LanceDB has specific requirements for indexed fields
  if (filter === null || typeof filter !== 'object') {
    throw new Error('Filter must be a valid object')
  }

  return filter
}
```
🧹 Nitpick | 🔵 Trivial
Null check is overly permissive and may miss edge cases.
The validation only checks for null or non-object types, but doesn't validate the structure of nested filters or array elements. This could allow malformed filters to pass through.
♻️ Enhanced validation
```diff
 export function validateLanceFilter(
   filter: LanceMetadataFilter
 ): LanceMetadataFilter {
   // Basic validation - LanceDB has specific requirements for indexed fields
-  if (filter === null || typeof filter !== 'object') {
+  if (filter === null || filter === undefined || typeof filter !== 'object') {
     throw new Error('Filter must be a valid object')
   }
+
+  // Validate no empty object (common mistake)
+  if (Object.keys(filter).length === 0) {
+    throw new Error('Filter cannot be an empty object')
+  }
   return filter
 }
```
In @src/mastra/config/lance.ts around lines 218 - 227, The validateLanceFilter
function currently only rejects null/non-object values; enhance it to
recursively validate the structure of LanceMetadataFilter by ensuring top-level
entries are plain objects or allowed primitive types, arrays contain only valid
filter objects or primitives, and nested objects follow the same rules; update
validateLanceFilter to traverse keys (e.g., field names) and for each value
verify it is a string/number/boolean, an array of those, or an object that
itself passes the same validation, and throw a descriptive Error when
encountering unexpected types or empty objects/arrays so malformed nested
filters and array elements are rejected.
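A sketch of what that recursive validation could look like is shown below; the set of value shapes LanceDB actually accepts is an assumption here and should be checked against the LanceDB filter documentation before adopting it.

```typescript
// Sketch only: the allowed filter value shapes are assumptions, not a verified LanceDB spec.
type FilterValue = string | number | boolean | FilterValue[] | { [key: string]: FilterValue }

export function validateLanceFilter(filter: unknown): Record<string, FilterValue> {
  if (filter === null || typeof filter !== 'object' || Array.isArray(filter)) {
    throw new Error('Filter must be a plain object')
  }
  const entries = Object.entries(filter as Record<string, unknown>)
  if (entries.length === 0) {
    throw new Error('Filter cannot be an empty object')
  }
  for (const [key, value] of entries) {
    validateValue(value, key)
  }
  return filter as Record<string, FilterValue>
}

function validateValue(value: unknown, path: string): void {
  if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
    return
  }
  if (Array.isArray(value)) {
    if (value.length === 0) {
      throw new Error(`Filter array at "${path}" cannot be empty`)
    }
    value.forEach((item, index) => validateValue(item, `${path}[${index}]`))
    return
  }
  if (value !== null && typeof value === 'object') {
    const nestedEntries = Object.entries(value as Record<string, unknown>)
    if (nestedEntries.length === 0) {
      throw new Error(`Filter object at "${path}" cannot be empty`)
    }
    for (const [key, nested] of nestedEntries) {
      validateValue(nested, `${path}.${key}`)
    }
    return
  }
  throw new Error(`Unsupported filter value at "${path}"`)
}
```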
```typescript
export async function generateEmbeddings(
  texts: string[],
  options: {
    model?: string
    dimensions?: number
  } = {}
): Promise<number[][]> {
  try {
    const model = options.model ?? LANCE_CONFIG.embeddingModel
    const { embeddings } = await embedMany({
      values: texts,
      model,
    })

    log.info('Embeddings generated successfully', {
      textCount: texts.length,
      embeddingDimension: embeddings[0]?.length,
      model: options.model ?? 'gemini-embedding-001',
    })

    return embeddings
  } catch (error) {
    log.error('Failed to generate embeddings', { error: String(error) })
    throw error
  }
}
```
Type mismatch: model parameter expects string but receives embedding model object.
The options.model has type string | undefined, but LANCE_CONFIG.embeddingModel is a TextEmbeddingModel object from the Google SDK. The fallback assignment on line 240 will produce a type error or unexpected behavior.
🐛 Fix the model type handling
```diff
 export async function generateEmbeddings(
   texts: string[],
   options: {
-    model?: string
+    model?: Parameters<typeof embedMany>[0]['model']
     dimensions?: number
   } = {}
 ): Promise<number[][]> {
   try {
-    const model = options.model ?? LANCE_CONFIG.embeddingModel
+    const model = options.model ?? LANCE_CONFIG.embeddingModel
     const { embeddings } = await embedMany({
       values: texts,
-      model,
+      model: model,
```
   })
If string model names are desired, use:
const model = options.model
? google.textEmbedding(options.model)
  : LANCE_CONFIG.embeddingModel
🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts around lines 232 - 257, The model fallback in
generateEmbeddings assigns LANCE_CONFIG.embeddingModel (a TextEmbeddingModel) to
a variable that can also be a string, causing a type mismatch when passed to
embedMany; change the assignment so if options.model is provided (string) you
convert it to a TextEmbeddingModel (e.g., via
google.textEmbedding(options.model)) and otherwise use
LANCE_CONFIG.embeddingModel, then pass that TextEmbeddingModel to embedMany and
update the log to report the model name accordingly; references: function
generateEmbeddings, options.model, LANCE_CONFIG.embeddingModel, and the
embedMany call.
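A sketch of that conversion, assuming the google provider from @ai-sdk/google is already imported in lance.ts (as the rest of the file suggests) and that LANCE_CONFIG.embeddingModel wraps gemini-embedding-001:

const modelId = options.model ?? 'gemini-embedding-001'
const embeddingModel = options.model
  ? google.textEmbedding(options.model) // convert a string id into the TextEmbeddingModel that embedMany expects
  : LANCE_CONFIG.embeddingModel

const { embeddings } = await embedMany({
  model: embeddingModel,
  values: texts,
})

log.info('Embeddings generated successfully', {
  textCount: texts.length,
  embeddingDimension: embeddings[0]?.length,
  model: modelId, // log the resolved id instead of re-deriving the fallback
})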
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
metric: 'cosine', // Distance metric for normalized embeddings
hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
ivf: { lists: 1080 }, // IVFFlat configuration
Log output doesn't match actual configuration.
The configuration was changed from HNSW to IVFFlat (lines 99-101), but the log output (lines 142-145) still displays the old HNSW configuration. This creates a critical observability issue where logs will mislead debugging and monitoring efforts.
Actual config (lines 99-101):
type: 'ivfflat'
ivf: { lists: 1080 }
Log output (lines 142-145):
type: 'hnsw'
hnsw: { m: 16, efConstruction: 64 }
🔧 Proposed fix to align log output with actual configuration
memoryOptions: {
lastMessages: parseInt(process.env.MEMORY_LAST_MESSAGES ?? '500'),
semanticRecall: {
topK: parseInt(process.env.SEMANTIC_TOP_K ?? '5'),
messageRange: {
before: parseInt(process.env.SEMANTIC_RANGE_BEFORE ?? '3'),
after: parseInt(process.env.SEMANTIC_RANGE_AFTER ?? '2'),
},
scope: 'resource',
indexConfig: {
- type: 'hnsw',
+ type: 'ivfflat',
metric: 'cosine',
- hnsw: { m: 16, efConstruction: 64 } // hnsw configuration
+ ivf: { lists: 1080 } // IVFFlat configuration
}
  },
Also applies to: 142-145
🤖 Prompt for AI Agents
In @src/mastra/config/pg-storage.ts around lines 99 - 101, The log output still
prints the old HNSW fields while the actual index configuration uses type:
'ivfflat' and ivf: { lists: 1080 }; update the logging block that prints the
index config so it reflects the real keys and values (log type: 'ivfflat' and
the ivf.lists value) instead of the hnsw object, and remove or conditionalize
any code that prints hnsw.m / hnsw.efConstruction so logs always match the
active config (check the code that constructs/prints the index config object and
adjust the properties it reads to use ivf.lists and type).
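One way to prevent this drift from recurring is to define the index configuration once and log that same object; a sketch only, with PG_IVF_LISTS as a hypothetical env override:

// Single source of truth for the pgvector index configuration.
const pgIndexConfig = {
  type: 'ivfflat' as const,
  metric: 'cosine' as const,
  ivf: { lists: parseInt(process.env.PG_IVF_LISTS ?? '1080', 10) },
}

// ...use pgIndexConfig inside semanticRecall.indexConfig...

// Logging the exact object in use keeps observability aligned with the active index.
log.info('pgvector index configuration', pgIndexConfig)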
// HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
indexConfig: {
- type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+ type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
Misleading comment: IVFFlat is not the same as "flat index".
The comment states "flat index type" but ivfflat is an inverted file index with flat quantization, not a simple flat/brute-force index. IVFFlat uses clustering for approximate nearest neighbor search, whereas a true flat index (like pgvector's vector type without an index) performs exact brute-force search.
The comment should clarify that IVFFlat is an approximate search method using inverted lists, not exact flat search.
📝 Suggested comment correction
- type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+ type: 'ivfflat', // IVFFlat approximate index (supports dimensions > 2000, unlike HNSW's 2000 limit)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+ type: 'ivfflat', // IVFFlat approximate index (supports dimensions > 2000, unlike HNSW's 2000 limit)
🤖 Prompt for AI Agents
In @src/mastra/config/pg-storage.ts at line 99, Update the misleading inline
comment that sits next to the "type: 'ivfflat'" setting: replace the phrase
"flat index type" with a note that ivfflat is an approximate inverted-file index
using flat quantization (inverted lists with clustering), not a true brute-force
flat index, so it performs approximate ANN search rather than exact brute-force
search.
{
  "requestsFinished": 1,
  "requestsFailed": 0,
  "requestsRetries": 0,
  "requestsFailedPerMinute": 0,
  "requestsFinishedPerMinute": 16,
  "requestMinDurationMillis": 2907,
  "requestMaxDurationMillis": 2907,
  "requestTotalFailedDurationMillis": 0,
  "requestTotalFinishedDurationMillis": 2907,
  "crawlerStartedAt": "2026-01-08T14:09:03.077Z",
  "crawlerFinishedAt": "2026-01-08T14:09:06.880Z",
  "statsPersistedAt": "2026-01-08T14:09:06.880Z",
  "crawlerRuntimeMillis": 3850,
  "crawlerLastStartTimestamp": 1767881343030,
  "requestRetryHistogram": [
    1
  ],
  "statsId": 0,
  "requestAvgFailedDurationMillis": null,
  "requestAvgFinishedDurationMillis": 2907,
  "requestTotalDurationMillis": 2907,
  "requestsTotal": 1,
  "requestsWithStatusCode": {},
  "errors": {},
  "retryErrors": {}
}
🧩 Analysis chain
🏁 Script executed:
# Check .gitignore content for patterns related to storage or SDK_CRAWLER
cat .gitignore | head -50
echo "---"
grep -E "(storage|SDK_CRAWLER|/public/)" .gitignore || echo "No storage/SDK_CRAWLER patterns found"
Length of output: 527
🏁 Script executed:
# Check if there are multiple SDK_CRAWLER_STATISTICS files in the repo
find . -name "SDK_CRAWLER_STATISTICS_*.json" -type f
Length of output: 144
🏁 Script executed:
# Check the directory structure for storage
ls -la src/mastra/public/storage/key_value_stores/default/ 2>/dev/null | head -20
Length of output: 324
🏁 Script executed:
# Search for references to SDK_CRAWLER_STATISTICS or storage in code
rg "SDK_CRAWLER_STATISTICS|storage/key_value" --type ts --type js -B 2 -A 2 | head -50
Length of output: 44
🏁 Script executed:
# Check if there are other state/storage directories similar to this one
find src/mastra -type d -name "storage" -o -name "key_value_stores"
Length of output: 130
🏁 Script executed:
# Look for any README or documentation in the storage directory
find src/mastra/public/storage -name "README*" -o -name "*.md"
Length of output: 44
🏁 Script executed:
# Check git status to see if these files are recently added or changed
git log --oneline -n 20 -- "src/mastra/public/storage/" 2>/dev/null || echo "No git history available"
Length of output: 135
🏁 Script executed:
# Check what the storage directory is used for - look at parent structure
ls -la src/mastra/public/
Length of output: 229
🏁 Script executed:
# Search for any code that might generate or use these JSON files
rg "storage|SESSION_POOL|CRAWLER" --type ts --type js src/mastra/ | head -30
Length of output: 2973
🏁 Script executed:
# Check if public/storage should be part of build output or runtime-generated
cat src/mastra/public/.gitignore 2>/dev/null || echo "No .gitignore in public/"
Length of output: 86
🏁 Script executed:
# Search for any references to the public/storage directory itself in configs or build files
rg "public/storage" --type ts --type js --type json --type yml --type yaml --type config
Length of output: 89
🏁 Script executed:
# Check if these files appear in package.json scripts or build configurations
cat package.json | grep -A 10 -B 10 "storage\|crawler" || echo "No matches in package.json"
Length of output: 88
🏁 Script executed:
# Look for any test setup or initialization that might create these files
find . -name "*.test.ts" -o -name "*.spec.ts" -o -name "setup.ts" | xargs grep -l "storage\|SDK_CRAWLER" 2>/dev/null || echo "No test files found"
Length of output: 81
🏁 Script executed:
# Check the git history of the specific file to understand when/why it was added
git log --all --full-history -- "src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json" | head -20
Length of output: 1303
🏁 Script executed:
# Check if there's a .gitkeep or similar file in storage directories
find src/mastra/public/storage -type f
Length of output: 291
🏁 Script executed:
# Look for any documentation or README that explains the purpose of public/storage
find src/mastra -name "README*" | xargs cat 2>/dev/null | grep -i "storage" -A 2 -B 2 || echo "No storage documentation found"
Length of output: 1854
Remove runtime-generated crawler and session state files from version control.
These JSON files (SDK_CRAWLER_STATISTICS_0.json, SDK_SESSION_POOL_STATE.json, and request queue files) are generated by the SDK crawler at runtime and contain ephemeral state and metrics that will differ between environments and runs. Committing them causes unnecessary merge conflicts and pollutes the repository.
Add to .gitignore:
src/mastra/public/storage/
or more specifically:
**/SDK_CRAWLER_STATISTICS_*.json
**/SDK_SESSION_POOL_STATE.json
src/mastra/public/storage/request_queues/
🤖 Prompt for AI Agents
In
@src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
around lines 1 - 27, The committed JSON files like
SDK_CRAWLER_STATISTICS_0.json, SDK_SESSION_POOL_STATE.json and runtime request
queue files in src/mastra/public/storage are ephemeral and must be ignored:
update .gitignore to exclude src/mastra/public/storage/ (or add patterns
**/SDK_CRAWLER_STATISTICS_*.json, **/SDK_SESSION_POOL_STATE.json and
src/mastra/public/storage/request_queues/) and remove the already committed
files from the index (use git rm --cached <file> or git rm -r --cached
src/mastra/public/storage/) then commit the .gitignore change and the removal so
these runtime artifacts stop being tracked.
Pull request overview
This pull request aims to refactor vector storage configuration by implementing LanceDB as an alternative vector storage solution. The PR adds a new lance.ts configuration file, updates several agent configurations with new processor settings, and includes runtime-generated JSON files for crawler statistics and session management.
Key changes:
- New LanceDB vector storage implementation in src/mastra/config/lance.ts
- Agent configuration updates including reduced token limits and modified thinking levels
- Addition of @mastra/qdrant package and version bumps for framer-motion and motion
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| src/mastra/config/lance.ts | New LanceDB configuration with vector storage, memory, and RAG tools |
| src/mastra/config/qdrant.ts | Fixed import path from ../logger to ./logger |
| src/mastra/config/pg-storage.ts | Changed index type from HNSW to IVFFlat with outdated documentation |
| src/mastra/config/upstash.ts | Removed satisfies type assertion from providerOptions |
| src/mastra/agents/dane.ts | Reduced TokenLimiter from 1MB to ~125KB, added BatchPartsProcessor, changed thinking level from high to low |
| src/mastra/agents/codingAgents.ts | Removed projectRoot context variable from instructions |
| package.json | Added @mastra/qdrant dependency, updated framer-motion and motion versions |
| package-lock.json | Lockfile updates for new dependencies |
| src/mastra/public/storage/*.json | Runtime-generated crawler and session state files |
| .github/agents/gpt-5-beast-mode.agent.md | Reordered tools list |
  model: googleAIFlashLite,
  memory: pgMemory,
- outputProcessors: [new TokenLimiterProcessor(1048576)]
+ outputProcessors: [new TokenLimiterProcessor(128576),
The TokenLimiterProcessor limit has been reduced from 1048576 (1MB) to 128576 (~125KB), which is approximately an 88% reduction. This significant change could cause truncation of larger responses or context. This should be documented or explained as it may affect the agent's ability to handle complex conversations or large code contexts.
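If the lower ceiling is intentional, it could at least be made explicit and overridable; a sketch, assuming the processors are already imported in dane.ts and using a hypothetical AGENT_OUTPUT_TOKEN_LIMIT variable:

// Document the output budget and allow an environment override for agents
// that genuinely need larger responses.
const OUTPUT_TOKEN_LIMIT = parseInt(process.env.AGENT_OUTPUT_TOKEN_LIMIT ?? '128576', 10)

const outputProcessors = [
  new TokenLimiterProcessor(OUTPUT_TOKEN_LIMIT),
  new BatchPartsProcessor(),
]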
export const lanceMemory = new Memory({
  storage: lanceStorage,
  vector: vectorStore,
  embedder: google.textEmbedding('gemini-embedding-001'),
  options: {
    generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
    // Message management
    lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
    // Advanced semantic recall with LanceDB configuration
    semanticRecall: {
      topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
      messageRange: {
        before: parseInt(
          process.env.LANCE_SEMANTIC_RANGE_BEFORE ?? '3'
        ),
        after: parseInt(process.env.LANCE_SEMANTIC_RANGE_AFTER ?? '2'),
      },
      scope: 'resource', // 'resource' | 'thread'
      // LanceDB-specific index configuration
      indexConfig: {},
    },
    // Enhanced working memory with supported template
    workingMemory: {
      enabled: true,
      scope: 'resource', // 'resource' | 'thread'
      version: 'vnext', // Enable the improved/experimental tool
      template: `# User Profile & Context
## Personal Information
- **Name**: [To be learned]
- **Role/Title**: [To be learned]
- **Organization**: [To be learned]
- **Location**: [To be learned]
- **Time Zone**: [To be learned]

## Communication Preferences
- **Preferred Communication Style**: [To be learned]
- **Response Length Preference**: [To be learned]
- **Technical Level**: [To be learned]

## Current Context
- **Active Projects**: [To be learned]
- **Current Goals**: [To be learned]
- **Recent Activities**: [To be learned]
- **Pain Points**: [To be learned]

## Long-term Memory
- **Key Achievements**: [To be learned]
- **Important Relationships**: [To be learned]
- **Recurring Patterns**: [To be learned]
- **Preferences & Habits**: [To be learned]

## Session Notes
- **Today's Focus**: [To be learned]
- **Outstanding Questions**: [To be learned]
- **Action Items**: [To be learned]
- **Follow-ups Needed**: [To be learned]
`,
    },
  },
})
The processors array with TokenLimiter has been removed from lanceMemory configuration. The old implementation included processors: [new TokenLimiter(1048576)] to limit message token counts. Without this processor, the memory system may not properly manage token limits, potentially leading to context overflow or API errors when dealing with large conversations.
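A sketch of reinstating that limit on lanceMemory; the @mastra/memory/processors import path is assumed from the previous revision and should be verified against the installed Mastra version:

import { TokenLimiter } from '@mastra/memory/processors'

export const lanceMemory = new Memory({
  storage: lanceStorage,
  vector: vectorStore,
  embedder: google.textEmbedding('gemini-embedding-001'),
  // Trim recalled history to a bounded token budget before it reaches the model.
  processors: [new TokenLimiter(1048576)],
  options: {
    // ...options unchanged from the block quoted above...
  },
})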
  vector: vectorStore,
  embedder: google.textEmbedding('gemini-embedding-001'),
  options: {
    generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
The generateTitle option is incorrectly placed at the root level of options. Based on the pg-storage.ts pattern and Mastra Memory API, generateTitle should be nested under the threads property. The current placement will likely be ignored or cause a type error.
-    generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
+    threads: {
+      generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
+    },
import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger
Missing closing single quote on the import statement. This will cause a syntax error that prevents the module from loading.
- import { log } from './logger
+ import { log } from './logger'
export const lanceStorage = await LanceStorage.create(
  LANCE_STORAGE_CONFIG.dbUri,
  LANCE_STORAGE_CONFIG.storageName,
  LANCE_STORAGE_CONFIG.tablePrefix
)
Top-level await is being used outside of an async context. This code will only work in ES modules with top-level await support. The initialization should be wrapped in an async function or these should be initialized lazily to avoid blocking module loading and potential runtime errors.
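One option is to initialize lazily and memoize the promise so importing the module never blocks; getLanceStorage is a hypothetical helper name:

let lanceStoragePromise: ReturnType<typeof LanceStorage.create> | undefined

// Create the storage on first use and cache the promise, so module import stays
// synchronous and initialization errors surface where the storage is consumed.
export function getLanceStorage(): ReturnType<typeof LanceStorage.create> {
  lanceStoragePromise ??= LanceStorage.create(
    LANCE_STORAGE_CONFIG.dbUri,
    LANCE_STORAGE_CONFIG.storageName,
    LANCE_STORAGE_CONFIG.tablePrefix
  )
  return lanceStoragePromise
}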
}> {
  return messages.map((message) => ({
    id: message.id,
    content: message.content,
The formatStorageMessages function no longer masks sensitive data from message content. The previous version called maskSensitiveMessageData to redact passwords, secrets, tokens, and API keys before storing messages. This removal could lead to sensitive information being logged or stored in plaintext, which is a security risk.
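A sketch of restoring redaction before persistence; maskSensitiveMessageData here is a stand-in regex-based example, not the project's original helper:

// Redact obvious credential patterns before a message is formatted for storage.
function maskSensitiveMessageData(content: string): string {
  return content.replace(
    /\b(password|secret|token|api[_-]?key)\b\s*[:=]\s*\S+/gi,
    (_match, key: string) => `${key}: [REDACTED]`
  )
}

// Inside formatStorageMessages:
//   content: maskSensitiveMessageData(message.content),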
| "requestsFinished": 1, | ||
| "requestsFailed": 0, | ||
| "requestsRetries": 0, | ||
| "requestsFailedPerMinute": 0, | ||
| "requestsFinishedPerMinute": 16, | ||
| "requestMinDurationMillis": 2907, | ||
| "requestMaxDurationMillis": 2907, | ||
| "requestTotalFailedDurationMillis": 0, | ||
| "requestTotalFinishedDurationMillis": 2907, | ||
| "crawlerStartedAt": "2026-01-08T14:09:03.077Z", | ||
| "crawlerFinishedAt": "2026-01-08T14:09:06.880Z", | ||
| "statsPersistedAt": "2026-01-08T14:09:06.880Z", | ||
| "crawlerRuntimeMillis": 3850, | ||
| "crawlerLastStartTimestamp": 1767881343030, | ||
| "requestRetryHistogram": [ | ||
| 1 | ||
| ], | ||
| "statsId": 0, | ||
| "requestAvgFailedDurationMillis": null, | ||
| "requestAvgFinishedDurationMillis": 2907, | ||
| "requestTotalDurationMillis": 2907, | ||
| "requestsTotal": 1, | ||
| "requestsWithStatusCode": {}, | ||
| "errors": {}, | ||
| "retryErrors": {} |
This appears to be runtime-generated crawler statistics that should not be committed to version control. Statistics files contain runtime-specific metrics and timestamps ("2026-01-08") that are ephemeral and should be regenerated for each environment. This file should be added to .gitignore.
| "requestsFinished": 1, | |
| "requestsFailed": 0, | |
| "requestsRetries": 0, | |
| "requestsFailedPerMinute": 0, | |
| "requestsFinishedPerMinute": 16, | |
| "requestMinDurationMillis": 2907, | |
| "requestMaxDurationMillis": 2907, | |
| "requestTotalFailedDurationMillis": 0, | |
| "requestTotalFinishedDurationMillis": 2907, | |
| "crawlerStartedAt": "2026-01-08T14:09:03.077Z", | |
| "crawlerFinishedAt": "2026-01-08T14:09:06.880Z", | |
| "statsPersistedAt": "2026-01-08T14:09:06.880Z", | |
| "crawlerRuntimeMillis": 3850, | |
| "crawlerLastStartTimestamp": 1767881343030, | |
| "requestRetryHistogram": [ | |
| 1 | |
| ], | |
| "statsId": 0, | |
| "requestAvgFailedDurationMillis": null, | |
| "requestAvgFinishedDurationMillis": 2907, | |
| "requestTotalDurationMillis": 2907, | |
| "requestsTotal": 1, | |
| "requestsWithStatusCode": {}, | |
| "errors": {}, | |
| "retryErrors": {} | |
| "note": "This file is a placeholder. Crawler statistics are generated at runtime and should not be committed to version control.", | |
| "example": { | |
| "requestsFinished": 0, | |
| "requestsFailed": 0, | |
| "requestsRetries": 0, | |
| "requestsFailedPerMinute": 0, | |
| "requestsFinishedPerMinute": 0, | |
| "requestMinDurationMillis": 0, | |
| "requestMaxDurationMillis": 0, | |
| "requestTotalFailedDurationMillis": 0, | |
| "requestTotalFinishedDurationMillis": 0, | |
| "crawlerStartedAt": null, | |
| "crawlerFinishedAt": null, | |
| "statsPersistedAt": null, | |
| "crawlerRuntimeMillis": 0, | |
| "crawlerLastStartTimestamp": 0, | |
| "requestRetryHistogram": [], | |
| "statsId": 0, | |
| "requestAvgFailedDurationMillis": null, | |
| "requestAvgFinishedDurationMillis": 0, | |
| "requestTotalDurationMillis": 0, | |
| "requestsTotal": 0, | |
| "requestsWithStatusCode": {}, | |
| "errors": {}, | |
| "retryErrors": {} | |
| } |
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
metric: 'cosine', // Distance metric for normalized embeddings
hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
ivf: { lists: 1080 }, // IVFFlat configuration
The index configuration has been changed from HNSW to IVFFlat. However, the comment on line 99 still incorrectly states "flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)". IVFFlat is not the same as "flat" - it's an inverted file index. The comment should be updated to accurately describe IVFFlat's characteristics and why this change was made.
| "@mastra/observability": "^1.0.0-beta.10", | ||
| "@mastra/otel-exporter": "^1.0.0-beta.11", | ||
| "@mastra/pg": "^1.0.0-beta.12", | ||
| "@mastra/qdrant": "^1.0.0-beta.3", |
The @mastra/qdrant package is being added to dependencies, but based on the PR description and the fact that qdrant.ts is being kept (with only an import path fix), this seems intentional. However, the PR title says "refactor vector storage configuration and implement LanceDB" and the description mentions "Removed the Qdrant vector configuration file", which conflicts with keeping qdrant.ts and adding the @mastra/qdrant dependency. This inconsistency should be clarified.
| "@mastra/qdrant": "^1.0.0-beta.3", |
  LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");
Hardcoded path "/path/to/db" should be replaced with the configured LANCE_CONFIG.dbPath. This appears to be placeholder code that was not updated with the actual configuration value, which will cause the vector store to use the wrong database location.
- const vectorStore = await LanceVectorStore.create("/path/to/db");
+ const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);

qdrant.ts) and its associated logic.lance.ts) to support vector storage and similarity search using LanceDB.chore: update storage and request queue structures
SDK_CRAWLER_STATISTICS_0.jsonto log crawler performance metrics.SDK_SESSION_POOL_STATE.jsonto manage session states and usability.Summary by Sourcery
Refine agent processing, update vector storage configuration, and introduce LanceDB-based memory and query tooling alongside new crawler/request queue state artifacts.
New Features:
Enhancements:
Build: