
feat: refactor vector storage configuration and implement LanceDB #79

Merged
ssdeanx merged 1 commit into main from develop
Jan 8, 2026

Conversation

@ssdeanx
Owner

@ssdeanx ssdeanx commented Jan 8, 2026

  • Removed the Qdrant vector configuration file (qdrant.ts) and its associated logic.
  • Introduced a new LanceDB vector configuration file (lance.ts) to support vector storage and similarity search using LanceDB.
  • Implemented comprehensive settings for LanceDB, including memory management, embedding generation, and advanced filtering capabilities.
  • Added functions for generating embeddings, validating filters, and performing storage operations with detailed logging.
  • Created new JSON files for crawler statistics and session pool state to track request metrics and session management.
  • Added a request queue JSON file to manage requests with unique identifiers and metadata.

chore: update storage and request queue structures

  • Created SDK_CRAWLER_STATISTICS_0.json to log crawler performance metrics.
  • Created SDK_SESSION_POOL_STATE.json to manage session states and usability.
  • Created a request queue JSON file to handle requests with relevant metadata and error tracking.
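For reviewers who want the shape of the new module at a glance, here is a minimal sketch of what lance.ts wires together. The constructor signatures follow the reviewer's guide class diagram below; the import path, default paths, dimension, and table names are illustrative assumptions rather than the exact values in the file.

```typescript
import { LanceStorage, LanceVectorStore } from '@mastra/lance'
import { Memory } from '@mastra/memory'
import { google } from '@ai-sdk/google'

// Illustrative defaults; the real module drives these from LANCE_CONFIG and env vars.
const LANCE_CONFIG = {
    dbPath: process.env.LANCE_DB_PATH ?? './.lancedb',
    tableName: 'memory_messages',
    embeddingDimension: 1536,
    embeddingModel: google.textEmbedding('gemini-embedding-001'),
}

// Argument order follows the class diagram: create(dbUri, storageName, tablePrefix).
const lanceStorage = await LanceStorage.create(LANCE_CONFIG.dbPath, 'lance-storage', 'mastra_')
const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath)

export const lanceMemory = new Memory({
    storage: lanceStorage,
    vector: vectorStore,
    embedder: LANCE_CONFIG.embeddingModel,
    options: {
        generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
        lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
        semanticRecall: {
            topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
            messageRange: {
                before: parseInt(process.env.LANCE_SEMANTIC_RANGE_BEFORE ?? '3'),
                after: parseInt(process.env.LANCE_SEMANTIC_RANGE_AFTER ?? '2'), // env name assumed by analogy
            },
        },
    },
})
```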

Summary by Sourcery

Refine agent processing, update vector storage configuration, and introduce LanceDB-based memory and query tooling alongside new crawler/request queue state artifacts.

New Features:

  • Add LanceDB-based memory configuration, vector query tool, and graph RAG tool for semantic search and retrieval.
  • Introduce helper utilities for embedding generation, metadata filtering, and robust storage operation logging for LanceDB workflows.
  • Add persisted JSON artifacts for crawler statistics, session pool state, and request queue metadata.

Enhancements:

  • Adjust Dane agents to use batch output processing and updated token limits while simplifying tool descriptions and reasoning settings.
  • Retune PostgreSQL semantic recall to use an IVFFlat index configuration for high-dimensional embeddings.
  • Tweak coding agent context instructions to remove dependence on the project root value.
  • Trim GPT-5 Beast Mode agent tool list by removing the mastraMigration tool reference.
  • Relax Upstash vector query configuration typing to avoid over-constraining provider options.
  • Bump framer-motion and motion dependencies to the latest patch version.

Build:

  • Add @mastra/qdrant dependency to package configuration.

Copilot AI review requested due to automatic review settings January 8, 2026 14:58

@sourcery-ai

sourcery-ai bot commented Jan 8, 2026

Reviewer's Guide

Refactors vector storage to introduce a new LanceDB-based configuration and memory setup, updates several agents’ output-processing and instructions, adjusts pg and Upstash vector configs, tweaks dependencies, and adds JSON-backed crawler/request/session state files.

Sequence diagram for updated agent output processing pipeline

sequenceDiagram
  actor User
  participant DaneAgent as DaneAgent
  participant Model as LLMModel
  participant TokenLimiter as TokenLimiterProcessor
  participant BatchParts as BatchPartsProcessor

  User->>DaneAgent: Send request
  DaneAgent->>Model: Generate response
  Model-->>DaneAgent: Streamed model output

  DaneAgent->>TokenLimiter: Process output (limit to 128576 tokens)
  TokenLimiter-->>DaneAgent: Truncated or original output

  DaneAgent->>BatchParts: Batch parts
  BatchParts-->>DaneAgent: Batched chunks

  DaneAgent-->>User: Final processed response
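In code, the processor pairing shown in this diagram comes down to roughly the following. The numeric parameters are the ones quoted in the review comments further down (a 128576-token limiter plus a BatchPartsProcessor with batchSize 10, maxWaitTime 75, emitOnNonText true); the import path and the shared-constant shape are assumptions rather than the literal dane.ts code.

```typescript
import { TokenLimiterProcessor, BatchPartsProcessor } from '@mastra/core/processors'

// Shared output-processor pairing: cap streamed output at ~128k tokens,
// then coalesce small stream parts into batches to cut per-chunk overhead.
export const daneOutputProcessors = [
    new TokenLimiterProcessor(128576),
    new BatchPartsProcessor({ batchSize: 10, maxWaitTime: 75, emitOnNonText: true }),
]
```

Each Dane agent would then reference this array from its outputProcessors option, which is also the shared-constant refactor Sourcery and CodeRabbit suggest later in this thread.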

Class diagram for updated LanceDB memory and vector configuration

classDiagram

class LanceStorage {
  +create(dbUri, storageName, tablePrefix) LanceStorage
}

class LanceVectorStore {
  +create(dbPath) LanceVectorStore
}

class Memory {
  +storage
  +vector
  +embedder
  +options
}

class LanceConfig {
  +dbPath
  +tableName
  +embeddingDimension
  +embeddingModel
}

class LanceStorageConfig {
  +storageName
  +dbUri
  +storageOptions
  +tablePrefix
}

class LanceMetadataFilter

<<interface>> LanceMetadataFilter

class LanceRawFilter

<<type>> LanceRawFilter

class LanceTools {
  +lanceGraphTool
  +lanceQueryTool
}

class EmbeddingUtils {
  +generateEmbeddings(texts, options) number[][]
}

class StorageUtils {
  +formatStorageMessages(messages) Message[]
  +performStorageOperation(operation, operationName, metadata) T
}

class Message {
  +id string
  +content string
  +role string
  +createdAt Date
  +metadata Record
}

LanceConfig <.. LanceTools : uses
LanceConfig <.. EmbeddingUtils : uses
LanceStorageConfig <.. LanceStorage : configuredBy

LanceStorage ..> Memory : storage
LanceVectorStore ..> Memory : vector
Memory o-- Message : manages

EmbeddingUtils ..> LanceConfig : reads
StorageUtils ..> LanceStorage : operatesOn
StorageUtils ..> LanceVectorStore : operatesOn

LanceMetadataFilter <.. LanceRawFilter : transformedTo
LanceTools ..> LanceVectorStore : vectorStoreName
LanceTools ..> Memory : usesEmbeddings
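As a concrete reading of the utilities in this diagram, performStorageOperation is the kind of wrapper that times an operation and logs its outcome. The sketch below is an assumption: the logger methods (log.info/log.error), the async return type, and the metadata shape are inferred from the diagram and the PR description, not from the diff itself.

```typescript
import { log } from './logger'

export async function performStorageOperation<T>(
    operation: () => Promise<T>,
    operationName: string,
    metadata: Record<string, unknown> = {}
): Promise<T> {
    const startedAt = Date.now()
    try {
        const result = await operation()
        // Structured success log with duration, matching the "detailed logging" the PR describes.
        log.info(`Lance storage operation succeeded: ${operationName}`, {
            ...metadata,
            durationMs: Date.now() - startedAt,
        })
        return result
    } catch (error) {
        log.error(`Lance storage operation failed: ${operationName}`, { ...metadata, error })
        throw error
    }
}
```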

File-Level Changes

Change Details Files
Configure LanceDB-based vector storage and memory, replacing the previous vector config.
  • Add new LanceDB config (LANCE_CONFIG/LANCE_STORAGE_CONFIG) including db paths, embedding dimensions, and storage options.
  • Instantiate LanceStorage and LanceVectorStore and wire them into a new lanceMemory instance with semantic and working memory options.
  • Expose Lance-specific GraphRAG and vector query tools with configurable thresholds, filters, and database parameters.
  • Provide helper utilities for Lance filters, embedding generation, message formatting, and wrapped storage operations with structured logging.
  • Remove the old vector lance configuration file under src/mastra/config/vector.
src/mastra/config/lance.ts
src/mastra/config/vector/lance.ts
Tune agent output processing and instructions for Dane agents to support batched outputs and updated tool usage.
  • Import BatchPartsProcessor alongside TokenLimiterProcessor for use in agent output processing.
  • Replace single high-limit TokenLimiterProcessor with a smaller limit plus a BatchPartsProcessor configuration across Dane agents.
  • Update Dane system instructions to remove unused tools, emphasize webScraperTool instead of googleSearch, and adjust guidance around general knowledge and tools.
  • Lower google thinkingLevel from 'high' to 'low' in the Dane agent configuration.
src/mastra/agents/dane.ts
Retune pg-based semantic recall index and simplify Upstash vector tool provider options.
  • Change pgMemory’s semantic recall indexConfig from HNSW to IVFFlat with list count configuration.
  • Remove the explicit GoogleGenerativeAIProviderOptions type assertion from the Upstash vector query tool’s providerOptions to loosen typing.
src/mastra/config/pg-storage.ts
src/mastra/config/upstash.ts
Adjust coding agent context and GPT-5 Beast Mode tool configuration.
  • Comment out use of projectRoot from the request context in the codeArchitectAgent instructions and remove it from the rendered system prompt.
  • Trim one of the tools (mastraMigration) from the GPT-5 Beast Mode agent’s allowed tools list.
src/mastra/agents/codingAgents.ts
.github/agents/gpt-5-beast-mode.agent.md
Update dependencies and lockfile for vector and UI packages.
  • Add @mastra/qdrant to app dependencies for Qdrant integration support.
  • Bump framer-motion and motion packages from 12.24.11 to 12.24.12 and propagate changes in package-lock.json.
package.json
package-lock.json
Introduce JSON-backed storage for crawler statistics, session pool state, and request queue entries.
  • Add SDK_CRAWLER_STATISTICS_0.json to store crawler performance metrics.
  • Add SDK_SESSION_POOL_STATE.json for session state tracking.
  • Add a request queue JSON file containing queued request metadata and error-tracking fields under the default queue path.
src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json
src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json


@github-actions

github-actions bot commented Jan 8, 2026

🤖 Hi @ssdeanx, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@coderabbitai

coderabbitai bot commented Jan 8, 2026

Summary by CodeRabbit

  • New Features

    • Added LanceDB-powered vector storage and retrieval capabilities for AI-driven context and memory workflows.
    • Introduced Qdrant vector database support.
  • Dependencies

    • Updated motion and framer-motion libraries to v12.24.12.
    • Added @mastra/qdrant beta support.
  • Chores

    • Refactored vector storage configuration and optimized database indexing strategies.


Walkthrough

This PR consolidates LanceDB configuration from a vector-specific module into a unified config file, updates agent output processors to include BatchPartsProcessor alongside TokenLimiterProcessor, changes vector storage indexing from HNSW to IVFFlat, updates dependencies, and adjusts tool references in agents.

Changes

Cohort / File(s) Summary
Agent Configuration Updates
src/mastra/agents/dane.ts, src/mastra/agents/codingAgents.ts
Modified dane agents' output processors to use TokenLimiterProcessor (128576) + BatchPartsProcessor instead of single TokenLimiterProcessor (1048576); updated tool references from legacy tools to webScraperTool; removed projectRoot context from Code Architect Agent.
Vector Storage & LanceDB Consolidation
src/mastra/config/lance.ts, src/mastra/config/vector/lance.ts, src/mastra/config/pg-storage.ts
Created new unified LanceDB config module (lance.ts) with storage, RAG tools, and utility functions; removed legacy vector-specific lance.ts; updated pg-storage to use IVFFlat indexing (lists: 1080) instead of HNSW.
Configuration Module Updates
src/mastra/config/qdrant.ts, src/mastra/config/upstash.ts
Corrected import path (../logger to ./logger) in qdrant.ts; removed type assertion on Google provider options in upstashQueryTool.
Dependencies & Metadata
package.json, .github/agents/gpt-5-beast-mode.agent.md
Added @mastra/qdrant (^1.0.0-beta.3); updated framer-motion and motion to ^12.24.12; corrected tool namespace typo.
Storage State Files
src/mastra/public/storage/key_value_stores/default/*, src/mastra/public/storage/request_queues/default/*
Added crawler statistics, session pool state, and request queue metadata files (auto-generated storage artifacts).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit hops through config files, 🐰
Lance consolidates with style,
Processors batch, vectors refined,
Old paths replaced, new tools aligned,
The agents dance—optimization in a while!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title 'feat: refactor vector storage configuration and implement LanceDB' accurately captures the main changes: migrating from Qdrant to LanceDB and refactoring vector storage configuration.
  • Description check (✅ Passed): The description comprehensively outlines the key changes including LanceDB implementation, agent refinements, dependency updates, and new JSON artifacts, directly relating to the changeset.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.




@gemini-code-assist
Contributor

Summary of Changes

Hello @ssdeanx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on refactoring the application's vector storage configuration by introducing a new LanceDB implementation for enhanced vector storage and similarity search. It replaces an older LanceDB configuration and updates various AI agent settings, including their processing capabilities and available tools. The PR also adds new JSON files for tracking crawler statistics, managing session states, and handling request queues, improving overall system observability. Notably, while the PR description indicates the removal of Qdrant, the changes show a renaming of the Qdrant configuration file and the addition of its package dependency.

Highlights

  • Vector Storage Refactoring: Introduced a new LanceDB vector configuration (src/mastra/config/lance.ts) to support vector storage and similarity search, replacing the previous LanceDB configuration (src/mastra/config/vector/lance.ts). While the PR description states Qdrant was removed, the src/mastra/config/vector/qdrant.ts file was renamed to src/mastra/config/qdrant.ts and the @mastra/qdrant package was added to package.json.
  • Agent Configuration Updates: Modified several dane agents to use BatchPartsProcessor, adjusted TokenLimiterProcessor values, and streamlined the available tools by removing readPDF, fsTool, googleSearch, and imageTool, while also changing the thinkingLevel for Google provider options.
  • New Data Tracking Files: Introduced new JSON files to track crawler performance metrics (SDK_CRAWLER_STATISTICS_0.json), manage session states (SDK_SESSION_POOL_STATE.json), and handle request queue metadata (jBMmaaywUPlwSK3.json).
  • Dependency and Indexing Adjustments: Updated framer-motion and motion package versions, and changed the PostgreSQL vector index type from hnsw to ivfflat.

@github-actions

github-actions bot commented Jan 8, 2026

🤖 I'm sorry @ssdeanx, but I was unable to process your request. Please see the logs for more details.

@ssdeanx ssdeanx merged commit e448a0b into main Jan 8, 2026
18 of 26 checks passed

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues and left some high-level feedback:

  • The LanceDB vector store is created with a hardcoded path ("/path/to/db"); consider wiring this up to LANCE_CONFIG.dbPath or an env var to avoid mismatches between configuration and actual storage location.
  • Both lanceStorage and vectorStore are initialized at module load time with await, which can introduce startup latency and coupling; consider lazy-initializing these in a factory or init function so callers can better control when the cost is paid and how errors are handled.
  • The BatchPartsProcessor configuration (batchSize, maxWaitTime, emitOnNonText) is duplicated across multiple agents in dane.ts; extracting this into a shared constant or helper would reduce repetition and keep these parameters consistent.
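On the second point, a minimal sketch of the lazy-initialization pattern could look like the following; the LanceVectorStore.create signature comes from the reviewer's guide class diagram above, while the import path and the memoization shape are assumptions.

```typescript
import { LanceVectorStore } from '@mastra/lance'

let vectorStorePromise: Promise<LanceVectorStore> | undefined

export function getLanceVectorStore(dbPath: string): Promise<LanceVectorStore> {
    // The first caller pays the connection cost; later callers reuse the same promise,
    // and a failed create surfaces to whoever awaits it instead of breaking module load.
    vectorStorePromise ??= LanceVectorStore.create(dbPath)
    return vectorStorePromise
}
```

Callers would then await getLanceVectorStore(LANCE_CONFIG.dbPath) inside the tools and memory factories instead of importing a pre-built instance.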
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The LanceDB vector store is created with a hardcoded path (`"/path/to/db"`); consider wiring this up to `LANCE_CONFIG.dbPath` or an env var to avoid mismatches between configuration and actual storage location.
- Both `lanceStorage` and `vectorStore` are initialized at module load time with `await`, which can introduce startup latency and coupling; consider lazy-initializing these in a factory or init function so callers can better control when the cost is paid and how errors are handled.
- The `BatchPartsProcessor` configuration (batchSize, maxWaitTime, emitOnNonText) is duplicated across multiple agents in `dane.ts`; extracting this into a shared constant or helper would reduce repetition and keep these parameters consistent.

## Individual Comments

### Comment 1
<location> `src/mastra/config/lance.ts:5` </location>
<code_context>
+import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
+import { google } from '@ai-sdk/google'
+import { embedMany } from 'ai'
+import { log } from './logger
+import { Memory } from '@mastra/memory'
+
</code_context>

<issue_to_address>
**issue (bug_risk):** The `log` import is syntactically invalid and will prevent this file from compiling.

The import path string is missing its closing quote, so the statement cannot parse. It should be a valid import such as `import { log } from './logger'` so the module can compile correctly.
</issue_to_address>

### Comment 2
<location> `src/mastra/config/pg-storage.ts:96-102` </location>
<code_context>
       // HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
       indexConfig: {
-        type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+        type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
         metric: 'cosine', // Distance metric for normalized embeddings
-        hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
</code_context>

<issue_to_address>
**suggestion:** The index type comment is now misleading after switching from HNSW to IVFFlat.

The inline comment still references HNSW constraints (`unlike HNSW limit of 2000`) even though the index type is now `'ivfflat'`, which can mislead future maintainers about what’s actually in use. Please update the comment to accurately describe IVFFlat (and any relevant tradeoffs) or remove the HNSW-specific wording.

```suggestion
      scope: 'resource', // 'resource' | 'thread'
      // IVFFlat index configuration for high-dimensional embeddings and scalable approximate search
      indexConfig: {
        type: 'ivfflat', // IVFFlat index type (approximate search over high-dimensional embeddings; tuned via ivf.lists)
        metric: 'cosine', // Distance metric for normalized embeddings
        ivf: { lists: 1080 }, // IVFFlat configuration (number of inverted lists; higher = better recall, more memory)
      },
```
</issue_to_address>

### Comment 3
<location> `src/mastra/config/lance.ts:46` </location>
<code_context>
+    LANCE_STORAGE_CONFIG.tablePrefix
+)
+
+const vectorStore = await LanceVectorStore.create("/path/to/db");
+
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** The Lance vector store is created with a hard-coded path instead of the defined `LANCE_CONFIG` settings.

`LANCE_CONFIG.dbPath` / `LANCE_CONFIG.tableName` are defined but `LanceVectorStore.create` uses the literal `"/path/to/db"`. Please derive this path (and table/collection name if applicable) from `LANCE_CONFIG` so Lance storage is configured from a single source of truth and doesn’t drift from the declared settings.

Suggested implementation:

```typescript
const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);

```

If `LanceVectorStore.create` in your codebase supports specifying a table/collection name, you should also update the call to include `LANCE_CONFIG.tableName`, for example:

- `LanceVectorStore.create(LANCE_CONFIG.dbPath, LANCE_CONFIG.tableName)`
  or
- `LanceVectorStore.create({ dbPath: LANCE_CONFIG.dbPath, tableName: LANCE_CONFIG.tableName })`

depending on the actual function signature used elsewhere in the project. Ensure any other places instantiating `LanceVectorStore` are updated to use `LANCE_CONFIG` so there is a single source of truth for LanceDB configuration.
</issue_to_address>

### Comment 4
<location> `src/mastra/config/lance.ts:56-65` </location>
<code_context>
+ */
+export async function generateEmbeddings(
+    texts: string[],
+    options: {
+        model?: string
+        dimensions?: number
+    } = {}
+): Promise<number[][]> {
+    try {
</code_context>

<issue_to_address>
**suggestion:** The `generateEmbeddings` options type suggests a string model, but the implementation uses the `google.textEmbedding` model object.

Here `options.model` is typed as `string`, but `LANCE_CONFIG.embeddingModel` is a `google.textEmbedding('gemini-embedding-001')` model object, so `embedMany` receives a `model` that may be either a string or that object while the type only declares `string`. Please update the type to match what `embedMany` actually accepts (e.g., a union of string/model or the concrete model type), or separate the model name from the model instance passed to `embedMany` to avoid type confusion.
</issue_to_address>
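One possible shape of the fix, converting a model id string into a model instance before calling `embedMany`; the `LANCE_CONFIG` declaration and the `ReturnType` alias are stand-ins for illustration, not the code in the PR:

```typescript
import { embedMany } from 'ai'
import { google } from '@ai-sdk/google'

type GeminiEmbeddingModel = ReturnType<typeof google.textEmbedding>

// Stand-in for the config object defined in lance.ts.
declare const LANCE_CONFIG: { embeddingModel: GeminiEmbeddingModel }

export async function generateEmbeddings(
    texts: string[],
    // dimensions kept for parity with the original signature; not forwarded here
    options: { model?: string; dimensions?: number } = {}
): Promise<number[][]> {
    // A string id becomes a model instance; otherwise fall back to the configured model object.
    const model = options.model ? google.textEmbedding(options.model) : LANCE_CONFIG.embeddingModel
    const { embeddings } = await embedMany({ model, values: texts })
    return embeddings
}
```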


Comment on lines +56 to +65
options: {
generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
// Message management
lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
// Advanced semantic recall with LanceDB configuration
semanticRecall: {
topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
messageRange: {
before: parseInt(
process.env.LANCE_SEMANTIC_RANGE_BEFORE ?? '3'

suggestion: The generateEmbeddings options type suggests a string model, but the implementation uses the google.textEmbedding model object.

Here options.model is typed as string, but LANCE_CONFIG.embeddingModel is a google.textEmbedding('gemini-embedding-001') model object, so embedMany receives a model that may be either a string or that object while the type only declares string. Please update the type to match what embedMany actually accepts (e.g., a union of string/model or the concrete model type), or separate the model name from the model instance passed to embedMany to avoid type confusion.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a major refactoring of the vector storage configuration by replacing the previous implementation with LanceDB. It includes a new, comprehensive configuration file for LanceDB, updates to agents to use new processing capabilities, and adjustments to other storage configurations like PostgreSQL and Upstash.

My review focuses on the new LanceDB implementation, where I've found a few critical issues related to a syntax error and a hardcoded path that need to be addressed. I've also pointed out a potential type mismatch and opportunities to improve configuration clarity and logging. Additionally, there's a minor correction for a comment in the PostgreSQL storage configuration.

Overall, this is a significant and well-structured update. Addressing these points will help ensure the new LanceDB integration is robust and maintainable.

import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger
Contributor

critical

This import statement is missing a closing single quote, which will cause a syntax error and prevent the module from loading.

import { log } from './logger'

LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");
Contributor

critical

The path "/path/to/db" is hardcoded for creating the LanceVectorStore. This appears to be a placeholder and will fail in any environment. This should use a configurable value, similar to how LANCE_CONFIG.dbPath is used elsewhere in this file.

const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);

} = {}
): Promise<number[][]> {
try {
const model = options.model ?? LANCE_CONFIG.embeddingModel
Contributor

high

There is a potential type mismatch here. The options.model is typed as a string, but it's being assigned to the model variable which is then passed to embedMany. The embedMany function and LANCE_CONFIG.embeddingModel expect a GoogleTextEmbeddingModel object, not a string. If a string is passed via options.model, it will likely cause a runtime error. The type for options.model in the generateEmbeddings function signature should be updated to match the expected model object type.

},
scope: 'resource', // 'resource' | 'thread'
// LanceDB-specific index configuration
indexConfig: {},
Contributor

medium

The indexConfig for semanticRecall is an empty object, but the preceding comment suggests that "LanceDB-specific index configuration" is expected. If you are intentionally using default settings, it would be clearer to add a comment to that effect. Otherwise, the necessary index configuration appears to be missing.

Comment on lines +117 to +120
lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
semanticRecall: {
topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
},
Contributor

medium

This log statement re-parses the environment variables, which is redundant and can lead to inconsistencies if the parsing logic differs from what's used in the lanceMemory configuration. It's better to log the values directly from the lanceMemory.options object to ensure the logs accurately reflect the running configuration.

        lastMessages: lanceMemory.options.lastMessages,
        semanticRecall: {
            topK: lanceMemory.options.semanticRecall.topK,
        },

// HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
indexConfig: {
-        type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+        type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
Contributor

medium

The comment here is inaccurate. ivfflat is an 'Inverted File with Flat compression' index, not a 'flat index type'. A flat index performs an exhaustive search, while ivfflat partitions data for faster searching. Updating the comment will improve clarity for future maintenance.

Suggested change
-        type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+        type: 'ivfflat', // IVFFlat index type, which is suitable for high-dimensional embeddings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/mastra/agents/codingAgents.ts (3)

364-364: Inconsistent projectRoot usage across agents.

The refactoringAgent still computes and references projectRoot (lines 364, 373), while codeArchitectAgent has commented it out (line 49) and the other agents don't use it at all. This creates inconsistency across the agent configurations.

If the PR intends to phase out projectRoot references (as suggested by the changes in codeArchitectAgent), this agent should be updated similarly. Otherwise, if projectRoot is still needed for refactoringAgent, consider why it was removed from codeArchitectAgent.

♻️ Proposed fix to align with other agents
    const userTier = requestContext.get('user-tier') ?? 'free'
    const language = requestContext.get('language') ?? 'en'
-   const projectRoot = requestContext.get('projectRoot') ?? process.cwd()

    return {
      role: 'system',
      content: `You are a Senior Refactoring Specialist. Your role is to improve code quality through safe, incremental refactoring.

-**Context:**
-- User Tier: ${userTier}
-- Language: ${language}
-- Project Root: ${projectRoot}
+**Context:**
+- User Tier: ${userTier}
+- Language: ${language}

Also applies to: 373-373


125-125: Consider extracting shared output processor configuration.

All four agents use identical outputProcessors configuration. Consider extracting this to a shared constant to improve maintainability and reduce duplication.

♻️ Proposed refactor

At the top of the file after imports:

const DEFAULT_OUTPUT_PROCESSORS = [
  new TokenLimiterProcessor(128000), 
  new BatchPartsProcessor({ batchSize: 20, maxWaitTime: 100, emitOnNonText: true })
];

Then in each agent:

-  outputProcessors: [new TokenLimiterProcessor(128000), new BatchPartsProcessor({ batchSize: 20, maxWaitTime: 100, emitOnNonText: true })]
+  outputProcessors: DEFAULT_OUTPUT_PROCESSORS

Note: Only apply this if agents are expected to share the same configuration long-term. If per-agent customization is anticipated, keep them separate.

Also applies to: 238-238, 350-350, 468-468


33-33: Remove commented-out code and ensure consistent projectRoot usage across agents.

The projectRoot field is declared in CodingRuntimeContext (line 33) but is commented out in codeArchitectAgent (line 49) while actively used in refactoringAgent (lines 364, 373). This inconsistency violates the guideline against maintaining commented-out code. Either remove the commented code and keep projectRoot only where needed, or implement it consistently across all agents that declare it in the interface.

src/mastra/config/upstash.ts (3)

4-4: Unused import after removing type assertion.

The GoogleGenerativeAIProviderOptions type is imported but no longer used since the satisfies assertion was removed. This creates dead code.

♻️ Remove unused import or restore type safety

Option 1: Remove unused import

-import type { GoogleGenerativeAIProviderOptions } from '@ai-sdk/google';

Option 2: Restore type assertion for type safety

   providerOptions: {
     google: { retrievalConfig: { dimensions: 1536 } },
-  },
+  } satisfies { google: GoogleGenerativeAIProviderOptions },

Also applies to: 145-147


150-151: Empty databaseConfig object serves no purpose.

The empty object databaseConfig: {} adds no value. Either populate it with the necessary Upstash configuration or remove it entirely.

♻️ Remove or populate databaseConfig
-  databaseConfig: {
-  },

Or add relevant configuration:

databaseConfig: {
  upstash: {
    // Add Upstash-specific configuration
  },
},

31-55: Documentation comments reference incorrect vector store.

The JSDoc comments mention "Pinecone" in multiple places (lines 35, 54), but the implementation uses UpstashVector. This inconsistency can confuse maintainers.

📝 Update documentation to match implementation
-* - Uses PineconeVector for semantic search with cloud-based vectors (768-dim gemini embeddings)
+* - Uses UpstashVector for semantic search with cloud-based vectors (1536-dim gemini embeddings)

-* @classdesc Shared memory instance for all agents using Upstash for storage and [Pinecone] for vector search
+* @classdesc Shared memory instance for all agents using Upstash for storage and UpstashVector for vector search
🤖 Fix all issues with AI agents
In @src/mastra/agents/codingAgents.ts:
- Line 49: Remove the dead commented-out line containing "const projectRoot =
requestContext.get('projectRoot') ?? process.cwd()" from the codingAgents
module; simply delete that commented code (the commented variable/projectRoot
reference) so the file no longer contains commented-out dead code and rely on
version control if the line is ever needed again.

In @src/mastra/agents/dane.ts:
- Around line 225-231: Extract the duplicated outputProcessors array into a
shared constant (e.g., defaultOutputProcessors) and replace each agent's inline
outputProcessors with that constant; specifically, define a top-level constant
that contains new TokenLimiterProcessor(128576) and new BatchPartsProcessor({
batchSize: 10, maxWaitTime: 75, emitOnNonText: true }) and then update each
agent's outputProcessors property to reference defaultOutputProcessors instead
of repeating the array.
- Around line 44-50: The outputProcessors array has inconsistent indentation and
line breaks; refactor the array so each processor entry is on its own line with
consistent indentation and commas (e.g., start the array on one line, place new
TokenLimiterProcessor(128576) and new BatchPartsProcessor({...}) each on their
own indented lines, ensure the BatchPartsProcessor object keys (batchSize,
maxWaitTime, emitOnNonText) are consistently indented and the closing brackets
and commas line up). Locate the outputProcessors declaration where
TokenLimiterProcessor and BatchPartsProcessor are used and apply the same
spacing/indentation style as other agent definitions for readability.

In @src/mastra/config/lance.ts:
- Line 5: The import statement for log is missing its closing quote causing a
syntax error; update the import of symbol "log" from module './logger' to a
properly terminated string (e.g., add the closing quote and end the statement)
so the file parses correctly.
- Around line 218-227: The validateLanceFilter function currently only rejects
null/non-object values. Enhance it to recursively validate the structure of
LanceMetadataFilter: top-level entries must be plain objects or allowed
primitive types, arrays may contain only valid filter objects or primitives,
and nested objects must follow the same rules. Traverse the keys (field names)
and, for each value, verify it is a string/number/boolean, an array of those,
or an object that itself passes the same validation; throw a descriptive Error
on unexpected types or empty objects/arrays so malformed nested filters and
array elements are rejected. A sketch of such a recursive check appears
directly after this list.
- Line 46: The code currently calls LanceVectorStore.create with a hardcoded
placeholder path ("/path/to/db"); replace this with a configured value (e.g.,
read from process.env.LANCE_DB_PATH or a LANCE_CONFIG value) and pass that
variable into LanceVectorStore.create instead; also add a short runtime check
that the env/config value exists (throw or log a clear error and exit if
missing) so the vector store is not initialized with an unintended path.
- Around line 232-257: The model fallback in generateEmbeddings assigns
LANCE_CONFIG.embeddingModel (a TextEmbeddingModel) to a variable that can also
be a string, causing a type mismatch when passed to embedMany; change the
assignment so if options.model is provided (string) you convert it to a
TextEmbeddingModel (e.g., via google.textEmbedding(options.model)) and otherwise
use LANCE_CONFIG.embeddingModel, then pass that TextEmbeddingModel to embedMany
and update the log to report the model name accordingly; references: function
generateEmbeddings, options.model, LANCE_CONFIG.embeddingModel, and the
embedMany call.

In @src/mastra/config/pg-storage.ts:
- Around line 99-101: The log output still prints the old HNSW fields while the
actual index configuration uses type: 'ivfflat' and ivf: { lists: 1080 }; update
the logging block that prints the index config so it reflects the real keys and
values (log type: 'ivfflat' and the ivf.lists value) instead of the hnsw object,
and remove or conditionalize any code that prints hnsw.m / hnsw.efConstruction
so logs always match the active config (check the code that constructs/prints
the index config object and adjust the properties it reads to use ivf.lists and
type).
- Line 99: Update the misleading inline comment that sits next to the "type:
'ivfflat'" setting: replace the phrase "flat index type" with a note that
ivfflat is an approximate inverted-file index using flat quantization (inverted
lists with clustering), not a true brute-force flat index, so it performs
approximate ANN search rather than exact brute-force search.

In
@src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json:
- Around line 1-27: The committed JSON files like SDK_CRAWLER_STATISTICS_0.json,
SDK_SESSION_POOL_STATE.json and runtime request queue files in
src/mastra/public/storage are ephemeral and must be ignored: update .gitignore
to exclude src/mastra/public/storage/ (or add patterns
**/SDK_CRAWLER_STATISTICS_*.json, **/SDK_SESSION_POOL_STATE.json and
src/mastra/public/storage/request_queues/) and remove the already committed
files from the index (use git rm --cached <file> or git rm -r --cached
src/mastra/public/storage/) then commit the .gitignore change and the removal so
these runtime artifacts stop being tracked.
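For the validateLanceFilter item above, a recursive check along the lines the comment describes might look like this; the filter grammar is assumed to be plain nested records of primitives, arrays, and sub-filters, so the real LanceMetadataFilter type in lance.ts may need additional cases (operators, null handling, and so on).

```typescript
type LancePrimitive = string | number | boolean
type LanceMetadataFilter = {
    [field: string]: LancePrimitive | LancePrimitive[] | LanceMetadataFilter
}

function isPrimitive(value: unknown): value is LancePrimitive {
    return ['string', 'number', 'boolean'].includes(typeof value)
}

export function validateLanceFilter(
    filter: unknown,
    path = 'filter'
): asserts filter is LanceMetadataFilter {
    if (filter === null || typeof filter !== 'object' || Array.isArray(filter)) {
        throw new Error(`${path} must be a plain object`)
    }
    const entries = Object.entries(filter)
    if (entries.length === 0) {
        throw new Error(`${path} must not be an empty object`)
    }
    for (const [key, value] of entries) {
        const keyPath = `${path}.${key}`
        if (isPrimitive(value)) {
            continue
        }
        if (Array.isArray(value)) {
            if (value.length === 0) {
                throw new Error(`${keyPath} must not be an empty array`)
            }
            for (const [index, item] of value.entries()) {
                // Array elements must be primitives or objects that pass the same rules.
                if (!isPrimitive(item)) {
                    validateLanceFilter(item, `${keyPath}[${index}]`)
                }
            }
            continue
        }
        // Nested objects follow the same rules as the top level.
        validateLanceFilter(value, keyPath)
    }
}
```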
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c2f33 and 5b46b9d.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (12)
  • .github/agents/gpt-5-beast-mode.agent.md
  • package.json
  • src/mastra/agents/codingAgents.ts
  • src/mastra/agents/dane.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/qdrant.ts
  • src/mastra/config/upstash.ts
  • src/mastra/config/vector/lance.ts
  • src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
  • src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json
  • src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json
💤 Files with no reviewable changes (1)
  • src/mastra/config/vector/lance.ts
🧰 Additional context used
📓 Path-based instructions (15)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.github/instructions/next-js.instructions.md)

**/*.{js,jsx,ts,tsx}: Use next/dynamic for dynamic imports to load components only when needed, improving initial load time.
Use next/image component for automatic image optimization, including lazy loading and responsive images.
Use React.memo to prevent unnecessary re-renders of components.
Use the <Link prefetch> tag to prefetch pages that are likely to be visited.
Use getServerSideProps, getStaticProps, or server components for fetching data on the server-side.
Use SWR or React Query for client-side data fetching and caching.
Use CSS Modules, Styled Components, or Tailwind CSS for component-level styling. Prefer Tailwind CSS for rapid development.
Use React Context, Zustand, Jotai, or Recoil for managing global state. Avoid Redux unless necessary.
Use react-hook-form for managing forms and validation.
Only fetch the data that is needed by the component to avoid over-fetching.
Avoid long-running synchronous operations in the main thread to prevent blocking.
Always use setState or hooks to update state instead of mutating state directly.
Include a complete dependency array in useEffect hooks to prevent unexpected behavior.
Avoid writing server-side code in client components to prevent exposing secrets or causing unexpected behavior.
Use try...catch blocks for handling errors in asynchronous operations.
Implement error boundary components using getDerivedStateFromError or componentDidCatch lifecycle methods.
Sanitize user input to prevent Cross-Site Scripting (XSS) attacks. Be especially careful when rendering HTML directly from user input.
Store authentication tokens in HTTP-only cookies or local storage securely.
Implement role-based access control to restrict access to sensitive resources.
Clean up event listeners and timers in useEffect hooks to avoid memory leaks.
Only update state when necessary to reduce the number of re-renders and improve performance.
Use immutable data structures and avoid mutating data directly to prevent unexpected...

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{js,ts}

📄 CodeRabbit inference engine (.github/instructions/next-js.instructions.md)

Use parameterized queries or an ORM to prevent SQL injection attacks.

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{ts,tsx,js,jsx,py,java,cs,rb,go,rs,cpp,c,h,hpp,swift,kotlin,php,scala,clj,groovy,lua,sh,bash}

📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)

**/*.{ts,tsx,js,jsx,py,java,cs,rb,go,rs,cpp,c,h,hpp,swift,kotlin,php,scala,clj,groovy,lua,sh,bash}: Write code that speaks for itself. Comment only when necessary to explain WHY, not WHAT. Avoid obvious comments that state what the code literally does.
Avoid redundant comments that simply repeat what the code is doing
Keep comments accurate and up-to-date with code changes. Remove or update outdated comments that no longer match the implementation.
Write comments for complex business logic that explain the WHY behind specific calculations or business rules
Document non-obvious algorithms with comments explaining the algorithm choice and its reasoning
Add comments explaining what regex patterns match, especially for complex patterns
Document API constraints, rate limits, gotchas, and external dependencies with explanatory comments
Avoid commenting out dead code. Use version control instead of maintaining commented code blocks.
Do not maintain code change history or modification logs as comments. Rely on git history and commit messages instead.
Avoid decorative divider comments (e.g., lines of equals signs or asterisks) for section separation
Ensure comments are placed appropriately above or adjacent to the code they describe
Write comments using proper grammar, spelling, and professional language
Prefer self-documenting code with clear variable/function names over adding comments to explain unclear code

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)

**/*.{ts,tsx,js,jsx}: Document public APIs with TSDoc/JSDoc comments including parameter descriptions, return types, examples, and thrown exceptions
Add TSDoc comments to configuration constants and environment variables explaining their source, reasoning, or constraints
Use TSDoc annotation tags (TODO, FIXME, HACK, NOTE, WARNING, PERF, SECURITY, BUG, REFACTOR, DEPRECATED) to mark special comments
Include file headers with @fileoverview, @author, @copyright, and @license tags to document file purpose and ownership
Document function parameters with @param tags, return values with @returns tags, and exceptions with @throws tags in TSDoc comments
Use @see tags in TSDoc comments to reference related functions, methods, or documentation
Include @example tags in public API documentation with code examples showing typical usage

**/*.{ts,tsx,js,jsx}: Organize imports in the following order: (1) external framework imports (React, Next.js, Mastra), (2) type imports, (3) internal imports (config, tools, utils)
Use camelCase for functions and variables
Use PascalCase for classes, types, and interfaces
Use UPPER_SNAKE_CASE for constants
Use kebab-case for file names (e.g., weather-tool.ts, user-profile.tsx)
Enforce strict equality with === instead of ==
Always use curly braces for control flow statements
Prefer arrow functions over function declarations in callbacks and higher-order functions
Use const for variables that are not reassigned; use let only when necessary
Use object shorthand syntax (e.g., { name, age } instead of { name: name, age: age })
Implement structured error handling with try-catch blocks that return error objects or throw custom errors with context
Enforce ESLint rules for strict equality, curly braces, no unused variables, and no explicit any types

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/instructions/self-explanatory-code-commenting.instructions.md)

**/*.{ts,tsx}: Document interface and type definitions with TSDoc comments explaining their purpose and usage context
Document interface properties with /** */ comments explaining each field's purpose and constraints
Document generic type parameters with @template tags explaining what each type parameter represents
Use type guards with comments explaining the runtime validation logic being performed
Document advanced/complex TypeScript types with explanatory comments about their purpose and use cases

**/*.{ts,tsx}: Use interface for public APIs and type for internal definitions
Always use explicit return types for public functions
Never use any type; use unknown or proper type definitions instead
Use type-only imports with import type for TypeScript types
Use optional chaining (?.) for nullable access
Use nullish coalescing (??) for default values

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
src/mastra/**/*

📄 CodeRabbit inference engine (src/AGENTS.md)

mastra modules can import from utils, but must not import from app or cli (except types)

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json
  • src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
  • src/mastra/config/pg-storage.ts
  • src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h}

📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)

**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h}: Use descriptive and meaningful names for variables, functions, and classes
Apply Single Responsibility Principle: each function/class does one thing well
Follow DRY (Don't Repeat Yourself): eliminate code duplication
Keep functions small and focused (ideally < 20-30 lines)
Avoid deeply nested code (max 3-4 levels)
Avoid magic numbers and strings; use named constants instead
Code should be self-documenting; use comments only when necessary
Implement proper error handling at appropriate levels with meaningful error messages
Avoid silent failures or ignored exceptions; fail fast and validate inputs early
Use appropriate error types/exceptions with meaningful context
Validate and sanitize all user inputs
Use parameterized queries for database access; never use string concatenation for SQL queries
Implement proper authentication checks before accessing resources
Verify user has permission to perform actions; implement proper authorization
Use established cryptographic libraries; never roll your own crypto implementation
Avoid N+1 query problems; use proper indexing and eager loading for database queries
Use appropriate algorithms with suitable time/space complexity for the use case
Utilize caching for expensive or repeated operations
Ensure proper cleanup of connections, files, and streams to prevent resource leaks
Implement pagination for large result sets
Load data only when needed (lazy loading pattern)
Document all public APIs with purpose, parameters, and return values
Add explanatory comments for non-obvious logic
No commented-out code or unresolved TODO comments without associated tickets should remain in commits
Ensure code follows consistent style and conventions with the rest of the codebase

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{js,ts,jsx,tsx,java,py,cs,go,rb,php,swift,kt,scala,rs,cpp,c,h,json,yaml,yml,env,config}

📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)

Never include passwords, API keys, tokens, or PII in code or logs

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json
  • package.json
  • src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
  • src/mastra/config/pg-storage.ts
  • src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{ts,tsx,java,cs,go,php,swift,kt,scala}

📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)

Prefer small, focused interfaces (Interface Segregation Principle)

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
**/*.{js,mjs,cjs,ts,tsx,jsx,py,java,cs,go,rb,php,rs,cpp,c,h,hpp}

📄 CodeRabbit inference engine (.github/instructions/update-docs-on-code-change.instructions.md)

Use automated documentation generators for code documentation - JSDoc/TSDoc for JavaScript/TypeScript, Sphinx/pdoc for Python, Javadoc for Java, xmldoc for C#, godoc for Go, rustdoc for Rust

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
src/mastra/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use maskSensitiveMessageData() helper from src/mastra/config/pg-storage.ts to mask secrets in logs

Files:

  • src/mastra/config/qdrant.ts
  • src/mastra/agents/codingAgents.ts
  • src/mastra/config/pg-storage.ts
  • src/mastra/config/lance.ts
  • src/mastra/config/upstash.ts
  • src/mastra/agents/dane.ts
src/mastra/agents/**/*.ts

📄 CodeRabbit inference engine (src/mastra/AGENTS.md)

Add agents under src/mastra/agents that wire tools together into higher-level behaviors

Agent implementations must use Agent constructor with id, name, description, instructions, model, tools, and memory

Files:

  • src/mastra/agents/codingAgents.ts
  • src/mastra/agents/dane.ts
**/{package.json,package-lock.json,yarn.lock,pom.xml,build.gradle,Pipfile,Pipfile.lock,requirements.txt,composer.json,composer.lock,pubspec.yaml,Gemfile,Gemfile.lock,go.mod,go.sum,Cargo.toml,Cargo.lock}

📄 CodeRabbit inference engine (.github/instructions/code-review-generic.instructions.md)

Check dependencies for known vulnerabilities and keep them up-to-date

Files:

  • package.json
src/mastra/config/pg-storage.ts

📄 CodeRabbit inference engine (src/mastra/config/AGENTS.md)

src/mastra/config/pg-storage.ts: PostgreSQL storage configuration must include PgVector support for embeddings, memory, and thread persistence
Database connection must use the SUPABASE environment variable with optional performance tuning via DB_MAX_CONNECTIONS, DB_IDLE_TIMEOUT, and DB_CONNECTION_TIMEOUT
Memory configuration must support MEMORY_LAST_MESSAGES, SEMANTIC_TOP_K, SEMANTIC_RANGE_BEFORE, SEMANTIC_RANGE_AFTER, and LISTS environment variables
RAG configuration must support GRAPH_THRESHOLD, GRAPH_RANDOM_WALK_STEPS, GRAPH_RESTART_PROB, PG_MIN_SCORE, and PG_EF environment variables

Files:

  • src/mastra/config/pg-storage.ts
src/mastra/config/upstash.ts

📄 CodeRabbit inference engine (src/mastra/config/AGENTS.md)

Upstash Redis configuration must support caching and session storage functionality

Files:

  • src/mastra/config/upstash.ts
🪛 Biome (2.1.2)
src/mastra/config/lance.ts

[error] 5-5: unterminated string literal

The closing quote must be on the same line.

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CodeQL analysis (javascript-typescript)
  • GitHub Check: Agent
  • GitHub Check: Sourcery review
  • GitHub Check: Codacy Security Scan
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
src/mastra/public/storage/request_queues/default/jBMmaaywUPlwSK3.json (1)

1-9: Verify that this runtime artifact should be committed to version control.

This file appears to be a request queue state entry—a runtime artifact generated during crawling/processing. Such files are typically ephemeral and not suitable for version control. Confirm whether:

  1. This is intentionally committed as a test fixture or example.
  2. Request queue files should be added to .gitignore to prevent accidental commits of runtime state.

If this is meant to remain, consider moving it to a dedicated fixtures/ or test/ directory with clear documentation on its purpose.

.github/agents/gpt-5-beast-mode.agent.md (1)

3-3: LGTM!

The tools list configuration appears correct and properly formatted.

package.json (2)

131-131: LGTM on patch version updates.

The patch version bumps for framer-motion and motion (from 12.24.11 to 12.24.12) are reasonable, low-risk updates.

Also applies to: 145-145


69-69: @mastra/qdrant is using the latest available beta version and is consistent with the project's dependency strategy.

Version 1.0.0-beta.3 is the current beta release on npm and is not deprecated. All @mastra packages in this project are beta versions (e.g., @mastra/core@^1.0.0-beta.20, @mastra/deployer@^1.0.0-beta.20, mastra@^1.0.0-beta.13), indicating this is an intentional and consistent approach to the @mastra ecosystem. No known security vulnerabilities were found.

src/mastra/public/storage/key_value_stores/default/SDK_SESSION_POOL_STATE.json (1)

1-26: No action needed—this file is intentionally tracked in version control.

The JSON files in src/mastra/public/storage/ (including SDK_SESSION_POOL_STATE.json and similar files like SDK_CRAWLER_STATISTICS_0.json) are committed to git and have no .gitignore exclusion patterns. This is consistent with them being seed data, test fixtures, or expected state files required by the SDK, not accidental runtime artifacts.

Likely an incorrect or invalid review comment.

src/mastra/config/qdrant.ts (1)

5-5: The import path change is correct and the logger module exists at the new location with the expected exports.

src/mastra/agents/dane.ts (1)

197-202: LGTM on tool and configuration updates.

The switch to webScraperTool with updated instructions and thinkingLevel: 'low' aligns with the PR objectives. The tool reference in the instructions matches the actual tool in the tools object.

Also applies to: 210-210

src/mastra/config/lance.ts (3)

52-111: Memory configuration looks comprehensive.

The lanceMemory configuration follows the same pattern as pgMemory with proper storage, vector, and embedder setup. The working memory template is well-structured for user context tracking.


289-322: Good error handling pattern with timing metrics.

The performStorageOperation wrapper provides consistent logging, timing, and error handling for storage operations. This is a solid observability pattern.
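For reference, a wrapper of roughly this shape would match that description. This is only a sketch; the actual signature in lance.ts is not shown in this review, so the parameter names here are assumptions.

import { log } from './logger'

// Sketch only: wraps a storage operation with timing and structured logging.
export async function performStorageOperation<T>(
    operationName: string,
    operation: () => Promise<T>
): Promise<T> {
    const startedAt = Date.now()
    try {
        const result = await operation()
        log.info('Storage operation completed', {
            operation: operationName,
            durationMs: Date.now() - startedAt,
        })
        return result
    } catch (error) {
        log.error('Storage operation failed', {
            operation: operationName,
            durationMs: Date.now() - startedAt,
            error: String(error),
        })
        throw error
    }
}

A caller would wrap each store or query call, e.g. performStorageOperation('upsert', () => someStoreCall()), so every operation emits the same timing and error fields.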


40-46: No issues found. The project is already configured for ES modules with "type": "module" in package.json and "module": "esnext" in tsconfig.json, which properly supports the top-level await pattern used in this file.

const userTier = requestContext.get('user-tier') ?? 'free'
const language = requestContext.get('language') ?? 'en'
const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
// const projectRoot = requestContext.get('projectRoot') ?? process.cwd()

⚠️ Potential issue | 🟡 Minor

Remove commented-out code.

Commented-out code should be removed rather than left in the codebase. Version control preserves the history if needed.

As per coding guidelines: "Avoid commenting out dead code. Use version control instead of maintaining commented code blocks."

🧹 Proposed fix
-   // const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// const projectRoot = requestContext.get('projectRoot') ?? process.cwd()
🤖 Prompt for AI Agents
In @src/mastra/agents/codingAgents.ts at line 49, Remove the dead commented-out
line containing "const projectRoot = requestContext.get('projectRoot') ??
process.cwd()" from the codingAgents module; simply delete that commented code
(the commented variable/projectRoot reference) so the file no longer contains
commented-out dead code and rely on version control if the line is ever needed
again.

Comment on lines +44 to +50
outputProcessors: [new TokenLimiterProcessor(128576),
new BatchPartsProcessor({
batchSize: 10,
maxWaitTime: 75,
emitOnNonText: true
}),
]

🧹 Nitpick | 🔵 Trivial

Inconsistent formatting in outputProcessors array.

The array formatting and indentation are inconsistent across agent definitions. Consider aligning the formatting for better readability and maintainability.

♻️ Suggested formatting
-  outputProcessors: [new TokenLimiterProcessor(128576),
-    new BatchPartsProcessor({
-          batchSize: 10,
-          maxWaitTime: 75,
-          emitOnNonText: true
-        }),
-  ]
+  outputProcessors: [
+    new TokenLimiterProcessor(128576),
+    new BatchPartsProcessor({
+      batchSize: 10,
+      maxWaitTime: 75,
+      emitOnNonText: true,
+    }),
+  ],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
outputProcessors: [new TokenLimiterProcessor(128576),
new BatchPartsProcessor({
batchSize: 10,
maxWaitTime: 75,
emitOnNonText: true
}),
]
outputProcessors: [
new TokenLimiterProcessor(128576),
new BatchPartsProcessor({
batchSize: 10,
maxWaitTime: 75,
emitOnNonText: true,
}),
],
🤖 Prompt for AI Agents
In @src/mastra/agents/dane.ts around lines 44 - 50, The outputProcessors array
has inconsistent indentation and line breaks; refactor the array so each
processor entry is on its own line with consistent indentation and commas (e.g.,
start the array on one line, place new TokenLimiterProcessor(128576) and new
BatchPartsProcessor({...}) each on their own indented lines, ensure the
BatchPartsProcessor object keys (batchSize, maxWaitTime, emitOnNonText) are
consistently indented and the closing brackets and commas line up). Locate the
outputProcessors declaration where TokenLimiterProcessor and BatchPartsProcessor
are used and apply the same spacing/indentation style as other agent definitions
for readability.

Comment on lines +225 to +231
outputProcessors: [new TokenLimiterProcessor(128576),
new BatchPartsProcessor({
batchSize: 10,
maxWaitTime: 75,
emitOnNonText: true
}),
]

🧹 Nitpick | 🔵 Trivial

Consider extracting shared outputProcessors configuration.

The same outputProcessors configuration is duplicated across all 5 agents. Consider extracting to a shared constant to improve maintainability and ensure consistency.

♻️ Extract shared configuration
// At the top of the file, after imports
const defaultOutputProcessors = [
  new TokenLimiterProcessor(128576),
  new BatchPartsProcessor({
    batchSize: 10,
    maxWaitTime: 75,
    emitOnNonText: true,
  }),
];

// Then in each agent:
outputProcessors: defaultOutputProcessors,
🤖 Prompt for AI Agents
In @src/mastra/agents/dane.ts around lines 225 - 231, Extract the duplicated
outputProcessors array into a shared constant (e.g., defaultOutputProcessors)
and replace each agent's inline outputProcessors with that constant;
specifically, define a top-level constant that contains new
TokenLimiterProcessor(128576) and new BatchPartsProcessor({ batchSize: 10,
maxWaitTime: 75, emitOnNonText: true }) and then update each agent's
outputProcessors property to reference defaultOutputProcessors instead of
repeating the array.

import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger

⚠️ Potential issue | 🔴 Critical

Critical: Unterminated string literal causes syntax error.

The import statement is missing a closing quote, which will cause a parse error and prevent the module from loading.

🐛 Fix the import statement
-import { log } from './logger
+import { log } from './logger'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { log } from './logger
import { log } from './logger'
🧰 Tools
🪛 Biome (2.1.2)

[error] 5-5: unterminated string literal

The closing quote must be on the same line.

(parse)

🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts at line 5, The import statement for log is
missing its closing quote causing a syntax error; update the import of symbol
"log" from module './logger' to a properly terminated string (e.g., add the
closing quote and end the statement) so the file parses correctly.

LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded placeholder path will cause incorrect behavior.

The path "/path/to/db" appears to be a placeholder that was not updated. This will create the vector store in an unintended location and likely cause runtime failures or data loss.

🐛 Use environment variable or LANCE_CONFIG
-const vectorStore = await LanceVectorStore.create("/path/to/db");
+const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts at line 46, The code currently calls
LanceVectorStore.create with a hardcoded placeholder path ("/path/to/db");
replace this with a configured value (e.g., read from process.env.LANCE_DB_PATH
or a LANCE_CONFIG value) and pass that variable into LanceVectorStore.create
instead; also add a short runtime check that the env/config value exists (throw
or log a clear error and exit if missing) so the vector store is not initialized
with an unintended path.
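
A minimal sketch of the env-checked variant the prompt describes; LANCE_DB_PATH is a hypothetical variable name and should be aligned with however LANCE_CONFIG actually resolves its database path.

// Hypothetical env variable; align with LANCE_CONFIG before adopting.
const lanceDbPath = process.env.LANCE_DB_PATH
if (!lanceDbPath) {
    throw new Error('LANCE_DB_PATH must be set before initializing the LanceDB vector store')
}
const vectorStore = await LanceVectorStore.create(lanceDbPath)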

Comment on lines +218 to +227
export function validateLanceFilter(
filter: LanceMetadataFilter
): LanceMetadataFilter {
// Basic validation - LanceDB has specific requirements for indexed fields
if (filter === null || typeof filter !== 'object') {
throw new Error('Filter must be a valid object')
}

return filter
}

🧹 Nitpick | 🔵 Trivial

Null check is overly permissive and may miss edge cases.

The validation only checks for null or non-object types, but doesn't validate the structure of nested filters or array elements. This could allow malformed filters to pass through.

♻️ Enhanced validation
 export function validateLanceFilter(
     filter: LanceMetadataFilter
 ): LanceMetadataFilter {
     // Basic validation - LanceDB has specific requirements for indexed fields
-    if (filter === null || typeof filter !== 'object') {
+    if (filter === null || filter === undefined || typeof filter !== 'object') {
         throw new Error('Filter must be a valid object')
     }
+
+    // Validate no empty object (common mistake)
+    if (Object.keys(filter).length === 0) {
+        throw new Error('Filter cannot be an empty object')
+    }
 
     return filter
 }
🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts around lines 218 - 227, The validateLanceFilter
function currently only rejects null/non-object values; enhance it to
recursively validate the structure of LanceMetadataFilter by ensuring top-level
entries are plain objects or allowed primitive types, arrays contain only valid
filter objects or primitives, and nested objects follow the same rules; update
validateLanceFilter to traverse keys (e.g., field names) and for each value
verify it is a string/number/boolean, an array of those, or an object that
itself passes the same validation, and throw a descriptive Error when
encountering unexpected types or empty objects/arrays so malformed nested
filters and array elements are rejected.
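
A recursive version along the lines the prompt describes could look like the sketch below; the real LanceMetadataFilter type is not visible in this diff, so the loose Record type here is an assumption.

// Sketch of stricter, recursive filter validation. The actual filter type
// in lance.ts may be narrower than this assumed Record shape.
type LanceMetadataFilter = Record<string, unknown>

function isValidFilterValue(value: unknown): boolean {
    if (['string', 'number', 'boolean'].includes(typeof value)) return true
    if (Array.isArray(value)) {
        return value.length > 0 && value.every(isValidFilterValue)
    }
    if (value !== null && typeof value === 'object') {
        const entries = Object.entries(value as Record<string, unknown>)
        return entries.length > 0 && entries.every(([, v]) => isValidFilterValue(v))
    }
    return false
}

export function validateLanceFilter(filter: LanceMetadataFilter): LanceMetadataFilter {
    if (filter === null || filter === undefined || typeof filter !== 'object') {
        throw new Error('Filter must be a valid object')
    }
    if (Object.keys(filter).length === 0) {
        throw new Error('Filter cannot be an empty object')
    }
    for (const [key, value] of Object.entries(filter)) {
        if (!isValidFilterValue(value)) {
            throw new Error(`Filter field "${key}" has an unsupported or empty value`)
        }
    }
    return filter
}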

Comment on lines +232 to +257
export async function generateEmbeddings(
texts: string[],
options: {
model?: string
dimensions?: number
} = {}
): Promise<number[][]> {
try {
const model = options.model ?? LANCE_CONFIG.embeddingModel
const { embeddings } = await embedMany({
values: texts,
model,
})

log.info('Embeddings generated successfully', {
textCount: texts.length,
embeddingDimension: embeddings[0]?.length,
model: options.model ?? 'gemini-embedding-001',
})

return embeddings
} catch (error) {
log.error('Failed to generate embeddings', { error: String(error) })
throw error
}
}

⚠️ Potential issue | 🟠 Major

Type mismatch: model parameter expects string but receives embedding model object.

The options.model has type string | undefined, but LANCE_CONFIG.embeddingModel is a TextEmbeddingModel object from the Google SDK. The fallback assignment on line 240 will produce a type error or unexpected behavior.

🐛 Fix the model type handling
 export async function generateEmbeddings(
     texts: string[],
     options: {
-        model?: string
+        model?: Parameters<typeof embedMany>[0]['model']
         dimensions?: number
     } = {}
 ): Promise<number[][]> {
     try {
-        const model = options.model ?? LANCE_CONFIG.embeddingModel
+        const model = options.model ?? LANCE_CONFIG.embeddingModel
         const { embeddings } = await embedMany({
             values: texts,
-            model,
+            model: model,
         })

If string model names are desired, use:

const model = options.model 
    ? google.textEmbedding(options.model) 
    : LANCE_CONFIG.embeddingModel
🤖 Prompt for AI Agents
In @src/mastra/config/lance.ts around lines 232 - 257, The model fallback in
generateEmbeddings assigns LANCE_CONFIG.embeddingModel (a TextEmbeddingModel) to
a variable that can also be a string, causing a type mismatch when passed to
embedMany; change the assignment so if options.model is provided (string) you
convert it to a TextEmbeddingModel (e.g., via
google.textEmbedding(options.model)) and otherwise use
LANCE_CONFIG.embeddingModel, then pass that TextEmbeddingModel to embedMany and
update the log to report the model name accordingly; references: function
generateEmbeddings, options.model, LANCE_CONFIG.embeddingModel, and the
embedMany call.
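
Putting the pieces together, the corrected fallback could look like the sketch below; LANCE_CONFIG and its embeddingModel are assumed to be in scope inside lance.ts, as in the code under review.

import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger'

export async function generateEmbeddings(
    texts: string[],
    options: { model?: string; dimensions?: number } = {}
): Promise<number[][]> {
    // Convert string model ids to an embedding model; otherwise fall back to
    // the configured model (LANCE_CONFIG is assumed to be defined in lance.ts).
    const model = options.model
        ? google.textEmbedding(options.model)
        : LANCE_CONFIG.embeddingModel
    try {
        const { embeddings } = await embedMany({ model, values: texts })
        log.info('Embeddings generated successfully', {
            textCount: texts.length,
            embeddingDimension: embeddings[0]?.length,
        })
        return embeddings
    } catch (error) {
        log.error('Failed to generate embeddings', { error: String(error) })
        throw error
    }
}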

Comment on lines +99 to +101
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
metric: 'cosine', // Distance metric for normalized embeddings
hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
ivf: { lists: 1080 }, // IVFFlat configuration

⚠️ Potential issue | 🟠 Major

Log output doesn't match actual configuration.

The configuration was changed from HNSW to IVFFlat (lines 99-101), but the log output (lines 142-145) still displays the old HNSW configuration. This creates a critical observability issue where logs will mislead debugging and monitoring efforts.

Actual config (lines 99-101):

  • type: 'ivfflat'
  • ivf: { lists: 1080 }

Log output (lines 142-145):

  • type: 'hnsw'
  • hnsw: { m: 16, efConstruction: 64 }
🔧 Proposed fix to align log output with actual configuration
   memoryOptions: {
     lastMessages: parseInt(process.env.MEMORY_LAST_MESSAGES ?? '500'),
     semanticRecall: {
       topK: parseInt(process.env.SEMANTIC_TOP_K ?? '5'),
       messageRange: {
         before: parseInt(process.env.SEMANTIC_RANGE_BEFORE ?? '3'),
         after: parseInt(process.env.SEMANTIC_RANGE_AFTER ?? '2'),
       },
       scope: 'resource',
       indexConfig: {
-        type: 'hnsw',
+        type: 'ivfflat',
         metric: 'cosine',
-        hnsw: { m: 16, efConstruction: 64 } // hnsw configuration
+        ivf: { lists: 1080 } // IVFFlat configuration
       }
     },

Also applies to: 142-145

🤖 Prompt for AI Agents
In @src/mastra/config/pg-storage.ts around lines 99 - 101, The log output still
prints the old HNSW fields while the actual index configuration uses type:
'ivfflat' and ivf: { lists: 1080 }; update the logging block that prints the
index config so it reflects the real keys and values (log type: 'ivfflat' and
the ivf.lists value) instead of the hnsw object, and remove or conditionalize
any code that prints hnsw.m / hnsw.efConstruction so logs always match the
active config (check the code that constructs/prints the index config object and
adjust the properties it reads to use ivf.lists and type).

// HNSW index configuration to support high-dimensional embeddings (>2000 dimensions)
indexConfig: {
type: 'hnsw', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)

⚠️ Potential issue | 🟡 Minor

Misleading comment: IVFFlat is not the same as "flat index".

The comment states "flat index type" but ivfflat is an inverted file index with flat quantization, not a simple flat/brute-force index. IVFFlat uses clustering for approximate nearest neighbor search, whereas a true flat index (like pgvector's vector type without an index) performs exact brute-force search.

The comment should clarify that IVFFlat is an approximate search method using inverted lists, not exact flat search.

📝 Suggested comment correction
-        type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+        type: 'ivfflat', // IVFFlat approximate index (supports dimensions > 2000, unlike HNSW's 2000 limit)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
type: 'ivfflat', // IVFFlat approximate index (supports dimensions > 2000, unlike HNSW's 2000 limit)
🤖 Prompt for AI Agents
In @src/mastra/config/pg-storage.ts at line 99, Update the misleading inline
comment that sits next to the "type: 'ivfflat'" setting: replace the phrase
"flat index type" with a note that ivfflat is an approximate inverted-file index
using flat quantization (inverted lists with clustering), not a true brute-force
flat index, so it performs approximate ANN search rather than exact brute-force
search.
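
On tuning: pgvector's general guidance is roughly rows/1000 lists for up to about 1M rows and sqrt(rows) beyond that, so the hard-coded 1080 could instead be derived from an expected row count. The sketch below is only illustrative; PG_EXPECTED_ROWS and PG_IVF_LISTS are hypothetical environment variables, not ones the project currently defines.

// Hedged sketch: derive the IVFFlat `lists` value instead of hard-coding it.
function ivfListsFor(expectedRows: number): number {
    // pgvector guidance: ~rows/1000 up to 1M rows, ~sqrt(rows) beyond that.
    const lists = expectedRows <= 1_000_000
        ? Math.ceil(expectedRows / 1000)
        : Math.ceil(Math.sqrt(expectedRows))
    return Math.max(lists, 1)
}

const expectedRows = parseInt(process.env.PG_EXPECTED_ROWS ?? '1000000')
const ivfLists = parseInt(process.env.PG_IVF_LISTS ?? String(ivfListsFor(expectedRows)))
// e.g. indexConfig: { type: 'ivfflat', metric: 'cosine', ivf: { lists: ivfLists } }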

Comment on lines +1 to +27
{
"requestsFinished": 1,
"requestsFailed": 0,
"requestsRetries": 0,
"requestsFailedPerMinute": 0,
"requestsFinishedPerMinute": 16,
"requestMinDurationMillis": 2907,
"requestMaxDurationMillis": 2907,
"requestTotalFailedDurationMillis": 0,
"requestTotalFinishedDurationMillis": 2907,
"crawlerStartedAt": "2026-01-08T14:09:03.077Z",
"crawlerFinishedAt": "2026-01-08T14:09:06.880Z",
"statsPersistedAt": "2026-01-08T14:09:06.880Z",
"crawlerRuntimeMillis": 3850,
"crawlerLastStartTimestamp": 1767881343030,
"requestRetryHistogram": [
1
],
"statsId": 0,
"requestAvgFailedDurationMillis": null,
"requestAvgFinishedDurationMillis": 2907,
"requestTotalDurationMillis": 2907,
"requestsTotal": 1,
"requestsWithStatusCode": {},
"errors": {},
"retryErrors": {}
} No newline at end of file

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check .gitignore content for patterns related to storage or SDK_CRAWLER
cat .gitignore | head -50
echo "---"
grep -E "(storage|SDK_CRAWLER|/public/)" .gitignore || echo "No storage/SDK_CRAWLER patterns found"

Repository: ssdeanx/AgentStack

Length of output: 527


🏁 Script executed:

# Check if there are multiple SDK_CRAWLER_STATISTICS files in the repo
find . -name "SDK_CRAWLER_STATISTICS_*.json" -type f

Repository: ssdeanx/AgentStack

Length of output: 144


🏁 Script executed:

# Check the directory structure for storage
ls -la src/mastra/public/storage/key_value_stores/default/ 2>/dev/null | head -20

Repository: ssdeanx/AgentStack

Length of output: 324


🏁 Script executed:

# Search for references to SDK_CRAWLER_STATISTICS or storage in code
rg "SDK_CRAWLER_STATISTICS|storage/key_value" --type ts --type js -B 2 -A 2 | head -50

Repository: ssdeanx/AgentStack

Length of output: 44


🏁 Script executed:

# Check if there are other state/storage directories similar to this one
find src/mastra -type d -name "storage" -o -name "key_value_stores"

Repository: ssdeanx/AgentStack

Length of output: 130


🏁 Script executed:

# Look for any README or documentation in the storage directory
find src/mastra/public/storage -name "README*" -o -name "*.md"

Repository: ssdeanx/AgentStack

Length of output: 44


🏁 Script executed:

# Check git status to see if these files are recently added or changed
git log --oneline -n 20 -- "src/mastra/public/storage/" 2>/dev/null || echo "No git history available"

Repository: ssdeanx/AgentStack

Length of output: 135


🏁 Script executed:

# Check what the storage directory is used for - look at parent structure
ls -la src/mastra/public/

Repository: ssdeanx/AgentStack

Length of output: 229


🏁 Script executed:

# Search for any code that might generate or use these JSON files
rg "storage|SESSION_POOL|CRAWLER" --type ts --type js src/mastra/ | head -30

Repository: ssdeanx/AgentStack

Length of output: 2973


🏁 Script executed:

# Check if public/storage should be part of build output or runtime-generated
cat src/mastra/public/.gitignore 2>/dev/null || echo "No .gitignore in public/"

Repository: ssdeanx/AgentStack

Length of output: 86


🏁 Script executed:

# Search for any references to the public/storage directory itself in configs or build files
rg "public/storage" --type ts --type js --type json --type yml --type yaml --type config

Repository: ssdeanx/AgentStack

Length of output: 89


🏁 Script executed:

# Check if these files appear in package.json scripts or build configurations
cat package.json | grep -A 10 -B 10 "storage\|crawler" || echo "No matches in package.json"

Repository: ssdeanx/AgentStack

Length of output: 88


🏁 Script executed:

# Look for any test setup or initialization that might create these files
find . -name "*.test.ts" -o -name "*.spec.ts" -o -name "setup.ts" | xargs grep -l "storage\|SDK_CRAWLER" 2>/dev/null || echo "No test files found"

Repository: ssdeanx/AgentStack

Length of output: 81


🏁 Script executed:

# Check the git history of the specific file to understand when/why it was added
git log --all --full-history -- "src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json" | head -20

Repository: ssdeanx/AgentStack

Length of output: 1303


🏁 Script executed:

# Check if there's a .gitkeep or similar file in storage directories
find src/mastra/public/storage -type f

Repository: ssdeanx/AgentStack

Length of output: 291


🏁 Script executed:

# Look for any documentation or README that explains the purpose of public/storage
find src/mastra -name "README*" | xargs cat 2>/dev/null | grep -i "storage" -A 2 -B 2 || echo "No storage documentation found"

Repository: ssdeanx/AgentStack

Length of output: 1854


Remove runtime-generated crawler and session state files from version control.

These JSON files (SDK_CRAWLER_STATISTICS_0.json, SDK_SESSION_POOL_STATE.json, and request queue files) are generated by the SDK crawler at runtime and contain ephemeral state and metrics that will differ between environments and runs. Committing them causes unnecessary merge conflicts and pollutes the repository.

Add to .gitignore:

src/mastra/public/storage/

or more specifically:

**/SDK_CRAWLER_STATISTICS_*.json
**/SDK_SESSION_POOL_STATE.json
src/mastra/public/storage/request_queues/
🤖 Prompt for AI Agents
In
@src/mastra/public/storage/key_value_stores/default/SDK_CRAWLER_STATISTICS_0.json
around lines 1 - 27, The committed JSON files like
SDK_CRAWLER_STATISTICS_0.json, SDK_SESSION_POOL_STATE.json and runtime request
queue files in src/mastra/public/storage are ephemeral and must be ignored:
update .gitignore to exclude src/mastra/public/storage/ (or add patterns
**/SDK_CRAWLER_STATISTICS_*.json, **/SDK_SESSION_POOL_STATE.json and
src/mastra/public/storage/request_queues/) and remove the already committed
files from the index (use git rm --cached <file> or git rm -r --cached
src/mastra/public/storage/) then commit the .gitignore change and the removal so
these runtime artifacts stop being tracked.

Contributor

Copilot AI left a comment

Pull request overview

This pull request aims to refactor vector storage configuration by implementing LanceDB as an alternative vector storage solution. The PR adds a new lance.ts configuration file, updates several agent configurations with new processor settings, and includes runtime-generated JSON files for crawler statistics and session management.

Key changes:

  • New LanceDB vector storage implementation in src/mastra/config/lance.ts
  • Agent configuration updates including reduced token limits and modified thinking levels
  • Addition of @mastra/qdrant package and version bumps for framer-motion and motion

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 14 comments.

Summary per file

  • src/mastra/config/lance.ts: New LanceDB configuration with vector storage, memory, and RAG tools
  • src/mastra/config/qdrant.ts: Fixed import path from ../logger to ./logger
  • src/mastra/config/pg-storage.ts: Changed index type from HNSW to IVFFlat with outdated documentation
  • src/mastra/config/upstash.ts: Removed satisfies type assertion from providerOptions
  • src/mastra/agents/dane.ts: Reduced TokenLimiter from 1MB to ~125KB, added BatchPartsProcessor, changed thinking level from high to low
  • src/mastra/agents/codingAgents.ts: Removed projectRoot context variable from instructions
  • package.json: Added @mastra/qdrant dependency, updated framer-motion and motion versions
  • package-lock.json: Lockfile updates for new dependencies
  • src/mastra/public/storage/*.json: Runtime-generated crawler and session state files
  • .github/agents/gpt-5-beast-mode.agent.md: Reordered tools list

model: googleAIFlashLite,
memory: pgMemory,
outputProcessors: [new TokenLimiterProcessor(1048576)]
outputProcessors: [new TokenLimiterProcessor(128576),
Copilot AI Jan 8, 2026

The TokenLimiterProcessor limit has been reduced from 1048576 (1MB) to 128576 (~125KB), which is approximately an 88% reduction. This significant change could cause truncation of larger responses or context. This should be documented or explained as it may affect the agent's ability to handle complex conversations or large code contexts.

Comment on lines +52 to +111
export const lanceMemory = new Memory({
storage: lanceStorage,
vector: vectorStore,
embedder: google.textEmbedding('gemini-embedding-001'),
options: {
generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
// Message management
lastMessages: parseInt(process.env.LANCE_MEMORY_LAST_MESSAGES ?? '500'),
// Advanced semantic recall with LanceDB configuration
semanticRecall: {
topK: parseInt(process.env.LANCE_SEMANTIC_TOP_K ?? '5'),
messageRange: {
before: parseInt(
process.env.LANCE_SEMANTIC_RANGE_BEFORE ?? '3'
),
after: parseInt(process.env.LANCE_SEMANTIC_RANGE_AFTER ?? '2'),
},
scope: 'resource', // 'resource' | 'thread'
// LanceDB-specific index configuration
indexConfig: {},
},
// Enhanced working memory with supported template
workingMemory: {
enabled: true,
scope: 'resource', // 'resource' | 'thread'
version: 'vnext', // Enable the improved/experimental tool
template: `# User Profile & Context
## Personal Information
- **Name**: [To be learned]
- **Role/Title**: [To be learned]
- **Organization**: [To be learned]
- **Location**: [To be learned]
- **Time Zone**: [To be learned]

## Communication Preferences
- **Preferred Communication Style**: [To be learned]
- **Response Length Preference**: [To be learned]
- **Technical Level**: [To be learned]

## Current Context
- **Active Projects**: [To be learned]
- **Current Goals**: [To be learned]
- **Recent Activities**: [To be learned]
- **Pain Points**: [To be learned]

## Long-term Memory
- **Key Achievements**: [To be learned]
- **Important Relationships**: [To be learned]
- **Recurring Patterns**: [To be learned]
- **Preferences & Habits**: [To be learned]

## Session Notes
- **Today's Focus**: [To be learned]
- **Outstanding Questions**: [To be learned]
- **Action Items**: [To be learned]
- **Follow-ups Needed**: [To be learned]
`,
},
},
})
Copilot AI Jan 8, 2026

The processors array with TokenLimiter has been removed from lanceMemory configuration. The old implementation included processors: [new TokenLimiter(1048576)] to limit message token counts. Without this processor, the memory system may not properly manage token limits, potentially leading to context overflow or API errors when dealing with large conversations.

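If the limiter is meant to stay, a minimal sketch of restoring it is below; the '@mastra/memory/processors' import path follows Mastra's documented pattern and should be verified against this project's actual imports.

// Sketch only: reattach the token-limiting memory processor.
import { TokenLimiter } from '@mastra/memory/processors' // verify import path

// Restores the previous 1M-token cap described in the comment above.
const memoryProcessors = [new TokenLimiter(1048576)]

// Then, in the Memory constructor shown above:
// processors: memoryProcessors,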
vector: vectorStore,
embedder: google.textEmbedding('gemini-embedding-001'),
options: {
generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
Copilot AI Jan 8, 2026

The generateTitle option is incorrectly placed at the root level of options. Based on the pg-storage.ts pattern and Mastra Memory API, generateTitle should be nested under the threads property. The current placement will likely be ignored or cause a type error.

Suggested change
generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
threads: {
generateTitle: process.env.LANCE_THREAD_GENERATE_TITLE !== 'false',
},

import { createVectorQueryTool, createGraphRAGTool } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { log } from './logger
Copilot AI Jan 8, 2026

Missing closing single quote on the import statement. This will cause a syntax error that prevents the module from loading.

Suggested change
import { log } from './logger
import { log } from './logger'

Comment on lines +40 to +44
export const lanceStorage = await LanceStorage.create(
LANCE_STORAGE_CONFIG.dbUri,
LANCE_STORAGE_CONFIG.storageName,
LANCE_STORAGE_CONFIG.tablePrefix
)
Copilot AI Jan 8, 2026

Top-level await is being used outside of an async context. This code will only work in ES modules with top-level await support. The initialization should be wrapped in an async function or these should be initialized lazily to avoid blocking module loading and potential runtime errors.

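One lazy alternative, sketched under the assumption that LanceStorage is imported from '@mastra/lance' and that LANCE_STORAGE_CONFIG is in scope inside lance.ts as shown above:

import { LanceStorage } from '@mastra/lance' // assumed import path; verify against lance.ts

let lanceStoragePromise: Promise<LanceStorage> | undefined

export function getLanceStorage(): Promise<LanceStorage> {
    // Created on first use instead of at module load time.
    if (!lanceStoragePromise) {
        lanceStoragePromise = LanceStorage.create(
            LANCE_STORAGE_CONFIG.dbUri,
            LANCE_STORAGE_CONFIG.storageName,
            LANCE_STORAGE_CONFIG.tablePrefix
        )
    }
    return lanceStoragePromise
}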
}> {
return messages.map((message) => ({
id: message.id,
content: message.content,
Copilot AI Jan 8, 2026

The formatStorageMessages function no longer masks sensitive data from message content. The previous version called maskSensitiveMessageData to redact passwords, secrets, tokens, and API keys before storing messages. This removal could lead to sensitive information being logged or stored in plaintext, which is a security risk.

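A minimal sketch of reinstating the masking step; maskSensitiveMessageData does exist in src/mastra/config/pg-storage.ts per the project guidelines above, but its exact signature is not shown here, so the string-in/string-out usage below is an assumption.

import { maskSensitiveMessageData } from './pg-storage' // assumed signature: (text: string) => string

interface StorageMessage {
    id: string
    content: string
}

// Trimmed sketch: only id/content appear in the quoted snippet above;
// the real return shape in lance.ts may carry more fields.
export function formatStorageMessages(messages: StorageMessage[]): Array<{ id: string; content: string }> {
    return messages.map((message) => ({
        id: message.id,
        // Mask secrets/tokens in the message text before it is stored or logged.
        content: maskSensitiveMessageData(message.content),
    }))
}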
Comment on lines +2 to +26
"requestsFinished": 1,
"requestsFailed": 0,
"requestsRetries": 0,
"requestsFailedPerMinute": 0,
"requestsFinishedPerMinute": 16,
"requestMinDurationMillis": 2907,
"requestMaxDurationMillis": 2907,
"requestTotalFailedDurationMillis": 0,
"requestTotalFinishedDurationMillis": 2907,
"crawlerStartedAt": "2026-01-08T14:09:03.077Z",
"crawlerFinishedAt": "2026-01-08T14:09:06.880Z",
"statsPersistedAt": "2026-01-08T14:09:06.880Z",
"crawlerRuntimeMillis": 3850,
"crawlerLastStartTimestamp": 1767881343030,
"requestRetryHistogram": [
1
],
"statsId": 0,
"requestAvgFailedDurationMillis": null,
"requestAvgFinishedDurationMillis": 2907,
"requestTotalDurationMillis": 2907,
"requestsTotal": 1,
"requestsWithStatusCode": {},
"errors": {},
"retryErrors": {}
Copilot AI Jan 8, 2026

This appears to be runtime-generated crawler statistics that should not be committed to version control. Statistics files contain runtime-specific metrics and timestamps ("2026-01-08") that are ephemeral and should be regenerated for each environment. This file should be added to .gitignore.

Suggested change
"requestsFinished": 1,
"requestsFailed": 0,
"requestsRetries": 0,
"requestsFailedPerMinute": 0,
"requestsFinishedPerMinute": 16,
"requestMinDurationMillis": 2907,
"requestMaxDurationMillis": 2907,
"requestTotalFailedDurationMillis": 0,
"requestTotalFinishedDurationMillis": 2907,
"crawlerStartedAt": "2026-01-08T14:09:03.077Z",
"crawlerFinishedAt": "2026-01-08T14:09:06.880Z",
"statsPersistedAt": "2026-01-08T14:09:06.880Z",
"crawlerRuntimeMillis": 3850,
"crawlerLastStartTimestamp": 1767881343030,
"requestRetryHistogram": [
1
],
"statsId": 0,
"requestAvgFailedDurationMillis": null,
"requestAvgFinishedDurationMillis": 2907,
"requestTotalDurationMillis": 2907,
"requestsTotal": 1,
"requestsWithStatusCode": {},
"errors": {},
"retryErrors": {}
"note": "This file is a placeholder. Crawler statistics are generated at runtime and should not be committed to version control.",
"example": {
"requestsFinished": 0,
"requestsFailed": 0,
"requestsRetries": 0,
"requestsFailedPerMinute": 0,
"requestsFinishedPerMinute": 0,
"requestMinDurationMillis": 0,
"requestMaxDurationMillis": 0,
"requestTotalFailedDurationMillis": 0,
"requestTotalFinishedDurationMillis": 0,
"crawlerStartedAt": null,
"crawlerFinishedAt": null,
"statsPersistedAt": null,
"crawlerRuntimeMillis": 0,
"crawlerLastStartTimestamp": 0,
"requestRetryHistogram": [],
"statsId": 0,
"requestAvgFailedDurationMillis": null,
"requestAvgFinishedDurationMillis": 0,
"requestTotalDurationMillis": 0,
"requestsTotal": 0,
"requestsWithStatusCode": {},
"errors": {},
"retryErrors": {}
}

Comment on lines +99 to +101
type: 'ivfflat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
metric: 'cosine', // Distance metric for normalized embeddings
hnsw: { m: parseFloat(process.env.PG_M ?? '32'), efConstruction: parseInt(process.env.PG_EF ?? '96') }, // hnsw configuration
ivf: { lists: 1080 }, // IVFFlat configuration
Copilot AI Jan 8, 2026

The index configuration has been changed from HNSW to IVFFlat. However, the comment on line 99 still incorrectly states "flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)". IVFFlat is not the same as "flat" - it's an inverted file index. The comment should be updated to accurately describe IVFFlat's characteristics and why this change was made.

"@mastra/observability": "^1.0.0-beta.10",
"@mastra/otel-exporter": "^1.0.0-beta.11",
"@mastra/pg": "^1.0.0-beta.12",
"@mastra/qdrant": "^1.0.0-beta.3",
Copilot AI Jan 8, 2026

The @mastra/qdrant package is being added to dependencies and qdrant.ts is kept (with only an import path fix), which may be intentional. However, the PR title says "refactor vector storage configuration and implement LanceDB" and the description mentions "Removed the Qdrant vector configuration file", which conflicts with keeping qdrant.ts and adding the @mastra/qdrant dependency. This inconsistency should be clarified.

Suggested change
"@mastra/qdrant": "^1.0.0-beta.3",

LANCE_STORAGE_CONFIG.tablePrefix
)

const vectorStore = await LanceVectorStore.create("/path/to/db");
Copilot AI Jan 8, 2026

Hardcoded path "/path/to/db" should be replaced with the configured LANCE_CONFIG.dbPath. This appears to be placeholder code that was not updated with the actual configuration value, which will cause the vector store to use the wrong database location.

Suggested change
const vectorStore = await LanceVectorStore.create("/path/to/db");
const vectorStore = await LanceVectorStore.create(LANCE_CONFIG.dbPath);
