feat: Update configuration and improve PgVector support by ssdeanx · Pull Request #21 · ssdeanx/AgentStack

ssdeanx · 2025-12-08T15:03:59Z

Enhanced .env.example with detailed PgVector HNSW index configuration options.
Removed deprecated GitHub Copilot integration code from copilot.ts.
Updated pg-storage.ts to reflect changes in PgVector dimensions and improved logging for PostgreSQL Store initialization.
Changed API route path from "/api/researchAgent" to "/custom/researchAgent" in index.ts.

Summary by Sourcery

Align PostgreSQL vector storage and routing configuration with updated PgVector and embedding model settings while removing obsolete Copilot integration.

New Features:

Expose IVF list count and graph-related tuning parameters for PgVector via environment configuration.
Update the research agent chat API route to use a custom path under /custom/researchAgent.

Enhancements:

Adjust PgVector configuration, dimensions, and logging to match the gemini-embedding-001 3072-dimension model and improve observability of index initialization.

Documentation:

Expand .env.example with detailed PgVector HNSW and IVF-related configuration options for PostgreSQL vector indexing.

Chores:

Remove deprecated GitHub Copilot integration module and related configuration from the codebase.

sourcery-ai · 2025-12-08T15:04:09Z

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Updates PgVector-related configuration and logging, makes IVF list counts configurable via environment variables, adjusts dimensions to 3072 for gemini-embedding-001, removes deprecated Copilot integration, and changes the research agent API route path.

Class diagram for updated PgVector and memory configuration

classDiagram
    class PgVector {
        +string connectionString
        +string tenantId
        +string tableName
        +string column
        +string idColumn
        +string metadataColumn
        +string dimensions
        +string indexName
        +init()
    }

    class PgVectorIndexConfig {
        +string type
        +string metric
        +PgVectorIvfConfig ivf
    }

    class PgVectorIvfConfig {
        +number lists
        %% now configurable via env LISTS, default 3072
    }

    class Memory {
        +MemoryIndexConfig longTermMemory
        +MemoryIndexConfig workingMemory
    }

    class MemoryIndexConfig {
        +string name
        +PgVectorIndexConfig indexConfig
    }

    class GraphRAGTool {
        +string vectorStoreName
        +string indexName
        +string model
        +GraphOptions graphOptions
    }

    class GraphOptions {
        +number dimension
        %% 3072 for gemini-embedding-001
        +number threshold
        +number randomWalkSteps
        +number restartProb
        +number topK
    }

    class EnvironmentConfig {
        +string SUPABASE
        +string MASTRA_PG_CONNECTION_STRING
        +string TENANT_ID
        +string LISTS
        %% used to compute ivf.lists
        +string GRAPH_THRESHOLD
        +string GRAPH_RANDOM_WALK_STEPS
        +string GRAPH_RESTART_PROB
        +string GRAPH_TOP_K
    }

    PgVector "1" --> "1" PgVectorIndexConfig : uses
    PgVectorIndexConfig "1" --> "1" PgVectorIvfConfig : has

    Memory "1" --> "2" MemoryIndexConfig : configures
    MemoryIndexConfig "1" --> "1" PgVectorIndexConfig : uses

    GraphRAGTool "1" --> "1" GraphOptions : configured_with

    EnvironmentConfig ..> PgVectorIvfConfig : configures_lists
    EnvironmentConfig ..> GraphOptions : configures_thresholds

File-Level Changes

Change	Details	Files
Improve PgVector store initialization visibility and align configuration with 3072‑dimensional gemini-embedding-001 embeddings.	Log all discovered indexes to the console in addition to structured logging when initializing the PostgresStore. Update PgVector configuration comment to reference 3072-dimensional gemini-embedding-001 embeddings instead of the previous 1568-dimensional configuration. Ensure graph RAG tool comments and options consistently reference 3072 dimensions for gemini-embedding-001.	`src/mastra/config/pg-storage.ts`
Make IVF list configuration for flat PgVector indexes environment-driven and better documented.	Replace the hard-coded ivf.lists value of 4000 with parseInt(process.env.LISTS ?? '3072') in the longTermMemory index configuration. Replace the hard-coded ivf.lists value of 4000 with parseInt(process.env.LISTS ?? '3072') in the memory index configuration. Add inline comments describing the IVF configuration intent and the list count tunable via the LISTS env variable.	`src/mastra/config/pg-storage.ts`
Expose environment-level PgVector and HNSW/IVF tuning options in the example env file.	Document PgVector-related configuration variables for HNSW and IVF index options (e.g., dimension, M, efConstruction, efSearch, and IVF list count). Align example values and comments with gemini-embedding-001 3072-dimensional embeddings and the new LISTS env variable.	`.env.example`
Remove deprecated GitHub Copilot integration.	Delete the copilot.ts configuration/integration file that is no longer supported or used.	`src/mastra/config/copilot.ts`
Change the research agent HTTP endpoint to a custom route prefix.	Update the Mastra chatRoute path for the researchAgent from /api/researchAgent to /custom/researchAgent, leaving behavior and options otherwise unchanged.	`src/mastra/index.ts`

github-actions · 2025-12-08T15:04:11Z

🤖 Hi @ssdeanx, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

coderabbitai · 2025-12-08T15:04:12Z

Caution

Review failed

The pull request is closed.

Summary by CodeRabbit

Chores
- Removed GitHub Copilot integration from the system.
- Updated vector search configuration with adjusted embedding dimensions and retrieval parameters for improved performance.
- Introduced configurable LISTS parameter for better embedding management.
- Relocated research agent API endpoint path.

Walkthrough

Configuration updates to vector embeddings and RAG parameters (SEMANTIC_TOP_K, PG_MIN_SCORE, PG_EF, dimension 1568→3072), introduction of IVF list and HNSW parameters, complete removal of GitHub Copilot integration module, and a researchAgent API route path adjustment from /api/researchAgent to /custom/researchAgent.

Changes

Cohort / File(s)	Change Summary
Vector Database & Embedding Configuration `.env.example`, `src/mastra/config/pg-storage.ts`	Updated RAG defaults and vector parameters; shifted embedding dimension from 1568 to 3072; made IVF list count configurable via LISTS env var (default 3072); adjusted SEMANTIC_TOP_K (5→4), PG_MIN_SCORE (0.7→0.65), PG_EF (100→79); added GRAPH_THRESHOLD and GRAPH_RANDOM_WALK_STEPS; added debug logging for index retrieval.
GitHub Copilot Integration Removal `src/mastra/config/copilot.ts`	Removed all GitHub Copilot integration code including imports, model instantiations (gptAI, gptCodex, grokAI, raptorAI), and githubCopilot instance export.
API Route Updates `src/mastra/index.ts`	Changed researchAgent chat route path from `/api/researchAgent` to `/custom/researchAgent`.
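
The retrieval defaults listed in this walkthrough could be read and range-checked in one place. The following is only a sketch: `RetrievalConfig`, `readNumber`, `loadRetrievalConfig`, and the min/max bounds are hypothetical names and assumptions, not code from this PR; the fallback values are the ones quoted in the table above.

```typescript
// Hypothetical helper, not part of the PR: reads the retrieval-related
// environment variables named in the walkthrough and applies the defaults
// quoted there. The min/max bounds are assumptions for illustration.
interface RetrievalConfig {
  semanticTopK: number;
  pgMinScore: number;
  pgEf: number;
  graphThreshold: number;
  graphRandomWalkSteps: number;
}

type Env = Record<string, string | undefined>;

// Returns the fallback when the value is missing, blank, non-numeric, or out of range.
function readNumber(env: Env, key: string, fallback: number, min: number, max: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value >= min && value <= max ? value : fallback;
}

function loadRetrievalConfig(env: Env): RetrievalConfig {
  return {
    semanticTopK: Math.trunc(readNumber(env, "SEMANTIC_TOP_K", 4, 1, 1000)),
    pgMinScore: readNumber(env, "PG_MIN_SCORE", 0.65, 0, 1),
    pgEf: Math.trunc(readNumber(env, "PG_EF", 79, 1, 10000)),
    graphThreshold: readNumber(env, "GRAPH_THRESHOLD", 0.7, 0, 1),
    graphRandomWalkSteps: Math.trunc(readNumber(env, "GRAPH_RANDOM_WALK_STEPS", 10, 1, 10000)),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadRetrievalConfig(process.env)`, so that an invalid value falls back to a known default instead of reaching the vector store.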

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Verify no downstream code or clients depend on the removed copilot exports (gptAI, gptCodex, grokAI, raptorAI, githubCopilot)
Confirm dimension changes (1568→3072) and LISTS parameter are correctly applied across pg-storage.ts configuration
Check that LISTS environment variable configuration is properly documented and available in deployment pipelines
Validate that the researchAgent route path change is coordinated with any client or documentation updates

Poem

🐰 Dimensions now dance at 3072's call,
Configuration blooms, embeddings enthrall,
Copilot departs with a graceful goodbye,
Routes are redirected 'neath the autumn sky,
Vectors and graphs in harmony soar! ✨

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a46516c and 960655d.

📒 Files selected for processing (4)

.env.example (1 hunks)
src/mastra/config/copilot.ts (0 hunks)
src/mastra/config/pg-storage.ts (4 hunks)
src/mastra/index.ts (1 hunks)

gemini-code-assist · 2025-12-08T15:04:20Z

Summary of Changes

Hello @ssdeanx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on refining the application's configuration for vector embeddings and cleaning up deprecated code. It introduces more granular control over PgVector HNSW index settings, ensuring better performance and flexibility for high-dimensional embeddings. Additionally, it removes an outdated GitHub Copilot integration and adjusts an API route for improved consistency.

Highlights

Enhanced PgVector Configuration: Detailed HNSW index configuration options have been added to .env.example, including LISTS, PG_EF, PG_MIN_SCORE, GRAPH_THRESHOLD, GRAPH_RANDOM_WALK_STEPS, and SEMANTIC_TOP_K for improved high-dimensional embedding support.
Deprecated Copilot Integration Removed: The src/mastra/config/copilot.ts file, which contained GitHub Copilot integration code, has been entirely removed.
PgVector Dimension Updates: pg-storage.ts has been updated to reflect PgVector configuration for 3072-dimension embeddings (specifically for gemini-embedding-001) and now dynamically uses the LISTS environment variable for IVF configuration.
API Route Path Change: The /api/researchAgent route in index.ts has been updated to /custom/researchAgent for better organization.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Instead of using console.log(allIndexes) with an ESLint disable, consider routing this through the existing log utility or behind a debug flag so it can be controlled per environment and avoids bypassing lint rules.
The parseInt(process.env.LISTS ?? '3072') calls would be safer and clearer with an explicit radix (e.g., parseInt(..., 10)) or by using Number() plus a NaN fallback to ensure invalid LISTS values don't silently propagate.

sourcery-ai · 2025-12-08T15:05:06Z

src/mastra/config/pg-storage.ts

+                ivf: {lists: parseInt(process.env.LISTS ?? '3072')}, // IVF configuration for flat index
                }
        },
        // Enhanced working memory with supported template


issue (bug_risk): Handle invalid LISTS env values and specify a radix for parseInt to avoid subtle bugs.

If process.env.LISTS is set to a non-integer, parseInt will return NaN, which may propagate into the IVF config and cause runtime errors. Please (1) pass an explicit radix (e.g., parseInt(..., 10)) and (2) fall back to the default (3072) when parsing fails, e.g. via a small helper that validates and normalizes this value once.
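
One way to follow this suggestion is a small helper that validates the value once and falls back to the default. This is a sketch only: `parseIvfLists` and `DEFAULT_IVF_LISTS` are hypothetical names, and rejecting partially numeric strings such as `12abc` is a design assumption rather than something the review asks for.

```typescript
// Hypothetical helper, not part of the PR. The default of 3072 is the value
// used in the diff under review.
const DEFAULT_IVF_LISTS = 3072;

function parseIvfLists(raw: string | undefined, fallback: number = DEFAULT_IVF_LISTS): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();
  // Accept only plain decimal digits so inputs like "12abc" or "1e3" are rejected
  // instead of being partially parsed.
  if (!/^\d+$/.test(trimmed)) return fallback;
  const parsed = Number.parseInt(trimmed, 10); // explicit radix, as requested in the review
  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

The index configuration could then read `ivf: { lists: parseIvfLists(process.env.LISTS) }`, which keeps the parsing rule in a single place for both memory index configurations.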

github-actions · 2025-12-08T15:05:08Z

🤖 I'm sorry @ssdeanx, but I was unable to process your request. Please see the logs for more details.

gemini-code-assist

Code Review

This pull request updates configuration for PgVector, cleans up deprecated code, and adjusts an API route. The new configurations in .env.example are a good addition, but the comments could be improved for clarity and formatting. The changes in pg-storage.ts correctly reflect the new embedding dimensions and make the setup more flexible. The removal of the Copilot integration and the API route change are straightforward. I've added a few comments to suggest improvements to the configuration file and to clean up a leftover debug log.

gemini-code-assist · 2025-12-08T15:05:56Z

.env.example

+LISTS='3072'                # IVF lists for PgVector (adjust based on dataset size, default: 3072 for gemini-embedding-001 1-4k limits, this so can use 3072 dims)
+#PG_HNSW_M=16                      # HNSW connections per layer (default: 16, higher = better recall, more memory)
+#PG_HNSW_EF_CONSTRUCTION=64        # HNSW build-time candidates (default: 64, higher = better quality, slower build)
+PG_EF='79'                        # HNSW query-time candidates (default: 100, higher = better recall, slower queries)
+PG_MIN_SCORE='0.65'                  # Minimum similarity score for vector search results  (default: 0.7, lower = more results, less relevant)
+# Graph-based Retrieval Configuration
+GRAPH_THRESHOLD='0.7'                # Graph similarity threshold for random walk connections (0.0 - 1.0 range high to low)
+GRAPH_RANDOM_WALK_STEPS='10'      # Number of random walk steps for graph-based retrieval  (default: 10, higher = more exploration, slower)
+SEMANTIC_TOP_K='4'                 # Number of top semantic results to retrieve  (default: 5 higher = more results, slower)


The comments in this new configuration block have some formatting and clarity issues that affect readability:

Inconsistent Alignment: The inline comments are not aligned, making the file harder to scan.

Unclear LISTS comment (line 53): The phrase "this so can use 3072 dims" is grammatically incorrect and unclear.

Ambiguous GRAPH_THRESHOLD comment (line 59): The range "(0.0 - 1.0 range high to low)" is ambiguous. It should state whether a higher value means higher similarity.

Improving these will make the configuration easier to understand for other developers. For example, the comment for GRAPH_THRESHOLD could be changed to "# Graph similarity threshold for random walk connections (0.0 to 1.0). Higher values mean stricter connections."

gemini-code-assist · 2025-12-08T15:05:56Z

src/mastra/config/pg-storage.ts

+// eslint-disable-next-line no-console
 console.log(allIndexes);


This console.log appears to be for debugging, especially since a log.info call with the same information follows. It's best practice to use the established logger for all output and remove temporary console.log statements before merging. The eslint-disable comment suggests this was a temporary addition that should now be cleaned up.
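
A possible shape for this cleanup is to send the index list through the logger and gate the verbose output behind a flag. The sketch below assumes a minimal `Logger` interface and a `string[]` index list; the project's actual `log` utility and the type of `allIndexes` may differ, and `reportIndexes` is a hypothetical name.

```typescript
// Minimal logger shape assumed for illustration; the project's `log` utility may differ.
interface Logger {
  info(message: string, meta?: Record<string, unknown>): void;
  debug(message: string, meta?: Record<string, unknown>): void;
}

// Hypothetical replacement for the console.log call: always records a short
// summary, and emits the full index list only when debugging is enabled.
function reportIndexes(log: Logger, allIndexes: readonly string[], debugEnabled: boolean): void {
  log.info("PgVector indexes discovered", { count: allIndexes.length });
  if (debugEnabled) {
    log.debug("PgVector index names", { indexes: [...allIndexes] });
  }
}
```

With this approach no `eslint-disable` comment is needed, and the detailed output can be switched per environment.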

greptile-apps · 2025-12-08T15:06:19Z

Greptile Overview

Greptile Summary

This PR updates PgVector configuration to support 3072-dimension embeddings for gemini-embedding-001 and removes deprecated GitHub Copilot integration code. The changes improve the vector storage setup with better documentation and configurable parameters.

Key Changes:

Enhanced .env.example with detailed PgVector HNSW index configuration options and comments
Removed unused GitHub Copilot integration code from copilot.ts (verified no imports exist)
Updated pg-storage.ts to use 3072 dimensions (was 1568) for gemini-embedding-001 embeddings
Made IVF lists configurable via LISTS environment variable
Changed API route from /api/researchAgent to /custom/researchAgent (breaking change)

Considerations:

The route change is a breaking change - ensure all API clients are updated
The dimension update aligns with gemini-embedding-001 capabilities (supports 128-3072 dimensions)
There's an inconsistency in src/mastra/config/upstash.ts:124 where a comment says "1568 dimensions" but the code uses 3072 (outside scope of this PR)

Confidence Score: 4/5

This PR is mostly safe to merge with attention to the API route breaking change
Score reflects solid configuration improvements and cleanup, but reduced by 1 point due to the breaking API route change that requires coordination with API clients
Pay close attention to src/mastra/index.ts for the API route change - ensure dependent services are updated before deployment

Important Files Changed

File Analysis

Filename	Score	Overview
.env.example	5/5	Enhanced PgVector HNSW index configuration with detailed comments explaining dimension choices and parameter tuning
src/mastra/config/copilot.ts	5/5	Removed deprecated GitHub Copilot integration code (unused in codebase)
src/mastra/config/pg-storage.ts	4/5	Updated embeddings from 1568 to 3072 dimensions for gemini-embedding-001, made LISTS configurable via env var
src/mastra/index.ts	3/5	Changed API route from `/api/researchAgent` to `/custom/researchAgent` - breaking change for API clients

Sequence Diagram

sequenceDiagram
    participant Client as API Client
    participant Route as Mastra Route Handler
    participant Agent as Research Agent
    participant Memory as PgMemory
    participant Vector as PgVector (3072D)
    participant DB as PostgreSQL

    Note over Client,Route: Breaking Change: /api/researchAgent → /custom/researchAgent
    
    Client->>Route: POST /custom/researchAgent
    Route->>Agent: Execute research task
    Agent->>Memory: Retrieve semantic context
    Memory->>Vector: Query with topK=4
    Vector->>DB: HNSW search (EF=79, LISTS=3072)
    DB-->>Vector: Return vectors (3072 dims)
    Vector-->>Memory: Return results (min_score=0.65)
    Memory-->>Agent: Context retrieved
    Agent->>Memory: Store new embeddings
    Memory->>Vector: Upsert vectors
    Vector->>DB: Store with IVF index
    DB-->>Vector: Success
    Vector-->>Memory: Success
    Memory-->>Agent: Success
    Agent-->>Route: Response with context
    Route-->>Client: JSON response

greptile-apps

4 files reviewed, 2 comments

greptile-apps · 2025-12-08T15:06:18Z

.env.example

+
+# PgVector HNSW Index Configuration (for high-dimensional embeddings)
+# HNSW index type supports dimensions > 2000 (IVFFlat is limited to 2000)
+LISTS='3072'                # IVF lists for PgVector (adjust based on dataset size, default: 3072 for gemini-embedding-001 1-4k limits, this so can use 3072 dims)


style: Comment says "1-4k limits, this so can use 3072 dims" - wording is unclear. Consider: "adjust based on dataset size, default: 3072 to match gemini-embedding-001 dimension range (1-4k supported)"

greptile-apps · 2025-12-08T15:06:19Z

src/mastra/index.ts

      }),
      chatRoute({
-        path: "/api/researchAgent",
+        path: "/custom/researchAgent",


style: API route changed from /api/researchAgent to /custom/researchAgent - breaking change for clients.
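
If existing clients cannot all be updated at once, one option is to keep a small mapping from the old path to the new one and flag legacy requests. This is a framework-agnostic sketch: `LEGACY_ROUTES` and `resolveRoute` are hypothetical names, and how the result would be wired into the server (redirect, rewrite, or warning log) is left open because the PR does not show it.

```typescript
// Hypothetical compatibility table: old path -> new path, taken from this diff.
const LEGACY_ROUTES = new Map<string, string>([
  ["/api/researchAgent", "/custom/researchAgent"],
]);

interface ResolvedRoute {
  path: string;        // path that should actually be served
  deprecated: boolean; // true when the caller used a legacy path
}

// Maps a requested path to its current equivalent and reports whether the
// caller still uses the old path.
function resolveRoute(requestPath: string): ResolvedRoute {
  const replacement = LEGACY_ROUTES.get(requestPath);
  return replacement === undefined
    ? { path: requestPath, deprecated: false }
    : { path: replacement, deprecated: true };
}
```

A caller could log a deprecation warning whenever `deprecated` is true and remove the mapping once no legacy traffic remains.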

Copilot

Pull request overview

This pull request updates the PgVector configuration and cleans up deprecated code. The main changes focus on updating embedding dimensions from 1568 to 3072 and adjusting IVF configuration parameters, removing unused GitHub Copilot integration code, and changing a custom API route path. However, there are several documentation inconsistencies that need attention, particularly around index type descriptions and embedding model specifications.

Key Changes

PgVector configuration updates: Changed embedding dimensions from 1568 to 3072, with corresponding updates to IVF list counts and configuration comments
Deprecated code removal: Removed GitHub Copilot integration from copilot.ts (no remaining references in codebase)
API route restructure: Changed research agent route from /api/researchAgent to /custom/researchAgent

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.

File	Description
.env.example	Added detailed PgVector configuration documentation with IVF and HNSW parameters, though contains several documentation inaccuracies about index types
src/mastra/config/copilot.ts	Completely removed deprecated GitHub Copilot OpenAI compatible integration code
src/mastra/config/pg-storage.ts	Updated dimension references from 1568 to 3072, adjusted IVF lists configuration to use environment variable, updated related comments
src/mastra/index.ts	Changed research agent chat route path from /api/ to /custom/ prefix for better organization

Copilot · 2025-12-08T15:10:07Z

src/mastra/config/pg-storage.ts

    indexName: 'memory_messages_3072',
    model: google.textEmbedding('gemini-embedding-001'),
-    // Supported graph options (updated for 1568 dimensions)
+    // Supported graph options (updated for 3072 dimensions)


The comment was updated from "1568 dimensions" to "3072 dimensions". Note that gemini-embedding-001 produces 3072-dimensional embeddings by default and supports reduced output sizes from 128 to 3072, so 3072 matches the model only if no smaller output dimensionality is requested elsewhere. Please verify that the model and dimension configuration are aligned.
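
Alignment between the model output and the configured dimension can also be checked at runtime rather than only in comments. The following sketch assumes embeddings arrive as plain number arrays; `assertEmbeddingDimension` and `EXPECTED_DIMENSION` are hypothetical names, not part of the PR.

```typescript
// Dimension configured for the index in this PR (graphOptions.dimension).
const EXPECTED_DIMENSION = 3072;

// Hypothetical guard: throws when an embedding does not match the dimension
// the vector index was created with.
function assertEmbeddingDimension(
  embedding: readonly number[],
  expected: number = EXPECTED_DIMENSION,
): void {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index expects ${expected}`,
    );
  }
}
```

Running such a check on the first embedding produced at startup would surface a model or configuration mismatch before any vectors are written.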

Copilot · 2025-12-08T15:10:07Z

src/mastra/config/pg-storage.ts

+    // Supported graph options (updated for 3072 dimensions)
    graphOptions: {
-        dimension: 3072, // gemini-embedding-001 dimension (1568)
+        dimension: 3072, // gemini-embedding-001 dimension (3072)


The inline comment "(3072)" duplicates information from the preceding comment on line 178. Since line 178 already states "updated for 3072 dimensions", this inline repetition is redundant and should be removed for cleaner code.

Suggested change

-        dimension: 3072, // gemini-embedding-001 dimension (3072)
+        dimension: 3072, // gemini-embedding-001 dimension

Copilot · 2025-12-08T15:10:08Z

.env.example

+
+# PgVector HNSW Index Configuration (for high-dimensional embeddings)
+# HNSW index type supports dimensions > 2000 (IVFFlat is limited to 2000)
+LISTS='3072'                # IVF lists for PgVector (adjust based on dataset size, default: 3072 for gemini-embedding-001 1-4k limits, this so can use 3072 dims)


This comment is confusing and unclear. The phrase "default: 3072 for gemini-embedding-001 1-4k limits, this so can use 3072 dims" is grammatically incorrect and difficult to parse. Consider revising to something clearer like: "IVF lists for flat index (default: 3072, adjust based on dataset size. Higher values work better for larger datasets)"

Suggested change

-LISTS='3072' # IVF lists for PgVector (adjust based on dataset size, default: 3072 for gemini-embedding-001 1-4k limits, this so can use 3072 dims)
+LISTS='3072' # Number of IVF lists for PgVector (default: 3072). Adjust based on dataset size; higher values work better for larger datasets and high-dimensional embeddings.
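
For "adjust based on dataset size", the pgvector README suggests starting with rows / 1000 lists for tables of up to 1M rows and sqrt(rows) for larger tables. The sketch below encodes that rule of thumb; `suggestIvfLists` is a hypothetical helper, and whether the heuristic carries over to the index type configured in this project is an assumption.

```typescript
// Hypothetical helper encoding the pgvector README starting point for IVFFlat:
// rows / 1000 for up to 1M rows, sqrt(rows) above that. The result is only a
// first guess to be tuned against measured recall and query latency.
function suggestIvfLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount <= 0) return 1;
  const suggestion = rowCount <= 1_000_000 ? rowCount / 1000 : Math.sqrt(rowCount);
  return Math.max(1, Math.round(suggestion));
}
```

Under this heuristic, the default of 3072 lists would correspond to a table of roughly three million rows or more, so smaller deployments may want a lower LISTS value.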

Copilot · 2025-12-08T15:10:08Z

src/mastra/config/pg-storage.ts

@@ -105,7 +106,7 @@ export const pgMemory = new Memory({
            indexConfig: {
                type: 'flat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)


The comment states "flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)" but this is misleading in the context of the actual configuration. The flat index is being configured with IVF (see line 109), which is typically called IVFFlat in pgvector documentation. While technically accurate that flat-based indexes support higher dimensions than HNSW, the comment could be clearer. Consider: "flat index type with IVF (supports dimensions > 2000, unlike HNSW which is limited to 2000)"

Suggested change

-                type: 'flat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
+                type: 'flat', // flat index type with IVF (supports dimensions > 2000, unlike HNSW which is limited to 2000)

Copilot · 2025-12-08T15:10:08Z

.env.example

+#PG_HNSW_M=16                      # HNSW connections per layer (default: 16, higher = better recall, more memory)
+#PG_HNSW_EF_CONSTRUCTION=64        # HNSW build-time candidates (default: 64, higher = better quality, slower build)


These commented-out HNSW configuration parameters (PG_HNSW_M and PG_HNSW_EF_CONSTRUCTION) are misleading in this context. The actual implementation in pg-storage.ts uses a 'flat' index type with IVF, not HNSW, because HNSW is limited to 2000 dimensions. Since the embedding dimension is 3072, these HNSW-specific parameters are not applicable and should either be removed or moved to a different section with a clear note that they're only for HNSW indexes with dimensions ≤ 2000.

Suggested change

-#PG_HNSW_M=16 # HNSW connections per layer (default: 16, higher = better recall, more memory)
-#PG_HNSW_EF_CONSTRUCTION=64 # HNSW build-time candidates (default: 64, higher = better quality, slower build)

Copilot · 2025-12-08T15:10:09Z

.env.example

+PG_EF='79'                        # HNSW query-time candidates (default: 100, higher = better recall, slower queries)
+PG_MIN_SCORE='0.65'                  # Minimum similarity score for vector search results  (default: 0.7, lower = more results, less relevant)
+# Graph-based Retrieval Configuration
+GRAPH_THRESHOLD='0.7'                # Graph similarity threshold for random walk connections (0.0 - 1.0 range high to low)


The phrase "0.0 - 1.0 range high to low" is grammatically unclear. It should be clarified to indicate what high and low values mean, such as "0.0 - 1.0 range (higher = stricter)" or "0.0 - 1.0 range (lower = more connections)".

Suggested change

-GRAPH_THRESHOLD='0.7' # Graph similarity threshold for random walk connections (0.0 - 1.0 range high to low)
+GRAPH_THRESHOLD='0.7' # Graph similarity threshold for random walk connections (0.0 - 1.0 range, higher = stricter, fewer connections)

Copilot · 2025-12-08T15:10:09Z

.env.example

+# Graph-based Retrieval Configuration
+GRAPH_THRESHOLD='0.7'                # Graph similarity threshold for random walk connections (0.0 - 1.0 range high to low)
+GRAPH_RANDOM_WALK_STEPS='10'      # Number of random walk steps for graph-based retrieval  (default: 10, higher = more exploration, slower)
+SEMANTIC_TOP_K='4'                 # Number of top semantic results to retrieve  (default: 5 higher = more results, slower)


Missing comma after "default: 5" for consistency with other comments. Should be "(default: 5, higher = more results, slower)" to match the style of other configuration comments in this file.

Suggested change

SEMANTIC_TOP_K='4' # Number of top semantic results to retrieve (default: 5 higher = more results, slower)

SEMANTIC_TOP_K='4' # Number of top semantic results to retrieve (default: 5, higher = more results, slower)

Copilot · 2025-12-08T15:10:09Z

.env.example

+# PgVector HNSW Index Configuration (for high-dimensional embeddings)
+# HNSW index type supports dimensions > 2000 (IVFFlat is limited to 2000)


This comment is misleading. The statement "HNSW index type supports dimensions > 2000 (IVFFlat is limited to 2000)" is incorrect - it's actually the opposite. According to pgvector documentation, HNSW is limited to 2000 dimensions, while IVFFlat (and flat/IVF) can support higher dimensions. The actual code in pg-storage.ts correctly uses 'flat' index type (line 107), not HNSW. Consider updating this comment to: "# PgVector Flat/IVF Index Configuration (for high-dimensional embeddings)" and clarifying that IVF/Flat is used because HNSW is limited to 2000 dimensions.

Suggested change

# PgVector HNSW Index Configuration (for high-dimensional embeddings)

# HNSW index type supports dimensions > 2000 (IVFFlat is limited to 2000)

# PgVector Flat/IVF Index Configuration (for high-dimensional embeddings)

# IVF/Flat index types support dimensions > 2000 (HNSW is limited to 2000). Use IVF/Flat for high-dimensional embeddings.
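A config guard could make the constraint explicit instead of relying on a comment. The sketch below takes the 2000-dimension HNSW limit from the review comment above and assumes, as in pg-storage.ts, that 'flat' is the fallback for larger embeddings. `pickIndexType` and `HNSW_MAX_DIMENSIONS` are hypothetical names, not part of the project or of any library.

```typescript
// Hypothetical names for illustration; only 'hnsw' and 'flat' are modeled here.
type IndexType = 'hnsw' | 'flat';

// Limit as stated in the review comment above.
const HNSW_MAX_DIMENSIONS = 2000;

// Returns 'hnsw' when the embedding fits under the HNSW limit,
// otherwise falls back to 'flat' (the type used in pg-storage.ts).
function pickIndexType(dimension: number): IndexType {
  if (dimension <= 0 || Math.floor(dimension) !== dimension) {
    throw new RangeError(`dimension must be a positive integer, got ${dimension}`);
  }
  return dimension > HNSW_MAX_DIMENSIONS ? 'flat' : 'hnsw';
}

const typeFor3072 = pickIndexType(3072); // the dimension configured in this PR
const typeFor768 = pickIndexType(768);
```

With a guard like this, the HNSW-only settings in .env.example would matter only when the selected type is 'hnsw'.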

Copilot · 2025-12-08T15:10:10Z

src/mastra/config/pg-storage.ts

                type: 'flat', // flat index type (supports dimensions > 4000, unlike HNSW limit of 2000)
                metric: 'cosine', // Distance metric for normalized embeddings
-                ivf: {lists: 4000},
+                ivf: {lists: parseInt(process.env.LISTS ?? '3072')}, // IVF configuration for flat index


The comment "IVF configuration for flat index" is somewhat redundant since the code itself clearly shows this is IVF configuration. Consider making it more descriptive about what the 'lists' parameter does, e.g., "Number of IVF clusters (lists) for partitioning the index" or simply remove the comment if it doesn't add value beyond what the code shows.

Suggested change

ivf: {lists: parseInt(process.env.LISTS ?? '3072')}, // IVF configuration for flat index

ivf: {lists: parseInt(process.env.LISTS ?? '3072')}, // Number of IVF clusters (lists) for partitioning the index
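Separately from the comment wording, `parseInt(process.env.LISTS ?? '3072')` returns `NaN` when the variable is set to a non-numeric string (for example an empty value), because the `??` fallback applies only to `undefined` or `null`. A minimal sketch of a stricter parser follows; `parsePositiveInt` is a hypothetical helper, not something the PR contains.

```typescript
// Hypothetical helper: parses a positive integer from an environment-style
// string and returns the fallback when the value is missing or unusable.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? '', 10); // explicit radix; '' parses to NaN
  if (isNaN(parsed) || parsed <= 0) {
    return fallback;
  }
  return parsed;
}

const fromMissing = parsePositiveInt(undefined, 3072); // variable not set
const fromEmpty = parsePositiveInt('', 3072); // LISTS= with no value
const fromValid = parsePositiveInt('4000', 3072); // explicit override
const fromGarbage = parsePositiveInt('many', 3072); // non-numeric value
```

In pg-storage.ts this could be used as `ivf: { lists: parsePositiveInt(process.env.LISTS, 3072) }`. Note that, like `parseInt` itself, the helper still accepts a numeric prefix such as `'12abc'` as 12.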

Copilot · 2025-12-08T15:10:10Z

src/mastra/config/pg-storage.ts

 });

-// PgVector configuration for 1568 dimension embeddings (gemini-embedding-002)
+// PgVector configuration for 3072 dimension embeddings (gemini-embedding-001)


The comment states "gemini-embedding-001" which produces 768-dimensional embeddings by default, not 3072. If you're actually using 3072-dimensional embeddings, you're likely using a different model like "text-embedding-004" or have configured gemini-embedding-001 with a different output dimension. Please verify and update the comment to reflect the actual model being used.

Suggested change

// PgVector configuration for 3072 dimension embeddings (gemini-embedding-001)

// PgVector configuration for 768 dimension embeddings (gemini-embedding-001 by default)
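Whichever model and output size turn out to be in use, a runtime check would catch a mismatch between the configured dimension and the vectors actually produced, so the comment cannot silently drift from reality. This is a sketch only; `assertDimension` and `zeros` are hypothetical helpers and do not call any embedding API.

```typescript
// Hypothetical guard: throws when an embedding's length differs from the
// dimension the vector store was configured with.
function assertDimension(embedding: number[], expected: number): void {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index is configured for ${expected}`
    );
  }
}

// Stand-in for a real embedding, used only to exercise the check.
function zeros(length: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < length; i++) {
    out.push(0);
  }
  return out;
}

const configuredDimension = 3072; // value set in pg-storage.ts by this PR
assertDimension(zeros(3072), configuredDimension); // passes silently
```

Running the check once at startup against a single real embedding would settle the 768 versus 3072 question empirically.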

Copilot AI review requested due to automatic review settings December 8, 2025 15:04

ssdeanx merged commit 10df2df into main Dec 8, 2025
6 of 8 checks passed

Copilot started reviewing on behalf of ssdeanx December 8, 2025 15:04

sourcery-ai bot approved these changes Dec 8, 2025

gemini-code-assist bot reviewed Dec 8, 2025

greptile-apps bot reviewed Dec 8, 2025

Copilot AI reviewed Dec 8, 2025

This was referenced Dec 8, 2025

feat: Update role hierarchy and subscription tier configurations #23

Merged

chore: update dependencies and refactor code #33

Merged

coderabbitai bot mentioned this pull request Dec 17, 2025

chore: update dependencies and refactor mastra configuration #60

Merged

This was referenced Jan 7, 2026

feat: enhance write-note tool with logging and input/output hooks #73

Merged

feat: refactor vector storage configuration and implement LanceDB #79

Merged
