Skip to content

feat: Generic git host support (local & remote) #307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 15, 2025

Conversation

brendan-kellam
Copy link
Contributor

@brendan-kellam brendan-kellam commented May 14, 2025

This PR adds support for specifying generic git repositories. "generic" here means any repository that uses Git as its VCS.

Repos may be hosted remotely on a git host (helpful when that host is not officially supported):

{
    "$schema": "./schemas/v3/index.json",
    "connections": {
        "kernel-bluetooth": {
            "type": "git",
            "url": "https://git.kernel.org/pub/scm/bluetooth/sbc.git"
        }
    }
}

Or they can be local**:

{
    "$schema": "./schemas/v3/index.json",
    "connections": {
        "specific-local-repo": {
            "type": "git",
            "url": "file:///path/to/repo"
        },

        // Glob patterns are supported for local directories,
        // making it easy to sync multiple repos at once.
        "multiple-local-repos": {
            "type": "git",
            "url": "file:///repos/*"
        }
    }
}

**Some caveats with local repositories:

  • They must be git repositories.
  • They must have remote.origin.url set in their git config

Fixes #301
Fixes #250

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Added support for connecting to generic Git hosts via HTTP(S) or local file URLs, including glob pattern support for local repositories.
    • Introduced the ability to sync and index code from local Git repositories (read-only, self-hosted deployments only).
    • Search results now include enriched repository metadata and improved repository info handling.
  • Documentation

    • Added new documentation pages for connecting to generic Git hosts and local repositories.
    • Updated navigation and schema references to reflect new connection options and search features.
  • Bug Fixes

    • Improved handling of empty or missing repository URLs in search and repository schemas.
  • Refactor

    • Streamlined repository metadata handling and code host info utilities.
    • Updated internal APIs and UI components to use repository IDs and unified repository info structures.
    • Enhanced authentication and organization membership checks across API endpoints.
  • Chores

    • Updated third-party dependency for code search (zoekt).
    • Enhanced development environment reset commands in Makefile.

@brendan-kellam brendan-kellam requested a review from msukkari May 14, 2025 00:27
Copy link

coderabbitai bot commented May 14, 2025

"""

Walkthrough

This change introduces support for connecting to generic Git hosts and local Git repositories, both in backend logic and documentation. It adds new schema definitions, types, and validation for a new connection type "git", updates repository metadata handling, and refactors repository info usage in the web UI. Documentation and schema files are expanded to describe these new capabilities, and new utility functions and error handling are implemented for repository validation and metadata extraction.

Changes

File(s) / Group Change Summary
Documentation navigation & content
docs/docs.json, docs/docs/connections/generic-git-host.mdx, docs/docs/connections/local-repos.mdx, docs/docs/connections/overview.mdx, docs/docs/search/multi-branch-indexing.mdx, docs/docs/search/search-contexts.mdx, docs/docs/search/syntax-reference.mdx, docs/self-hosting/overview.mdx
Added documentation for generic Git hosts and local repos, updated navigation, added new "Search" group, and adjusted links and support tables.
JSON Schemas (docs & source)
docs/snippets/schemas/v3/connection.schema.mdx, docs/snippets/schemas/v3/genericGitHost.schema.mdx, docs/snippets/schemas/v3/index.schema.mdx, schemas/v3/connection.json, schemas/v3/genericGitHost.json, schemas/v3/index.json
Added/updated schemas for GenericGitHostConnectionConfig, including structure, properties, and references; updated context description URL.
TypeScript Schemas & Types
packages/schemas/src/v3/connection.schema.ts, packages/schemas/src/v3/connection.type.ts, packages/schemas/src/v3/genericGitHost.schema.ts, packages/schemas/src/v3/genericGitHost.type.ts, packages/schemas/src/v3/index.schema.ts, packages/schemas/src/v3/index.type.ts
Added/updated TypeScript interfaces and schema objects for generic Git host connection config; updated union types and documentation URLs.
Backend: Repo & Connection Management
packages/backend/src/connectionManager.ts, packages/backend/src/git.ts, packages/backend/src/repoCompileUtils.ts, packages/backend/src/repoManager.ts, packages/backend/src/utils.ts, packages/backend/src/zoekt.ts
Added support for generic Git host connection type, including repo validation, compiling, read-only handling, and utility functions for repo info.
Web: Actions, Search, and UI
packages/web/src/actions.ts, packages/web/src/app/[domain]/browse/[...path]/page.tsx, packages/web/src/app/[domain]/components/fileHeader.tsx, packages/web/src/app/[domain]/components/repositoryCarousel.tsx, packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx, packages/web/src/app/[domain]/search/components/filterPanel/index.tsx, packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx, packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx, packages/web/src/app/[domain]/search/page.tsx
Refactored repository metadata handling to use repository IDs and new RepositoryInfo type, updated UI components, and added action to fetch repo info by name.
Web: Search schemas and logic
packages/web/src/features/search/listReposApi.ts, packages/web/src/features/search/schemas.ts, packages/web/src/features/search/searchApi.ts, packages/web/src/features/search/types.ts, packages/web/src/features/search/zoektSchema.ts
Added/updated schemas for repository info, enriched search results with repository metadata, adjusted types and parsing logic, and improved error handling.
Web: Utilities
packages/web/src/lib/utils.ts
Consolidated code host info functions, added support for "generic-git-host" type, updated icon and auth support logic.
API & Server Routes
packages/web/src/app/api/(server)/repos/route.ts, packages/web/src/app/api/(server)/search/route.ts, packages/web/src/app/api/(server)/source/route.ts, packages/web/src/features/search/fileSourceApi.ts, packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts
Refactored API routes and feature functions to use direct calls with domain parameter, removed layered authentication wrappers, and updated imports accordingly.
Schemas & MCP
packages/mcp/src/schemas.ts, packages/mcp/src/index.ts, packages/mcp/CHANGELOG.md
Updated MCP schemas to replace url with optional webUrl, added repository ID fields, and updated MCP server tool outputs accordingly.
vendor/zoekt Updated submodule commit reference.
Makefile Extended clean target to flush Redis and reset Prisma migrations in development.
package.json Simplified workspace globs by removing explicit agents path.
packages/backend/package.json Added git-url-parse dependency.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant WebUI
    participant Backend
    participant RepoUtils
    participant GitHost (generic/local/remote)

    User->>WebUI: Configure connection (type: "git", url)
    WebUI->>Backend: Submit connection config
    Backend->>RepoUtils: compileGenericGitHostConfig(config)
    alt url starts with file://
        RepoUtils->>RepoUtils: Resolve glob pattern to repo paths
        loop For each path
            RepoUtils->>GitHost: isPathAValidGitRepoRoot(path)
            RepoUtils->>GitHost: getOriginUrl(path)
        end
        RepoUtils-->>Backend: Return repoData for valid local repos
    else url is http(s)://
        RepoUtils->>GitHost: isUrlAValidGitRepo(url)
        RepoUtils-->>Backend: Return repoData for remote repo
    end
    Backend-->>WebUI: Connection and repo info available
    WebUI-->>User: Display repositories (with metadata)
Loading

Poem

🐰
New tunnels dug for Git abound,
Local or remote, your code is found!
With schemas neat and docs anew,
Sourcebot now knows what to do.
From URLs or file paths deep,
All your repos it will keep—
Hopping forward, leaps and bounds,
In every burrow, code resounds!
🗂️✨
"""

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (18)
docs/docs/search/multi-branch-indexing.mdx (1)

93-93: Consider consistent capitalization: The table entry uses "Generic git host" with a lowercase "git". For consistency with other platform names (e.g., "GitHub", "GitLab"), consider capitalizing "Git": "Generic Git host".

packages/backend/src/repoManager.ts (1)

241-241: Improved variable declarations using block-scoped constants

Changed from mutable outer-scoped variables to block-scoped constants for timing measurements, which is a good practice that prevents accidental modifications and clarifies variable scope.

Also applies to: 268-269

packages/schemas/src/v3/genericGitHost.type.ts (1)

9-10: Small typo in URL description

There appears to be a small typo in the URL description - "read-only modified" should likely be "read-only mode".

- * The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.
+ * The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only mode. Local directories support glob patterns.
docs/snippets/schemas/v3/index.schema.mdx (1)

774-774: Same typo in documentation as in type definition

The same typo appears here - "read-only modified" should likely be "read-only mode".

- "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+ "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only mode. Local directories support glob patterns.",
docs/docs/connections/local-repos.mdx (2)

11-11: Fix typos in description.

There are two issues in this line:

  • "meaing" should be "meaning"
  • Repetition of "already" (wordiness)
-Sourcebot can sync code from generic git repositories stored in a local directory. This can be helpful in scenarios where you already have a large number of repos already checked out. Local repositories are treated as **read-only**, meaing Sourcebot will **not** `git fetch` new revisions.
+Sourcebot can sync code from generic git repositories stored in a local directory. This can be helpful in scenarios where you have a large number of repos checked out. Local repositories are treated as **read-only**, meaning Sourcebot will **not** `git fetch` new revisions.
🧰 Tools
🪛 LanguageTool

[style] ~11-~11: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...ful in scenarios where you already have a large number of repos already checked out. Local reposi...

(LARGE_NUMBER_OF)


33-33: Fix typo in volume mounting explanation.

There's a spelling error in "seperate".

-        We need to mount a docker volume to the `repos` directory so Sourcebot can read it's contents. Sourcebot will **not** write to local repositories, so we can mount a seperate **read-only** volume:
+        We need to mount a docker volume to the `repos` directory so Sourcebot can read it's contents. Sourcebot will **not** write to local repositories, so we can mount a separate **read-only** volume:
packages/schemas/src/v3/index.schema.ts (1)

774-774: Typo in URL description.

There's a small typo in the URL description: "read-only modified" should be "read-only mode".

-      "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+      "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only mode. Local directories support glob patterns.",
packages/schemas/src/v3/genericGitHost.schema.ts (1)

14-14: Same typo in URL description as in index.schema.ts.

The URL description contains the same typo: "read-only modified" should be "read-only mode".

-      "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+      "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only mode. Local directories support glob patterns.",
packages/schemas/src/v3/connection.schema.ts (1)

643-643: Same typo in URL description again.

The URL description contains the same typo found in the other files: "read-only modified" should be "read-only mode".

-          "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+          "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only mode. Local directories support glob patterns.",
docs/snippets/schemas/v3/genericGitHost.schema.mdx (2)

15-15: Fix grammatical error in description.

There's a grammatical error in the description for the url property. The phrase "are treated as read-only modified" is unclear and likely contains an error.

-  "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+  "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or an absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only. Local directories support glob patterns.",

25-25: Clarify maximum revisions limit.

The description states "A maximum of 64 revisions can be indexed", but later for both branches and tags it says "A maximum of 64 branches/tags can be indexed". This could lead to confusion about whether the limit is 64 total revisions or 64 of each type.

-  "description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 revisions can be indexed, with any additional revisions being ignored.",
+  "description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 branches and 64 tags can be indexed, with any additional revisions being ignored.",
packages/web/src/app/[domain]/search/page.tsx (1)

208-211: Object-key type mismatch may bite later

Record<number, RepositoryInfo> is used, but JS object keys are always strings.
This is harmless at runtime but can confuse TypeScript down the line (e.g. Object.keys(repoInfo) returns string[], not number[]).
Consider switching to Record<string, RepositoryInfo> or Map<number, RepositoryInfo> for stronger type safety.

packages/web/src/features/search/searchApi.ts (2)

96-124: getFileWebUrl is fragile and mutates loop variables

  1. The regexp only matches templates that start with exactly {{URLJoinPath …}}; templates generated by other helpers (e.g. {{printf}}) will silently return undefined, losing links in the UI.
  2. Inside .map, the callback mutates the parameter part, which is an anti-pattern and can break when TypeScript’s no-param-reassign rule is enabled.

If sticking with string manipulation, clone the value first:

-            .map((part) => {
-                // remove wrapping quotes
-                if (part.startsWith("\"")) part = part.substring(1);
-                if (part.endsWith("\"")) part = part.substring(0, part.length - 1);
-                return part;
+            .map((raw) => {
+                const part = raw.replace(/^"|"$/g, "");
+                return part;
             })

Long-term, consider parsing the template once with a proper Go‐template parser on the server side and shipping the finished URL instead of the template string.


559-567: Database hit optimisation

isUrlAValidGitRepo already validated the URL; you can embed the resulting metadata (remoteUrl.origin, repoName) without a second new URL() call in callers, avoiding repeated parsing.

Minor, but easy to address while you are in this area.

packages/web/src/lib/utils.ts (1)

167-170: SVG for generic Git host lacks dark-mode treatment

Other icons (GitHub) specify dark:invert to stay visible in dark themes. Consider adding a matching class for gitLogo to maintain visual consistency.

packages/backend/src/repoCompileUtils.ts (1)

509-513: cloneUrl for local repos should be URL-encoded

Spaces or non-ASCII characters in repoPath yield invalid URLs.

-            cloneUrl: `file://${repoPath}`,
+            cloneUrl: `file://${encodeURI(repoPath)}`,

This also mirrors the encoding used elsewhere (encodeURI in getFileWebUrl).

packages/backend/src/git.ts (2)

79-96: Error message doesn't match function name.

The function implementation looks good, but the error messages use "isPathAGitRepoRoot" while the function is named "isPathAValidGitRepoRoot".

-            throw new Error(`isPathAGitRepoRoot failed: ${error.message}`);
+            throw new Error(`isPathAValidGitRepoRoot failed: ${error.message}`);
-            throw new Error(`isPathAGitRepoRoot failed: ${error}`);
+            throw new Error(`isPathAValidGitRepoRoot failed: ${error}`);

98-108: Silent failure handling differs from other functions.

This function returns false on error rather than throwing exceptions like other functions in the file. While this may be a deliberate choice for validation purposes, it creates an inconsistency in error handling approaches.

Consider documenting this behavior difference in the function's JSDoc comment to make the design choice explicit.

+/**
+ * Checks if the provided URL points to a valid Git repository.
+ * 
+ * @param url - The URL to check
+ * @returns A boolean indicating if the URL is a valid Git repository
+ * @note Unlike other functions in this file, this function returns false instead 
+ *       of throwing an error when the URL is invalid or unreachable.
+ */
 export const isUrlAValidGitRepo = async (url: string) => {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3e47d9 and 80ef9e7.

⛔ Files ignored due to path filters (1)
  • packages/web/public/git.svg is excluded by !**/*.svg
📒 Files selected for processing (43)
  • docs/docs.json (2 hunks)
  • docs/docs/connections/generic-git-host.mdx (1 hunks)
  • docs/docs/connections/local-repos.mdx (1 hunks)
  • docs/docs/connections/overview.mdx (1 hunks)
  • docs/docs/search/multi-branch-indexing.mdx (1 hunks)
  • docs/docs/search/search-contexts.mdx (1 hunks)
  • docs/docs/search/syntax-reference.mdx (1 hunks)
  • docs/self-hosting/overview.mdx (1 hunks)
  • docs/snippets/schemas/v3/connection.schema.mdx (1 hunks)
  • docs/snippets/schemas/v3/genericGitHost.schema.mdx (1 hunks)
  • docs/snippets/schemas/v3/index.schema.mdx (2 hunks)
  • packages/backend/src/connectionManager.ts (2 hunks)
  • packages/backend/src/git.ts (4 hunks)
  • packages/backend/src/repoCompileUtils.ts (2 hunks)
  • packages/backend/src/repoManager.ts (5 hunks)
  • packages/backend/src/utils.ts (1 hunks)
  • packages/backend/src/zoekt.ts (2 hunks)
  • packages/mcp/src/schemas.ts (2 hunks)
  • packages/schemas/src/v3/connection.schema.ts (1 hunks)
  • packages/schemas/src/v3/connection.type.ts (2 hunks)
  • packages/schemas/src/v3/genericGitHost.schema.ts (1 hunks)
  • packages/schemas/src/v3/genericGitHost.type.ts (1 hunks)
  • packages/schemas/src/v3/index.schema.ts (2 hunks)
  • packages/schemas/src/v3/index.type.ts (2 hunks)
  • packages/web/src/actions.ts (4 hunks)
  • packages/web/src/app/[domain]/browse/[...path]/page.tsx (4 hunks)
  • packages/web/src/app/[domain]/components/fileHeader.tsx (2 hunks)
  • packages/web/src/app/[domain]/components/repositoryCarousel.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (6 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (5 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (4 hunks)
  • packages/web/src/app/[domain]/search/page.tsx (9 hunks)
  • packages/web/src/features/search/listReposApi.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (3 hunks)
  • packages/web/src/features/search/searchApi.ts (3 hunks)
  • packages/web/src/features/search/types.ts (2 hunks)
  • packages/web/src/features/search/zoektSchema.ts (1 hunks)
  • packages/web/src/lib/utils.ts (9 hunks)
  • schemas/v3/connection.json (1 hunks)
  • schemas/v3/genericGitHost.json (1 hunks)
  • schemas/v3/index.json (1 hunks)
  • vendor/zoekt (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (17)
packages/web/src/features/search/types.ts (1)
packages/web/src/features/search/schemas.ts (2)
  • fileSourceResponseSchema (111-114)
  • repositoryInfoSchema (34-40)
packages/backend/src/zoekt.ts (1)
packages/backend/src/utils.ts (1)
  • getRepoPath (97-112)
packages/backend/src/connectionManager.ts (1)
packages/backend/src/repoCompileUtils.ts (1)
  • compileGenericGitHostConfig (441-457)
packages/web/src/app/[domain]/components/repositoryCarousel.tsx (1)
packages/web/src/lib/utils.ts (1)
  • getCodeHostInfoForRepo (53-141)
packages/schemas/src/v3/index.type.ts (2)
packages/schemas/src/v3/connection.type.ts (2)
  • GenericGitHostConnectionConfig (309-319)
  • GitRevisions (90-99)
packages/schemas/src/v3/genericGitHost.type.ts (2)
  • GenericGitHostConnectionConfig (3-13)
  • GitRevisions (17-26)
packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (1)
packages/web/src/features/search/types.ts (1)
  • RepositoryInfo (29-29)
packages/web/src/app/[domain]/components/fileHeader.tsx (1)
packages/web/src/lib/utils.ts (1)
  • getCodeHostInfoForRepo (53-141)
packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)
packages/web/src/features/search/types.ts (1)
  • RepositoryInfo (29-29)
packages/backend/src/utils.ts (1)
packages/backend/src/types.ts (1)
  • AppContext (4-16)
packages/backend/src/repoManager.ts (4)
packages/backend/src/utils.ts (2)
  • getRepoPath (97-112)
  • measure (10-18)
packages/backend/src/types.ts (1)
  • repoMetadataSchema (26-43)
packages/web/src/lib/utils.ts (1)
  • measure (256-272)
packages/backend/src/git.ts (1)
  • fetchRepository (31-53)
packages/web/src/app/[domain]/browse/[...path]/page.tsx (4)
packages/web/src/actions.ts (1)
  • getRepoInfoByName (447-506)
packages/web/src/lib/utils.ts (1)
  • isServiceError (190-196)
packages/web/src/lib/serviceError.ts (1)
  • ServiceErrorException (16-20)
packages/web/src/app/[domain]/components/fileHeader.tsx (1)
  • FileHeader (24-95)
packages/web/src/app/[domain]/search/page.tsx (1)
packages/web/src/features/search/types.ts (1)
  • RepositoryInfo (29-29)
packages/web/src/features/search/schemas.ts (1)
packages/mcp/src/schemas.ts (1)
  • repositorySchema (85-90)
packages/web/src/actions.ts (1)
packages/web/src/lib/serviceError.ts (1)
  • notFound (91-97)
packages/web/src/features/search/searchApi.ts (3)
packages/web/src/features/search/zoektSchema.ts (1)
  • zoektSearchResponseSchema (51-76)
packages/web/src/app/error.tsx (1)
  • Error (14-54)
packages/web/src/features/search/types.ts (2)
  • SearchResultRange (17-17)
  • SearchResponse (16-16)
packages/schemas/src/v3/genericGitHost.type.ts (2)
packages/schemas/src/v3/connection.type.ts (2)
  • GenericGitHostConnectionConfig (309-319)
  • GitRevisions (90-99)
packages/schemas/src/v3/index.type.ts (2)
  • GenericGitHostConnectionConfig (404-414)
  • GitRevisions (185-194)
packages/backend/src/repoCompileUtils.ts (3)
packages/schemas/src/v3/connection.type.ts (1)
  • GenericGitHostConnectionConfig (309-319)
packages/backend/src/git.ts (3)
  • isPathAValidGitRepoRoot (82-96)
  • getOriginUrl (110-123)
  • isUrlAValidGitRepo (98-108)
packages/backend/src/types.ts (1)
  • RepoMetadata (45-45)
🪛 LanguageTool
docs/docs/connections/local-repos.mdx

[style] ~11-~11: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...ful in scenarios where you already have a large number of repos already checked out. Local reposi...

(LARGE_NUMBER_OF)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (75)
vendor/zoekt (1)

1-1:

✅ Verification successful

Confirm Zoekt submodule pointer update
The subproject commit for Zoekt has been bumped to 12a2f4ad075359a09bd8a91793acb002211217aa. Ensure this new version supports the updated indexing command (which now takes the repository ID) in packages/backend/src/zoekt.ts.

Please run:

to confirm the commit is present and review Zoekt’s changelog for any breaking changes.


🏁 Script executed:

#!/bin/bash
# Verify the submodule commit exists and update it locally
git submodule update --init vendor/zoekt
cd vendor/zoekt
git show 12a2f4ad075359a09bd8a91793acb002211217aa

Length of output: 2251


Zoekt submodule update verified
The submodule vendor/zoekt is now at commit 12a2f4ad075359a09bd8a91793acb002211217aa, and this commit indeed introduces the --repo_id flag in zoekt-git-index (mapping to RepositoryID) and updates the Go indexer to populate desc.ID only when repoid is set. No further action is required.

docs/docs/search/search-contexts.mdx (1)

108-108:

✅ Verification successful

Verify updated documentation link path: The link was updated to /docs/search/syntax-reference. Ensure this path correctly maps to the relocated syntax reference page in the documentation hierarchy.


🏁 Script executed:

#!/bin/bash
# Verify that `syntax-reference.mdx` exists under docs/docs/search
fd 'syntax-reference.mdx' docs/docs/search

Length of output: 80


Documentation link verified
The file docs/docs/search/syntax-reference.mdx exists and matches the updated link path /docs/search/syntax-reference. No further changes are needed.

packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1)

49-49:

❓ Verification inconclusive

Ensure webUrl existence and type alignment: The code now assigns link: fileMatch.webUrl. Confirm that SearchResultFile guarantees webUrl is always defined and that CodePreviewFile.link expects a string. If webUrl can be undefined, consider adding a fallback or refining the type definitions.


🏁 Script executed:

#!/bin/bash
# Check the definition of SearchResultFile to ensure `webUrl` is required.
rg -R "interface SearchResultFile" -A 5 -g "packages/web/src/features/search/types.ts"

Length of output: 491


🏁 Script executed:

#!/bin/bash
# Verify that `webUrl` is required on SearchResultFile
rg "interface SearchResultFile" -A 5 packages/web/src/features/search/types.ts

Length of output: 80


#!/bin/bash
# Locate the definition of SearchResultFile
echo "🔍 Searching for SearchResultFile references..."
rg -n "SearchResultFile" packages/web/src

# Search for its type or interface declaration
echo "🔍 Searching for type/interface declaration..."
rg -A5 -n "type SearchResultFile" -g "packages/web/src/**/*.{ts,tsx}"
rg -A5 -n "interface SearchResultFile" -g "packages/web/src/**/*.{ts,tsx}"

# Check all usages of webUrl
echo "🔍 Verifying webUrl occurrences..."
rg -n "webUrl" packages/web/src

Ensure webUrl is guaranteed and matches the expected link type
Please verify the following to avoid potential undefined runtime errors:

  • In your SearchResultFile definition (e.g. packages/web/src/features/search/types.ts), confirm that webUrl is a required string (not string | undefined).
  • In the CodePreviewFile type (where link is defined), ensure link is declared as string.

If webUrl can be optional, add a safe fallback—e.g.

link: fileMatch.webUrl ?? ''

or update your type definitions so that webUrl is always defined before assigning it to link.

docs/self-hosting/overview.mdx (1)

85-86: Great addition of new code host options!

The additions of "Other Git hosts" and "Local Git repos" cards appropriately extend the supported code host options for users, aligning with the PR's objective to introduce support for generic Git repositories.

packages/web/src/features/search/types.ts (3)

11-11: LGTM: Added import for repository info schema

The import of repositoryInfoSchema is correctly added to support the new RepositoryInfo type.


27-28: LGTM: Added type annotation for FileSourceResponse

Adding a type annotation for FileSourceResponse derived from the schema improves type safety.


29-29: LGTM: Added RepositoryInfo type for improved repository metadata representation

The new RepositoryInfo type derived from repositoryInfoSchema establishes a consistent way to represent repository metadata across different connection types, including the newly added generic Git hosts.

packages/backend/src/zoekt.ts (2)

18-18: LGTM: Updated repository path extraction to support read-only repositories

The change to destructure the path property from getRepoPath correctly adapts to the updated function signature that now returns both path and read-only status, which is necessary for supporting local Git repositories.


68-68: LGTM: Added repository ID to indexing command

Adding the -repo_id parameter to the zoekt-git-index command ensures proper repository identification during indexing, which is important for consistent repository handling across different connection types.

docs/docs/connections/overview.mdx (1)

33-34: LGTM: Updated documentation with new connection options

The additions of "Other Git hosts" and "Local Git repos" cards in the connections overview documentation properly document the new connection capabilities introduced in this PR.

packages/web/src/features/search/listReposApi.ts (1)

36-36: Good URL handling improvement

This change ensures empty URL strings are properly translated to undefined values rather than passing them through as empty strings. This is a good defensive programming practice that will help prevent issues in UI components that might not handle empty strings gracefully.

packages/web/src/features/search/zoektSchema.ts (1)

57-57: LGTM: Repository ID addition

Adding the optional RepositoryID field is a good enhancement that aligns with the rest of the codebase now using repository IDs as unique identifiers instead of repository names. This provides more reliable identification, especially when dealing with generic Git hosts.

packages/web/src/app/[domain]/components/repositoryCarousel.tsx (1)

9-9: Function update properly supports generic Git hosts

Good refactoring to use the updated utility function which now supports generic Git hosts. The explicit parameter passing is cleaner than passing the whole object and makes the contract between components clearer.

Also applies to: 60-65

packages/backend/src/connectionManager.ts (2)

176-178: LGTM: Generic Git host support added successfully

The implementation correctly adds a new case to handle the 'git' connection type and calls the appropriate compilation function. This change enables support for both remote and local Git repositories as mentioned in the PR objectives.


7-7: Import updated to include the new compilation function

The import statement has been properly updated to include the compileGenericGitHostConfig function from the repoCompileUtils module.

packages/mcp/src/schemas.ts (2)

68-68: LGTM: URL made optional to support generic Git repositories

Making the URL field optional in the search response schema is a necessary change to support generic Git repositories, particularly local ones that might not have a conventional URL.


87-87: LGTM: Repository URL made optional

Making the repository URL optional in the repository schema aligns with the implementation of generic Git host support, especially for local repositories.

docs/docs/connections/generic-git-host.mdx (1)

1-29: Documentation looks great overall

The documentation is well-structured with a clear introduction, getting started section, and schema reference. Once the URL support inconsistency is addressed, this will provide excellent guidance for users setting up generic Git host connections.

docs/docs.json (2)

41-42: LGTM: Navigation structure updated for new connection types

The navigation has been properly updated to include the new generic Git host and local repos documentation pages under the "Supported platforms" section.


48-55: LGTM: New search group added to navigation

A dedicated "Search" group has been added to the navigation with appropriate pages, improving the organization of the documentation.

packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (3)

3-3: Updated import to use RepositoryInfo type

This change correctly updates the import statement to use the RepositoryInfo type from the search types module, which is part of the broader refactoring to support generic Git repositories.


15-15: Repository info prop now uses numeric IDs as keys

The type signature has been updated from using string keys (repository names) to number keys (repository IDs). This change is important for supporting generic git repositories where multiple repositories might have the same name but different origins.


29-29: Prop naming consistency with repository info

The prop has been renamed from repoMetadata to repoInfo for consistency, and is correctly passed down to child components. This change aligns with the broader refactoring of repository metadata handling across the application.

Also applies to: 154-154

packages/web/src/app/[domain]/components/fileHeader.tsx (3)

1-2: Updated import for repository code host information

The updated import uses the new getCodeHostInfoForRepo utility, which has explicit support for generic Git repositories.


14-19: Explicit repository fields instead of Repository type

The component now uses explicit fields for repository information rather than depending on the full Repository type. This provides better type safety and makes the component more maintainable as it clearly defines exactly which repository properties are needed.


31-36: Updated code host info retrieval

The function call has been updated to use the new getCodeHostInfoForRepo utility with explicit repository properties. This change supports the new generic Git host functionality while maintaining backward compatibility with existing code host types.

packages/schemas/src/v3/connection.type.ts (2)

8-9: Added GenericGitHostConnectionConfig to ConnectionConfig union

This change correctly extends the ConnectionConfig union type to include the new generic Git host connection type, which is essential for supporting the new feature.


309-319: Added GenericGitHostConnectionConfig interface

This new interface defines the structure for generic Git host connections, with:

  • A fixed type property set to "git"
  • A required URL that supports both remote repositories (http/https) and local repositories (file)
  • Support for glob patterns in local directory paths, enabling multi-repository syncing
  • Optional revisions property for specifying branches and tags

The documentation clearly explains that local directories are treated as read-only, which is an important safety feature.

packages/backend/src/repoManager.ts (4)

224-224: Destructure path and isReadOnly flag from getRepoPath

This change extracts both the repository path and its read-only status in one call, preparing for the conditional operations that follow.


230-233: Added read-only checks to prevent modifying local repositories

This change adds important safeguards to prevent modifying local repositories that are marked as read-only. It ensures that operations like deleting, fetching, cloning, and updating git config are only performed on repositories where modifications are permitted.

These checks are essential for supporting local Git repositories as described in the PR objectives, where repositories specified with file:// URLs are treated as read-only.

Also applies to: 235-245, 246-279


323-324: Simplified syncGitRepository function call

Removed passing and returning of timing statistics since they're now captured directly within the function using constants.


475-479: Added read-only check to garbage collection

This change ensures that local repositories marked as read-only are not deleted during garbage collection, which is consistent with the read-only treatment throughout the codebase.

packages/backend/src/utils.ts (1)

97-112: Properly handles local repositories with isReadOnly flag

The updated getRepoPath function now correctly handles both remote and local repositories by returning an object containing path and read-only status. For local git repositories (identified by the file: protocol and generic-git-host type), it returns the actual local filesystem path and marks it as read-only to prevent write operations.

This change is essential for supporting the PR objective of adding local Git repositories while ensuring they're treated appropriately as read-only sources.

packages/schemas/src/v3/index.type.ts (2)

7-13: Added GenericGitHostConnectionConfig to ConnectionConfig union

This addition extends the supported connection types to include generic Git hosts, which aligns with the PR objectives.


404-414: Well-defined interface for GenericGitHostConnectionConfig

The interface definition is clear and properly typed, with the required type: "git" property and url string that supports both remote URLs and local file paths.

The optional revisions property allows for specifying branches and tags to index, maintaining consistency with other connection types that support multi-branch indexing.

docs/snippets/schemas/v3/connection.schema.mdx (1)

631-659: Comprehensive schema for Generic Git Host connections

The schema properly defines:

  • Required properties (type and url) with appropriate constraints
  • A well-formed regex pattern to validate both remote HTTP/HTTPS URLs and local file URLs
  • Clear descriptive text explaining the local directory requirements and read-only nature
  • Helpful examples showing both remote and local repository URL formats including glob patterns
  • Reference to the existing revisions schema for branch/tag specification

This schema documentation is thorough and provides developers with clear guidance on how to configure generic Git host connections.

packages/web/src/app/[domain]/browse/[...path]/page.tsx (6)

13-14: Simplified import with direct repository info fetching

Replacing the previous approach (listing all repositories and filtering) with a direct call to get repository info by name is more efficient.


57-60: Improved repository info fetching with proper error handling

The code now uses the new getRepoInfoByName function and only throws a ServiceErrorException if the error is not a "not found" error, allowing the UI to handle the not-found case gracefully.


79-95: Conditional UI rendering based on repository info

The code properly handles the repository info object, only rendering the file header if valid repository information is available (not a service error).


84-89: Properly structured repository info object

The repository info object now contains all necessary fields (name, displayName, webUrl, codeHostType) for the FileHeader component, maintaining consistent repository metadata handling across the application.


97-111: Enhanced error handling with appropriate UI fallbacks

The code now properly handles repository not found errors by displaying a user-friendly message, while still using the repository info when available to render the code preview.


107-107: Uses consistent repository naming

Using repoInfo.name ensures consistency with the repository metadata structure throughout the application.

packages/schemas/src/v3/genericGitHost.type.ts (3)

1-2: Note about auto-generated file

The comment indicates this file is auto-generated. Ensure you have the proper tooling/scripts in place to regenerate this file when making changes to the schema.


3-13: Generic Git host connection configuration looks good

The interface correctly defines the structure for a generic Git connection with required type and URL fields, plus optional revisions. This aligns with the PR objective of supporting both remote and local Git repositories.


14-26: GitRevisions interface is well documented

The interface clearly defines the structure for specifying branches and tags to index, including support for glob patterns and the 64 item limit per type. The comments are thorough and help users understand the behavior.

packages/web/src/actions.ts (4)

31-31: Import of genericGitHostSchema is appropriate

Adding the schema import for the new connection type is necessary and matches the pattern used for other connection types.


447-506: Well-implemented repository lookup function with thorough documentation

The getRepoInfoByName function implements a repo lookup scoped to an organization domain with proper authentication checks. The detailed comments excellently explain the edge case regarding duplicate repo names with generic git connections and suggest a future improvement with a uniqueness constraint.


1245-1246: Connection type handler for 'git' connections

The connection type handler for 'git' connections is properly added to the switch statement, ensuring validation against the genericGitHostSchema.


1297-1302: Generic Git host connection metadata setup

The implementation correctly sets up the connection metadata for generic Git repositories:

  • numRepos: 1 - Each connection represents exactly one repository
  • hasToken: false - No authentication token is required for these connections

This matches the expected behavior for generic Git repositories described in the PR.

schemas/v3/genericGitHost.json (2)

1-30: Well-structured JSON schema for Generic Git Host

The schema properly defines:

  1. Required type field with a fixed value of "git"
  2. Required url field with proper validation pattern for both remote and local repositories
  3. Optional revisions field referencing the shared GitRevisions definition
  4. Helpful examples for different repository types (GitHub, local repo, glob pattern)

Good work on ensuring consistency with the TypeScript interfaces and including the proper pattern validation for URLs.


11-14: URL pattern validation is accurate and secure

The pattern validation correctly enforces proper formatting for both HTTP/HTTPS URLs and file URLs, which is important for security. The description and examples clearly illustrate the expected formats.

docs/snippets/schemas/v3/index.schema.mdx (2)

122-122: Updated documentation URL for search contexts

The documentation URL for search contexts has been appropriately updated.


762-790: Comprehensive schema documentation for Generic Git Host connections

The schema documentation for GenericGitHostConnectionConfig is thorough and consistent with the TypeScript interface and JSON schema. It correctly references the shared GitRevisions definition and includes clear examples for both remote and local repositories with glob patterns.

docs/docs/connections/local-repos.mdx (3)

14-17: The warning provides valuable prerequisite information.

The warning clearly communicates important prerequisites that users need to know before using local Git repositories.


43-54: Good documentation of configuration patterns.

The step explaining how to create a connection provides clear examples with both the file URL format and glob pattern support, which is essential for users to understand how to configure local repositories correctly.


60-78: Examples section is well structured.

The accordion-based examples showing both individual repo and glob pattern approaches are helpful and clear. The JSON examples illustrate proper configuration formatting.

packages/schemas/src/v3/index.schema.ts (2)

121-121: Updated documentation URL for search contexts.

The documentation URL for search contexts has been updated to point to the new path, which is a good housekeeping change.


761-790: Well-structured Generic Git Host schema definition.

The new GenericGitHostConnectionConfig schema is properly defined with all necessary elements:

  1. Required type and URL properties
  2. Clear description of URL formats (HTTP/HTTPS and file://)
  3. Appropriate regex pattern validation for URLs
  4. Revisions property referencing existing schema
  5. Good examples for both remote and local paths

This schema aligns well with the PR objective to support both remote generic Git hosts and local Git repositories.

packages/schemas/src/v3/genericGitHost.schema.ts (1)

1-70: Well-implemented schema definition for generic Git hosts.

This new schema file correctly defines the structure for generic Git host connections, with:

  1. Proper constant type definition
  2. Comprehensive URL property with validation pattern
  3. Examples covering both remote URLs and local paths with glob patterns
  4. Consistent revisions property structure matching other connection types
  5. Clear property requirements and additional properties restriction

The schema is well-structured and aligns perfectly with the PR objectives to support both remote generic Git hosts and local Git repositories.

packages/schemas/src/v3/connection.schema.ts (1)

630-659: Consistent schema definition added to connection types.

The GenericGitHostConnectionConfig schema has been correctly added to the connection schema's oneOf array, making the generic Git host connection type available in the connection configuration. The schema definition is consistent with the other files and includes:

  1. Appropriate title and properties
  2. Clear type constant and description
  3. URL property with validation pattern
  4. Reference to existing revisions schema
  5. Required properties and additional properties restriction

This seamlessly extends the connection configuration system to support generic Git hosts.

packages/web/src/features/search/schemas.ts (2)

34-40: LGTM: Well-structured repository info schema.

The new repositoryInfoSchema is well-structured with all the necessary fields for repository identification and display. The use of optional fields for displayName and webUrl appropriately handles cases where these might not be available.


94-97: LGTM: Making URL optional is consistent.

Making the url field optional in the repositorySchema is consistent with the other changes in this PR and correctly handles repositories that might not have an associated web URL.

packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (3)

8-8: LGTM: Correctly importing new repository types.

The import statement correctly updates to use the new RepositoryInfo type from the search feature types.


108-113: LGTM: Explicit repository props ensure type safety.

Providing explicit repository properties to the FileHeader component improves type safety and makes the code more maintainable by clearly showing what fields are being used.


19-19:

✅ Verification successful

Ensure props type compatibility.

The change from Record<string, Repository> to Record<number, RepositoryInfo> is a significant type change. Ensure all parent components are updated to provide the correct prop type.


🏁 Script executed:

#!/bin/bash
# Check parent components that render FileMatchContainer to ensure they pass the correct repo info prop

echo "Checking components that render FileMatchContainer:"
rg --type-add 'tsx:*.tsx' --type tsx '<FileMatchContainer' -A 10 -B 2

Length of output: 1899


🏁 Script executed:

#!/bin/bash
# Locate fileMatchContainer.tsx and display its prop definitions
file=$(fd fileMatchContainer.tsx -t f)
echo "Found file: $file"
echo "---- Props definition snippet ----"
sed -n '1,200p' "$file" | sed -n 's/^/    /'

Length of output: 553


🏁 Script executed:

#!/bin/bash
# Check FileMatchContainer props definition and usage of repoInfo
echo "Props interface or signature in fileMatchContainer.tsx:"
rg --color never "FileMatchContainer" -n -A 5 -B 5 packages/web/src/app/\[domain\]/search/components/searchResultsPanel/fileMatchContainer.tsx

echo ""
echo "Usage of repoInfo in fileMatchContainer.tsx:"
rg --color never "repoInfo" -n packages/web/src/app/\[domain\]/search/components/searchResultsPanel/fileMatchContainer.tsx

echo ""
echo "Check for repoInfo in parent index.tsx:"
rg --color never "repoInfo" -n packages/web/src/app/\[domain\]/search/components/searchResultsPanel/index.tsx

Length of output: 1890


Props type compatibility confirmed

Both FileMatchContainer and its parent in
packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx
now use repoInfo: Record<number, RepositoryInfo>. No further updates are needed.

packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (3)

4-5: LGTM: Import statements correctly updated.

The import statements have been correctly updated to use the new RepositoryInfo type and the consolidated getCodeHostInfoForRepo utility function.


178-189: LGTM: Well-structured aggregation function.

The aggregateMatches function has been nicely refactored to accept an object with key and match parameters, making the code more readable and maintainable.


41-50: ⚠️ Potential issue

Add null check for repository info.

Similar to the issue in fileMatchContainer.tsx, there's no robust null check when accessing repoInfo[match.repositoryId]. If the repository info is not found for a given ID, this could lead to unexpected behavior.

 ({ key, match }) => {
-    const repo: RepositoryInfo | undefined = repoInfo[match.repositoryId];
+    const repo: RepositoryInfo | undefined = repoInfo[match.repositoryId] || {
+        id: match.repositoryId,
+        name: match.repository,
+        codeHostType: 'unknown'
+    };

     const info = repo ? getCodeHostInfoForRepo({
         name: repo.name,
         codeHostType: repo.codeHostType,
         displayName: repo.displayName,
         webUrl: repo.webUrl,
     }) : undefined;

Likely an incorrect or invalid review comment.

packages/backend/src/repoCompileUtils.ts (1)

546-567: Single database write vs. fan-out

compileGenericGitHostConfig_url always returns an array of length 1.
Down-stream code may expect batched inserts; consider returning the single object directly or documenting that repoData may be 1-length to avoid unnecessary array handling.

packages/backend/src/git.ts (6)

1-1: Appropriate imports added for new Git functionality.

The additional imports from 'simple-git' (CheckRepoActions and GitConfigScope) support the new repository validation and metadata retrieval functions.


3-4: Good type abstraction for progress callbacks.

Creating the onProgressFn type alias improves code consistency and readability by standardizing the progress callback signature across multiple functions.


5-5: Function signature updated with the new type.

The cloneRepository function now uses the onProgressFn type for its progress callback parameter, consistent with the new type system.


31-31: Function signature updated with the new type.

The fetchRepository function now uses the onProgressFn type for its progress callback parameter, maintaining consistency with other functions.


61-61: Function signature updated with the new type.

The upsertGitConfig function now uses the onProgressFn type for its progress callback parameter, completing the consistent type usage across the file.


110-123: Good implementation of origin URL retrieval.

The getOriginUrl function correctly retrieves the remote origin URL from a local Git repository using the appropriate GitConfigScope. Error handling follows the consistent pattern used throughout the file.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (2)
packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1)

22-24: ⚠️ Potential issue

Potential double decoding of file content

getFileSource returns the file’s raw text (file.content) – it is not base-64 encoded.
Calling base64Decode here will corrupt the content (non-ASCII characters turn into � or the call throws).

Either remove the decode step or update getFileSource to return a base64 flag and conditionally decode.

-const fileContent = base64Decode(fileSourceResponse.source);
+const fileContent = fileSourceResponse.source;              // already plain text
packages/web/src/app/api/(server)/search/route.ts (1)

9-13: 🛠️ Refactor suggestion

Validate presence of X-Org-Domain header
Same header-presence issue as in the repos route; a missing header currently crashes the request.

-    const domain = request.headers.get("X-Org-Domain")!;
+    const domain = request.headers.get("X-Org-Domain");
+    if (!domain) {
+        return new Response(
+            JSON.stringify({ message: "Missing X-Org-Domain header" }),
+            { status: 400, headers: { "Content-Type": "application/json" } },
+        );
+    }
🧹 Nitpick comments (6)
Makefile (2)

35-35: Remove extraneous blank line with trailing whitespace.

The blank line at line 35 appears to contain trailing whitespace and isn’t necessary; please remove it to keep the Makefile tidy.


36-37: Guard destructive commands and ensure tool availability.

The additions to clean will flush all Redis data and reset Prisma migrations, matching soft-reset. A few points to consider:

  1. If redis-cli or the yarn dev:prisma:migrate:reset script isn’t installed or on $PATH, the clean target will error out. You may want to:
    • Prefix these lines with - to ignore errors when tools are missing:
      -redis-cli FLUSHALL
      -yarn dev:prisma:migrate:reset
      +- redis-cli FLUSHALL
      +- yarn dev:prisma:migrate:reset
    • Or add a prerequisite check:
      @command -v redis-cli >/dev/null || { echo "redis-cli not found, skipping FLUSHALL"; }
  2. Document prerequisites in your README/CONTRIBUTING guide so developers know they need Redis and Prisma CLI available.
CHANGELOG.md (1)

10-12: Improve wording and fix minor duplication

Consider tightening the sentence and following the usual capitalization of “Git” and “URL”.

- Added support for indexing generic git hosts given a remote clone url or local path.
+ Added support for indexing generic Git hosts using either a remote clone URL or a local path.
🧰 Tools
🪛 LanguageTool

[duplication] ~10-~10: Possible typo: you repeated a word.
Context: ...pec/v2.0.0.html). ## [Unreleased] ### Added - Added support for indexing generic git hosts ...

(ENGLISH_WORD_REPEAT_RULE)

packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1)

8-16: Replace console.log with scoped logger

Raw console.log leaks potentially sensitive repo information and is hard to filter in production.
Use your shared logging utility (e.g., logger.debug) and guard it behind a verbose flag.

Also applies to: 31-32

packages/web/src/features/search/listReposApi.ts (1)

16-20: Remove redundant variable initialisation
header is declared and then immediately overwritten. Declaring it once keeps the function shorter and clearer.

-            let header: Record<string, string> = {};
-            header = {
+            const header: Record<string, string> = {
                 "X-Tenant-ID": orgId.toString()
             };
packages/web/src/features/search/fileSourceApi.ts (1)

11-46: Consistent unauthenticated single-tenant access flag
You pass /* allowSingleTenantUnauthedAccess = */ true to withAuth, which is great. Consider surfacing this intent via a small wrapper helper (e.g. withOptionalAuth) to avoid repeating the magic boolean across files.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 80ef9e7 and ab05a2a.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (22)
  • CHANGELOG.md (1 hunks)
  • Makefile (1 hunks)
  • docs/docs/search/syntax-reference.mdx (1 hunks)
  • docs/snippets/schemas/v3/index.schema.mdx (2 hunks)
  • package.json (1 hunks)
  • packages/backend/package.json (1 hunks)
  • packages/backend/src/repoCompileUtils.ts (2 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/src/index.ts (2 hunks)
  • packages/mcp/src/schemas.ts (3 hunks)
  • packages/schemas/src/v3/index.schema.ts (2 hunks)
  • packages/schemas/src/v3/index.type.ts (2 hunks)
  • packages/web/src/app/[domain]/browse/[...path]/page.tsx (5 hunks)
  • packages/web/src/app/api/(server)/repos/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/search/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/source/route.ts (1 hunks)
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (2 hunks)
  • packages/web/src/features/search/fileSourceApi.ts (1 hunks)
  • packages/web/src/features/search/listReposApi.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (3 hunks)
  • packages/web/src/features/search/searchApi.ts (2 hunks)
  • schemas/v3/index.json (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • packages/backend/package.json
  • docs/docs/search/syntax-reference.mdx
  • packages/mcp/CHANGELOG.md
  • packages/mcp/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (9)
  • schemas/v3/index.json
  • packages/schemas/src/v3/index.type.ts
  • packages/web/src/app/[domain]/browse/[...path]/page.tsx
  • packages/schemas/src/v3/index.schema.ts
  • docs/snippets/schemas/v3/index.schema.mdx
  • packages/web/src/features/search/schemas.ts
  • packages/mcp/src/schemas.ts
  • packages/web/src/features/search/searchApi.ts
  • packages/backend/src/repoCompileUtils.ts
🧰 Additional context used
🧬 Code Graph Analysis (4)
packages/web/src/app/api/(server)/repos/route.ts (1)
packages/web/src/features/search/listReposApi.ts (1)
  • listRepositories (7-46)
packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1)
packages/web/src/features/search/fileSourceApi.ts (1)
  • getFileSource (11-46)
packages/web/src/app/api/(server)/source/route.ts (1)
packages/web/src/features/search/fileSourceApi.ts (1)
  • getFileSource (11-46)
packages/web/src/features/search/fileSourceApi.ts (4)
packages/web/src/actions.ts (3)
  • sew (43-51)
  • withAuth (53-77)
  • withOrgMembership (79-125)
packages/web/src/app/api/(client)/client.ts (1)
  • search (20-35)
packages/web/src/lib/utils.ts (1)
  • isServiceError (190-196)
packages/web/src/lib/serviceError.ts (1)
  • fileNotFound (67-73)
🪛 LanguageTool
CHANGELOG.md

[duplication] ~10-~10: Possible typo: you repeated a word.
Context: ...pec/v2.0.0.html). ## [Unreleased] ### Added - Added support for indexing generic git hosts ...

(ENGLISH_WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (2)
package.json (1)

4-4:

✅ Verification successful

Consolidate Yarn workspace patterns
Switching to a single "packages/*" glob reduces maintenance overhead and automatically includes any new packages under packages/.

Run the following to confirm only the desired workspaces are matched:


🏁 Script executed:

yarn workspaces list --json

Length of output: 446


Approve consolidated workspace pattern

The single "packages/*" glob correctly includes all current workspaces (@sourcebot/backend, @sourcebot/crypto, @sourcebot/db, @sourcebot/error, @sourcebot/mcp, @sourcebot/schemas, @sourcebot/web) and no unintended paths—verified with yarn workspaces list --json.

packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1)

17-18:

❓ Verification inconclusive

Hard-coded "~" domain may bypass org-membership checks

getFileSource now performs withOrgMembership(session, domain …).
Passing the sentinel value "~" will work only for single-tenant deployments that intentionally allow unauthenticated access.
In multi-tenant mode the request will fail and the review-agent will be unable to read files.

If the PR payload already contains an organisation / domain identifier (e.g., pr_payload.orgDomain), forward that instead and fall back to "~" only when the field is absent.

-const response = await getFileSource(fileSourceRequest, "~");
+const domain = pr_payload.orgDomain ?? "~";          // TODO: add orgDomain to payload type
+const response = await getFileSource(fileSourceRequest, domain);

🏁 Script executed:

#!/bin/bash
set -e

# Locate the target file
file=$(fd 'fetchFileContent.ts' -t f | head -n1)
echo ">> Located file: $file"

# Show the beginning of the file for context
echo ">> Beginning of $file (lines 1–200):"
sed -n '1,200p' "$file"
echo

# Search for getFileSource usages in the repo
echo ">> getFileSource occurrences in repo:"
rg "getFileSource" -n || echo "No occurrences found"
echo

# Inspect withOrgMembership references
echo ">> withOrgMembership occurrences in repo:"
rg "withOrgMembership" -n || echo "No occurrences found"
echo

# Check for pr_payload usage in the file
echo ">> pr_payload in $file:"
grep -n "pr_payload" "$file" || echo "No pr_payload references in this file"

Length of output: 8528


#!/bin/bash
set -e

# Locate and display the type definition for sourcebot_pr_payload
types_file=$(fd 'review-agent/types.ts' -t f | head -n1)
echo ">> Located types file: $types_file"
echo ">> Showing sourcebot_pr_payload definition:"
sed -n '1,200p' "$types_file" | sed -n '/export type sourcebot_pr_payload/,/}/!d'

echo

# Locate and display getFileSource implementation
api_file=$(fd 'search/fileSourceApi.ts' -t f | head -n1)
echo ">> Located fileSourceApi: $api_file"
echo ">> Showing getFileSource signature and withOrgMembership usage:"
sed -n '1,200p' "$api_file" | sed -n '/export const getFileSource/,/^}/!d'

🏁 Script executed:

#!/bin/bash
set -e

# Locate the sourcebot_pr_payload type definition
types_file=$(fd 'features/agents/review-agent/types.ts' -t f | head -n1)
echo ">> Located types file: $types_file"

# Show the definition of sourcebot_pr_payload
rg -C3 "export type sourcebot_pr_payload" -n "$types_file"

Length of output: 231


#!/bin/bash
set -e

# 1. Find occurrences of sourcebot_pr_payload to locate its type definition
echo ">> Searching for 'sourcebot_pr_payload' occurrences:"
rg -n "sourcebot_pr_payload" || true

# 2. Display its type definition
types_file=$(rg -l "sourcebot_pr_payload" | head -n1)
echo ">> Located types file: $types_file"
echo ">> Showing definition of sourcebot_pr_payload:"
sed -n '1,200p' "$types_file" | sed -n -e '/export type sourcebot_pr_payload/,/}/!d'

# 3. Locate fileSourceApi.ts and show getFileSource signature and withOrgMembership usage
api_file=$(fd fileSourceApi.ts -t f | head -n1)
echo ">> Located fileSourceApi: $api_file"
echo ">> Showing getFileSource implementation:"
sed -n '1,200p' "$api_file" | sed -n -e '/export const getFileSource/,/^}/!d'

Enforce correct org domain in getFileSource calls

The hard-coded "~" bypasses withOrgMembership and will break in multi-tenant mode. Forward the actual org domain from the PR payload (falling back to "~" only if absent):

- const response = await getFileSource(fileSourceRequest, "~");
+ const domain = pr_payload.orgDomain ?? "~";  // TODO: add orgDomain to payload type
+ const response = await getFileSource(fileSourceRequest, domain);

• Add orgDomain: string to sourcebot_pr_payload in
packages/web/src/features/agents/review-agent/types.ts.
• Update all callers of fetchFileContent to populate the new field.
This ensures withOrgMembership enforces access controls in multi-tenant deployments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
docs/docs/connections/local-repos.mdx (4)

11-11: Fix typo and improve wording
The sentence reads:

Local repositories are treated as read-only, meaing Sourcebot will not git fetch new revisions.

  • Correct meaingmeaning.
  • To reduce wordiness, consider replacing “a large number of repos” with “many repositories”.
    Apply this diff:
- Sourcebot can sync code from generic git repositories stored in a local directory. This can be helpful in scenarios where you already have a large number of repos already checked out. Local repositories are treated as **read-only**, meaing Sourcebot will **not** `git fetch` new revisions.
+ Sourcebot can sync code from generic git repositories stored in a local directory. This can be helpful if you already have many repositories checked out. Local repositories are treated as **read-only**, meaning Sourcebot will **not** `git fetch` new revisions.
🧰 Tools
🪛 LanguageTool

[style] ~11-~11: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...ful in scenarios where you already have a large number of repos already checked out. Local reposi...

(LARGE_NUMBER_OF)


33-34: Correct grammar and spelling
In the Step “Mount a volume” paragraph:

… so Sourcebot can read it's contents. Sourcebot will not write to local repositories, so we can mount a seperate read-only volume.

  • Change it'sits (possessive).
  • Change seperateseparate.
    Apply:
- We need to mount a docker volume to the `repos` directory so Sourcebot can read it's contents. Sourcebot will **not** write to local repositories, so we can mount a seperate **read-only** volume:
+ We need to mount a docker volume to the `repos` directory so Sourcebot can read its contents. Sourcebot will **not** write to local repositories, so we can mount a separate **read-only** volume:

35-35: Unify code fence language specifier
The Bash snippet starts with “ bash”, but earlier you used “sh”. For consistency, choose one style.
Apply:

- ``` bash
+ ```bash

71-75: Remove unsupported comment in JSON snippet
JSON does not support comments. The line

// Attempt to sync directories contained in `repos/` (non-recursive)

may confuse users. Consider deleting the comment or moving it above the code block as explanatory text. For example:

- ```json
- // Attempt to sync directories contained in `repos/` (non-recursive)
- {
-     "type": "git",
-     "url": "file:///repos/*"
- }
- ```
+ <!-- Sync directories contained in `repos/` (non-recursive) -->
+ ```json
+ {
+     "type": "git",
+     "url": "file:///repos/*"
+ }
+ ```  
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 199a3e0 and 35b5284.

⛔ Files ignored due to path filters (2)
  • packages/web/public/git.svg is excluded by !**/*.svg
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (55)
  • CHANGELOG.md (1 hunks)
  • Makefile (1 hunks)
  • docs/docs.json (2 hunks)
  • docs/docs/connections/generic-git-host.mdx (1 hunks)
  • docs/docs/connections/local-repos.mdx (1 hunks)
  • docs/docs/connections/overview.mdx (1 hunks)
  • docs/docs/search/multi-branch-indexing.mdx (1 hunks)
  • docs/docs/search/search-contexts.mdx (1 hunks)
  • docs/docs/search/syntax-reference.mdx (1 hunks)
  • docs/self-hosting/overview.mdx (1 hunks)
  • docs/snippets/schemas/v3/connection.schema.mdx (1 hunks)
  • docs/snippets/schemas/v3/genericGitHost.schema.mdx (1 hunks)
  • docs/snippets/schemas/v3/index.schema.mdx (2 hunks)
  • package.json (1 hunks)
  • packages/backend/package.json (1 hunks)
  • packages/backend/src/connectionManager.ts (2 hunks)
  • packages/backend/src/git.ts (4 hunks)
  • packages/backend/src/repoCompileUtils.ts (2 hunks)
  • packages/backend/src/repoManager.ts (5 hunks)
  • packages/backend/src/utils.ts (1 hunks)
  • packages/backend/src/zoekt.ts (2 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/src/index.ts (2 hunks)
  • packages/mcp/src/schemas.ts (3 hunks)
  • packages/schemas/src/v3/connection.schema.ts (1 hunks)
  • packages/schemas/src/v3/connection.type.ts (2 hunks)
  • packages/schemas/src/v3/genericGitHost.schema.ts (1 hunks)
  • packages/schemas/src/v3/genericGitHost.type.ts (1 hunks)
  • packages/schemas/src/v3/index.schema.ts (2 hunks)
  • packages/schemas/src/v3/index.type.ts (2 hunks)
  • packages/web/src/actions.ts (4 hunks)
  • packages/web/src/app/[domain]/browse/[...path]/page.tsx (5 hunks)
  • packages/web/src/app/[domain]/components/fileHeader.tsx (2 hunks)
  • packages/web/src/app/[domain]/components/repositoryCarousel.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (6 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (5 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (4 hunks)
  • packages/web/src/app/[domain]/search/page.tsx (9 hunks)
  • packages/web/src/app/api/(server)/repos/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/search/route.ts (2 hunks)
  • packages/web/src/app/api/(server)/source/route.ts (2 hunks)
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (2 hunks)
  • packages/web/src/features/search/fileSourceApi.ts (1 hunks)
  • packages/web/src/features/search/listReposApi.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (3 hunks)
  • packages/web/src/features/search/searchApi.ts (2 hunks)
  • packages/web/src/features/search/types.ts (2 hunks)
  • packages/web/src/features/search/zoektSchema.ts (1 hunks)
  • packages/web/src/lib/errorCodes.ts (1 hunks)
  • packages/web/src/lib/utils.ts (9 hunks)
  • schemas/v3/connection.json (1 hunks)
  • schemas/v3/genericGitHost.json (1 hunks)
  • schemas/v3/index.json (1 hunks)
  • vendor/zoekt (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • docs/docs/search/syntax-reference.mdx
  • schemas/v3/index.json
  • packages/schemas/src/v3/genericGitHost.type.ts
🚧 Files skipped from review as they are similar to previous changes (50)
  • packages/mcp/CHANGELOG.md
  • vendor/zoekt
  • docs/self-hosting/overview.mdx
  • packages/web/src/features/search/types.ts
  • docs/docs/search/multi-branch-indexing.mdx
  • packages/web/src/app/api/(server)/search/route.ts
  • packages/web/src/lib/errorCodes.ts
  • package.json
  • docs/docs/search/search-contexts.mdx
  • packages/backend/src/zoekt.ts
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx
  • docs/snippets/schemas/v3/genericGitHost.schema.mdx
  • Makefile
  • packages/backend/package.json
  • docs/docs/connections/overview.mdx
  • docs/docs/connections/generic-git-host.mdx
  • schemas/v3/connection.json
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts
  • docs/docs.json
  • packages/mcp/src/index.ts
  • packages/web/src/app/[domain]/components/repositoryCarousel.tsx
  • packages/web/src/features/search/zoektSchema.ts
  • packages/backend/src/connectionManager.ts
  • packages/schemas/src/v3/index.type.ts
  • packages/backend/src/repoManager.ts
  • packages/web/src/lib/utils.ts
  • packages/web/src/features/search/fileSourceApi.ts
  • packages/backend/src/utils.ts
  • packages/schemas/src/v3/connection.type.ts
  • docs/snippets/schemas/v3/index.schema.mdx
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx
  • packages/schemas/src/v3/genericGitHost.schema.ts
  • packages/web/src/app/api/(server)/source/route.ts
  • packages/web/src/app/api/(server)/repos/route.ts
  • packages/schemas/src/v3/connection.schema.ts
  • schemas/v3/genericGitHost.json
  • packages/web/src/app/[domain]/components/fileHeader.tsx
  • packages/web/src/app/[domain]/browse/[...path]/page.tsx
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx
  • packages/web/src/features/search/listReposApi.ts
  • packages/web/src/features/search/schemas.ts
  • packages/backend/src/git.ts
  • packages/web/src/app/[domain]/search/page.tsx
  • packages/backend/src/repoCompileUtils.ts
  • docs/snippets/schemas/v3/connection.schema.mdx
  • packages/schemas/src/v3/index.schema.ts
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx
  • packages/web/src/actions.ts
  • packages/web/src/features/search/searchApi.ts
  • packages/mcp/src/schemas.ts
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[duplication] ~10-~10: Possible typo: you repeated a word.
Context: ...pec/v2.0.0.html). ## [Unreleased] ### Added - Added support for indexing generic git hosts ...

(ENGLISH_WORD_REPEAT_RULE)

docs/docs/connections/local-repos.mdx

[style] ~11-~11: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...ful in scenarios where you already have a large number of repos already checked out. Local reposi...

(LARGE_NUMBER_OF)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (1)
CHANGELOG.md (1)

10-12: Looks good
The “Unreleased” section correctly documents the new support for indexing generic git hosts, referencing PR #307.

🧰 Tools
🪛 LanguageTool

[duplication] ~10-~10: Possible typo: you repeated a word.
Context: ...pec/v2.0.0.html). ## [Unreleased] ### Added - Added support for indexing generic git hosts ...

(ENGLISH_WORD_REPEAT_RULE)

@brendan-kellam
Copy link
Contributor Author

/review

@@ -5,9 +5,8 @@ import gitlabLogo from "@/public/gitlab.svg";
import giteaLogo from "@/public/gitea.svg";
import gerritLogo from "@/public/gerrit.svg";
import bitbucketLogo from "@/public/bitbucket.svg";
import gitLogo from "@/public/git.svg";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imported 'gitLogo' from '@/public/git.svg' is unused in the current code. Remove this import to keep the code clean and avoid unused imports warnings.

Comment on lines +35 to +42
export type CodeHostType =
"github" |
"gitlab" |
"gitea" |
"gerrit" |
"bitbucket-cloud" |
"bitbucket-server" |
"generic-git-host";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new CodeHostType includes 'generic-git-host' but no handling logic or icon is provided for this new type. Please implement a corresponding case in getCodeHostInfoForRepo and getCodeHostIcon to handle this type properly.


type CodeHostInfo = {
type: CodeHostType;
displayName: string;
codeHostName: string;
repoLink: string;
repoLink?: string;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing 'repoLink' from required to optional in CodeHostInfo may lead to nullability issues if callers expect it to always be defined. Confirm that all code consuming this property can handle undefined values safely, or provide a default/fallback value.

Comment on lines +53 to +58
export const getCodeHostInfoForRepo = (repo: {
codeHostType: string,
name: string,
displayName?: string,
webUrl?: string,
}): CodeHostInfo | undefined => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input type for getCodeHostInfoForRepo uses 'codeHostType' as string instead of the stricter 'CodeHostType'. This weakens type safety and could allow invalid types. It is advisable to use 'CodeHostType' to enforce valid host types.

Comment on lines +61 to 71
switch (codeHostType) {
case 'github': {
const { src, className } = getCodeHostIcon('github')!;
return {
type: "github",
displayName: displayName,
displayName: displayName ?? name,
codeHostName: "GitHub",
repoLink: cloneUrl,
repoLink: webUrl,
icon: src,
iconClassName: className,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal switch statement in getCodeHostInfoForRepo only shows handling for 'github'. Other previously supported types (gitlab, gitea, gerrit, bitbucket) and the new 'generic-git-host' are missing. Please restore or implement handlers for these types to maintain feature coverage.

Comment on lines +77 to +79
displayName: displayName ?? name,
codeHostName: "GitLab",
repoLink: cloneUrl,
repoLink: webUrl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change sets displayName to displayName ?? name on line 77, but variable 'name' is not defined or passed into _getCodeHostInfoInternal. This will cause a ReferenceError at runtime. Ensure 'name' is defined or passed as a parameter.

codeHostName: "GitLab",
repoLink: cloneUrl,
repoLink: webUrl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed repoLink from cloneUrl to webUrl on line 79, but _getCodeHostInfoInternal only receives three arguments (type, displayName, cloneUrl). The variable 'webUrl' is not available here. You should pass webUrl as a parameter instead of cloneUrl if you want to use it as repoLink.

Comment on lines +88 to +90
displayName: displayName ?? name,
codeHostName: "Gitea",
repoLink: cloneUrl,
repoLink: webUrl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from "displayName" to "displayName ?? name" introduces a variable "name" that is not defined or passed into the function scope in _getCodeHostInfoInternal or at the call site for gitea. This will cause a ReferenceError at runtime. Ensure "name" is defined or passed as an argument before using it as a fallback for displayName.

Comment on lines +90 to 91
repoLink: webUrl,
icon: src,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property's value for "repoLink" was changed from "cloneUrl" to "webUrl". This change is inconsistent with all other code hosts and _getCodeHostInfoInternal parameters that use cloneUrl. Confirm that "webUrl" is the intended URL string representing the repository HTML web link and is defined at the call site. Otherwise, this mismatch could lead to incorrect or broken repo links for Gitea repositories.

Comment on lines 99 to +102
type: "gerrit",
displayName: displayName,
displayName: displayName ?? name,
codeHostName: "Gerrit",
repoLink: cloneUrl,
repoLink: webUrl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the updated code for the Gerrit case, the "displayName" field uses a fallback with displayName ?? name, but the name identifier is not defined or passed to the function. This will cause a runtime reference error. You should ensure name is passed as a parameter or otherwise defined before using it as a fallback.

Additionally, the repoLink value was changed from cloneUrl to webUrl. It must be assured that webUrl is the correct URL to use for Gerrit repositories and that the variable in scope matches this intent. Review how webUrl is supplied and replace cloneUrl only if it aligns with semantic expectations.

Comment on lines +111 to +113
displayName: displayName ?? name,
codeHostName: "Bitbucket",
repoLink: cloneUrl,
repoLink: webUrl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 'bitbucket-server' case, the 'repoLink' field is updated from 'cloneUrl' to 'webUrl', but the parameter is still named 'cloneUrl' in _getCodeHostInfoInternal. This renaming can cause confusion about what URL is being passed. Consider renaming the parameter to a more generic name like 'url' or 'repoUrl' to reflect that it can hold either clone URLs or web URLs. Also, ensure consistency in the use of 'displayName' with fallback to 'name' in other similar cases.

Comment on lines +122 to +127
displayName: displayName ?? name,
codeHostName: "Bitbucket",
repoLink: cloneUrl,
repoLink: webUrl,
icon: src,
iconClassName: className,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 'bitbucket-cloud' case, 'repoLink' is changed from 'cloneUrl' to 'webUrl' in the new code, but the parameter name remains 'cloneUrl'. For clarity and to avoid confusion, consider renaming the parameter to 'url' or splitting parameters to separately accept 'cloneUrl' and 'webUrl'. This change can reduce ambiguity about which URL is used for 'repoLink'.

Comment on lines +129 to 138
case "generic-git-host": {
const { src, className } = getCodeHostIcon('generic-git-host')!;
return {
type: "generic-git-host",
displayName: displayName ?? name,
codeHostName: "Generic Git Host",
repoLink: webUrl,
icon: src,
iconClassName: className,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new case 'generic-git-host' returns a CodeHostInfo object, but the function '_getCodeHostInfoInternal' is typed to only return 'CodeHostInfo | undefined' where 'CodeHostInfo' expects 'type' to be a 'CodeHostType'. Since 'generic-git-host' is not included in the 'CodeHostType' union type, this can result in a type mismatch. You should update the 'CodeHostType' type to include 'generic-git-host' or adjust the return type accordingly to avoid TypeScript errors.

Comment on lines 164 to 171
return {
src: bitbucketLogo,
}
case "generic-git-host":
return {
src: gitLogo,
}
default:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new 'generic-git-host' case returns an object with 'src: gitLogo', but 'gitLogo' is not imported in this file. Import the appropriate logo image (e.g., gitLogo) at the top of the file or ensure the variable is defined to avoid a ReferenceError.

Comment on lines 181 to 187
case "bitbucket-cloud":
case "bitbucket-server":
return true;
case "generic-git-host":
case "gerrit":
return false;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new 'generic-git-host' type is added in the switch statement but is not handled in the subsequent code. This will cause _getCodeHostInfoInternal to return undefined for this type which may be unintended. Please add appropriate handling for the 'generic-git-host' case or explicitly handle unknown types with a default case.

Comment on lines +19 to +21
},
{
"$ref": "./genericGitHost.json"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma after the closing brace at line 19. To correctly add a new object in the array, add a comma at the end of line 19 before starting the new object on line 20.

@@ -78,7 +78,7 @@
},
"contexts": {
"type": "object",
"description": "[Sourcebot EE] Defines a collection of search contexts. This is only available in single-tenancy mode. See: https://docs.sourcebot.dev/self-hosting/more/search-contexts",
"description": "[Sourcebot EE] Defines a collection of search contexts. This is only available in single-tenancy mode. See: https://docs.sourcebot.dev/docs/search/search-contexts",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The '+' symbol at the start of line 81 appears to be a diff marker and should be removed for valid JSON syntax.

@@ -5460,6 +5460,7 @@ __metadata:
cross-fetch: "npm:^4.0.0"
dotenv: "npm:^16.4.5"
express: "npm:^4.21.2"
git-url-parse: "npm:^16.1.0"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that adding 'git-url-parse' at version '^16.1.0' does not introduce conflicts or compatibility issues with existing dependencies. Verify that the version constraint aligns with the rest of the project's dependencies and that it is properly listed in the project's package.json.

Comment on lines +6088 to +6094
"@types/parse-path@npm:^7.0.0":
version: 7.0.3
resolution: "@types/parse-path@npm:7.0.3"
checksum: 10c0/8344b6c7acba4e4e5a8d542f56f53c297685fa92f9b0c085d7532cc7e1b661432cecfc1c75c76cdb0d161c95679b6ecfe0573d9fef7c836962aacf604150a984
languageName: node
linkType: hard

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new entry for "@types/parse-path@npm:^7.0.0" includes detailed fields such as version, resolution, checksum, languageName, and linkType. Ensure the added checksum is accurate and matches the package content to prevent integrity issues. Otherwise, the addition appears structurally correct.

Comment on lines +9841 to +9858
"git-up@npm:^8.1.0":
version: 8.1.1
resolution: "git-up@npm:8.1.1"
dependencies:
is-ssh: "npm:^1.4.0"
parse-url: "npm:^9.2.0"
checksum: 10c0/2cc4461d8565a3f7a1ecd3d262a58ddb8df0a67f7f7d4915df2913c460b2e88ae570a6ea810700a6d22fb3b9e4bea8dd10a8eb469900ddc12e35c62208608c03
languageName: node
linkType: hard

"git-url-parse@npm:^16.1.0":
version: 16.1.0
resolution: "git-url-parse@npm:16.1.0"
dependencies:
git-up: "npm:^8.1.0"
checksum: 10c0/b8f5ebcbd5b2baf9f1bb77a217376f0247c47fe1d42811ccaac3015768eebb0759a59051f758e50e70adf5c67ae059d1975bf6b750164f36bfd39138d11b940b
languageName: node
linkType: hard

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify the addition of the 'git-up' and 'git-url-parse' packages and their versions in the lockfile to ensure that these updates are intentional and compatible. Confirm that the dependencies 'is-ssh' and 'parse-url' for 'git-up', and 'git-up' for 'git-url-parse' correspond to supported versions and do not introduce conflicts.

Comment on lines +10663 to +10671
"is-ssh@npm:^1.4.0":
version: 1.4.1
resolution: "is-ssh@npm:1.4.1"
dependencies:
protocols: "npm:^2.0.1"
checksum: 10c0/021a7355cb032625d58db3cc8266ad9aa698cbabf460b71376a0307405577fd7d3aa0826c0bf1951d7809f134c0ee80403306f6d7633db94a5a3600a0106b398
languageName: node
linkType: hard

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new entry for "is-ssh@npm:^1.4.0" is missing a 'dependenciesMeta' section, which might be necessary for optional dependencies or other metadata. Also, verify that the 'checksum' is generated correctly and matches the actual content hash. Ensure this entry includes all necessary fields for consistent package management.

Comment on lines +12439 to +12444
"parse-path@npm:^7.0.0":
version: 7.1.0
resolution: "parse-path@npm:7.1.0"
dependencies:
protocols: "npm:^2.0.0"
checksum: 10c0/8c8c8b3019323d686e7b1cd6fd9653bc233404403ad68827836fbfe59dfe26aaef64ed4e0396d0e20c4a7e1469312ec969a679618960e79d5e7c652dc0da5a0f

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checksum values for the 'parse-path' and 'parse-url' packages appear as long strings that might not be correctly formatted or standard. Verify that these checksums are valid and conform to expected formats to ensure integrity verification works properly.

Comment on lines +12452 to +12453
"@types/parse-path": "npm:^7.0.0"
parse-path: "npm:^7.0.0"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 'parse-url' package dependencies, '@types/parse-path' is listed with a version range 'npm:^7.0.0' but without a corresponding entry for '@types/parse-path' in this snippet. Confirm that '@types/parse-path' is correctly declared and available to avoid resolution issues.

Comment on lines +13068 to +13074
"protocols@npm:^2.0.0, protocols@npm:^2.0.1":
version: 2.0.2
resolution: "protocols@npm:2.0.2"
checksum: 10c0/b87d78c1fcf038d33691da28447ce94011d5c7f0c7fd25bcb5fb4d975991c99117873200c84f4b6a9d7f8b9092713a064356236960d1473a7d6fcd4228897b60
languageName: node
linkType: hard

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entry "protocols@npm:^2.0.0, protocols@npm:^2.0.1" added lacks dependencies field. Confirm if this package has no dependencies; if it does, specify them explicitly to ensure proper resolution.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
docs/snippets/schemas/v3/connection.schema.mdx (1)

825-892: ⚠️ Potential issue

URL pattern regex needs improvement for the generic Git host.

The regex pattern for validating URLs in the generic Git host connection contains an unescaped dot character which might cause incorrect validation.

The regex pattern ^(https?:\/\/[^\s/$.?#].[^\s]*|file:\/\/\/[^\s]+)$ contains an unescaped dot character .[^\s]* which should be \.[^\s]* to correctly match a literal dot in URLs. This could lead to validation issues with some URLs.

Suggest updating the pattern to:

-"pattern": "^(https?:\\/\\/[^\\s/$.?#].[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
+"pattern": "^(https?:\\/\\/[^\\s/$.?#]\\.[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
packages/schemas/src/v3/connection.schema.ts (1)

825-892: ⚠️ Potential issue

URL pattern regex needs improvement for the generic Git host.

The regex pattern for validating URLs in the generic Git host connection contains an unescaped dot character which might cause incorrect validation.

The regex pattern ^(https?:\/\/[^\s/$.?#].[^\s]*|file:\/\/\/[^\s]+)$ contains an unescaped dot character .[^\s]* which should be \.[^\s]* to correctly match a literal dot in URLs. This could lead to validation issues with some URLs.

Suggest updating the pattern to:

-"pattern": "^(https?:\\/\\/[^\\s/$.?#].[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
+"pattern": "^(https?:\\/\\/[^\\s/$.?#]\\.[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
docs/snippets/schemas/v3/index.schema.mdx (2)

1067-1067: URL description could be more concise.

The description for the url property is quite verbose, making it potentially hard to scan quickly.

Consider breaking the description into bullet points or simplifying it:

-"description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+"description": "The URL to the git repository. Supports:\n- Remote URLs (http:// or https://)\n- Local paths (file:///). Must be a valid git repository root. Local paths support glob patterns and are read-only.",

1068-1068: ⚠️ Potential issue

URL pattern regex needs improvement for the generic Git host.

The regex pattern for validating URLs in the generic Git host connection contains an unescaped dot character which might cause incorrect validation.

The regex pattern ^(https?:\/\/[^\s/$.?#].[^\s]*|file:\/\/\/[^\s]+)$ contains an unescaped dot character .[^\s]* which should be \.[^\s]* to correctly match a literal dot in URLs. This could lead to validation issues with some URLs.

Suggest updating the pattern to:

-"pattern": "^(https?:\\/\\/[^\\s/$.?#].[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
+"pattern": "^(https?:\\/\\/[^\\s/$.?#]\\.[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
🧹 Nitpick comments (4)
packages/schemas/tools/generate.ts (1)

28-36: Ensure dereferenced output stays in-sync with the typed schema

Switching from $RefParser.bundle to dereference is a nice improvement because the generated JSON is now fully self-contained.
One caveat: compileFromFile() still operates on the original file (schemaPath), which means:

  1. The TypeScript types continue to be generated from the non-dereferenced version (they still include $refs).
  2. If a later ref is introduced that cannot be resolved at compile time, the .schema.ts file will silently succeed (because it is already dereferenced) while compileFromFile() will throw.

If the intent is to have both artefacts (.schema.ts and .type.ts) built from the same fully-resolved definition, consider piping the dereferenced object to compile() instead of the raw file path, e.g.:

-const content = await compileFromFile(schemaPath, {
+const tmpSchemaPath = path.join(srcOutDir, `${name}.tmp.deref.json`);
+await writeFile(tmpSchemaPath, schema);
+const content = await compileFromFile(tmpSchemaPath, {

(or use json-schema-to-typescript’s programmatic API that accepts an object).

This will guarantee that both outputs always reflect identical schema sources.

packages/schemas/src/v3/index.schema.ts (2)

1067-1067: Consider clarifying the confusing "read-only modified" wording

The description for the url field states that "Local directories are treated as read-only modified." This wording is confusing or potentially contradictory. Consider revising to clearly state that local repositories are treated as read-only (if that's the intent) or clarify what "modified" means in this context.

-  "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
+  "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local repositories are treated as read-only. Local directories support glob patterns.",

1078-1094: Consider adding maxItems constraint to enforce the 64 item limit

The description states a maximum of 64 branches/tags can be indexed, but this constraint isn't enforced in the schema. Consider adding "maxItems": 64 to both arrays to ensure consistency between documentation and validation.

  "branches": {
    "type": "array",
    "description": "List of branches to include when indexing. For a given repo, only the branches that exist on the repo's remote *and* match at least one of the provided `branches` will be indexed. The default branch (HEAD) is always indexed. Glob patterns are supported. A maximum of 64 branches can be indexed, with any additional branches being ignored.",
    "items": {
      "type": "string"
    },
+   "maxItems": 64,
    "examples": [
      [
        "main",
        "release/*"
      ],
      [
        "**"
      ]
    ],
    "default": []
  },
  "tags": {
    "type": "array",
    "description": "List of tags to include when indexing. For a given repo, only the tags that exist on the repo's remote *and* match at least one of the provided `tags` will be indexed. Glob patterns are supported. A maximum of 64 tags can be indexed, with any additional tags being ignored.",
    "items": {
      "type": "string"
    },
+   "maxItems": 64,
    "examples": [
      [
        "latest",
        "v2.*.*"
      ],
      [
        "**"
      ]
    ],
    "default": []
  }

The same change should be applied to all other revisions properties for GitHub, GitLab, Gitea, and Bitbucket connections.

Also applies to: 1096-1112

docs/snippets/schemas/v2/index.schema.mdx (1)

1422-1422: Consider updating the Git URL description to match V3 schema

The URL description for the Git configuration in the V2 schema doesn't mention support for local file URLs with glob patterns, unlike the V3 schema. If the V2 schema should support local repositories, consider updating the description to match the V3 schema.

-  "description": "The URL to the git repository."
+  "description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local repositories are treated as read-only. Local directories support glob patterns."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9cb85cc and be57e9f.

📒 Files selected for processing (9)
  • docs/snippets/schemas/v1/index.schema.mdx (4 hunks)
  • docs/snippets/schemas/v2/index.schema.mdx (9 hunks)
  • docs/snippets/schemas/v3/connection.schema.mdx (7 hunks)
  • docs/snippets/schemas/v3/index.schema.mdx (8 hunks)
  • packages/schemas/src/v1/index.schema.ts (4 hunks)
  • packages/schemas/src/v2/index.schema.ts (9 hunks)
  • packages/schemas/src/v3/connection.schema.ts (7 hunks)
  • packages/schemas/src/v3/index.schema.ts (8 hunks)
  • packages/schemas/tools/generate.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (17)
packages/schemas/src/v2/index.schema.ts (1)

1-2260: Auto-generated file – no manual review required

This file is entirely generated; the inlined refs improve portability and do not affect runtime logic. No actionable issues detected.

packages/schemas/src/v1/index.schema.ts (1)

1-352: Auto-generated file – no manual review required

Same observation as for v2: the expanded definitions are correct and tightly scoped. Nothing to address.

docs/snippets/schemas/v1/index.schema.mdx (3)

22-85: Schema improvements through detailed inline definitions.

The expanded inline definition of GitHub configuration properties enhances documentation clarity and provides comprehensive information for each property. This approach is more maintainable than using references.


87-124: Schema consistency maintained across connection types.

The GitLab configuration schema follows the same pattern of expansion, maintaining consistency with the GitHub configuration structure and providing detailed property information.


239-343: Root schema properly reflects expanded property definitions.

The root schema's Configs array now consistently uses an inline anyOf schema with the detailed GitHub and GitLab schemas, aligning with the rest of the schema refactoring.

docs/snippets/schemas/v3/connection.schema.mdx (4)

235-262: More flexible token configuration with typed objects.

The enhanced token configuration now properly supports both secret references and environment variables, with strict typing, required properties and preventions of additional properties.


375-414: Consistent revision handling across connection types.

The expanded inline revisions property definition now provides detailed documentation for branches and tags across connection types, with consistent structure and constraints.

Also applies to: 534-574, 767-806


827-844: Well-structured generic Git host connection configuration.

The new generic Git host connection type is well defined with clear property constraints and requirements. The support for both remote and local repositories with the same connection type is a great addition.


845-886: Consistent revisions schema for generic Git host.

The revisions object for the generic Git host follows the same structure as other connection types, ensuring consistency across the schema.

packages/schemas/src/v3/connection.schema.ts (2)

234-262: Improved token configuration with typed objects.

The token property now correctly supports both secret references and environment variables consistently across connection types.


887-892: Required fields properly defined for generic Git host.

The schema correctly requires both type and url properties for the generic Git host connection, ensuring proper validation.

docs/snippets/schemas/v3/index.schema.mdx (4)

118-178: Enhanced settings schema with detailed property documentation.

The expanded settings property provides comprehensive documentation for all configuration options, making it easier for users to understand available settings and their impacts.


464-492: Consistent token configuration pattern across connection types.

All connection types now follow the same pattern for token configuration, supporting both secret references and environment variables consistently.

Also applies to: 665-694, 897-925


1055-1122: Generic Git host connection configuration properly integrated.

The new generic Git host connection type is well integrated into the overall schema structure, maintaining consistency with other connection types while adding the unique capability to support both remote and local repositories.


181-181:

✅ Verification successful

URL update in documentation reference.

The documentation URL has been updated from self-hosting/more/search-contexts to docs/search/search-contexts.

Please verify that the new URL path is correct and accessible:


🌐 Web query:

Is https://docs.sourcebot.dev/docs/search/search-contexts a valid documentation URL for Sourcebot?

💡 Result:

Yes, https://docs.sourcebot.dev/docs/search/search-contexts is a valid documentation URL for Sourcebot. This page provides an overview of search contexts, a feature available in Sourcebot's Enterprise Edition. Search contexts allow you to define specific subsets of your codebase to tailor search results, enhancing the efficiency and relevance of your searches. For more detailed information, you can refer to the official documentation.


Documentation URL Verified

The updated URL in docs/snippets/schemas/v3/index.schema.mdx:181 (https://docs.sourcebot.dev/docs/search/search-contexts) is valid and correctly points to the Sourcebot Enterprise Edition Search Contexts documentation. No further changes required.

docs/snippets/schemas/v2/index.schema.mdx (2)

84-102: LGTM! Token schema improvements

The token schemas now use an anyOf structure to support both direct string tokens and environment variable references, providing greater flexibility and security. This is a good improvement.

Also applies to: 282-299, 473-490


772-2255: Impressive schema expansion work

The extensive inlining of schema definitions in the Repos section makes the schema more self-contained and explicit, which improves clarity for users and tools. The schema expansions appear to be consistent with their corresponding definitions.

"format": "url",
"description": "The URL to the git repository. This can either be a remote URL (prefixed with `http://` or `https://`) or a absolute path to a directory on the local machine (prefixed with `file://`). If a local directory is specified, it must point to the root of a git repository. Local directories are treated as read-only modified. Local directories support glob patterns.",
"pattern": "^(https?:\\/\\/[^\\s/$.?#].[^\\s]*|file:\\/\\/\\/[^\\s]+)$",
"examples": [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify the regular expression pattern for file URLs

The regex pattern for URLs needs to be carefully verified to ensure it correctly handles file URLs. The current pattern requires exactly three slashes after 'file:' (as in 'file:///path'). This is generally correct for absolute paths on Unix-like systems, but ensure this matches your implementation expectations.

Run this command to test the pattern with various sample URLs:


🏁 Script executed:

#!/bin/bash
# Test URLs against the pattern in schema
echo "Testing URL pattern..."
PATTERN='^(https?:\/\/[^\s/$.?#].[^\s]*|file:\/\/\/[^\s]+)$'

# Test cases
declare -a URLS=(
  "https://github.com/sourcebot-dev/sourcebot" 
  "http://example.org/repo" 
  "file:///path/to/repo" 
  "file:///repos/*" 
  "file:/path/to/repo"  # Should fail - missing slashes
  "git://github.com/repo"  # Should fail - not http/https/file
)

for url in "${URLS[@]}"; do
  if [[ $url =~ $PATTERN ]]; then
    echo "$url - VALID"
  else
    echo "$url - INVALID"
  fi
done

Length of output: 1454


Update the URL regex to match valid HTTPS and file URIs

Testing shows that https://github.com/sourcebot-dev/sourcebot is rejected by the current pattern. Simplify and correct the regex in packages/schemas/src/v3/index.schema.ts (around the "pattern" definition at line ≈1068):

• Location:
packages/schemas/src/v3/index.schema.ts (the "pattern" property for URL strings)

- "pattern": "^(https?:\\/\\/[^\s/$.?#].[^\s]*|file:\\/\\/\\/[^\s]+)$"
+ "pattern": "^(https?:\\/\\/\\S+|file:\\/\\/\\/\\S+)$"

This updated regex now correctly accepts both your HTTPS test case and file:///… paths.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/schemas/src/v3/index.schema.ts around line 1068, the current regex
pattern for URL validation incorrectly rejects valid HTTPS URLs like
"https://github.com/sourcebot-dev/sourcebot" and requires exactly three slashes
after "file:". Update the regex pattern to a simplified and corrected version
that properly matches valid HTTP, HTTPS, and file URIs with three slashes,
ensuring it accepts both HTTPS URLs and file:/// paths as intended.

@brendan-kellam brendan-kellam merged commit 1aafc22 into main May 15, 2025
5 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/generic_git branch May 15, 2025 20:43
@coderabbitai coderabbitai bot mentioned this pull request Jun 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FR: Support indexing code in local working copies
1 participant