feat: Generic git host support (local & remote) #307

Merged
merged 15 commits into from
May 15, 2025
3 changes: 3 additions & 0 deletions CHANGELOG.md
Expand Up @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Added support for indexing generic git hosts given a remote clone url or local path. [#307](https://github.com/sourcebot-dev/sourcebot/pull/307)

## [3.2.0] - 2025-05-12

### Added
3 changes: 3 additions & 0 deletions Makefile
Expand Up @@ -14,6 +14,9 @@ zoekt:
export CTAGS_COMMANDS=ctags

clean:
redis-cli FLUSHALL
yarn dev:prisma:migrate:reset

rm -rf \
bin \
node_modules \
15 changes: 11 additions & 4 deletions docs/docs.json
Expand Up @@ -38,11 +38,21 @@
"docs/connections/bitbucket-data-center",
"docs/connections/gitea",
"docs/connections/gerrit",
"docs/connections/generic-git-host",
"docs/connections/local-repos",
"docs/connections/request-new"
]
}
]
},
{
"group": "Search",
"pages": [
"docs/search/syntax-reference",
"docs/search/multi-branch-indexing",
"docs/search/search-contexts"
]
},
{
"group": "Agents",
"pages": [
Expand All @@ -53,11 +63,8 @@
{
"group": "More",
"pages": [
"docs/more/syntax-reference",
"docs/more/multi-branch-indexing",
"docs/more/roles-and-permissions",
"docs/more/mcp-server",
"docs/more/search-contexts"
"docs/more/mcp-server"
]
}
]
29 changes: 29 additions & 0 deletions docs/docs/connections/generic-git-host.mdx
@@ -0,0 +1,29 @@
---
title: Other Git hosts
---

import GenericGitHost from '/snippets/schemas/v3/genericGitHost.schema.mdx'

Sourcebot can sync code from any Git host by its clone URL. This is helpful when you want to search code that is not hosted on a [supported code host](/docs/connections/overview#supported-code-hosts).

## Getting Started

To connect to a Git host, create a new [connection](/docs/connections/overview) with type `git` and specify the clone url in the `url` property. For example:

```json
{
    "type": "git",
    "url": "https://github.com/sourcebot-dev/sourcebot"
}
```

Note that only `http` & `https` URLs are supported at this time.
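
The protocol restriction above can be checked before a connection is saved. The following is a minimal sketch, not Sourcebot's actual validation logic: `isSupportedCloneUrl` is a hypothetical helper name, and it relies only on the standard `URL` class.

```typescript
// Hypothetical helper (not part of Sourcebot): returns true only for
// clone URLs that use the http or https protocol.
function isSupportedCloneUrl(raw: string): boolean {
    let parsed: URL;
    try {
        parsed = new URL(raw);
    } catch {
        // Not an absolute URL at all, e.g. scp-style `git@host:org/repo.git`.
        return false;
    }
    return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```

Under this sketch, `ssh://` URLs and scp-style addresses such as `git@github.com:org/repo.git` are rejected, while the example connection above is accepted.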

## Schema reference

<Accordion title="Reference">
[schemas/v3/genericGitHost.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/genericGitHost.json)

<GenericGitHost />

</Accordion>
87 changes: 87 additions & 0 deletions docs/docs/connections/local-repos.mdx
@@ -0,0 +1,87 @@
---
title: Local Git repositories
---

import GenericGitHost from '/snippets/schemas/v3/genericGitHost.schema.mdx'

<Note>
This feature is only supported when [self-hosting](/self-hosting/overview).
</Note>

Sourcebot can sync code from generic git repositories stored in a local directory. This is helpful when you already have a large number of repos checked out. Local repositories are treated as **read-only**, meaning Sourcebot will **not** `git fetch` new revisions.

## Getting Started

<Warning>
Only folders that contain a git repository at their root **and** have `remote.origin.url` set in their git config are supported at this time. All other folders will be skipped.
</Warning>
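
To see ahead of time whether a folder meets the `remote.origin.url` requirement, you can run `git config --get remote.origin.url` inside it. The same check can be sketched in code. This is an illustrative assumption, not how Sourcebot performs the check: `readOriginUrl` is a hypothetical helper that scans the text of a repository's `.git/config` file and ignores advanced features such as includes.

```typescript
// Hypothetical helper (not part of Sourcebot): extracts remote.origin.url
// from the raw text of a `.git/config` file, or returns undefined if unset.
function readOriginUrl(configText: string): string | undefined {
    let inOriginSection = false;
    for (const rawLine of configText.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line.startsWith("[")) {
            // A new section header; track whether it is [remote "origin"].
            inOriginSection = /^\[remote\s+"origin"\]$/.test(line);
            continue;
        }
        if (!inOriginSection) continue;
        const match = /^url\s*=\s*(.+)$/.exec(line);
        if (match && match[1]) return match[1].trim();
    }
    return undefined;
}
```

A folder whose config yields `undefined` here would fall into the "skipped" category described in the warning.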

Let's assume we have a `repos` directory located in the current working directory (`$(pwd)`) with a collection of git repositories:

```sh
repos/
├─ repo_1/
├─ repo_2/
├─ repo_3/
├─ ...
```

To get Sourcebot to index these repositories:

<Steps>
<Step title="Mount a volume">
We need to mount the `repos` directory as a Docker volume so Sourcebot can read its contents. Sourcebot will **not** write to local repositories, so we can mount a separate **read-only** volume:

``` bash
docker run \
-v $(pwd)/repos:/repos:ro \
/* additional args */ \
ghcr.io/sourcebot-dev/sourcebot:latest
```
</Step>

<Step title="Create a connection">
We can now create a new git [connection](/docs/connections/overview), specifying local paths with the `file://` prefix. Glob patterns are supported. For example:

```json
{
    "type": "git",
    "url": "file:///repos/*"
}
```

Sourcebot will expand this glob pattern into paths `/repos/repo_1`, `/repos/repo_2`, etc. and index all valid git repositories.
</Step>
</Steps>
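
The glob expansion in the last step can be pictured as a non-recursive match: `*` stands in for a single path segment and never crosses a `/`. Here is a minimal sketch under that assumption. `expandLocalUrl` and `globToRegExp` are hypothetical names, the candidate directories are passed in rather than read from disk, and only `*` is treated as a wildcard; Sourcebot's real matcher may support more.

```typescript
// Hypothetical helper: converts a glob with `*` wildcards into a RegExp
// where `*` matches any characters except the path separator.
function globToRegExp(pattern: string): RegExp {
    const escaped = pattern
        .replace(/[.+^${}()|[\]\\?]/g, "\\$&") // escape regex metacharacters
        .replace(/\*/g, "[^/]*");              // `*` stays within one segment
    return new RegExp(`^${escaped}$`);
}

// Hypothetical helper: filters candidate directories down to those that
// match a `file://` connection URL.
function expandLocalUrl(url: string, candidateDirs: string[]): string[] {
    const prefix = "file://";
    if (!url.startsWith(prefix)) {
        throw new Error(`Expected a file:// URL, got: ${url}`);
    }
    const matcher = globToRegExp(url.slice(prefix.length));
    return candidateDirs.filter((dir) => matcher.test(dir));
}
```

With the pattern `file:///repos/*`, this sketch keeps `/repos/repo_1` and `/repos/repo_2` but drops a nested path such as `/repos/group/repo_3`.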

## Examples


<AccordionGroup>
<Accordion title="Sync individual repo">
```json
{
    "type": "git",
    "url": "file:///path/to/git_repo"
}
```
</Accordion>
<Accordion title="Sync multiple repos using glob patterns">
```json
// Attempt to sync directories contained in `repos/` (non-recursive)
{
    "type": "git",
    "url": "file:///repos/*"
}
```
</Accordion>
</AccordionGroup>

## Schema reference

<Accordion title="Reference">
[schemas/v3/genericGitHost.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/genericGitHost.json)

<GenericGitHost />

</Accordion>
2 changes: 2 additions & 0 deletions docs/docs/connections/overview.mdx
Expand Up @@ -30,6 +30,8 @@ There are two ways to define connections:
<Card horizontal title="Bitbucket Data Center" icon="bitbucket" href="/docs/connections/bitbucket-data-center" />
<Card horizontal title="Gitea" href="/docs/connections/gitea" />
<Card horizontal title="Gerrit" href="/docs/connections/gerrit" />
<Card horizontal title="Other Git hosts" icon="git-alt" href="/docs/connections/generic-git-host" />
<Card horizontal title="Local Git repos" icon="folder" href="/docs/connections/local-repos" />
</CardGroup>

<Note>Missing your code host? [Submit a feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).</Note>
Original file line number Diff line number Diff line change
Expand Up @@ -90,4 +90,5 @@ Additional info:
| Bitbucket Data Center ||
| Gitea ||
| Gerrit ||
| Generic git host ||

Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ Like other prefixes, contexts can be negated using `-` or combined using `or`:
- `-context:web` excludes frontend repositories from results
- `( context:web or context:backend )` searches across both frontend and backend code

See [this doc](/docs/more/syntax-reference) for more details on the search query syntax.
See [this doc](/docs/search/syntax-reference) for more details on the search query syntax.

## Schema reference

Original file line number Diff line number Diff line change
Expand Up @@ -32,4 +32,4 @@ Expressions can be prefixed with certain keywords to modify search behavior. Som
| `rev:` | Filter results from a specific branch or tag. By default **only** the default branch is searched. | `rev:beta` - Filter results to branches that match regex `/beta/` |
| `lang:` | Filter results by language (as defined by [linguist](https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml)). By default all languages are searched. | `lang:TypeScript` - Filter results to TypeScript files<br/>`-lang:YAML` - Ignore results from YAML files |
| `sym:` | Match symbol definitions created by [universal ctags](https://ctags.io/) at index time. | `sym:\bmain\b` - Filter results to symbols that match regex `/\bmain\b/` |
| `context:` | Filter results to a predefined [search context](/self-hosting/more/search-contexts). | `context:web` - Filter results to the web context<br/>`-context:pipelines` - Ignore results from the pipelines context |
| `context:` | Filter results to a predefined [search context](/docs/search/search-contexts). | `context:web` - Filter results to the web context<br/>`-context:pipelines` - Ignore results from the pipelines context |
2 changes: 2 additions & 0 deletions docs/self-hosting/overview.mdx
Expand Up @@ -82,6 +82,8 @@ Sourcebot is open source and can be self-hosted using our official [Docker image
<Card horizontal title="Bitbucket Data Center" icon="bitbucket" href="/docs/connections/bitbucket-data-center" />
<Card horizontal title="Gitea" href="/docs/connections/gitea" />
<Card horizontal title="Gerrit" href="/docs/connections/gerrit" />
<Card horizontal title="Other Git hosts" icon="git-alt" href="/docs/connections/generic-git-host" />
<Card horizontal title="Local Git repos" icon="folder" href="/docs/connections/local-repos" />
</CardGroup>

<Note>Missing your code host? [Submit a feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).</Note>