Multi branch / tag support #58

Merged 11 commits on Nov 7, 2024
Binary file added .github/images/revisions_filter_dark.png
Binary file added .github/images/revisions_filter_light.png
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Added support for indexing and searching repositories across multiple revisions (tag or branch). ([#58](https://github.com/sourcebot-dev/sourcebot/pull/58))

## [2.3.0] - 2024-11-01

### Added
40 changes: 40 additions & 0 deletions README.md
@@ -267,6 +267,46 @@ docker run -e <b>GITEA_TOKEN=my-secret-token</b> /* additional args */ ghcr.io/s

If you're using a self-hosted GitLab or GitHub instance with a custom domain, you can specify the domain in your config file. See [configs/self-hosted.json](configs/self-hosted.json) for examples.

## Searching multiple branches

By default, Sourcebot indexes only the default branch. To index additional branches or tags, set the `revisions` field:

```jsonc
{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
    "repos": [
        {
            "type": "github",
            "revisions": {
                // Index the `main` branch and any branches matching the `releases/*` glob pattern.
                "branches": [
                    "main",
                    "releases/*"
                ],
                // Index the `latest` tag and any tags matching the `v*.*.*` glob pattern.
                "tags": [
                    "latest",
                    "v*.*.*"
                ]
            },
            "repos": [
                "my_org/repo_a",
                "my_org/repo_b"
            ]
        }
    ]
}
```

For each repository (in this case, `repo_a` and `repo_b`), Sourcebot indexes every branch and tag that matches one of the `branches` or `tags` patterns. Branches and tags that match none of the patterns are not indexed.
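
To make the matching rule concrete, here is a minimal sketch of filtering revision names against glob patterns. It is not the matcher this PR uses (the backend calls `micromatch.match`); `globToRegExp` and `filterRevisions` are hypothetical helpers, and the sketch assumes that `*` matches any run of characters except `/`, with everything else treated literally.

```typescript
// Simplified stand-in for glob matching. This is NOT micromatch: it only
// understands `*`, which here matches any run of characters except `/`.
function globToRegExp(glob: string): RegExp {
    const pattern = glob
        .split('*')
        // Escape regex metacharacters in the literal parts of the glob.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
        .join('[^/]*');
    return new RegExp(`^${pattern}$`);
}

// Keep only the names that match at least one glob.
function filterRevisions(names: string[], globs: string[]): string[] {
    const patterns = globs.map(globToRegExp);
    return names.filter((name) => patterns.some((re) => re.test(name)));
}

const exampleBranches = ['main', 'dev', 'releases/1.0', 'releases/1.0/hotfix'];
const exampleTags = ['latest', 'v1.2.3', 'v2.0', 'nightly'];

console.log(filterRevisions(exampleBranches, ['main', 'releases/*'])); // [ 'main', 'releases/1.0' ]
console.log(filterRevisions(exampleTags, ['latest', 'v*.*.*']));       // [ 'latest', 'v1.2.3' ]
```

Under these assumptions `v2.0` is skipped because `v*.*.*` requires two literal dots, and `releases/1.0/hotfix` is skipped because `*` does not cross a `/`.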

To search on a specific revision, use the `revision` filter in the search bar:

<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/revisions_filter_dark.png">
<img style="max-width:700px;width:100%" src=".github/images/revisions_filter_light.png">
</picture>

## Searching a local directory

Local directories can be searched by using the `local` type in your config file:
26 changes: 26 additions & 0 deletions configs/multi-branch.json
@@ -0,0 +1,26 @@
{
    "$schema": "../schemas/v2/index.json",
    "repos": [
        {
            "type": "github",
            "revisions": {
                // Specify branches to index...
                "branches": [
                    "main",
                    "release/*"
                ],
                // ... or specify tags
                "tags": [
                    "v*.*.*"
                ]
            },
            // For each repo (repoa, repob), Sourcebot will index all branches and tags in the repo
            // matching the `branches` and `tags` patterns above. Any branches or tags that don't
            // match the patterns will be ignored and not indexed.
            "repos": [
                "org/repoa",
                "org/repob"
            ]
        }
    ]
}
2 changes: 2 additions & 0 deletions packages/backend/package.json
@@ -12,6 +12,7 @@
    },
    "devDependencies": {
        "@types/argparse": "^2.0.16",
        "@types/micromatch": "^4.0.9",
        "@types/node": "^22.7.5",
        "json-schema-to-typescript": "^15.0.2",
        "tsc-watch": "^6.2.0",
@@ -25,6 +26,7 @@
        "cross-fetch": "^4.0.0",
        "gitea-js": "^1.22.0",
        "lowdb": "^7.0.1",
        "micromatch": "^4.0.8",
        "simple-git": "^3.27.0",
        "strip-json-comments": "^5.0.1",
        "winston": "^3.15.0"
63 changes: 62 additions & 1 deletion packages/backend/src/gitea.ts
@@ -5,6 +5,7 @@ import { AppContext, GitRepository } from './types.js';
import fetch from 'cross-fetch';
import { createLogger } from './logger.js';
import path from 'path';
import micromatch from 'micromatch';

const logger = createLogger('Gitea');

@@ -60,7 +61,9 @@ export const getGiteaReposFromConfig = async (config: GiteaConfig, ctx: AppConte
                 'zoekt.archived': marshalBool(repo.archived),
                 'zoekt.fork': marshalBool(repo.fork!),
                 'zoekt.public': marshalBool(repo.internal === false && repo.private === false),
-            }
+            },
+            branches: [],
+            tags: []
         } satisfies GitRepository;
     });

@@ -77,10 +80,68 @@ export const getGiteaReposFromConfig = async (config: GiteaConfig, ctx: AppConte
            repos = excludeReposByName(repos, config.exclude.repos, logger);
        }
    }

    logger.debug(`Found ${repos.length} total repositories.`);

    if (config.revisions) {
        if (config.revisions.branches) {
            const branchGlobs = config.revisions.branches;
            repos = await Promise.all(
                repos.map(async (repo) => {
                    const [owner, name] = repo.name.split('/');
                    let branches = (await getBranchesForRepo(owner, name, api)).map(branch => branch.name!);
                    branches = micromatch.match(branches, branchGlobs);

                    return {
                        ...repo,
                        branches,
                    };
                })
            )
        }

        if (config.revisions.tags) {
            const tagGlobs = config.revisions.tags;
            repos = await Promise.all(
                repos.map(async (repo) => {
                    const [owner, name] = repo.name.split('/');
                    let tags = (await getTagsForRepo(owner, name, api)).map(tag => tag.name!);
                    tags = micromatch.match(tags, tagGlobs);

                    return {
                        ...repo,
                        tags,
                    };
                })
            )
        }
    }

    return repos;
}
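
The revision-resolution step above runs one pass over the repos for branches and a second pass for tags. As an illustration only, the same result could be sketched in a single pass that lists both kinds concurrently. Everything below is hypothetical: the `Repo`, `Revisions` and `ListNames` types, the `applyRevisions` function, and `matchExact`, which stands in for `micromatch.match` and only handles exact names.

```typescript
type Repo = { name: string; branches: string[]; tags: string[] };
type Revisions = { branches?: string[]; tags?: string[] };
type ListNames = (owner: string, repo: string) => Promise<string[]>;

// Stand-in for `micromatch.match`: exact names only, no glob support.
const matchExact = (names: string[], patterns: string[]): string[] =>
    names.filter((name) => patterns.includes(name));

async function applyRevisions(
    repos: Repo[],
    revisions: Revisions,
    listBranches: ListNames,
    listTags: ListNames,
): Promise<Repo[]> {
    return Promise.all(repos.map(async (repo) => {
        const [owner, name] = repo.name.split('/');
        // Skip the listing call entirely when a revision kind is not configured.
        const [branches, tags] = await Promise.all([
            revisions.branches ? listBranches(owner, name) : Promise.resolve<string[]>([]),
            revisions.tags ? listTags(owner, name) : Promise.resolve<string[]>([]),
        ]);
        return {
            ...repo,
            branches: matchExact(branches, revisions.branches ?? []),
            tags: matchExact(tags, revisions.tags ?? []),
        };
    }));
}

// Usage with stubbed listing functions instead of a real API client.
applyRevisions(
    [{ name: 'org/repoa', branches: [], tags: [] }],
    { branches: ['main'] },
    async () => ['main', 'dev'],
    async () => ['v1.0.0'],
).then((resolved) => console.log(resolved));
```

The sketch relies on every repo starting with empty `branches` and `tags` arrays, as the mapping earlier in this diff sets them; an unconfigured kind therefore stays empty.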

const getTagsForRepo = async <T>(owner: string, repo: string, api: Api<T>) => {
    logger.debug(`Fetching tags for repo ${owner}/${repo}...`);
    const { durationMs, data: tags } = await measure(() =>
        paginate((page) => api.repos.repoListTags(owner, repo, {
            page
        }))
    );
    logger.debug(`Found ${tags.length} tags in repo ${owner}/${repo} in ${durationMs}ms.`);
    return tags;
}
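
The `paginate` helper called above is not part of this diff. As a rough sketch of what a page-based helper of this shape might do, the function below keeps requesting pages until one comes back empty. The name `paginateAll`, the `Page<T>` shape and the stop-on-empty-page rule are all assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical response shape: one page of items under `data`.
type Page<T> = { data: T[] };

// Request pages 1, 2, 3, ... and concatenate them until an empty page is returned.
async function paginateAll<T>(fetchPage: (page: number) => Promise<Page<T>>): Promise<T[]> {
    const all: T[] = [];
    for (let page = 1; ; page++) {
        const { data } = await fetchPage(page);
        if (data.length === 0) {
            break;
        }
        all.push(...data);
    }
    return all;
}

// Usage with an in-memory fake instead of a real API client.
const fakePages: string[][] = [['v1.0.0', 'v1.1.0'], ['v2.0.0']];
paginateAll(async (page) => ({ data: fakePages[page - 1] ?? [] }))
    .then((tags) => console.log(tags)); // [ 'v1.0.0', 'v1.1.0', 'v2.0.0' ]
```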

const getBranchesForRepo = async <T>(owner: string, repo: string, api: Api<T>) => {
    logger.debug(`Fetching branches for repo ${owner}/${repo}...`);
    const { durationMs, data: branches } = await measure(() =>
        paginate((page) => api.repos.repoListBranches(owner, repo, {
            page
        }))
    );
    logger.debug(`Found ${branches.length} branches in repo ${owner}/${repo} in ${durationMs}ms.`);
    return branches;
}

const getReposOwnedByUsers = async <T>(users: string[], api: Api<T>) => {
    const repos = (await Promise.all(users.map(async (user) => {
        logger.debug(`Fetching repos for user ${user}...`);
77 changes: 72 additions & 5 deletions packages/backend/src/github.ts
@@ -3,12 +3,14 @@ import { GitHubConfig } from "./schemas/v2.js";
import { createLogger } from "./logger.js";
import { AppContext, GitRepository } from "./types.js";
import path from 'path';
-import { excludeArchivedRepos, excludeForkedRepos, excludeReposByName, getTokenFromConfig, marshalBool } from "./utils.js";
+import { excludeArchivedRepos, excludeForkedRepos, excludeReposByName, getTokenFromConfig, marshalBool, measure } from "./utils.js";
import micromatch from "micromatch";

const logger = createLogger("GitHub");

type OctokitRepository = {
    name: string,
    id: number,
    full_name: string,
    fork: boolean,
    private: boolean,
@@ -88,7 +90,9 @@ export const getGitHubReposFromConfig = async (config: GitHubConfig, signal: Abo
                 'zoekt.archived': marshalBool(repo.archived),
                 'zoekt.fork': marshalBool(repo.fork),
                 'zoekt.public': marshalBool(repo.private === false)
-            }
+            },
+            branches: [],
+            tags: [],
         } satisfies GitRepository;
     });

@@ -107,10 +111,75 @@ export const getGitHubReposFromConfig = async (config: GitHubConfig, signal: Abo
    }

    logger.debug(`Found ${repos.length} total repositories.`);

    if (config.revisions) {
        if (config.revisions.branches) {
            const branchGlobs = config.revisions.branches;
            repos = await Promise.all(
                repos.map(async (repo) => {
                    const [owner, name] = repo.name.split('/');
                    let branches = (await getBranchesForRepo(owner, name, octokit, signal)).map(branch => branch.name);
                    branches = micromatch.match(branches, branchGlobs);

                    return {
                        ...repo,
                        branches,
                    };
                })
            )
        }

        if (config.revisions.tags) {
            const tagGlobs = config.revisions.tags;
            repos = await Promise.all(
                repos.map(async (repo) => {
                    const [owner, name] = repo.name.split('/');
                    let tags = (await getTagsForRepo(owner, name, octokit, signal)).map(tag => tag.name);
                    tags = micromatch.match(tags, tagGlobs);

                    return {
                        ...repo,
                        tags,
                    };
                })
            )
        }
    }

    return repos;
}

const getTagsForRepo = async (owner: string, repo: string, octokit: Octokit, signal: AbortSignal) => {
    logger.debug(`Fetching tags for repo ${owner}/${repo}...`);

    const { durationMs, data: tags } = await measure(() => octokit.paginate(octokit.repos.listTags, {
        owner,
        repo,
        per_page: 100,
        request: {
            signal
        }
    }));

    logger.debug(`Found ${tags.length} tags for repo ${owner}/${repo} in ${durationMs}ms`);
    return tags;
}

const getBranchesForRepo = async (owner: string, repo: string, octokit: Octokit, signal: AbortSignal) => {
    logger.debug(`Fetching branches for repo ${owner}/${repo}...`);
    const { durationMs, data: branches } = await measure(() => octokit.paginate(octokit.repos.listBranches, {
        owner,
        repo,
        per_page: 100,
        request: {
            signal
        }
    }));
    logger.debug(`Found ${branches.length} branches for repo ${owner}/${repo} in ${durationMs}ms`);
    return branches;
}
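
The `measure` helper imported from `./utils.js` is not shown in this diff. Judging only from how it is called here (it takes an async callback and yields `data` and `durationMs`), a helper of that shape could look roughly like the sketch below. `measureAsync` is a hypothetical name and this is an assumption about the behaviour, not the project's actual code.

```typescript
// Run an async callback and report its result together with the elapsed wall-clock time.
async function measureAsync<T>(cb: () => Promise<T>): Promise<{ data: T; durationMs: number }> {
    const start = Date.now();
    const data = await cb();
    return { data, durationMs: Date.now() - start };
}

// Usage with a stubbed fetch instead of a real API call.
measureAsync(async () => ['main', 'dev']).then(({ data, durationMs }) => {
    console.log(`Found ${data.length} branches in ${durationMs}ms`);
});
```

Wrapping the call this way keeps the timing logic out of each fetch function, so the debug log lines can report durations consistently.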


const getReposOwnedByUsers = async (users: string[], isAuthenticated: boolean, octokit: Octokit, signal: AbortSignal) => {
    // @todo : error handling
    const repos = (await Promise.all(users.map(async (user) => {
@@ -149,7 +218,6 @@ const getReposOwnedByUsers = async (users: string[], isAuthenticated: boolean, o
 }

 const getReposForOrgs = async (orgs: string[], octokit: Octokit, signal: AbortSignal) => {
-    // @todo : error handling
     const repos = (await Promise.all(orgs.map(async (org) => {
         logger.debug(`Fetching repository info for org ${org}...`);
         const start = Date.now();
@@ -172,7 +240,6 @@ const getReposForOrgs = async (orgs: string[], octokit: Octokit, signal: AbortSi
 }

 const getRepos = async (repoList: string[], octokit: Octokit, signal: AbortSignal) => {
-    // @todo : error handling
     const repos = await Promise.all(repoList.map(async (repo) => {
         logger.debug(`Fetching repository info for ${repo}...`);
         const start = Date.now();
41 changes: 40 additions & 1 deletion packages/backend/src/gitlab.ts
@@ -4,6 +4,7 @@ import { excludeArchivedRepos, excludeForkedRepos, excludeReposByName, getTokenF
import { createLogger } from "./logger.js";
import { AppContext, GitRepository } from "./types.js";
import path from 'path';
import micromatch from "micromatch";

const logger = createLogger("GitLab");

@@ -90,7 +91,9 @@ export const getGitLabReposFromConfig = async (config: GitLabConfig, ctx: AppCon
                 'zoekt.archived': marshalBool(project.archived),
                 'zoekt.fork': marshalBool(isFork),
                 'zoekt.public': marshalBool(project.visibility === 'public'),
-            }
+            },
+            branches: [],
+            tags: [],
         } satisfies GitRepository;
     });

@@ -110,5 +113,41 @@ export const getGitLabReposFromConfig = async (config: GitLabConfig, ctx: AppCon

    logger.debug(`Found ${repos.length} total repositories.`);

    if (config.revisions) {
        if (config.revisions.branches) {
            const branchGlobs = config.revisions.branches;
            repos = await Promise.all(repos.map(async (repo) => {
                logger.debug(`Fetching branches for repo ${repo.name}...`);
                let { durationMs, data } = await measure(() => api.Branches.all(repo.name));
                logger.debug(`Found ${data.length} branches in repo ${repo.name} in ${durationMs}ms.`);

                let branches = data.map((branch) => branch.name);
                branches = micromatch.match(branches, branchGlobs);

                return {
                    ...repo,
                    branches,
                };
            }));
        }

        if (config.revisions.tags) {
            const tagGlobs = config.revisions.tags;
            repos = await Promise.all(repos.map(async (repo) => {
                logger.debug(`Fetching tags for repo ${repo.name}...`);
                let { durationMs, data } = await measure(() => api.Tags.all(repo.name));
                logger.debug(`Found ${data.length} tags in repo ${repo.name} in ${durationMs}ms.`);

                let tags = data.map((tag) => tag.name);
                tags = micromatch.match(tags, tagGlobs);

                return {
                    ...repo,
                    tags,
                };
            }));
        }
    }

    return repos;
}