Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sourcegraph: multi-tenant Zoekt #859

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

stefanhengl
Copy link
Member

@stefanhengl stefanhengl commented Nov 14, 2024

This updates webserver and sourcegraph-indexserver to support multi-tenancy.

The change is behind an ENV feature-flag.

Key changes:

  • tenant ID is now part of the index (repo metadata)
  • GRPC: IndexOption and Repository have a new field TenantId
  • If multi-tenancy is enabled, webserver checks if tenant in context matches the tenant id in the shard
  • zoekt-git-index has a new parameter "-shard_prefix ". If set, the value will be used instead of repository name as prefix for the name of the shard. For Sourcegraph we use "<tenant id>_<repository id>" as prefix (see screenshot).

Assumptions:

  • All calls to Sourcegraph are privileged

Test plan:

  • New tests
  • Ran this together with Sourcegraph (with and without MT enabled)

This is how the new Sourcegraph naming scheme looks like on disk:
image

// Skip documents that don't belong to the tenant
if !tenant.EqualsID(ctx, repoMetadata.TenantID) {
continue
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the check at this level means shard merging should work out of the box. We can move it higher if we see it's a bottleneck.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, this is how I imagined how this would work originally. At the core loop we check, but then add some other logic in higher up places for efficiency reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying in my comment from the previous PR (#858). What's your new thinking on this?

For other tenant-aware services, we'd tried to design it so there's a "choke point" that clearly enforces tenancy. Often this is single interface and file guarded by //🚨 SECURITY: comments. Right now, we are only enforcing tenancy on Search, but not on List. And there are other places where we access index data, for example to get metadata or ngram stats.

I wonder if we could push this down further to where index data is loaded/ read. Like reader.readTOC and reader.readIndexData? That would help establish a single place on the read path where tenancy is enforced, as close to the data access as possible.

}
t, err := tenanttype.FromContext(ctx)
if err != nil {
return false
Copy link
Member Author

@stefanhengl stefanhengl Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning an error because we pprof log in FromContext already. Nice side-effect: No error here means we can avoid running watchdog with a tenant and we don't need http middleware either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, this logic makes sense for zoekt. nice.

Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels so much better than the other PR right? dig it.

build/builder.go Outdated Show resolved Hide resolved
cmd/zoekt-sourcegraph-indexserver/index.go Outdated Show resolved Hide resolved
cmd/zoekt-sourcegraph-indexserver/index.go Outdated Show resolved Hide resolved
cmd/zoekt-sourcegraph-indexserver/index.go Outdated Show resolved Hide resolved
// Skip documents that don't belong to the tenant
if !tenant.EqualsID(ctx, repoMetadata.TenantID) {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, this is how I imagined how this would work originally. At the core loop we check, but then add some other logic in higher up places for efficiency reasons.

}
t, err := tenanttype.FromContext(ctx)
if err != nil {
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, this logic makes sense for zoekt. nice.

- prefix flag instead of bool
- padded ids
- include tenant in id
- add tenant tests
Copy link
Contributor

Fuzz test failed on commit b93cb3c. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 11855426986 -n testdata

@stefanhengl stefanhengl changed the title sourcegraph: multi-tenant Zoekt v2 sourcegraph: multi-tenant Zoekt Nov 15, 2024
@stefanhengl stefanhengl marked this pull request as ready for review November 15, 2024 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants