-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sourcegraph: multi-tenant Zoekt #859
base: main
Are you sure you want to change the base?
Conversation
// Skip documents that don't belong to the tenant | ||
if !tenant.EqualsID(ctx, repoMetadata.TenantID) { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the check at this level means shard merging should work out of the box. We can move it higher if we see it's a bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, this is how I imagined how this would work originally. At the core loop we check, but then add some other logic in higher up places for efficiency reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying in my comment from the previous PR (#858). What's your new thinking on this?
For other tenant-aware services, we'd tried to design it so there's a "choke point" that clearly enforces tenancy. Often this is single interface and file guarded by //🚨 SECURITY:
comments. Right now, we are only enforcing tenancy on Search
, but not on List
. And there are other places where we access index data, for example to get metadata or ngram stats.
I wonder if we could push this down further to where index data is loaded/ read. Like reader.readTOC
and reader.readIndexData
? That would help establish a single place on the read path where tenancy is enforced, as close to the data access as possible.
} | ||
t, err := tenanttype.FromContext(ctx) | ||
if err != nil { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not returning an error because we pprof log in FromContext already. Nice side-effect: No error here means we can avoid running watchdog with a tenant and we don't need http middleware either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this logic makes sense for zoekt. nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels so much better than the other PR right? dig it.
// Skip documents that don't belong to the tenant | ||
if !tenant.EqualsID(ctx, repoMetadata.TenantID) { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, this is how I imagined how this would work originally. At the core loop we check, but then add some other logic in higher up places for efficiency reasons.
} | ||
t, err := tenanttype.FromContext(ctx) | ||
if err != nil { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this logic makes sense for zoekt. nice.
- prefix flag instead of bool - padded ids - include tenant in id - add tenant tests
Fuzz test failed on commit b93cb3c. To troubleshoot locally, use the GitHub CLI to download the seed corpus with
|
This updates webserver and sourcegraph-indexserver to support multi-tenancy.
The change is behind an ENV feature-flag.
Key changes:
TenantId
Assumptions:
Test plan:
This is how the new Sourcegraph naming scheme looks like on disk: