
Cleanup documentation around symbols and repo-updater #1132


Status: Open. Wants to merge 1 commit into base branch `main`.
6 changes: 3 additions & 3 deletions docs/admin/architecture.mdx
@@ -11,11 +11,11 @@
At its core, Sourcegraph maintains a persistent cache of all repositories that are connected to it. It is persistent because this data is critical for Sourcegraph to function. Still, it is ultimately a cache because the code host is the source of truth, and our cache is eventually consistent.

- `gitserver` is the sharded service that stores repositories and makes them accessible to other Sourcegraph services
- `repo-updater` is the singleton service responsible for ensuring all repositories in gitserver are as up-to-date as possible while respecting code host rate limits. It is also responsible for syncing repository metadata from the code host that is stored in the repo table of our Postgres database
- `worker` is responsible for ensuring all repositories in gitserver are as up-to-date as possible while respecting code host rate limits. It is also responsible for syncing repository metadata from the code host that is stored in the repo table of our Postgres database

## Permission syncing

Repository permissions are mirrored from code hosts to Sourcegraph by default. This builds the foundation of Sourcegraph authorization for repositories to ensure users see consistent content on code hosts. Currently, the background permissions syncer resides in the repo-updater.
Repository permissions are mirrored from code hosts to Sourcegraph by default. This builds the foundation of Sourcegraph authorization for repositories to ensure users see consistent content on code hosts. Currently, the background permissions syncer resides in the `worker`.

<Callout type="note">Learn more in the [Permission Syncing docs](/admin/permissions/syncing)</Callout>

@@ -94,7 +94,7 @@ You can learn more in the [Code Insights](/code_insights) docs.
- Exhaustive search (with `count:all/count:999999` operator)
- Historical search (= unindexed search, currently)
- Commit search to find historical commits to search over
- Repository Syncing: The code insights backend has direct dependencies on `gitserver` and `repo-updater`
- Repository Syncing: The code insights backend has a direct dependency on `gitserver`
- Permission syncing: The code insights backend depends on synced repository permissions for access control
- Settings cascade:
- Insights and dashboard configuration are stored in user, organization, and global settings. This will change in the future and is planned to be moved to the database
2 changes: 1 addition & 1 deletion docs/admin/code_hosts/rate_limits.mdx
@@ -52,7 +52,7 @@ Requests to the configured code host will be staggered as to not exceed `"reques
- For Sourcegraph `<=3.38`, if rate limiting is configured more than once for the same code host instance, the most restrictive limit will be used.
- For Sourcegraph `>=3.39`, rate limiting should be enabled and configured for each individual code host connection.
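As a sketch, per-connection rate limiting lives in the code host connection's JSON configuration. The field names below (`rateLimit.enabled`, `rateLimit.requestsPerHour`) are the commonly documented ones, and the URL and values are illustrative only; verify against your Sourcegraph version's code host configuration reference:

```json
{
  "url": "https://github.example.com",
  "token": "<access token>",
  "rateLimit": {
    "enabled": true,
    "requestsPerHour": 5000
  }
}
```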

To see the status of configured internal rate limits, visit **Site admin > Instrumentation > repo-updater > Rate Limiter State**. This page lists internal rate limits by code host, for example:
To see the status of configured internal rate limits, visit **Site admin > Instrumentation > worker > Rate Limiter State**. This page lists internal rate limits by code host, for example:

```json
{
  ...
}
```
1 change: 0 additions & 1 deletion docs/admin/config/postgres-conf.mdx
@@ -53,7 +53,6 @@ The setting `max_connections` determines the number of active connections that c
| --------------------------- | ------------------------------------------ |
| `frontend` | `pgsql`, `codeintel-db`, `codeinsights-db` |
| `gitserver` | `pgsql` |
| `repo-updater` | `pgsql` |
| `precise-code-intel-worker` | `codeintel-db`, `pgsql` |
| `worker` | `codeintel-db`, `pgsql`, `codeinsights-db` |

9 changes: 4 additions & 5 deletions docs/admin/config/private-network.mdx
@@ -19,7 +19,6 @@ services hosted within an organization's private network
* Connecting to external [LLM providers](../../cody/capabilities/supported-models) with Cody
- **gitserver**: Executes git commands against externally hosted [code hosts](../external_service)
- **migrator**: Connects to Postgres instances (which may be [externally hosted](../external_services/postgres)) to process database migrations
- **repo-updater**: Communicates with [code hosts](../external_service) APIs to coordinate repository synchronization
- **worker**: Sourcegraph [Workers](../workers) run various background jobs that may require establishing connections to
services hosted within an organization's private network

@@ -34,22 +33,22 @@ variables will depend on your Sourcegraph deployment method.
Add the proxy environment variables to your Sourcegraph Helm chart [override file](https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/main/charts/sourcegraph/values.yaml):

```yaml
executor|frontend|gitserver|migrator|repo-updater|worker:
executor|frontend|gitserver|migrator|worker:
env:
- name: HTTP_PROXY
value: http://proxy.example.com:8080
- name: HTTPS_PROXY
value: http://proxy.example.com:8080
- name: NO_PROXY
value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher,symbols,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc"
value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc"
```

<Callout type="info">
If the updated Sourcegraph pods fail to pass their readiness or health checks after configuring the HTTP proxy environment variables, you may also need to add your k8s cluster pod & service CIDR ranges to the `NO_PROXY` environment variable. Example:

```yaml
- name: NO_PROXY
value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher,symbols,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc,10.10.0.0/16,10.20.0.0/16"
value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc,10.10.0.0/16,10.20.0.0/16"
```
</Callout>

@@ -62,7 +61,7 @@ services:
environment:
- HTTP_PROXY=http://proxy.example.com:8080
- HTTPS_PROXY=http://proxy.example.com:8080
- NO_PROXY='blobstore,caddy,cadvisor,codeintel-db,codeintel-db-exporter,codeinsights-db,codeinsights-db-exporter,sourcegraph-frontend-0,sourcegraph-frontend-internal,gitserver-0,grafana,migrator,node-exporter,otel-collector,pgsql,pgsql-exporter,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher-0,symbols-0,syntect-server,worker,zoekt-indexserver-0,zoekt-webserver-0,localhost,127.0.0.1'
- NO_PROXY='blobstore,caddy,cadvisor,codeintel-db,codeintel-db-exporter,codeinsights-db,codeinsights-db-exporter,sourcegraph-frontend-0,sourcegraph-frontend-internal,gitserver-0,grafana,migrator,node-exporter,otel-collector,pgsql,pgsql-exporter,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher-0,syntect-server,worker,zoekt-indexserver-0,zoekt-webserver-0,localhost,127.0.0.1'
```

<Callout type="warning">Failure to configure `NO_PROXY` correctly can cause the proxy configuration to interfere with
4 changes: 1 addition & 3 deletions docs/admin/deploy/docker-compose/configuration.mdx
@@ -125,7 +125,7 @@ If you must use a `.netrc` file to store these credentials instead, follow the p

## Add replicas

When adding replicas for `gitserver`, `indexed-search`, `searcher`, or `symbols`, you must update the corresponding environment variable on each of the frontend services in your docker-compose.override.yaml file, `SRC_GIT_SERVERS`, `INDEXED_SEARCH_SERVERS`, `SEARCHER_URL`, and `SYMBOLS_URL` to the number of replicas for each respective service. Sourcegraph will then automatically infer the endpoints for each replica.
When adding replicas for `gitserver`, `indexed-search`, or `searcher`, you must update the corresponding environment variable on each of the frontend services in your docker-compose.override.yaml file, `SRC_GIT_SERVERS`, `INDEXED_SEARCH_SERVERS`, and `SEARCHER_URL` to the number of replicas for each respective service. Sourcegraph will then automatically infer the endpoints for each replica.

```yaml
# docker-compose.override.yaml
# @@ -136,14 +136,12 @@
services:
- 'SRC_GIT_SERVERS=2'
- 'INDEXED_SEARCH_SERVERS=2'
- 'SEARCHER_URL=1'
- 'SYMBOLS_URL=1'

sourcegraph-frontend-internal:
environment:
- 'SRC_GIT_SERVERS=2'
- 'INDEXED_SEARCH_SERVERS=2'
- 'SEARCHER_URL=1'
- 'SYMBOLS_URL=1'
```

## Shard gitserver
6 changes: 0 additions & 6 deletions docs/admin/deploy/docker-compose/operations.mdx
@@ -71,9 +71,7 @@ prometheus /bin/prom-wrapper Up
query-runner /sbin/tini -- /usr/local/b ... Up
redis-cache /sbin/tini -- redis-server ... Up 6379/tcp
redis-store /sbin/tini -- redis-server ... Up 6379/tcp
repo-updater /sbin/tini -- /usr/local/b ... Up
searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy)
symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp
syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp
worker /sbin/tini -- /usr/local/b ... Up 3189/tcp
zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up
@@ -151,9 +149,7 @@ prometheus /bin/prom-wrapper Up
query-runner /sbin/tini -- /usr/local/b ... Up
redis-cache /sbin/tini -- redis-server ... Up 6379/tcp
redis-store /sbin/tini -- redis-server ... Up 6379/tcp
repo-updater /sbin/tini -- /usr/local/b ... Up
searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy)
symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp
syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp
worker /sbin/tini -- /usr/local/b ... Up 3189/tcp
zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up
@@ -221,9 +217,7 @@ prometheus /bin/prom-wrapper Up
query-runner /sbin/tini -- /usr/local/b ... Up
redis-cache /sbin/tini -- redis-server ... Up 6379/tcp
redis-store /sbin/tini -- redis-server ... Up 6379/tcp
repo-updater /sbin/tini -- /usr/local/b ... Up
searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy)
symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp
syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp
worker /sbin/tini -- /usr/local/b ... Up 3189/tcp
zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up
20 changes: 1 addition & 19 deletions docs/admin/deploy/kubernetes/configure.mdx
@@ -994,7 +994,7 @@ patches:

You can update environment variables for **searcher** with `patches`.

For example, to update the value for `SEARCHER_CACHE_SIZE_MB`:

```yaml
# instances/$INSTANCE_NAME/kustomization.yaml
Expand All @@ -1008,21 +1008,6 @@ For example, to update the value for `SEARCHER_CACHE_SIZE_MB`:
value:
name: SEARCHER_CACHE_SIZE_MB
value: "50000"
```

### Symbols

You can update environment variables for **searcher** with `patches`.

For example, to update the value for `SYMBOLS_CACHE_SIZE_MB`:

```yaml
# instances/$INSTANCE_NAME/kustomization.yaml
patches:
- target:
name: symbols
kind: StatefulSet|Deployment
patch: |-
- op: replace
path: /spec/template/spec/containers/0/env/0
value:
@@ -1098,12 +1083,9 @@ Sourcegraph supports specifying an external Redis server with these environment

When using an external Redis server, the corresponding environment variable must also be added to the following services:


- `sourcegraph-frontend`
- `repo-updater`
- `gitserver`
- `searcher`
- `symbols`
- `worker`
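For illustration, assuming the endpoint variables are `REDIS_CACHE_ENDPOINT` and `REDIS_STORE_ENDPOINT` (the variable names are truncated in this diff view, so verify them for your version), an override for one of these services might look like:

```yaml
# Hypothetical sketch: point a service at an external Redis server.
# Repeat for each service listed above; host and port are placeholders.
sourcegraph-frontend:
  env:
    - name: REDIS_CACHE_ENDPOINT
      value: "redis.example.com:6379"
    - name: REDIS_STORE_ENDPOINT
      value: "redis.example.com:6379"
```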

**Step 1**: Include the `services/redis` component in your components:
2 changes: 0 additions & 2 deletions docs/admin/deploy/kubernetes/index.mdx
@@ -944,11 +944,9 @@ Scale down `deployments` and `statefulSets` that access the database, _this step
The following services must have their replicas scaled to 0:
- Deployments (e.g., `kubectl scale deployment <name> --replicas=0`)
- precise-code-intel-worker
- repo-updater
- searcher
- sourcegraph-frontend
- sourcegraph-frontend-internal
- symbols
- worker
- Stateful sets (e.g., `kubectl scale sts <name> --replicas=0`):
- gitserver
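The scale-down above can be scripted. A sketch, using the service names from the list (the commands are printed rather than executed so you can review them first; add `-n <namespace>` if your instance is not in the default namespace):

```shell
# Print the scale-to-zero commands for every service that talks to the database.
# Drop the leading 'echo' on each line to actually run them.
deployments="precise-code-intel-worker searcher sourcegraph-frontend sourcegraph-frontend-internal worker"
for d in $deployments; do
  echo kubectl scale deployment "$d" --replicas=0
done
echo kubectl scale sts gitserver --replicas=0
```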
4 changes: 3 additions & 1 deletion docs/admin/deploy/kubernetes/kustomize/migrate.mdx
@@ -15,7 +15,7 @@ Here are the benefits of the new base cluster with the new Kustomize setup compa
- Streamlined resource allocation process:
* Allocates resources based on the size of the instance
* Optimized through load testing
* The searcher and symbols use StatefulSets and do not require ephemeral storage
* The searcher StatefulSet does not require ephemeral storage
- Utilizes the Kubernetes-native tool Kustomize:
* Built into kubectl
* No additional scripting required
@@ -192,6 +192,8 @@ If your instance was deployed using the non-privileged overlay, you can follow t

## Step 9: Build and review new manifests

> NOTE: Symbols has been removed in Sourcegraph 6.4.

`pgsql`, `codeinsights-db`, `searcher`, `symbols`, and `codeintel-db` have been changed from `Deployments` to `StatefulSets`. However, redeploying these services as StatefulSets should not affect your existing deployment as they are all configured to use the same PVCs.

### From Deployment to StatefulSet
2 changes: 0 additions & 2 deletions docs/admin/deploy/kubernetes/operations.mdx
@@ -429,11 +429,9 @@ precise-code-intel-worker ClusterIP 10.72.11.102 <none> 3188/TC
prometheus ClusterIP 10.72.12.201 <none> 30090/TCP 25h
redis-cache ClusterIP 10.72.15.138 <none> 6379/TCP,9121/TCP 25h
redis-store ClusterIP 10.72.4.162 <none> 6379/TCP,9121/TCP 25h
repo-updater ClusterIP 10.72.11.176 <none> 3182/TCP,6060/TCP 25h
searcher ClusterIP None <none> 3181/TCP,6060/TCP 23h
sourcegraph-frontend ClusterIP 10.72.12.103 <none> 30080/TCP,6060/TCP 25h
sourcegraph-frontend-internal ClusterIP 10.72.9.155 <none> 80/TCP 25h
symbols ClusterIP None <none> 3184/TCP,6060/TCP 23h
syntect-server ClusterIP 10.72.14.49 <none> 9238/TCP,6060/TCP 25h
worker ClusterIP 10.72.7.72 <none> 3189/TCP,6060/TCP 25h
```
4 changes: 1 addition & 3 deletions docs/admin/deploy/kubernetes/scale.mdx
@@ -15,10 +15,9 @@ For production environments, we recommend allocating resources based on your [inst
Here is a simplified list of the key parameters to tune when scaling Sourcegraph to many repositories:

- `sourcegraph-frontend` CPU/memory resource allocations
- `searcher` replica count
- `searcher` replica count and CPU/memory resource allocations
- `indexedSearch` replica count and CPU/memory resource allocations
- `gitserver` replica count
- `symbols` replica count and CPU/memory resource allocations
- `gitMaxConcurrentClones`, because `git clone` and `git fetch` operations are IO and CPU-intensive
- `repoListUpdateInterval` (in minutes), because each interval triggers `git fetch` operations for all repositories
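The last two parameters live in site configuration. A minimal sketch (the values shown are illustrative, not tuning recommendations):

```json
{
  "gitMaxConcurrentClones": 5,
  "repoListUpdateInterval": 1
}
```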

@@ -38,7 +37,6 @@ Here is a simplified list of key parameters to tune when scaling Sourcegraph to
- `sourcegraph-frontend` CPU/memory resource allocations
- `searcher` CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories)
- `indexedSearch` CPU/memory resource allocations (for the `zoekt-indexserver` pod, allocate enough memory to hold all non-binary files in your largest repository; for the `zoekt-webserver` pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories)
- `symbols` CPU/memory resource allocations
- `gitserver` CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories)

---
20 changes: 10 additions & 10 deletions docs/admin/deploy/kubernetes/troubleshoot.mdx
@@ -126,34 +126,34 @@ This error occurs because Envoy, the proxy used by Istio, [drops proxied trailer

In a service mesh like Istio, communication between services is secured using a feature called mutual Transport Layer Security (mTLS). mTLS relies on services communicating with each other using DNS names, rather than IP addresses, to identify the specific services or pods that the communication is intended for.

To illustrate this, consider the following examples of communication flows between the "frontend" component and the "symbols" component:
To illustrate this, consider the following examples of communication flows between the "frontend" component and the "searcher" component:

Example 1: Approved Communication Flow

1. Frontend sends a request to `http://symbol_pod_ip:3184`
1. Frontend sends a request to `http://searcher_pod_ip:3181`
2. The Envoy sidecar intercepts the request
3. Envoy looks up the upstream service using the DNS name "symbols"
4. Envoy forwards the request to the symbols component
3. Envoy looks up the upstream service using the DNS name "searcher"
4. Envoy forwards the request to the searcher component

Example 2: Disapproved Communication Flow

1. Frontend sends a request to `http://symbol_pod_ip:3184`
1. Frontend sends a request to `http://searcher_pod_ip:3181`
2. The Envoy sidecar intercepts the request
3. Envoy tries to look up the upstream service using the IP address `symbol_pod_ip`
3. Envoy tries to look up the upstream service using the IP address `searcher_pod_ip`
4. Envoy is unable to find the upstream service because it's an IP address not a DNS name
5. Envoy will not forward the request to the symbols component
5. Envoy will not forward the request to the searcher component

> NOTE: When using mTLS, communication between services must be made using the DNS names of the services, rather than their IP addresses. This is to ensure that the service mesh can properly identify and secure the communication.

To resolve this issue, the solution is to redeploy the frontend after specifying the service address for symbols by setting the SYMBOLS_URL environment variable in frontend.
To resolve this issue, the solution is to redeploy the frontend after specifying the service address for searcher by setting the SEARCHER_URL environment variable in frontend.

Please make sure the old frontend pods are removed.

```yaml
SYMBOLS_URL=http:symbols:3184
SEARCHER_URL=http://searcher:3181
```

> WARNING: **This option is recommended only for symbols with a single replica**. Enabling this option will negatively impact the performance of the symbols service when it has multiple replicas, as it will no longer be able to distribute requests by repository/commit.
> WARNING: **This option is recommended only for searcher with a single replica**. Enabling this option will negatively impact the performance of the searcher service when it has multiple replicas, as it will no longer be able to distribute requests by repository/commit.

#### Squirrel.LocalCodeIntel http status 502
