0001: Service Config #2

Merged
merged 10 commits into from
Dec 27, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
.idea
tmp
1 change: 1 addition & 0 deletions README.md
@@ -15,5 +15,6 @@ To start a new proposal, you need to:

| GEP | Title |
|---------------------------------|----------------|
| [0001](./proposals/0001-gep.md) | Service Config |


229 changes: 229 additions & 0 deletions proposals/0001-gep.md
@@ -0,0 +1,229 @@
---
GEP: 1
Title: Service Config
Discussion: N/A
Implementation: N/A
---

# Service Config

## Abstract

Glide will have many configs that control both functional and non-functional aspects of how the service works.
This GEP defines how that config is set, structured, and maintained across the service lifetime.

## Motivation

Glide is a gateway that exposes many configuration options to set, manage, and maintain.
This provides the flexibility needed to meet the varied needs of our end users.

For example:

- **Telemetry Configs**: Log Output Format, Min Log Level, OTEL Configs
- **HTTP Server Configs**: Is enabled?, IP and Port to bind, max body size, CORS, etc.
- **SSL/TLS Configs:** Ciphers, Cert Paths, etc.
- **LLM Provider Configs**: API Keys, Default Parameters, Timeouts, Connection Pool Configs, etc.
- **Routing and Load Balancing Configs**: Balancing Strategy, Routing Weights, Provider Pool Definitions, etc.
- **Caching Configs**: Redis Connection Configs, TTLs, etc.
- **gRPC Server Configs**: Is enabled?, IP and Port to bind, etc.
- etc.

The configuration may be versioned via git to track and audit all changes.
At the same time, some configs are sensitive data (notably, LLM API keys),
and they must not be provided directly in the config (otherwise, they would leak into git).

As with any other config, users should be able to apply new changes easily.
For example, API keys may need to be rotated from time to time.

Finally, the more configs we expose, the harder it becomes to understand the overall config
and find the right setting to adjust. So a clear config structure is important to reduce this fatigue.

### Requirements

- R1: Config format must be readable
- R2: Config format must allow comments
- R3: Config format must be `git diff`-able and versioned
- R4: Config must be easy to understand and reason about
- R5: Config must allow including sensitive values from environment variables
- R6: Config may allow including sensitive values from other plain files
- R7: Config should be easy to change and operate, for example via "hitless" reloads

## Design

There are a few possible sources of configuration:
- via Glide CLI params & args
- via environment variables
- via a separate config file
- via a combination of CLI, environment variables, and/or config file

Contributor: How is a config file accessed by the providers?

Member Author: The config is going to be a singleton that is initialized and loaded in the Gateway class. Then either the whole config or a specific part of it, like the HTTP Server Config, is passed to the corresponding structs that need it. Something like that. I did not mention that detail here, as the contract is more important to decide on and discuss than how we get there under the hood.


Setting complex nested configs via CLI or environment variables is usually a tedious, error-prone process.
So Glide will try to rely on the config-as-a-file approach for as long as possible.

### The Config Source

Glide config is represented as a YAML file (R1, R2, R3).

Contributor: I agree. Should be YAML.

YAML is a human-friendly alternative to JSON that supports comments. Useful extras, such as including environment variables, can be layered on top when the file is loaded.
Go has good support for the YAML format.

### The Config Structure

The config structure uses Glide's internal logical structure that we are going to communicate in Glide docs (R4).
So learning more about how Glide works should give enough context to understand the config.
The opposite should be true as well: knowing how the config looks should give some understanding of Glide's overall design.
This is the route that the OTEL Collector config takes.

Glide should allow you to pull config values from environment variables (see the config example below).
This keeps sensitive information out of the config file (R5), which avoids leaks.

### Config Ops

Glide should be able to automatically detect changes to the config (especially, our domain-related configs like routing) and
reload its internal representation together with the related state (R7).
In Kubernetes, configs are going to be represented as a ConfigMap mounted into Glide pods.
When the ConfigMap changes, the mounted file changes as well, so we need to watch for file changes there.
Outside of Kubernetes, we may support config reloading on SIGHUP signals.

### Config Example

The most minimal config setup would look like this:

```yaml
routes:
  language:
    - name: openai-pool
      providers:
        - name: openai-boring
          provider: openai
          model: gpt-3.5-turbo
          api_key: ${env:OPENAI_API_KEY}
          default_params:
            temperature: 0
```
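
To make the shape of the minimal config more concrete, here is a sketch of Go types that mirror it, together with a simple validation pass. The field names follow the YAML keys above; the type names, the `Validate` method, and the specific rules it enforces are assumptions for illustration. The `yaml` struct tags only document the intended mapping, since decoding YAML needs a third-party package that is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider mirrors one entry under `providers` in the minimal example.
type Provider struct {
	Name          string         `yaml:"name"`
	Provider      string         `yaml:"provider"`
	Model         string         `yaml:"model"`
	APIKey        string         `yaml:"api_key"`
	DefaultParams map[string]any `yaml:"default_params"`
}

// Pool mirrors one entry under `routes.language`.
type Pool struct {
	Name      string     `yaml:"name"`
	Providers []Provider `yaml:"providers"`
}

// Routes groups pools by model type; only `language` is modelled here.
type Routes struct {
	Language []Pool `yaml:"language"`
}

// Config is the root of the config file.
type Config struct {
	Routes Routes `yaml:"routes"`
}

// Validate (hypothetical rules) collects every problem it finds instead of
// stopping at the first, so the whole file can be fixed in one pass.
func (c Config) Validate() error {
	var errs []error
	if len(c.Routes.Language) == 0 {
		errs = append(errs, errors.New("routes.language: at least one pool is required"))
	}
	for i, pool := range c.Routes.Language {
		if pool.Name == "" {
			errs = append(errs, fmt.Errorf("routes.language[%d]: name is required", i))
		}
		if len(pool.Providers) == 0 {
			errs = append(errs, fmt.Errorf("routes.language[%d]: at least one provider is required", i))
		}
		for j, p := range pool.Providers {
			if p.Provider == "" {
				errs = append(errs, fmt.Errorf("routes.language[%d].providers[%d]: provider is required", i, j))
			}
			if p.APIKey == "" {
				errs = append(errs, fmt.Errorf("routes.language[%d].providers[%d]: api_key is required", i, j))
			}
		}
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	cfg := Config{Routes: Routes{Language: []Pool{{
		Name: "openai-pool",
		Providers: []Provider{{
			Name:          "openai-boring",
			Provider:      "openai",
			Model:         "gpt-3.5-turbo",
			APIKey:        "resolved-from-env",
			DefaultParams: map[string]any{"temperature": 0},
		}},
	}}}}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config is valid")
}
```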

Here is a richer config example to give a better sense of the config structure:

```yaml
telemetry:
  logging:
    level: info
    encoding: console # console, json
  # other configs

api:
  http:
    listen_addr: 0.0.0.0:7685
    max_body_size: "2Mi"
    tls:
      ca_path:
      cert_path:
    # other configs

  grpc:
    listen_addr: 0.0.0.0:7686
    tls:
      ca_path:
      cert_path:
    # other configs

routes:
  # Review thread on `routes`:
  #
  # Member Author: @mkrueger12 our conversation around the embeddings API in GEP0002 and the ideas to support
  # TTS & STT models have opened my eyes to the previously proposed strategy, which is to have one uniform list
  # of pools that support all API Glide providers.
  #
  # I think that if you consider these two types of APIs, it's clear that the idea is not super viable.
  # There is no reason to impose the LLM lifecycle on something like STT/TTS. They are just too different,
  # and we should handle them with their specifics in mind.
  #
  # Language, embeddings, transcribers, and synthesizers are just too different to treat the same way,
  # and we might end up in trouble trying to go that route.
  #
  # So the new idea is to have type-specific model pools (e.g. language, embeddings, transcribers, synthesizers).
  # This should help separate the logic needed for one type of model (fallbacks, load balancing,
  # unified request/response schemas, etc.) from the logic that is appropriate for another.
  #
  # The current config examples reflect this approach.
  #
  # Feel free to take a look and let me know how that feels 🙌
  #
  # Contributor: Yep, this makes complete sense. Let's go that route.

  language:
    - name: openai-pool
      balancing: priority # round-robin, weighted-round-robin, priority, least-latency, etc.
      providers:
        - name: openai-boring
          provider: openai # anthropic, azureopenai, gemini, other providers we support
          model: gpt-3.5-turbo
          api_key: ${env:OPENAI_API_KEY}
          default_params:
            temperature: 0
        - name: anthropic-boring
          provider: anthropic
          model: claude-2
          api_key: ${env:ANTHROPIC_API_KEY}
          default_params: # set the default request params
            temperature: 0

    - name: latency-critical-pool
      balancing: least-latency
      providers:
        - name: openai-boring
          provider: openai
          model: gpt-3.5-turbo
          api_key: ${env:OPENAI_API_KEY}
          timeout_ms: 200
        - name: anthropic-boring
          provider: anthropic
          api_key: ${env:ANTHROPIC_API_KEY}
          timeout_ms: 200

    - name: anthropic
      balancing: weighted-round-robin
      providers:
        - name: openai
          provider: openai
          api_key: ${env:OPENAI_API_KEY}
          weight: 30
        - name: anthropic
          provider: anthropic
          api_key: ${env:ANTHROPIC_API_KEY}
          weight: 70

    - name: ab-test1
      balancing: weighted-round-robin
      providers:
        - name: openai-gpt4
          provider: openai
          model: gpt-4
          api_key: ${env:OPENAI_API_KEY}
          weight: 50
          default_params:
            temperature: 0.7

        - name: openai-chatgpt
          provider: openai
          model: gpt-3.5-turbo
          api_key: ${env:OPENAI_API_KEY}
          weight: 50
          default_params:
            temperature: 0.7

        # we want to use OpenAI only in this A/B test, but that's bad resiliency-wise
        # so add Anthropic just in case OpenAI is down
        - name: anthropic-fallback
          provider: anthropic
          model: claude-2
          api_key: ${env:ANTHROPIC_API_KEY}
          weight: 0

  # in case of speech-to-text models
  # transcribers:
  #   - ...

  # in case of text-to-speech models
  # synthesizers:
  #   - ...

  # in case of embeddings API
  # embeddings:
  #   - ...

```
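
The proposal does not define how each balancing strategy behaves. As one possible reading of the `weighted-round-robin` pools above, including the `weight: 0` fallback provider, here is a Go sketch based on smooth weighted round-robin. The `Provider` and `Pool` types, the `Healthy` flag, and the rule that zero-weight providers are used only when no weighted provider is healthy are all assumptions for illustration.

```go
package main

import "fmt"

// Provider is a hypothetical runtime view of one configured provider.
type Provider struct {
	Name    string
	Weight  int
	Healthy bool
	current int // running score used by smooth weighted round-robin
}

// Pool is a hypothetical weighted-round-robin pool.
type Pool struct {
	providers []*Provider
}

// Next picks the next provider. Healthy providers with a positive weight are
// chosen in proportion to their weights. If none of them is healthy, the
// first healthy zero-weight provider is returned as a fallback. Next returns
// nil when no provider is healthy.
func (p *Pool) Next() *Provider {
	var best *Provider
	total := 0
	for _, pr := range p.providers {
		if !pr.Healthy || pr.Weight <= 0 {
			continue
		}
		pr.current += pr.Weight
		total += pr.Weight
		if best == nil || pr.current > best.current {
			best = pr
		}
	}
	if best != nil {
		best.current -= total
		return best
	}
	for _, pr := range p.providers {
		if pr.Healthy {
			return pr
		}
	}
	return nil
}

func main() {
	pool := &Pool{providers: []*Provider{
		{Name: "openai", Weight: 30, Healthy: true},
		{Name: "anthropic", Weight: 70, Healthy: true},
	}}
	counts := map[string]int{}
	for i := 0; i < 100; i++ {
		counts[pool.Next().Name]++
	}
	fmt.Println(counts["openai"], counts["anthropic"])
}
```

With the 30/70 weights from the `anthropic` pool example, 100 consecutive picks split 30 to 70, and the picks are interleaved rather than sent in long runs to one provider.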

### References

- [OTEL Collector: Config Sample](https://opentelemetry.io/docs/collector/configuration/)
- [MLFlow AI Gateway: Configuration](https://mlflow.org/docs/latest/llms/gateway/index.html#ai-gateway-configuration)
- [etcd: Configuration Options](https://etcd.io/docs/v3.4/op-guide/configuration/)
- [etcd: Config Sample](https://github.com/etcd-io/etcd/blob/release-3.4/etcd.conf.yml.sample)
- [Portkey: Config Sample](https://docs.portkey.ai/portkey-docs/portkey-features/ai-gateway/load-balancing)
- [HashiCorp Vault: Configurations](https://developer.hashicorp.com/vault/docs/configuration)
- [Prometheus AlertManager: Config](https://prometheus.io/docs/alerting/latest/configuration/)
- [Kong: Config](https://github.com/Kong/kong/blob/master/kong.conf.default)

## Alternatives Considered

- **Using JSON as config format**. We could still use JSON as a secondary config format (OTEL Collector does that), but it isn't developer-friendly enough to be our primary config format (it fails R1 and R2, and it's not too strong on R3)

## Future Work

- If our users want that, we could consider supporting more than one config format, such as JSON or TOML.
2 changes: 1 addition & 1 deletion template.md
@@ -17,7 +17,7 @@ Implementation: Link

### Requirements

[TBU, list all key requirements to keep in mind]
[TBU, list all key requirements to keep in mind, use https://datatracker.ietf.org/doc/html/rfc2119 verbs]

## Design
