2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/cluster.md
@@ -9,6 +9,8 @@ pagination_next: null

A full cluster-based deployment leveraging **Spice.ai Enterprise**, which includes advanced services and integrations for Kubernetes. This method is ideal for organizations requiring large-scale or complex deployments, including specialized clustering capabilities.

+<img width="740" alt="cluster" src="https://github.com/user-attachments/assets/643e0a5c-6745-40c0-8695-0955c795179b" />

**Benefits**

- Provides **enterprise-grade features**: advanced security, monitoring, and support.
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/hosted.md
@@ -9,6 +9,8 @@ pagination_next: null

The Spice Runtime is deployed on a fully managed service within the Spice Cloud Platform, minimizing the operational burden of managing clusters, upgrades, and infrastructure.

+<img width="740" alt="hosted" src="https://github.com/user-attachments/assets/a985527b-3481-40f4-a689-f784c893b314" />

**Benefits**

- Reduced overhead for deployment, scaling, and maintenance.
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/index.md
@@ -7,6 +7,8 @@ pagination_prev: null
pagination_next: null
---

+<img width="740" alt="Spice ai OSS as a data and AI compute engine over disaggregated storage" src="https://github.com/user-attachments/assets/da3c0e90-4c48-48ca-b4bd-72eda816cfec" />

- [Sidecar Deployment](sidecar.md)
- [Microservice Deployment (Single or Multiple Replicas)](microservice.md)
- [Tiered Deployment](tiered.md)
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/microservice.md
@@ -9,6 +9,8 @@ pagination_next: null

The Spice Runtime operates as an independent microservice. Multiple replicas may be deployed behind a load balancer to achieve high availability and handle spikes in demand.

+<img width="740" alt="microservice" src="https://github.com/user-attachments/assets/b46f050b-e500-4d53-b354-24f0ab30cad3" />
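
As a rough sketch of this topology on Kubernetes, a multi-replica Deployment fronted by a Service provides the load-balanced entry point. The image tag, port numbers, and replica count below are assumptions to verify against your runtime configuration:

```yaml
# Illustrative only — image tag and ports are assumptions, not verified defaults.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spice-runtime
spec:
  replicas: 3                       # scale horizontally for HA and demand spikes
  selector:
    matchLabels:
      app: spice-runtime
  template:
    metadata:
      labels:
        app: spice-runtime
    spec:
      containers:
        - name: spiced
          image: spiceai/spiceai:latest
          ports:
            - containerPort: 8090   # HTTP API (assumed default)
            - containerPort: 50051  # Arrow Flight (assumed default)
---
apiVersion: v1
kind: Service
metadata:
  name: spice-runtime
spec:
  selector:
    app: spice-runtime
  ports:
    - name: http
      port: 8090
    - name: flight
      port: 50051
```

Applications then address the runtime through the `spice-runtime` Service, and the replica count can be adjusted (or driven by an autoscaler) independently of the application.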

**Benefits**

- Loose coupling between the application and the Spice Runtime.
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/sharded.md
@@ -9,6 +9,8 @@ pagination_next: null

Spice Runtime instances can be sharded on specific criteria, such as customer, state, or other logical partitions. Each shard operates independently, giving a 1:N ratio of application to Spice instances.

+<img width="740" alt="sharded" src="https://github.com/user-attachments/assets/5730d108-6d22-4ea4-8c14-8e87ad6d0079" />
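
A minimal sketch of one shard on Kubernetes is shown below; the shard name, labels, mount path, and per-shard spicepod ConfigMap are hypothetical, and one Deployment/Service pair would be repeated per shard:

```yaml
# One Deployment/Service pair per shard (names, labels, and mount path are hypothetical).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spice-shard-us-east          # e.g. sharded by customer region
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spice
      shard: us-east
  template:
    metadata:
      labels:
        app: spice
        shard: us-east
    spec:
      containers:
        - name: spiced
          image: spiceai/spiceai:latest
          volumeMounts:
            - name: spicepod
              mountPath: /app        # shard-specific spicepod.yaml (assumed path)
      volumes:
        - name: spicepod
          configMap:
            name: spicepod-us-east   # datasets and settings for this shard only
---
apiVersion: v1
kind: Service
metadata:
  name: spice-shard-us-east
spec:
  selector:
    app: spice
    shard: us-east
  ports:
    - port: 8090                     # HTTP API port (assumed default)
```

The application resolves the Service for the shard that owns each request — for example, by customer region — so every Spice instance loads and serves only its own partition of the data.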

**Benefits**

- Helps distribute load across multiple instances, improving performance and scalability.
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/sidecar.md
@@ -9,6 +9,8 @@ pagination_next: null

Run the Spice Runtime in a separate container or process on the same machine as the main application. For example, in Kubernetes as a [Sidecar Container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/). This approach minimizes communication overhead as requests to the Spice Runtime are transported over local loopback.

+<img width="740" alt="sidecar" src="https://github.com/user-attachments/assets/716f7c23-1939-4947-85f5-b0ee2bbd63fc" />
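
A minimal sketch of the pattern follows; the application image, the environment variable it reads, and the port numbers are assumptions, and the native sidecar-container mechanism linked above (a restartable init container) can be used instead of a plain second container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest             # hypothetical application image
      env:
        - name: SPICE_HTTP_ENDPOINT    # hypothetical variable read by the app
          value: http://localhost:8090 # loopback — both containers share the pod network
    - name: spiced
      image: spiceai/spiceai:latest    # assumed image tag
      ports:
        - containerPort: 8090          # HTTP API (assumed default)
        - containerPort: 50051         # Arrow Flight (assumed default)
```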

**Benefits**

- Low-latency communication between the application and the Spice Runtime.
2 changes: 2 additions & 0 deletions website/docs/deployment/architectures/tiered.md
@@ -9,6 +9,8 @@ pagination_next: null

A hybrid approach combining sidecar deployments for performance-critical tasks and a shared microservice for batch processing or less time-sensitive workloads.

+<img width="740" alt="tiered" src="https://github.com/user-attachments/assets/e602bad4-bd0d-4069-bc91-5b5678a10710" />
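
One way to express the split in application configuration is sketched below; the variable names and endpoints are purely illustrative:

```yaml
# Hypothetical application ConfigMap: latency-sensitive calls use the in-pod
# sidecar over loopback, while batch/analytical work goes to the shared service.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
data:
  SPICE_REALTIME_ENDPOINT: http://localhost:8090   # sidecar (assumed default port)
  SPICE_BATCH_ENDPOINT: http://spice-runtime:8090  # shared microservice Service
```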

**Benefits**

- Real-time responsiveness where needed (sidecar).
2 changes: 1 addition & 1 deletion website/docs/features/large-language-models/index.md
@@ -11,7 +11,7 @@ tags:

Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). It offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL querying across federated datasets and an advanced search feature (see [Search](/docs/features/search)).

-![Spice.ai Large-Language-Model (LLM) AI-Gateway](/img/features/ai-gateway.png).
+<img width="740" alt="ai-gateway" src="https://github.com/user-attachments/assets/4a45cd62-ebfc-4a73-956d-661f1ab44cd8" />
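
As a hedged sketch, a model can be registered with the gateway in `spicepod.yaml`; the version string, model id, component name, and secret reference below are assumptions to verify against the current Spicepod reference:

```yaml
# Illustrative spicepod.yaml fragment — field values are assumptions.
version: v1beta1
kind: Spicepod
name: llm-gateway
models:
  - from: openai:gpt-4o-mini   # hosted model routed through the gateway
    name: assistant            # exposed as the `model` name on the OpenAI-compatible API
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```

Any OpenAI-compatible client can then typically target the runtime's `/v1/chat/completions` endpoint and request the `assistant` model.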

Spice supports **full OpenTelemetry observability**, enabling detailed tracking of model tool use, recursion, data flows, and requests for full transparency and easier debugging.

2 changes: 1 addition & 1 deletion website/docs/features/observability/index.md
@@ -9,7 +9,7 @@ pagination_next: null

Spice can be monitored through its [Prometheus-compatible metrics endpoint](https://prometheus.io/docs/instrumenting/exposition_formats/#basic-info).

-![Spice.ai Open Source Monitoring & Observability](/img/features/observability.png)
+<img width="740" alt="observability" src="https://github.com/user-attachments/assets/2468e3e7-4fb4-4a74-8b26-45eeeee90310" />
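
A minimal Prometheus scrape configuration for this endpoint might look like the sketch below; the target host and port are assumptions and should match the runtime's configured metrics address:

```yaml
# prometheus.yml fragment — target host/port are assumptions.
scrape_configs:
  - job_name: spice
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]   # Spice metrics endpoint (assumed default port)
```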

Monitoring client configuration:
