import Layout from '~/layouts/DefaultGuideLayout'

export const meta = {
  id: 'ai-choosing-instance-type',
  title: 'Choosing Instance Type',
  description: 'Choosing the right instance type for your workload.',
  subtitle: 'Choosing the right instance type for your workload.',
  sidebar_label: 'Choosing Instance Type',
}

This guide helps you choose the right instance type for your workload. We offer general guidance rather than specific instructions for every possible use case; the goal is to give you a starting point from which you can run your own benchmarks and optimizations.

For more information about engineering at scale, see our [Engineering for Scale](/docs/guides/ai/engineering-for-scale) guide.

## Simple workloads

We ran a set of benchmarks using the [gist-960-angular](http://corpus-texmex.irisa.fr/) dataset, which contains 1,000,000 image embeddings of 960 dimensions each.
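
The files in this dataset use the TexMex `.fvecs` layout. If you want to reproduce the setup, a minimal reader along these lines works; the file name is a placeholder for whatever you extracted from the archive:

```python
import numpy as np

def read_fvecs(path: str) -> np.ndarray:
    """Read a TexMex .fvecs file: each vector is stored as an int32
    dimension followed by that many float32 components."""
    raw = np.fromfile(path, dtype=np.float32)
    dim = raw[:1].view(np.int32)[0]          # first 4 bytes hold the dimension
    return raw.reshape(-1, dim + 1)[:, 1:]   # drop the per-vector dimension column

vectors = read_fvecs("gist_base.fvecs")      # placeholder path; shape (1_000_000, 960)
```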
We used [Vecs](https://github.com/supabase/vecs) to create a collection, upload the embeddings to a single table, and create an `inner-product` index for the embedding column. We then ran a series of queries to measure the performance of different instance types:
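
As a rough sketch of that setup, assuming the `vectors` array from the reader above (the connection string is a placeholder, and exact vecs signatures may vary slightly between versions):

```python
import vecs

# Placeholder connection string for your Postgres instance.
vx = vecs.create_client("postgresql://postgres:password@localhost:5432/postgres")

# One collection backed by a single table; gist-960 embeddings have 960 dimensions.
docs = vx.get_or_create_collection(name="gist960", dimension=960)

# Upload embeddings as (id, vector, metadata) records; batch these in practice.
docs.upsert(records=[(str(i), vec, {}) for i, vec in enumerate(vectors)])

# Build an inner-product index on the embedding column.
docs.create_index(measure=vecs.IndexMeasure.max_inner_product)

# Nearest neighbors of a query vector by inner product.
results = docs.query(data=vectors[0], limit=10, measure="max_inner_product")
```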
### Results

On the smaller instances, the number of vectors from `gist-960-angular` was reduced so that the dataset fits the instance's memory.

| Plan   | CPU    | Memory | Vectors | RPS | Mean latency | p95 latency | Max CPU usage | Max memory usage   |
| ------ | ------ | ------ | ------- | --- | ------------ | ----------- | ------------- | ------------------ |
| Free   | 2-core | 1 GB   | 30,000  | 75  | 0.065 sec    | 0.088 sec   | 90%           | 1 GB + 100 MB swap |
| Small  | 2-core | 2 GB   | 100,000 | 78  | 0.064 sec    | 0.092 sec   | 80%           | 1.8 GB             |
| Medium | 2-core | 4 GB   | 250,000 | 58  | 0.085 sec    | 0.129 sec   | 90%           | 3.2 GB             |
| Large  | 2-core | 8 GB   | 500,000 | 55  | 0.088 sec    | 0.140 sec   | 90%           | 5 GB               |

The following results use the full `gist-960-angular` dataset: 1,000,000 vectors.

| Plan | CPU     | Memory | Vectors   | RPS  | Mean latency | p95 latency | Max CPU usage | Max memory usage |
| ---- | ------- | ------ | --------- | ---- | ------------ | ----------- | ------------- | ---------------- |
| XL   | 4-core  | 16 GB  | 1,000,000 | 110  | 0.046 sec    | 0.070 sec   | 45%           | 14 GB            |
| 2XL  | 8-core  | 32 GB  | 1,000,000 | 235  | 0.083 sec    | 0.136 sec   | 33%           | 10 GB            |
| 4XL  | 16-core | 64 GB  | 1,000,000 | 420  | 0.071 sec    | 0.106 sec   | 45%           | 11 GB            |
| 8XL  | 32-core | 128 GB | 1,000,000 | 815  | 0.072 sec    | 0.106 sec   | 75%           | 13 GB            |
| 12XL | 48-core | 192 GB | 1,000,000 | 1150 | 0.052 sec    | 0.078 sec   | 70%           | 15.5 GB          |
| 16XL | 64-core | 256 GB | 1,000,000 | 1345 | 0.072 sec    | 0.106 sec   | 60%           | 17.5 GB          |

- `lists` set to `number of vectors / 1000`
- `probes` set to `10` (see the sketch after this list for how these map to pgvector settings)
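
These are pgvector's IVFFlat tuning knobs, and you can also set them with plain SQL. A hedged sketch from Python (the `vecs.gist960` table and `vec` column follow vecs' naming conventions but are assumptions here, as is the connection string):

```python
import psycopg2

# Placeholder connection string.
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/postgres")
cur = conn.cursor()

num_vectors = 1_000_000
lists = num_vectors // 1000  # lists = number of vectors / 1000, as in the benchmarks

# Build the inner-product IVFFlat index with an explicit `lists` value.
cur.execute(
    f"CREATE INDEX ON vecs.gist960 USING ivfflat (vec vector_ip_ops) WITH (lists = {lists})"
)

# `probes` is a per-session setting: more probes means better recall, higher latency.
cur.execute("SET ivfflat.probes = 10")
conn.commit()
```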
<Admonition type="note">

You can store more than 1,000,000 vectors in a single table if memory allows it (for example, on a 2XL instance or larger), but this will affect query performance: RPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per instance.

</Admonition>

## Methodology

We follow the techniques outlined in the [ANN Benchmarks](https://github.com/erikbern/ann-benchmarks) methodology. A Python test runner is responsible for uploading the data, creating the index, and running the queries. The pgvector engine is implemented using [vecs](https://github.com/supabase/vecs), a Python client for pgvector.

<div>
  <img
    alt="Vecs benchmark methodology"
    className="dark:hidden"
    src="/docs/img/ai/instance-type/vecs-benchmark--light.png"
  />
  <img
    alt="Vecs benchmark methodology"
    className="hidden dark:block"
    src="/docs/img/ai/instance-type/vecs-benchmark--dark.png"
  />
</div>

Each test runs for at least 30-40 minutes and consists of a series of experiments executed at different concurrency levels to measure the engine's performance under different load types. The results are then averaged.

As a general recommendation, we suggest using a concurrency level of 5 or more for most workloads and 30 or more for high-load workloads.
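
To illustrate, here is a generic sketch of running queries at a fixed concurrency level (not our actual test runner; `run_query` stands in for whatever issues a single search against the collection):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(run_query, concurrency: int = 5, total_queries: int = 1000):
    """Run `run_query` across `concurrency` workers; report RPS and mean latency."""
    latencies = []

    def timed_query(_):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)  # list.append is thread-safe

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_query, range(total_queries)))
    wall = time.perf_counter() - wall_start

    return {
        "rps": total_queries / wall,
        "mean_latency_sec": sum(latencies) / len(latencies),
    }

# Example: benchmark(lambda: docs.query(data=vectors[0], limit=10), concurrency=5)
```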
## Future benchmarks

We'll continue to add benchmarks for datasets with different vector dimensions and with different numbers of `lists` and `probes` in the index. Stay tuned for more information about how these parameters affect the performance and precision of your queries.

export const Page = ({ children }) => <Layout meta={meta} children={children} />

export default Page