TextEmbed is a high-throughput, low-latency REST API designed for serving vector embeddings. It supports a wide range of sentence-transformer models and frameworks, making it suitable for various applications in natural language processing.
- High Throughput & Low Latency: Designed to handle a large number of requests efficiently.
- Flexible Model Support: Works with various sentence-transformer models.
- Scalable: Easily integrates into larger systems and scales with demand.
- Batch Processing: Processes requests in batches for higher throughput and faster inference.
- OpenAI-Compatible REST API Endpoint: Exposes an endpoint compatible with the OpenAI embeddings API, so existing OpenAI clients can be used unchanged (see the example after this list).
- Single-Command Deployment: Deploy multiple models with a single command.
- Multiple Embedding Formats: Supports binary, float16, and float32 embedding formats for smaller payloads and faster retrieval.
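Because the endpoint follows the OpenAI embeddings API, the official `openai` Python client can talk to it directly. The sketch below assumes a server running locally on port 8000 under the standard `/v1` path; the base URL, placeholder API key, and model name are illustrative and should be adjusted to your deployment.

```python
from openai import OpenAI

# Point the OpenAI client at a local TextEmbed server.
# Base URL, port, and the API key placeholder are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Request embeddings for a small batch of texts. The model name is
# illustrative and must match one of the models the server was started with.
response = client.embeddings.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input=[
        "TextEmbed serves embeddings over REST.",
        "Batching improves throughput.",
    ],
)

print(len(response.data))               # number of embeddings returned
print(response.data[0].embedding[:5])   # first few dimensions of the first vector
```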
Ensure you have Python 3.10 or higher installed.
- Install the required dependencies:

  `pip install -U textembed`
- Start the TextEmbed server with your desired models:

  `python3 -m textembed.server --models <Model1>,<Model2> --port <Port>`
Replace `<Model1>` and `<Model2>` with the names of the models you want to use, separated by commas, and `<Port>` with the port number on which you want to run the server.
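For example, to serve two models on port 8000 (the model names below are illustrative; any supported sentence-transformer model should work):

```bash
# Serve two embedding models on port 8000 (model names are examples only).
python3 -m textembed.server \
    --models sentence-transformers/all-MiniLM-L6-v2,BAAI/bge-base-en-v1.5 \
    --port 8000
```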
For more information about Docker deployment and configuration, please refer to setup.md in the documentation.