This project uses the GPT Cache library to build a cacheable GPT-3/GPT-4 response service with FastAPI, the OpenAI API, Faiss, and SQLite. GPT Cache improves performance and reduces query costs by caching previous responses and reusing them for similar queries. The implementation emphasizes code readability, modularity, and application performance through local file-based caches.
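As a rough sketch of how these pieces fit together (module paths follow the upstream GPT Cache examples; the actual wiring in this repository may differ), the cache is initialized once with a Faiss index for similarity search and a SQLite store for cached answers, and OpenAI calls then go through the library's drop-in adapter:

```python
# Minimal sketch of a GPT Cache setup with Faiss + SQLite, based on the
# upstream GPTCache examples; the repository's actual wiring may differ.
from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()  # local embedding model used to compare prompts
data_manager = get_data_manager(
    CacheBase("sqlite"),                            # cached answers in a local SQLite file
    VectorBase("faiss", dimension=onnx.dimension),  # local Faiss index for similarity search
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

# Requests through the adapter are served from the cache when a similar
# prompt has been seen before; otherwise they are forwarded to the OpenAI API
# and the answer is stored for reuse.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What does GPT Cache do?"}],
)
```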
- Leverage the GPT Cache library to optimize performance and reduce query costs
- Use FastAPI, OpenAI API, Faiss, and SQLite for a cacheable GPT-3 or GPT-4 response service
- Improve code readability and modularity by adopting best software development practices
- Use local file caches to enhance application performance
- Launch a beta version of the application with Docker support
- Easily deploy and manage the service using docker-compose
- Python 3.8 or newer
- Docker and docker-compose installed
- Clone this repository:

  ```bash
  git clone https://github.com/SelectCode/GPTCacheService.git
  cd GPTCacheService
  ```
- Create a `.env` file in the project root directory with the following content:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ```

  Replace `your_openai_api_key_here` with your actual OpenAI API key.
- Build and run the project using Docker and docker-compose:

  ```bash
  docker-compose build
  docker-compose up -d
  ```
- Visit `http://localhost:8000` in your browser to access the API; the interactive FastAPI documentation is served at `http://localhost:8000/docs` by default.
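Alternatively, a quick smoke test from Python (a minimal sketch, assuming the default port mapping of 8000 from docker-compose):

```python
import requests

# Check that the service is reachable on the default port mapping (8000).
resp = requests.get("http://localhost:8000/")
print(resp.status_code, resp.text)
```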
There are two main endpoints for query requests:

- `GET /query?prompt=<your_prompt_here>`
- `POST /query` with a JSON body: `{ "prompt": "your prompt here" }`

Additionally, the root endpoint (`GET /`) returns a simple greeting message.
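For orientation, these endpoints could be wired to the cached OpenAI adapter roughly as follows. This is a sketch, not the repository's actual code: the `ask` helper, the `QueryRequest` model, the greeting text, and the `{"answer": ...}` response shape are illustrative assumptions, and the cache is assumed to have been initialized as shown earlier.

```python
# Illustrative sketch of the query endpoints; names such as QueryRequest,
# ask, and the {"answer": ...} response shape are assumptions, not the
# repository's actual code. Assumes GPT Cache was initialized beforehand.
from fastapi import FastAPI
from pydantic import BaseModel
from gptcache.adapter import openai

app = FastAPI()


class QueryRequest(BaseModel):
    prompt: str


def ask(prompt: str) -> str:
    # Goes through the GPT Cache adapter: similar prompts hit the cache,
    # otherwise a real OpenAI call is made and its answer stored for reuse.
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion["choices"][0]["message"]["content"]


@app.get("/")
def root():
    return {"message": "Hello from the GPT cacheable response service"}


@app.get("/query")
def query_get(prompt: str):
    return {"answer": ask(prompt)}


@app.post("/query")
def query_post(request: QueryRequest):
    return {"answer": ask(request.prompt)}
```

With the stack running, `curl "http://localhost:8000/query?prompt=Hello"` exercises the GET variant; repeating a similar prompt should be answered from the cache rather than triggering a new OpenAI call.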
- Further expand API functionality by adding more endpoints and options
- Add support for other GPT models, not just GPT-3 and GPT-4
- Improve cache performance by exploring additional storage options, such as Redis or other distributed databases
This project is built using the GPT Cache library as the main component for optimizing GPT queries and managing cached responses.