A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App
Updated Jan 27, 2025 - Python
Create context-aware Q&A interfaces from your own data with LLMs and vector embeddings. Includes an automated embedding pipeline and a model-powered Q&A interface.
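At its core, the retrieval step of such a RAG pipeline is nearest-neighbor search over vector embeddings. A minimal sketch of that step with NumPy (the toy vectors, `top_k_passages` name, and `k` value here are illustrative, not taken from the repository):

```python
import numpy as np

def top_k_passages(query_vec, passage_vecs, k=2):
    """Return indices of the k passages most similar to the query.

    Vectors are L2-normalized so the dot product equals cosine
    similarity, the usual comparison for e5-style embeddings.
    """
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                        # cosine similarity per passage
    return np.argsort(scores)[::-1][:k]  # highest-scoring first

# Toy 3-dimensional "embeddings" standing in for real model output.
passages = np.array([[1.0, 0.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k_passages(query, passages))
```

In a real deployment the vectors would come from an embedding model and the retrieved passages would be stuffed into the LLM prompt as context.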
Production-grade, scalable embedding API server using the SentenceTransformers "intfloat/multilingual-e5-base" model, powered by Ray Serve for multi-GPU orchestration, with Prometheus and Grafana monitoring.
A drop-in replacement for FastAPI that enables scalable, fault-tolerant deployments with Ray Serve.
Ray Serve backend for Arabic Speech Recognition
Contains the basic structure that a model-serving application should have; the implementation is based on the Ray Serve framework.