A Retrieval-Augmented Generation (RAG) service that answers cosmetics & beauty questions in Persian.
It combines semantic search (FAISS) and keyword search (SQLite FTS5) with the DeepSeek LLM, accessed via OpenRouter, to produce grounded, concise answers.
- Goal: a conversational AI that answers beauty/product questions using your curated SQLite data and an LLM, with RAG ensuring answers are grounded in real entries.
- Stack: FastAPI (API), SQLite + FTS5 (data + keyword search), FAISS (vector search), OpenRouter LLM (DeepSeek), hybrid retrieval, redaction, rate limiting, optional API-key auth.
- How it works: user query → normalize → hybrid retrieval (FAISS ∪ FTS5) → prompt (context-aware, Farsi policy) → DeepSeek via OpenRouter → reply.
- Hybrid Retrieval
  - FAISS for semantic retrieval over product name + description.
  - FTS5 for fast keyword matching (BM25).
- LLM Integration
  - DeepSeek (OpenRouter) with ASCII-safe headers and controlled prompts.
- API Endpoints
  - `POST /simulate_dm`: main chat endpoint (returns `{"reply": "..."}`).
  - `GET /health`: readiness (env, DB/index presence).
  - `GET /metrics`: minimal counters (requests, errors, fallback).
  - `POST /feedback`: collect ratings/notes per message.
- Security
  - API Key auth (optional), rate-limit (token bucket), safe logging, redaction/allow-list to LLM.
- Ops / Debug
  - Optional debug routes (`/debug/retrieve`, `/debug/prompt`) to inspect retrieval/prompt.
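The keyword half of the hybrid retrieval listed above can be illustrated with a small, self-contained FTS5 query ranked by BM25. This is only a sketch: the table name `products_fts`, its two columns, the demo rows, and the helper names are assumptions made for illustration, not the project's actual schema. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory FTS5 table (hypothetical schema) with demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE products_fts USING fts5(name, description)")
    conn.executemany(
        "INSERT INTO products_fts (name, description) VALUES (?, ?)",
        [
            ("Vitamin C serum", "Brightening serum for sensitive skin"),
            ("Sunscreen SPF 50", "Oil-free sunscreen for oily skin"),
            ("Night cream", "Rich moisturizer for dry skin"),
        ],
    )
    return conn


def to_match_query(text: str) -> str:
    """Quote every token so user input is never parsed as FTS5 query syntax."""
    tokens = [t.replace('"', '""') for t in text.split()]
    return " OR ".join(f'"{t}"' for t in tokens)


def keyword_search(conn: sqlite3.Connection, text: str, limit: int = 5):
    """Return (name, bm25_score) pairs, best match first."""
    query = to_match_query(text)
    if not query:
        return []
    rows = conn.execute(
        "SELECT name, bm25(products_fts) FROM products_fts "
        "WHERE products_fts MATCH ? "
        "ORDER BY bm25(products_fts) LIMIT ?",
        (query, limit),
    )
    return [(name, score) for name, score in rows]


if __name__ == "__main__":
    for name, score in keyword_search(build_demo_db(), "sunscreen oily"):
        print(name, score)
```

FTS5's `bm25()` returns smaller values for better matches, which is why the query sorts in ascending order.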
- FastAPI Backend: validates input, orchestrates retrieval, builds prompt, calls LLM.
- SQLite DB + FTS5: product storage and lexical search.
- FAISS Index: normalized embeddings for semantic search.
- LLM (OpenRouter DeepSeek): generates grounded Farsi responses.
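To make the "normalized embeddings" point concrete: once vectors are L2-normalized, their inner product equals cosine similarity, so an inner-product search ranks by cosine. The sketch below is a brute-force stand-in written in plain Python, not the FAISS API. The function names are hypothetical, and using `0.30` and `4` as defaults (mirroring `MIN_VECTOR_SCORE` and `MAX_CTX_ITEMS`) is an assumption about how those settings are applied.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def semantic_search(query_vec, item_vecs, top_k=4, min_score=0.30):
    """Rank items by cosine similarity and drop those below min_score.

    item_vecs maps an item id to its (not necessarily normalized) embedding.
    """
    q = l2_normalize(query_vec)
    scored = [
        (item_id, inner_product(q, l2_normalize(vec)))
        for item_id, vec in item_vecs.items()
    ]
    kept = [(item_id, score) for item_id, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(semantic_search([2.0, 0.0], toy_index))
```

In a real deployment the embeddings would come from the configured `EMBED_MODEL` and the scan would be replaced by an index lookup; the scoring idea stays the same.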
git clone <REPO_URL>
cd <project_directory>
python3 -m venv .venvs
source .venvs/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Create .env (see example below).
Example .env
APP_ENV=prod
APP_PORT=8000
LOG_LEVEL=INFO
# Database / Index
DB_PATH=rag-instabot/db/app_data.sqlite
INDEX_PATH=rag-instabot/data/faiss_index
EMBED_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
# Retrieval thresholds
MIN_VECTOR_SCORE=0.30
MAX_CTX_ITEMS=4
MAX_DESC_CHARS=220
# Security
REQUIRE_API_KEY=true
API_KEY=change-me
# Rate limit
RL_BUCKET_SIZE=60
RL_REFILL_PER_SEC=1.0
RL_IDENTITY_HEADER=X-API-Key
# LLM (OpenRouter - DeepSeek)
LLM_PROVIDER=openrouter
LLM_API_BASE=https://openrouter.ai/api/v1
LLM_MODEL=deepseek/deepseek-chat-v3.1:free
LLM_API_KEY=sk-xxxxxxxx
# (Optional) OpenRouter analytics headers - ASCII only
OR_HTTP_REFERER=http://localhost:8000/
OR_X_TITLE=RAG-Instabot
OpenRouter key creation
- Go to the official OpenRouter website: openrouter.ai
- Create an account and log in.
- Navigate to the API keys section and generate a key (note: a single key gives access to all models).
- Activate the key in your OpenRouter panel before running the service (note: keys are sometimes disabled by default).
- FTS5 (if not already set up):
  bash scripts/setup_fts.sh
- FAISS vectors (CPU-only safe):
  python scripts/build_vectors.py

Start the server:
uvicorn app.main:app --host 0.0.0.0 --port 8000
Open http://127.0.0.1:8000/ (built-in web chat).
- `APP_ENV`, `APP_PORT`, `LOG_LEVEL`: app mode/port/logging.
- `DB_PATH`, `INDEX_PATH`, `EMBED_MODEL`: data sources & embeddings.
- `MIN_VECTOR_SCORE`, `MAX_CTX_ITEMS`, `MAX_DESC_CHARS`: quality/safety knobs for retrieval & prompt size.
- `REQUIRE_API_KEY`, `API_KEY`: access control gates.
- `RL_*`: token bucket rate limit parameters.
- `LLM_*`: provider base URL, model name, API key.
- `OR_*`: optional OpenRouter analytics headers (ASCII-only).
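As an illustration of how these variables might be read and validated at startup, here is a minimal sketch using only the standard library. The `Settings` class, the `load_settings` helper, and the use of the example `.env` values as defaults are assumptions for illustration; the project's real configuration code may differ.

```python
import os
from dataclasses import dataclass


def _as_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    app_env: str
    app_port: int
    min_vector_score: float
    max_ctx_items: int
    max_desc_chars: int
    require_api_key: bool
    api_key: str
    rl_bucket_size: int
    rl_refill_per_sec: float


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Defaults mirror the example .env; that choice is an assumption.
    """
    env = os.environ if env is None else env
    settings = Settings(
        app_env=env.get("APP_ENV", "prod"),
        app_port=int(env.get("APP_PORT", "8000")),
        min_vector_score=float(env.get("MIN_VECTOR_SCORE", "0.30")),
        max_ctx_items=int(env.get("MAX_CTX_ITEMS", "4")),
        max_desc_chars=int(env.get("MAX_DESC_CHARS", "220")),
        require_api_key=_as_bool(env.get("REQUIRE_API_KEY", "true")),
        api_key=env.get("API_KEY", ""),
        rl_bucket_size=int(env.get("RL_BUCKET_SIZE", "60")),
        rl_refill_per_sec=float(env.get("RL_REFILL_PER_SEC", "1.0")),
    )
    # Fail fast rather than start a service that would reject every request.
    if settings.require_api_key and not settings.api_key:
        raise ValueError("REQUIRE_API_KEY is true but API_KEY is empty")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests, for example `load_settings({"REQUIRE_API_KEY": "false"})`.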
`POST /simulate_dm` request:

{
  "sender_id": "u1",
  "message_id": "m1",
  "text": "سرم ویتامین C برای پوست حساس"
}

Response:

{
  "reply": "..."
}

`GET /health` checks env + DB/index presence:

{
  "status": "ok",
  "env": "prod",
  "llm_provider": "openrouter",
  "llm_model": "deepseek/deepseek-chat-v3.1:free",
  "db_exists": true,
  "index_exists": true
}

`GET /metrics` returns text counters:

requests_total 10
fallback_total 0
errors_total 0

`POST /feedback` request:

{
  "message_id": "m1",
  "rating": "good",
  "note": "Helpful"
}

Response:

{"ok": true}

- Input → normalize (Persian variants, digits).
- Retrieval → FAISS (semantic) + FTS5 (keyword) → merge + threshold → top-K context.
- Prompt → build Farsi, anti-hallucination policy, redaction/allow-list (name/description/price only).
- LLM → OpenRouter DeepSeek with ASCII-safe headers.
- Response → concise Farsi answer; if no evidence → “اطلاعات کافی در دیتابیس موجود نیست.” (in English: sufficient information is not available in the database).
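The steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the exact character mappings, the choice to convert digits to ASCII, the merge policy (vector hits above the threshold first, then keyword hits not already seen), and all function names. The `generate` argument is a stand-in for the LLM call.

```python
FALLBACK = "اطلاعات کافی در دیتابیس موجود نیست."

# Arabic code points often mixed into Persian text, mapped to Persian forms.
_CHAR_MAP = {
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06cc",  # alef maksura -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
}
# Persian (U+06F0..) and Arabic-Indic (U+0660..) digits mapped to ASCII digits.
_DIGIT_MAP = {chr(0x06F0 + i): str(i) for i in range(10)}
_DIGIT_MAP.update({chr(0x0660 + i): str(i) for i in range(10)})
_TABLE = str.maketrans({**_CHAR_MAP, **_DIGIT_MAP})


def normalize(text: str) -> str:
    """Unify character variants and digits, then collapse whitespace."""
    return " ".join(text.translate(_TABLE).split())


def merge_hits(vector_hits, keyword_ids, min_score=0.30, top_k=4):
    """Union of both result sets, deduplicated and capped at top_k.

    vector_hits: iterable of (item_id, similarity), higher is better.
    keyword_ids: item ids already ordered by keyword rank.
    """
    merged, seen = [], set()
    for item_id, score in sorted(vector_hits, key=lambda h: h[1], reverse=True):
        if score >= min_score and item_id not in seen:
            merged.append(item_id)
            seen.add(item_id)
    for item_id in keyword_ids:
        if item_id not in seen:
            merged.append(item_id)
            seen.add(item_id)
    return merged[:top_k]


def reply_for(context_ids, generate):
    """Call the model only when there is evidence; otherwise use the fallback."""
    if not context_ids:
        return FALLBACK
    return generate(context_ids)


if __name__ == "__main__":
    ids = merge_hits([("p1", 0.8), ("p2", 0.1)], ["p2", "p3"])
    print(ids, reply_for(ids, lambda ctx: f"answer grounded in {ctx}"))
```

Returning the fixed fallback string before any model call is one simple way to keep answers grounded when retrieval finds nothing.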
- Manual
  - Web UI: http://127.0.0.1:8000/
  - curl:
    curl -X POST http://127.0.0.1:8000/simulate_dm \
      -H "Content-Type: application/json" \
      -H "X-API-Key: change-me" \
      -d '{"sender_id":"u1","message_id":"m1","text":"کرم ضدآفتاب مناسب پوست چرب"}'
- Debug (optional)
  - `/debug/retrieve?q=...`: inspect FAISS/FTS hits.
  - `/debug/prompt?q=...`: preview final prompt (trimmed).
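The manual curl check can also be scripted. The sketch below builds the same `POST /simulate_dm` request with Python's standard library; the helper names `build_dm_request` and `send_dm` are hypothetical, and the base URL and API key are the placeholder values from the examples in this README.

```python
import json
import urllib.request


def build_dm_request(base_url, api_key, sender_id, message_id, text):
    """Build (but do not send) a POST /simulate_dm request."""
    payload = json.dumps(
        {"sender_id": sender_id, "message_id": message_id, "text": text}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/simulate_dm",
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_dm(request, timeout=30.0):
    """Send the request and return the 'reply' field of the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["reply"]


if __name__ == "__main__":
    req = build_dm_request(
        "http://127.0.0.1:8000", "change-me", "u1", "m1",
        "کرم ضدآفتاب مناسب پوست چرب",
    )
    print(send_dm(req))  # requires the service to be running locally
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body without a running server.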
- Auth: optional API key (`REQUIRE_API_KEY=true`).
- Rate-limit: in-process token bucket (per IP or `X-API-Key`).
- Redaction/Allow-list: only `name`/`description`/`price` go to LLM; strip URLs/markup; truncate lengths.
- Headers: enforce ASCII-only for OpenRouter; avoid Unicode header issues.
- Logging: no secrets or full prompts in logs; only minimal structured fields (latency, hits, provider).
- Failover: retries + provider chain (OpenRouter → HF) → safe fallback string.
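For readers unfamiliar with the token bucket mentioned above, here is a minimal in-process sketch keyed by caller identity (an IP address or the `X-API-Key` value). The class names are hypothetical, the defaults mirror `RL_BUCKET_SIZE=60` and `RL_REFILL_PER_SEC=1.0` from the example `.env`, and the code is not thread-safe; it illustrates the technique rather than the project's actual limiter.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.tokens = float(capacity)  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1.0):
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per identity (for example an IP or an API key)."""

    def __init__(self, capacity=60, refill_per_sec=1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._buckets = {}

    def allow(self, identity, now=None):
        bucket = self._buckets.get(identity)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self._buckets[identity] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = RateLimiter(capacity=2, refill_per_sec=1.0)
    print([limiter.allow("client-1") for _ in range(3)])
```

The optional `now` argument exists only so the behaviour can be checked deterministically; in normal use the monotonic clock is read automatically.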
This project is licensed under the Apache License 2.0.
You may not use the files in this repository except in compliance with the License. You may obtain a copy of the License at:
- `LICENSE` (included in this repository)
- https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Please read the full technical documentation for architecture details, security policies, and operational guidance:









