A FastAPI backend prototype built for a bachelor's thesis. It compares retrieval-augmented generation (RAG) against a no-retrieval baseline for generating DIY upcycling solutions.
The system retrieves domain-specific upcycling factsheets via vector similarity search and injects them as context into the Claude prompt. The research question: does factsheet-based RAG improve solution quality, and how does input specificity affect this?
User Input
│
├─── no-rag ──► System Prompt + User Input ──► Claude ──► Response
│
└─── rag ─────► Embed Input ──► pgvector Search ──► Top-5 Factsheets
                └──► System Prompt + Factsheets + User Input ──► Claude ──► Response
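The two branches of the diagram differ only in whether retrieved factsheets are added to the prompt. The following is a minimal sketch of that assembly step, not the code in `rag_service.py`: the system prompt text, the function name `build_request` and the factsheet formatting are all assumptions made for illustration.

```python
from typing import List

# Hypothetical system prompt; the real one lives in rag_service.py.
SYSTEM_PROMPT = "You suggest DIY upcycling solutions for the materials the user describes."


def build_request(user_input: str, factsheets: List[str], mode: str) -> dict:
    """Assemble the prompt for either branch of the diagram."""
    if mode not in ("rag", "no-rag"):
        raise ValueError(f"unknown mode: {mode!r}")

    system = SYSTEM_PROMPT
    if mode == "rag" and factsheets:
        # Inject the retrieved factsheets as numbered context blocks.
        context = "\n\n".join(
            f"Factsheet {i}:\n{text}" for i, text in enumerate(factsheets, start=1)
        )
        system = f"{SYSTEM_PROMPT}\n\nRelevant factsheets:\n{context}"

    # The user message is identical in both modes; only the system prompt changes.
    return {"system": system, "messages": [{"role": "user", "content": user_input}]}
```

Keeping the user message identical across modes means any difference in output can be attributed to the injected context, which is what the comparison relies on.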
Stack:
- FastAPI: REST API with an OpenAI-compatible `/v1/chat/completions` endpoint
- PostgreSQL + pgvector: vector database for factsheet storage and cosine similarity search
- Ollama (`nomic-embed-text-v2-moe`): local embedding model (768 dimensions)
- Claude API (Anthropic): LLM for response generation
- LibreChat: optional chat UI (connects to the backend as a custom API)
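To make the cosine similarity search concrete, here is a small pure-Python sketch of the ranking that the database performs. In pgvector this kind of ranking is usually expressed in SQL with the cosine distance operator `<=>`; whether the prototype uses exactly that operator is not stated here. The function names and the tiny two-dimensional vectors are illustrative assumptions (the real embeddings have 768 dimensions).

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], factsheets: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    """Return the names of the k factsheets most similar to the query embedding."""
    ranked = sorted(
        factsheets,
        key=lambda name: cosine_similarity(query, factsheets[name]),
        reverse=True,
    )
    return ranked[:k]
```

The default of `k=5` mirrors the Top-5 retrieval shown in the diagram.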
git clone https://github.com/robin-mommsen/thesis-prototype
cd thesis-prototype
cp .env.example .env
Edit .env and fill in your values:
POSTGRES_USER=raguser
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=ragdb
CLAUDE_API_KEY=your_anthropic_api_key_here
CLAUDE_MODEL=claude-sonnet-4-6
RAG_FACTSHEET_LIMIT=5
LIBRECHAT_JWT_SECRET=your_random_secret
LIBRECHAT_JWT_REFRESH_SECRET=your_random_refresh_secret
LIBRECHAT_CREDS_KEY=your_64_char_hex_string
LIBRECHAT_CREDS_IV=your_32_char_hex_string
The LibreChat secrets are required; the container will not start without them. Generate them with Python:
# JWT_SECRET and JWT_REFRESH_SECRET (any random string, min. 32 chars)
python -c "import secrets; print(secrets.token_hex(32))"
# CREDS_KEY (exactly 64 hex chars)
python -c "import secrets; print(secrets.token_hex(32))"
# CREDS_IV (exactly 32 hex chars)
python -c "import secrets; print(secrets.token_hex(16))"
Run each command once and paste the output into the corresponding variable in .env.
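As an alternative to running the commands one by one, the four values can be generated in a single script. This is a convenience sketch, not part of the repository; the function name `generate_librechat_secrets` is made up for illustration.

```python
import secrets


def generate_librechat_secrets() -> dict:
    """Generate the four LibreChat values with the lengths listed above."""
    return {
        "LIBRECHAT_JWT_SECRET": secrets.token_hex(32),          # 64 hex chars
        "LIBRECHAT_JWT_REFRESH_SECRET": secrets.token_hex(32),  # 64 hex chars
        "LIBRECHAT_CREDS_KEY": secrets.token_hex(32),           # exactly 64 hex chars
        "LIBRECHAT_CREDS_IV": secrets.token_hex(16),            # exactly 32 hex chars
    }


if __name__ == "__main__":
    # Print in .env syntax so the lines can be pasted directly.
    for name, value in generate_librechat_secrets().items():
        print(f"{name}={value}")
```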
docker compose up --build
On first start, Docker will:
- Start PostgreSQL with the pgvector extension
- Run the DB initialization SQL scripts (`db/`)
- Start Ollama and pull the embedding model (`nomic-embed-text-v2-moe:latest`); this may take a few minutes
- Start the FastAPI backend, which seeds the factsheet database on startup
- Start MongoDB and LibreChat (optional UI)
Once all services are healthy, the API is available at http://localhost:8080.
RAG mode (retrieves relevant factsheets):
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "rag",
"messages": [{"role": "user", "content": "Ich habe zwei alte Europaletten. Was kann ich daraus bauen?"}]
}'
Baseline mode (no retrieval):
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "no-rag",
"messages": [{"role": "user", "content": "Ich habe zwei alte Europaletten. Was kann ich daraus bauen?"}]
}'
The `model` field selects the mode: "rag" or "no-rag".
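The same requests can be sent from Python using only the standard library. This is a sketch under one assumption: that the response follows the usual OpenAI chat completion shape (`choices[0].message.content`), which the endpoint's OpenAI compatibility suggests but which is not spelled out here. The helper names are illustrative.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"


def build_request(mode: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; `mode` goes into the `model` field."""
    if mode not in ("rag", "no-rag"):
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"model": mode, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(mode: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(mode, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed response shape: OpenAI-style chat completion.
    return body["choices"][0]["message"]["content"]
```

With the stack running, `ask("rag", "alte Holzbretter")` and `ask("no-rag", "alte Holzbretter")` would return the two answers to compare.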
Streaming is also supported:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "rag", "stream": true, "messages": [{"role": "user", "content": "alte Holzbretter"}]}'
# List all factsheets
GET http://localhost:8080/factsheets
# Get a single factsheet
GET http://localhost:8080/factsheet/{id}
# Add a new factsheet
POST http://localhost:8080/factsheet
# Update a factsheet
PUT http://localhost:8080/factsheet/{id}
# Delete a factsheet
DELETE http://localhost:8080/factsheet/{id}
# List available models
GET http://localhost:8080/v1/models
Returns `rag` and `no-rag` as available model IDs.
LibreChat is included as a browser-based chat frontend. After docker compose up, it is available at http://localhost:3080.
To connect it to the RAG backend:
- Open LibreChat in your browser and create a local account
- The backend is pre-configured via `librechat.yaml` as a custom API endpoint
- Select `rag` or `no-rag` as the model in the UI
The experiment runs all 24 test prompts (experiments/test_inputs.json) against both modes and saves results to experiments/results/.
Install Python dependencies (outside Docker):
pip install -r requirements.txt
Make sure the Docker stack is running, then:
python experiments/run_experiment.py
This generates a timestamped JSON file in experiments/results/, e.g. experiment_20240115_120000.json.
Failed generations are automatically retried (up to 10 attempts per prompt/mode).
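The retry behaviour can be pictured roughly as follows. This is a sketch, not the code in `run_experiment.py`: the function name, the optional delay, and the choice to treat an empty response as a failure are assumptions.

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 10  # matches the limit stated above


def generate_with_retry(
    generate: Callable[[], str],
    max_attempts: int = MAX_ATTEMPTS,
    delay_seconds: float = 0.0,
) -> str:
    """Call `generate` until it returns non-empty text or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = generate()
            if result.strip():
                return result
            # Assumption: an empty response counts as a failed generation.
            last_error = ValueError("empty response")
        except Exception as exc:
            last_error = exc
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"generation failed after {max_attempts} attempts") from last_error
```

In an experiment loop, `generate` would wrap a single request for one prompt and one mode, so each prompt/mode pair gets its own attempt budget.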
python experiments/evaluate.py experiments/results/experiment_<timestamp>.json
This produces:
- `experiment_<timestamp>.xlsx`: master file with the anonymization mapping (stays with the researcher)
- `experiment_<timestamp>_rater_1.xlsx` to `_rater_4.xlsx`: anonymized scoring sheets for human raters, each in a different randomized order
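The idea behind the master file and the rater sheets can be sketched as below. This illustrates the blinding scheme only and is not the logic of `evaluate.py`: the code format (`R001`, ...), the field names `prompt_id`, `mode` and `text`, and the fixed seed are assumptions.

```python
import random
from typing import Dict, List, Tuple


def anonymize(
    responses: List[dict], n_raters: int = 4, seed: int = 0
) -> Tuple[Dict[str, Tuple[str, str]], List[List[dict]]]:
    """Return (master mapping, one shuffled sheet per rater)."""
    rng = random.Random(seed)

    # Shuffle before assigning codes so the code order reveals nothing.
    shuffled = responses[:]
    rng.shuffle(shuffled)

    master: Dict[str, Tuple[str, str]] = {}
    text_by_code: Dict[str, str] = {}
    for i, response in enumerate(shuffled, start=1):
        code = f"R{i:03d}"
        master[code] = (response["prompt_id"], response["mode"])  # researcher only
        text_by_code[code] = response["text"]

    sheets: List[List[dict]] = []
    for _ in range(n_raters):
        order = list(master)
        rng.shuffle(order)  # independent shuffle per rater
        # Rater rows carry only the code and the text, never the mode.
        sheets.append([{"code": code, "text": text_by_code[code]} for code in order])
    return master, sheets
```

Independent shuffles make identical orders across raters very unlikely for a realistic number of responses, but this sketch does not strictly guarantee that every order differs.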
Raters score each response across 4 criteria:
- Task fit
- Material integration
- Feasibility
- Creativity
After all raters have filled in their sheets:
python experiments/aggregate_rater_scores.py \
--master experiment_<timestamp>.xlsx \
experiment_<timestamp>_rater_1.xlsx \
experiment_<timestamp>_rater_2.xlsx \
experiment_<timestamp>_rater_3.xlsx \
experiment_<timestamp>_rater_4.xlsx
This produces aggregated_rater_scores.xlsx with per-response means across raters, condition means (RAG vs. baseline), Krippendorff's alpha per criterion, and Wilcoxon signed-rank test results comparing RAG against baseline per criterion.
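The first two aggregation steps, per-response means and condition means, reduce to simple averaging once the master mapping is used to de-anonymize the codes. The sketch below covers only those two steps for a single criterion; Krippendorff's alpha and the Wilcoxon test are left out. The data layout (a dict of rater scores per response code) is an assumption, not the format used by `aggregate_rater_scores.py`.

```python
from statistics import mean
from typing import Dict, List


def per_response_means(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the raters' scores for each anonymized response code."""
    return {code: mean(values) for code, values in scores.items()}


def condition_means(
    response_means: Dict[str, float], master: Dict[str, str]
) -> Dict[str, float]:
    """Group per-response means by mode ("rag" / "no-rag") and average them."""
    by_mode: Dict[str, List[float]] = {}
    for code, value in response_means.items():
        by_mode.setdefault(master[code], []).append(value)
    return {mode: mean(values) for mode, values in by_mode.items()}
```

Here `master` maps each anonymized code back to its mode, which is why the master file has to stay with the researcher until all sheets are returned.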
| Variable | Description | Default |
|---|---|---|
| `POSTGRES_USER` | PostgreSQL username | (none) |
| `POSTGRES_PASSWORD` | PostgreSQL password | (none) |
| `POSTGRES_DB` | PostgreSQL database name | (none) |
| `CLAUDE_API_KEY` | Anthropic API key | (none) |
| `CLAUDE_MODEL` | Claude model ID | `claude-sonnet-4-6` |
| `RAG_FACTSHEET_LIMIT` | Number of factsheets retrieved per query | `5` |
| `LIBRECHAT_JWT_SECRET` | LibreChat JWT signing secret | (none) |
| `LIBRECHAT_JWT_REFRESH_SECRET` | LibreChat JWT refresh secret | (none) |
| `LIBRECHAT_CREDS_KEY` | LibreChat credentials encryption key (64 hex chars) | (none) |
| `LIBRECHAT_CREDS_IV` | LibreChat credentials encryption IV (32 hex chars) | (none) |
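For the backend-related variables, loading with the defaults from the table might look like the sketch below. It is not the contents of `app/config/config.py`; the `Settings` class, the `load_settings` function and the decision to fail fast on missing values are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables without a default in the table above.
REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB", "CLAUDE_API_KEY")


@dataclass(frozen=True)
class Settings:
    postgres_user: str
    postgres_password: str
    postgres_db: str
    claude_api_key: str
    claude_model: str = "claude-sonnet-4-6"
    rag_factsheet_limit: int = 5


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        postgres_user=env["POSTGRES_USER"],
        postgres_password=env["POSTGRES_PASSWORD"],
        postgres_db=env["POSTGRES_DB"],
        claude_api_key=env["CLAUDE_API_KEY"],
        claude_model=env.get("CLAUDE_MODEL", "claude-sonnet-4-6"),
        rag_factsheet_limit=int(env.get("RAG_FACTSHEET_LIMIT", "5")),
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dict.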
The embedding model (nomic-embed-text-v2-moe:latest) has no versioned tags on the Ollama registry. To verify you are using the same model version as the original experiment, check the model hash after first startup:
docker exec upcycling-rag-ollama ollama list
The model hash used for the thesis experiment: ff9c2f10ef5e
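The hash comparison can also be scripted. The sketch below parses the text printed by `ollama list`, assuming the usual column layout with a header row followed by `NAME` and `ID` as the first two whitespace-separated fields; if the output format differs in your Ollama version, the parsing needs adjusting. The function names are illustrative.

```python
import subprocess
from typing import Optional

EXPECTED_ID = "ff9c2f10ef5e"  # hash used for the thesis experiment
MODEL_NAME = "nomic-embed-text-v2-moe:latest"


def model_id_from_listing(listing: str, model_name: str = MODEL_NAME) -> Optional[str]:
    """Find the ID column for `model_name` in `ollama list` output."""
    for line in listing.splitlines()[1:]:  # skip the assumed header row
        fields = line.split()
        if len(fields) >= 2 and fields[0] == model_name:
            return fields[1]
    return None


def matches_thesis_model(listing: str) -> bool:
    """True if the installed embedding model has the expected hash."""
    return model_id_from_listing(listing) == EXPECTED_ID


if __name__ == "__main__":
    # Runs the same command as above and checks its output.
    output = subprocess.run(
        ["docker", "exec", "upcycling-rag-ollama", "ollama", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("match" if matches_thesis_model(output) else "mismatch")
```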
.
├── app/
│ ├── main.py # FastAPI app, startup
│ ├── routers/
│ │ ├── chat.py # /v1/chat/completions endpoint
│ │ ├── factsheet.py # Factsheet CRUD
│ │ └── models.py # /v1/models
│ ├── services/
│ │ ├── rag_service.py # Core RAG and baseline logic, system prompt
│ │ ├── claude_service.py # Anthropic API wrapper
│ │ ├── embedding_service.py # Ollama embeddings
│ │ └── data_initializer.py # Seeds factsheets on startup
│ ├── models/
│ │ ├── upcycling_factsheet.py
│ │ └── query_log.py
│ └── config/
│ ├── config.py
│ └── database.py
├── db/ # PostgreSQL init scripts
├── experiments/ # Experiment runner and evaluation scripts
│ ├── run_experiment.py
│ ├── evaluate.py
│ ├── aggregate_rater_scores.py
│ └── test_inputs.json # 24 prompts (8 vague, 8 medium, 8 concrete)
├── docker-compose.yml
├── Dockerfile
└── .env.example