This project is a small RAG-style pipeline that builds a Chroma vector store from short story .txt files and exposes a CLI to extract structured JSON information about a character using Mistral’s API via LangChain.
Pipeline Flow:
- Load raw stories
- Split into chunks
- Embed with `MistralAIEmbeddings`
- Store in Chroma
- Retrieve relevant chunks
- Call `ChatMistralAI` with a prompt that outputs a single JSON object
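The flow above can be sketched end to end with stand-ins for the real components. This is a toy, not the project's code: `fake_embed` replaces `MistralAIEmbeddings`, a plain list replaces Chroma, and the "LLM call" stops at prompt assembly.

```python
import math

def fake_embed(text: str) -> list[float]:
    """Stand-in for MistralAIEmbeddings: a letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# 1-2. Load raw stories and split into chunks (here: a naive sentence split).
stories = {"a-mother.txt": "Mrs Kearney arranged the concerts. Devlin spoke last."}
chunks = [s.strip() + "." for text in stories.values() for s in text.split(".") if s.strip()]

# 3-4. Embed each chunk and "store" it (a list stands in for Chroma).
store = [(chunk, fake_embed(chunk)) for chunk in chunks]

# 5. Retrieve the most relevant chunk for a query.
query = "Devlin"
best_chunk = max(store, key=lambda item: cosine(fake_embed(query), item[1]))[0]

# 6. Assemble the prompt that would go to ChatMistralAI.
prompt = f"Context: {best_chunk}\nReturn a single JSON object describing {query}."
```

The real pipeline swaps each stand-in for its LangChain counterpart, but the shape of the data flow is the same.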
```
git clone https://github.com/AdityaC784/DeepStack-Character-Info-Extractor.git
cd DeepStack-Character-Info-Extractor
```

Windows (cmd):

```
python -m venv .venv
.\.venv\Scripts\activate
```

macOS / Linux (bash / zsh):

```
python -m venv .venv
source .venv/bin/activate
```

Using a dedicated virtual environment keeps this project’s dependencies isolated from your global Python installation.
All dependencies are listed in requirements.txt.
```
pip install -r requirements.txt
```

This installs:
`langchain`, `langchain-core`, `langchain-community`, `langchain-mistralai`, `langchain-chroma`, `langchain-text-splitters`, `python-dotenv`
Both embedding_pipe.py and retrieval_pipe.py call load_dotenv(), so a .env file is expected at the project root.
1. Create and edit `.env`
2. Add your Mistral API key (replace `...` with the real key):

```
MISTRAL_API_KEY="..."
```
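`load_dotenv()` reads simple `KEY=VALUE` lines from `.env` and exports them as environment variables. As an illustration of what that step does (a minimal stdlib sketch, not the real `python-dotenv` implementation):

```python
import os

def load_env_file(path: str) -> None:
    """Minimal sketch of what load_dotenv() does: parse KEY=VALUE lines
    and export them as environment variables. Not the real python-dotenv."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# After loading, the key is available to the Mistral clients:
# load_env_file(".env")
# api_key = os.environ["MISTRAL_API_KEY"]
```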
Place `.txt` stories in:

```
./stories
```
Example project layout:
```
project-root/
  cli.py
  embedding_pipe.py
  retrieval_pipe.py
  requirements.txt
  .env
  stories/
    a-mother.txt
    another-story.txt
  db/          # created automatically
```
Running `build_vector_store` reads the `.txt` files, attaches metadata, splits them into chunks, and stores the embeddings in Chroma.
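Conceptually, the splitting step slides a fixed-size window with overlap across each story. LangChain's `RecursiveCharacterTextSplitter` is smarter about sentence and paragraph boundaries; this stdlib sketch only shows the idea, and the `chunk_size`/`overlap` numbers are illustrative, not the project's actual settings:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-window splitter with overlap between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means the tail of one chunk reappears at the head of the next, so a sentence cut by a chunk boundary still appears whole in at least one chunk.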
All interaction occurs through the Typer CLI in cli.py.
```
python -m cli compute-embeddings --books-dir ./stories --db-path ./db
```
Arguments:
- `--books-dir` (`-b`): directory of `.txt` stories
- `--db-path` (`-d`): directory where the Chroma vector store is written
This command:
- Ensures the DB directory exists
- Loads the story files
- Splits them into chunks
- Generates embeddings via `MistralAIEmbeddings`
- Stores them in a persistent Chroma store
Run this anytime stories change.
Example:
```
python -m cli get-character-info "Devlin" --db-path ./db
```
This command:
- Loads the Chroma store
- Retrieves the top-k chunks
- Builds a second, more focused query
- Ensures the character name exists
- Calls `ChatMistralAI` with a custom prompt
- Parses the JSON output
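The "focused query" and name-check steps can be sketched as two small helpers. These are illustrative assumptions about how such steps might look, not the functions in `retrieval_pipe.py`:

```python
def build_focused_query(name: str) -> str:
    """Second-pass query targeting the character specifically (illustrative wording)."""
    return f"Who is {name}? Describe their role and relationships."

def ensure_character_present(name: str, chunks: list[str]) -> None:
    """Fail fast if the retrieved context never mentions the character."""
    if not any(name.lower() in chunk.lower() for chunk in chunks):
        raise ValueError(f"Character {name!r} not found in retrieved context")
```

Failing fast here avoids spending an LLM call on a character the store knows nothing about.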
Example output:

```json
{
  "name": "Devlin",
  "storyTitle": "A Mother",
  "summary": "...",
  "relations": [
    {"name": "Mr Kearney", "relation": "husband"}
  ],
  "characterType": "main"
}
```
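Because the model is prompted to emit a single JSON object, the reply can be validated before use. A small sketch, where the required field names are taken from the example above and treated as assumptions about the prompt's schema:

```python
import json

REQUIRED_KEYS = {"name", "storyTitle", "summary", "relations", "characterType"}

def parse_character_info(raw: str) -> dict:
    """Parse the model's reply and check the expected top-level keys."""
    info = json.loads(raw)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"model output missing keys: {sorted(missing)}")
    return info
```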
Troubleshooting:
- **Must run embeddings first**: if `get-character-info` fails with a missing vector store, run `compute-embeddings` first.
- **Missing stories folder**: ensure `./stories` exists and contains `.txt` files.
- **Mistral authentication errors**: ensure `.env` exists, `MISTRAL_API_KEY` is present, and the virtual environment is activated.
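These checks can be automated with a short preflight function (a sketch; the default paths match the layout above):

```python
import os

def preflight(stories_dir: str = "./stories", env_file: str = ".env") -> list[str]:
    """Return a list of setup problems; empty means everything looks ready."""
    problems = []
    if not os.path.isfile(env_file):
        problems.append(f"{env_file} not found")
    elif "MISTRAL_API_KEY" not in open(env_file).read():
        problems.append("MISTRAL_API_KEY missing from .env")
    if not os.path.isdir(stories_dir):
        problems.append(f"{stories_dir} does not exist")
    elif not any(f.endswith(".txt") for f in os.listdir(stories_dir)):
        problems.append(f"no .txt files in {stories_dir}")
    return problems
```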
This README assumes basic knowledge of Python, the terminal, and virtual environments.
Quick reference:

Build the vector store:

```
python -m cli compute-embeddings --books-dir ./stories --db-path ./db
```

Query a character (replace "Devlin" with any character name):

```
python -m cli get-character-info "Devlin" --db-path ./db
```

To save the JSON to a file, one option is shell redirection (assuming the CLI prints the JSON to stdout):

```
python -m cli get-character-info "Devlin" --db-path ./db > devlin.json
```