My implementation of the Langchain Services is based on the AIO Project Langchain Services.
This repository uses data from the Curriculum Vitae (CV) dataset. You can download the full dataset and other necessary files with the following command:
data_source/generative_ai/download.sh
Note that the dataset contains over 3000 CVs, which may lead to incorrect parsing results. This repository also provides a subset of the dataset with about 50 CVs for testing purposes.
Python version: 3.11.9
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
# Start the server
uvicorn src.app:app --host "0.0.0.0" --port 5000 --reload
This will prompt for the Google API key, which you can obtain from the Google Cloud Console. After you provide the key, the server starts and the API docs are available at http://localhost:5000/docs.
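To skip the interactive prompt, the key can typically be exported beforehand. This is a sketch under one assumption: `GOOGLE_API_KEY` is the environment variable LangChain's Google integrations read by default, but check which variable this app actually expects.

```shell
# Assumption: the app reads the standard GOOGLE_API_KEY environment
# variable; check src/app.py for the exact name it expects.
export GOOGLE_API_KEY="your-key-here"
uvicorn src.app:app --host "0.0.0.0" --port 5000 --reload
```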
docker compose up -d
Turn off the service:
docker compose down
The service ingests CV PDFs from local files or Google Drive links, extracts information using an LLM, and stores embeddings in a FAISS vector store. The workflow is:
ingestion -> extraction -> storage -> search.
- src/rag/file_loader.py – download/load and split documents.
- src/rag/cv_extractor.py – prompt chain for CV parsing.
- src/rag/vectorstore.py – persistent FAISS store with metadata support.
- src/app.py – FastAPI server exposing upload and search endpoints.
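The ingestion -> extraction -> storage -> search workflow above can be sketched with a dependency-free toy version. All names below are illustrative stand-ins: the real service uses an LLM prompt chain for extraction and a persistent FAISS store, not regexes and bag-of-words vectors.

```python
import math
import re
from collections import Counter

def extract_fields(cv_text: str) -> dict:
    """Stand-in for the LLM extraction chain: pull an email address."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", cv_text)
    return {"email": email.group(0) if email else None, "text": cv_text}

def embed(text: str) -> dict:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}

class ToyVectorStore:
    """In-memory stand-in for the FAISS store with metadata support."""

    def __init__(self):
        self.entries = []  # list of (embedding, metadata) pairs

    def add(self, text: str, metadata: dict) -> None:
        self.entries.append((embed(text), metadata))

    def search(self, query: str, k: int = 3) -> list:
        # Cosine similarity between the query and each stored embedding
        q = embed(query)
        scored = [
            (sum(w * emb.get(tok, 0.0) for tok, w in q.items()), meta)
            for emb, meta in self.entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [meta for _, meta in scored[:k]]

# ingestion (here: plain strings instead of parsed PDFs)
cvs = [
    "Python developer, pandas and FastAPI. jane@example.com",
    "Accountant with Excel experience. joe@example.com",
]
store = ToyVectorStore()
for cv in cvs:
    fields = extract_fields(cv)        # extraction
    store.add(fields["text"], fields)  # storage
results = store.search("python developer", k=1)  # search
```

The real pipeline follows the same shape, with `file_loader.py` handling ingestion and splitting before extraction.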
Upload a CV from a local file:
curl -X POST -F "file=@resume.pdf" http://localhost:5000/upload_cv
Search for candidates:
curl -X POST -H "Content-Type: application/json" \
-d '{"query": "python developer"}' \
http://localhost:5000/search_candidates
After the service is running, you can also access the Langserve playgrounds at the following URLs:
http://localhost:5000/langserve/chat/playground
http://localhost:5000/langserve/generative_ai/playground
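For programmatic use, the curl search call above can be reproduced with Python's standard library. The base URL and endpoint path are taken from the examples in this README; the server must be running before the request is actually sent.

```python
import json
import urllib.request

def build_search_request(query: str, base_url: str = "http://localhost:5000"):
    """Build a POST request mirroring the curl search example above."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search_candidates",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("python developer")
# With the server running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```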
You can also deploy the service using Streamlit with the following command:
streamlit run src/streamlit.py