
A basic pipeline for a CV management system using a vector database. A refined follow-up project with improved accuracy will be released soon.

chisphung/CV_Extractor_Langchain


Langchain Services

1. Setup

1.1. Download data

This implementation of the Langchain services is based on the AIO Project Langchain Services.

This repository uses data from the Curriculum Vitae (CV) dataset. You can download the full dataset and other necessary files using the following command:

bash data_source/generative_ai/download.sh

Note that the full dataset contains over 3,000 CVs, which may lead to incorrect parsing results. The repository also provides a subset of about 50 CVs for testing purposes.

1.2. Run the service locally

Python version: 3.11.9

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
# Start the server
uvicorn src.app:app --host "0.0.0.0" --port 5000 --reload

This will ask for the Google API key, which you can get from the Google Cloud Console. After providing the key, the server will start and you can access it at http://localhost:5000/docs.
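
To confirm the server is running, you can hit the docs page from Python (a minimal sketch using the requests library; it only assumes the URL shown above):

import requests  # pip install requests

# The interactive docs page served by FastAPI; a 200 response means the server is up.
resp = requests.get("http://localhost:5000/docs", timeout=5)
print(resp.status_code)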

1.3. Run the service in Docker

docker compose up -d

Stop the service:

docker compose down

2. Architecture

The service ingests CV PDFs from local files or Google Drive links, extracts information with an LLM, and stores embeddings in a FAISS vector store. The workflow is:

ingestion -> extraction -> storage -> search.

Modules

  • src/rag/file_loader.py – download/load and split documents.
  • src/rag/cv_extractor.py – prompt chain for CV parsing.
  • src/rag/vectorstore.py – persistent FAISS store with metadata support.
  • src/app.py – FastAPI server exposing upload and search endpoints.
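
A minimal sketch of the ingestion -> storage -> search flow, assuming standard LangChain components (PyPDFLoader, RecursiveCharacterTextSplitter, Google Generative AI embeddings, FAISS); the modules listed above may differ in their exact interfaces, and the LLM extraction step in src/rag/cv_extractor.py is omitted here:

# Requires: langchain-community, langchain-text-splitters, langchain-google-genai, faiss-cpu
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

# Ingestion: load a CV PDF and split it into chunks.
docs = PyPDFLoader("resume.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Storage: embed the chunks and persist a FAISS index (GOOGLE_API_KEY must be set in the environment).
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
store = FAISS.from_documents(chunks, embeddings)
store.save_local("faiss_index")

# Search: retrieve the CV chunks most similar to a query.
for doc in store.similarity_search("python developer", k=3):
    print(doc.metadata.get("source"), doc.page_content[:80])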

API Usage

Upload a CV from a local file:

curl -X POST -F "file=@resume.pdf" http://localhost:5000/upload_cv

Search for candidates:

curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "python developer"}' \
     http://localhost:5000/search_candidates
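
The same calls from Python, as a small sketch using the requests library (endpoint paths are taken from the curl examples above):

import requests  # pip install requests

BASE = "http://localhost:5000"

# Upload a CV from a local file.
with open("resume.pdf", "rb") as f:
    upload = requests.post(f"{BASE}/upload_cv", files={"file": ("resume.pdf", f, "application/pdf")})
print(upload.json())

# Search for candidates matching a free-text query.
search = requests.post(f"{BASE}/search_candidates", json={"query": "python developer"})
print(search.json())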

3. Deployment

LangServe

After the service is running, the LangServe playgrounds are available at the following URLs:

http://localhost:5000/langserve/chat/playground
http://localhost:5000/langserve/generative_ai/playground
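
The served chains can also be called programmatically with LangServe's client (a sketch that assumes the chains are mounted at the paths above; the expected input schema depends on how each chain is defined):

from langserve import RemoteRunnable  # pip install "langserve[client]"

# Connect to the chat chain mounted under /langserve/chat.
chat = RemoteRunnable("http://localhost:5000/langserve/chat")

# Adjust the payload to the chain's input schema; a plain string is only an example.
print(chat.invoke("Summarize the key skills expected of a senior Python developer."))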

Streamlit

You can also deploy the service using Streamlit with the following command:

streamlit run src/streamlit.py
