The Resume Classifier API is a backend service designed to classify resumes (CVs) using machine learning models. It provides an API endpoint for uploading resumes as raw text or files (PDF), processes them to extract relevant content, and returns predicted job categories along with confidence scores.
This project is built using:
- ⚙️ FastAPI for the API layer
- 🧠 Langchain with Gemini LLM for resume analysis
- 📦 MongoDB for storing extracted resumes, classification results, and submission metadata
- ✅ Pydantic for input validation
- 🧪 Pytest for API and service-level testing
- 🐳 Docker for containerized development and deployment
The goal is to create a modular, scalable, and production-ready API that demonstrates backend and ML engineering skills while solving a realistic use case.
Users can submit their resume via a POST request to the `/upload_resume/` endpoint.
The system performs the following steps:
- Validates the file's MIME type and content
- Extracts the file content using `text_extractor_service.py`
- Analyzes the resume against the job description using Gemini via Langchain
- Returns a structured output (see the sketch below):
  - The classification (Match, Partial Match, No Match)
  - A score between 0 and 1
  - An AI-generated tip with practical advice on how to improve the resume for the job
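As a minimal sketch, the structured output could be modeled with Pydantic like this (field names here are illustrative, not the project's final schema):

```python
from typing import Literal

from pydantic import BaseModel, Field


class ResumeAnalysis(BaseModel):
    """Illustrative shape of the endpoint's structured output."""

    # One of the three classification buckets described above
    classification: Literal["Match", "Partial Match", "No Match"]
    # Confidence score constrained to the 0..1 range
    score: float = Field(ge=0.0, le=1.0)
    # AI-generated, practical advice for improving the resume
    ai_tip: str
```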
- Researched:
  - How to build RESTful APIs using FastAPI
  - Pydantic models for request validation and OpenAPI documentation
  - Connecting MongoDB with FastAPI using the async driver `motor` (see the sketch below)
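For reference, a minimal `motor` connection sketch (the database name is hypothetical):

```python
from motor.motor_asyncio import AsyncIOMotorClient

# motor mirrors PyMongo's API but returns awaitables for all I/O
client = AsyncIOMotorClient("mongodb://localhost:27017")
db = client["resume_classifier"]  # hypothetical database name


async def ping() -> None:
    # "ping" is a standard MongoDB admin command; awaiting it confirms connectivity
    await client.admin.command("ping")
```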
- Started project scaffolding:
  - Created initial folder structure:
    - `app/models` for Pydantic schemas
    - `app/db` for MongoDB connection logic
    - `app/services` for ML and file parsing logic
    - `app/api` for route definitions
  - Planned initial API route `/upload_resume/` (sketched below):
    - Accepts an uploaded file (`UploadFile`)
    - Extracts the text and classifies it using a discriminative model
    - Stores the classification result in MongoDB
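A rough sketch of how that planned route could look (the helper functions are placeholders, not the project's actual services):

```python
from fastapi import APIRouter, File, HTTPException, UploadFile

router = APIRouter()


def extract_text(data: bytes, content_type: str) -> str:
    # Placeholder for the text extraction step described above
    return data.decode("utf-8", errors="ignore")


def classify_resume(text: str) -> dict:
    # Placeholder for the classifier; the real one also persists to MongoDB
    return {"classification": "Match", "score": 0.9}


@router.post("/upload_resume/")
async def upload_resume(file: UploadFile = File(...)):
    # Reject unsupported MIME types before doing any work
    if file.content_type not in ("application/pdf", "text/plain"):
        raise HTTPException(status_code=415, detail="Unsupported file type")
    raw = await file.read()
    return classify_resume(extract_text(raw, file.content_type))
```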
- Defined the core `/upload_resume` route logic in `routes.py`
- Began implementing the Text Extraction module in `/services/text_extractor.py`:
  - Implemented specialized subclasses: `PdfTextExtractor()`, `MSWordTextExtractor()`, `PlainTextExtractor()`
  - I've also created a centralized extractor selection via `call_text_extractor()` (see the sketch below)
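A sketch of the extractor hierarchy and the dispatcher (`MSWordTextExtractor` is omitted for brevity, and `pypdf` is just one possible backend, not necessarily the project's choice):

```python
import io
from abc import ABC, abstractmethod


class TextExtractor(ABC):
    """Common interface for all format-specific extractors."""

    @abstractmethod
    def extract(self, data: bytes) -> str: ...


class PlainTextExtractor(TextExtractor):
    def extract(self, data: bytes) -> str:
        return data.decode("utf-8", errors="ignore")


class PdfTextExtractor(TextExtractor):
    def extract(self, data: bytes) -> str:
        from pypdf import PdfReader  # assumed backend

        reader = PdfReader(io.BytesIO(data))
        return "\n".join(page.extract_text() or "" for page in reader.pages)


def call_text_extractor(content_type: str) -> TextExtractor:
    # Centralized selection: map MIME types to the matching extractor
    extractors = {
        "text/plain": PlainTextExtractor,
        "application/pdf": PdfTextExtractor,
    }
    try:
        return extractors[content_type]()
    except KeyError:
        raise ValueError(f"No extractor for content type {content_type!r}")
```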
- Finished the `text_extractor` module and created some tests for it at `./tests/text_extractor.py` (a sketch of what they might look like is below)
- I'm considering just using an LLM API for the classifier (although I don't know if it would be the best decision)
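A couple of test cases in that spirit might look like this (import paths are assumptions based on the folder structure above, and the `ValueError` matches the dispatcher sketch earlier):

```python
import pytest

# Assumed import path; matches the app/services layout described earlier
from app.services.text_extractor import PlainTextExtractor, call_text_extractor


def test_plain_text_roundtrip():
    # Plain UTF-8 bytes should come back unchanged as a string
    assert PlainTextExtractor().extract(b"hello resume") == "hello resume"


def test_unknown_mime_type_is_rejected():
    # The dispatcher should fail loudly on unsupported formats
    with pytest.raises(ValueError):
        call_text_extractor("application/x-unknown")
```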
- More project structure reorganization; renamed some folders
- Added a `Makefile` for managing development commands.
- Created a `run.sh` script to start both the API and MongoDB via Docker Compose.
- MongoDB integration:
  - Created a MongoDB connection handler, `DBConnectionHandler` (sketched below)
  - Built a `ResumesRepository` with a working `insert_document` method for persistence.
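A sketch of how these two pieces could fit together (database and collection names are assumptions):

```python
from typing import Optional

from motor.motor_asyncio import AsyncIOMotorClient


class DBConnectionHandler:
    """Owns the motor client so startup/shutdown can manage it in one place."""

    def __init__(self, uri: str = "mongodb://localhost:27017") -> None:
        self._client: Optional[AsyncIOMotorClient] = None
        self._uri = uri

    def connect(self) -> None:
        self._client = AsyncIOMotorClient(self._uri)

    def close(self) -> None:
        if self._client is not None:
            self._client.close()

    @property
    def client(self) -> AsyncIOMotorClient:
        assert self._client is not None, "connect() must be called first"
        return self._client


class ResumesRepository:
    def __init__(self, handler: DBConnectionHandler) -> None:
        # Database/collection names here are assumptions
        self._collection = handler.client["resume_classifier"]["resumes"]

    async def insert_document(self, document: dict) -> str:
        # insert_one is async under motor; return the new document's id
        result = await self._collection.insert_one(document)
        return str(result.inserted_id)
```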
- Created a `docker-compose.yml` for MongoDB startup
- Integrated the `lifespan` context manager for startup and shutdown events (see the sketch below):
  - On startup: establishes the DB connection
  - On shutdown: closes the DB connection
(I think it's a good approach, but I don't have anybody to ask, lol)
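For what it's worth, the lifespan wiring I have in mind looks roughly like this (the import path is an assumption):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.db.connection import DBConnectionHandler  # assumed path

db_handler = DBConnectionHandler()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: establish the DB connection once for the whole app
    db_handler.connect()
    yield
    # Shutdown: close the DB connection cleanly
    db_handler.close()


app = FastAPI(lifespan=lifespan)
```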
- Implemented the Resume Analyzer with an LLM (sketched below):
  - Introduced Langchain with Gemini to process resume and job description inputs using natural language
  - Created `llm_classifier.py` and `config.py` under `app/services/AI/` to encapsulate prompt configuration and model interaction logic.
  - Integrated the AI analysis directly into the `/upload_resume/` route.
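A sketch of the analyzer logic, assuming `langchain-google-genai` with structured output support (the prompt wording and model name are examples, not the project's actual `config.py`):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field


class Analysis(BaseModel):
    classification: str = Field(description="Match, Partial Match, or No Match")
    score: float = Field(description="Confidence between 0 and 1")
    ai_tip: str = Field(description="Practical advice for improving the resume")


prompt = ChatPromptTemplate.from_template(
    "Compare the resume below against the job description and assess the fit.\n"
    "Resume:\n{resume}\n\nJob description:\n{job_description}"
)

# Requires GOOGLE_API_KEY in the environment; model name is an example
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# with_structured_output coerces the model's reply into the Analysis schema
chain = prompt | llm.with_structured_output(Analysis)


def analyze(resume: str, job_description: str) -> Analysis:
    return chain.invoke({"resume": resume, "job_description": job_description})
```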
- Schema Updates:
  - Added `job_description` and `ai_tip` to the `ResumeUploadRequest` model (see the sketch below).
  - Updated `ClassificationDict` with AI-generated suggestions for candidates.
- Cleanup and Refactor:
  - Removed the obsolete `classifier.py`.
  - Updated `requirements.txt` to reflect the Langchain and Gemini dependencies.
