A high-performance REST API for Resume Analysis built with FastAPI. Features 'Strict Mode' (TF-IDF) for ATS keyword matching and 'Flexible Mode' (SBERT/BERT) for semantic relevance scoring. Dockerized, stateless, and ready for scalable integration.

📄 Resume Scanner Pro API: Dual-Engine ATS Optimizer

📌 Overview

Resume Scanner Pro API is a high-performance REST API designed to help candidates optimize their resumes for Applicant Tracking Systems (ATS).

Unlike simple keyword counters, this tool uses a Dual-Engine Architecture. It combines statistical analysis (TF-IDF), which targets strict automated keyword filters, with semantic AI (SBERT), which scores contextual relevance for human recruiters. It is stateless, containerized, and ready for integration into career platforms or personal portfolios.

✨ Key Features

🤖 Dual-Mode Analysis

  1. Strict Mode (ATS Logic): Uses TF-IDF Vectorization to check for exact keyword matches. This simulates legacy ATS software that rejects resumes missing specific hard skills.
  2. Flexible Mode (AI Logic): Uses Sentence-BERT (SBERT) embeddings to understand context. It recognizes that "Python" $\approx$ "Coding", rewarding candidates for semantic relevance even if exact wording differs.

📂 Intelligent PDF Extraction

  • Raw Text Extraction: Powered by PyPDF2 to strip formatting and clean artifacts.
  • Page Count Logic: Automatically detects resume length and provides strategic advice (e.g., warning users when a resume exceeds 1 page and therefore risks being overlooked).
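
The page-count advice described above can be sketched as a small pure function. This is a minimal sketch, not the project's code: the name `length_advice` and the message wording are assumptions, and only the one-page threshold comes from the feature description.

```python
def length_advice(total_pages: int) -> str:
    """Return a short advice string for a resume with `total_pages` pages.

    Hypothetical helper: the messages are illustrative, not the API's exact text.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    if total_pages == 1:
        return "Optimal length: single-page resume detected."
    # Anything longer than one page triggers a warning.
    return f"Warning: {total_pages} pages detected; consider trimming to 1 page."


print(length_advice(1))
print(length_advice(3))
```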

🎯 Critical Skills Verification

  • Auto-Detection: Automatically identifies the top 5 most frequent/important keywords in a Job Description.
  • Gap Analysis: Compares these critical terms against the CV and flags "High Risk" applications if core skills are missing.
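
The auto-detection and gap analysis above can be approximated with plain word counts. This is a rough sketch under stated assumptions: the helper names, the tiny stopword list, and the `"HIGH RISK"` status value are hypothetical, and the real service may rank keywords differently.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "is", "are", "we", "you", "or"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(jd_text: str, n: int = 5) -> list[str]:
    """Return the n most frequent non-stopword tokens in the job description."""
    counts = Counter(t for t in tokenize(jd_text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def critical_check(cv_text: str, keywords: list[str]) -> dict:
    """Flag which critical keywords never appear in the CV."""
    cv_tokens = set(tokenize(cv_text))
    missing = [k for k in keywords if k not in cv_tokens]
    return {
        "keywords_checked": keywords,
        "missing_critical": missing,
        # "HIGH RISK" is an assumed label for the failing case.
        "status": "SAFE" if not missing else "HIGH RISK",
    }


jd = "Python developer with SQL and Python experience. SQL, AWS, Python."
print(critical_check("I know Python", top_keywords(jd, 2)))
```

Single-word tokens keep the sketch short; multi-word skills such as "machine learning" would need n-gram handling.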

🛡️ Robust Backend

  • Stateless Architecture: No databases required; processes data in-memory for maximum privacy and speed.
  • Model Caching: The SBERT model loads once during the application lifespan (startup), ensuring low latency for subsequent requests.
  • Dockerized: Fully containerized for consistent deployment across any cloud environment.
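
The load-once behaviour described under Model Caching can be illustrated without FastAPI. This sketch uses `functools.lru_cache` as a stand-in for the application lifespan hook; `load_model` and its placeholder dictionary are hypothetical, whereas the real service loads an SBERT model at startup.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=1)
def load_model() -> dict:
    """Stand-in for an expensive model load; runs once, then returns the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "paraphrase-multilingual-MiniLM-L12-v2"}


first = load_model()
second = load_model()  # served from the cache, no second load
print(first is second, LOAD_COUNT)
```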

🛠️ Tech Stack

  • Framework: FastAPI (Asynchronous)
  • NLP (Statistical): Scikit-Learn (TF-IDF)
  • NLP (AI): Sentence-Transformers (paraphrase-multilingual-MiniLM-L12-v2)
  • Text Processing: PyPDF2, Pandas, NLTK
  • Deployment: Hugging Face Spaces (Docker)

🚀 The Processing Pipeline

  1. Extraction: User uploads a PDF. System extracts text and validates length.
  2. Preprocessing: Text is cleaned (newlines removed, whitespace trimmed) to normalize input.
  3. Vectorization:
    • Strict: Maps text to a frequency matrix based on the Job Description's vocabulary.
    • Flexible: Encodes text into 384-dimensional dense vectors.
  4. Similarity Calculation: Computes the cosine similarity between the Resume vector and the JD vector and reports it as a percentage (0-100%).
  5. Response: Returns the score, missing keywords list, and critical skills safety check.
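
The strict path of the pipeline above can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: it uses raw term counts over the Job Description's vocabulary instead of scikit-learn's TF-IDF weights, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text: str) -> str:
    """Replace newlines with spaces and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def count_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[term] for term in vocabulary]


def cosine_percent(a: list[int], b: list[int]) -> float:
    """Cosine similarity of two vectors, scaled to 0-100."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else round(100.0 * dot / norm, 2)


def strict_score(cv_text: str, jd_text: str) -> float:
    """Score a CV against a JD using only the JD's vocabulary."""
    jd_clean, cv_clean = preprocess(jd_text), preprocess(cv_text)
    vocabulary = sorted(set(re.findall(r"[a-z]+", jd_clean.lower())))
    return cosine_percent(count_vector(cv_clean, vocabulary), count_vector(jd_clean, vocabulary))


print(strict_score("Python\nand SQL developer", "python sql aws"))
```

Because the vocabulary comes from the Job Description, CV words that never appear in the JD do not affect the score, which mirrors the exact-match behaviour of Strict Mode.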

🔌 Integration Guide (API Contract)

Live Base URL

https://silvio0-resume-scanner.hf.space

1. Extract Text (PDF Upload)

Parses a PDF file and returns its raw text content ready for analysis.

  • Endpoint: /extract

  • Method: POST

  • Content-Type: multipart/form-data

  • Body:

    • cv_file: Binary File (.pdf)
  • Response (JSON):

{
  "total_pages": 1,
  "Info": "✅ **Optimal Length:** Single-page resume detected. This concise format is highly preferred by recruiters for rapid screening and parsing.",
  "cv_text": "Silvio Christian Joe\nData Scientist\n..."
}
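
A client call to `/extract` might look like the sketch below, which builds the multipart body by hand so that it needs only the standard library. The helper name `build_extract_request`, the default filename, and the placeholder bytes are assumptions; the `cv_file` field name and the base URL come from the contract above.

```python
import urllib.request
import uuid

BASE_URL = "https://silvio0-resume-scanner.hf.space"


def build_extract_request(pdf_bytes: bytes, filename: str = "resume.pdf") -> urllib.request.Request:
    """Build a multipart/form-data POST for /extract with the PDF in the `cv_file` field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="cv_file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for a real PDF file read from disk.
request = build_extract_request(b"%PDF-1.4 placeholder bytes")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it; the sketch stops short of that so it runs without network access.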

2. Analyze Resume (Scoring)

The core engine that compares the CV against a Job Description.

  • Endpoint: /analyze

  • Method: POST

  • Content-Type: application/x-www-form-urlencoded

  • Body:

    • cv_text: (String) Raw text from the extraction step.
    • jd_text: (String) The Job Description text.
    • mode: (String) "strict" or "flexible".
    • manual_keywords: (List, Optional) Specific skills to check (e.g., "Python, SQL").
  • Response (JSON):

{
  "score": 85.5,
  "mode": "strict",
  "missing_keywords": [
    "kubernetes",
    "docker"
  ],
  "available_keywords": [
    "python",
    "sql",
    "aws",
    "machine learning",
    "data",
    "kubernetes",
    "docker"
  ],
  "default_critical_keywords": [
    "python",
    "sql",
    "aws",
    "machine learning",
    "data"
  ],
  "critical_check": {
    "keywords_checked": [
      "python",
      "sql",
      "aws",
      "machine learning",
      "data"
    ],
    "missing_critical": [],
    "status": "SAFE"
  }
}
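
A client for `/analyze` can be sketched with the standard library as follows. The helper names `build_analyze_request` and `summarize` are hypothetical, and `manual_keywords` is left out because the contract above does not say how the list is encoded in the form body.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://silvio0-resume-scanner.hf.space"


def build_analyze_request(cv_text: str, jd_text: str, mode: str = "strict") -> urllib.request.Request:
    """Build a form-encoded POST for /analyze."""
    if mode not in ("strict", "flexible"):
        raise ValueError("mode must be 'strict' or 'flexible'")
    form = urllib.parse.urlencode({"cv_text": cv_text, "jd_text": jd_text, "mode": mode})
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def summarize(result: dict) -> str:
    """Condense an /analyze response into one line."""
    missing = ", ".join(result["missing_keywords"]) or "none"
    status = result["critical_check"]["status"]
    return f"{result['mode']} score {result['score']}%, missing: {missing}, critical check: {status}"


# A trimmed copy of the sample response shown above.
sample = json.loads(
    '{"score": 85.5, "mode": "strict", "missing_keywords": ["kubernetes", "docker"],'
    ' "critical_check": {"status": "SAFE"}}'
)
print(summarize(sample))
```

Sending the request with `urllib.request.urlopen` and decoding the body with `json.load` would yield a dictionary that `summarize` accepts.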

📚 Interactive Documentation (Swagger UI)

Test the API workflow directly in your browser:

  1. Access Docs: https://silvio0-resume-scanner.hf.space/docs
  2. Step 1: Extract Text (Get your CV data)
    • Click on POST /extract -> Try it out.
    • Upload your PDF Resume in the cv_file field.
    • Click Execute and copy the content inside "cv_text" from the Response Body.
  3. Step 2: Analyze Match (Check your score)
    • Click on POST /analyze -> Try it out.
    • Paste your copied text into the cv_text field.
    • Paste a sample Job Description into the jd_text field.
    • Click Execute to see your score and missing keywords.

📦 Local Installation

  1. Clone the Repository
git clone https://github.com/viochris/resume-scanner-api.git
cd resume-scanner-api
  2. Install Dependencies
pip install -r requirements.txt
  3. Run the Server
uvicorn api:app --reload

Output: Uvicorn running on http://127.0.0.1:8000


Author: Silvio Christian, Joe

"Optimize for the robot, write for the human."