A Python-based application that automatically extracts and analyzes information from resume documents (PDF and DOCX formats) using natural language processing.
- Multiple Format Support: Parse resumes in PDF and DOCX formats
- Intelligent Information Extraction: Extract key details including:
  - Candidate name
  - Email address
  - Phone number
  - Skills
- Database Storage: Automatically store parsed information in a PostgreSQL database
- RESTful API: Simple API endpoint for resume parsing
- Scalable Architecture: Modular design for easy extensions and modifications
- Backend: Python 3.9+, Flask
- Database: PostgreSQL
- NLP: spaCy
- Document Processing: PyPDF2, python-docx
- Development Tools: pytest, black, flake8
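The extraction features listed above can be illustrated with a small standard-library sketch. This is not the project's implementation, which relies on spaCy. The regular expressions, the `KNOWN_SKILLS` vocabulary, and the `extract_contact_info` function are hypothetical names assumed for illustration, and name detection is left out because it needs an NLP model.

```python
import re

# Hypothetical skill vocabulary; a real deployment would load a larger list.
KNOWN_SKILLS = {"python", "java", "sql", "docker"}

# Simplified email pattern (assumption): good enough for typical resumes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern (assumption): optional "+", then digits separated
# by spaces, dots, hyphens, or parentheses, all on a single line.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contact_info(text: str) -> dict:
    """Return the first email and phone found, plus any known skills."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    # Lowercase tokens so skill matching is case-insensitive.
    words = set(re.findall(r"[a-z+#]+", text.lower()))
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "skills": sorted(KNOWN_SKILLS & words),
    }
```

Matching skills against a fixed vocabulary keeps the sketch predictable, but it only finds single-word skills that appear in the list.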
- Clone the repository:

      git clone https://github.com/stephenombuya/Automated-Resume-Parser
      cd Automated-Resume-Parser

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Download the spaCy model:

      python -m spacy download en_core_web_sm

- Set up environment variables:

      cp .env.example .env
      # Edit .env with your database credentials

- Initialize the database:

      flask db upgrade

- Start the Flask application:

      python app.py

- Send a POST request to parse a resume:

      curl -X POST -F "file=@/path/to/resume.pdf" http://localhost:5000/parse

  Example response:

      {
        "name": "John Doe",
        "email": "john.doe@email.com",
        "phone": "+1 123-456-7890",
        "skills": ["python", "java", "sql"]
      }

resume-parser/
├── app/
│ ├── __init__.py
│ ├── config.py
│ ├── models.py
│ ├── parser/
│ │ ├── pdf_parser.py
│ │ ├── docx_parser.py
│ │ └── nlp_processor.py
│ └── utils.py
├── tests/
├── requirements.txt
├── .env.example
└── README.md
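
Given the layout above, with separate `pdf_parser.py` and `docx_parser.py` modules, one plausible way to route an uploaded file is to dispatch on its extension. The sketch below is an assumption about how such routing could look, not code from the repository. `select_parser`, `SUPPORTED_PARSERS`, and the returned module names are hypothetical.

```python
from pathlib import Path

# Maps a lowercase file extension to the (assumed) parser module that handles it.
SUPPORTED_PARSERS = {
    ".pdf": "pdf_parser",
    ".docx": "docx_parser",
}


def select_parser(filename: str) -> str:
    """Return the name of the parser module for a resume file.

    Raises ValueError for formats not listed as supported.
    """
    # Compare extensions case-insensitively so "CV.DOCX" is accepted.
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_PARSERS[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported resume format: {suffix or 'no extension'}"
        ) from None
```

Rejecting unknown extensions early would let the API return a clear client error before any parsing work starts.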
- Run tests:

      pytest

- Format code:

      black .

- Check code style:

      flake8

- Fork the repository
- Create your feature branch:

      git checkout -b feature/new-feature

- Commit your changes:

      git commit -am 'Add new feature'

- Push to the branch:

      git push origin feature/new-feature

- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- spaCy for providing excellent NLP capabilities
- PyPDF2 and python-docx for document parsing functionality
- Add support for more document formats
- Implement machine learning for better information extraction
- Add bulk processing capabilities
- Create a web interface for file uploads
- Enhance skills detection with industry-specific vocabularies
- Add export functionality to various formats