Skip to content

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.

License

Notifications You must be signed in to change notification settings

beingvirus/JobMiner

JobMiner 🔍

JobMiner is a powerful Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data. Built with modularity and extensibility in mind, it provides a robust foundation for job market analysis and automated job searching.

✨ Features

  • Modular Architecture: Easy-to-extend scraper system with base classes
  • Multiple Output Formats: Export to JSON, CSV, or both
  • Database Integration: Optional SQLite/PostgreSQL storage with search capabilities
  • CLI Interface: Command-line tool for easy scraping operations
  • Configuration Management: Flexible configuration system with environment variables
  • Rate Limiting: Built-in delays and respectful scraping practices
  • Error Handling: Comprehensive logging and error recovery
  • Template Generation: Quick scraper template creation for new job sites

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/beingvirus/JobMiner.git
cd JobMiner

# Install dependencies
pip install -r requirements.txt

# Optional: Install as package
pip install -e .

Basic Usage

# List available scrapers
python jobminer_cli.py list-scrapers

# Run demo scraper
python jobminer_cli.py scrape demo-company "python developer" --location "san francisco" --pages 2

# Analyze scraped data
python jobminer_cli.py analyze jobs.json

Python API

from scrapers.demo_company.demo_company import DemoCompanyScraper

# Initialize scraper
scraper = DemoCompanyScraper()

# Scrape jobs
jobs = scraper.scrape_jobs(
    search_term="python developer",
    location="san francisco",
    max_pages=2
)

# Save results
scraper.save_to_json(jobs, "jobs.json")
scraper.save_to_csv(jobs, "jobs.csv")

📁 Project Structure

JobMiner/
├── base_scraper.py              # Base scraper class with common functionality
├── jobminer_cli.py              # Command-line interface
├── config.py                    # Configuration management
├── database.py                  # Database integration (optional)
├── requirements.txt             # Project dependencies
├── setup.py                     # Package setup
├── .env.example                 # Environment variables template
├── scrapers/                    # Individual scraper implementations
│   └── demo-company/
│       ├── demo_company.py      # Demo scraper implementation
│       ├── demo_company_readme.md
│       └── requirements.txt
└── output/                      # Default output directory

🛠 Creating New Scrapers

Using the Template Generator

# Generate a new scraper template
python jobminer_cli.py init

# Follow the prompts to create your scraper

Manual Creation

  1. Create a new directory in scrapers/
  2. Implement the BaseScraper class:
from base_scraper import BaseScraper, JobListing

class YourScraper(BaseScraper):
    def get_job_urls(self, search_term, location="", max_pages=1):
        # Implement job URL extraction
        pass
    
    def parse_job(self, job_url):
        # Implement job detail parsing
        return JobListing(...)
  1. Test your scraper:
python your_scraper.py

⚙️ Configuration

Environment Variables

Copy .env.example to .env and customize:

# Database
JOBMINER_DATABASE_URL=sqlite:///jobminer.db

# Logging
JOBMINER_LOG_LEVEL=INFO

# Scraper settings
JOBMINER_DEFAULT_DELAY=2.0

Configuration File

JobMiner automatically creates jobminer_config.json with default settings:

{
  "default_output_format": "both",
  "output_directory": "output",
  "default_scraper_config": {
    "delay": 2.0,
    "timeout": 30,
    "max_retries": 3
  }
}

feature/complete-jobminer-toolkit

💾 Database Integration

Enable database storage for persistent job data:

from config import get_config
from database import get_db_manager

# Enable database in config
config = get_config()
config.database.enabled = True

# Save jobs to database
db_manager = get_db_manager()
db_manager.save_jobs(jobs, scraper_name="demo-company")

# Search jobs
results = db_manager.search_jobs("python developer")

📊 CLI Commands

# List available scrapers
jobminer list-scrapers

# Scrape jobs
jobminer scrape SCRAPER_NAME "SEARCH_TERM" [OPTIONS]

# Analyze results
jobminer analyze FILE_PATH

# Generate new scraper template
jobminer init

CLI Options

  • --location, -l: Search location
  • --pages, -p: Number of pages to scrape
  • --output, -o: Output filename
  • --format, -f: Output format (json/csv/both)
  • --delay, -d: Delay between requests

🤝 Contributing

We welcome contributions! This project is Hacktoberfest-friendly 🎃

Ways to Contribute

  • Add new scrapers for popular job sites
  • Improve existing scrapers with better parsing
  • Add features like advanced filtering or export options
  • Fix bugs and improve error handling
  • Improve documentation and examples

Getting Started

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Make your changes and test thoroughly
  4. Submit a pull request with a clear description

See CONTRIBUTING.md for detailed guidelines.

📋 Supported Job Sites

Currently implemented scrapers:

  • Demo Company - Template/example scraper for testing

Planned Scrapers

  • LinkedIn Jobs
  • Indeed
  • Glassdoor
  • AngelList
  • Stack Overflow Jobs
  • Remote.co

Want to add a scraper for your favorite job site? Check out our contribution guide!

🔧 Requirements

  • Python 3.8+
  • requests
  • beautifulsoup4
  • pandas
  • click
  • sqlalchemy (optional, for database features)
  • selenium (optional, for JavaScript-heavy sites)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built for the open-source community
  • Hacktoberfest 2024 participant
  • Inspired by the need for better job market analysis tools

📞 Support


Happy Job Mining! 🎯

Made with ❤️ by the open-source community


Contributors ✨

Contributors Hall of Fame

About

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages