A full-featured job scraper application built with Python Flask, PostgreSQL, and Redis.
The Job Scraper application is designed to scrape job listings from various job boards, store them in a PostgreSQL database, and provide a web interface for browsing, searching, and exporting the collected data.
- Job Scraping: Automated scraping of job listings from multiple sources
- Web Interface: Flask-based dashboard to view, search, and manage job listings
- Data Export/Import: Export job data to CSV/JSON and import from external sources
- Monitoring: Integrated health checks and Prometheus metrics for observability
- Containerization: Docker setup for easy deployment and scaling
The application follows a modular architecture with the following components:
- Web Interface: Flask application with templates and static assets
- Scraper Module: Core scraping functionality with configurable job sources
- Database Layer: PostgreSQL storage with SQLAlchemy ORM
- Caching Layer: Redis for performance optimization
- Monitoring: Prometheus metrics and health check endpoints
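As a rough sketch of how the scraper module might fit together (the source URL, CSS selectors, and field names below are placeholders for illustration, not the project's actual configuration), a single job source could be scraped with requests and Beautiful Soup like this:

```python
# Illustrative only: a minimal scraper for one hypothetical job source.
# The URL and CSS selectors below are placeholders, not real configuration.
import requests
from bs4 import BeautifulSoup


def scrape_source(url: str) -> list[dict]:
    """Fetch a listings page and return one dict per job posting."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    jobs = []
    for card in soup.select("div.job-card"):  # placeholder selector
        jobs.append({
            "title": card.select_one("h2.title").get_text(strip=True),
            "company": card.select_one("span.company").get_text(strip=True),
            "location": card.select_one("span.location").get_text(strip=True),
            "url": card.select_one("a")["href"],
        })
    return jobs


if __name__ == "__main__":
    for job in scrape_source("https://example.com/jobs"):
        print(job["title"], "-", job["company"])
```

In the real application, results like these would be written to PostgreSQL via the database layer rather than printed.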
- Python 3.10+
- PostgreSQL 13+
- Redis 6+
- Docker and Docker Compose (for containerized deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/job-scraper.git
  cd job-scraper
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration values
  ```

- Initialize the database:

  ```bash
  # Ensure PostgreSQL is running
  python3 -m app.db.init_db
  ```

- Run the application:

  ```bash
  python3 main.py
  ```
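The `python3 -m app.db.init_db` step above is not documented further here; the following is a minimal sketch of what such a script typically does, assuming SQLAlchemy declarative models and a `DATABASE_URL` environment variable (both assumptions, not documented facts):

```python
# Hypothetical sketch of a database initialization script; not the
# project's actual app.db.init_db. Assumes SQLAlchemy models registered
# on a shared declarative Base and a DATABASE_URL environment variable.
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
    """In the real project, the ORM models would inherit from this base."""


def init_db() -> None:
    engine = create_engine(os.environ["DATABASE_URL"])
    # Importing the model modules registers their tables on Base.metadata;
    # create_all then creates any tables that are missing.
    Base.metadata.create_all(engine)


if __name__ == "__main__":
    init_db()
```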
For containerized deployment with Docker Compose:

- Build and start the containers:

  ```bash
  docker-compose up -d
  ```

- Access the application at http://localhost:5000
The web interface provides the following features:
- Dashboard: Overview of scraping status and job statistics
- Job Listings: Browse and search collected job listings
- Export: Export job data to various formats
- Import: Import job data from external sources
- Scraper Control: Start, stop, and monitor scraping jobs
The application exposes the following API endpoints:
- `GET /api/jobs`: Get a list of jobs with optional filtering
- `GET /api/jobs/{id}`: Get details for a specific job
- `POST /api/start-scrape`: Start a new scraping job
- `POST /api/stop-scrape`: Stop the current scraping job
- `GET /api/status`: Get the current status of the scraper
- `GET /api/export`: Export job data to CSV/JSON
- `POST /api/import`: Import job data from an external source
- `GET /health`: Health check endpoint
- `GET /metrics`: Prometheus metrics endpoint
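As an illustration of calling these endpoints from a script (the query parameters and response handling shown are assumptions for demonstration, not documented contracts):

```python
# Illustrative client calls against a locally running instance.
import requests

BASE_URL = "http://localhost:5000"

# List jobs, optionally filtered (the "search" parameter is hypothetical).
jobs = requests.get(f"{BASE_URL}/api/jobs", params={"search": "python"}, timeout=10)
print(jobs.status_code, jobs.json())

# Kick off a scraping run, then poll its status.
requests.post(f"{BASE_URL}/api/start-scrape", timeout=10)
status = requests.get(f"{BASE_URL}/api/status", timeout=10)
print(status.json())

# Export collected data as CSV (the "format" parameter is hypothetical).
export = requests.get(f"{BASE_URL}/api/export", params={"format": "csv"}, timeout=30)
with open("jobs_export.csv", "wb") as f:
    f.write(export.content)
```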
Configuration is managed through YAML files in the config/ directory:
- `app_config.yaml`: General application settings
- `api_config.yaml`: API-specific settings
- `logging_config.yaml`: Logging configuration
Environment-specific settings can be overridden using environment variables defined in the .env file.
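A minimal sketch of this layering, loading a YAML file and letting environment variables take precedence (the specific keys shown are placeholders, not the project's actual settings):

```python
# Sketch of layered configuration: YAML defaults overridden by environment
# variables. The file path matches the config/ directory above; the keys
# are placeholders rather than documented settings.
import os

import yaml  # PyYAML


def load_config(path: str = "config/app_config.yaml") -> dict:
    with open(path) as f:
        config = yaml.safe_load(f) or {}

    # Values from the environment (e.g. loaded from .env) win over YAML.
    if "DATABASE_URL" in os.environ:          # hypothetical key
        config["database_url"] = os.environ["DATABASE_URL"]
    if "SCRAPE_INTERVAL" in os.environ:       # hypothetical key
        config["scrape_interval"] = int(os.environ["SCRAPE_INTERVAL"])
    return config
```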
The application includes built-in monitoring capabilities:
- Health Checks: `/health` endpoint for application health status
- Prometheus Metrics: `/metrics` endpoint for Prometheus metrics
- Structured Logging: Detailed logs using the Python logging module
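A minimal sketch of how the `/health` and `/metrics` endpoints might be wired up with Flask and `prometheus_client` (the metric name is illustrative, not one of the application's real metrics):

```python
# Sketch of health and metrics endpoints; not the project's actual routes.
from flask import Flask, Response, jsonify
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# Example counter; incremented wherever a job listing is stored.
JOBS_SCRAPED = Counter("jobs_scraped_total", "Total job listings scraped")


@app.route("/health")
def health():
    # A real check would also verify PostgreSQL and Redis connectivity.
    return jsonify({"status": "ok"})


@app.route("/metrics")
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)


if __name__ == "__main__":
    app.run(port=5000)
```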
The core database tables include:
- `jobs`: Stores job listings with details like title, company, location, etc.
- `companies`: Information about companies posting jobs
- `scrape_runs`: Records of scraping jobs with timestamps and statistics
- `users`: User accounts for the web interface (if authentication is enabled)
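For illustration, the first three tables could be modeled with SQLAlchemy roughly as follows; column names beyond those listed above are assumptions, and the optional `users` table is omitted:

```python
# Hypothetical SQLAlchemy models matching the tables described above.
from datetime import datetime
from typing import Optional

from sqlalchemy import DateTime, ForeignKey, Integer, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Company(Base):
    __tablename__ = "companies"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String(255), unique=True)


class Job(Base):
    __tablename__ = "jobs"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    title: Mapped[str] = mapped_column(String(255))
    location: Mapped[Optional[str]] = mapped_column(String(255))
    company_id: Mapped[int] = mapped_column(ForeignKey("companies.id"))
    scraped_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)


class ScrapeRun(Base):
    __tablename__ = "scrape_runs"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    started_at: Mapped[datetime] = mapped_column(DateTime)
    finished_at: Mapped[Optional[datetime]] = mapped_column(DateTime)
    jobs_found: Mapped[int] = mapped_column(Integer, default=0)
```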
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Beautiful Soup for HTML parsing
- Flask for the web framework
- SQLAlchemy for database ORM
- Prometheus for metrics collection
- Bootstrap for the web interface