Raimal-Raja/Advanced_Web_Crawler

Multi-Country Indeed Job Scraper

This Python script scrapes job listings from 14 Indeed.com country domains for 14 job professions. For each listing it extracts the title, company name, location, job URL, and company URL, then exports the results to a CSV file.

Features

  • Scrapes job listings from 14 Indeed.com country domains
  • Searches for 14 different job professions
  • Extracts comprehensive job details including company URLs
  • Implements multi-threading for faster scraping
  • Uses cloudscraper to bypass anti-bot measures
  • Implements a retry mechanism for handling network errors
  • Exports results to a CSV file
  • Includes user authentication for script execution
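
The multi-threaded search over every country and profession pair could look roughly like the sketch below. It is an illustration only: this README does not say which threading API the script uses, so `ThreadPoolExecutor` is an assumption, and `scrape_one`, `scrape_all`, the two example domains, and the two example professions are hypothetical stand-ins for the script's real code and its 14-entry lists.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical subsets for illustration; the real script defines
# 14 country domains and 14 job professions in main().
domains = {"USA": "https://www.indeed.com", "UK": "https://uk.indeed.com"}
job_professions = ["Data Analyst", "Software Engineer"]


def scrape_one(country, base_url, profession):
    """Hypothetical placeholder for the per-search scraper.

    A real implementation would fetch and parse a results page here.
    This stub only records which search it was asked to run.
    """
    return [{"country": country, "profession": profession, "source": base_url}]


def scrape_all(domains, job_professions, max_workers=4):
    """Run one scrape task per (country, profession) pair on a thread pool."""
    tasks = [
        (country, url, profession)
        for (country, url), profession in product(domains.items(), job_professions)
    ]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in task order, even though the
        # tasks themselves may finish in any order.
        for rows in pool.map(lambda task: scrape_one(*task), tasks):
            results.extend(rows)
    return results


all_rows = scrape_all(domains, job_professions)
```

Threads suit this workload because each task spends most of its time waiting on the network. Keeping `max_workers` small also limits the load placed on the target site.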

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Python 3.6+
  • pip (Python package manager)

Installation

  1. Clone this repository:
    git clone https://github.com/yourusername/multi-country-indeed-scraper.git
    
  2. Navigate to the project directory:
    cd multi-country-indeed-scraper
    
  3. Install the required packages:
    pip install -r requirements.txt
    

Usage

  1. Run the script:
    python multi_country_indeed_scraper.py
    
  2. Enter the username and password when prompted:
    • Username: Professor
    • Password: raja
  3. The script will start scraping job listings from all specified countries and job professions.
  4. Results will be saved in Multi_Country_Job_results.csv in the same directory.
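
As a rough picture of the export step, the sketch below writes rows with the five extracted fields to a CSV file using the standard `csv` module. The column names in `FIELDNAMES`, the `export_to_csv` helper, and the sample row are assumptions made for illustration; the actual script may name its columns differently or use another library.

```python
import csv

# Assumed column names, based on the fields this README says are extracted.
FIELDNAMES = ["title", "company", "location", "job_url", "company_url"]


def export_to_csv(rows, path="Multi_Country_Job_results.csv"):
    """Write a list of job dictionaries to a CSV file and return its path."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Made-up sample row, for illustration only.
sample_rows = [
    {
        "title": "Data Analyst",
        "company": "Example Corp",
        "location": "London",
        "job_url": "https://example.com/job/1",
        "company_url": "https://example.com/company/1",
    }
]
```

Calling `export_to_csv(sample_rows)` would create the file in the current working directory, matching step 4 above.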

Customization

  • To modify the list of countries or their Indeed URLs, edit the domains dictionary in the main() function.
  • To change the job professions being searched, modify the job_professions list in the main() function.
  • Adjust the MAX_RETRIES and RETRY_DELAY variables to fine-tune the retry mechanism.
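
To show how `MAX_RETRIES` and `RETRY_DELAY` might interact, here is a minimal retry sketch. The values 3 and 2 are placeholders, not the script's actual defaults, and `fetch_with_retry` is a hypothetical helper. The `fetch` argument stands for whatever callable performs the request (in the real script, presumably a cloudscraper session's `get` method).

```python
import time

# Placeholder values; check the script for its actual settings.
MAX_RETRIES = 3
RETRY_DELAY = 2  # seconds to wait between attempts


def fetch_with_retry(fetch, url, max_retries=MAX_RETRIES,
                     retry_delay=RETRY_DELAY, retry_on=(OSError,),
                     sleep=time.sleep):
    """Call fetch(url), retrying on the given exception types.

    OSError covers the built-in ConnectionError and TimeoutError.
    Pass a different tuple in retry_on to match the HTTP library in use.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < max_retries:
                sleep(retry_delay)
    raise RuntimeError(
        f"giving up on {url} after {max_retries} attempts"
    ) from last_error
```

Raising `MAX_RETRIES` makes the scraper more tolerant of flaky connections at the cost of longer runs. Raising `RETRY_DELAY` gives the server more time to recover between attempts.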

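Dependencies

The required packages are listed in requirements.txt and are installed in step 3 of Installation. They include cloudscraper, which the script uses to get past anti-bot measures.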

Additional Resources

  1. Python Documentation
  2. Web Scraping Best Practices
  3. Indeed.com robots.txt
  4. HTTP Status Codes
  5. Threading in Python
  6. Python Logging

Ethical Considerations

Web scraping may be against the terms of service of some websites. Always review and respect the target website's robots.txt file and terms of service. Use this script responsibly and ensure you have permission to scrape the target websites. The authors are not responsible for any misuse of this script.

Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.

License

This project is licensed under the MIT License - see the LICENSE file for details.
