This Python script scrapes job listings from Indeed.com across 14 different country domains for multiple job professions. It extracts job details including title, company name, location, job URL, and company URL, then exports the results to a CSV file.
- Scrapes job listings from 14 Indeed.com country domains
- Searches for 14 different job professions
- Extracts comprehensive job details including company URLs
- Implements multi-threading for faster scraping
- Uses cloudscraper to bypass anti-bot measures
- Implements retry mechanism for handling network errors
- Exports results to a CSV file
- Includes user authentication for script execution
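The retry mechanism itself is not shown in this README; a minimal sketch of how such a wrapper could look (apart from `MAX_RETRIES` and `RETRY_DELAY`, which the script defines, all names here are assumptions):

```python
import time

MAX_RETRIES = 3   # maximum attempts per request
RETRY_DELAY = 1   # seconds to wait between attempts

def fetch_with_retry(fetch, url):
    """Call fetch(url), retrying on errors up to MAX_RETRIES times."""
    last_error = None
    for attempt in range(MAX_RETRIES):
        try:
            return fetch(url)
        except Exception as exc:  # e.g. network errors raised by fetch
            last_error = exc
            if attempt < MAX_RETRIES - 1:
                time.sleep(RETRY_DELAY)
    raise last_error

# In the real script the fetch callable would wrap a cloudscraper session:
#   scraper = cloudscraper.create_scraper()  # handles Cloudflare's anti-bot page
#   html = fetch_with_retry(lambda u: scraper.get(u, timeout=30).text, url)
```

Each worker thread would call such a wrapper for its assigned search URL, so a transient failure on one domain does not abort the whole run.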
Before you begin, ensure you have met the following requirements:
- Python 3.6+
- pip (Python package manager)
- Clone this repository:
  `git clone https://github.com/yourusername/multi-country-indeed-scraper.git`
- Navigate to the project directory:
  `cd multi-country-indeed-scraper`
- Install the required packages:
  `pip install -r requirements.txt`
- Run the script:
  `python multi_country_indeed_scraper.py`
- Enter the username and password when prompted:
- Username: Professor
- Password: raja
- The script will start scraping job listings from all specified countries and job professions.
- Results will be saved in `Multi_Country_Job_results.csv` in the same directory.
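The export step can be sketched with pandas; the column names below are assumptions based on the fields listed at the top of this README, not taken from the script:

```python
import pandas as pd

# Each scraped job is a dict of the fields the script extracts
jobs = [
    {
        "title": "Python Developer",       # illustrative sample row
        "company": "Example Corp",
        "location": "Berlin",
        "job_url": "https://www.indeed.com/viewjob?jk=123",
        "company_url": "https://www.indeed.com/cmp/example-corp",
    },
]

df = pd.DataFrame(jobs)
df.to_csv("Multi_Country_Job_results.csv", index=False)
```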
- To modify the list of countries or their Indeed URLs, edit the `domains` dictionary in the `main()` function.
- To change the job professions being searched, modify the `job_professions` list in the `main()` function.
- Adjust the `MAX_RETRIES` and `RETRY_DELAY` variables to fine-tune the retry mechanism.
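The configuration described above might look like this inside `main()`. Only the variable names (`domains`, `job_professions`, `MAX_RETRIES`, `RETRY_DELAY`) come from the script; the sample countries, URLs, professions, and the URL-building pattern are assumptions for illustration:

```python
# Country name -> Indeed domain, as in the `domains` dictionary
domains = {
    "United States": "https://www.indeed.com",
    "United Kingdom": "https://uk.indeed.com",
    "Germany": "https://de.indeed.com",
    # ... remaining countries (14 in total)
}

# Professions fed into the search query
job_professions = ["python developer", "data analyst", "nurse"]

MAX_RETRIES = 3   # how many times to re-attempt a failed request
RETRY_DELAY = 5   # seconds to wait between attempts

# One search URL per (domain, profession) pair
search_urls = [
    f"{base}/jobs?q={profession.replace(' ', '+')}"
    for base in domains.values()
    for profession in job_professions
]
```

Every entry added to `domains` or `job_professions` multiplies the number of pages scraped, so keep the retry delay in mind when expanding either list.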
- cloudscraper: For bypassing Cloudflare's anti-bot page.
- BeautifulSoup: For parsing HTML and extracting data.
- pandas: For creating and exporting data to CSV.
- requests: For making HTTP requests.
- concurrent.futures: For implementing multi-threading (part of the Python standard library, no installation needed).
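The dependency list above corresponds to a `requirements.txt` along these lines (the project's actual file is not shown here, and `concurrent.futures` needs no entry since it ships with Python):

```text
cloudscraper
beautifulsoup4
pandas
requests
```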
- Python Documentation
- Web Scraping Best Practices
- Indeed.com robots.txt
- HTTP Status Codes
- Threading in Python
- Python Logging
Web scraping may be against the terms of service of some websites. Always review and respect the target website's robots.txt
file and terms of service. Use this script responsibly and ensure you have permission to scrape the target websites. The authors are not responsible for any misuse of this script.
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.