The LinkedIn Job Scraper is a web-scraping program designed to collect real-time job postings from the LinkedIn platform.
It processes and cleans the scraped data and exports it to a CSV file, so users can conveniently review and apply for jobs in Excel based on their interests.
Each listing includes details such as the company, job role, and posting date, making it easy for students from any domain to find jobs relevant to their field.
By automating job data extraction, this scraper lets users quickly gather and organize job postings by criteria such as job type and location. It uses BeautifulSoup (with the lxml parser) for parsing HTML, and pandas for data manipulation.
The output is a clean CSV file, which can be easily opened in Excel for further analysis and job application tracking. This tool simplifies the job search process, saving users valuable time and effort.
This tool is ideal for job seekers looking to automate and optimize their job search process on LinkedIn, providing a structured and efficient way to gather job information.
- Overview
- Features
- Prerequisites
- Technical Details
- Installation
- Clone the Repository
- Create a Virtual Environment (Optional)
- Install Dependencies
- Example Usage
- Output
- CSV File Structure
- Example Output
- Data Utilization
- Benefits
- Contributors
- Contributing
- License
- Contact
- Real-time Job Scraping: Fetches up-to-date job postings from LinkedIn.
- Data Cleaning: Ensures the scraped data is well-organized and free from duplicates.
- CSV Export: Exports the cleaned data into a CSV file for easy use in Excel.
- Customizable Inputs: Allows users to specify the region/country and job type.
For example: if you are a CSE major with an interest in Python, search for Python Developer in India; if you are interested in data, try Data Analyst in the United States; and if you want to go into marketing, try Digital Marketing in China.
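The scrape–clean–export flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it parses a static HTML snippet (standing in for a fetched LinkedIn results page, with made-up class names and data) instead of making a live request, then deduplicates and exports with pandas.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Static HTML standing in for a fetched search-results page.
# The markup and values here are illustrative, not LinkedIn's real structure.
html = """
<ul>
  <li><h3>Python Developer</h3><h4>Acme Corp</h4><span>India</span></li>
  <li><h3>Data Analyst</h3><h4>Globex</h4><span>United States</span></li>
  <li><h3>Python Developer</h3><h4>Acme Corp</h4><span>India</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.find_all("li"):
    rows.append({
        "Job Role": card.find("h3").get_text(strip=True),
        "Company": card.find("h4").get_text(strip=True),
        "Job Location": card.find("span").get_text(strip=True),
    })

# Basic cleaning: drop duplicate listings, then export for Excel.
df = pd.DataFrame(rows).drop_duplicates().reset_index(drop=True)
df.to_csv("jobs-sample.csv", index=False)
print(len(df))  # 2
```

The same parse → DataFrame → `drop_duplicates` → `to_csv` pipeline applies when the HTML comes from a live `requests` call instead of a string.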
- Python 3.7 and above
- Stable internet connection
- Excel (or any other spreadsheet software that supports CSV files)
- LinkedIn account (optional)
- Git (for installation)
- Language: Python
- Libraries: BeautifulSoup (bs4), pandas, requests, json, csv, os
- Output: a CSV file with detailed job information based on the input parameters
- Input Parameters: Region/country, job-type
Before you begin, ensure the prerequisites listed above are installed on your system.
First, clone the repository to your local machine using Git. Open your terminal and run the following commands:
git clone https://github.com/LinconDash/LinkedIn-jobs-scrapper.git
cd LinkedIn-jobs-scrapper
It's good practice to use a virtual environment to manage your project dependencies: packages are installed inside the environment rather than system-wide, which avoids dependency conflicts and keeps your global Python installation clean. Run the following commands to create and activate a virtual environment:
- On Windows
python -m venv venv
venv\Scripts\activate
- On macOS / Linux
python -m venv venv
source venv/bin/activate
Install the required Python packages, pinned to the versions listed in the requirements.txt file, using pip:
pip install -r requirements.txt
- Configuration: Modify the configfiles.json file to set your LinkedIn credentials and other parameters such as "target-rows", which controls the number of rows of data to fetch.
- Open main.py: Open main.py and check the main section, as shown in the image below.
- Run main.py: Run the file from the terminal:
python main.py
# or python3 main.py
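For reference, the configuration step above might use a file shaped roughly like this. Only the "target-rows" key is described in this README; the credential key names below are assumptions, so check the actual configfiles.json shipped with the repository for the exact fields:

```json
{
  "email": "your-linkedin-email@example.com",
  "password": "your-password",
  "target-rows": 100
}
```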
The output of the LinkedIn Jobs Scraper is a CSV file that consolidates all the job data extracted from LinkedIn. This file is generated, cleaned, and stored in the Data directory of the project.
While fetching the data, your output should look something like the image below.
Note: the output of the whole process is automatically stored as a CSV file named jobs-scraped-on-(searched-date).csv in the Data directory. It is not a problem if the Data directory does not exist yet, as a Python script creates it; all the CSV files are stored in that directory.
The CSV file contains the following columns, providing a comprehensive view of each job listing:
- Job Role: The title of the job position.
- Company: The name of the company offering the job.
- Job Location: The location where the job is based.
- Employment Type: Whether the job is full-time, part-time, contract, etc.
- Posted: The time when the job was posted (e.g., 1 week ago, 2 months ago).
- Easy Apply: Indicates if the job can be applied to easily via LinkedIn.
- Seniority Level: The experience level required for the job (e.g., Entry level, Mid-Senior level).
- Industries: The industry sector of the company.
- Job Function: The primary function or responsibilities of the job.
Here is an example of what the output CSV might look like:
The CSV file can be opened and analyzed using spreadsheet software like Microsoft Excel, Google Sheets, or any data analysis tool that supports CSV format. Users can sort, filter, and search through the job listings to find opportunities that best match their skills and preferences.
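Beyond spreadsheet software, the CSV can also be filtered programmatically with pandas. The snippet below uses a small in-memory stand-in for jobs-scraped-on-(searched-date).csv — the column names match the structure described above, but the rows are made up:

```python
import io
import pandas as pd

# In-memory CSV mimicking the scraper's output structure (rows are illustrative).
csv_text = """Job Role,Company,Job Location,Employment Type,Posted,Easy Apply,Seniority Level,Industries,Job Function
Python Developer,Acme Corp,India,Full-time,1 week ago,Yes,Entry level,Software,Engineering
Data Analyst,Globex,United States,Contract,2 months ago,No,Mid-Senior level,Finance,Analytics
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only full-time roles that support Easy Apply.
hits = df[(df["Employment Type"] == "Full-time") & (df["Easy Apply"] == "Yes")]
print(hits["Job Role"].tolist())  # ['Python Developer']
```

In practice you would replace the `StringIO` stand-in with `pd.read_csv("Data/jobs-scraped-on-<date>.csv")` and adjust the filter to your own criteria.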
- Centralized Data: All job listings are consolidated in one file, making it easy to manage and review.
- Customizable: Users can easily adjust filters and sorting options in their spreadsheet software to focus on the most relevant job opportunities.
- Time-Saving: Automates the tedious process of manually searching for jobs, allowing users to concentrate on preparing applications and interviews.
This structured output format gives users a clear, organized view of job opportunities, making their job search more efficient and effective. Users building job-related projects can also use it to obtain a real-time dataset.
- Lincon Dash - Github
Contributions are welcome! Please adhere to the following guidelines when contributing to this project:
- Fork the Repository:
- Click on the "Fork" button at the top right corner of the repository page.
- Clone Your Forked Repository:
git clone https://github.com/LinconDash/LinkedIn-jobs-scrapper.git
cd LinkedIn-jobs-scrapper
- Create a New Branch:
- Use a descriptive name for your branch (e.g., new-feature-add, bugfix-fix-scraping, etc.).
git checkout -b your-branch-name
- Make Your Changes:
- Ensure your code follows the project’s coding standards.
- Test your changes thoroughly.
- Commit Your Changes:
- Write clear and concise commit messages.
git commit -m "Description of the changes made"
- Push to Your Forked Repository:
git push -u origin your-branch-name
- Create a Pull Request:
- Go to the original repository on GitHub.
- Click on "Pull Requests" and then "New Pull Request."
- Select your branch from the "compare" dropdown and the master branch from the "base" dropdown.
- Provide a clear description of your changes and submit the pull request.
- Code Review:
- Be open to feedback and make any changes requested by the project maintainer (i.e., me :)
This project is licensed under the GNU General Public License. See the LICENSE file for details.
If you have any questions or need further assistance, feel free to contact me at:
- Email: lincondash02@gmail.com
- GitHub: LinconDash
- LinkedIn: Lincon Dash