SalaryPrediction

Salary Prediction Project (Python)

The goal of the project is to examine a dataset of job postings and salaries, and to conduct exploratory data analysis on the features that are likely to predict salaries for a new set of job postings in the test dataset.

Problem Definition

The HR department of a company needs a real-time solution that supports effective employment offers to potential employees.

Tool

The code was written in Python 3 using Jupyter Notebook.

Data

train_features.csv: Each row contains metadata for an individual job posting. The JobID is the unique identifier for each posting.
train_salaries.csv: Each row contains a JobID and the corresponding salary.
test_features.csv: Like the train_features file, each row contains metadata for an individual job posting.

Features in Dataset

Job Type: CEO, CFO, Manager, Janitor etc.
Industry: Finance, Oil, Health etc.
Degree: High School, College, Masters, Doctorate
Major: Business, Engineering, Math, Physics etc.
Years of Experience: Candidate's years of experience
Miles from Metropolis: Distance from Metropolis

Data Discovery Steps

Import Data Libraries and set up directory
Load Data
Merge the train_features and train_salaries datasets into one dataset that includes:
- 1 million rows and 9 features
- Both categorical and numeric features
Clean Data
- Rename Columns
- No duplicates or null values found in the dataset
- Dropped 5 rows where Salary is less than or equal to 0
Explore Features using Data Visualization
- Use IQR to detect/confirm outliers
Utilize Correlation Matrix to examine relationships between variables
Use label encoding to identify correlation between all features
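
The load, merge, and clean steps above can be sketched as follows. This is a minimal sketch, not the notebook's actual code: the column names `jobId` and `salary` and the helper `merge_and_clean` are assumptions, and the tiny in-memory frames stand in for train_features.csv and train_salaries.csv.

```python
import pandas as pd


def merge_and_clean(features: pd.DataFrame, salaries: pd.DataFrame) -> pd.DataFrame:
    """Join features to salaries on the job identifier and drop invalid rows.

    The key column "jobId" and target column "salary" are assumed names.
    """
    merged = features.merge(salaries, on="jobId", how="inner")
    # Remove exact duplicate rows and rows with missing values.
    merged = merged.drop_duplicates().dropna()
    # Salaries less than or equal to 0 are treated as invalid and dropped.
    return merged[merged["salary"] > 0].reset_index(drop=True)


# Made-up rows for illustration only; JOB3 is duplicated on purpose.
features = pd.DataFrame({
    "jobId": ["JOB1", "JOB2", "JOB3", "JOB3"],
    "jobType": ["CEO", "JANITOR", "MANAGER", "MANAGER"],
    "yearsExperience": [10, 2, 5, 5],
})
salaries = pd.DataFrame({
    "jobId": ["JOB1", "JOB2", "JOB3"],
    "salary": [150, 0, 90],
})

train = merge_and_clean(features, salaries)
print(train)
```

With the real files, the two input frames would presumably come from `pd.read_csv("train_features.csv")` and `pd.read_csv("train_salaries.csv")`.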

Data Exploration

Univariate Plots

Job Type

Industry

Degree

Major

Insights

Job Type: All job types roughly have a similar count
Industry: All industries roughly have a similar count
Degree: Employees with a high school degree or no degree ('None') have the highest counts. The counts of employees with Bachelors, Masters and Doctoral degrees are roughly similar.
Major: Employees with no major are the most common group. The counts of the other majors are roughly similar.
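
The counts behind these univariate observations can be reproduced with a simple frequency table. The sketch below uses a made-up `degree` column (an assumed column name and assumed category labels) rather than the project data.

```python
import pandas as pd

# Made-up sample; the real data would come from the merged training set.
df = pd.DataFrame({
    "degree": ["NONE", "HIGH_SCHOOL", "NONE", "MASTERS", "HIGH_SCHOOL", "NONE"],
})

# Number of rows per category, sorted from most to least frequent.
degree_counts = df["degree"].value_counts()
print(degree_counts)
```

The same call on each categorical column (job type, industry, degree, major) gives the numbers that a count plot would display.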

Salary (Target Feature)

Insights

The boxplot indicates that the median salary is about 120k and that about 75% of employees earn about 85k or more.
The histogram shows the highest salary point is at about 130k. The distribution is close to normal, except that it is slightly right-skewed.
Both plots indicate some outliers, which could potentially be the salaries of top management such as CEOs.
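
The outliers seen in the plots can be confirmed numerically with the IQR rule. The sketch below is one common formulation, using the conventional multiplier of 1.5; the helper name `iqr_bounds` and the sample values are made up for illustration and are not taken from the notebook.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the IQR outlier rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up salaries (in thousands) with one obviously extreme value.
salary = pd.Series([85, 100, 110, 120, 130, 140, 150, 300])

lower, upper = iqr_bounds(salary)
# Anything outside the fences is flagged as a potential outlier.
outliers = salary[(salary < lower) | (salary > upper)]
print(lower, upper, outliers.tolist())
```

Flagged values are only candidates for removal. A high salary that belongs to a CEO or CFO may be a legitimate data point, so it is worth checking the job type of flagged rows before dropping them.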

Bivariate Plots

Insights

Job Type: The box plot shows that CEOs have the highest salaries, followed closely by CFOs and CTOs. Janitors earn the lowest salaries of all job types.
Industry: The box plot shows that Finance and Oil are the highest-earning industries, while Education and Service are the two lowest-earning industries.
Degree: Employees with a high school degree or no degree tend to earn lower salaries than employees with other degrees.
Major: Employees who majored in Engineering or Business earn slightly more than employees with other majors. Employees with no major ('none') earn the lowest salaries.
Years of Experience: The plot shows that as employees' years of experience increase, their salaries increase as well, and vice versa.
Miles from Metropolis: The plot shows that as employees' distance from the metropolis increases, their salaries decrease, and vice versa.
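
The category-level comparisons above can also be checked in table form by grouping on a feature and summarising salary. This is a minimal sketch on made-up rows; the column names `jobType` and `salary` are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "jobType": ["CEO", "CEO", "JANITOR", "JANITOR", "CFO"],
    "salary": [150, 170, 60, 80, 140],
})

# Median salary per job type, highest first (the centre line of each box in a box plot).
median_by_job = (
    df.groupby("jobType")["salary"]
    .median()
    .sort_values(ascending=False)
)
print(median_by_job)
```

Replacing `jobType` with another categorical column gives the equivalent ranking for industry, degree, or major.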

Correlation Matrix

Insights

There is a positive relationship between Salary and Years of Experience.
There is a negative relationship between Salary and these features: Miles from Metropolis, Major and Degree.
Also, there is a positive correlation between Major and Degree.
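
A correlation matrix over both numeric and categorical features requires the categories to be encoded as integers first. The sketch below uses pandas category codes as a stand-in for label encoding; this is an assumption, since the notebook may use a different encoder, and the column names and sample rows are made up.

```python
import pandas as pd


def encode_and_correlate(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode non-numeric columns, then return the Pearson correlation matrix."""
    encoded = df.copy()
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # Each distinct category gets an integer code, assigned in sorted order.
            encoded[col] = encoded[col].astype("category").cat.codes
    return encoded.corr()


# Made-up rows for illustration only.
df = pd.DataFrame({
    "degree": ["NONE", "BACHELORS", "MASTERS", "DOCTORAL"],
    "yearsExperience": [1, 5, 10, 20],
    "milesFromMetropolis": [90, 60, 30, 10],
    "salary": [60, 100, 130, 180],
})

corr = encode_and_correlate(df)
print(corr["salary"].sort_values(ascending=False))
```

Codes assigned this way follow alphabetical order rather than any natural ranking, so the sign and size of a correlation involving an encoded column such as Degree or Major depend on that ordering and should be read with caution.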

Conclusion

Based on the correlation output, salaries tend to increase as years of experience increase; that is, employees with more years of work experience are more likely to earn high salaries.

Also, there is an indication that as salary increases, miles from Metropolis decreases. This could be because employees who earn high incomes are more likely to afford homes and the other expenses of living in or close to the Metropolis.
