- Built a regression-based machine learning system to predict annual employee salary
- Objective: analyze salary trends and identify key factors influencing employee pay
- Uses historical hiring and salary data for exploratory analysis and prediction
- train_salary.csv
- Contains employee details including:
- Name
- JobTitle
- Agency
- AgencyID
- HireDate
- AnnualSalary
- GrossPay
- Removed unnecessary columns such as
GrossPay - Cleaned column names by stripping extra spaces
- Removed rows with missing hire dates
- Converted salary from string to numeric format
- Extracted Month, Day, and Year from HireDate
- Removed extreme salary outliers (AnnualSalary > 140000)
- Dropped non-informative columns such as employee name
- Analyzed most common job titles and hiring agencies
- Visualized salary distribution and outliers
- Identified top 10 hiring jobs and highest paying roles
- Examined hiring trends across months and years
- Performed correlation analysis using heatmaps and pair plots
- Applied mean encoding to categorical variables:
- JobTitle
- Agency
- AgencyID
- Replaced categorical values with their corresponding average salary
- Train-test split: 80% training, 20% testing
- Scaled features using StandardScaler
- Models used:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
- XGBoost Regressor
- Evaluation metric: R² score
- Ensemble models showed better predictive performance
-
Clone the repository
git clone https://github.com/Gem793/Salary_Prediction_Model cd Employee_Salary_Predictor -
Install required dependencies
pip install pandas numpy matplotlib seaborn scikit-learn xgboost -
Ensure
train_salary.csvis present in the project directory -
Launch Jupyter Notebook jupyter notebook
-
Run the notebook cells
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- XGBoost
- Salary prediction can be effectively modeled using regression techniques
- Job title, agency, and hiring year are strong salary predictors
- Demonstrates an end-to-end machine learning workflow from EDA to modeling