
This project builds a predictive model to estimate visa approval likelihood using candidate and job-related features. It showcases an end-to-end machine learning workflow with EDA, feature engineering, and model tuning to automate parts of the visa evaluation process.

indu-explores-data/Visa-Approval-Prediction-using-ML


🧠 Visa Approval Prediction using Machine Learning

This project focuses on building a predictive model to determine the likelihood of visa approval based on candidate and job-related features. The goal is to assist in automating parts of the visa evaluation process by leveraging data-driven insights and machine learning models. Through exploratory data analysis (EDA), feature engineering, and model tuning, this project demonstrates the workflow of developing an end-to-end classification model.


🎯 Objectives

  • Understand key factors influencing visa approvals.
  • Perform in-depth univariate and bivariate analysis of applicant and job attributes.
  • Apply data preprocessing, feature selection, and model tuning.
  • Compare different classification algorithms and evaluate their performance.
  • Identify the most significant predictors contributing to visa approval.

🧩 Key Methods

  • Data Preprocessing: Handling missing values, encoding categorical variables, and winsorizing outliers.
  • Exploratory Data Analysis (EDA): Univariate and bivariate analyses to uncover variable distributions and relationships.
  • Feature Engineering: Transformation and selection based on importance scores.
  • Model Development: Logistic Regression, Random Forest, and Gradient Boosting (before and after hyperparameter tuning).
  • Evaluation Metrics: F1-score, precision, recall, and accuracy.
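The methods listed above can be sketched as a single scikit-learn workflow. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`education`, `has_experience`, `prevailing_wage`, `certified`) are hypothetical stand-ins, and the imputation and scaling choices are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the project's schema.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "education": rng.choice(["High School", "Bachelor's", "Master's"], size=n),
    "has_experience": rng.choice(["Y", "N"], size=n),
    "prevailing_wage": rng.lognormal(mean=11, sigma=0.5, size=n),
})
# Toy target loosely tied to the features so the models have something to learn.
signal = (df["education"] != "High School").astype(int) + (df["has_experience"] == "Y").astype(int)
df["certified"] = (signal + rng.normal(0, 0.7, size=n) > 1).astype(int)
# Blank out a few wages to exercise missing-value handling.
df.loc[df.sample(frac=0.05, random_state=0).index, "prevailing_wage"] = np.nan

X = df.drop(columns="certified")
y = df["certified"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# One-hot encode categoricals; impute and scale the numeric column.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["education", "has_experience"]),
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["prevailing_wage"]),
])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "f1": f1_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }

print(pd.DataFrame(results).T.round(3))
```

Wrapping preprocessing and the model in one `Pipeline` keeps the imputation and scaling statistics fitted on the training split only, which avoids leaking test data into the comparison.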

📊 Visualizations

📊 Categorical Analysis

Univariate Analysis Categorical 1
Univariate Analysis Categorical 2

📈 Numerical Analysis

Univariate Analysis Numerical

🔄 Bivariate Relationships

Bivariate Analysis 1
Bivariate Analysis 2

💰 Wage Analysis

Wage Analysis
Boxplot Prevailing Wage
Prevailing Wage After Winsorization
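Winsorization caps extreme values at chosen percentiles instead of dropping the rows. The sketch below assumes percentile-based capping with pandas; the 1st/99th percentile cutoffs, the helper name `winsorize`, and the synthetic wage data are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def winsorize(series: pd.Series, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.Series:
    """Cap values below the lower quantile and above the upper quantile."""
    if not 0.0 <= lower_q < upper_q <= 1.0:
        raise ValueError("quantiles must satisfy 0 <= lower_q < upper_q <= 1")
    lower = series.quantile(lower_q)
    upper = series.quantile(upper_q)
    return series.clip(lower=lower, upper=upper)


# Synthetic, right-skewed wages as a stand-in for the real column.
rng = np.random.default_rng(0)
wages = pd.Series(rng.lognormal(mean=11, sigma=0.8, size=1000), name="prevailing_wage")
capped = winsorize(wages)

print(wages.describe().round(0))
print(capped.describe().round(0))
```

Because values are capped rather than removed, the row count is unchanged and only the tails of the distribution move, which is what a before/after boxplot comparison would show.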

🤖 Model Performance

Models Before vs After Tuning
F1 Score Comparison Before vs After Tuning
Boosting Models
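A before/after tuning comparison of this kind can be produced with a cross-validated grid search scored on F1. The following is a sketch under assumptions: the data comes from `make_classification`, and the parameter grid is a hypothetical example rather than the grid used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced data as a stand-in for the visa dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, weights=[0.35, 0.65], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)
f1_before = f1_score(y_test, baseline.predict(X_test))

# Hypothetical grid; the real search space may differ.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3
)
search.fit(X_train, y_train)
f1_after = f1_score(y_test, search.predict(X_test))

print(f"F1 before tuning: {f1_before:.3f}")
print(f"F1 after tuning:  {f1_after:.3f}")
print("Best parameters:", search.best_params_)
```

Tuning is done on the training split only, so the test-set F1 stays an honest estimate; on small data the tuned score is not guaranteed to beat the baseline.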

🌟 Feature Importance

Top 10 Feature Importances Random Forest
Top 10 Important Features Tuned Gradient Boosting
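A top-10 ranking like the ones in these charts can be read from a fitted tree ensemble's `feature_importances_` attribute. This sketch uses synthetic data and placeholder feature names (`feature_0`, `feature_1`, ...), so the ranking it prints says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names.
X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(model.feature_importances_, index=feature_names)
top10 = importances.sort_values(ascending=False).head(10)
print(top10.round(3))

# With matplotlib installed, a horizontal bar chart could be drawn with:
# top10.sort_values().plot.barh(title="Top 10 Feature Importances")
```

The same attribute exists on `GradientBoostingClassifier`, so one helper can serve both models. Impurity-based importances can favor high-cardinality features, so they are best read as a rough ranking.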


💡 Key Insights & Outcomes

🔍 Model Performance & Interpretability

  • The Tuned Random Forest achieved an F1 score of 82.06%, the highest among the models compared and on par with Gradient Boosting, while being easier to interpret.
  • It handles class imbalance effectively.
  • It reduces overfitting by aggregating many diverse decision trees.
  • It is robust to noise and scales to high-dimensional data.
  • It trains faster and is easier to tune than Gradient Boosting.

⚙️ Automated Application Screening

  • Classifies visa petitions as likely to be certified or denied, helping prioritize workload and improve processing efficiency.

🚨 Risk Flagging

  • Automatically flags borderline or uncertain cases for manual verification by legal teams, minimizing decision errors.
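One way to flag borderline cases is to route predictions by predicted probability and send the uncertain middle band to manual review. The sketch below is an assumption about how this could work: the `triage` helper and the 0.4/0.6 thresholds are hypothetical and would need to be calibrated against real review capacity and error costs.

```python
import numpy as np


def triage(probabilities, low: float = 0.4, high: float = 0.6) -> np.ndarray:
    """Label each certification probability as likely certified, likely denied, or manual review."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    p = np.asarray(probabilities, dtype=float)
    return np.where(
        p >= high,
        "likely certified",
        np.where(p <= low, "likely denied", "manual review"),
    )


# Probabilities would normally come from model.predict_proba(X)[:, 1].
labels = triage([0.95, 0.55, 0.10])
print(list(labels))
# → ['likely certified', 'manual review', 'likely denied']
```

Widening the band between `low` and `high` sends more cases to manual review and fewer to automatic classification, so the thresholds act as a dial between workload and risk.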

📈 Transparent, Data-Driven Dashboards

  • Feature importance visualizations show which features most influence the model's predictions, enhancing trust and transparency for stakeholders and applicants.

🧭 Policy Optimization

  • Helps discover patterns (e.g., wage, experience, education) that impact certification rates, guiding internal policy and strategy decisions.

🏆 Final Model Selection

The Tuned Random Forest Classifier combines:

  • Strong predictive performance
  • Excellent interpretability
  • Speed and scalability
  • Ease of deployment

💡 Conclusion:
The Tuned Random Forest model proved to be the most practical and effective solution for EasyVisa’s automated visa prediction system, balancing accuracy, explainability, and operational efficiency.


🛠️ Technologies Used

  • Python
  • Pandas, NumPy for data manipulation
  • Matplotlib, Seaborn for visualization
  • Scikit-learn for model building and evaluation
  • Jupyter Notebook for analysis workflow

⚙️ Setup & Installation Instructions

1. Clone the repository:

git clone https://github.com/indu-explores-data/Visa-Approval-Prediction-using-ML.git
cd Visa-Approval-Prediction-using-ML

2. Install dependencies:

pip install -r requirements.txt

3. Open the Jupyter Notebook:

jupyter notebook "Visa Approval Prediction using ML.ipynb"

▶️ Usage Instructions

  • Run each cell in the notebook sequentially to reproduce the analysis.
  • Modify parameters (e.g., test size, model hyperparameters) to experiment with model behavior.
  • Visualizations are auto-generated during execution for interactive exploration.

🔗 Connect with Me

Let’s connect on LinkedIn for project discussions or data-driven collaborations:

LinkedIn


🙌 Feedback & Support

If you found this project helpful, please ⭐ star the repository and share your thoughts. Suggestions and contributions are always welcome!
