Classification of Poverty Levels in Indonesia Using Machine Learning

Project Description

This project aims to build a classification model that predicts poverty levels in Indonesia from the socio-economic and demographic features of each region. The dataset consists of 514 entries representing provinces and districts/cities in Indonesia, with 12 features covering economic indicators, education levels, health metrics, and other key socio-economic statistics. The data is preprocessed and analyzed before model development and evaluation.

Data source: https://www.kaggle.com/datasets/ermila/klasifikasi-tingkat-kemiskinan-di-indonesia

Project Goals

Develop multiple classification models to predict poverty levels in Indonesia.
Evaluate and compare the models based on their accuracy to identify the best-performing model.
Analyze the relationships between socio-economic features and their impact on poverty prediction.

Methodology

Load Data: The dataset was imported and cleaned. It contains features such as education levels, sanitation access, unemployment rates, and regional economic performance.
Exploratory Data Analysis (EDA): Insights were extracted using descriptive statistics and visualizations, such as heatmaps, to identify patterns and correlations among features.
Data Preprocessing:
- Missing numerical values were imputed with the mean, and categorical features were encoded with LabelEncoder.
- Features were standardized using StandardScaler.
Splitting and Scaling Data:
- The dataset was split into training and test sets (80:20 ratio), ensuring balanced representation.
Modeling:
- Three classification models were developed, including Random Forest.
- The Random Forest model achieved the highest accuracy of 99.03%.
Evaluation:
- Classification reports and confusion matrices were generated to assess model performance.
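The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`poverty_pct`, `mean_years_schooling`, `unemployment_rate`, `region_type`, `poverty_class`) are hypothetical stand-ins, and the use of `stratify` is an assumption about what "balanced representation" means. The sketch also fits the imputation means and the scaler on the training split only, which may differ from the order used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for the real dataset; all column names are hypothetical.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "poverty_pct": rng.uniform(2, 40, n),
    "mean_years_schooling": rng.uniform(4, 12, n),
    "unemployment_rate": rng.uniform(1, 12, n),
    "region_type": rng.choice(["city", "regency"], n),
})
# Toy binary target derived from one column, purely for illustration.
df["poverty_class"] = (df["poverty_pct"] > 15).astype(int)
# Inject a few missing values so the imputation step has work to do.
df.loc[rng.choice(n, 10, replace=False), "unemployment_rate"] = np.nan

X = df.drop(columns="poverty_class")
y = df["poverty_class"]

# Encode the categorical feature as integers with LabelEncoder.
X["region_type"] = LabelEncoder().fit_transform(X["region_type"])

# 80:20 split; stratify keeps class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Mean imputation using statistics from the training split only.
num_cols = ["poverty_pct", "mean_years_schooling", "unemployment_rate"]
train_means = X_train[num_cols].mean()
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)

# Standardize features: fit on the training split, apply to both.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Random Forest and evaluate it on the held-out split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(report)
print(cm)
```

Fitting the imputer and scaler on the training split keeps test-set statistics out of training, so the reported metrics reflect performance on data the model has not seen.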

Project Key Insights

The features P0 Persen (percentage of poor population) and Mean Lama Sekolah 15+ (average years of schooling) are highly significant in classifying poverty levels.
Visualizing the dataset through correlation heatmaps reveals strong relationships between poverty and access to education, health metrics, and sanitation.
The Random Forest model achieved the highest accuracy of the models compared (99.03%), making it the most reliable choice for this classification task.
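As a rough illustration of how such insights can be obtained, the sketch below computes a correlation matrix and Random Forest feature importances. The data is synthetic and the column names are hypothetical; the relationship between schooling and poverty is built into the toy data, and the label is derived from `poverty_pct` by construction, so the output shows the mechanics only and says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; names and relationships are assumptions for illustration.
rng = np.random.default_rng(0)
n = 300
schooling = rng.uniform(4, 12, n)
df = pd.DataFrame({
    # Toy relationship: poverty falls as schooling rises, plus noise.
    "poverty_pct": 45 - 3 * schooling + rng.normal(0, 3, n),
    "mean_years_schooling": schooling,
    # Unrelated to the other columns by construction.
    "sanitation_access": rng.uniform(30, 100, n),
})

# Pairwise Pearson correlations; this is the matrix a heatmap would display
# (for example via seaborn.heatmap(corr, annot=True)).
corr = df.corr()
print(corr.round(2))

# Toy label: above-median poverty percentage.
label = (df["poverty_pct"] > df["poverty_pct"].median()).astype(int)

# Impurity-based feature importances from a fitted Random Forest.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(df, label)
importances = pd.Series(
    model.feature_importances_, index=df.columns
).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances sum to 1 across features, so each value can be read as a share of the total.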

Best Model

Random Forest Classifier
- Accuracy: 99.03%
- F1-Score: 0.99
- This model balances precision and recall well, which suggests stable performance on unseen data.
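For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 relate to one another. The labels and predictions are made-up values chosen for easy arithmetic; they are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels and predictions (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / total
precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                 # harmonic mean of the two

# F1 recomputed by hand from precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which is why it complements accuracy when comparing classifiers.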

Contact

If you have suggestions or want to collaborate, feel free to reach out:

Email: mochhabibibier@gmail.com
LinkedIn: https://www.linkedin.com/in/mhabibierobbi12/

Repository: MahirBye/KLASIFIKASI-TINGKAT-KEMISKINAN-DI-INDONESIA
