Snail664/country-clustering

Country Clustering

About

This is our mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence), which applies cluster analysis to country-level data.

Problem statement

Real world problem
We are a Non-Governmental Organisation seeking to roll out a new programme to help underdeveloped countries reform and progress. Due to budget constraints, we need to determine strategically which countries to assist. Hence, we aim to identify the countries that are performing poorly in areas beyond just GDP per capita, including healthcare, education and access to necessities.

Data science problem
Identify a cluster of countries that are underdeveloped in multiple areas based on a comprehensive set of indicators.

Repository Contents

Hi there! Join us on our data journey.
View the files in the following order:

  1. un_member_states.ipynb
  2. country_name_map.ipynb
  3. data_cleaning.ipynb
  4. data_exploration.ipynb
  5. PCA.ipynb
  6. kmeans.ipynb
  7. hierarchical.ipynb
  8. model_comparison.ipynb
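
The notebook name country_name_map.ipynb and the fuzzy string matching tutorial listed under References suggest that country names are reconciled across data sources before cleaning. As a rough illustration only, the sketch below uses `difflib` from the Python standard library rather than whichever library the notebook actually uses. The function name `build_name_map`, the `cutoff` value and the sample names are all hypothetical.

```python
from difflib import get_close_matches


def build_name_map(names, canonical, cutoff=0.8):
    """Map each raw name to its closest canonical name, or None if no match clears the cutoff."""
    # Compare in lower case, but return the canonical spelling.
    lowered = {c.lower(): c for c in canonical}
    mapping = {}
    for name in names:
        hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        mapping[name] = lowered[hits[0]] if hits else None
    return mapping


# Hypothetical sample data, not taken from the project's datasets.
canonical = ["Viet Nam", "Bolivia", "Tanzania"]
raw = ["Vietnam", "Bolivia", "Atlantis"]
name_map = build_name_map(raw, canonical)
```

Names that map to `None` would need to be resolved by hand. A lower `cutoff` catches more spelling variants at the cost of more false matches.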

Our Findings

We identified 34 countries as the least developed, and therefore the ones we should offer aid to. These 34 countries are the intersection of the worst-performing clusters from our K-Means and hierarchical clustering models.
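
The intersection step can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: it assumes each model yields a country-to-cluster mapping and that "worst performing" means the cluster with the lowest mean of some composite development score. The score, the labels and the function name `worst_cluster` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def worst_cluster(labels, scores):
    """Return the set of countries in the cluster with the lowest mean score."""
    members = defaultdict(list)
    for country, cluster in labels.items():
        members[cluster].append(country)
    worst = min(members, key=lambda c: mean(scores[m] for m in members[c]))
    return set(members[worst])


# Hypothetical cluster labels from two models and a made-up composite score
# (higher = more developed). Cluster ids need not agree between models.
kmeans_labels = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
hier_labels = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 2}
scores = {"A": 0.2, "B": 0.3, "C": 0.6, "D": 0.7, "E": 0.9}

# Keep only the countries that both models place in their worst cluster.
shortlist = worst_cluster(kmeans_labels, scores) & worst_cluster(hier_labels, scores)
```

Taking the intersection rather than the union keeps the shortlist conservative: a country is included only when both models agree that it belongs to the weakest group.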

Contributions

Chiraag - data cleaning, clustering models.
Chen rui - principal component analysis, cluster interpretation (visualisation).
Vaish - data preparation, EDA.

References

Arias, F. J. C. (2019, February 7). Fuzzy String Matching in Python Tutorial. DataCamp Community. Retrieved March 21, 2022, from https://www.datacamp.com/community/tutorials/fuzzy-string-python
