Data-Science-Project_01_Sinkhole_prevention_analysis

Introduction

This analysis project sought to identify the key factors that contribute to road crises (sinkholes) and to build a risk map for prevention. Approaching the problem from architectural and urban planning perspectives, the project raised a series of critical questions and used various open data sources to address them. It also aimed to develop predictive models that anticipate the occurrence of road crises.

Features

  • Open data ETL | Integrated 23 datasets from 4 open data platforms into training features and analysis inputs. Dataset formats include csv, xml, shp, geojson and geopackage. A Selenium crawler collects building case information, which is used to analyze the likelihood of a sinkhole case occurring next to a building case.
  • Data preprocessing | Processes spatial and time-series training features at different geographic scales (e.g. case location, villages, 5m hexes across Taipei City).
  • Model training | Uses XGBoost and LightGBM to identify important features.
  • Visualization | Uses Tableau, QGIS and matplotlib to present important findings.
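
To make the hex-scale preprocessing step more concrete, here is a minimal sketch of assigning case locations to hexagonal cells and counting cases per cell. It is not the project's actual code: the function names (`point_to_hex`, `hex_center`, `count_cases_per_hex`) are hypothetical, the coordinates are assumed to be in a projected system measured in meters, and `size=5.0` is assumed to mean the center-to-corner distance of a pointy-top hexagon, which may differ from how the project defines its 5m hexes.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error
    # so that q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x, y, size=5.0):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the center-to-corner distance in the same units as x and y
    (assumed to be meters in a projected coordinate system).
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def hex_center(q, r, size=5.0):
    """Return the (x, y) center of the hexagon with axial index (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def count_cases_per_hex(points, size=5.0):
    """Count how many (x, y) case locations fall into each hexagon."""
    return Counter(point_to_hex(x, y, size) for x, y in points)
```

Calling `count_cases_per_hex` on a list of projected `(x, y)` case locations yields a per-hexagon count that could serve as one spatial training feature. In practice a Geopandas spatial join against a prebuilt hex grid would likely be more convenient; the pure-Python version is used here only to keep the sketch self-contained.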

Crawler Demo | Open data List

Quick Start

  1. Install Python 3.x and Anaconda.
  2. Create an environment with Anaconda from the YAML file.
    conda env create --file environment_name.yaml
    
  3. Run the related ipynb notebooks to build your own training data.
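
After creating the environment, it can help to confirm that the main Python libraries are importable before opening the notebooks. The following is a minimal sketch using only the standard library. The module list is an assumption based on the Python packages named in the Tools section, not on the contents of the YAML file, and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names assumed from the Tools section; adjust to match the YAML file.
REQUIRED_MODULES = ["selenium", "dask", "geopandas", "xgboost", "lightgbm", "optuna"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in the active environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run the script inside the activated conda environment. Any package it reports as missing may need to be installed before the notebooks will run.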

Tools

Tool | Description
Selenium | Automated web information crawling
Dask | Parallel processing for Python
Geopandas | Geographic data analysis in Python
XGBoost | Scalable and efficient gradient boosting machine learning
LightGBM | Gradient boosting machine learning with high performance
Optuna | Hyperparameter optimization library for machine learning
QGIS | Spatial visualization