Data Analytics - Structured Data

How to analyze Structured Data using Python

Introduction to this course

Prerequisites

Theoretical things

Read about Mathematics & Statistics for Data Science
- Lecture Slides : Google Drive
- Data Science School : https://datascienceschool.net/view-notebook/04358acdcf3347fc989c4cfc0ef6121c/
- Online Graphing Calculator : https://www.desmos.com/calculator

Technical things

Python : Prerequisite 1 - Python Notebook
IPython & Jupyter Notebook : Prerequisite 2 - Python Data Science Environment Notebook
NumPy : Prerequisite 3 - NumPy Notebook
Pandas : Prerequisite 4 - Pandas Notebook
Matplotlib : https://matplotlib.org
- Reference - Plots with Matplotlib
Seaborn : https://seaborn.pydata.org
- Reference - Visualization with Seaborn
Scikit-learn : https://scikit-learn.org/stable/

Project Details

For this project, we will use datasets from Kaggle competitions.

Getting Started

Follow the steps below!

Step 1: Data Analytics Workflow

General Data Analytics Workflow
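As a rough orientation, a single pass through a typical workflow (load, inspect, split, fit, evaluate) might look like the minimal sketch below. It is not the course's own code: it assumes scikit-learn's bundled Iris dataset as stand-in data and a random forest as the model, and the function name `run_workflow` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_workflow(random_state=0):
    """Hypothetical end-to-end pass: load -> inspect -> split -> fit -> evaluate."""
    # 1. Load data into a DataFrame (Iris stands in for a Kaggle CSV file).
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["target"] = iris.target

    # 2. Inspect: check the shape and count missing values before modelling.
    print(df.shape)
    print(df.isnull().sum().sum(), "missing values")

    # 3. Split features and target into training and held-out test sets.
    X = df.drop(columns="target")
    y = df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # 4. Fit a model on the training set only.
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)

    # 5. Evaluate on data the model has not seen.
    acc = accuracy_score(y_test, model.predict(X_test))
    return df, acc


if __name__ == "__main__":
    df, acc = run_workflow()
    print(f"test accuracy: {acc:.3f}")
```

Keeping the test set untouched until step 5 is the design point: it gives an honest estimate of how the model would do on new data.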

Step 2: Regression

Boston Housing : Predicting Boston Housing Prices https://www.kaggle.com/samratp/boston-housing-prices-evaluation-validation

Ames House Prices: Advanced Regression Techniques https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview
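Both datasets are regression problems: the target (a house price) is continuous. A minimal sketch of fitting and scoring a linear regression is shown below; it assumes synthetic data from `make_regression` in place of the Kaggle CSV files, and RMSE and R² as evaluation metrics. The helper name `fit_regression` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_regression(random_state=0):
    """Fit a linear model on synthetic 'housing' data and return (RMSE, R^2)."""
    # Synthetic stand-in for house features (X) and prices (y).
    X, y = make_regression(n_samples=500, n_features=5, noise=10.0,
                           random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state)

    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)

    # RMSE is in the same units as the target; R^2 is unitless (1.0 = perfect fit).
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    return rmse, r2


if __name__ == "__main__":
    rmse, r2 = fit_regression()
    print(f"RMSE: {rmse:.2f}  R^2: {r2:.3f}")
```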

Step 3: Classification

Titanic: Machine Learning from Disaster https://www.kaggle.com/c/titanic
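Titanic is a binary classification task (a passenger survived or did not). The sketch below illustrates scoring a classifier with 5-fold cross-validation; it assumes synthetic features from `make_classification` instead of the real Titanic columns, and logistic regression as the model. The helper name `evaluate_classifier` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def evaluate_classifier(random_state=0):
    """Return per-fold accuracy of logistic regression on synthetic binary data."""
    # Synthetic stand-in for Titanic-style features and a 0/1 label.
    X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                               random_state=random_state)
    model = LogisticRegression(max_iter=1000)
    # 5-fold cross-validation: each fold is held out exactly once for scoring.
    return cross_val_score(model, X, y, cv=5, scoring="accuracy")


if __name__ == "__main__":
    scores = evaluate_classifier()
    print("fold accuracies:", scores.round(3), "mean:", round(scores.mean(), 3))
```

Cross-validation uses every row for both training and evaluation across folds, which gives a steadier estimate than a single split on a small dataset.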

Step 4: Clustering

Mall Customer Segmentation Data Market Basket Analysis https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python
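Customer segmentation is usually framed as clustering on a few numeric customer features. The minimal sketch below uses k-means; it assumes synthetic 2-D blobs in place of the Mall dataset, k = 5 as an arbitrary choice, and hypothetical helper names `segment_customers` and `inertia_by_k`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def segment_customers(k=5, random_state=0):
    """Cluster synthetic 2-D 'customers' into k segments with k-means."""
    # Synthetic stand-in for two numeric customer features
    # (for example an income figure and a spending score).
    X, _ = make_blobs(n_samples=300, centers=k, n_features=2,
                      random_state=random_state)
    # Standardise so both features contribute equally to Euclidean distance.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    labels = km.fit_predict(X_scaled)
    # Silhouette lies in [-1, 1]; higher means better-separated clusters.
    return X_scaled, labels, silhouette_score(X_scaled, labels)


def inertia_by_k(X, k_values=range(1, 9), random_state=0):
    """Within-cluster sum of squares for each k, as plotted in an elbow chart."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
            for k in k_values}


if __name__ == "__main__":
    X, labels, sil = segment_customers()
    print("silhouette:", round(sil, 3))
    for k, inertia in inertia_by_k(X).items():
        print(k, round(inertia, 1))
```

Plotting `inertia_by_k` against k and looking for the bend (the "elbow") is one common heuristic for picking k; comparing silhouette scores across k is another.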

Instructions

To set up the project environment and run the code in this repository, follow the instructions below.

Install Git
Install Anaconda
- Linux: https://docs.anaconda.com/anaconda/install/linux/
- Mac: https://docs.anaconda.com/anaconda/install/mac-os/
  - Download for Mac
- Windows: https://docs.anaconda.com/anaconda/install/windows/
  - Download for Windows
  - Note when installing Anaconda for Windows
    - At the 'Advanced Options' step of the installer, check the 'Add Anaconda to my PATH environment variable' option. (This lets you use Anaconda commands from the default Windows Command Prompt.)
Clone this repository
- Reference #1 : Introduction to Git for Data Science
- Reference #2 : Git the simple guide

git clone https://github.com/parksurk/da_structured-data.git

Create (and activate) a new environment with Python 3.6.

Linux or Mac:

conda create --name kaggle python=3.6
conda activate kaggle    # on conda versions older than 4.4: source activate kaggle

Windows:

conda create --name kaggle python=3.6
activate kaggle

Install Python Scientific Libraries

pip install jupyter numpy pandas matplotlib seaborn scikit-learn scipy plotly cufflinks tqdm
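To confirm the installation worked, a small check script like the sketch below can be run inside the activated environment. It is an illustrative helper, not part of the repository; `check_packages` and the package-to-module mapping are assumptions (note that scikit-learn is imported as `sklearn`).

```python
import importlib

# pip package name -> Python import name (they differ for scikit-learn).
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "plotly": "plotly",
    "cufflinks": "cufflinks",
    "tqdm": "tqdm",
}


def check_packages(packages=PACKAGES):
    """Return {pip_name: version string, or None if the import fails}."""
    report = {}
    for pip_name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Broken or incompatible installs can raise more than ImportError.
            report[pip_name] = None
            continue
        # Most scientific packages expose __version__; fall back if absent.
        report[pip_name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```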

Install other libraries: XGBoost, LightGBM, graphviz, and python-graphviz

Linux or Mac:

conda install -c conda-forge xgboost lightgbm graphviz python-graphviz

Windows:

conda install -c anaconda py-xgboost
conda install -c conda-forge lightgbm graphviz python-graphviz
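Two packages are easy to confuse here: the conda `graphviz` package supplies the `dot` executable, while `python-graphviz` supplies the `graphviz` Python module that calls it. The sketch below, a hypothetical helper rather than anything shipped with the repository, reports what is available without importing the heavy libraries.

```python
import importlib.util
import shutil


def check_boosting_stack():
    """Report which optional libraries can be found, without importing them."""
    modules = {
        "xgboost": "xgboost",
        "lightgbm": "lightgbm",
        "python-graphviz": "graphviz",  # Python bindings, imported as `graphviz`
    }
    # find_spec returns None when a module cannot be located on sys.path.
    report = {name: importlib.util.find_spec(mod) is not None
              for name, mod in modules.items()}
    # The Python bindings shell out to the `dot` binary to render graphs.
    report["graphviz (dot binary)"] = shutil.which("dot") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_boosting_stack().items():
        print(f"{name:<22} {'OK' if ok else 'missing'}")
```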

Install RISE - Jupyter notebook slideshow library (optional, for presenters)

conda install -c conda-forge rise

Create an IPython kernel for the kaggle environment. (Skip this step if you have already done it.)

pip install ipykernel
python -m ipykernel install --user --name kaggle --display-name "kaggle"
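One way to confirm the kernel was registered is to ask `jupyter_client` for the installed kernel specs, as in the sketch below. The helper name `kaggle_kernel_registered` is hypothetical; it returns `None` when `jupyter_client` is not importable from the current interpreter.

```python
def kaggle_kernel_registered(name="kaggle"):
    """True/False if a kernel spec called `name` exists; None if jupyter_client is absent."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return None  # jupyter_client is not installed in this interpreter
    # find_kernel_specs() maps kernel names to their spec directories.
    specs = KernelSpecManager().find_kernel_specs()
    return name in specs


if __name__ == "__main__":
    print("kaggle kernel registered:", kaggle_kernel_registered())
```

The command `jupyter kernelspec list` shows the same information from a terminal.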

Run Jupyter Notebook

jupyter notebook

Open a .ipynb notebook in the root directory.
Before running code in a notebook, change the kernel to match the 'kaggle' environment by using the drop-down Kernel menu.
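To double-check the kernel choice, the first cell of a notebook can print which interpreter is running. The sketch below assumes a standard conda layout, where an environment's `sys.prefix` ends in `envs/<env_name>`; `running_in_env` is a hypothetical helper.

```python
import os
import sys


def running_in_env(env_name="kaggle"):
    """True if this interpreter lives in a conda env directory named env_name."""
    # For a conda environment, sys.prefix is typically <anaconda>/envs/<env_name>.
    return os.path.basename(os.path.normpath(sys.prefix)) == env_name


print(sys.executable)
print("kaggle kernel active:", running_in_env())
```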
