
parksurk/da_structured-data


Data Analytics - Structured Data

How to analyze Structured Data using Python


Introduction to this course

Prerequisites

Theoretical background

Technical background

Project Details

For this project, we will use datasets from Kaggle competitions.


Getting Started

Follow the steps below!

Step 1: Data Analytics Workflow

General Data Analytics Workflow

Step 2: Regression

Boston Housing: Predicting Boston Housing Prices https://www.kaggle.com/samratp/boston-housing-prices-evaluation-validation

Ames House Prices: Advanced Regression Techniques https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview

Step 3: Classification

Titanic: Machine Learning from Disaster https://www.kaggle.com/c/titanic

Step 4: Clustering

Mall Customer Segmentation Data: Market Basket Analysis https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python


Instructions

To set up the project environment and run the code in this repository, follow the instructions below.

  1. Install Git
  2. Install Anaconda
  3. Clone this repository
git clone https://github.com/parksurk/da_structured-data.git
  4. Create (and activate) a new environment with Python 3.6.
    • Linux or Mac:
      conda create --name kaggle python=3.6
      source activate kaggle
      
    • Windows:
      conda create --name kaggle python=3.6
      activate kaggle
      
  5. Install the Python scientific libraries
pip install jupyter numpy pandas matplotlib seaborn scikit-learn scipy plotly cufflinks tqdm
  6. Install additional libraries: XGBoost, LightGBM, graphviz, python-graphviz

    • Linux or Mac:
      conda install -c conda-forge xgboost lightgbm graphviz python-graphviz
      
    • Windows:
      conda install -c anaconda py-xgboost
      conda install -c conda-forge lightgbm graphviz python-graphviz
      
  7. Install RISE, a Jupyter Notebook slideshow library (optional, for presenters)

conda install -c conda-forge rise
  8. Create an IPython kernel for the kaggle environment. (Skip this step if you have already done it.)
pip install ipykernel
python -m ipykernel install --user --name kaggle --display-name "kaggle"
  9. Run Jupyter Notebook
jupyter notebook
  10. Open an .ipynb file in the root directory

  11. Before running code in a notebook, change the kernel to match the 'kaggle' environment by using the drop-down Kernel menu.
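
Once the kernel is selected, it can help to confirm that the notebook is really running inside the 'kaggle' environment and that the libraries installed above are importable. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `report` and the `REQUIRED_MODULES` list are illustrative assumptions. It uses only the Python standard library, so it runs even if some installs failed.

```python
import importlib.util
import sys

# Import names (not pip/conda package names) for the libraries listed in the
# installation steps. Note that scikit-learn is imported as "sklearn".
# This list is an assumption based on those steps; adjust it as needed.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "scipy",
    "plotly", "cufflinks", "tqdm", "xgboost", "lightgbm", "graphviz",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(modules=REQUIRED_MODULES):
    """Print the active interpreter and any missing modules; return the missing list."""
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])
    missing = find_missing(modules)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
    return missing


if __name__ == "__main__":
    report()
```

Run it in a notebook cell or as a script. If the printed executable path does not point into the 'kaggle' environment, the wrong kernel is probably selected. Any module reported as missing can be reinstalled with the corresponding command from the steps above.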
