Introduction to this course
- Read about Mathematics & Statistics for Data Science
- Lecture Slides : Google Drive
- Data Science School : https://datascienceschool.net/view-notebook/04358acdcf3347fc989c4cfc0ef6121c/
- Online Graphing Calculator : https://www.desmos.com/calculator
- Python : Prerequisite 1 - Python Notebook
- IPython & Jupyter Notebook : Prerequisite 2 - Python Data Science Environment Notebook
- NumPy : Prerequisite 3 - NumPy Notebook
- Pandas : Prerequisite 4 - Pandas Notebook
- Matplotlib : https://matplotlib.org
- Seaborn : https://seaborn.pydata.org
- Scikit-learn : https://scikit-learn.org/stable/
This project uses datasets from Kaggle competitions.
Follow the steps below.
General Data Analytics Workflow
Boston Housing: Predicting Boston Housing Prices https://www.kaggle.com/samratp/boston-housing-prices-evaluation-validation
Ames House Prices: Advanced Regression Techniques https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview
Titanic: Machine Learning from Disaster https://www.kaggle.com/c/titanic
Mall Customer Segmentation Data: Market Basket Analysis https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python
To set up the project environment and run the code in this repository, follow the instructions below.
- Install Git
- Install Anaconda
- Linux: https://docs.anaconda.com/anaconda/install/linux/
- Mac: https://docs.anaconda.com/anaconda/install/mac-os/
- Windows: https://docs.anaconda.com/anaconda/install/windows/
- Download for Windows
- Notes on installing Anaconda for Windows
- Clone this repository
git clone https://github.com/parksurk/da_structured-data.git
- Create (and activate) a new environment with Python 3.6.
- Linux or Mac:
conda create --name kaggle python=3.6
source activate kaggle
- Windows:
conda create --name kaggle python=3.6
activate kaggle
- Install Python Scientific Libraries
pip install jupyter numpy pandas matplotlib seaborn scikit-learn scipy plotly cufflinks tqdm
- Install additional libraries such as XGBoost, LightGBM, graphviz, and python-graphviz
- Linux or Mac:
conda install -c conda-forge xgboost lightgbm graphviz python-graphviz
- Windows:
conda install -c anaconda py-xgboost
conda install -c conda-forge lightgbm graphviz python-graphviz
- Install RISE, a Jupyter Notebook slideshow library (optional, for presenters)
conda install -c conda-forge rise
- Create an IPython kernel for the kaggle environment. (Skip this step if you have already done it.)
pip install ipykernel
python -m ipykernel install --user --name kaggle --display-name "kaggle"
- Run Jupyter Notebook
jupyter notebook
- Open a .ipynb notebook in the root directory
- Before running code in a notebook, use the Kernel drop-down menu to change the kernel to the 'kaggle' environment.
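To confirm that the setup worked, a quick check like the one below can be run in the first cell of a notebook. This is only a minimal sketch, not part of the repository. The helper names `find_missing` and `describe_environment` are hypothetical, and the module list assumes the usual import names for the packages installed above (for example, scikit-learn is imported as `sklearn`). It only checks whether each module can be found, so it does not prove that every library works.

```python
import importlib.util
import sys

# Import names for the libraries installed in the steps above.
# Assumption: these are the usual import names (scikit-learn -> "sklearn").
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "scipy",
    "plotly", "cufflinks", "tqdm", "xgboost", "lightgbm", "graphviz",
]


def find_missing(modules):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def describe_environment():
    """Collect basic facts about the interpreter running this code."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))

    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If the kernel was switched correctly, the printed `prefix` will usually point inside the conda environment directory named `kaggle`, although the exact path depends on how Anaconda was installed. Any module reported as missing can be installed with the matching `pip` or `conda` command from the steps above.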