update

chaklam-silpasuwanchai · chaklam-silpasuwanchai · commit 515438af0373 · 2024-12-09T13:29:53.000+07:00
diff --git a/00 - Case Study/code-along/01 - PMDS regression.ipynb b/00 - Case Study/code-along/01 - PMDS regression.ipynb
@@ -0,0 +1,150 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Running regression end to end\n",
+    "\n",
+    "This tutorial walks students through regression, and machine learning in general, from end to end, and encourages best practices along the way."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "import matplotlib.pyplot as plt\n",
+    "import sklearn as sk\n",
+    "# !pip install matplotlib"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Load Data\n",
+    "\n",
+    "- This step is mostly a data engineering job\n",
+    "- You will often work with a legacy database\n",
+    "- Most of the time, you will work with AWS / Azure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. EDA\n",
+    "\n",
+    "- Understand your data\n",
+    "- Expect to spend about 70% of your time here\n",
+    "- Today I will go through it quickly, but in practice this step should not be rushed"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Feature Engineering\n",
+    "\n",
+    "- Create new features based on existing features"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Feature Selection\n",
+    "\n",
+    "- Select the salient features (X)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Preprocessing\n",
+    "\n",
+    "- Imputation\n",
+    "- Scaling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Modeling\n",
+    "\n",
+    "- Compare candidate regression models using cross-validation\n",
+    "- Once you have the best model, run cross-validation on that one model with different hyperparameters (\"grid search\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Testing\n",
+    "\n",
+    "- Test your model on the test set (you should never touch the test set before this step)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Analysis\n",
+    "\n",
+    "- Try to come up with an explanation of your model\n",
+    "- What works? Which features are important?\n",
+    "- Why do certain models work better?\n",
+    "- How many samples are enough?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Inference\n",
+    "\n",
+    "- Test with real-world data\n",
+    "- You don't really know how good your model is until you try it"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 10. Deployment\n",
+    "\n",
+    "- We are going to skip this, but you should at least be aware that there is still a lot to do in deployment\n",
+    "\n",
+    "- Deploy your model using FastAPI, and learn how to host it on AWS / Azure."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}