Commit 515438a

update
1 parent 420c279 commit 515438a

1 file changed: 150 additions, 0 deletions

@@ -0,0 +1,150 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Running Regression End to End\n",
+    "\n",
+    "This tutorial teaches students about regression, and machine learning in general, from end to end, while encouraging best practices."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# !pip install numpy pandas seaborn matplotlib scikit-learn\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "import matplotlib.pyplot as plt\n",
+    "import sklearn as sk"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Load Data\n",
+    "\n",
+    "- This step is mostly a data engineering job\n",
+    "- You will often work with legacy databases\n",
+    "- Most of the time, you will work with cloud platforms such as AWS or Azure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. EDA\n",
+    "\n",
+    "- Understand your data\n",
+    "- Spend 70% of your time here\n",
+    "- Today we will go through it quickly, but in practice you should not rush this step"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Feature Engineering\n",
+    "\n",
+    "- Create new features based on existing features"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Feature Selection\n",
+    "\n",
+    "- Select the salient features to use as the input matrix `X`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Preprocessing\n",
+    "\n",
+    "- Imputation\n",
+    "- Scaling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Modeling\n",
+    "\n",
+    "- Compare candidate regression models using cross-validation\n",
+    "- Once you have the best model, run cross-validation on that one model with different hyperparameter values (\"grid search\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Testing\n",
+    "\n",
+    "- Evaluate your model on the test set (you should not have touched the test set until now)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Analysis\n",
+    "\n",
+    "- Try to come up with an explanation of your model\n",
+    "- What works? Which features are important?\n",
+    "- Why do certain models work better?\n",
+    "- How many samples are enough?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Inference\n",
+    "\n",
+    "- Test with real-world data\n",
+    "- You don't really know how good your model is until you try it"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 10. Deployment\n",
+    "\n",
+    "- We are going to skip this, but be aware that there is still a lot to do in deployment\n",
+    "\n",
+    "- Deploy your model using FastAPI, and learn how to host it on AWS / Azure"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
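
The notebook in this commit leaves sections 1 to 7 as headings only. Below is a minimal sketch of those steps with scikit-learn. It is not the author's implementation. It assumes that scikit-learn's bundled diabetes dataset stands in for data loaded from a database. The engineered feature `bmi_x_bp`, the helper `build_pipeline`, the three candidate models, and their parameter grids are all illustrative assumptions.

```python
# Minimal end-to-end regression sketch for sections 1-7 of the outline.
# Assumptions (not from the notebook): the bundled diabetes dataset stands in
# for real data; bmi_x_bp, build_pipeline, the candidates and grids are illustrative.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data: a bundled dataset replaces a database or cloud query here.
df = load_diabetes(as_frame=True).frame  # ten features plus a "target" column

# 2. EDA (deliberately brief): size, missing values, correlation with the target.
print(df.shape)
print("missing values:", int(df.isna().sum().sum()))
print(df.corr()["target"].sort_values(ascending=False).head())

# 3. Feature engineering: a hypothetical interaction feature.
df["bmi_x_bp"] = df["bmi"] * df["bp"]

# 4. Feature selection: here simply every column except the target.
X = df.drop(columns="target")
y = df["target"]

# Split off the test set first; it stays untouched until step 7.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def build_pipeline(model):
    """5. Preprocessing (imputation, scaling) followed by the model."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])


# 6a. Compare candidate models with 5-fold cross-validation on the training set.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
param_grids = {
    "linear": {"model__fit_intercept": [True, False]},
    "ridge": {"model__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    "forest": {"model__max_depth": [None, 5, 10], "model__min_samples_leaf": [1, 5]},
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = {}
for name, model in candidates.items():
    rmse = -cross_val_score(build_pipeline(model), X_train, y_train,
                            cv=cv, scoring="neg_root_mean_squared_error")
    scores[name] = rmse.mean()
    print(f"{name}: CV RMSE = {rmse.mean():.2f} +/- {rmse.std():.2f}")

# 6b. Grid search: cross-validate only the best model over its parameter grid.
best_name = min(scores, key=scores.get)  # lowest mean RMSE wins
grid = GridSearchCV(build_pipeline(candidates[best_name]), param_grids[best_name],
                    cv=cv, scoring="neg_root_mean_squared_error")
grid.fit(X_train, y_train)  # refits the best setting on the full training set
print("best model:", best_name, grid.best_params_)

# 7. Testing: a single evaluation on the held-out test set.
y_pred = grid.predict(X_test)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
test_r2 = r2_score(y_test, y_pred)
print(f"test RMSE = {test_rmse:.2f}, test R^2 = {test_r2:.3f}")
```

Imputation and scaling sit inside the `Pipeline`. Because of this, cross-validation and grid search refit them on each training fold, and no statistics leak in from the validation folds or the test set.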

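Sections 8 and 9 of the notebook (Analysis and Inference) can be sketched the same way. This sketch uses the same bundled diabetes dataset as a stand-in. It also assumes a `Ridge` pipeline in place of whichever model won the comparison, and a hypothetical `model.joblib` file written to a temporary directory. Permutation importance and a learning curve are one possible way to answer "which features are important?" and "how many samples are enough?". They are not the notebook's own method.

```python
# Minimal sketch of sections 8 (Analysis) and 9 (Inference).
# Assumptions: the bundled diabetes dataset stands in for real data, a Ridge
# pipeline stands in for the selected model, and "model.joblib" is a hypothetical file name.
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

# 8a. Which features are important? Permutation importance shuffles one feature
# at a time on the test set and records how much the R^2 score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False).head())

# 8b. How many samples are enough? The learning curve reports the mean
# validation score as the training set grows; a flat tail suggests that
# more data of the same kind would help little.
sizes, train_scores, val_scores = learning_curve(
    model, X_train, y_train, train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2"
)
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{int(n):4d} training samples -> mean validation R^2 = {score:.3f}")

# 9. Inference: persist the fitted pipeline, reload it, predict on new rows.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)
new_rows = X_test.head(3)  # stand-in for real-world data arriving later
predictions = loaded.predict(new_rows)
print(predictions)
```

The reload-and-predict step is the part that a web service, such as the FastAPI deployment mentioned in section 10, would wrap.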