
Commit b72b946

Merge pull request #30 from kitrak-rev/main
Added Root Mean Square Error code and readme
2 parents 6adb3c0 + 048c722 commit b72b946

2 files changed

+176
-0
lines changed

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Root Mean Squared Error

The root mean squared error (RMSE) tells you how close a regression line is to a set of points. It takes the distances from the points to the regression line (these distances are the “errors”), squares them, averages the squares, and takes the square root of that average. Squaring removes any negative signs and gives more weight to larger differences. Taking the square root brings the result back to the same units as the target variable, which keeps the value from growing as large as the mean squared error. It’s called the root [mean](https://www.statisticshowto.com/mean/) squared error because you’re finding the square root of the average of a set of squared errors. The lower the RMSE, the better the forecast.
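As a rough illustration of why squaring gives more weight to larger differences, the sketch below compares two made-up sets of errors that share the same mean absolute error. The error values and the helper name `rmse_of_errors` are assumptions chosen for this example; they do not come from the notebook in this commit.

```python
import math


def rmse_of_errors(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two hypothetical error sets; both have a mean absolute error of 2.0.
even_errors = [2.0, -2.0, 2.0, -2.0]   # four moderate misses
spiky_errors = [0.0, 0.0, 0.0, 8.0]    # one large miss

mae_even = sum(abs(e) for e in even_errors) / len(even_errors)
mae_spiky = sum(abs(e) for e in spiky_errors) / len(spiky_errors)

rmse_even = rmse_of_errors(even_errors)
rmse_spiky = rmse_of_errors(spiky_errors)

print(mae_even, mae_spiky)    # 2.0 2.0
print(rmse_even, rmse_spiky)  # 2.0 4.0
```

The single large miss doubles the RMSE even though the average absolute error is unchanged, which is the sense in which RMSE penalizes large errors more heavily.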

![Root mean square deviation formula](https://www.gstatic.com/education/formulas2/472522532/en/root_mean_square_deviation.svg)

The calculations for the root mean squared error are similar to those for the standard deviation. To find the RMSE, take the observed value, subtract the predicted value, and square that difference. Repeat that for all observations. Then sum all of those squared values and divide by the number of observations. Finally, take the square root of the result.
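The steps described above can be sketched in plain Python as follows. The observed and predicted values are hypothetical numbers picked to keep the arithmetic easy to check by hand.

```python
import math

# Hypothetical observed and predicted values (not taken from the notebook).
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]

# Step 1: subtract each prediction from its observation and square the difference.
squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]

# Step 2: sum the squared errors and divide by the number of observations (the MSE).
mse = sum(squared_errors) / len(squared_errors)

# Step 3: take the square root of the MSE to get the RMSE.
rmse = math.sqrt(mse)

print(squared_errors)  # [1.0, 0.0, 4.0, 0.0]
print(mse)             # 1.25
print(rmse)            # roughly 1.118
```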

For example, in regression, the root mean squared error is the square root of the average squared residual:

![image](https://user-images.githubusercontent.com/78155475/194711594-67ecd6cb-d9f9-42dc-b47f-3b154d4aff2d.png)

As the data points fall closer to the regression line, the model has less error, which decreases the RMSE. A model with less error produces more accurate predictions.
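To make the relationship between distance from the line and RMSE concrete, here is a minimal sketch. It assumes a hypothetical line `y = 2x + 1` and made-up offsets that are scaled down step by step; neither is part of the notebook in this commit.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical regression line y = 2x + 1 evaluated at five x values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
line = [2.0 * x + 1.0 for x in xs]

# Fixed, made-up offsets of the data points from the line.
offsets = [1.0, -1.0, 1.0, -1.0, 1.0]

# Shrinking the scale moves every point closer to the line.
scales = [4.0, 2.0, 1.0, 0.5]
results = []
for scale in scales:
    points = [y + scale * d for y, d in zip(line, offsets)]
    results.append(rmse(points, line))

print(results)  # [4.0, 2.0, 1.0, 0.5]
```

Because every point sits exactly `scale` units from the line in this toy setup, the RMSE equals the scale, so it shrinks as the points move closer.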
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Uvsgr8sQd12g",
        "outputId": "461cad3e-f1b9-4e1b-f5f4-8b0eacb92914"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "X-shape:  (442, 10) Y-shape:  (442,)\n",
            "        age       sex       bmi        bp        s1        s2        s3  \\\n",
            "0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   \n",
            "1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412   \n",
            "2  0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356   \n",
            "3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038   \n",
            "4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142   \n",
            "\n",
            "         s4        s5        s6  \n",
            "0 -0.002592  0.019908 -0.017646  \n",
            "1 -0.039493 -0.068330 -0.092204  \n",
            "2 -0.002592  0.002864 -0.025930  \n",
            "3  0.034309  0.022692 -0.009362  \n",
            "4 -0.002592 -0.031991 -0.046641  \n",
            "0    151.0\n",
            "1     75.0\n",
            "2    141.0\n",
            "3    206.0\n",
            "4    135.0\n",
            "Name: target, dtype: float64\n"
          ]
        }
      ],
      "source": [
        "import pandas as pd\n",
        "import numpy as np\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn import datasets\n",
        "from sklearn.linear_model import LinearRegression\n",
        "lr = LinearRegression()\n",
        "X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)\n",
        "print(\"X-shape: \", X.shape, \"Y-shape: \", y.shape)\n",
        "print(X.head())\n",
        "print(y.head())"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# About the data"
      ],
      "metadata": {
        "id": "ODGG7Yxw5nLw"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "The dataset above has 10 features (age, sex, BMI, blood pressure, and six blood serum measurements) for 442 patients. The target is a numerical measure of how far the disease has progressed."
      ],
      "metadata": {
        "id": "whGXfcci5vpI"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Hold out 20% of the rows for testing; a fixed random_state makes the split reproducible\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
        "lr.fit(X_train, y_train)\n",
        "y_pred = lr.predict(X_test)"
      ],
      "metadata": {
        "id": "fHPyY8Pq3dGl"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Root Mean Squared Error**"
      ],
      "metadata": {
        "id": "PTKQ59QO3hzB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# RMSE: square root of the mean of the squared residuals\n",
        "RMSE = np.sqrt(np.square(np.subtract(y_test, y_pred)).mean())\n",
        "print(RMSE)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cL09sEN93lqv",
        "outputId": "cdc9e846-d327-4119-f0f4-e0406a7d31d1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "53.853256984914395\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Cross-check with scikit-learn: the square root of its MSE works across library versions\n",
        "from sklearn.metrics import mean_squared_error\n",
        "print(np.sqrt(mean_squared_error(y_test, y_pred)))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Q60ANYm13zKt",
        "outputId": "32177f76-f79f-443f-aea7-bd875ca2a60f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "53.853256984914395\n"
          ]
        }
      ]
    }
  ]
}
