7 - 2 - Cost Function (10 min).srt
1
00:00:00,144 --> 00:00:02,011
In this video, I'd like to
在这段视频中 我想要
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:02,011 --> 00:00:03,990
convey to you, the main intuitions
传达给你一个直观的感受
3
00:00:03,990 --> 00:00:05,771
behind how regularization works.
告诉你正规化是如何进行的
4
00:00:05,771 --> 00:00:07,386
And, we'll also write down
而且 我们还要写出
5
00:00:07,386 --> 00:00:11,724
the cost function that we'll use when we're using regularization.
我们使用正规化时 需要使用的代价函数
6
00:00:11,780 --> 00:00:13,327
With the hand drawn examples that
根据我们幻灯片上的
7
00:00:13,327 --> 00:00:14,916
we have on these slides, I
这些例子
8
00:00:14,950 --> 00:00:17,642
think I'll be able to convey part of the intuition.
我想我可以给你一个直观的感受
9
00:00:17,700 --> 00:00:19,608
But, an even better
但是 一个更好的
10
00:00:19,608 --> 00:00:21,192
way to see for yourself, how
让你自己去理解正规化
11
00:00:21,192 --> 00:00:22,643
regularization works, is if
如何工作的方法是
12
00:00:22,643 --> 00:00:25,869
you implement it, and, see it work for yourself.
你自己亲自去实现它 并且看看它是如何工作的
13
00:00:25,869 --> 00:00:26,888
And, if you do the
如果在这节课后
14
00:00:26,888 --> 00:00:28,603
appropriate exercises after this,
你进行一些适当的练习
15
00:00:28,603 --> 00:00:30,053
you get the chance
你就有机会亲自体验一下
16
00:00:30,053 --> 00:00:33,927
to see regularization in action for yourself.
正规化到底是怎么工作的
17
00:00:33,930 --> 00:00:36,519
So, here is the intuition.
那么 这里就是一些直观解释
18
00:00:36,519 --> 00:00:38,233
In the previous video, we saw
在前面的视频中 我们看到了
19
00:00:38,233 --> 00:00:39,771
that, if we were to fit
如果说我们要
20
00:00:39,771 --> 00:00:41,420
a quadratic function to this
用一个二次函数来
21
00:00:41,420 --> 00:00:44,283
data, it gives us a pretty good fit to the data.
拟合这些数据 它给了我们一个对数据很好的拟合
22
00:00:44,283 --> 00:00:45,286
Whereas, if we were to
然而 如果我们
23
00:00:45,310 --> 00:00:47,175
fit an overly high order
用一个更高次的
24
00:00:47,210 --> 00:00:48,823
degree polynomial, we end
多项式去拟合 我们最终
25
00:00:48,850 --> 00:00:50,111
up with a curve that may fit
可能得到一个曲线
26
00:00:50,111 --> 00:00:51,760
the training set very well, but,
能非常好地拟合训练集 但是
27
00:00:51,760 --> 00:00:53,381
really not be a good result,
这真的不是一个好的结果
28
00:00:53,420 --> 00:00:54,497
but overfit the data
它过度拟合了数据
29
00:00:54,497 --> 00:00:57,225
and not generalize well.
因此 一般性并不是很好
30
00:00:57,900 --> 00:01:00,453
Consider the following, suppose we
让我们考虑下面的假设
31
00:01:00,453 --> 00:01:02,088
were to penalize, and, make
我们想要加上惩罚项 从而使
32
00:01:02,088 --> 00:01:04,753
the parameters theta 3 and theta 4 really small.
参数 θ3 和 θ4 足够的小
33
00:01:04,753 --> 00:01:06,543
Here's what I
这里我的意思就是
34
00:01:06,543 --> 00:01:09,676
mean, here is our optimization
这是我们的优化目标
35
00:01:09,690 --> 00:01:10,859
objective, or here is our
或者说 这就是我们需要
36
00:01:10,870 --> 00:01:12,574
optimization problem, where we minimize
优化的问题 我们需要尽量减少
37
00:01:12,580 --> 00:01:15,526
our usual squared error cost function.
通常的平方误差代价函数
38
00:01:15,526 --> 00:01:17,350
Let's say I take this objective
对于这个函数
39
00:01:17,370 --> 00:01:19,125
and modify it and add
我们对它进行一些 添加一些项
40
00:01:19,160 --> 00:01:23,291
to it, plus 1000 theta
加上 1000 乘以 θ3 的平方
41
00:01:23,291 --> 00:01:28,334
3 squared, plus 1000 theta 4 squared.
再加上 1000 乘以 θ4 的平方
42
00:01:28,334 --> 00:01:32,354
1000 I am just writing down as some huge number.
1000 只是我随便写的某个较大的数字而已
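(Writing the modified objective just described as a LaTeX sketch; the 1/2m squared-error term is the course's usual cost function and is assumed here, and 1000 is just the arbitrary large multiplier mentioned above:

\min_{\theta} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2 )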
43
00:01:32,354 --> 00:01:33,538
Now, if we were to
现在 如果我们要
44
00:01:33,540 --> 00:01:35,127
minimize this function, the
最小化这个函数
45
00:01:35,140 --> 00:01:36,688
only way to make this
为了使这个
46
00:01:36,710 --> 00:01:38,620
new cost function small is
新的代价函数最小化
47
00:01:38,620 --> 00:01:40,769
if theta 3 and theta
我们要让 θ3 和 θ4
48
00:01:40,769 --> 00:01:42,133
4 are small, right?
尽可能小 对吧?
49
00:01:42,133 --> 00:01:43,264
Because otherwise, if you have
因为 如果你有
50
00:01:43,264 --> 00:01:44,956
a thousand times theta 3, this
1000 乘以 θ3 这个
51
00:01:44,970 --> 00:01:48,103
new cost function's gonna be big.
新的代价函数将会是很大的
52
00:01:48,140 --> 00:01:49,245
So when we minimize this
所以 当我们最小化
53
00:01:49,245 --> 00:01:50,402
new function we are going
这个新的函数时 我们将使
54
00:01:50,402 --> 00:01:52,107
to end up with theta 3
θ3 的值
55
00:01:52,110 --> 00:01:53,776
close to 0 and theta
接近于0
56
00:01:53,776 --> 00:01:56,700
4 close to 0, and as
θ4 的值也接近于0
57
00:01:56,700 --> 00:01:59,691
if we're getting rid
就像我们忽略了
58
00:01:59,691 --> 00:02:03,206
of these two terms over there.
这两个值一样
59
00:02:03,710 --> 00:02:05,282
And if we do that, well then,
如果我们做到这一点
60
00:02:05,290 --> 00:02:06,783
if theta 3 and theta 4
如果 θ3 和 θ4
61
00:02:06,783 --> 00:02:07,973
close to 0 then we are
接近0 那么我们
62
00:02:07,973 --> 00:02:09,643
being left with a quadratic function,
将得到一个近似的二次函数
63
00:02:09,643 --> 00:02:11,089
and, so, we end up with
所以 我们最终
64
00:02:11,110 --> 00:02:13,343
a fit to the data, that's, you know, quadratic
恰当地拟合了数据 你知道
65
00:02:13,343 --> 00:02:15,463
function plus maybe, tiny
二次函数加上一些项
66
00:02:15,463 --> 00:02:17,856
contributions from small terms,
这些很小的项 贡献很小
67
00:02:17,860 --> 00:02:20,207
theta 3, theta 4, that they may be very close to 0.
因为 θ3 θ4 它们是非常接近于0的
68
00:02:20,207 --> 00:02:27,293
And, so, we end up with
所以 我们最终得到了
69
00:02:27,293 --> 00:02:29,386
essentially, a quadratic function, which is good.
实际上 很好的一个二次函数
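(In other words, assuming the fourth-order polynomial model from the previous video's example, the fitted hypothesis is essentially quadratic; a LaTeX sketch:

h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \;\approx\; \theta_0 + \theta_1 x + \theta_2 x^2 \quad (\theta_3 \approx 0,\ \theta_4 \approx 0) )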
70
00:02:29,386 --> 00:02:30,544
Because this is a
因为这是一个
71
00:02:30,544 --> 00:02:34,060
much better hypothesis.
更好的假设
72
00:02:34,104 --> 00:02:36,666
In this particular example, we looked at the effect
在这个具体的例子中 我们看到了
73
00:02:36,700 --> 00:02:39,023
of penalizing two of
惩罚这两个
74
00:02:39,023 --> 00:02:41,446
the parameter values being large.
大的参数值的效果
75
00:02:41,446 --> 00:02:46,510
More generally, here is the idea behind regularization.
更一般地 这里给出了正规化背后的思路
76
00:02:46,980 --> 00:02:48,924
The idea is that, if we
这种思路就是 如果我们
77
00:02:48,924 --> 00:02:50,303
have small values for the
的参数值
78
00:02:50,303 --> 00:02:53,083
parameters, then, having
对应一个较小值的话
79
00:02:53,083 --> 00:02:55,250
small values for the parameters,
就是说 参数值比较小
80
00:02:55,250 --> 00:02:57,866
will somehow, will usually correspond
那么往往我们会得到一个
81
00:02:57,866 --> 00:03:00,386
to having a simpler hypothesis.
形式更简单的假设
82
00:03:00,386 --> 00:03:02,279
So, for our last example, we
所以 我们最后一个例子中
83
00:03:02,279 --> 00:03:04,024
penalize just theta 3 and
我们惩罚的只是 θ3 和
84
00:03:04,024 --> 00:03:05,666
theta 4 and when both
θ4 使这两个
85
00:03:05,666 --> 00:03:07,046
of these were close to zero,
值均接近于零
86
00:03:07,046 --> 00:03:08,450
we wound up with a much simpler
我们得到了一个更简单的假设
87
00:03:08,480 --> 00:03:12,549
hypothesis that was essentially a quadratic function.
也即这个假设大抵上是一个二次函数
88
00:03:12,549 --> 00:03:13,991
But more broadly, if we penalize all
但更一般地说 如果我们就像这样
89
00:03:13,991 --> 00:03:15,989
the parameters, usually we
惩罚所有这些参数 通常我们
90
00:03:15,989 --> 00:03:17,416
can think of that, as trying
可以把它们都想成是
91
00:03:17,420 --> 00:03:19,076
to give us a simpler hypothesis
得到一个更简单的假设
92
00:03:19,110 --> 00:03:20,943
as well because when, you
因为你知道
93
00:03:20,943 --> 00:03:22,380
know, these parameters are
当这些参数越接近这个例子时
94
00:03:22,410 --> 00:03:23,700
as close to zero as in this
假设的结果越接近
95
00:03:23,700 --> 00:03:26,105
example, that gave us a quadratic function.
一个二次函数
96
00:03:26,105 --> 00:03:29,038
But more generally, it is
但更一般地
97
00:03:29,038 --> 00:03:30,493
possible to show that having
可以表明
98
00:03:30,530 --> 00:03:32,536
smaller values of the parameters
这些参数的值越小
99
00:03:32,540 --> 00:03:34,416
corresponds to usually smoother
通常对应于越光滑的函数
100
00:03:34,416 --> 00:03:36,780
functions as well, that is, simpler functions.
也就是更加简单的函数
101
00:03:36,780 --> 00:03:41,667
And which are therefore, also, less prone to overfitting.
因此 就不易发生过拟合的问题
102
00:03:41,680 --> 00:03:43,245
I realize that the reasoning for
我知道
103
00:03:43,245 --> 00:03:45,441
why having all the parameters be small,
为什么要让所有的参数都变小
104
00:03:45,441 --> 00:03:46,944
Why that corresponds to a simpler
为什么越小的参数对应于一个简单的假设
105
00:03:46,960 --> 00:03:48,916
hypothesis; I realize that
我知道这些原因
106
00:03:48,916 --> 00:03:51,572
reasoning may not be entirely clear to you right now.
对你来说现在不一定完全理解
107
00:03:51,590 --> 00:03:52,784
And it is kind of hard
但现在解释起来确实比较困难
108
00:03:52,784 --> 00:03:54,477
to explain unless you implement
除非你自己实现一下
109
00:03:54,480 --> 00:03:56,446
yourself and see it for yourself.
自己亲自运行了这部分
110
00:03:56,470 --> 00:03:58,247
But I hope that the example of
但是我希望 这个例子中
111
00:03:58,247 --> 00:03:59,610
having theta 3 and theta
使 θ3 和 θ4
112
00:03:59,650 --> 00:04:01,230
4 be small and how
很小 并且这样做
113
00:04:01,230 --> 00:04:02,535
that gave us a simpler
能给我们一个更加简单的
114
00:04:02,540 --> 00:04:04,776
hypothesis, I hope that
假设 我希望这个例子
115
00:04:04,800 --> 00:04:06,314
helps explain why, at least give
有助于解释原因 至少给了
116
00:04:06,330 --> 00:04:09,320
some intuition as to why this might be true.
我们一些直观感受 为什么这应该是这样的
117
00:04:09,320 --> 00:04:11,476
Lets look at the specific example.
来让我们看看具体的例子
118
00:04:12,010 --> 00:04:13,873
For housing price prediction we
对于房屋价格预测我们
119
00:04:13,873 --> 00:04:15,465
may have our hundred features
可能有上百种特征
120
00:04:15,480 --> 00:04:17,223
that we talked about where may
我们谈到了一些可能的特征
121
00:04:17,250 --> 00:04:18,756
be x1 is the size, x2
比如说 x1 是房屋的尺寸
122
00:04:18,756 --> 00:04:20,096
is the number of bedrooms, x3
x2 是卧室的数目
123
00:04:20,096 --> 00:04:21,963
is the number of floors and so on.
x3 是房屋的层数等等
124
00:04:21,963 --> 00:04:24,502
And we may we may have a hundred features.
那么我们可能就有一百个特征
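(With a hundred features, the linear hypothesis has a hundred and one parameters, theta 0 through theta 100; a LaTeX sketch, where x1, x2, x3 are the size, number of bedrooms, and number of floors, and the remaining features are left unspecified as in the lecture:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_{100} x_{100} )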
125
00:04:24,502 --> 00:04:26,896
And unlike the polynomial
跟前面的多项式例子不同
126
00:04:26,920 --> 00:04:28,459
example, we don't know, right,
我们是不知道的 对吧
127
00:04:28,460 --> 00:04:29,826
we don't know that theta 3,
我们不知道 θ3
128
00:04:29,826 --> 00:04:32,641
theta 4, are the high order polynomial terms.
θ4 是高阶多项式的项
129
00:04:32,641 --> 00:04:34,515
So, if we have just a
所以 如果我们有一个袋子
130
00:04:34,540 --> 00:04:35,863
bag, if we have just a
如果我们有一百个特征
131
00:04:35,863 --> 00:04:38,074
set of a hundred features, it's hard
在这个袋子里 我们是很难
132
00:04:38,100 --> 00:04:40,210
to pick in advance which are
提前选出那些
133
00:04:40,260 --> 00:04:42,729
the ones that are less likely to be relevant.
关联度更小的特征的
134
00:04:42,729 --> 00:04:45,773
So we have a hundred, or a hundred and one, parameters.
也就是说如果我们有一百或一百零一个参数
135
00:04:45,780 --> 00:04:47,340
And we don't know which
我们不知道
136
00:04:47,340 --> 00:04:48,987
ones to pick, we
挑选哪一个
137
00:04:49,010 --> 00:04:50,445
don't know which
我们并不知道
138
00:04:50,450 --> 00:04:54,272
parameters to try to pick, to try to shrink.
该挑选哪些参数来进行缩小
139
00:04:54,430 --> 00:04:56,237
So, in regularization, what we're
因此在正规化里
140
00:04:56,237 --> 00:04:58,438
going to do, is take our
我们要做的事情 就是把我们的
141
00:04:58,438 --> 00:05:01,213
cost function, here's my cost function for linear regression.
代价函数 这里就是线性回归的代价函数
142
00:05:01,213 --> 00:05:02,656
And what I'm going to do
我现在要做的就是
143
00:05:02,660 --> 00:05:04,326
is, modify this cost
来修改这个代价函数
144
00:05:04,340 --> 00:05:06,246
function to shrink all
从而缩小
145
00:05:06,270 --> 00:05:07,643
of my parameters, because, you know,
我所有的参数值 因为你知道
146
00:05:07,643 --> 00:05:09,059
I don't know which
我不知道是哪个
147
00:05:09,059 --> 00:05:10,440
one or two to try to shrink.
哪一个或两个要去缩小
148
00:05:10,440 --> 00:05:11,690
So I am going to modify my
所以我就修改我的
149
00:05:11,690 --> 00:05:16,732
cost function to add a term at the end.
代价函数 在这后面添加一项
150
00:05:17,390 --> 00:05:20,436
Like so we have square brackets here as well.
就像我们在方括号里的这项
151
00:05:20,440 --> 00:05:22,212
When I add an extra
当我添加一个额外的
152
00:05:22,212 --> 00:05:23,516
regularization term at the
正则化项的时候
153
00:05:23,530 --> 00:05:25,510
end to shrink every
我们收缩了每个
154
00:05:25,560 --> 00:05:27,286
single parameter and so this
参数 并且因此
155
00:05:27,320 --> 00:05:28,745
term we tend to shrink
我们会使
156
00:05:28,760 --> 00:05:30,747
all of my parameters theta 1,
我们所有的参数 θ1
157
00:05:30,747 --> 00:05:32,746
theta 2, theta 3 up
θ2 θ3
158
00:05:32,746 --> 00:05:35,490
to theta 100.
直到 θ100 的值变小
159
00:05:36,790 --> 00:05:39,629
By the way, by convention the summation
顺便说一下 按照惯例来讲
160
00:05:39,629 --> 00:05:41,007
here starts from one so I
这里的求和是从 1 开始的 所以我
161
00:05:41,007 --> 00:05:43,341
am not actually going to penalize theta
所以我实际上没有去惩罚 θ0
162
00:05:43,360 --> 00:05:45,416
zero being large.
取较大的值
163
00:05:45,470 --> 00:05:46,435
That's sort of the convention, that
这就是一个约定
164
00:05:46,435 --> 00:05:48,664
the sum I equals one through
从1到 n 的求和
165
00:05:48,664 --> 00:05:50,185
N, rather than I equals zero
而不是从0到 n 的求和
166
00:05:50,190 --> 00:05:51,953
through N. But in practice,
但其实在实践中
167
00:05:51,960 --> 00:05:53,464
it makes very little difference, and,
这只会有非常小的差异
168
00:05:53,490 --> 00:05:54,788
whether you include, you know,
无论你是否包括这项
169
00:05:54,788 --> 00:05:56,221
theta zero or not, in
就是 θ0 这项
170
00:05:56,221 --> 00:05:59,532
practice, make very little difference to the results.
实际上 结果只有非常小的差异
171
00:05:59,540 --> 00:06:01,804
But by convention, usually, we regularize
但是按照惯例 通常情况下我们还是只
172
00:06:01,804 --> 00:06:03,356
only theta 1 through theta
从 θ1 到 θ100 进行正规化
173
00:06:03,360 --> 00:06:06,084
100. Writing down
这里我们写下来
174
00:06:06,084 --> 00:06:08,978
our regularized optimization objective,
我们的正规化优化目标
175
00:06:08,978 --> 00:06:10,655
our regularized cost function again.
我们的正规化后的代价函数
176
00:06:10,655 --> 00:06:11,718
Here it is. Here's J of
就是这样的
177
00:06:11,718 --> 00:06:13,903
theta where, this term
J(θ) 这个项
178
00:06:13,970 --> 00:06:15,863
on the right is a regularization
右边的这项就是一个正则化项
179
00:06:15,863 --> 00:06:17,548
term and lambda
并且 λ
180
00:06:17,570 --> 00:06:23,950
here is called the regularization parameter and
在这里我们称做正规化参数
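(A LaTeX sketch of the regularized cost function J(θ) being described here; the 1/2m scaling follows the course's usual convention, and the regularization sum starts at j = 1 so that θ0 is not penalized, with n = 100 in this example:

J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right] )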
181
00:06:23,973 --> 00:06:26,334
what lambda does, is it
λ 要做的就是控制
182
00:06:26,334 --> 00:06:28,480
controls a trade off
在两个不同的目标中
183
00:06:28,510 --> 00:06:30,636
between two different goals.
的一个平衡关系
184
00:06:30,636 --> 00:06:32,478
The first goal, captured
第一个目标
185
00:06:32,500 --> 00:06:34,399
by the first term of the objective, is
第一个需要抓住的目标
186
00:06:34,399 --> 00:06:36,081
that we would like to train,
就是我们想要训练
187
00:06:36,090 --> 00:06:38,350
is that we would like to fit the training data well.
使假设更好地拟合训练数据
188
00:06:38,390 --> 00:06:41,083
We would like to fit the training set well.
我们希望假设能够很好的适应训练集
189
00:06:41,083 --> 00:06:42,954
And the second goal is,
而第二个目标是
190
00:06:42,954 --> 00:06:44,474
we want to keep the parameters
我们想要保持参数值较小
191
00:06:44,474 --> 00:06:46,053
small, and that's captured by
这就是第二项的目标
192
00:06:46,060 --> 00:06:49,103
the second term, by the regularization objective, by the regularization term.
通过正则化目标函数
193
00:06:49,103 --> 00:06:53,583
And what lambda, the regularization
这就是λ 这个正则化
194
00:06:53,583 --> 00:06:55,937
parameter does is it controls the trade-off
参数需要控制的
195
00:06:55,937 --> 00:06:57,694
between these two
它会控制这两者之间的平衡
196
00:06:57,694 --> 00:06:58,938
goals, between the goal of fitting the training set well
即很好地拟合训练集的目标
197
00:06:58,960 --> 00:07:00,562
and the
和
198
00:07:00,562 --> 00:07:02,043
goal of keeping the parameters
保持参数值较小的目的
199
00:07:02,080 --> 00:07:05,688
small and therefore keeping the hypothesis relatively
从而来保持假设的形式相对简单
200
00:07:05,688 --> 00:07:09,134
simple to avoid overfitting.
来避免过度的拟合
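(A minimal Python sketch of the trade-off that lambda controls, as described above; the function name, the toy data, and the specific lambda values are illustrative assumptions, not part of the lecture:

import numpy as np

def regularized_cost(theta, X, y, lam):
    # Squared-error cost plus an L2 penalty; theta[0] (the intercept) is not penalized.
    m = len(y)
    residuals = X @ theta - y                    # h_theta(x^(i)) - y^(i) for every example
    fit_term = np.sum(residuals ** 2)            # goal 1: fit the training set well
    penalty = lam * np.sum(theta[1:] ** 2)       # goal 2: keep the parameters small
    return (fit_term + penalty) / (2 * m)

# Toy illustration: as lambda grows, the penalty term dominates and the same
# theta is judged worse, pushing minimization toward smaller parameter values.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0]])   # first column is the intercept feature
y = np.array([1.0, 2.0, 2.5])
theta = np.array([0.2, 1.1])
for lam in (0.0, 1.0, 1000.0):
    print(lam, regularized_cost(theta, X, y, lam)) )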