srt/10 - 6 - Learning Curves (12 min).srt

﻿1
00:00:00,090 --> 00:00:02,040
In this video, I'd like to tell you about learning curves.
本节课我们介绍学习曲线
(字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:03,310 --> 00:00:05,850
Learning curves is often a very useful thing to plot.
绘制学习曲线非常有用

3
00:00:06,710 --> 00:00:08,170
If either you wanted to sanity check
也许你想检查你的学习算法

4
00:00:08,430 --> 00:00:09,590
that your algorithm is working correctly,
运行是否一切正常

5
00:00:10,400 --> 00:00:12,730
or if you want to improve the performance of the algorithm.
或者你希望改进算法的表现或效果

6
00:00:13,950 --> 00:00:15,200
And learning curves is a
那么学习曲线

7
00:00:15,310 --> 00:00:16,410
tool that I actually use
就是一种很好的工具

8
00:00:16,820 --> 00:00:17,920
very often to try to
我经常使用学习曲线

9
00:00:18,290 --> 00:00:20,030
diagnose if a physical learning algorithm may be
来判断某一个学习算法

10
00:00:20,180 --> 00:00:23,220
suffering from bias, sort of variance problem or a bit of both.
是否处于偏差 方差问题 或是二者皆有

11
00:00:27,170 --> 00:00:28,070
Here's what a learning curve is.
下面我们就来介绍学习曲线

12
00:00:28,830 --> 00:00:30,550
To plot a learning curve, what
为了绘制一条学习曲线

13
00:00:30,700 --> 00:00:31,760
I usually do is plot
我通常先绘制出Jtrain

14
00:00:32,210 --> 00:00:33,950
j train which is, say,
也就是训练集数据的

15
00:00:35,030 --> 00:00:36,050
average squared error on my training
平均误差平方和

16
00:00:36,440 --> 00:00:39,090
set or Jcv which is
或者Jcv 也即交叉验证集数据的

17
00:00:39,340 --> 00:00:41,130
the average squared error on my cross validation set.
平均误差平方和

18
00:00:41,590 --> 00:00:42,900
And I'm going to plot
我要将其绘制成一个

19
00:00:43,140 --> 00:00:44,160
that as a function
关于参数m的函数

20
00:00:44,500 --> 00:00:46,380
of m, that is as a function
也就是一个关于训练集

21
00:00:47,230 --> 00:00:51,260
of the number of training examples I have.
样本总数的函数

22
00:00:51,950 --> 00:00:53,420
And so m is usually a constant like maybe I just have, you know, a 100
所以m一般都是一个常数 比如m等于100

23
00:00:53,650 --> 00:00:55,220
training examples but what I'm
表示100组训练样本

24
00:00:55,330 --> 00:00:57,670
going to do is artificially with
但我要自己取一些m的值

25
00:00:57,860 --> 00:00:59,280
use my training set exercise. So, I
也就是说我要自行对m的取值

26
00:00:59,500 --> 00:01:01,460
deliberately limit myself to using only,
做一点限制

27
00:01:01,840 --> 00:01:03,440
say, 10 or 20 or
比如说我取10 20或者

28
00:01:03,660 --> 00:01:06,040
30 or 40 training examples and
30 40组训练集

29
00:01:06,170 --> 00:01:07,610
plot what the training error is and
然后绘出训练集误差

30
00:01:07,740 --> 00:01:09,640
what the cross validation is for this
以及交叉验证集误差

31
00:01:10,040 --> 00:01:12,260
smallest training set exercises. So
好的 那么我们来看看

32
00:01:12,620 --> 00:01:14,090
let's see what these plots may look
这条曲线绘制出来是什么样子

33
00:01:14,270 --> 00:01:15,530
like. Suppose I have only
假设我只有一组训练样本

34
00:01:15,730 --> 00:01:17,210
one training example like that
也即m=1

35
00:01:17,390 --> 00:01:18,450
shown in this this first example
正如第一幅图中所示

36
00:01:18,860 --> 00:01:19,970
here and let's say I'm fitting a quadratic function. Well, I
并且假设使用二次函数来拟合模型

37
00:01:22,470 --> 00:01:24,490
have only one training example. I'm
那么由于我只有一个训练样本

38
00:01:25,040 --> 00:01:26,100
going to be able to fit it perfectly
拟合的结果很明显会很好

39
00:01:26,650 --> 00:01:28,590
right? You know, just fit the quadratic function. I'm
是吧 用二次函数来拟合

40
00:01:28,760 --> 00:01:30,000
going to have 0
对这一个训练样本拟合

41
00:01:30,150 --> 00:01:32,240
error on the one training example. If I
其误差一定为0

42
00:01:32,570 --> 00:01:34,170
have two training examples. Well the quadratic function can also fit that very well. So,
如果有两组训练样本 二次函数也能很好地拟合

43
00:01:37,050 --> 00:01:38,550
even if I am using regularization,
即使是使用正则化

44
00:01:38,750 --> 00:01:40,220
I can probably fit this quite well.
拟合的结果也会很好

45
00:01:41,080 --> 00:01:41,970
And if I am using no neural regularization,
而如果不使用正则化的话

46
00:01:42,030 --> 00:01:45,200
I'm going to fit this perfectly and
那么拟合效果绝对棒极了

47
00:01:45,440 --> 00:01:46,400
if I have three training examples
如果我用三组训练样本的话

48
00:01:47,260 --> 00:01:48,380
again. Yeah, I can fit a quadratic
好吧 看起来依然能很好地

49
00:01:48,660 --> 00:01:51,320
function perfectly so if
用二次函数拟合

50
00:01:51,550 --> 00:01:52,590
m equals 1 or m equals 2 or m equals 3,
也就是说 当m等于1 m=2 或m=3时

51
00:01:54,850 --> 00:01:56,770
my training error
对训练集数据进行预测

52
00:01:57,350 --> 00:01:58,870
on my training set is
得到的训练集误差

53
00:01:59,110 --> 00:02:01,180
going to be 0 assuming I'm
都将等于0

54
00:02:01,220 --> 00:02:02,760
not using regularization or it may
这里假设我不使用正则化

55
00:02:03,150 --> 00:02:04,290
slightly large in 0 if
当然如果使用正则化

56
00:02:04,560 --> 00:02:06,400
I'm using regularization and
那么误差就稍大于0

57
00:02:06,500 --> 00:02:07,350
by the way if I have
顺便提醒一下

58
00:02:07,740 --> 00:02:08,980
a large training set and I'm artificially
如果我的训练集样本很大

59
00:02:09,940 --> 00:02:11,040
restricting the size of my
而我要人为地限制训练集

60
00:02:11,120 --> 00:02:13,080
training set in order to J train.
样本的容量

61
00:02:13,830 --> 00:02:14,770
Here if I set
比如说这里

62
00:02:15,110 --> 00:02:16,720
M equals 3, say, and I
我将m值设为3

63
00:02:17,040 --> 00:02:18,290
train on only three examples,
然后我仅用这三组样本进行训练

64
00:02:19,270 --> 00:02:21,030
then, for this figure I
然后对应到这个图中

65
00:02:21,110 --> 00:02:22,430
am going to measure my training error
我只看对这三组训练样本

66
00:02:22,830 --> 00:02:24,450
only on the three examples that
进行预测得到的训练误差

67
00:02:24,550 --> 00:02:25,580
actually fit my data too
也是和我模型拟合的三组样本

68
00:02:27,150 --> 00:02:28,130
and so even I have to say
所以即使我有100组训练样本

69
00:02:28,290 --> 00:02:31,160
a 100 training examples but if I want to plot what my
而我还是想绘制

70
00:02:31,430 --> 00:02:32,620
training error is the m equals 3. What I'm going to do
当m等于3时的训练误差

71
00:02:34,270 --> 00:02:35,200
is to measure the
那么我要关注的仍然是

72
00:02:35,340 --> 00:02:36,660
training error on the
对这三组训练样本进行预测的误差

73
00:02:36,750 --> 00:02:39,870
three examples that I've actually fit to my hypothesis 2.
同样 这三组样本也是我们用来拟合模型的三组样本

74
00:02:41,290 --> 00:02:42,900
And not all the other examples that I have
所有其他的样本

75
00:02:43,010 --> 00:02:44,940
deliberately omitted from the training
我都在训练过程中选择性忽略了

76
00:02:45,140 --> 00:02:46,750
process. So just to summarize what we've
好的 总结一下

77
00:02:46,960 --> 00:02:48,460
seen is that if the training set
我们现在已经看到

78
00:02:48,820 --> 00:02:50,560
size is small then the
当训练样本容量m很小的时候

79
00:02:50,630 --> 00:02:52,630
training error is going to be small as well.
训练误差也会很小

80
00:02:52,960 --> 00:02:53,900
Because you know, we have a
因为很显然

81
00:02:53,930 --> 00:02:55,150
small training set is
如果我们训练集很小

82
00:02:55,350 --> 00:02:56,790
going to be very easy to
那么很容易就能把

83
00:02:56,900 --> 00:02:58,080
fit your training set
训练集拟合到很好

84
00:02:58,720 --> 00:02:59,490
very well may be even
甚至拟合得天衣无缝

85
00:02:59,790 --> 00:03:02,970
perfectly now say
现在我们来看

86
00:03:03,190 --> 00:03:04,460
we have m equals 4 for example. Well then
当m等于4的时候

87
00:03:04,680 --> 00:03:06,800
a quadratic function can be
好吧 二次函数似乎也能

88
00:03:06,920 --> 00:03:07,900
a longer fit this data set
对数据拟合得很好

89
00:03:08,100 --> 00:03:09,680
perfectly and if I
那我们再看

90
00:03:09,790 --> 00:03:11,350
have m equals 5 then you
当m等于5的情况

91
00:03:11,460 --> 00:03:13,830
know, maybe quadratic function will fit to stay there so
这时候再用二次函数来拟合

92
00:03:14,090 --> 00:03:15,940
so, then as my training set gets larger.
好像效果有下降但还是差强人意

93
00:03:16,980 --> 00:03:18,460
It becomes harder and harder to
而当我的训练集越来越大的时候

94
00:03:18,620 --> 00:03:19,860
ensure that I can
你不难发现 要保证使用二次函数

95
00:03:20,060 --> 00:03:21,820
find the quadratic function that process through
的拟合效果依然很好

96
00:03:21,960 --> 00:03:25,460
all my examples perfectly. So
就显得越来越困难了

97
00:03:25,840 --> 00:03:27,300
in fact as the training set size
因此 事实上随着训练集容量的增大

98
00:03:27,690 --> 00:03:28,770
grows what you find
我们不难发现

99
00:03:29,300 --> 00:03:30,960
is that my average training error
我们的平均训练误差

100
00:03:31,310 --> 00:03:33,080
actually increases and so if you plot
是逐渐增大的

101
00:03:33,500 --> 00:03:34,650
this figure what you find
因此如果你画出这条曲线

102
00:03:35,220 --> 00:03:36,860
is that the training set
你就会发现

103
00:03:37,130 --> 00:03:38,520
error that is the average
训练集误差 也就是

104
00:03:38,940 --> 00:03:40,660
error on your hypothesis grows
对假设进行预测的误差平均值

105
00:03:41,300 --> 00:03:44,730
as m grows and just to repeat when the intuition is that when
随着m的增大而增大

106
00:03:45,020 --> 00:03:46,200
m is small when you have very
再重复一遍对这一问题的理解

107
00:03:46,500 --> 00:03:48,070
few training examples. It's pretty
当训练样本很少的时候

108
00:03:48,350 --> 00:03:49,420
easy to fit every single
对每一个训练样本

109
00:03:49,790 --> 00:03:51,350
one of your training examples perfectly and
都能很容易地拟合到很好

110
00:03:51,610 --> 00:03:52,840
so your error is going
所以训练误差将会很小

111
00:03:52,940 --> 00:03:54,540
to be small whereas
而反过来

112
00:03:54,710 --> 00:03:56,100
when m is larger then gets
当m的值逐渐增大

113
00:03:56,460 --> 00:03:57,900
harder all the training
那么想对每一个训练样本都拟合到很好

114
00:03:58,220 --> 00:03:59,900
examples perfectly and so
就显得愈发的困难了

115
00:04:00,430 --> 00:04:01,830
your training set error becomes
因此训练集误差就会越来越大

116
00:04:02,370 --> 00:04:05,840
more larger now, how about the cross validation error.
那么交叉验证集误差的情况如何呢

117
00:04:06,720 --> 00:04:08,460
Well, the cross validation is
好的 交叉验证集误差

118
00:04:08,590 --> 00:04:10,100
my error on this cross
是对完全陌生的交叉验证集数据

119
00:04:10,350 --> 00:04:12,660
validation set that I haven't seen and
进行预测得到的误差

120
00:04:12,880 --> 00:04:14,600
so, you know, when I have
那么我们知道

121
00:04:14,720 --> 00:04:15,900
a very small training set, I'm
当训练集很小的时候

122
00:04:16,080 --> 00:04:16,890
not going to generalize well, just
泛化程度不会很好

123
00:04:17,020 --> 00:04:19,610
not going to do well on that.
意思是不能很好地适应新样本

124
00:04:19,850 --> 00:04:21,220
So, right, this hypothesis here doesn't
因此这个假设

125
00:04:21,620 --> 00:04:22,720
look like a good one, and
就不是一个理想的假设

126
00:04:23,020 --> 00:04:23,970
it's only when I get
只有当我使用

127
00:04:24,050 --> 00:04:25,270
a larger training set that,
一个更大的训练集时

128
00:04:25,500 --> 00:04:26,380
you know, I'm starting to get
我才有可能

129
00:04:26,890 --> 00:04:28,100
hypotheses that maybe fit
得到一个能够更好拟合数据的

130
00:04:28,480 --> 00:04:30,810
the data somewhat better.
可能的假设

131
00:04:31,380 --> 00:04:32,050
So your cross validation error and
因此 你的验证集误差和

132
00:04:32,260 --> 00:04:35,650
your test set error will tend
测试集误差

133
00:04:35,890 --> 00:04:37,160
to decrease as your training
都会随着训练集样本容量m的增加

134
00:04:37,470 --> 00:04:39,150
set size increases because the
而减小 因为你使用的数据越多

135
00:04:39,250 --> 00:04:40,700
more data you have, the better
你越能获得更好地泛化表现

136
00:04:40,990 --> 00:04:43,410
you do at generalizing to new examples.
或者说对新样本的适应能力更强

137
00:04:44,010 --> 00:04:46,730
So, just the more data you have, the better the hypothesis you fit.
因此 数据越多 越能拟合出合适的假设

138
00:04:47,560 --> 00:04:48,560
So if you plot j train,
所以 如果你把Jtrain和Jcv绘制出来

139
00:04:49,420 --> 00:04:51,670
and Jcv this is the sort of thing that you get.
就应该得到这样的曲线

140
00:04:52,490 --> 00:04:53,550
Now let's look at what
现在我们来看看

141
00:04:53,770 --> 00:04:54,940
the learning curves may look like
当处于高偏差或者高方差的情况时

142
00:04:55,360 --> 00:04:56,550
if we have either high
这些学习曲线

143
00:04:56,930 --> 00:04:58,210
bias or high variance problems.
又会变成什么样子

144
00:04:58,920 --> 00:05:00,530
Suppose your hypothesis has high
假如你的假设处于高偏差问题

145
00:05:00,830 --> 00:05:02,150
bias and to explain this
为了更清楚地解释这个问题

146
00:05:02,370 --> 00:05:03,780
I'm going to use a, set an
我要用一个简单的例子来说明

147
00:05:03,940 --> 00:05:05,250
example, of fitting a straight
也就是用一条直线

148
00:05:05,440 --> 00:05:06,500
line to data that, you
来拟合数据的例子

149
00:05:06,770 --> 00:05:08,240
know, can't really be fit well by a straight line.
很显然一条直线不能很好地拟合数据

150
00:05:09,540 --> 00:05:12,330
So we end up with a hypotheses that maybe looks like that.
所以最后得到的假设很有可能是这样的

151
00:05:13,910 --> 00:05:15,450
Now let's think what would
现在我们来想一想

152
00:05:15,750 --> 00:05:16,840
happen if we were to increase
如果我们增大训练集样本容量

153
00:05:17,470 --> 00:05:18,880
the training set size. So if
会发生什么情况呢

154
00:05:19,160 --> 00:05:20,480
instead of five examples like
所以现在不像画出的这样

155
00:05:20,590 --> 00:05:22,400
what I've drawn there, imagine that
只有这五组样本了

156
00:05:22,570 --> 00:05:24,080
we have a lot more training examples.
我们有了更多的训练样本

157
00:05:25,280 --> 00:05:27,230
Well what happens, if you fit a straight line to this.
那么如果你用一条直线来拟合

158
00:05:27,980 --> 00:05:29,700
What you find is that, you
不难发现

159
00:05:30,040 --> 00:05:31,360
end up with you know, pretty much the same straight line.
还是会得到类似的一条直线假设

160
00:05:31,690 --> 00:05:32,940
I mean a straight line that
我的意思是

161
00:05:33,530 --> 00:05:35,110
just cannot fit this
刚才的情况用一条直线不能很好地拟合

162
00:05:35,270 --> 00:05:37,320
data and getting a ton more data, well
而现在把样本容量扩大了

163
00:05:37,890 --> 00:05:39,460
the straight line isn't going to change that much.
这条直线也基本不会变化太大

164
00:05:40,230 --> 00:05:41,400
This is the best possible straight-line
因为这条直线是对这组数据

165
00:05:41,840 --> 00:05:42,770
fit to this data, but the
最可能也是最接近的拟合

166
00:05:42,890 --> 00:05:44,160
straight line just can't fit this
但一条直线再怎么接近

167
00:05:44,320 --> 00:05:45,630
data set that well. So,
也不可能对这组数据进行很好的拟合

168
00:05:45,870 --> 00:05:47,420
if you plot across validation error,
所以 如果你绘出交叉验证集误差

169
00:05:49,260 --> 00:05:50,170
this is what it will look like.
应该是这样子的

170
00:05:51,320 --> 00:05:54,470
Option on the left, if you have already a miniscule training set size like you know,
最左端表示训练集样本容量很小 比如说只有一组样本

171
00:05:55,410 --> 00:05:57,710
maybe just one training example and is not going to do well.
那么表现当然很不好

172
00:05:58,550 --> 00:05:59,470
But by the time you have
而随着你增大训练集样本数

173
00:05:59,660 --> 00:06:00,760
reached a certain number of training
当达到某一个容量值的时候

174
00:06:00,940 --> 00:06:02,350
examples, you have almost
你就会找到那条最有可能

175
00:06:02,810 --> 00:06:04,010
fit the best possible straight
拟合数据的那条直线

176
00:06:04,200 --> 00:06:05,400
line, and even if
并且此时即便

177
00:06:05,490 --> 00:06:06,260
you end up with a much
你继续增大训练集的

178
00:06:06,480 --> 00:06:07,790
larger training set size, a
样本容量

179
00:06:07,970 --> 00:06:09,170
much larger value of m,
即使你不断增大m的值

180
00:06:10,010 --> 00:06:12,040
you know, you're basically getting the same straight line,
你基本上还是会得到的一条差不多的直线

181
00:06:12,370 --> 00:06:14,190
and so, the cross-validation error
因此 交叉验证集误差

182
00:06:14,480 --> 00:06:15,420
- let me label that -
我把它标在这里

183
00:06:15,650 --> 00:06:17,040
or test set error or
或者测试集误差

184
00:06:17,140 --> 00:06:18,660
plateau out, or flatten out
将会很快变为水平而不再变化

185
00:06:18,990 --> 00:06:20,480
pretty soon, once you reached
只要训练集样本容量值达到

186
00:06:20,910 --> 00:06:22,920
beyond a certain the number
或超过了那个特定的数值

187
00:06:23,270 --> 00:06:24,700
of training examples, unless you
交叉验证集误差和测试集误差就趋于不变

188
00:06:25,130 --> 00:06:27,480
pretty much fit the best possible straight line.
这样你会得到最能拟合数据的那条直线

189
00:06:28,390 --> 00:06:29,540
And how about training error?
那么训练误差又如何呢

190
00:06:30,120 --> 00:06:33,050
Well, the training error will again be small.
同样 训练误差一开始也是很小的

191
00:06:34,620 --> 00:06:36,280
And what you find
而在高偏差的情形中

192
00:06:36,760 --> 00:06:38,080
in the high bias case is
你会发现训练集误差

193
00:06:38,210 --> 00:06:40,770
that the training error will end
会逐渐增大

194
00:06:41,000 --> 00:06:42,510
up close to the cross
一直趋于接近

195
00:06:42,830 --> 00:06:44,700
validation error, because you
交叉验证集误差

196
00:06:44,810 --> 00:06:46,370
have so few parameters and so
这是因为你的参数很少

197
00:06:46,590 --> 00:06:48,070
much data, at least when m is large.
但当m很大的时候 数据太多

198
00:06:48,900 --> 00:06:49,840
The performance on the training
此时训练集和交叉验证集的

199
00:06:50,220 --> 00:06:52,500
set and the cross validation set will be very similar.
预测效果将会非常接近

200
00:06:53,800 --> 00:06:54,750
And so, this is what your
这就是当你的学习算法处于

201
00:06:54,870 --> 00:06:56,460
learning curves will look like,
高偏差情形时

202
00:06:56,770 --> 00:06:58,850
if you have an algorithm that has high bias.
学习曲线的大致走向

203
00:07:00,220 --> 00:07:01,470
And finally, the problem with
最后补充一点

204
00:07:01,630 --> 00:07:03,260
high bias is reflected in
高偏差的情形

205
00:07:03,450 --> 00:07:04,930
the fact that both the
反映出的问题是

206
00:07:05,580 --> 00:07:07,350
cross validation error and the
交叉验证集和训练集

207
00:07:07,420 --> 00:07:09,130
training error are high,
误差都很大

208
00:07:09,560 --> 00:07:10,440
and so you end up with
也就是说 你最终会得到一个

209
00:07:10,650 --> 00:07:12,040
a relatively high value of
值比较大Jcv

210
00:07:12,280 --> 00:07:14,250
both Jcv and the j train.
和Jtrain

211
00:07:15,370 --> 00:07:16,820
This also implies something very
这也得出一个很有意思的结论

212
00:07:17,120 --> 00:07:18,520
interesting, which is that,
那就是

213
00:07:18,800 --> 00:07:19,990
if a learning algorithm has high
如果一个学习算法

214
00:07:20,360 --> 00:07:22,250
bias, as we
有很大的偏差

215
00:07:22,390 --> 00:07:23,430
get more and more training examples,
那么当我们选用更多的训练样本时

216
00:07:24,060 --> 00:07:25,100
that is, as we move to
也就是在这幅图中

217
00:07:25,210 --> 00:07:26,600
the right of this figure, we'll
随着我们增大横坐标

218
00:07:26,740 --> 00:07:27,880
notice that the cross
我们发现交叉验证集误差的值

219
00:07:28,220 --> 00:07:29,430
validation error isn't going
不会表现出明显的下降

220
00:07:29,740 --> 00:07:31,020
down much, it's basically fattened
实际上是变为水平了

221
00:07:31,560 --> 00:07:32,820
up, and so if
所以如果学习算法

222
00:07:32,950 --> 00:07:35,020
learning algorithms are really suffering from high bias.
正处于高偏差的情形

223
00:07:36,640 --> 00:07:38,200
Getting more training data by
那么选用更多的训练集数据

224
00:07:38,370 --> 00:07:39,710
itself will actually not help
对于改善算法表现无益

225
00:07:40,190 --> 00:07:41,580
that much,and as our figure
正如我们右边的

226
00:07:41,760 --> 00:07:43,120
example in the figure
这两幅图所体现的

227
00:07:43,210 --> 00:07:45,670
on the right, here we had only five training.
这里我们只有五组训练样本

228
00:07:46,060 --> 00:07:47,970
examples, and we fill certain straight line.
然后我们找到这条直线来拟合

229
00:07:48,550 --> 00:07:49,270
And when we had a ton
然后我们增加了更多的训练样本

230
00:07:49,540 --> 00:07:50,730
more training data, we still
但我们仍然得到几乎一样的

231
00:07:51,040 --> 00:07:52,710
end up with roughly the same straight line.
一条直线

232
00:07:53,200 --> 00:07:54,290
And so if the learning algorithm
因此如果学习算法

233
00:07:54,440 --> 00:07:57,090
has high bias give me a lot more training data.
处于高偏差时

234
00:07:57,650 --> 00:07:59,060
That doesn't actually help you
给我再多的训练数据也于事无补

235
00:07:59,830 --> 00:08:01,290
get a much lower cross validation
交叉验证集误差或测试集误差

236
00:08:01,890 --> 00:08:02,890
error or test set error.
也不会降低多少

237
00:08:03,730 --> 00:08:04,950
So knowing if your learning
所以 能够看清你的算法正处于

238
00:08:05,250 --> 00:08:06,600
algorithm is suffering from high
高偏差的情形

239
00:08:06,780 --> 00:08:07,620
bias seems like a useful
是一件很有意义的事情

240
00:08:08,100 --> 00:08:09,500
thing to know because this can
因为这样可以让你避免

241
00:08:09,640 --> 00:08:11,140
prevent you from wasting a
把时间浪费在

242
00:08:11,290 --> 00:08:12,520
lot of time collecting more training
收集更多的训练集数据上

243
00:08:12,920 --> 00:08:15,440
data where it might just not end up being helpful.
因为再多的数据也是无意义的

244
00:08:16,200 --> 00:08:17,070
Next let us look at the
接下来我们再来看看

245
00:08:17,140 --> 00:08:18,530
setting of a learning algorithm
当学习算法正处于高方差的时候

246
00:08:19,470 --> 00:08:20,340
that may have high variance.
学习曲线应该是什么样子的

247
00:08:21,590 --> 00:08:22,880
Let us just look at the
首先我们来看

248
00:08:23,550 --> 00:08:24,260
training error in a around if
训练集误差

249
00:08:25,120 --> 00:08:26,350
you have very smart training
如果你的训练集样本容量很小

250
00:08:26,680 --> 00:08:28,730
set like five training examples shown on
比如像图中所示情形

251
00:08:29,130 --> 00:08:30,720
the figure on the right and
只有五组训练样本

252
00:08:31,150 --> 00:08:32,170
if we're fitting say a
如果我们用很高阶次的

253
00:08:32,200 --> 00:08:33,050
very high order polynomial,
多项式来拟合

254
00:08:34,380 --> 00:08:36,530
and I've written a hundredth degree polynomial which
比如这里我用了100次的多项式函数

255
00:08:37,090 --> 00:08:38,750
really no one uses, but just an illustration.
当然不会有人这么用的 这里只是演示

256
00:08:39,920 --> 00:08:41,460
And if we're using a
并且假设我们使用

257
00:08:41,550 --> 00:08:43,160
fairly small value of lambda,
一个很小的lambda值

258
00:08:43,800 --> 00:08:44,920
maybe not zero, but a fairly
可能不等于0

259
00:08:45,070 --> 00:08:46,830
small value of lambda, then
但足够小的lambda

260
00:08:47,040 --> 00:08:47,980
we'll end up, you know,
那么很显然 我们会对这组数据

261
00:08:48,190 --> 00:08:50,590
fitting this data very well that with
拟合得非常非常好

262
00:08:50,860 --> 00:08:53,390
a function that overfits this.
因此这个假设函数对数据过拟合

263
00:08:54,380 --> 00:08:55,640
So, if the training
所以 如果训练集

264
00:08:55,990 --> 00:08:57,820
set size is small, our training
样本容量很小时

265
00:08:58,320 --> 00:08:59,530
error, that is, j train
训练集误差Jtrain

266
00:09:00,030 --> 00:09:01,810
of theta will be small.
将会很小

267
00:09:03,130 --> 00:09:04,330
And as this training set size increases
随着训练集样本容量的增加

268
00:09:04,940 --> 00:09:05,870
a bit, you know, we may
可能这个假设函数仍然会

269
00:09:06,000 --> 00:09:07,160
still be overfitting this
对数据或多或少

270
00:09:07,330 --> 00:09:08,810
data a little bit but
有一点过拟合

271
00:09:09,780 --> 00:09:11,880
it also becomes slightly harder to
但很明显此时要对数据很好地拟合

272
00:09:12,020 --> 00:09:12,970
fit this data set perfectly,
显得更加困难和吃力了

273
00:09:13,940 --> 00:09:15,140
and so, as the training set size
所以 随着训练集样本容量的增大

274
00:09:15,350 --> 00:09:16,810
increases, we'll find that
我们会发现Jtrain的值

275
00:09:16,960 --> 00:09:19,390
j train increases, because
会随之增大

276
00:09:19,840 --> 00:09:21,040
it is just a little harder to fit
因为当训练样本越多的时候

277
00:09:21,260 --> 00:09:22,720
the training set perfectly when we have
我们就越难跟训练集数据拟合得很好

278
00:09:22,890 --> 00:09:25,700
more examples, but the training set error will still be pretty low.
但总的来说训练集误差还是很小

279
00:09:26,530 --> 00:09:28,600
Now, how about the cross validation error?
交叉验证集误差又如何呢

280
00:09:29,220 --> 00:09:30,590
Well, in high variance
好的 在高方差的情形中

281
00:09:31,040 --> 00:09:32,760
setting, a hypothesis is
假设函数对数据过拟合

282
00:09:32,980 --> 00:09:34,190
overfitting and so the
因此交叉验证集误差

283
00:09:34,290 --> 00:09:35,680
cross validation error will remain
将会一直都很大

284
00:09:36,120 --> 00:09:37,650
high, even as we
即便我们选择一个

285
00:09:37,750 --> 00:09:38,930
get you know, a moderate number
比较合适恰当的

286
00:09:39,260 --> 00:09:40,520
of training examples and, so
训练集样本数

287
00:09:41,170 --> 00:09:42,950
maybe, the cross validation
因此交叉验证集误差

288
00:09:43,730 --> 00:09:45,520
error may look like that.
画出来差不多是这样的

289
00:09:45,660 --> 00:09:47,720
And the indicative diagnostic that we
所以算法处于高方差情形

290
00:09:47,830 --> 00:09:49,200
have a high variance problem,
最明显的一个特点是

291
00:09:50,210 --> 00:09:51,490
is the fact that there's
在训练集误差

292
00:09:51,720 --> 00:09:54,010
this large gap between
和交叉验证集误差之间

293
00:09:54,340 --> 00:09:56,440
the training error and the cross validation error.
有一段很大的差距

294
00:09:57,440 --> 00:09:58,180
And looking at this figure.
而这个曲线图也反映出

295
00:09:58,720 --> 00:10:00,170
If we think about adding more
如果我们要考虑增大训练集的样本数

296
00:10:00,440 --> 00:10:01,810
training data, that is, taking
也就是在这幅图中

297
00:10:02,110 --> 00:10:03,660
this figure and extrapolating to
向右延伸曲线

298
00:10:03,790 --> 00:10:05,220
the right, we can kind
我们大致可以看出

299
00:10:05,330 --> 00:10:06,830
of tell that, you know the
这两条学习曲线

300
00:10:07,030 --> 00:10:08,120
two curves, the blue curve
蓝色和红色的两条曲线

301
00:10:08,480 --> 00:10:10,480
and the magenta curve, are converging to each other.
正在相互靠近

302
00:10:11,420 --> 00:10:12,360
And so, if we were to
因此 如果我们将曲线

303
00:10:12,520 --> 00:10:13,840
extrapolate this figure to
向右延伸出去

304
00:10:13,980 --> 00:10:21,230
the right, then it
那么似乎

305
00:10:21,360 --> 00:10:23,000
seems it likely that the
训练集误差很可能会

306
00:10:23,170 --> 00:10:24,120
training error will keep on
逐渐增大

307
00:10:24,270 --> 00:10:25,740
going up and the
而交叉验证集误差

308
00:10:27,130 --> 00:10:29,040
cross-validation error would keep on going down.
则会持续下降

309
00:10:30,000 --> 00:10:32,340
And the thing we really care about is the cross-validation error
当然我们最关心的还是交叉验证集误差

310
00:10:33,010 --> 00:10:35,150
or the test set error, right?
或者测试集误差 对吧

311
00:10:35,300 --> 00:10:36,460
So in this sort
所以从这幅图中

312
00:10:36,730 --> 00:10:37,850
of figure, we can tell that
我们基本可以预测

313
00:10:38,230 --> 00:10:39,420
if we keep on adding training
如果继续增大训练样本的数量

314
00:10:39,820 --> 00:10:40,930
examples and extrapolate to the
将曲线向右延伸

315
00:10:41,050 --> 00:10:42,650
right, well our cross validation
交叉验证集误差将会

316
00:10:43,290 --> 00:10:44,610
error will keep on coming down.
逐渐下降

317
00:10:45,120 --> 00:10:46,090
And, so, in the high
所以 在高方差的情形中

318
00:10:46,330 --> 00:10:47,980
variance setting, getting more
使用更多的训练集数据

319
00:10:48,180 --> 00:10:49,550
training data is, indeed,
对改进算法的表现

320
00:10:50,170 --> 00:10:51,240
likely to help.
事实上是有效果的

321
00:10:51,520 --> 00:10:52,810
And so again, this seems like a
这同样也体现出

322
00:10:53,060 --> 00:10:54,180
useful thing to know if your
知道你的算法正处于

323
00:10:54,330 --> 00:10:55,830
learning algorithm is suffering
高方差的情形

324
00:10:56,150 --> 00:10:57,460
from a high variance problem, because
也是非常有意义的

325
00:10:57,810 --> 00:10:59,150
that tells you, for example that it
因为它能告诉你

326
00:10:59,220 --> 00:11:00,100
may be be worth your while
是否有必要花时间

327
00:11:00,680 --> 00:11:02,430
to see if you can go and get some more training data.
来增加更多的训练集数据

328
00:11:03,700 --> 00:11:04,920
Now, on the previous slide
好的 在前一页和这一页幻灯片中

329
00:11:05,330 --> 00:11:06,450
and this slide, I've drawn fairly
我画出的学习曲线

330
00:11:06,970 --> 00:11:08,510
clean fairly idealized curves.
都是相当理想化的曲线

331
00:11:08,900 --> 00:11:10,050
If you plot these curves for
针对一个实际的学习算法

332
00:11:10,170 --> 00:11:11,970
an actual learning algorithm, sometimes
如果你画出学习曲线的话

333
00:11:12,500 --> 00:11:13,910
you will actually see, you know, pretty
你会看到基本类似的结果

334
00:11:14,560 --> 00:11:15,900
much curves, like what I've drawn here.
就像我在这里画的一样

335
00:11:16,600 --> 00:11:17,730
Although, sometimes you see curves
虽然如此

336
00:11:18,150 --> 00:11:19,160
that are a little bit noisier and
有时候你也会看到

337
00:11:19,230 --> 00:11:20,820
a little bit messier than this.
带有一点噪声或干扰的曲线

338
00:11:21,090 --> 00:11:22,440
But plotting learning curves like
但总的来说

339
00:11:22,620 --> 00:11:23,850
these can often tell
像这样画出学习曲线

340
00:11:24,120 --> 00:11:25,460
you, can often help you
确实能帮助你

341
00:11:25,570 --> 00:11:26,650
figure out if your learning algorithm is
看清你的学习算法

342
00:11:26,950 --> 00:11:29,080
suffering from bias, or variance or even a little bit of both.
是否处于高偏差 高方差 或二者皆有的情形

343
00:11:29,170 --> 00:11:31,030
So when I'm
所以当我打算

344
00:11:31,200 --> 00:11:32,700
trying to improve the performance of
改进一个学习算法

345
00:11:32,760 --> 00:11:34,060
a learning algorithm, one thing
的表现时

346
00:11:34,260 --> 00:11:35,720
that I'll almost always do
我通常会进行的一项工作

347
00:11:35,960 --> 00:11:37,440
is plot these learning
就是画出这些学习曲线

348
00:11:37,970 --> 00:11:39,460
curves, and usually this will
一般来讲 这项工作会让你

349
00:11:39,490 --> 00:11:41,710
give you a better sense of whether there is a bias or variance problem.
更轻松地看出偏差或方差的问题

350
00:11:44,280 --> 00:11:45,180
And in the next video
在下一节视频中

351
00:11:45,420 --> 00:11:46,440
we'll see how this can
我们将介绍如何判断

352
00:11:46,650 --> 00:11:48,370
help suggest specific actions is
是否应采取具体的某个行为

353
00:11:48,450 --> 00:11:49,580
to take, or to not take,
来改进学习算法的表现

354
00:11:50,260 --> 00:11:53,250
in order to try to improve the performance of your learning algorithm.