srt/10 - 7 - Deciding What to Do Next Revisited (7 min).srt

﻿1
00:00:00,260 --> 00:00:01,340
We've talked about how to evaluate
我们已经介绍了怎样评价一个学习算法
(字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:01,960 --> 00:00:03,360
learning algorithms, talked about model selection,
我们讨论了模型选择问题

3
00:00:04,150 --> 00:00:06,490
talked a lot about bias and variance.
偏差和方差的问题

4
00:00:06,970 --> 00:00:08,110
So how does this help us figure
那么这些诊断法则怎样帮助我们弄清

5
00:00:08,330 --> 00:00:09,730
out what are potentially fruitful,
哪些方法有助于

6
00:00:10,340 --> 00:00:11,710
potentially not fruitful things to
改进学习算法的效果

7
00:00:11,950 --> 00:00:13,980
try to do to improve the performance of a learning algorithm.
哪些又是徒劳的呢

8
00:00:15,480 --> 00:00:16,660
Let's go back to our original
让我们再次回到最开始的例子

9
00:00:16,940 --> 00:00:18,890
motivating example and go for the result.
在那里寻找答案

10
00:00:21,030 --> 00:00:22,570
So here is our earlier example
这就是我们之前的例子

11
00:00:23,000 --> 00:00:24,120
of maybe having fit regularized
我们试图用正则化的线性回归拟合模型

12
00:00:24,720 --> 00:00:27,640
linear regression and finding that it doesn't work as well as we're hoping.
并评价该算法是否达到预期效果

13
00:00:28,300 --> 00:00:30,080
We said that we had this menu of options.
我们提出了如下这些选择

14
00:00:30,910 --> 00:00:32,430
So is there some way to
那么到底有没有某种方法

15
00:00:32,590 --> 00:00:34,530
figure out which of these might be fruitful options?
能够明确指出以上哪些方法有效呢

16
00:00:35,480 --> 00:00:36,490
The first thing all of this
第一种可供选择的方法

17
00:00:36,660 --> 00:00:38,770
was getting more training examples.
是使用更多的训练集数据

18
00:00:39,550 --> 00:00:40,700
What this is good for,
这种方法对于高方差的情况

19
00:00:40,880 --> 00:00:42,890
is this helps to fix high variance.
是有帮助的

20
00:00:45,320 --> 00:00:46,610
And concretely, if you instead
也就是说

21
00:00:47,150 --> 00:00:48,550
have a high bias problem and
如果你的模型不处于高方差问题

22
00:00:48,680 --> 00:00:50,530
don't have any variance problem, then
而是高偏差的时候

23
00:00:50,830 --> 00:00:52,000
we saw in the previous video
那么通过前面的视频

24
00:00:52,500 --> 00:00:53,560
that getting more training examples,
我们已经知道 获取更多的训练集数据

25
00:00:54,640 --> 00:00:56,380
while maybe just isn't going to help much at all.
并不会有太明显的帮助

26
00:00:57,360 --> 00:00:58,320
So the first option is useful
所以 要选择第一种方法

27
00:00:58,780 --> 00:01:00,230
only if you, say, plot
你应该先画出

28
00:01:00,580 --> 00:01:01,620
the learning curves and figure
学习曲线

29
00:01:01,720 --> 00:01:02,820
out that you have at least
然后看出你的模型

30
00:01:02,860 --> 00:01:03,970
a bit of a variance, meaning that
应该至少有那么一点方差问题

31
00:01:04,170 --> 00:01:06,530
the cross-validation error is, you know,
也就是说你的交叉验证集误差

32
00:01:06,680 --> 00:01:08,800
quite a bit bigger than your training set error.
应该比训练集误差大一点

33
00:01:08,910 --> 00:01:10,400
How about trying a smaller set of features?
第二种方法情况又如何呢

34
00:01:10,940 --> 00:01:11,920
Well, trying a smaller
第二种方法是

35
00:01:12,350 --> 00:01:13,570
set of features, that's again
少选几种特征

36
00:01:13,970 --> 00:01:16,060
something that fixes high variance.
这同样是对高方差时有效

37
00:01:17,100 --> 00:01:18,080
And in other words, if you figure
换句话说

38
00:01:18,420 --> 00:01:19,440
out, by looking at learning curves
如果你通过绘制学习曲线

39
00:01:19,820 --> 00:01:20,830
or something else that you used,
或者别的什么方法

40
00:01:21,190 --> 00:01:22,110
that have a high bias
看出你的模型处于高偏差问题

41
00:01:22,370 --> 00:01:23,460
problem; then for goodness
那么切记

42
00:01:23,670 --> 00:01:25,000
sakes, don't waste your time
千万不要浪费时间

43
00:01:25,540 --> 00:01:27,250
trying to carefully select out
试图从已有的特征中

44
00:01:27,450 --> 00:01:29,130
a smaller set of features to use.
挑出一小部分来使用

45
00:01:29,330 --> 00:01:31,190
Because if you have a high bias problem, using
因为你已经发现高偏差的问题了

46
00:01:32,060 --> 00:01:33,220
fewer features is not going to help.
使用更少的特征仍然无济于事

47
00:01:33,890 --> 00:01:35,270
Whereas in contrast, if you
反过来 如果你发现

48
00:01:35,490 --> 00:01:36,730
look at the learning curves or something
从你的学习曲线

49
00:01:36,900 --> 00:01:38,020
else you figure out that you
或者别的某种诊断图中

50
00:01:38,360 --> 00:01:39,780
have a high variance problem, then,
你看出了高方差的问题

51
00:01:40,320 --> 00:01:41,730
indeed trying to select
那么恭喜你

52
00:01:42,160 --> 00:01:43,180
out a smaller set of features,
花点时间挑选出一小部分合适的特征吧

53
00:01:43,440 --> 00:01:45,380
that might indeed be a very good use of your time.
这是把时间用在了刀刃上

54
00:01:45,790 --> 00:01:47,120
How about trying to get additional
方法三 选用更多的特征又如何呢

55
00:01:47,710 --> 00:01:49,640
features, adding features, usually,
通常来讲

56
00:01:50,170 --> 00:01:51,380
not always, but usually we
不是所有时候都适用

57
00:01:51,490 --> 00:01:53,020
think of this as a solution
但通常来说 增加特征数

58
00:01:54,070 --> 00:01:56,920
for fixing high bias problems.
是解决高偏差问题的法宝

59
00:01:57,600 --> 00:01:58,700
So if you are adding extra
所以如果你要增加

60
00:01:58,980 --> 00:02:00,640
features it's usually because
更多的特征时

61
00:02:01,750 --> 00:02:03,150
your current hypothesis is too
一般是由于你现有的

62
00:02:03,280 --> 00:02:04,280
simple, and so we want
假设函数太简单

63
00:02:04,540 --> 00:02:06,520
to try to get additional features to
因此我们才决定增加一些

64
00:02:06,730 --> 00:02:08,540
make our hypothesis better able
别的特征来让假设函数

65
00:02:09,060 --> 00:02:10,800
to fit the training set. And
更好地拟合训练集

66
00:02:11,420 --> 00:02:13,460
similarly, adding polynomial features;
接下来 类似地

67
00:02:13,770 --> 00:02:14,930
this is another way of adding
增加更多的多项式特征

68
00:02:15,140 --> 00:02:16,420
features and so there
这实际上也是属于增加特征

69
00:02:16,570 --> 00:02:18,220
is another way to try
因此也是用于

70
00:02:18,430 --> 00:02:19,950
to fix the high bias problem.
修正高偏差问题

71
00:02:21,020 --> 00:02:22,820
And, if concretely if
具体来说

72
00:02:23,210 --> 00:02:24,350
your learning curves show you
如果你画出的学习曲线告诉你

73
00:02:24,570 --> 00:02:25,410
that you still have a high
你还是处于高方差问题

74
00:02:25,520 --> 00:02:27,190
variance problem, then, you know, again this
那么采取这种方法

75
00:02:27,320 --> 00:02:29,360
is maybe a less good use of your time.
依然是浪费时间

76
00:02:30,640 --> 00:02:32,690
And finally, decreasing and increasing lambda.
最后 增大和减小lambda

77
00:02:33,200 --> 00:02:34,090
This are quick and easy to try,
这种方法尝试起来最方便

78
00:02:34,470 --> 00:02:36,000
I guess these are less likely to
我想尝试这个方法

79
00:02:36,140 --> 00:02:38,190
be a waste of, you know, many months of your life.
不至于花费你几个月时间

80
00:02:39,070 --> 00:02:41,530
But decreasing lambda, you
但我们已经知道

81
00:02:41,650 --> 00:02:43,400
already know fixes high bias.
减小lambda可以修正高偏差

82
00:02:45,360 --> 00:02:46,340
In case this isn't clear to
如果我说的你还不清楚的话

83
00:02:46,500 --> 00:02:47,340
you, you know, I do encourage
我建议你暂停视频

84
00:02:47,810 --> 00:02:50,350
you to pause the video and think through this that
仔细回忆一下

85
00:02:50,990 --> 00:02:52,790
convince yourself that decreasing lambda
减小lambda的值

86
00:02:53,620 --> 00:02:55,030
helps fix high bias, whereas increasing
有助于修正高偏差

87
00:02:55,590 --> 00:02:57,480
lambda fixes high variance.
而增大lambda的值解决高方差

88
00:02:59,870 --> 00:03:00,930
And if you aren't sure why
如果你确实不明白

89
00:03:01,270 --> 00:03:02,470
this is the case, do
为什么是这样的话

90
00:03:02,650 --> 00:03:04,130
pause the video and make
那就暂停一下好好想想

91
00:03:04,150 --> 00:03:05,820
sure you can convince yourself that this is the case.
直到真的弄清楚这个道理

92
00:03:06,580 --> 00:03:07,320
Or take a look at the curves
或者看看

93
00:03:07,800 --> 00:03:09,040
that we were plotting at the
上一节视频最后

94
00:03:09,190 --> 00:03:10,590
end of the previous video and
我们绘制的学习曲线

95
00:03:10,720 --> 00:03:11,650
try to make sure you understand
试着理解清楚

96
00:03:12,170 --> 00:03:13,670
why these are the case.
为什么是我说的那样

97
00:03:15,080 --> 00:03:16,120
Finally, let us take everything
最后 我们回顾一下

98
00:03:16,440 --> 00:03:17,840
we have learned and relate it back
这几节课介绍的这些内容

99
00:03:18,400 --> 00:03:19,980
to neural networks and so,
并且看看它们和神经网络的联系

100
00:03:20,130 --> 00:03:21,190
here is some practical
我想介绍一些

101
00:03:21,720 --> 00:03:22,720
advice for how I usually
很实用的经验或建议

102
00:03:23,520 --> 00:03:25,060
choose the architecture or the
这些来自于我平时为神经网络模型

103
00:03:25,530 --> 00:03:28,660
connectivity pattern of the neural networks I use.
选择结构或者连接形式的一些技巧

104
00:03:30,070 --> 00:03:31,190
So, if you are fitting
当你在进行神经网络拟合的时候

105
00:03:31,410 --> 00:03:33,160
a neural network, one option would
你可以选择一种

106
00:03:33,400 --> 00:03:34,680
be to fit, say, a pretty
比如说

107
00:03:34,840 --> 00:03:36,540
small neural network with you know, relatively
一个相对比较简单的神经网络模型

108
00:03:37,530 --> 00:03:38,670
few hidden units, maybe just
相对来讲 隐藏单元比较少

109
00:03:38,930 --> 00:03:40,430
one hidden unit. If you're fitting
或者甚至只有一个隐藏单元

110
00:03:40,890 --> 00:03:42,670
a neural network, one option would
如果你要进行神经网络的拟合

111
00:03:42,800 --> 00:03:44,440
be to fit a relatively small
其中一个选择是

112
00:03:44,920 --> 00:03:46,500
neural network with, say,
选用一个相对简单的网络结构

113
00:03:48,030 --> 00:03:49,630
relatively few, maybe only one
比如说只有一个

114
00:03:49,980 --> 00:03:51,760
hidden layer and maybe
隐藏层

115
00:03:52,070 --> 00:03:53,370
only a relatively few number
或者可能相对来讲

116
00:03:53,750 --> 00:03:55,160
of hidden units.
比较少的隐藏单元

117
00:03:55,570 --> 00:03:56,580
So, a network like this might have relatively
因此像这样的一个简单的神经网络

118
00:03:57,050 --> 00:03:59,170
few parameters and be more prone to underfitting.
参数就不会很多 并且很容易出现欠拟合

119
00:04:00,450 --> 00:04:01,850
The main advantage of these small
这种比较小型的神经网络

120
00:04:02,260 --> 00:04:04,760
neural networks is that the computation will be cheaper.
其最大优势在于计算量较小

121
00:04:05,820 --> 00:04:06,910
An alternative would be to
与之相对的另一种情况

122
00:04:07,010 --> 00:04:08,470
fit a, maybe relatively large
是相对较大型的神经网络结构

123
00:04:08,900 --> 00:04:10,790
neural network with either more
隐藏层单元会比较多

124
00:04:10,970 --> 00:04:12,370
hidden units--there's a lot
比如每一层中的隐藏单元数很多

125
00:04:12,560 --> 00:04:14,940
of hidden in one there--or with more hidden layers.
或者有很多个隐藏层

126
00:04:16,200 --> 00:04:17,800
And so these neural networks tend
因此这种比较复杂的神经网络

127
00:04:18,010 --> 00:04:20,870
to have more parameters and therefore be more prone to overfitting.
参数一般较多 也更容易出现过拟合

128
00:04:22,410 --> 00:04:24,010
One disadvantage, often not a
这种结构的一大劣势

129
00:04:24,050 --> 00:04:25,160
major one but something to
也许不是主要的 但还是需要考虑

130
00:04:25,250 --> 00:04:26,440
think about, is that if you have
那就是当网络中的

131
00:04:27,000 --> 00:04:28,450
a large number of neurons
神经元数量很多的时候

132
00:04:28,960 --> 00:04:30,040
in your network, then it can
这种结构会显得

133
00:04:30,230 --> 00:04:31,920
be more computationally expensive.
计算量较为庞大

134
00:04:33,070 --> 00:04:35,790
Although within reason, this is often hopefully not a huge problem.
虽然有这个情况 但通常来讲不成问题

135
00:04:36,840 --> 00:04:38,420
The main potential problem of
这种大型网络结构最主要的问题

136
00:04:38,540 --> 00:04:39,710
these much larger neural networks is that it could be more prone to overfitting
还是它更容易出现过拟合现象

137
00:04:39,980 --> 00:04:44,120
and it turns out if you're applying neural
事实上 如果你经常应用神经网络

138
00:04:44,700 --> 00:04:46,900
network very often using
特别是大型神经网络的话

139
00:04:47,240 --> 00:04:48,900
a large neural network often it's actually the larger, the better
你就会发现越大型的网络性能越好

140
00:04:50,610 --> 00:04:51,700
but if it's overfitting, you can
但如果发生了过拟合

141
00:04:51,890 --> 00:04:53,800
then use regularization to address
你可以使用正则化的方法

142
00:04:54,230 --> 00:04:56,510
overfitting, usually using
来修正过拟合

143
00:04:56,910 --> 00:04:58,480
a larger neural network by using
一般来说 使用一个大型的神经网络

144
00:04:58,720 --> 00:04:59,980
regularization to address is
并使用正则化来修正过拟合问题

145
00:05:00,310 --> 00:05:01,910
overfitting that's often more
通常比使用一个小型的神经网络

146
00:05:02,130 --> 00:05:04,160
effective than using a smaller neural network.
效果更好

147
00:05:05,100 --> 00:05:06,940
And the main possible disadvantage is
但主要可能出现的一大问题

148
00:05:07,130 --> 00:05:09,420
that it can be more computationally expensive.
就是计算量相对较大

149
00:05:10,470 --> 00:05:11,940
And finally, one of the other decisions is, say,
最后 你还需要选择的

150
00:05:12,280 --> 00:05:14,340
the number of hidden layers you want to have, right?
是隐藏层的层数

151
00:05:14,480 --> 00:05:16,400
So, do you want
你是应该用一个

152
00:05:17,030 --> 00:05:18,130
one hidden layer or do
隐藏层呢

153
00:05:18,380 --> 00:05:19,700
you want three hidden layers, as
还是应该用三个呢 就像我们这里画的

154
00:05:20,040 --> 00:05:21,790
we've shown here, or do you want two hidden layers?
或者还是用两个隐藏层呢

155
00:05:23,250 --> 00:05:24,850
And usually, as I
通常来说

156
00:05:24,980 --> 00:05:25,720
think I said in the previous
正如我在前面的视频中讲过的

157
00:05:26,190 --> 00:05:27,420
video, using a single
默认的情况是

158
00:05:27,640 --> 00:05:29,570
hidden layer is a reasonable default, but
使用一个隐藏层是比较合理的选择

159
00:05:29,780 --> 00:05:30,800
if you want to choose the
但是如果你想要选择

160
00:05:30,890 --> 00:05:32,400
number of hidden layers, one
一个最合适的隐藏层层数

161
00:05:32,580 --> 00:05:33,610
other thing you can try is
你也可以试试

162
00:05:34,270 --> 00:05:35,800
find yourself a training cross-validation,
把数据分割为训练集 验证集

163
00:05:36,660 --> 00:05:38,320
and test set split and try
和测试集 然后试试使用

164
00:05:38,730 --> 00:05:40,070
training neural networks with one
一个隐藏层的神经网络来训练模型

165
00:05:40,260 --> 00:05:41,210
hidden layer or two hidden
然后试试两个 三个隐藏层

166
00:05:41,490 --> 00:05:42,810
layers or three hidden layers and
以此类推

167
00:05:43,230 --> 00:05:44,300
see which of those neural
然后看看哪个神经网络

168
00:05:44,460 --> 00:05:47,460
networks performs best on the cross-validation sets.
在交叉验证集上表现得最理想

169
00:05:48,180 --> 00:05:49,190
You take your three neural networks
也就是说 你得到了三个神经网络模型

170
00:05:49,660 --> 00:05:50,510
with one, two and three hidden
分别有一个 两个 三个隐藏层

171
00:05:50,780 --> 00:05:52,410
layers, and compute the
然后你对每一个模型

172
00:05:52,570 --> 00:05:53,870
cross validation error at Jcv
都用交叉验证集数据进行测试

173
00:05:54,140 --> 00:05:55,120
and all of
算出三种情况下的

174
00:05:55,240 --> 00:05:56,630
them and use that to
交叉验证集误差Jcv

175
00:05:56,960 --> 00:05:58,350
select which of these
然后选出你认为最好的

176
00:05:58,690 --> 00:06:00,290
is you think the best neural network.
神经网络结构

177
00:06:02,580 --> 00:06:04,020
So, that's it for
好的 以上就是我们介绍的

178
00:06:04,230 --> 00:06:05,490
bias and variance and ways
偏差和方差问题

179
00:06:05,780 --> 00:06:08,170
like learning curves, who tried to diagnose these problems.
以及如学习曲线这样的诊断法

180
00:06:08,560 --> 00:06:09,860
As far as what
在改进学习算法的表现时

181
00:06:09,930 --> 00:06:11,020
you think is implied, for one
你可以充分运用

182
00:06:11,250 --> 00:06:12,480
might be truthful or not
以上这些内容来判断

183
00:06:12,630 --> 00:06:13,500
truthful things to try
哪些途径是有帮助的

184
00:06:13,910 --> 00:06:15,720
to improve the performance of a learning algorithm.
哪些方法是无意义的

185
00:06:16,960 --> 00:06:18,000
If you understood the contents
如果你理解了以上几节视频中

186
00:06:18,990 --> 00:06:20,700
of the last few videos and if
介绍的内容

187
00:06:20,790 --> 00:06:22,020
you apply them you actually
并且懂得如何运用

188
00:06:22,630 --> 00:06:24,300
be much more effective already and
那么你已经很厉害了

189
00:06:24,430 --> 00:06:25,890
getting learning algorithms to work on problems
你也能像硅谷的

190
00:06:26,610 --> 00:06:27,970
and even a large fraction,
大部分机器学习专家一样

191
00:06:28,560 --> 00:06:29,810
maybe the majority of practitioners
他们每天的工作就是

192
00:06:30,540 --> 00:06:31,860
of machine learning here in
有效地使用这些学习算法

193
00:06:32,060 --> 00:06:34,760
Silicon Valley today doing these things as their full-time jobs.
来解决众多具体的问题

194
00:06:35,820 --> 00:06:37,560
So I hope that these
我希望这几节中

195
00:06:37,990 --> 00:06:39,110
pieces of advice
提到的一些技巧

196
00:06:39,560 --> 00:06:41,420
on by experience in diagnostics
关于方差 偏差 以及学习曲线为代表的诊断法

197
00:06:42,730 --> 00:06:44,110
will help you to much effectively
能够真正帮助你更有效率地

198
00:06:44,790 --> 00:06:47,270
and powerfully apply learning and
应用机器学习

199
00:06:48,000 --> 00:06:49,300
get them to work very well.
让它们高效地工作