srt/18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).srt

﻿1
00:00:00,090 --> 00:00:01,140
in earlier videos, I have
在前面的视频中  (字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:01,260 --> 00:00:02,510
said over and over that, when
我不止一次地说过

3
00:00:02,650 --> 00:00:03,980
you are developing machine learning system,
在你开发机器学习系统时

4
00:00:04,770 --> 00:00:06,630
one of the most valuable resources is
你最宝贵的资源

5
00:00:06,810 --> 00:00:08,050
your time as the developer
就是你的时间

6
00:00:08,490 --> 00:00:09,820
in terms of picking what
作为一个开发者

7
00:00:09,950 --> 00:00:11,520
to work on next.
你需要正确选择下一步的工作

8
00:00:11,950 --> 00:00:12,710
Or, you have a team of developers
或者也许你有一个开发团队

9
00:00:13,300 --> 00:00:14,610
or a team of engineers working together
或者一个工程师小组

10
00:00:15,090 --> 00:00:16,620
on a machine learning system, again
共同开发一个机器学习系统

11
00:00:16,930 --> 00:00:18,420
one of the most valuable resources is
同样 最宝贵的还是

12
00:00:18,990 --> 00:00:20,790
the time of the engineers or the developers working on the system.
开发系统所花费的时间

13
00:00:22,420 --> 00:00:23,340
And what you really want to
你需要尽量避免的

14
00:00:23,430 --> 00:00:25,340
avoid is that you or
情况是你或者

15
00:00:25,360 --> 00:00:26,410
your colleagues or your friends spend
你的同事 你的朋友

16
00:00:26,680 --> 00:00:27,560
a lot of time working on
花费了大量时间

17
00:00:27,970 --> 00:00:29,510
some component, only to realize
在某一个模块上

18
00:00:30,470 --> 00:00:31,540
after weeks or months of
在几周甚至几个月的努力以后

19
00:00:31,620 --> 00:00:33,070
time spent, that all that
才意识到所有这些付出的劳动

20
00:00:33,310 --> 00:00:35,090
work, you know, just doesn't
都对你最终系统的表现

21
00:00:35,380 --> 00:00:38,120
make a huge difference on the performance of the final system.
并没有太大的帮助

22
00:00:39,350 --> 00:00:40,430
In this video, what I'd
在这段视频中

23
00:00:40,550 --> 00:00:42,960
like to to is, to talk about something called ceiling analysis.
我将介绍一下关于上限分析(ceiling analysis)的内容

24
00:00:44,510 --> 00:00:45,760
When you or your team
当你自己或你跟

25
00:00:46,280 --> 00:00:47,270
are working on a pipeline
你的团队在设计某个

26
00:00:47,520 --> 00:00:48,860
machine learning system, this can
机器学习系统的流水线时

27
00:00:49,020 --> 00:00:50,380
sometimes give you a very
这种方式通常能

28
00:00:50,630 --> 00:00:51,650
strong signal, a very strong
提供一种很有价值的信号

29
00:00:52,340 --> 00:00:53,730
guidance, on what parts
或者说很有用的导向

30
00:00:54,150 --> 00:00:56,550
of the pipeline might be the best use of your time to work on.
告诉你流水线中的哪个部分最值得你花时间去完成

31
00:00:59,740 --> 00:01:01,700
To talk about ceiling analysis, I'm
为了介绍上限分析

32
00:01:01,860 --> 00:01:03,140
going to keep on using the
我将继续使用之前用过的

33
00:01:03,690 --> 00:01:04,910
example of the photo
照片OCR流水线的例子

34
00:01:05,640 --> 00:01:06,870
OCR pipeline and I said
在之前的课程中

35
00:01:07,170 --> 00:01:08,270
earlier each of these
我讲过这些方框

36
00:01:08,480 --> 00:01:09,900
boxes text detection, character
文字检测、字符分割

37
00:01:10,200 --> 00:01:12,140
segmentation, character recognition, each
字符识别

38
00:01:12,310 --> 00:01:13,730
of these boxes can have even
这每一个方框都可能

39
00:01:14,100 --> 00:01:15,550
a small engineering team working
需要一个小团队来完成

40
00:01:15,920 --> 00:01:17,370
on it, or maybe the
当然也可能

41
00:01:17,690 --> 00:01:18,640
entire system is just built
你一个人来构建整个系统

42
00:01:18,800 --> 00:01:19,700
by you, either way, but
不管怎样

43
00:01:19,960 --> 00:01:22,340
the question is, where should you allocate resources?
问题是 你应该怎样分配资源呢？

44
00:01:22,730 --> 00:01:24,250
Which of these boxes is
哪一个方框最值得

45
00:01:24,430 --> 00:01:26,630
most worth your efforts, trying
你投入精力去做

46
00:01:26,920 --> 00:01:28,260
to improve the performance of.
投入时间去改善效果

47
00:01:29,070 --> 00:01:30,350
In order to explain the idea
（以下这段同前重复，译者注）

48
00:01:30,840 --> 00:01:32,560
of ceiling analysis, I'm going
为了解释上限分析的原理

49
00:01:32,730 --> 00:01:35,690
to keep using the example of our photo OCR pipeline.
我将继续使用照片OCR流水线的例子

50
00:01:37,000 --> 00:01:38,320
As I mentioned earlier, each of
在之前的视频中我讲过

51
00:01:38,430 --> 00:01:39,630
these boxes here, each of
这里的每个方框

52
00:01:39,850 --> 00:01:41,860
these machine learning components could be
都表示一个机器学习的组成部分

53
00:01:42,170 --> 00:01:43,270
the work of even a
需要一个小团队来完成

54
00:01:43,470 --> 00:01:44,720
small team of engineers, or
当然也可能

55
00:01:45,280 --> 00:01:48,110
maybe the whole system could be built by just one person.
整个系统都由一个人来完成

56
00:01:48,780 --> 00:01:49,920
But the question is, where should
但问题是

57
00:01:50,100 --> 00:01:51,990
you allocate scarce resources?
你应该如何分配资源呢？

58
00:01:52,130 --> 00:01:53,200
Now this, which of these
也就是说

59
00:01:53,690 --> 00:01:54,860
components, or which one or
这些模块中

60
00:01:54,950 --> 00:01:56,250
two or maybe all three of these components
哪一个 或者哪两个、三个

61
00:01:57,080 --> 00:01:58,540
is most worth your time
是最值得你花更多的

62
00:01:59,200 --> 00:02:01,060
to try to improve the performance of.
精力去改善它的效果的？

63
00:02:01,660 --> 00:02:02,810
So here's the idea of ceiling analysis.
这便是上限分析要做的事

64
00:02:04,140 --> 00:02:05,520
As in the development process for
跟其他机器学习系统的

65
00:02:05,890 --> 00:02:07,170
other machine learning systems as
开发过程一样

66
00:02:07,340 --> 00:02:08,490
well, in order to make
为了决定

67
00:02:08,670 --> 00:02:09,740
decisions on what to do
要开发这个系统应该

68
00:02:09,970 --> 00:02:11,150
for developing the system
采取什么样的行动

69
00:02:11,710 --> 00:02:12,770
is going to be
一个有效的方法是

70
00:02:12,900 --> 00:02:14,070
very helpful to have a
对学习系统使用一个

71
00:02:14,580 --> 00:02:17,650
single road number evaluation metric for this learning system.
数值评价量度

72
00:02:18,450 --> 00:02:19,390
So let's say we pick characters level accuracy.
所以假如我们用字符准确度作为这个量度

73
00:02:19,530 --> 00:02:21,140
So if, you know, given a
因此 给定一个

74
00:02:21,570 --> 00:02:22,840
test set image, while just
测试样本图像

75
00:02:22,860 --> 00:02:24,710
a fraction of alphabets of
那么这个数值就表示

76
00:02:25,060 --> 00:02:26,570
characters in the testing image that
我们对测试图像中的文字

77
00:02:28,980 --> 00:02:29,390
we recognize correctly.
识别正确的比例

78
00:02:29,550 --> 00:02:30,830
Or you can pick some other single world
或者你也可以选择

79
00:02:31,030 --> 00:02:32,270
number evaluation metric, if you
其他的某个数值评价度量值

80
00:02:32,370 --> 00:02:33,740
want, but let's say that
随你选择

81
00:02:34,040 --> 00:02:35,820
whatever evaluation metric we
但不管选择什么评价量度值

82
00:02:35,920 --> 00:02:37,680
pick, we get that, we
我们只是假设

83
00:02:37,880 --> 00:02:40,090
find that the overall system currently has 72% accuracy.
整个系统的估计准确率为72%

84
00:02:40,350 --> 00:02:42,210
So, in other
所以换句话说

85
00:02:42,350 --> 00:02:43,380
words, we have some set
我们有一些测试集图像

86
00:02:43,520 --> 00:02:44,960
of test set images and for
并且对测试集中的

87
00:02:45,180 --> 00:02:46,460
each test set images, we
每一幅图像

88
00:02:46,640 --> 00:02:47,850
run it through text section, then
我们都对其分别运行

89
00:02:47,980 --> 00:02:49,280
character 7 nation, then character
文字检测、字符分割

90
00:02:49,560 --> 00:02:50,680
recognition, and we find
然后字符识别

91
00:02:51,010 --> 00:02:52,240
that on our test set, the
然后我们发现

92
00:02:52,370 --> 00:02:53,570
overall accuracy of the
整个测试集的准确率是72%

93
00:02:53,800 --> 00:02:56,220
entire system was 72% on one of the metric you chose.
不管你用什么度量值来度量

94
00:02:58,120 --> 00:02:59,700
Now just the idea behind
下面是上限分析的

95
00:03:00,070 --> 00:03:01,610
sealing analysis which is that
主要思想

96
00:03:01,910 --> 00:03:03,530
we're going to go to let
首先我们关注

97
00:03:03,670 --> 00:03:05,100
see the first module of a
这个机器学习流程中的

98
00:03:05,400 --> 00:03:06,810
machinery pipelines text detection.
第一个模块 文字检测

99
00:03:07,270 --> 00:03:08,400
And what we are going
而我们要做的

100
00:03:08,420 --> 00:03:09,170
to do is we are going to
实际上是在

101
00:03:09,270 --> 00:03:11,310
monkey around with the test set.
给测试集样本捣点儿乱

102
00:03:11,980 --> 00:03:12,920
We are going to go to the
我们要对

103
00:03:12,990 --> 00:03:14,270
test set and for every test example
每一个测试集样本

104
00:03:14,830 --> 00:03:16,170
we are just going to provide it
都给它提供一个

105
00:03:16,380 --> 00:03:18,230
the correct text detection outputs.
正确的文字检测结果

106
00:03:19,210 --> 00:03:20,300
In other words, we are going
换句话说

107
00:03:20,560 --> 00:03:21,760
to the test set and just
我们要遍历每个测试集样本

108
00:03:21,960 --> 00:03:23,340
manually tell the algorithm
然后人为地告诉算法

109
00:03:24,350 --> 00:03:26,210
where the text is
每一个测试样本中

110
00:03:26,780 --> 00:03:27,940
in each of the test examples.
什么地方出现了文字

111
00:03:28,950 --> 00:03:29,960
So in other words, we
因此换句话说

112
00:03:30,030 --> 00:03:31,510
are going to simulate what happens
我们是要仿真出

113
00:03:32,030 --> 00:03:33,640
if we have a text detection
如果是100%

114
00:03:33,890 --> 00:03:35,350
system with a 100%
正确地检测出

115
00:03:35,610 --> 00:03:37,180
accuracy, for the purpose
图片中的文字信息

116
00:03:38,300 --> 00:03:40,410
of detecting text in an image.
应该是什么样的

117
00:03:42,050 --> 00:03:43,070
And really the way you
当然 要做到这个

118
00:03:43,110 --> 00:03:44,210
do that is very simple right, instead
是很容易的

119
00:03:44,620 --> 00:03:45,840
of letting your learning algorithm
现在不用你的学习算法

120
00:03:46,340 --> 00:03:47,630
detect the text in the images.
来检测图像中的文字了

121
00:03:48,180 --> 00:03:49,110
You wouldn't say go to the
你只需要找到对应的图像

122
00:03:49,340 --> 00:03:51,230
images and just manually label what
然后人为地识别出

123
00:03:51,540 --> 00:03:53,620
is the location of the text in my test set image.
测试集图像中出现文字的区域

124
00:03:54,200 --> 00:03:55,040
And you would then let these
然后你要做的就是让这些

125
00:03:55,530 --> 00:03:56,620
correct, so let these ground
绝对正确的结果

126
00:03:56,990 --> 00:03:58,370
true labels of where as
这些绝对为真的标签

127
00:03:58,560 --> 00:04:00,010
the text be part of
也就是告诉你

128
00:04:00,090 --> 00:04:01,330
your text set and use these
图像中哪些位置

129
00:04:01,580 --> 00:04:02,990
ground true labels what you
有文字信息的标签

130
00:04:03,110 --> 00:04:04,200
feed in to the next
把它们传给下一个模块

131
00:04:04,470 --> 00:04:07,550
stage of the pipeline, to the character segmentation pipeline.
也就是传给字符分割模块

132
00:04:07,710 --> 00:04:09,250
So just said it again, by
我再说一遍

133
00:04:09,680 --> 00:04:10,790
putting a checkmark over here,
这里打钩的地方

134
00:04:11,500 --> 00:04:12,590
what I mean is Im going
我想做的是

135
00:04:12,750 --> 00:04:13,750
to go to my test set and
遍历我的测试集

136
00:04:13,860 --> 00:04:14,970
just give it the correct answers,
直接向它公布“标准答案”

137
00:04:15,480 --> 00:04:16,520
give it the correct labels, for
为这个流程中的文字检测部分

138
00:04:16,650 --> 00:04:18,250
the text detection part of the pipeline.
直接提供正确的标签

139
00:04:19,240 --> 00:04:20,280
So that, as it, I have
这样好像我就会

140
00:04:20,410 --> 00:04:21,700
a perfect text detection system
有一个非常棒的文字检测系统

141
00:04:22,370 --> 00:04:24,270
on my test One into
能很好地检测我的测试样本

142
00:04:24,460 --> 00:04:26,570
do that run this data
然后我们要做的是

143
00:04:27,190 --> 00:04:28,150
to the rest of five points
继续运行完接下来的几个模块

144
00:04:28,530 --> 00:04:29,860
paper presentation and counter definition.
也就是字符分割和字符识别

145
00:04:30,680 --> 00:04:31,930
And then, use the same
然后使用跟之前一样的

146
00:04:32,300 --> 00:04:33,310
evaluation metric as before,
评价量度指标

147
00:04:34,000 --> 00:04:35,240
to measure what is the
来测量整个系统的

148
00:04:35,450 --> 00:04:36,900
overall accuracy of the entire system.
总体准确度

149
00:04:37,790 --> 00:04:39,890
And with perfect hopefully the performance goes up.
这样用准确的文字检测结果 系统的表现应该会有提升

150
00:04:40,330 --> 00:04:41,870
Let 's say it
假如说 准确率

151
00:04:41,930 --> 00:04:44,550
goes up 89% and then
提高到89%

152
00:04:44,680 --> 00:04:45,830
were going to keep going, next lets
然后我们继续进行

153
00:04:46,090 --> 00:04:47,120
go to the next selection of
接着执行流水线中的下一模块 字符分割

154
00:04:47,330 --> 00:04:50,230
pipeline, two character segmentation and again were going to go to my test.
同前面一样 我还是去找出我的测试集

155
00:04:50,540 --> 00:04:52,300
And now going to
然后现在我不仅用

156
00:04:52,390 --> 00:04:54,140
give the correct text detection
标准的文字检测结果

157
00:04:54,900 --> 00:04:55,970
output and give the correct
我还同时用标准的

158
00:04:56,490 --> 00:04:58,220
character segmentation outputs and
字符分割结果

159
00:04:59,400 --> 00:05:00,780
manually label the correct
所以还是遍历测试样本

160
00:05:01,330 --> 00:05:03,710
segment orientations of text into individual characters.
人工地给出正确的字符分割结果

161
00:05:04,730 --> 00:05:05,560
And see how much that helps.
然后看看这样做以后 效果怎样变化

162
00:05:05,810 --> 00:05:06,670
And let's say it goes up to
假如我们这样做以后

163
00:05:06,800 --> 00:05:09,140
90% accuracy for the overall system.
整个系统准确率提高到90%

164
00:05:10,070 --> 00:05:11,060
Alright so as always the accuracy is.
注意跟前面一样 这里说的准确率

165
00:05:11,340 --> 00:05:13,420
Accuracy of the overall systems.
是指整个系统的准确率

166
00:05:14,120 --> 00:05:15,460
So whatever the final output
所以无论最后一个模块

167
00:05:15,830 --> 00:05:17,450
of the character recognition system is.
字符识别模块给出的最终输出是什么

168
00:05:17,560 --> 00:05:18,870
Whatever the final output of
无论整个流水线的

169
00:05:19,040 --> 00:05:19,660
the overall pipeline is, it's going
最后输出结果是什么

170
00:05:19,930 --> 00:05:22,400
to measure the accuracy of that.
我们都是测出的整个系统的准确率

171
00:05:22,520 --> 00:05:23,720
And then finally like character recognition
最后我们还是执行最后一个模块 字符识别

172
00:05:24,170 --> 00:05:26,170
system and give that the correct label as well.
同样也是人工给出这一模块的正确标签

173
00:05:26,780 --> 00:05:29,270
And if I do that too then, no surprise that I should get a 100% accuracy.
这样做以后 我应该理所当然得到100%准确率

174
00:05:31,270 --> 00:05:32,530
Now, the nice thing about having
进行上限分析的

175
00:05:32,850 --> 00:05:34,340
done this analysis analysis is we
一个好处是

176
00:05:34,450 --> 00:05:36,080
can now understand what is
我们现在就知道了

177
00:05:36,700 --> 00:05:40,250
the upside potential for improving each of these components.
如果对每一个模块进行改善 它们各自的上升空间是多大

178
00:05:41,390 --> 00:05:44,180
So we see that if we get perfect text detection.
所以 我们可以看到 如果我们拥有完美的文字检测模块

179
00:05:44,950 --> 00:05:46,360
Our performance went up from
那么整个系统的表现将会从

180
00:05:46,710 --> 00:05:48,080
72 to 89 percent, so
准确率72%上升到89%

181
00:05:48,420 --> 00:05:50,670
that's' a 17 percent performance gain.
因此效果的增益是17%

182
00:05:51,640 --> 00:05:52,680
So this means that you've
这就意味着

183
00:05:52,890 --> 00:05:54,030
to take your current system you
如果你在现有系统的基础上

184
00:05:54,160 --> 00:05:56,130
spend a lot of time improving text detection.
花费时间和精力改善文字检测模块的效果

185
00:05:57,330 --> 00:05:58,750
That means that we could potentially improve
那么系统的表现

186
00:05:59,200 --> 00:06:00,640
our system's performance by 17 percent.
可能会提高17%

187
00:06:01,020 --> 00:06:02,850
This seems like it's well worth our while.
看起来这还挺值得

188
00:06:03,770 --> 00:06:05,840
Whereas in contrast, when going
而相对来讲

189
00:06:06,200 --> 00:06:08,360
from text detection When we
如果我们取得完美的字符分割模块

190
00:06:08,640 --> 00:06:12,450
gave it perfect character segmentation, performance went up only by one percent.
那么最终系统表现只提升了1%

191
00:06:13,020 --> 00:06:14,820
So, that's a more sobering message.
这便提供了一个很重要的信息

192
00:06:15,250 --> 00:06:16,880
It means that no matter how
这就告诉我们

193
00:06:17,090 --> 00:06:18,510
much time you spend character segmentation,
不管我们投入多大精力在字符分割上

194
00:06:19,800 --> 00:06:20,990
maybe the upside potential is
系统效果的潜在上升空间

195
00:06:21,080 --> 00:06:22,280
going to be pretty small, and maybe
也都是很小很小

196
00:06:22,460 --> 00:06:23,420
you do not want to
所以你就不会让一个

197
00:06:23,580 --> 00:06:24,340
have a large team of engineers
比较大的工程师团队

198
00:06:24,860 --> 00:06:26,860
working on character segmentation that
花时间忙于字符分割模块

199
00:06:26,990 --> 00:06:28,860
this sort of analysis shows that
因为通过上限分析我们知道了

200
00:06:29,150 --> 00:06:30,180
even when you give it the
即使你把字符分割模块做得再好

201
00:06:30,260 --> 00:06:32,480
perfect character segmentation, your
再怎么完美 你的系统表现

202
00:06:32,620 --> 00:06:34,180
performance goes up by only one percent.
最多也只能提升1%

203
00:06:34,620 --> 00:06:36,090
So right there, this is really estimates.
所以这就估计出

204
00:06:36,890 --> 00:06:38,080
What is the ceiling, or what's
通过改善各个模块的质量

205
00:06:38,300 --> 00:06:39,360
an upper bound on how much
你的系统表现

206
00:06:39,550 --> 00:06:40,690
you can improve the performance of your
所能提升的上限值

207
00:06:40,740 --> 00:06:42,710
system by working on one of these components?
或者说最大值 是多少

208
00:06:44,330 --> 00:06:45,600
And finally, going for character,
最后

209
00:06:46,320 --> 00:06:47,700
when we get better
如果我们取得完美的字符识别模块

210
00:06:47,900 --> 00:06:50,080
character recognition, the performance went up by ten percent.
那么整个系统的表现将提高10%

211
00:06:50,530 --> 00:06:51,640
So you know, again you
所以 同样

212
00:06:51,750 --> 00:06:52,570
can decide, is a ten
你也可以分析

213
00:06:52,860 --> 00:06:55,630
percent improvement, how much is that working out?
10%的效果提升值得投入多少工作量

214
00:06:55,830 --> 00:06:57,200
It tells you that maybe
也许这也告诉你

215
00:06:57,400 --> 00:06:58,670
with more efforts spent on the
如果把精力投入在

216
00:06:58,730 --> 00:06:59,690
last station of the pipeline,
流水线的最后这个模块

217
00:07:00,360 --> 00:07:02,840
you can improve the performance
那么系统的性能

218
00:07:03,760 --> 00:07:04,500
of the systems as well.
还是能得到较大的提高

219
00:07:05,610 --> 00:07:06,580
Another way of thinking about this
另一种认识这种分析方法的角度是

220
00:07:06,870 --> 00:07:08,090
is that, by going through this
通过这样的分析

221
00:07:08,290 --> 00:07:09,470
sort of analysis you're trying to
你就能总结出

222
00:07:09,570 --> 00:07:10,640
figure out, you know, what is
改善每个模块的性能

223
00:07:10,740 --> 00:07:12,700
the upside potential, of improving
系统的上升空间是多少

224
00:07:13,480 --> 00:07:14,980
each of these components or how
或者说如果其中的某个模块

225
00:07:15,080 --> 00:07:16,730
much could you possibly gain if
变得绝对完美时

226
00:07:17,260 --> 00:07:18,910
one of these components became absolutely
你能得到什么收获

227
00:07:19,380 --> 00:07:20,780
perfect and just really
这就像是给系统表现

228
00:07:21,060 --> 00:07:23,230
places an upper bound on the performance of that system.
加上了一个提升的上限值

229
00:07:24,220 --> 00:07:26,290
So, the idea of ceiling analysis is pretty important.
所以 上限分析的概念是很重要的

230
00:07:26,900 --> 00:07:29,840
Let me just illustrate this idea again, but with a different example but a more complex one.
下面我再用一个稍微复杂一点的例子来演绎一下上限分析的原理

231
00:07:31,860 --> 00:07:32,990
Let's say that you want to
假如说你想对这张图像

232
00:07:33,260 --> 00:07:34,830
do face recognition from images,
进行人脸识别

233
00:07:35,280 --> 00:07:35,960
so unless you want to look at
也就是说看着这张照片

234
00:07:35,990 --> 00:07:37,650
the picture and recognize whether or
你希望识别出

235
00:07:37,820 --> 00:07:38,770
not the person in this picture
照片里这个人

236
00:07:39,470 --> 00:07:40,640
is a particular friend of yours,
是不是你的朋友

237
00:07:40,670 --> 00:07:43,880
trying to recognize the person shown in this image.
希望辨识出图像中的人

238
00:07:44,180 --> 00:07:46,260
This is a slightly artificial example.
这是一个偏人工智能的例子

239
00:07:47,130 --> 00:07:51,080
This isn't actually how face
当然这并不是现实中的

240
00:07:51,320 --> 00:07:52,790
recognition is done in
人脸识别技术

241
00:07:52,800 --> 00:07:53,660
practice, but I want to step through an example of what a
但我想通过这个例子

242
00:07:53,870 --> 00:07:54,800
pipeline might look like to
来向你展示一个流水线

243
00:07:54,940 --> 00:07:56,220
give you another example of how
并且给你另一个关于

244
00:07:56,450 --> 00:07:57,820
a ceiling analysis process might look.
上限分析的实例

245
00:07:58,710 --> 00:07:59,980
So, we have a
假如我们有张照片

246
00:08:00,160 --> 00:08:03,830
camera image and let's say that we design a pipeline as follows.
我们设计了如下的流水线

247
00:08:04,420 --> 00:08:05,120
Let's say the first thing you want
假如我们第一步要做的

248
00:08:05,380 --> 00:08:07,480
to do is do pre-processing of
是图像预处理

249
00:08:07,560 --> 00:08:08,770
the image, so let's take those
假如我们就用

250
00:08:08,910 --> 00:08:10,310
images like I have shown on
右上角这张照片

251
00:08:10,390 --> 00:08:11,040
the upper right, and let's say we
现在加入我们想要

252
00:08:11,140 --> 00:08:12,510
want to remove the background, so
把背景去掉

253
00:08:13,030 --> 00:08:14,790
through pre-processing the background disappears.
那么经过预处理 背景就被去掉了

254
00:08:16,070 --> 00:08:18,820
Next we want to say detect the face of the person.
下一步我们希望检测出人脸的位置

255
00:08:19,370 --> 00:08:20,550
That's usually done with a learning algorithm.
这通常通过一个学习算法来实现

256
00:08:20,930 --> 00:08:21,960
So we'll run a sliding
我们会运行一个滑动窗分类器

257
00:08:22,180 --> 00:08:24,900
windows crossfire to draw a box around the person's face.
在人脸上画一个框

258
00:08:25,680 --> 00:08:26,720
Having detected the face it
在检测到脸部以后

259
00:08:26,790 --> 00:08:27,650
turns out that if you
如果你想要

260
00:08:27,770 --> 00:08:29,320
want to recognize people it turns
识别出这个人

261
00:08:29,530 --> 00:08:31,630
out that the eyes is a highly useful cue.
那么眼睛是一个很重要的线索

262
00:08:32,000 --> 00:08:33,860
We actually, in terms
事实上

263
00:08:34,130 --> 00:08:35,420
ofrecognizing your friends, the
要辨认出你的朋友

264
00:08:35,700 --> 00:08:36,870
appearance of their eyes is actually
你通常会看眼睛

265
00:08:37,330 --> 00:08:38,680
one of the most important cues that you use.
这是个比较重要的线索

266
00:08:39,470 --> 00:08:41,610
So let's run another crossfire to detect the eyes of the person.
所以 我们需要运行另一个分类器来检测人的眼睛

267
00:08:42,500 --> 00:08:43,660
So, segment out the eyes,
分割出眼睛

268
00:08:44,410 --> 00:08:45,650
and then and since this
这样就提供了

269
00:08:45,900 --> 00:08:47,290
will give us useful features to
识别出一个人的

270
00:08:47,380 --> 00:08:48,840
recognize a person, and then
很重要的特征

271
00:08:49,100 --> 00:08:50,400
other parts of the face of physical interest.
然后继续识别脸上其他重要的部位

272
00:08:50,990 --> 00:08:52,330
Maybe segment out the nose,
比如分割出鼻子

273
00:08:52,830 --> 00:08:54,750
segment out the mouth, and
分割出嘴巴

274
00:08:54,980 --> 00:08:56,230
then, having found the
这样找出了

275
00:08:56,370 --> 00:08:57,060
eyes, the nose and the mouth,
眼睛、鼻子、嘴巴

276
00:08:57,340 --> 00:08:58,420
all of these give us useful
所有这些都是非常有用的特征

277
00:08:58,740 --> 00:08:59,920
features to maybe feed into
然后这些特征可以被输入给某个

278
00:09:00,580 --> 00:09:01,540
a logistic regression crossfire.
逻辑回归的分类器

279
00:09:02,500 --> 00:09:03,200
And it's the job of the
然后这个分类器的任务

280
00:09:03,480 --> 00:09:04,420
crossfire to then give us the
就是给出最终的标签

281
00:09:04,710 --> 00:09:05,850
overall label to find the
找出我们认为能

282
00:09:05,970 --> 00:09:06,930
label for who we think
辨别出这个人是谁的

283
00:09:07,190 --> 00:09:08,450
is the identity of this person.
最终的标签

284
00:09:10,110 --> 00:09:11,730
So this is a kind of complicated pipeline.
这是一个稍微复杂一些的流水线

285
00:09:12,160 --> 00:09:13,300
It's actually probably more complicated
如果你真的想识别出人的话

286
00:09:13,950 --> 00:09:16,810
than you should be using, if you actually want to recognize people.
可能实际的流程比这个还要复杂

287
00:09:17,620 --> 00:09:20,330
But there's an illustrative example that's useful to think about for ceiling analysis.
但这给出了很好的一个上限分析的例子

288
00:09:22,150 --> 00:09:24,510
So how do you go through ceiling analysis for this pipeline?
对这个流水线怎么进行上限分析呢？

289
00:09:25,000 --> 00:09:26,790
Well, we'll step through these pieces one at a time.
我们还是每次关注一个步骤

290
00:09:27,470 --> 00:09:28,900
Let's say your overall system has
假如说你整个系统的准确率

291
00:09:29,150 --> 00:09:30,560
85 percent accuracy, the first
达到了85%

292
00:09:30,720 --> 00:09:31,670
thing I do is go to
那么我要做的第一件事情

293
00:09:31,750 --> 00:09:32,890
my test set and manually
还是找到我的测试集

294
00:09:33,860 --> 00:09:36,200
give it a ground foreground, background,
然后对前景和背景

295
00:09:36,740 --> 00:09:38,090
segmentations, and then manually go to
进行分割

296
00:09:38,150 --> 00:09:39,670
the test set, and use Photoshop
然后使用Photoshop

297
00:09:40,290 --> 00:09:41,750
or something, to just tell it
或者别的什么软件

298
00:09:41,950 --> 00:09:43,130
where's the background, and just
识别出哪些区域是背景

299
00:09:43,360 --> 00:09:45,230
manually remove the background, so
然后手动把背景删掉

300
00:09:45,470 --> 00:09:48,050
ground true background, and see how much the accuracy changes.
然后观察准确率提高多少

301
00:09:48,990 --> 00:09:50,320
In this example, the accuracy
假设在这个例子中

302
00:09:50,800 --> 00:09:53,700
goes up by 0.1%  so
准确率提高了0.1%

303
00:09:53,860 --> 00:09:54,900
this is a strong sign that
这是个很明显的信号

304
00:09:55,100 --> 00:09:56,240
even if you had perfect background
它告诉你即便你

305
00:09:56,630 --> 00:09:59,680
segmentation your performance, even
把背景分割做得很好

306
00:09:59,840 --> 00:10:01,650
if perfect background removal, the
完全去除了背景图案

307
00:10:01,730 --> 00:10:03,740
performance of your system isn't going to go up that much.
但整个系统的表现也并不会提高多少

308
00:10:03,880 --> 00:10:05,000
So this is maybe not worth a
所以似乎并不值得

309
00:10:05,190 --> 00:10:07,720
huge effort to work on pre-processing, on background removal.
花太多精力在预处理或者背景移除上

310
00:10:09,270 --> 00:10:10,170
Then, everything goes to the
接下来 再遍历测试集

311
00:10:10,230 --> 00:10:11,290
test set, given the correct
给出正确的脸部识别图案

312
00:10:11,780 --> 00:10:13,650
face detection images, then again
接下来还是依次运行

313
00:10:14,140 --> 00:10:16,690
step through the eyes, nose, mouth segmentations in some order.
眼睛、鼻子和嘴巴的分割

314
00:10:17,100 --> 00:10:17,470
Pick one order.
选择一种顺序就行了

315
00:10:17,700 --> 00:10:18,890
Let's give the correct location
给出眼睛的正确位置

316
00:10:19,340 --> 00:10:20,520
of the eyes, correct location of
鼻子的正确位置

317
00:10:20,750 --> 00:10:22,510
the nose, correct location of
嘴巴的正确位置

318
00:10:22,520 --> 00:10:23,740
the mouth, and then finally
最后 再给出最终的正确标签

319
00:10:24,130 --> 00:10:26,200
if I just give it the correct overall label, I get 100% accuracy.
准确率提高到100%

320
00:10:27,900 --> 00:10:29,390
And so, you know, as
注意看

321
00:10:29,500 --> 00:10:30,430
I go through the system
在我每次通过这个系统的时候

322
00:10:31,040 --> 00:10:32,080
and just give more and more
我给测试集提供的

323
00:10:32,210 --> 00:10:33,900
components the correct labels
正确的模块越来越多

324
00:10:34,370 --> 00:10:35,350
in the test set, the performance
因此整个系统的表现

325
00:10:35,830 --> 00:10:37,550
So if the overall system goes up,
逐步上升

326
00:10:37,730 --> 00:10:38,640
and you can look at how much
这样你就能很清楚地看到

327
00:10:38,890 --> 00:10:39,860
the performance went up on
通过不同的步骤

328
00:10:40,240 --> 00:10:41,660
different steps, so, you know, from
系统的表现增加了多少

329
00:10:42,550 --> 00:10:43,830
giving it the perfect face detection,
比如 有了完美的脸部识别

330
00:10:44,440 --> 00:10:45,270
and it looks like the overall
整个系统的表现似乎

331
00:10:45,570 --> 00:10:48,290
performance of this system went up by 5.9 percent.
提高了5.9%

332
00:10:49,710 --> 00:10:50,670
So that's a pretty big jump,
这算是比较大的提高了

333
00:10:50,980 --> 00:10:52,100
means that maybe it's worth quite
这告诉你也许在脸部检测上

334
00:10:52,370 --> 00:10:53,660
a bit of effort on better face detection.
多做点努力是有意义的

335
00:10:54,670 --> 00:10:56,290
Went four percent there, went
这里提高4%

336
00:10:56,710 --> 00:10:58,680
one percent there, one percent
这两步都是提高1%

337
00:10:59,160 --> 00:11:00,600
there and three percent there.
这一步提高3%

338
00:11:01,520 --> 00:11:02,840
So it looks like the
所以从整体上看

339
00:11:02,980 --> 00:11:04,250
components that most worth
最值得我付出努力的模块

340
00:11:04,730 --> 00:11:06,520
our while are, when
按顺序排列一下

341
00:11:06,680 --> 00:11:08,540
I gave it perfect face detection,
排在最前的是脸部检测

342
00:11:09,680 --> 00:11:10,190
system went up.
系统表现提高了

343
00:11:10,490 --> 00:11:11,990
By 5.9 performance, might give
5.9%

344
00:11:12,170 --> 00:11:14,170
it perfect eye segmentation, went up
给它完美的眼部分割

345
00:11:14,380 --> 00:11:15,540
by 4%, and then my final logistical
系统表现提高4%

346
00:11:16,000 --> 00:11:19,220
crossfire, well there's another 3 percent gap there maybe.
最终是我的逻辑回归分类器 提高大约3%

347
00:11:19,570 --> 00:11:20,580
And so, this tells us
因此 这很清楚地指出了

348
00:11:20,810 --> 00:11:23,400
maybe one of the components that are most worth our while working on.
哪一个模块是最值得花精力去完善的

349
00:11:24,610 --> 00:11:25,690
And by the way, I
顺便一提

350
00:11:25,830 --> 00:11:28,110
want to tell you, it's a true cautionary story.
我还想讲一个真实的故事

351
00:11:28,680 --> 00:11:29,620
The reason I put in this
我在预处理这里

352
00:11:29,850 --> 00:11:32,350
pre-processing background removal is
放入背景移除这个部分的原因是

353
00:11:32,600 --> 00:11:34,050
because I actually know
我知道一件真实的事情

354
00:11:34,340 --> 00:11:35,530
of a true story where there
原来有一个研究小组

355
00:11:35,770 --> 00:11:37,140
was a research team that actually
大概有两个人

356
00:11:37,480 --> 00:11:38,990
literally had two people spend
不夸张地说

357
00:11:39,580 --> 00:11:40,250
about a year and a half,
花了一年半的时间

358
00:11:40,530 --> 00:11:42,410
spend 18 months, working on
整整18个月

359
00:11:42,770 --> 00:11:44,050
better background removal.
都在完善背景移除的效果

360
00:11:44,480 --> 00:11:45,680
We are rushing here... I am
我不太清楚

361
00:11:46,120 --> 00:11:47,490
obscuring the details for obvious
具体的细节和原因是什么

362
00:11:47,970 --> 00:11:48,770
reasons, but there was a
但确实是有两个工程师

363
00:11:48,820 --> 00:11:50,610
computer vision application where there
为了开发某个

364
00:11:50,720 --> 00:11:51,660
was a team of two engineers
计算机视觉的应用系统

365
00:11:51,770 --> 00:11:52,850
who literally spent I think
大概花了一年半的时间

366
00:11:52,990 --> 00:11:54,210
about a year and a half, working
就为了得到一个

367
00:11:54,550 --> 00:11:56,050
on better background removal.
更好的背景移除效果

368
00:11:56,550 --> 00:11:57,720
Actually they worked out
事实上他们确实研究出了非常复杂的算法

369
00:11:57,820 --> 00:12:00,270
really complicated algorithms, so I ended up publishing I think, one research paper.
貌似最后还发表了一篇文章

370
00:12:01,080 --> 00:12:02,000
But after all that work they
但最终他们发现

371
00:12:02,110 --> 00:12:03,020
found that, it just did
所有付出的这些劳动

372
00:12:03,260 --> 00:12:04,910
not make a huge difference to
都不能给他们研发系统

373
00:12:05,200 --> 00:12:06,490
the overall performance of the
的整体表现带来

374
00:12:06,710 --> 00:12:09,120
actual application they were working on.
比较大的提升

375
00:12:09,450 --> 00:12:10,770
And if only, you know if
而如果要是之前

376
00:12:10,770 --> 00:12:13,170
only someone were to do a [xx] analysis
他们组某个人做一下上限分析

377
00:12:13,700 --> 00:12:15,790
beforehand, maybe we could have realized this.
他们就会提前意识到这个问题

378
00:12:17,240 --> 00:12:18,360
And one of them said to me
后来 他们中有一个人

379
00:12:18,480 --> 00:12:19,510
afterward, you know, if only they
跟我说 如果他们之前

380
00:12:19,640 --> 00:12:20,580
had done the sort of analysis
也做了某种这样的分析

381
00:12:20,850 --> 00:12:21,710
like this, maybe they could
他们就会在长达

382
00:12:21,990 --> 00:12:23,190
have realized before that 18 months
18个月的辛苦劳动以前

383
00:12:23,440 --> 00:12:25,180
of work, that they
意识到这个问题

384
00:12:25,240 --> 00:12:26,300
should have spent their effort focusing
他们就可以把精力花在

385
00:12:26,680 --> 00:12:28,920
on some different component than literally
其他更重要的模块上

386
00:12:29,380 --> 00:12:31,230
spending 18 months working on background removal.
而不是把18个月花在背景移除上

387
00:12:33,910 --> 00:12:36,140
So to summarize, pipelines are
总结一下

388
00:12:36,390 --> 00:12:38,630
pretty pervasive and complex machine learning applications.
流水线是非常常用却又很复杂的机器学习应用

389
00:12:39,890 --> 00:12:40,950
And when you are working on
当你在开发某个

390
00:12:41,200 --> 00:12:42,780
a big machine learning application, I
机器学习应用的时候

391
00:12:42,830 --> 00:12:45,450
mean I think your time as a developer is so valuable.
作为一个开发者 你的时间是相当宝贵的

392
00:12:46,090 --> 00:12:47,360
So just don't waste your
所以真的不要花时间

393
00:12:47,460 --> 00:12:50,120
time working on something that ultimately isn't going to matter.
去做一些到头来没意义的事情

394
00:12:51,350 --> 00:12:52,370
And in this video, we talked
因此在这节课中

395
00:12:52,490 --> 00:12:53,570
about this idea of ceiling analysis,
我给大家介绍了上限分析的概念

396
00:12:54,340 --> 00:12:55,750
which I've often found to
我经常觉得上限分析

397
00:12:55,850 --> 00:12:57,000
be a very good tool for
是个很有用的工具

398
00:12:57,130 --> 00:12:58,660
identifying the component, and if
当你想花精力到某个模块上时

399
00:12:58,760 --> 00:12:59,830
you actually put a focused effort
你可以用上线分析的方法

400
00:13:00,050 --> 00:13:01,010
on that component, and make a
来确定你的努力

401
00:13:01,250 --> 00:13:02,420
big difference, it would actually
会不会产生什么意义

402
00:13:03,050 --> 00:13:04,360
have a huge effect on the
整个系统的表现

403
00:13:04,620 --> 00:13:06,040
overall performance of your final system.
会不会产生明显的提高

404
00:13:07,070 --> 00:13:08,010
So, over the years, working
所以 经过这么多年

405
00:13:08,340 --> 00:13:09,520
with machine learning, I've actually learned
在机器学习中的摸爬滚打

406
00:13:09,710 --> 00:13:10,900
to not trust my own gut
我已经学会了不要凭自己的直觉

407
00:13:11,100 --> 00:13:13,200
feeling about what component to work on.
来判断应该干什么

408
00:13:13,280 --> 00:13:14,310
So, very often, when you have
虽然我从事

409
00:13:14,540 --> 00:13:15,440
worked with machine learning for a
机器学习的工作

410
00:13:15,570 --> 00:13:17,160
long time, but often, our local
已经很多年了

411
00:13:17,360 --> 00:13:18,770
machine learning problem, and I
但经常遇到某个机器学习问题时

412
00:13:18,950 --> 00:13:20,130
may have some gut feeling about,
总有一些直觉告诉我

413
00:13:20,450 --> 00:13:22,970
oh, let's, you know, jump on that component, and just spend more time on that.
我们应该跳到那一个模块 应该把时间花在那儿

414
00:13:24,120 --> 00:13:25,050
That over the years that I
但经过这么多年

415
00:13:25,160 --> 00:13:26,600
have come to even trust my
我现在也开始慢慢意识到

416
00:13:26,740 --> 00:13:27,800
own gut feelings and knowing not
还是不能太相信

417
00:13:28,130 --> 00:13:29,310
to trust gut feelings that much
自己的感觉

418
00:13:29,980 --> 00:13:31,450
and instead really have a
相反地 如果要解决某个机器学习问题

419
00:13:31,520 --> 00:13:33,060
solid machine learning problem, where it's
最好能把问题

420
00:13:33,180 --> 00:13:34,750
possible to structure things.
分成多个模块

421
00:13:34,960 --> 00:13:36,340
To do a ceiling analysis often
然后做一下上限分析

422
00:13:36,660 --> 00:13:37,720
does a much better and much
这通常给你更可靠

423
00:13:37,910 --> 00:13:39,110
more reliable way for deciding
更好的方法

424
00:13:39,670 --> 00:13:40,900
where to put a focused effort
来为你决定

425
00:13:40,940 --> 00:13:42,270
into, to really improve this,
该把劲儿往哪儿使

426
00:13:42,690 --> 00:13:44,570
the performance of some component and
该提高哪个模块的效果

427
00:13:44,680 --> 00:13:45,900
we kind of be sure that when
这样我们就会非常确认

428
00:13:46,180 --> 00:13:46,960
do that it will actually have
把这个模块做好就能提高

429
00:13:47,200 --> 00:13:49,460
a huge effect on the final performance of your process system.
系统的最终表现【果壳教育无边界字幕组】翻译：所罗门捷列夫