srt/11 - 4 - Trading Off Precision and Recall (14 min).srt

﻿1
00:00:00,410 --> 00:00:01,520
In the last video, we talked
在之前的课程中 我们谈到
(字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:01,820 --> 00:00:04,130
about precision and recall as
查准率和召回率

3
00:00:04,280 --> 00:00:06,180
an evaluation metric for classification
作为遇到偏斜类问题

4
00:00:06,840 --> 00:00:08,220
problems with skew classes.
的评估度量值

5
00:00:09,530 --> 00:00:11,020
For many applications, we'll want
在很多应用中

6
00:00:11,180 --> 00:00:13,350
to somehow control the trade
我们希望能够保证

7
00:00:13,630 --> 00:00:15,640
off between position and recall.
查准率和召回率的相对平衡

8
00:00:16,500 --> 00:00:17,310
Let me tell you how
在这节课中

9
00:00:17,470 --> 00:00:19,020
to do that and also show
我将告诉你应该怎么做

10
00:00:19,390 --> 00:00:20,520
you some, even more effective
同时也向你展示一些

11
00:00:21,050 --> 00:00:22,810
ways to use precision and
查准率和召回率

12
00:00:22,980 --> 00:00:24,290
recall as an evaluation
作为算法评估度量值的

13
00:00:24,720 --> 00:00:27,380
metric for learning algorithms.
更有效的方式

14
00:00:28,620 --> 00:00:30,180
As a reminder, here are the
回忆一下

15
00:00:30,250 --> 00:00:32,150
definitions of precision and
这是查准率和召回率的定义

16
00:00:32,380 --> 00:00:34,100
recall from the previous video.
我们在上一节中讲到的

17
00:00:35,920 --> 00:00:37,650
Let's continue our cancer classification
让我们继续用

18
00:00:38,680 --> 00:00:39,980
example, where y equals
癌症分类的例子

19
00:00:40,370 --> 00:00:41,790
one if the patient has cancer
如果病人患癌症 则y=1

20
00:00:42,270 --> 00:00:43,310
and y equals zero otherwise.
反之则y=0

21
00:00:44,800 --> 00:00:46,060
And let's say we've trained in
假设我们用

22
00:00:46,360 --> 00:00:48,580
logistic regression classifier, which outputs
逻辑回归模型训练了数据

23
00:00:49,070 --> 00:00:50,690
probabilities between zero and one.
输出概率在0-1之间的值

24
00:00:51,740 --> 00:00:52,830
So, as usual, we're
因此

25
00:00:53,010 --> 00:00:54,690
going to predict one, y equals
我们预测y=1

26
00:00:55,080 --> 00:00:56,290
one if h of x
如果h(x)

27
00:00:56,560 --> 00:00:57,980
is greater than or equal to
大于或等于0.5

28
00:00:58,090 --> 00:00:59,720
0.5 and predict zero if
预测值为0

29
00:01:00,140 --> 00:01:01,570
the hypothesis outputs a value
如果方程输出值

30
00:01:01,820 --> 00:01:03,720
less than 0.5 and this
小于0.5

31
00:01:04,040 --> 00:01:05,400
classifier may give us
这个回归模型

32
00:01:05,710 --> 00:01:08,430
some value for precision and some value for recall.
能够计算查准率和召回率

33
00:01:10,420 --> 00:01:11,860
But now, suppose we want
但是现在

34
00:01:12,140 --> 00:01:13,440
to predict that a patient
假如我们希望

35
00:01:13,730 --> 00:01:15,510
has cancer only if we're
在我们非常确信地情况下

36
00:01:15,750 --> 00:01:17,190
very confident that they really do.
才预测一个病人得了癌症

37
00:01:18,010 --> 00:01:18,900
Because you know if you go
因为你知道

38
00:01:19,140 --> 00:01:20,180
to a patient and you tell
如果你告诉一个病人

39
00:01:20,480 --> 00:01:21,570
them that they have cancer, it's
告诉他们你得了癌症

40
00:01:21,710 --> 00:01:22,450
going to give them a huge
他们会非常震惊

41
00:01:22,680 --> 00:01:23,860
shock because this is seriously
因为这是一个

42
00:01:24,220 --> 00:01:25,610
bad news and they may
非常坏的消息

43
00:01:25,700 --> 00:01:27,080
end up going through a pretty
而且他们会经历一段

44
00:01:27,660 --> 00:01:29,570
painful treatment process and so on.
非常痛苦的治疗过程

45
00:01:29,780 --> 00:01:30,770
And so maybe we want to
因此我们希望

46
00:01:30,980 --> 00:01:31,880
tell someone that we think
只有在我们非常确信的情况下

47
00:01:32,090 --> 00:01:34,240
they have cancer only if they're very confident.
才告诉这个人他得了癌症

48
00:01:36,230 --> 00:01:37,210
One way to do this would
这样做的一种方法

49
00:01:37,320 --> 00:01:38,940
be to modify the algorithm, so
是修改算法

50
00:01:39,120 --> 00:01:40,290
that instead of setting the
我们不再将临界值

51
00:01:40,710 --> 00:01:42,270
threshold at 0.5, we
设为0.5

52
00:01:42,820 --> 00:01:44,360
might instead say that we'll
也许 我们只在

53
00:01:44,510 --> 00:01:45,370
predict that y is equal
h(x)的值大于或等于0.7

54
00:01:46,330 --> 00:01:48,630
to 1, only if H of
的情况下

55
00:01:48,700 --> 00:01:50,200
x is greater than or equal to 0.7.
才预测y=1

56
00:01:50,490 --> 00:01:51,620
So this, I think
因此

57
00:01:52,360 --> 00:01:53,400
will tell someone if they
我们会告诉一个人

58
00:01:53,510 --> 00:01:54,530
have cancer only if we think
他得了癌症

59
00:01:54,810 --> 00:01:56,280
there's a greater than, greater
在我们认为

60
00:01:56,730 --> 00:01:59,060
than or equal to 70% that they have cancer.
他有大于等于70%得癌症的概率情况下

61
00:02:00,830 --> 00:02:02,000
And if you do this then
如果你这么做

62
00:02:02,850 --> 00:02:03,740
you're predicting some of this
那么你只在

63
00:02:03,840 --> 00:02:04,990
cancer only when you're
非常确信地情况下

64
00:02:05,100 --> 00:02:07,230
more confident, and so
才预测癌症

65
00:02:07,520 --> 00:02:08,830
you end up with a classifier
那么你的回归模型

66
00:02:09,920 --> 00:02:13,410
that has higher precision, because
会有较高的查准率

67
00:02:14,140 --> 00:02:15,300
all the patients that you're
因为所有你准备

68
00:02:15,450 --> 00:02:16,630
going to and say, you know,
告诉他们

69
00:02:16,860 --> 00:02:18,220
we think you have cancer, all
患有癌症的病人

70
00:02:18,440 --> 00:02:19,760
of those patients are now
所有这些人

71
00:02:20,350 --> 00:02:21,420
pretty, once they hear, pretty
有比较高的可能性

72
00:02:21,720 --> 00:02:23,100
confident they actually have cancer.
他们真的患有癌症

73
00:02:24,260 --> 00:02:26,050
And so, a higher fraction of
你预测患有癌症的病人中

74
00:02:26,150 --> 00:02:27,460
the patients that you predict to
有较大比率的人

75
00:02:27,530 --> 00:02:28,990
have cancer, will actually turn
他们确实患有癌症

76
00:02:29,280 --> 00:02:30,720
out to have cancer, because in
因为这是我们

77
00:02:31,000 --> 00:02:32,870
making those predictions we are pretty confident.
在非常确信的情况下做出的预测

78
00:02:34,510 --> 00:02:36,360
But in contrast, this classifier will
与之相反

79
00:02:36,540 --> 00:02:38,530
have lower recall, because
这个回归模型会有较低的召回率

80
00:02:39,140 --> 00:02:40,220
now we are going
因为

81
00:02:40,340 --> 00:02:41,650
to make predictions, we are
当我们做预测的时候

82
00:02:41,740 --> 00:02:44,180
going to predict y equals one, on a smaller number of patients.
我们只给很小一部分的病人预测y=1

83
00:02:45,090 --> 00:02:45,920
Now we could even take this further.
现在我们把这个情况夸大一下

84
00:02:46,330 --> 00:02:47,520
Instead of setting the threshold
我们不再把临界值

85
00:02:48,080 --> 00:02:49,210
at 0.7, we can set
设在0.7

86
00:02:49,490 --> 00:02:51,550
this at 0.9 and we'll predict
我们把它设为0.9

87
00:02:52,430 --> 00:02:53,270
y1 only if we are
我们只在至少90%肯定

88
00:02:53,320 --> 00:02:54,560
more than 90% certain that
这个病人患有癌症的情况下

89
00:02:55,380 --> 00:02:57,020
the patient has cancer, and so,
预测y=1

90
00:02:57,600 --> 00:02:58,720
you know, a large fraction that
那么这些病人当中

91
00:02:58,850 --> 00:02:59,820
those patients will turn out
有非常大的比率

92
00:03:00,020 --> 00:03:01,380
to have cancer and so,
真正患有癌症

93
00:03:01,560 --> 00:03:03,060
this is the high precision classifier
因此这是一个高查准率的模型

94
00:03:04,160 --> 00:03:06,090
will have lower recall because we
但是召回率会变低

95
00:03:06,190 --> 00:03:08,550
want to correctly detect that those patients have cancer.
因为我们希望能够正确检测患有癌症的病人

96
00:03:09,310 --> 00:03:10,780
Now consider a different example.
现在考虑一个不同的例子

97
00:03:12,100 --> 00:03:13,200
Suppose we want to avoid
假设我们希望

98
00:03:13,470 --> 00:03:15,530
missing too many actual cases of cancer.
避免遗漏掉患有癌症的人

99
00:03:15,960 --> 00:03:17,480
So we want to avoid the false negatives.
即我们希望避免假阴性

100
00:03:18,600 --> 00:03:19,820
In particular, if a patient
具体地说

101
00:03:20,350 --> 00:03:22,280
actually has cancer, but we
如果一个病人实际患有癌症

102
00:03:22,520 --> 00:03:23,700
fail to tell them that
但是我们并没有告诉他患有癌症

103
00:03:23,860 --> 00:03:25,710
they have cancer, then that can be really bad.
那这可能造成严重后果

104
00:03:25,880 --> 00:03:27,460
Because if we tell
因为

105
00:03:27,760 --> 00:03:28,870
a patient that they don't
如果我们告诉病人他们没有患癌症

106
00:03:29,240 --> 00:03:31,460
have cancer then they are
那么

107
00:03:31,530 --> 00:03:32,870
not going to go for treatment and
他们就不会接受治疗

108
00:03:32,980 --> 00:03:33,890
if it turns out that they
但是如果

109
00:03:34,050 --> 00:03:35,380
have cancer or we fail
他们患有癌症

110
00:03:35,520 --> 00:03:36,410
to tell them they have
我们又没有告诉他们

111
00:03:36,660 --> 00:03:39,060
cancer, well they may not get treated at all.
那么他们就根本不会接受治疗

112
00:03:39,430 --> 00:03:40,520
And so that would be
那么

113
00:03:40,640 --> 00:03:41,820
a really bad outcome because he
这么可能造成严重后果

114
00:03:42,080 --> 00:03:43,050
died because we told them
病人丧失生命

115
00:03:43,140 --> 00:03:44,560
they don't have cancer they failed
因为我们没有告诉他患有癌症

116
00:03:44,670 --> 00:03:46,780
to get treated, but it turns
他没有接受治疗

117
00:03:48,230 --> 00:03:48,790
out that they actually have cancer.
但事实上他又患有癌症

118
00:03:49,260 --> 00:03:50,260
When in doubt, we want to
这种i情况下

119
00:03:50,360 --> 00:03:52,430
predict that y equals one.
我们希望预测y=1

120
00:03:52,720 --> 00:03:54,260
So when in doubt, we want
我们希望

121
00:03:54,480 --> 00:03:55,510
to predict that they have
预测病人患有癌症

122
00:03:55,770 --> 00:03:56,820
cancer so that at least
这样

123
00:03:57,110 --> 00:03:58,150
they look further into it
他们会做进一步的检测

124
00:03:59,400 --> 00:04:00,720
and this can get treated,
然后接受治疗

125
00:04:01,180 --> 00:04:02,750
in case they do turn out to have cancer.
以避免他们真的患有癌症

126
00:04:04,870 --> 00:04:06,300
In this case, rather than setting
在这个例子中

127
00:04:06,750 --> 00:04:08,920
higher probability threshold, we might
我们不再设置高的临界值

128
00:04:09,100 --> 00:04:11,370
instead take this value
我们会设置另一个值

129
00:04:12,270 --> 00:04:13,310
and this then sets it to
将临界值

130
00:04:13,540 --> 00:04:14,710
a lower value, so maybe
设得较低

131
00:04:15,060 --> 00:04:17,390
0.3 like so.
比如0.3

132
00:04:18,760 --> 00:04:19,780
By doing so, we're saying
这样做

133
00:04:20,070 --> 00:04:21,380
that, you know what, if we
我们认为

134
00:04:21,480 --> 00:04:22,190
think there's more than a 30%
他们有大于30%的几率

135
00:04:22,220 --> 00:04:24,660
chance that they have caner, we better
患有癌症

136
00:04:24,890 --> 00:04:26,270
be more conservative and tell
我们以更加保守的方式

137
00:04:26,510 --> 00:04:27,330
them that they may have cancer,
告诉他们患有癌症

138
00:04:27,850 --> 00:04:29,610
so they can seek treatment if necessary.
因此他们能够接受治疗

139
00:04:31,110 --> 00:04:32,630
And in this case, what
在这种情况下

140
00:04:32,790 --> 00:04:34,200
we would have is going to
我们会有一个

141
00:04:35,120 --> 00:04:38,260
be a higher recall classifier,
较高召回率的模型

142
00:04:39,550 --> 00:04:41,440
because we're going to
因为

143
00:04:41,580 --> 00:04:43,330
be correctly flagging a higher
确实患有癌症的病人

144
00:04:43,580 --> 00:04:44,760
fraction of all of
有很大一部分

145
00:04:44,800 --> 00:04:45,920
the patients that actually do have
被我们正确标记出来了

146
00:04:46,130 --> 00:04:47,570
cancer, but we're going
但是

147
00:04:47,740 --> 00:04:51,040
to end up with lower precision,
我们会得到较低的查准率

148
00:04:51,670 --> 00:04:53,490
because the higher fraction of
因为

149
00:04:53,600 --> 00:04:54,700
the patients that we said have
我们预测患有癌症的病人比例越大

150
00:04:54,820 --> 00:04:57,530
cancer, the higher fraction of them will turn out not to have cancer after all.
那么就有较大比例的人其实没有患癌症

151
00:05:00,470 --> 00:05:01,320
And by the way, just as an
顺带一提

152
00:05:01,400 --> 00:05:02,640
aside, when I talk
当我在给

153
00:05:02,920 --> 00:05:04,900
about this to other
别的学生讲这个的时候

154
00:05:05,160 --> 00:05:07,760
students up until before, it's pretty amazing.
令人惊讶的是

155
00:05:08,390 --> 00:05:09,720
Some of my students say is
有的学生问

156
00:05:09,850 --> 00:05:11,960
how I can tell the story both ways.
怎么可以从两面来看这个问题

157
00:05:12,550 --> 00:05:14,220
Why we might want to have
为什么我总是

158
00:05:14,450 --> 00:05:15,490
higher precision or higher recall
只想要高查准率或高召回率

159
00:05:16,130 --> 00:05:18,570
and the story actually seems to work both ways.
但是这看起来可以使两边都提高

160
00:05:19,340 --> 00:05:20,550
But I hope the details of
但是我希望

161
00:05:20,670 --> 00:05:22,720
the algorithm is true and the
算法是正确的

162
00:05:22,990 --> 00:05:24,360
more general principle is, depending
更普遍的一个原则是

163
00:05:24,780 --> 00:05:26,150
on where you want, whether
这取决于你想要什么

164
00:05:26,330 --> 00:05:28,010
you want high precision, lower recall
你想要高查准率 低召回率

165
00:05:28,540 --> 00:05:30,340
or higher recall, lower precision, you
还是高召回率 低查准率

166
00:05:30,450 --> 00:05:32,100
can end up predicting y equals
你可以预测y=1

167
00:05:32,540 --> 00:05:35,040
one when h(x) is greater than some threshold.
当h(x)大于某个临界值

168
00:05:36,590 --> 00:05:39,240
And so, in general, for
因此 总的来说

169
00:05:39,880 --> 00:05:41,330
most classifiers, there is going
对于大多数的回归模型

170
00:05:41,540 --> 00:05:44,200
to be a trade off between precision and recall.
你得权衡查准率和召回率

171
00:05:45,360 --> 00:05:46,540
And as you vary the value
当你改变

172
00:05:47,050 --> 00:05:48,700
of this threshold, this value,
临界值的值时

173
00:05:49,030 --> 00:05:49,850
this special that I have
我在这儿画了一个

174
00:05:49,910 --> 00:05:51,470
joined here, you can actually
临界值

175
00:05:51,790 --> 00:05:53,850
plot us some curve that
你可以画出曲线

176
00:05:54,030 --> 00:05:56,060
trades off precision and
来权衡查准率

177
00:05:56,200 --> 00:05:58,010
recall, where a value
和召回率

178
00:05:58,410 --> 00:06:00,020
up here, this would correspond
这里的一个值

179
00:06:01,360 --> 00:06:02,620
to a very high value of
反应出一个较高的临界值

180
00:06:02,770 --> 00:06:04,490
the threshold, maybe threshold equals
这个临界值可能等于0.99

181
00:06:05,420 --> 00:06:06,790
over 0.99, so that say, predict
我们假设

182
00:06:07,090 --> 00:06:08,270
y equals 1 only where
只在有大于99%的确信度的情况下

183
00:06:08,480 --> 00:06:09,640
no more than 99 percent
才预测y=1

184
00:06:10,290 --> 00:06:11,700
confident, at least 99
至少

185
00:06:11,950 --> 00:06:13,460
percent probability this once, so
有99%的可能性

186
00:06:13,760 --> 00:06:15,390
that will be a precision relatively
因此这个点反应高查准率

187
00:06:15,960 --> 00:06:17,550
low recall, whereas the point
低召回率

188
00:06:17,820 --> 00:06:20,380
down here will correspond to
然而这里的一个点

189
00:06:20,490 --> 00:06:22,240
a value of the threshold that's
反映一个较低的临界值

190
00:06:22,450 --> 00:06:24,940
much lower, maybe 0.01.
比如说0.01

191
00:06:25,520 --> 00:06:26,810
When in doubt at all, put down y1.
毫无疑问 在这里预测y=1

192
00:06:27,120 --> 00:06:28,380
And if you do that, you
如果你这么做

193
00:06:28,520 --> 00:06:29,570
end up with a much
你最后会得到

194
00:06:29,760 --> 00:06:31,730
lower precision higher recall classifier.
很低的查准率 但是较高的召回率

195
00:06:33,350 --> 00:06:34,970
And as you vary the threshold, if
当你改变临界值

196
00:06:35,430 --> 00:06:36,550
you want, you can actually trace
如果你愿意

197
00:06:37,000 --> 00:06:38,280
all the curve from your classifier
你可以画出回归模型的所有曲线

198
00:06:38,930 --> 00:06:41,420
to see the range of different values you can get for precision recall.
来看看你能得到的查准率和召回率的范围

199
00:06:43,050 --> 00:06:43,810
And by the way, the position
顺带一提

200
00:06:44,230 --> 00:06:46,860
recall curve can look like many different shapes.
查准率-召回率曲线可以是各种不同的形状

201
00:06:47,260 --> 00:06:49,140
Sometimes it'll look this, sometimes
有时它看起来是这样

202
00:06:50,550 --> 00:06:51,260
it'll look like that.
有时是那样

203
00:06:52,330 --> 00:06:53,210
Now, there are many different possible
查准率-召回率曲线的形状

204
00:06:53,610 --> 00:06:54,820
shapes in the position of recall
有很多可能性

205
00:06:55,020 --> 00:06:56,850
curve, depending on the details of the classifier.
这取决于回归模型的具体算法

206
00:06:57,990 --> 00:06:59,620
So this raises another
因此这又产生了

207
00:06:59,900 --> 00:07:01,680
interesting question which is, is
另一个有趣的问题

208
00:07:01,870 --> 00:07:03,130
there a way to choose
那就是

209
00:07:03,510 --> 00:07:06,100
this threshold automatically?
有没有办法自动选取临界值

210
00:07:06,810 --> 00:07:07,890
Or, more generally, if we have
或者 更广泛地说

211
00:07:08,500 --> 00:07:10,060
a few different algorithms or a
如果我们有不同的算法

212
00:07:10,150 --> 00:07:12,290
few different ideas for algorithms, how
或者不同的想法

213
00:07:12,490 --> 00:07:15,340
do we compare different precision recall numbers?
我们如何比较不同的查准率和召回率呢？

214
00:07:16,400 --> 00:07:16,400
completely.
具体来说

215
00:07:17,170 --> 00:07:18,250
Suppose we have three different
假设我们有三个

216
00:07:18,590 --> 00:07:20,050
learning algorithms, or actually maybe
不同的学习算法

217
00:07:20,120 --> 00:07:22,060
these are three different learning algorithms, may
或者这三个不同的学习曲线

218
00:07:22,250 --> 00:07:25,010
be these are the same algorithm, but just with different values for the threshold.
是同样的算法 但是临界值不同

219
00:07:26,190 --> 00:07:28,560
How do we decide which of these algorithms is best?
我们怎样决定哪一个算法是最好的

220
00:07:29,770 --> 00:07:30,460
One of the things we talked
我们之前讲到的

221
00:07:30,680 --> 00:07:31,630
about earlier is the importance
其中一件事就是

222
00:07:32,520 --> 00:07:34,590
of a single real number evaluation metric.
评估度量值的重要性

223
00:07:35,880 --> 00:07:36,890
And that is the idea of
这个概念是

224
00:07:36,970 --> 00:07:38,050
having a number that just
通过一个具体的数字

225
00:07:38,370 --> 00:07:40,130
tells you how well is your classifier doing.
来反映你的回归模型到底如何

226
00:07:41,270 --> 00:07:42,260
But by switching to the precision
但是查准率和召回率的问题

227
00:07:42,690 --> 00:07:44,330
recall metric, we've actually lost that.
我们却不能这样做

228
00:07:44,580 --> 00:07:46,090
We now have two real numbers.
因为在这里我们有两个可以判断的数字

229
00:07:47,190 --> 00:07:48,600
And so we often end up
因此 我们经常会

230
00:07:48,770 --> 00:07:50,580
facing situations, like if
不得不面对这样的情况

231
00:07:50,750 --> 00:07:52,770
we are trying to compare algorithm 1
如果我们正在试图比较算法1

232
00:07:52,970 --> 00:07:54,350
to algorithm 2, we
和算法2

233
00:07:54,420 --> 00:07:55,420
end up asking ourselves, Is a
我们最后问自己

234
00:07:55,450 --> 00:07:56,550
position of point five and
到底是0.5的查准率与

235
00:07:56,700 --> 00:07:57,580
a recall of point four, well
0.4的召回率好

236
00:07:57,830 --> 00:07:58,830
is that better or worse than
还是说

237
00:07:58,960 --> 00:08:00,120
a position of point seven or
0.7的查准率与

238
00:08:00,300 --> 00:08:01,890
a recall point one?
0.1的召回率好

239
00:08:02,150 --> 00:08:03,020
If every time you try
或者每一次

240
00:08:03,350 --> 00:08:04,730
on a new algorithm you end up
你设计一个新算法

241
00:08:04,890 --> 00:08:06,070
having to sit around and think
你都要坐下来思考

242
00:08:06,530 --> 00:08:07,710
well, maybe point five
到底0.5 0.4好

243
00:08:07,960 --> 00:08:09,170
point four, is better than point
还是说

244
00:08:09,330 --> 00:08:11,120
seven point one, maybe not, I do not know.
0.7 0.1好 我不知道

245
00:08:11,590 --> 00:08:13,740
If you end up having to sit around and think and make these decisions.
如果你最后这样坐下来思考

246
00:08:14,440 --> 00:08:15,830
that really slows down your
这回降低

247
00:08:16,030 --> 00:08:18,710
decision making process, for what
你的决策速度

248
00:08:19,120 --> 00:08:21,560
changes are useful to incorporate into your algorithm.
思考到底哪些改变是有用的 应该被融入到你的算法

249
00:08:23,070 --> 00:08:24,810
Where as in contrast, if we
与此相反的是

250
00:08:24,880 --> 00:08:26,410
had a single real number evaluation metric,
如果我们有一个评估度量值

251
00:08:27,220 --> 00:08:28,240
like a number that just
一个数字

252
00:08:28,590 --> 00:08:31,140
tells us is either algorithm 1 or is algorithm 2 better.
能够告诉我们到底是算法1好还是算法2好

253
00:08:31,710 --> 00:08:33,110
That helps us much more
这能够帮助我们

254
00:08:33,370 --> 00:08:34,840
quickly decide which algorithm to
更快地决定

255
00:08:34,950 --> 00:08:36,290
go with and helps us
哪一个算法更好

256
00:08:36,450 --> 00:08:37,520
as well to much more quickly
同时也能够更快地帮助我们

257
00:08:38,110 --> 00:08:39,700
evaluate different changes that we
评估不同的改动

258
00:08:39,830 --> 00:08:41,370
may be contemplating for an algorithm.
哪些应该被融入进算法里面

259
00:08:42,050 --> 00:08:43,080
So, how can we get
那么 我们怎样才能

260
00:08:43,480 --> 00:08:45,910
a single real number evaluation metric.
得到这个评估度量值呢？

261
00:08:47,480 --> 00:08:48,590
One natural thing that you
你可能会去尝试的

262
00:08:48,660 --> 00:08:49,910
might try is to look
一件事情是

263
00:08:50,150 --> 00:08:52,110
at the average between precision and recall.
计算一下查准率和召回率的平均值

264
00:08:52,330 --> 00:08:53,310
So using p and r
用 P 和 R

265
00:08:53,570 --> 00:08:54,800
to denote position and recall, what
来表示查准率和召回率

266
00:08:55,010 --> 00:08:56,300
you could do is just compute the
你可以做的是

267
00:08:56,520 --> 00:08:57,280
average and look at what classifier
计算它们的平均值

268
00:08:57,770 --> 00:08:59,300
has the highest average value.
看一看哪个模型有最高的均值

269
00:09:00,840 --> 00:09:01,990
But this turns out not to
但是这可能

270
00:09:02,040 --> 00:09:04,990
be such a good solution because, similar
并不是一个很好的解决办法

271
00:09:05,350 --> 00:09:06,480
to the example we had earlier,
因为 像我们之前的例子一样

272
00:09:07,860 --> 00:09:08,970
it turns out that if we
如果我们的回归模型

273
00:09:09,070 --> 00:09:10,260
have a classifier that predicts
总是预测

274
00:09:11,310 --> 00:09:13,830
y1 all the time, then if
y=1

275
00:09:13,960 --> 00:09:15,540
you do that, you can get a very high recall.
这么做你可能得到非常高的召回率

276
00:09:16,290 --> 00:09:18,700
That's you end up with a very low value of Vision.
得到非常低的查准率

277
00:09:19,690 --> 00:09:21,230
Conversely,if you have a classifier
相反地 如果你的模型

278
00:09:21,640 --> 00:09:24,060
that predicts y=0 almost all
总是预测y=0

279
00:09:25,340 --> 00:09:26,400
the time, that is, if
就是说

280
00:09:26,490 --> 00:09:28,100
it predicts y one very sparingly.
如果很少预测y=1

281
00:09:28,910 --> 00:09:30,820
This corresponds to setting
对应的

282
00:09:31,130 --> 00:09:34,190
a very high threshold using the notation of previous line.
设置了一个高临界值

283
00:09:34,490 --> 00:09:35,610
And then you can actually
最后 你会得到非常高的

284
00:09:35,670 --> 00:09:37,650
end up with a very high precision with a very low recall.
查准率和非常低的召回率

285
00:09:39,280 --> 00:09:40,470
So the two extremes of either
这两个极端情况

286
00:09:40,790 --> 00:09:42,380
are a very high threshold or a
一个有非常高的临界值

287
00:09:42,540 --> 00:09:44,050
very low threshold, neither of
一个有非常低的临界值

288
00:09:44,170 --> 00:09:45,610
them would give it paticularary good classifier.
它们中的任何一个都不是一个好的模型

289
00:09:46,280 --> 00:09:47,560
And we recognize that is
我们可以通过

290
00:09:47,650 --> 00:09:48,650
by seeing if we end
非常低的查准率

291
00:09:48,710 --> 00:09:49,830
up with a very low
或者非常低的召回率

292
00:09:50,030 --> 00:09:52,710
precision or a very low recall.
判断这不是一个好模型

293
00:09:54,140 --> 00:09:56,120
And if you just take the average of people's ro2.
如果你只是使用(P+R)/2

294
00:09:57,140 --> 00:09:58,980
One does the example the average
算法3的这个值

295
00:09:59,760 --> 00:10:01,410
is actually highest for algorithm 3.
是最高的

296
00:10:01,810 --> 00:10:02,800
Even though you can get
即使你可以通过

297
00:10:02,910 --> 00:10:04,010
that sort of performance by predicting
使用总是预测y=1这样的方法

298
00:10:04,510 --> 00:10:05,850
y1 all the time.
来得到这样的值

299
00:10:06,220 --> 00:10:08,580
And that is just not a very good classifier, right?
但这并不是一个好的模型 对吧

300
00:10:08,670 --> 00:10:09,680
You predict y equals one all
你总是预测y=1

301
00:10:09,780 --> 00:10:11,010
the time is just not a
这不是一个有用的模型

302
00:10:11,210 --> 00:10:13,950
useful classifier if all it does is prints out y equals one.
因为它只输出y=1

303
00:10:15,000 --> 00:10:16,580
And so algorithm one or algorithm
那么算法1和

304
00:10:17,040 --> 00:10:18,080
two would be more
算法2

305
00:10:18,280 --> 00:10:19,620
useful than algorithm three,
比算法3更有用

306
00:10:20,500 --> 00:10:22,330
but in this example algorithm three
但是在这个例子中

307
00:10:23,080 --> 00:10:24,840
has a higher average value of
查准率和召回率的平均值

308
00:10:24,920 --> 00:10:27,460
precision recall than algorithm one and two.
算法3是最高的

309
00:10:28,770 --> 00:10:29,780
So we usually think of
因此我们通常认为

310
00:10:29,900 --> 00:10:31,410
this average of precision recall
查准率和召回率的平均值

311
00:10:32,280 --> 00:10:35,000
as not a particularly good way to evaluate our learning algorithm.
不是评估算法的一个好的方法

312
00:10:38,200 --> 00:10:39,820
In contrast, there is a
相反地

313
00:10:40,030 --> 00:10:41,770
different way of combining precision recall.
有一种结合查准率和召回率的不同方式

314
00:10:42,370 --> 00:10:44,940
It is called the f score and it uses that formula.
叫做F值 公式是这样

315
00:10:46,420 --> 00:10:48,740
So, in this example, here are the f scores.
在这个例子中 F值是这样的

316
00:10:49,530 --> 00:10:50,440
And so we would tell
我们可以通过

317
00:10:50,900 --> 00:10:52,140
from these f scores and
F值来判断

318
00:10:52,270 --> 00:10:53,660
we'll say algorithm 1 has
算法1

319
00:10:53,820 --> 00:10:55,290
the highest f score.
有最高的F值

320
00:10:55,620 --> 00:10:56,910
Algorithm 2 has the second highest and
算法2第二

321
00:10:57,180 --> 00:10:58,560
algorithm 3 has the lowest and so
算法3是最低的

322
00:10:59,040 --> 00:10:59,920
you know, if we go by
因此 通过F值

323
00:11:00,190 --> 00:11:02,700
the f score, we would pick probably algorithm of 1 over the others.
我们会在这几个算法中选择算法1

324
00:11:04,950 --> 00:11:06,120
The f score, which is also
F值

325
00:11:06,310 --> 00:11:07,470
called the f1 score,
也叫做F1值

326
00:11:07,670 --> 00:11:09,110
is usually written f1 score
一般写作F1值

327
00:11:09,340 --> 00:11:11,620
that I have here, but often people will just say f score.
但是人们一般只说F值

328
00:11:12,800 --> 00:11:14,750
It determines use is a
它的定义

329
00:11:15,080 --> 00:11:16,130
little bit like taking the
会考虑一部分

330
00:11:16,290 --> 00:11:17,660
average of precision of recall,
查准率和召回率的平均值

331
00:11:18,080 --> 00:11:19,220
but it gives the lower
但是它

332
00:11:19,580 --> 00:11:20,860
value of precision and recall
会给查准率和召回率中较低的值

333
00:11:21,060 --> 00:11:23,460
- whichever it is - it gives it a higher weight.
更高的权重

334
00:11:23,950 --> 00:11:25,220
And so, you see in
因此

335
00:11:25,360 --> 00:11:27,090
the numerator here that the
你可以看到F值的分子

336
00:11:27,250 --> 00:11:29,910
f score takes a product or position of equal.
是查准率和召回率的乘积

337
00:11:30,460 --> 00:11:31,900
And so, if either position is
因此如果查准率等于0

338
00:11:32,050 --> 00:11:33,070
0 or recall is equal to
或者召回率等于0

339
00:11:33,180 --> 00:11:34,310
0, the f score will
F值也会

340
00:11:34,600 --> 00:11:35,590
be equal to o. So
等于0

341
00:11:35,690 --> 00:11:38,290
in that sense, it kind of combines position and recall.
因此它结合了查准率和召回率

342
00:11:38,850 --> 00:11:40,160
but for the f score to
对于一个较大的F值

343
00:11:40,300 --> 00:11:41,600
be large, both position
查准率

344
00:11:42,100 --> 00:11:43,480
and recall have to be pretty large.
和召回率都必须较大

345
00:11:44,490 --> 00:11:45,770
I should say that there are
我必须说

346
00:11:45,950 --> 00:11:47,950
many different possible formulas for
有较多的公式

347
00:11:48,060 --> 00:11:49,450
combining position and recall.
可以结合查准率和召回率

348
00:11:50,040 --> 00:11:51,400
This f score formula is
F值公式

349
00:11:51,730 --> 00:11:53,470
really, maybe just one out
只是

350
00:11:53,640 --> 00:11:54,800
of a much larger number of
其中一个

351
00:11:54,880 --> 00:11:57,200
possibilities, but historically or
但是出于历史原因

352
00:11:57,270 --> 00:11:58,310
traditionally this is what
和习惯问题

353
00:11:58,460 --> 00:12:00,110
people in machine learning use.
人们在机器学习中使用F值

354
00:12:01,550 --> 00:12:02,840
And the term f score, you
这个术语F值

355
00:12:02,960 --> 00:12:04,160
know, it doesn't really mean
没有什么特别的意义

356
00:12:04,390 --> 00:12:05,460
anything, so don't worry about
所以不要担心

357
00:12:05,690 --> 00:12:07,680
why it's called f score or f1 score.
它到底为什么叫做F值或者F1值

358
00:12:08,510 --> 00:12:10,900
But this usually gives
但是它给了你

359
00:12:11,370 --> 00:12:12,230
you the effect that you want
你需要的有效方法

360
00:12:12,600 --> 00:12:14,070
because if either position is
因为无论是查准率等于0

361
00:12:14,370 --> 00:12:15,410
0 or recall is 0, this
还是召回率等于0

362
00:12:15,470 --> 00:12:17,470
gives you a very low f score.
它都会得到一个很低的F值

363
00:12:17,610 --> 00:12:18,730
And so, to have a
因此

364
00:12:18,770 --> 00:12:20,030
high f score you can't
如果要得到一个很高的F值

365
00:12:20,280 --> 00:12:21,790
need a preserve quality 1
你的算法的查准率和召回率都要接近于1

366
00:12:22,230 --> 00:12:24,630
and completely if p
具体地说

367
00:12:25,010 --> 00:12:26,300
equals zero or i
如果P=0或者

368
00:12:26,450 --> 00:12:28,440
equals zero then this
R=0

369
00:12:28,650 --> 00:12:31,540
gives you the f score equals zero.
你的F值也会等于0

370
00:12:33,430 --> 00:12:34,630
Where as a perfect f
对于一个最完美的F值

371
00:12:34,820 --> 00:12:36,120
score, so if position equals
如果查准率等于1

372
00:12:36,550 --> 00:12:38,520
one and  [xx] equals
同时召回率

373
00:12:38,940 --> 00:12:40,380
one that would give
也等于1

374
00:12:40,580 --> 00:12:43,450
you an f score that's
那你得到的F值

375
00:12:43,680 --> 00:12:44,780
equal to one times one
等于1乘以1

376
00:12:45,100 --> 00:12:46,650
over two times two.
除以2再乘以2

377
00:12:46,800 --> 00:12:47,590
So the f score will be
那么F值

378
00:12:47,900 --> 00:12:48,610
equal to 1 if you
就等于1

379
00:12:48,680 --> 00:12:50,300
have perfect precision and perfect recall.
如果你能得到最完美的查准率和召回率

380
00:12:51,280 --> 00:12:53,230
And intermediate values between 0
在0和1中间的值

381
00:12:53,520 --> 00:12:54,980
and 1, this usually gives a
这经常是

382
00:12:55,210 --> 00:12:57,240
reasonable rank ordering of different classifiers.
回归模型最经常出现的分数

383
00:13:00,000 --> 00:13:01,070
So this video we talked
在这次的视频中

384
00:13:01,370 --> 00:13:03,240
about the notion of trading
我们讲到了如何

385
00:13:03,760 --> 00:13:05,290
off between position and recall
权衡查准率和召回率

386
00:13:06,140 --> 00:13:07,410
and how we can vary the
以及我们如何变动

387
00:13:07,540 --> 00:13:09,110
threshold that we use to
临界值

388
00:13:09,250 --> 00:13:10,340
decide whether to predict y
来决定我们希望预测y=1

389
00:13:10,540 --> 00:13:11,980
equals one or y equals zero.
还是y=0

390
00:13:12,180 --> 00:13:13,990
This threshold that says do
比如我们

391
00:13:14,070 --> 00:13:15,180
we need to be at least
需要一个

392
00:13:15,500 --> 00:13:16,970
seventy percent confident or ninety
70%还是90%置信度的临界值

393
00:13:17,200 --> 00:13:19,340
percent confident or whatever before
或者别的

394
00:13:19,650 --> 00:13:21,150
we predict y equals one and
来预测y=1

395
00:13:21,260 --> 00:13:22,610
by varying the threshold you
通过变动临界值

396
00:13:22,950 --> 00:13:23,930
can control a trade off
你可以控制权衡

397
00:13:24,300 --> 00:13:25,960
between precision and recall.
查准率和召回率

398
00:13:26,430 --> 00:13:27,150
Ross talked about the f score
之后我们讲到了F值

399
00:13:27,420 --> 00:13:28,850
which takes precision and recall
它权衡查准率和召回率

400
00:13:29,640 --> 00:13:30,730
and gives you a single
给了你一个

401
00:13:31,270 --> 00:13:32,480
real number evaluation metric.
评估度量值

402
00:13:33,320 --> 00:13:34,460
And of course, if your goal is
当然 如果你的目标是

403
00:13:34,740 --> 00:13:36,590
to automatically set that
自动选择临界值

404
00:13:36,880 --> 00:13:38,390
threshold, to decide which
来决定

405
00:13:38,590 --> 00:13:39,320
one of y equals 1 or
你希望预测y=1

406
00:13:39,520 --> 00:13:41,180
y equals 0, one pretty
还是y=0

407
00:13:41,420 --> 00:13:42,410
reasonable way to do that
那么一个比较理想的办法是

408
00:13:42,740 --> 00:13:44,140
would also be to try
试一试不同的

409
00:13:44,640 --> 00:13:46,350
a range of different values of thresholds.
临界值

410
00:13:46,930 --> 00:13:47,740
So, try a range of values
试一下

411
00:13:48,190 --> 00:13:50,430
of thresholds and evaluate these
不同的临界值

412
00:13:50,620 --> 00:13:51,610
different threshold on say your
然后评估这些不同的临界值

413
00:13:51,790 --> 00:13:53,650
cross validation set, and then
在交叉检验集上进行测试

414
00:13:53,840 --> 00:13:55,760
to pick whatever value of threshold
然后选择哪一个临界值

415
00:13:56,580 --> 00:13:57,910
gives you the highest f score
能够在交叉检验集上

416
00:13:58,060 --> 00:13:59,760
on your cross validation setting.
得到最高的F值

417
00:14:00,130 --> 00:14:01,220
That would be a pretty reasonable way
这是自动选择临界值的较好办法

418
00:14:01,720 --> 00:14:04,620
to automatically chose the threshold for your classifier as well.
较好办法