srt/10 - 2 - Evaluating a Hypothesis (8 min).srt

﻿1
00:00:00,146 --> 00:00:02,515
In this video, I would like to talk about how to
在本节视频中我想介绍一下
(字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:02,523 --> 00:00:06,662
evaluate a hypothesis that has been learned by your algorithm.
怎样评价通过你的学习算法得到的一个假设

3
00:00:06,685 --> 00:00:09,200
In later videos, we will build on this
基于这节课的讨论 在之后的视频中

4
00:00:09,231 --> 00:00:11,846
to talk about how to prevent in the problems of
我们还将讨论如何防止

5
00:00:11,869 --> 00:00:14,908
overfitting and underfitting as well.
过拟合和欠拟合的问题

6
00:00:15,615 --> 00:00:19,023
When we fit the parameters of our learning algorithm
当我们确定学习算法的参数时

7
00:00:19,038 --> 00:00:23,154
we think about choosing the parameters to minimize the training error.
我们考虑的是选择参数来使训练误差最小化

8
00:00:23,169 --> 00:00:26,077
One might think that getting a really low value of
有人认为 得到一个很小的训练误差

9
00:00:26,100 --> 00:00:28,108
training error might be a good thing,
一定是一件好事

10
00:00:28,108 --> 00:00:29,562
but we have already seen that
但我们已经知道

11
00:00:29,562 --> 00:00:32,400
just because a hypothesis has low training error,
仅仅是因为这个假设具有很小的训练误差

12
00:00:32,400 --> 00:00:35,254
that doesn't mean it is necessarily a good hypothesis.
并不能说明它一定是一个好的假设

13
00:00:35,254 --> 00:00:40,223
And we've already seen the example of how a hypothesis can overfit.
我们也学习了过拟合假设的例子

14
00:00:40,415 --> 00:00:45,785
And therefore fail to generalize the new examples not in the training set.
这时推广到新的训练样本上就不灵了

15
00:00:45,962 --> 00:00:50,000
So how do you tell if the hypothesis might be overfitting.
那么 你怎样判断一个假设是否是过拟合的呢

16
00:00:50,015 --> 00:00:54,346
In this simple example  we could plot the hypothesis h of x
对于这个简单的例子 我们可以

17
00:00:54,365 --> 00:00:56,338
and just see what was going on.
画出假设函数h(x) 然后观察

18
00:00:56,346 --> 00:01:00,538
But in general for problems with more features than just one feature,
但对于更一般的情况 特征不止一个的例子

19
00:01:00,554 --> 00:01:03,531
for problems with a large number of features like these
就像这样有很多特征的问题

20
00:01:03,546 --> 00:01:06,692
it becomes hard or may be impossible
想要通过画出假设函数来观察

21
00:01:06,708 --> 00:01:09,515
to plot what the hypothesis looks like
就变得很难甚至不可能了

22
00:01:09,531 --> 00:01:13,046
and so we need some other way to evaluate our hypothesis.
因此 我们需要另一种评价假设函数的方法

23
00:01:13,062 --> 00:01:17,315
The standard way to evaluate a learned hypothesis is as follows.
如下给出了一种评价假设的标准方法

24
00:01:17,331 --> 00:01:19,308
Suppose we have a data set like this.
假如我们有这样一组数据组

25
00:01:19,323 --> 00:01:21,977
Here I have just shown  10 training examples,
在这里我只展示了10组训练样本

26
00:01:21,992 --> 00:01:23,969
but of course usually we may have
当然我们通常可以有

27
00:01:23,985 --> 00:01:27,254
dozens or hundreds or maybe thousands of training examples.
成百上千组训练样本

28
00:01:27,269 --> 00:01:30,246
In order to make sure we can evaluate our hypothesis,
为了确保我们可以评价我们的假设函数

29
00:01:30,262 --> 00:01:32,808
what we are going to do is split
我要做的是

30
00:01:32,823 --> 00:01:35,554
the data we have into two portions.
将这些数据分成两部分

31
00:01:35,569 --> 00:01:40,723
The first portion is going  to be our usual training set
第一部分将成为我们的训练集

32
00:01:42,638 --> 00:01:47,446
and the second portion  is going to be our test set,
第二部分将成为我们的测试集

33
00:01:47,462 --> 00:01:50,398
and a pretty typical split of this
将所有数据分成训练集和测试集

34
00:01:50,413 --> 00:01:53,482
all the data we have into a training set and test set
其中一种典型的分割方法是

35
00:01:53,498 --> 00:01:57,936
might be around say a 70%, 30% split.
按照7:3的比例

36
00:01:57,952 --> 00:02:00,052
Worth more today to grade the training set
将70%的数据作为训练集

37
00:02:00,067 --> 00:02:02,367
and relatively less to the test set.
30%的数据作为测试集

38
00:02:02,382 --> 00:02:05,782
And so now, if we have some data set,
因此 现在如果我们有了一些数据

39
00:02:05,790 --> 00:02:08,459
we run a sine of say 70%
我们只用其中的70%

40
00:02:08,475 --> 00:02:11,529
of the data to be our training set where here "m"
作为我们的训练集

41
00:02:11,544 --> 00:02:14,336
is as usual our number of training examples
这里的m依然表示训练样本的总数

42
00:02:14,352 --> 00:02:16,913
and the remainder of our data
而剩下的那部分数据

43
00:02:16,929 --> 00:02:19,310
might then be assigned to become our test set.
将被用作测试集

44
00:02:19,325 --> 00:02:23,410
And here, I'm going to use the notation m subscript test
在这里 我使用m下标test

45
00:02:23,425 --> 00:02:27,187
to denote the number of test examples.
来表示测试样本的总数

46
00:02:27,202 --> 00:02:32,225
And so in general, this subscript test is going to denote
因此 这里的下标test将表示

47
00:02:32,241 --> 00:02:34,987
examples that come from a test set so that
这些样本是来自测试集

48
00:02:35,002 --> 00:02:40,810
x1 subscript test,  y1 subscript test is my first
因此x(1)test y(1)test将成为我的

49
00:02:40,825 --> 00:02:43,648
test example which  I guess in this example
第一组测试样本

50
00:02:43,664 --> 00:02:45,656
might be this example over here.
我想应该是这里的这一组样本

51
00:02:45,671 --> 00:02:47,495
Finally, one last detail
最后再提醒一点

52
00:02:47,510 --> 00:02:50,795
whereas here I've drawn this as though the first 70%
在这里我是选择了前70%的数据作为训练集

53
00:02:50,810 --> 00:02:54,479
goes to the training set and the last 30% to the test set.
后30%的数据作为测试集

54
00:02:54,495 --> 00:02:57,518
If there is any sort of ordinary to the data.
但如果这组数据有某种规律或顺序的话

55
00:02:57,533 --> 00:03:01,048
That should be better to send a random 70%
那么最好是

56
00:03:01,048 --> 00:03:02,948
of your data to the training set and a
随机选择70%作为训练集

57
00:03:02,964 --> 00:03:05,556
random 30% of your data to the test set.
剩下的30%作为测试集

58
00:03:05,571 --> 00:03:08,579
So if your data were already randomly sorted,
当然如果你的数据已经随机分布了

59
00:03:08,595 --> 00:03:12,110
you could just take the first 70% and last 30%
那你可以选择前70%和后30%

60
00:03:12,125 --> 00:03:14,718
that if your data were not randomly ordered,
但如果你的数据不是随机排列的

61
00:03:14,733 --> 00:03:16,756
it would be better to randomly shuffle or
最好还是打乱顺序

62
00:03:16,771 --> 00:03:19,718
to randomly reorder the examples in your training set.
或者使用一种随机的顺序来构建你的数据

63
00:03:19,733 --> 00:03:23,310
Before you know sending the first 70% in the training set
然后再取出前70%作为训练集

64
00:03:23,325 --> 00:03:26,669
and the last 30% of the test set.
后30%作为测试集

65
00:03:27,054 --> 00:03:30,169
Here then is a fairly typical procedure
接下来 这里展示了一种典型的方法

66
00:03:30,185 --> 00:03:32,008
for how you would train and test
你可以按照这些步骤训练和测试你的学习算法

67
00:03:32,023 --> 00:03:34,492
the learning algorithm and the learning regression.
比如线性回归算法

68
00:03:34,508 --> 00:03:38,115
First, you learn the parameters theta from the training set
首先 你需要对训练集进行学习得到参数theta

69
00:03:38,131 --> 00:03:41,798
so you minimize the usual  training error objective j of theta,
具体来讲就是最小化训练误差J(θ)

70
00:03:41,813 --> 00:03:44,713
where j of theta  here was defined using that
这里的J(θ)是使用那70%数据

71
00:03:44,729 --> 00:03:47,059
70% of all the data you have.
来定义得到的

72
00:03:47,075 --> 00:03:49,759
There is only the training data.
也就是仅仅是训练数据

73
00:03:49,882 --> 00:03:52,167
And then you would compute the test error.
接下来 你要计算出测试误差

74
00:03:52,182 --> 00:03:56,298
And I am going to denote the test error as j subscript test.
我将用J下标test来表示测试误差

75
00:03:56,313 --> 00:03:59,229
And so what you do is take your parameter theta
那么你要做的就是

76
00:03:59,259 --> 00:04:02,190
that you have learned from  the training set, and plug it in here
取出你之前从训练集中学习得到的参数theta放在这里

77
00:04:02,205 --> 00:04:04,875
and compute your test set error.
来计算你的测试误差

78
00:04:04,890 --> 00:04:08,529
Which I am going to write as follows.
可以写成如下的形式

79
00:04:08,698 --> 00:04:11,275
So this is basically
这实际上是测试集

80
00:04:11,290 --> 00:04:15,244
the average squared error
平方误差的

81
00:04:15,269 --> 00:04:18,154
as measured on your test set.
平均值

82
00:04:18,169 --> 00:04:19,915
It's pretty much what you'd expect.
这就是你期望得到的值

83
00:04:19,931 --> 00:04:23,415
So if we run every test example through your hypothesis
因此 我们使用包含参数theta的假设函数对每一个测试样本进行测试

84
00:04:23,431 --> 00:04:28,008
with parameter theta and  just measure the squared error
然后通过假设函数和测试样本

85
00:04:28,023 --> 00:04:33,338
that your hypothesis has on your m subscript test, test examples.
计算出mtest个平方误差

86
00:04:33,354 --> 00:04:37,054
And of course, this is the definition of the
当然 这是当我们使用线性回归

87
00:04:37,069 --> 00:04:40,815
test set error if  we are using linear regression
和平方误差标准时

88
00:04:40,831 --> 00:04:44,362
and using the squared error metric.
测试误差的定义

89
00:04:44,377 --> 00:04:47,477
How about if we were doing a classification problem
那么如果是考虑分类问题

90
00:04:47,492 --> 00:04:50,654
and say using logistic regression instead.
比如说使用逻辑回归的时候呢

91
00:04:50,669 --> 00:04:53,877
In that case, the procedure for training
训练和测试逻辑回归的步骤

92
00:04:53,892 --> 00:04:57,085
and testing say logistic regression is pretty similar
与之前所说的非常类似

93
00:04:57,100 --> 00:04:59,985
first we will do the parameters from the training data,
首先我们要从训练数据 也就是所有数据的70%中

94
00:05:00,000 --> 00:05:02,331
that first 70% of the data.
学习得到参数theta

95
00:05:02,346 --> 00:05:05,115
And it will compute the test error as follows.
然后用如下的方式计算测试误差

96
00:05:05,131 --> 00:05:07,015
It's the same objective function
目标函数和我们平常

97
00:05:07,031 --> 00:05:09,592
as we always use but we just logistic regression,
做逻辑回归的一样

98
00:05:09,608 --> 00:05:11,569
except that now is define using
唯一的区别是

99
00:05:11,585 --> 00:05:15,115
our m subscript test, test examples.
现在我们使用的是mtest个测试样本

100
00:05:15,131 --> 00:05:17,600
While this definition of the test set error
这里的测试误差Jtest(θ)

101
00:05:17,631 --> 00:05:20,238
j subscript test is perfectly reasonable.
其实不难理解

102
00:05:20,254 --> 00:05:22,231
Sometimes there is an alternative
有时这是另一种形式的测试集

103
00:05:22,246 --> 00:05:25,469
test sets metric that might be easier to interpret,
更易于理解

104
00:05:25,485 --> 00:05:27,877
and that's the misclassification error.
这里的误差其实叫误分类率

105
00:05:27,892 --> 00:05:30,792
It's also called the zero one misclassification error,
也被称为0/1错分率

106
00:05:30,808 --> 00:05:32,692
with zero one denoting that
0/1表示了

107
00:05:32,708 --> 00:05:36,146
you either get an example right or you get an example wrong.
你预测到的正确或错误样本的情况

108
00:05:36,162 --> 00:05:37,910
Here's what I mean.
我想说的是这个意思

109
00:05:37,925 --> 00:05:41,795
Let me define the error of a prediction.
可以这样定义一次预测的误差

110
00:05:41,825 --> 00:05:44,202
That is h of x.
关于假设h(x)

111
00:05:44,218 --> 00:05:47,518
And given the label y as
和标签y的误差

112
00:05:47,533 --> 00:05:51,848
equal to one if my hypothesis
那么这个误差等于1

113
00:05:51,864 --> 00:05:54,633
outputs the value greater than equal to five
当你的假设函数h(x)的值大于等于0.5

114
00:05:54,641 --> 00:05:57,510
and Y is equal to zero
并且y的值等于0

115
00:05:57,525 --> 00:06:03,718
or if my hypothesis outputs  a value of less than 0.5
或者当h(x)小于0.5

116
00:06:03,733 --> 00:06:05,402
and y is equal to one,
并且y的值等于1

117
00:06:05,418 --> 00:06:08,118
right, so both of these cases basic respond
因此 这两种情况都表明

118
00:06:08,133 --> 00:06:11,833
to if your hypothesis mislabeled the example
你的假设对样本进行了误判

119
00:06:11,833 --> 00:06:14,518
assuming your threshold at an 0.5.
这里定义阈值为0.5

120
00:06:14,533 --> 00:06:18,171
So either thought it was more likely to be 1, but it was actually 0,
那么也就是说 假设结果更趋向于1 但实际是0

121
00:06:18,187 --> 00:06:20,733
or your hypothesis stored was more likely
或者说假设更趋向于0

122
00:06:20,748 --> 00:06:23,556
to be 0, but the label was actually 1.
但实际的标签却是1

123
00:06:23,571 --> 00:06:28,471
And otherwise, we define this error function to be zero.
否则 我们将误差值定义为0

124
00:06:28,487 --> 00:06:34,841
If your hypothesis basically classified the example y correctly.
此时你的假设值能够正确对样本y进行分类

125
00:06:34,864 --> 00:06:38,841
We could then define the test error,
然后 我们就能应用错分率误差

126
00:06:38,856 --> 00:06:42,371
using the misclassification error metric to be
来定义测试误差

127
00:06:42,387 --> 00:06:46,779
one of the m tests of sum from i equals one
也就是1/mtest 乘以

128
00:06:46,795 --> 00:06:49,941
to m subscript test of the
h(i)(xtest)和y(i)的错分率误差

129
00:06:49,956 --> 00:06:55,164
error of h of x(i) test
从i=1到mtest

130
00:06:55,179 --> 00:06:57,971
comma y(i).
的求和

131
00:06:57,987 --> 00:07:02,010
And so that's just my way of writing out that this is exactly
这样我就写出了我的定义方式

132
00:07:02,025 --> 00:07:05,587
the fraction of the examples in my test set
这实际上就是我的假设函数误标记的

133
00:07:05,602 --> 00:07:08,864
that my hypothesis has mislabeled.
那部分测试集中的样本

134
00:07:08,871 --> 00:07:10,602
And so that's the definition of
这也就是使用

135
00:07:10,618 --> 00:07:13,687
the test set error using the misclassification error
0/1错分率或误分类率

136
00:07:13,718 --> 00:07:16,948
of the 0 1  misclassification metric.
的准则来定义的测试误差

137
00:07:16,971 --> 00:07:19,995
So that's the standard technique for evaluating
以上我们介绍了一套标准技术

138
00:07:20,010 --> 00:07:22,833
how good a learned hypothesis is.
来评价一个已经学习过的假设

139
00:07:22,848 --> 00:07:25,579
In the next video,  we will adapt these ideas
在下一段视频中我们要应用这些方法

140
00:07:25,595 --> 00:07:28,525
to helping us do things like choose what features
来帮助我们进行诸如特征选择一类的问题

141
00:07:28,541 --> 00:07:31,641
like the degree polynomial to use with the learning algorithm
比如多项式次数的选择

142
00:07:31,656 --> 00:07:34,964
or choose the regularization parameter for learning algorithm.
或者正则化参数的选择