srt/12 - 4 - Kernels I (16 min).srt

﻿1
00:00:00,080 --> 00:00:01,140
In this video, I'd like
在这次的课程中（字幕整理：中国海洋大学，黄海广，haiguang2000@qq.com）

2
00:00:01,370 --> 00:00:03,120
to start adapting support vector
我将对支持向量机算法做一些改变

3
00:00:03,390 --> 00:00:06,280
machines in order to develop complex nonlinear classifiers.
以构造复杂的非线性分类器

4
00:00:07,630 --> 00:00:10,410
The main technique for doing that is something called kernels.
我们用"Kernels(核函数)"来达到此目的

5
00:00:11,730 --> 00:00:13,690
Let's see what this kernels are and how to use them.
我们来看看核函数是什么 以及如何使用

6
00:00:15,860 --> 00:00:16,930
If you have a training set that
如果

7
00:00:17,030 --> 00:00:18,270
looks like this, and you
你有一个像这样的训练集

8
00:00:18,400 --> 00:00:20,000
want to find a
然后你希望拟合一个

9
00:00:20,150 --> 00:00:21,670
nonlinear decision boundary to distinguish
非线性的判别边界

10
00:00:22,270 --> 00:00:23,950
the positive and negative examples, maybe
来区别正负实例

11
00:00:24,350 --> 00:00:25,900
a decision boundary that looks like that.
可能是这样的一个判别边界

12
00:00:27,040 --> 00:00:27,950
One way to do so is
一种办法

13
00:00:28,230 --> 00:00:29,760
to come up with a set
是构造

14
00:00:29,970 --> 00:00:32,180
of complex polynomial features, right? So, set of
多项式特征变量 是吧

15
00:00:32,340 --> 00:00:33,420
features that looks like this,
也就是像这样的特征变量集合

16
00:00:34,140 --> 00:00:34,990
so that you end up
这样

17
00:00:35,140 --> 00:00:37,120
with a hypothesis X that
你就能得到一个假设X

18
00:00:38,050 --> 00:00:40,380
predicts 1 if you know
如果θ0加上θ1*X1

19
00:00:40,570 --> 00:00:41,790
that theta 0 and plus theta 1 X1
加上其他的多项式特征变量

20
00:00:41,860 --> 00:00:45,000
plus dot dot dot all those polynomial features is
之和大于0

21
00:00:45,180 --> 00:00:47,410
greater than 0, and
那么就预测为1

22
00:00:47,540 --> 00:00:49,170
predict 0, otherwise.
反之 则预测为0

23
00:00:51,070 --> 00:00:52,760
And another way
这种方法

24
00:00:52,980 --> 00:00:54,330
of writing this, to introduce
的另一种写法

25
00:00:54,840 --> 00:00:56,240
a level of new notation that
这里介绍一个新的概念

26
00:00:56,500 --> 00:00:57,860
I'll use later, is that
之后将会用到

27
00:00:58,200 --> 00:00:59,370
we can think of a hypothesis
我们可以把假设函数

28
00:00:59,730 --> 00:01:01,610
as computing a decision boundary
看成是用这个

29
00:01:02,120 --> 00:01:03,380
using this. So, theta
来计算判别边界 那么

30
00:01:03,820 --> 00:01:04,870
0 plus theta 1 f1 plus
θ0+θ1*f1+

31
00:01:05,070 --> 00:01:06,130
theta 2, f2 plus theta
θ2*f2+θ3*f3

32
00:01:06,610 --> 00:01:08,730
3, f3 plus and so on.
加上其他项

33
00:01:09,590 --> 00:01:12,790
Where I'm going to
在这里

34
00:01:13,050 --> 00:01:14,070
use this new denotation
我将用这几个新的符号

35
00:01:14,730 --> 00:01:15,930
f1, f2, f3 and so
f1 f2 f3等等

36
00:01:16,270 --> 00:01:17,610
on to denote these new sort of features
来表示一系列我将要计算的

37
00:01:19,350 --> 00:01:20,630
that I'm computing, so f1 is
新的特征变量

38
00:01:21,370 --> 00:01:24,250
just X1, f2 is equal
因此 f1就等于X1

39
00:01:24,600 --> 00:01:27,060
to X2, f3 is
f2就等于X2

40
00:01:27,140 --> 00:01:28,560
equal to this one
f3等于这个

41
00:01:28,770 --> 00:01:29,790
here. So, X1X2. So,
X1X2

42
00:01:29,900 --> 00:01:32,200
f4 is equal to
f4等于X1的平方

43
00:01:33,840 --> 00:01:35,590
X1 squared where f5 is
f5等于X2的平方

44
00:01:35,680 --> 00:01:37,740
to be x2 squared and so
等等

45
00:01:38,520 --> 00:01:39,780
on and we seen previously that
我们之前看到

46
00:01:40,350 --> 00:01:41,190
coming up with these high
通过加入这些

47
00:01:41,370 --> 00:01:42,870
order polynomials is one
高阶项

48
00:01:43,110 --> 00:01:44,390
way to come up with lots more features,
我们可以得到更多特征变量

49
00:01:45,470 --> 00:01:47,070
the question is, is
问题是

50
00:01:47,250 --> 00:01:48,600
there a different choice of
能不能选择别的特征变量

51
00:01:48,670 --> 00:01:51,350
features or is there better sort of features than this high order
或者有没有比这些高阶项更好的特征变量

52
00:01:51,690 --> 00:01:53,510
polynomials because you know
因为

53
00:01:53,830 --> 00:01:54,820
it's not clear that this high
我们并不知道

54
00:01:55,120 --> 00:01:56,350
order polynomial is what we want,
这些高阶项是不是我们真正需要的

55
00:01:56,860 --> 00:01:57,920
and what we talked about
我们之前谈到

56
00:01:58,170 --> 00:01:59,560
computer vision talk about when
计算机视觉的时候

57
00:01:59,780 --> 00:02:01,940
the input is an image with lots of pixels.
提到过这时的输入是一个有很多像素的图像

58
00:02:02,540 --> 00:02:04,670
We also saw how using high order polynomials
我们看到如果用高阶项作为特征变量

59
00:02:05,140 --> 00:02:06,360
becomes very computationally
运算量将是非常大的

60
00:02:07,320 --> 00:02:08,270
expensive because there are
因为

61
00:02:08,280 --> 00:02:09,830
a lot of these higher order polynomial terms.
有太多的高阶项需要被计算

62
00:02:11,240 --> 00:02:12,280
So, is there a different or
因此 我们是否有不同的选择

63
00:02:12,430 --> 00:02:13,160
a better choice of the features
或者是更好的选择来构造特征变量

64
00:02:14,110 --> 00:02:15,100
that we can use to plug
以用来

65
00:02:15,410 --> 00:02:16,770
into this sort of
嵌入到

66
00:02:17,500 --> 00:02:19,200
hypothesis form.
假设函数中

67
00:02:19,420 --> 00:02:20,470
So, here is one idea for how to
事实上 这里有一个可以构造

68
00:02:20,580 --> 00:02:23,580
define new features f1, f2, f3.
新特征f1 f2 f3的想法

69
00:02:24,970 --> 00:02:25,930
On this line I am
在这一行中

70
00:02:26,100 --> 00:02:27,600
going to define only three new
我只定义三个

71
00:02:27,890 --> 00:02:28,770
features, but for real problems
特征变量 但是对于实际问题而言

72
00:02:29,500 --> 00:02:30,650
we can get to define a much larger number.
我们可以定义非常多的特征变量

73
00:02:31,060 --> 00:02:32,060
But here's what I'm going to do
但是在这里

74
00:02:32,260 --> 00:02:33,400
in this phase
对于这里的

75
00:02:33,640 --> 00:02:34,980
of features X1, X2, and
特征X1 X2

76
00:02:35,400 --> 00:02:36,520
I'm going to leave X0
我不打算

77
00:02:36,720 --> 00:02:37,800
out of this, the
把X0放在这里

78
00:02:38,060 --> 00:02:39,230
interceptor X0, but
截距X0

79
00:02:39,330 --> 00:02:40,320
in this phase X1 X2, I'm going to just,
但是这里的X1 X2

80
00:02:42,550 --> 00:02:43,560
you know, manually pick a few points, and then
我打算手动选取一些点

81
00:02:43,750 --> 00:02:45,210
call these points l1, we
然后将这些点定义为l1

82
00:02:45,450 --> 00:02:46,720
are going to pick
再选一个

83
00:02:46,820 --> 00:02:49,560
a different point, let's call
不同的点

84
00:02:50,080 --> 00:02:51,390
that l2 and let's pick
把它定为l2

85
00:02:51,710 --> 00:02:52,880
the third one and call
再选第三个点

86
00:02:53,170 --> 00:02:55,800
this one l3, and for
定为l3

87
00:02:55,900 --> 00:02:56,830
now let's just say that I'm
现在 假设我打算

88
00:02:56,930 --> 00:02:59,220
going to choose these three points manually.
只手动选取三个点

89
00:02:59,870 --> 00:03:02,860
I'm going to call these three points line ups, so line up one, two, three.
将这三个点作为标记，标记1，标记2，标记3

90
00:03:03,720 --> 00:03:04,630
What I'm going to do is
我将要做的是

91
00:03:04,790 --> 00:03:07,190
define my new features as follows, given
这样定义新的特征变量

92
00:03:07,510 --> 00:03:10,070
an example X, let me
给出一个实例X

93
00:03:10,170 --> 00:03:13,130
define my first feature f1
将第一个特征变量f1

94
00:03:13,330 --> 00:03:16,010
to be some
定义为

95
00:03:16,260 --> 00:03:18,960
measure of the similarity between
一种相似度的度量

96
00:03:19,330 --> 00:03:21,460
my training example X and
度量实例X与

97
00:03:21,680 --> 00:03:26,270
my first landmark and
第一个标记的相似度

98
00:03:26,520 --> 00:03:27,840
this specific formula that I'm
我将要用来度量相似度的

99
00:03:27,950 --> 00:03:29,600
going to use to measure similarity is
这个公式

100
00:03:30,160 --> 00:03:31,830
going to be this is E to
是这样的 对括号的内容取exp

101
00:03:31,940 --> 00:03:34,220
the minus the length of
负号 X-l1的长度

102
00:03:34,470 --> 00:03:37,880
X minus l1, squared, divided
平方

103
00:03:38,320 --> 00:03:39,610
by two sigma squared.
除以2倍的σ平方

104
00:03:40,730 --> 00:03:41,640
So, depending on whether or not
取决于你之前是否看了

105
00:03:41,780 --> 00:03:43,420
you watched the previous optional video,
上一个可选视频

106
00:03:44,390 --> 00:03:48,140
this notation, you know, this is
这个记号表示

107
00:03:48,460 --> 00:03:49,340
the length of the vector
向量W的长度

108
00:03:49,680 --> 00:03:51,260
W. And so, this thing
因此

109
00:03:51,460 --> 00:03:53,760
here, this X
这里的

110
00:03:54,020 --> 00:03:55,990
minus l1, this is
X-l1

111
00:03:56,100 --> 00:03:57,440
actually just the euclidean distance
就是欧式距离取平方

112
00:03:58,610 --> 00:03:59,950
squared, is the euclidean
是点x与l1之间的

113
00:04:00,410 --> 00:04:03,240
distance between the point x and the landmark l1.
欧式距离

114
00:04:03,530 --> 00:04:04,610
We will see more about this later.
我们之后会更多地谈到这个

115
00:04:06,440 --> 00:04:07,990
But that's my first feature, and
这是我的第一个特征向量

116
00:04:08,120 --> 00:04:09,610
my second feature f2 is
然后是f2

117
00:04:09,750 --> 00:04:11,750
going to be, you know,
它等于

118
00:04:12,370 --> 00:04:14,040
similarity function that measures
对x和l2使用相似度函数

119
00:04:14,400 --> 00:04:17,310
how similar X is to l2 and the game is going to be defined as
度量x与l2的相似度

120
00:04:17,370 --> 00:04:19,360
the following function.
这个相似度函数同上

121
00:04:25,970 --> 00:04:27,320
This is E to the minus of the square of the euclidean distance
对如下值取exp

122
00:04:28,150 --> 00:04:29,050
between X and the second
X到第二个标记之间的欧式距离

123
00:04:29,820 --> 00:04:31,310
landmark, that is what the enumerator is and
这是分子

124
00:04:31,510 --> 00:04:32,660
then divided by 2 sigma squared
再除以2倍的σ平方

125
00:04:33,520 --> 00:04:35,280
and similarly f3 is, you know,
类似的 f3

126
00:04:35,850 --> 00:04:39,480
similarity between X
等于X与l3之间的

127
00:04:39,840 --> 00:04:41,860
and l3, which is
相似度

128
00:04:41,980 --> 00:04:44,510
equal to, again, similar formula.
公式同上

129
00:04:46,550 --> 00:04:48,070
And what this similarity
这个相似度函数是

130
00:04:48,830 --> 00:04:50,440
function is, the mathematical term
用数学术语来说

131
00:04:50,730 --> 00:04:52,030
for this, is that this is
它就是

132
00:04:52,160 --> 00:04:54,390
going to be a kernel function.
核函数

133
00:04:55,340 --> 00:04:56,810
And the specific kernel I'm using
这里我所说的核函数

134
00:04:57,140 --> 00:04:59,570
here, this is actually called a Gaussian kernel.
实际上是高斯核函数

135
00:05:00,630 --> 00:05:01,920
And so this formula, this particular
因此这个公式

136
00:05:02,500 --> 00:05:04,990
choice of similarity function is called a Gaussian kernel.
我们选择的这个相似度公式是高斯核函数

137
00:05:05,770 --> 00:05:07,220
But the way the terminology goes is that, you know, in
但是这个术语

138
00:05:07,360 --> 00:05:09,110
the abstract these different
其实概括了

139
00:05:09,600 --> 00:05:11,270
similarity functions are called kernels and
许多不同的相似度函数

140
00:05:11,600 --> 00:05:12,670
we can have different similarity functions
它们都称作核函数

141
00:05:13,750 --> 00:05:16,410
and the specific example I'm giving here is called the Gaussian kernel.
而我用的这个特定例子是高斯核函数

142
00:05:17,110 --> 00:05:18,400
We'll see other examples of other kernels.
之后我们会见到别的核函数

143
00:05:18,840 --> 00:05:21,100
But for now just think of these as similarity functions.
但是现在就把这个当做相似度函数

144
00:05:22,470 --> 00:05:24,100
And so, instead of writing similarity between
我们通常不需要写

145
00:05:24,500 --> 00:05:26,270
X and l, sometimes we
X和L的相似度

146
00:05:26,480 --> 00:05:28,380
also write this a kernel denoted
有时我们就直接这样写

147
00:05:29,070 --> 00:05:32,360
you know, lower case k between x and one of my landmarks all right.
小写的k 括号里是x和标记l

148
00:05:34,120 --> 00:05:36,120
So let's see what a
现在

149
00:05:36,650 --> 00:05:38,480
criminals actually do and
我们来看看核函数到底可以做什么

150
00:05:38,810 --> 00:05:40,640
why these sorts of similarity
为什么这些相似度函数

151
00:05:41,280 --> 00:05:44,540
functions, why these expressions might make sense.
这些表达式是正确的

152
00:05:46,690 --> 00:05:48,020
So let's take my first landmark. My
先来看看我们的第一个标记

153
00:05:48,330 --> 00:05:49,230
landmark l1, which is
标记l1

154
00:05:49,350 --> 00:05:51,370
one of those points I chose on my figure just now.
l1是我之前在图中选取的几个点中的其中一个

155
00:05:53,000 --> 00:05:54,160
So the similarity of the kernel between x and l1 is given by this expression.
因此x和l1之间的核函数相似度是这样表达的

156
00:05:57,530 --> 00:05:58,600
Just to make sure, you know, we
为了保证

157
00:05:58,690 --> 00:05:59,600
are on the same page about what
你知道

158
00:05:59,780 --> 00:06:01,860
the numerator term is, the
这个分子项是什么

159
00:06:01,960 --> 00:06:03,140
numerator can also be
这个分子也可以

160
00:06:03,330 --> 00:06:04,620
written as a sum from
写为

161
00:06:04,880 --> 00:06:06,470
J equals 1 through N on sort of the distance.
对这个距离求和 j从1到n

162
00:06:07,000 --> 00:06:08,700
So this is the component wise distance
这是向量X和l

163
00:06:09,270 --> 00:06:10,900
between the vector X and
各分量之间的距离

164
00:06:11,070 --> 00:06:12,050
the vector l. And again
再次地

165
00:06:12,380 --> 00:06:14,460
for the purpose of these
在这几张幻灯片中

166
00:06:14,720 --> 00:06:16,180
slides I'm ignoring X0.
我忽略了X0

167
00:06:16,680 --> 00:06:17,910
So just ignoring the intercept
因此我们暂时先不管截距项X0

168
00:06:18,220 --> 00:06:19,960
term X0, which is always equal to 1.
X0总是等于1

169
00:06:21,430 --> 00:06:22,470
So, you know, this is
那么 你现在明白

170
00:06:22,630 --> 00:06:25,780
how you compute the kernel with similarity between X and a landmark.
这就是你通过计算X和标记之间的相似度得到的核函数

171
00:06:27,270 --> 00:06:28,200
So let's see what this function does.
让我们来看看这个函数计算的是什么

172
00:06:29,110 --> 00:06:31,870
Suppose X is close to one of the landmarks.
假设X与其中一个标记点非常接近

173
00:06:33,320 --> 00:06:34,910
Then this euclidean distance
那么这个欧式距离

174
00:06:35,360 --> 00:06:36,690
formula and the numerator will
以及这个分子

175
00:06:36,990 --> 00:06:38,770
be close to 0, right.
就会接近于0 对吧

176
00:06:38,890 --> 00:06:40,070
So, that is this term
这是因为

177
00:06:40,580 --> 00:06:41,880
here, the distance was great,
这里的这个项 是距离的平方

178
00:06:42,170 --> 00:06:43,130
the distance using X and 0
X到l的距离

179
00:06:43,240 --> 00:06:45,130
will be close to zero, and so
接近于0

180
00:06:46,390 --> 00:06:47,440
f1, this is a simple
因此f1

181
00:06:47,710 --> 00:06:50,100
feature, will be approximately E
这个特征变量约等于

182
00:06:50,290 --> 00:06:52,760
to the minus 0 and
对-0取exp

183
00:06:52,800 --> 00:06:54,650
then the numerator squared over 2 is equal to squared
然后除以2倍的σ平方

184
00:06:55,650 --> 00:06:56,670
so that E to the
因此对0取exp

185
00:06:56,770 --> 00:06:58,070
0, E to minus 0,
对-0取exp

186
00:06:58,370 --> 00:06:59,810
E to 0 is going to be close to one.
约等于1

187
00:07:01,640 --> 00:07:03,480
And I'll put the approximation symbol here
我把约等号放在这里

188
00:07:03,700 --> 00:07:05,430
because the distance may
是因为这个距离

189
00:07:05,530 --> 00:07:06,930
not be exactly 0, but
不是严格地等于0

190
00:07:07,120 --> 00:07:08,040
if X is closer to landmark
但是X越接近于L

191
00:07:08,340 --> 00:07:09,190
this term will be close
那么这个项就会越接近于0

192
00:07:09,440 --> 00:07:12,070
to 0 and so f1 would be close 1.
因此f1越接近于1

193
00:07:13,400 --> 00:07:15,220
Conversely, if X is
相反地

194
00:07:15,520 --> 00:07:17,350
far from 01 then this
如果X离L1越远

195
00:07:17,550 --> 00:07:18,940
first feature f1 will
那么f1

196
00:07:19,820 --> 00:07:21,190
be E to the minus
就等于对一个非常大的数字

197
00:07:21,540 --> 00:07:24,040
of some large number squared,
的平方除以2倍σ平方

198
00:07:24,960 --> 00:07:25,980
divided divided by two sigma
再取exp

199
00:07:26,260 --> 00:07:27,690
squared and E to
然后

200
00:07:27,810 --> 00:07:28,800
the minus of a large number
对一个负的大数字取exp

201
00:07:29,630 --> 00:07:31,450
is going to be close to 0.
接近于0

202
00:07:33,320 --> 00:07:34,610
So what these
因此

203
00:07:34,750 --> 00:07:36,080
features do is they measure how
这些特征变量的作用是度量

204
00:07:36,290 --> 00:07:37,500
similar X is from one
X到标记L的相似度

205
00:07:37,670 --> 00:07:39,160
of your landmarks and the feature
并且

206
00:07:39,530 --> 00:07:40,290
f is going to be close
如果X离L非常相近

207
00:07:40,540 --> 00:07:42,360
to one when X is
那么特征变量f

208
00:07:42,540 --> 00:07:43,810
close to your landmark and is
就接近于1

209
00:07:44,020 --> 00:07:45,310
going to be 0 or close
如果X

210
00:07:45,380 --> 00:07:46,520
to zero when X is
离标记L非常远

211
00:07:46,790 --> 00:07:48,850
far from your landmark.
那么f会约等于0

212
00:07:49,320 --> 00:07:49,980
Each of these landmarks.
之前我所画的

213
00:07:50,590 --> 00:07:51,620
On the previous line, I drew
那几个标记点

214
00:07:52,250 --> 00:07:54,260
three landmarks, l1, l2,l3.
l1 l2 l3

215
00:07:56,190 --> 00:08:00,030
Each of these landmarks, defines a new feature
每一个标记点会定义一个新的特征变量

216
00:08:00,660 --> 00:08:02,270
f1, f2 and f3.
f1 f2 f3

217
00:08:02,680 --> 00:08:03,660
That is, given the the
也就是说

218
00:08:03,710 --> 00:08:05,160
training example X, we can
给出一个训练实例X

219
00:08:05,380 --> 00:08:06,750
now compute three new
我们就能计算三个新的特征变量

220
00:08:06,930 --> 00:08:08,720
features: f1, f2, and
f1 f2和f3

221
00:08:09,520 --> 00:08:11,010
f3, given, you know, the three
基于我之前给的

222
00:08:11,340 --> 00:08:13,530
landmarks that I wrote just now.
三个标记点

223
00:08:13,760 --> 00:08:15,030
But first, let's look
但是首先

224
00:08:15,240 --> 00:08:16,450
at this exponentiation function, let's look
我们来看看这个指数函数

225
00:08:16,710 --> 00:08:18,190
at this similarity function and plot
我们来看看这个相似度函数

226
00:08:18,570 --> 00:08:20,790
in some figures and just, you know, understand
我们画一些图

227
00:08:21,230 --> 00:08:22,460
better what this really looks like.
来更好地理解这些函数是什么样的

228
00:08:23,510 --> 00:08:26,320
For this example, let's say I have two features X1 and X2.
比如 假设我们有两个特征变量X1和X2

229
00:08:26,570 --> 00:08:27,430
And let's say my first
假设我们的第一个标记点

230
00:08:27,820 --> 00:08:29,290
landmark, l1 is at
l1

231
00:08:29,520 --> 00:08:32,550
a location, 3 5. So
位于(3,5)

232
00:08:33,650 --> 00:08:35,750
and let's say I set sigma squared equals one for now.
假设σ的平方等于1

233
00:08:36,500 --> 00:08:37,550
If I plot what this feature
如果我画出图

234
00:08:37,890 --> 00:08:40,420
looks like, what I get is this figure.
就是这样的

235
00:08:41,210 --> 00:08:42,510
So the vertical axis, the height
这个纵轴

236
00:08:42,760 --> 00:08:44,030
of the surface is the value
这个曲面的高度是

237
00:08:45,240 --> 00:08:46,280
of f1 and down here
f1的值

238
00:08:46,630 --> 00:08:48,490
on the horizontal axis are, if
再看看水平的坐标

239
00:08:48,710 --> 00:08:50,580
I have some training example, and there
如果我把训练实例画在这里

240
00:08:51,660 --> 00:08:53,050
is x1 and there is x2.
这是X1 这是X2

241
00:08:53,320 --> 00:08:54,940
Given a certain training example, the
给出一个特定的训练实例

242
00:08:55,120 --> 00:08:56,890
training example here which shows
选这里的一个实例

243
00:08:56,980 --> 00:08:58,140
the value of x1 and x2
可以看到X1和X2的值

244
00:08:58,140 --> 00:08:59,390
at a height above the surface,
这个高度

245
00:08:59,950 --> 00:09:02,220
shows the corresponding value of
可以看到这个f1相应的值

246
00:09:02,410 --> 00:09:03,830
f1 and down below this is
下面的这个图

247
00:09:03,960 --> 00:09:04,890
the same figure I had showed,
内容是一样的

248
00:09:05,040 --> 00:09:06,600
using a quantifiable plot, with
但我用的是一个等值线图

249
00:09:06,810 --> 00:09:08,320
x1 on horizontal
x1为水平轴

250
00:09:09,090 --> 00:09:10,340
axis, x2 on horizontal
x2为竖直轴

251
00:09:10,820 --> 00:09:12,500
axis and so, this
那么

252
00:09:12,820 --> 00:09:13,700
figure on the bottom is just
底下的这个图

253
00:09:13,940 --> 00:09:15,440
a contour plot of the 3D surface.
就是这个3D曲面的等值线图

254
00:09:16,540 --> 00:09:17,800
You notice that when
你会发现

255
00:09:18,030 --> 00:09:19,540
X is equal to
当X等于（3，5）的时候

256
00:09:19,820 --> 00:09:24,140
3 5 exactly, then we
这个时候

257
00:09:24,380 --> 00:09:25,680
the f1 takes on the
f1就等于1

258
00:09:25,760 --> 00:09:26,990
value 1, because that's at
因为

259
00:09:27,170 --> 00:09:29,400
the maximum and X
它在最大值上

260
00:09:29,860 --> 00:09:31,150
moves away as X goes
所以如果X往旁边移动

261
00:09:31,680 --> 00:09:33,650
further away then this
离这个点越远

262
00:09:33,860 --> 00:09:35,270
feature takes on values
那么从图中可以看到

263
00:09:36,460 --> 00:09:37,160
that are close to 0.
f1的值就越接近0

264
00:09:38,750 --> 00:09:40,120
And so, this is really a feature,
这就是特征变量f1

265
00:09:40,400 --> 00:09:42,100
f1 measures, you know, how
计算的内容

266
00:09:42,400 --> 00:09:43,680
close X is to the first
也就是X与第一个标记点

267
00:09:44,040 --> 00:09:46,050
landmark and if
的远近程度

268
00:09:46,520 --> 00:09:47,610
varies between 0 and one
这个值在0到1之间

269
00:09:47,790 --> 00:09:48,940
depending on how close X
具体取决于X

270
00:09:49,160 --> 00:09:50,650
is to the first landmark l1.
距离标记点l1到底有多近

271
00:09:52,360 --> 00:09:53,710
Now the other was due on
我在这张幻灯片上要讲的另一项内容是

272
00:09:53,920 --> 00:09:55,530
this slide is show the effects
我们可以看到改变σ平方的值

273
00:09:56,090 --> 00:09:59,740
of varying this parameter sigma squared.
能产生多大影响

274
00:10:00,040 --> 00:10:01,770
So, sigma squared is the parameter of the
σ平方是高斯核函数的参数

275
00:10:02,530 --> 00:10:04,120
Gaussian kernel and as you vary it, you get slightly different effects.
当你改变它的值的时 你会得到略微不同的结果

276
00:10:05,150 --> 00:10:06,380
Let's set sigma squared to be
假设我们让σ平方

277
00:10:06,650 --> 00:10:07,570
equal to 0.5 and see
等于0.5

278
00:10:07,710 --> 00:10:09,850
what we get. We set sigma square to 0.5,
看看我们能得到什么 将σ平方设为0.5

279
00:10:10,090 --> 00:10:11,170
what you find is that the
你会发现

280
00:10:11,430 --> 00:10:12,670
kernel looks similar, except for the
核函数看起来还是相似的

281
00:10:12,730 --> 00:10:14,200
width of the bump becomes narrower.
只是这个突起的宽度变窄了

282
00:10:14,790 --> 00:10:16,400
The contours shrink a bit too.
等值线图也收缩了一些

283
00:10:17,120 --> 00:10:18,360
So if sigma squared equals to 0.5
所以如果我们将σ平方设为0.5

284
00:10:18,740 --> 00:10:19,820
then as you start
我们从X=(3 5)

285
00:10:20,250 --> 00:10:21,650
from X equals 3
开始

286
00:10:21,910 --> 00:10:23,140
5 and as you move away,
往旁边移动

287
00:10:24,750 --> 00:10:26,370
then the feature f1
那么特征变量f1

288
00:10:27,050 --> 00:10:28,520
falls to zero much more
降到0的速度

289
00:10:28,730 --> 00:10:30,830
rapidly and conversely,
会变得很快 与此相反地

290
00:10:32,090 --> 00:10:33,930
if you has increase since
如果你增大了σ平方的值

291
00:10:34,670 --> 00:10:36,280
where three in that
我们假设σ平方等于3

292
00:10:36,510 --> 00:10:37,700
case and as I
在这个例子中

293
00:10:37,800 --> 00:10:39,090
move away from, you know l. So
如果我从点L往旁边移动

294
00:10:39,630 --> 00:10:40,770
this point here is really
这里的这个点

295
00:10:41,110 --> 00:10:42,410
l, right, that's l1 is at
就是L

296
00:10:42,610 --> 00:10:45,210
location 3 5, right. So it's shown up here.
l1所在的坐标为(3,5) 从这里可以看到

297
00:10:48,190 --> 00:10:49,480
And if sigma squared is
如果σ平方很大

298
00:10:49,660 --> 00:10:50,460
large, then as you move
那么

299
00:10:50,690 --> 00:10:54,040
away from l1, the
当你从点l1移走的时候

300
00:10:54,320 --> 00:10:56,170
value of the feature falls
特征变量的值减小的速度

301
00:10:56,740 --> 00:10:57,670
away much more slowly.
会变得比较慢

302
00:11:03,590 --> 00:11:05,200
So, given this definition of
因此 讲完了特征变量的定义

303
00:11:05,290 --> 00:11:06,730
the features, let's see what
我们来看看

304
00:11:06,960 --> 00:11:08,420
source of hypothesis we can learn.
我们能得到什么样的预测函数

305
00:11:09,550 --> 00:11:11,360
Given the training example X, we
给定一个训练实例X

306
00:11:11,480 --> 00:11:12,930
are going to compute these features
我们准备计算出三个特征变量

307
00:11:14,670 --> 00:11:16,360
f1, f2, f3 and a
f1 f2 f3

308
00:11:17,550 --> 00:11:18,980
hypothesis is going to
预测函数的预测值

309
00:11:19,040 --> 00:11:20,510
predict one when theta 0
会等于1 如果θ0加上

310
00:11:20,760 --> 00:11:22,050
plus theta 1 f1 plus theta 2 f2,
θ1*f1 加上 θ2*f2

311
00:11:22,330 --> 00:11:26,210
and so on is greater than or equal to 0.
等等的结果是大于或者等于0的

312
00:11:26,250 --> 00:11:27,100
For this particular example, let's say
对于这个特定的例子而言

313
00:11:27,290 --> 00:11:28,460
that I've already found a learning
假设我们已经找到了一个学习算法

314
00:11:28,620 --> 00:11:29,520
algorithm and let's say that, you know,
并且假设

315
00:11:30,190 --> 00:11:31,220
somehow I ended up with
我已经得到了

316
00:11:31,900 --> 00:11:32,880
these values of the parameter.
这些参数的值

317
00:11:33,510 --> 00:11:34,600
So if theta 0 equals
因此如果θ0等于-0.5

318
00:11:34,830 --> 00:11:36,010
minus 0.5, theta 1 equals
θ1等于1

319
00:11:36,390 --> 00:11:37,780
1, theta 2 equals
θ2等于1

320
00:11:38,180 --> 00:11:39,570
1, and theta 3
θ3等于0

321
00:11:40,370 --> 00:11:42,480
equals 0 And what
我想要做的是

322
00:11:42,720 --> 00:11:44,530
I want to do is consider what
我想要知道会发生什么

323
00:11:44,670 --> 00:11:46,100
happens if we have a
如果

324
00:11:46,200 --> 00:11:48,060
training example that takes
我们有一个训练实例

325
00:11:49,260 --> 00:11:51,710
has location at this
它的坐标在这里

326
00:11:52,510 --> 00:11:55,050
magenta dot, right where I just drew this dot over here.
这个红点 我画的这个点

327
00:11:55,380 --> 00:11:56,180
So let's say I have a training
假设我们有一个训练实例X

328
00:11:56,290 --> 00:11:58,690
example X, what would my hypothesis predict?
我想知道我的预测函数会给出怎样的预测结果

329
00:11:59,000 --> 00:12:01,430
Well, If I look at this formula.
看看这个公式

330
00:12:04,580 --> 00:12:05,890
Because my training example X
因为我的训练实例X

331
00:12:06,050 --> 00:12:07,820
is close to l1, we have
接近于L1

332
00:12:08,230 --> 00:12:10,190
that f1 is going
那么f1

333
00:12:10,250 --> 00:12:11,830
to be close to 1 the because
就接近于1

334
00:12:12,250 --> 00:12:13,200
my training example X is
又因为训练实例X

335
00:12:13,360 --> 00:12:15,050
far from l2 and l3 I
离L2 L3都很远

336
00:12:15,360 --> 00:12:16,880
have that, you know, f2 would be close to
所以 f2就接近于0

337
00:12:17,590 --> 00:12:20,500
0 and f3 will be close to 0.
f3也接近于0

338
00:12:21,550 --> 00:12:22,700
So, if I look at
所以

339
00:12:22,880 --> 00:12:23,970
that formula, I have theta
如果我们看看这个公式

340
00:12:24,230 --> 00:12:25,670
0 plus theta 1
θ0加上θ1

341
00:12:26,600 --> 00:12:29,970
times 1 plus theta 2 times some value.
乘以1加上θ2乘以某个值

342
00:12:30,510 --> 00:12:32,390
Not exactly 0, but let's say close to 0.
不是严格意义上等于0 但是接近于0

343
00:12:33,140 --> 00:12:36,400
Then plus theta 3 times something close to 0.
接着加上θ3乘以一个接近于0的值

344
00:12:37,480 --> 00:12:39,810
And this is going to be equal to plugging in these values now.
这个等于... 再把这些值代入进去

345
00:12:41,050 --> 00:12:43,470
So, that gives minus 0.5
这个是-0.5

346
00:12:44,160 --> 00:12:46,820
plus 1 times 1 which is 1, and so on.
加上1乘以1等于1  等等

347
00:12:46,960 --> 00:12:47,740
Which is equal to 0.5 which is greater than or equal to 0.
最后等于0.5 这个值大于等于0

348
00:12:48,000 --> 00:12:50,820
So, at this point,
因此 这个点

349
00:12:51,160 --> 00:12:54,280
we're going to predict Y equals
我们预测出的Y值是1

350
00:12:54,740 --> 00:12:57,320
1, because that's greater than or equal to zero.
因为大于等于0

351
00:12:58,910 --> 00:12:59,950
Now let's take a different point.
现在我们选择另一个不同的点

352
00:13:00,800 --> 00:13:02,100
Now lets' say I take a
假设

353
00:13:07,250 --> 00:13:08,470
a point out there, if that
这个点

354
00:13:08,710 --> 00:13:10,580
were my training example X, then
如果它是训练实例X

355
00:13:11,270 --> 00:13:12,190
if you make a similar computation,
如果你进行和之前相同的计算

356
00:13:12,950 --> 00:13:14,390
you find that f1, f2,
你发现f1 f2

357
00:13:15,420 --> 00:13:16,850
Ff3 are all going to be close to 0.
f3都接近于0

358
00:13:18,160 --> 00:13:19,910
And so, we have theta
因此 我们得到

359
00:13:20,240 --> 00:13:23,940
0 plus theta 1, f1,
θ0加上θ1*f1

360
00:13:24,230 --> 00:13:26,010
plus so on and this
加上其他项

361
00:13:26,200 --> 00:13:27,830
will be about equal to
最后的结果

362
00:13:28,020 --> 00:13:30,810
minus 0.5, because theta
会等于-0.5

363
00:13:31,170 --> 00:13:32,110
0 is minus 0.5 and
因为θ0等于-0.5

364
00:13:32,190 --> 00:13:33,920
f1, f2, f3 are all zero.
并且f1 f2 f3都为0

365
00:13:34,910 --> 00:13:37,510
So this will be minus 0.5, this is less than zero.
因此最后结果是-0.5 小于0

366
00:13:37,860 --> 00:13:38,910
And so, at this
因此

367
00:13:39,090 --> 00:13:40,220
point out there, we're going to
这个点

368
00:13:40,470 --> 00:13:42,010
predict Y equals zero.
我们预测的Y值是0

369
00:13:44,190 --> 00:13:45,100
And if you do
如果

370
00:13:45,270 --> 00:13:46,230
this yourself for a range
你自己来对大量的点

371
00:13:46,380 --> 00:13:47,460
of different points, be sure to
进行这样相应的处理

372
00:13:47,670 --> 00:13:48,660
convince yourself that if you
你应该可以确定

373
00:13:48,730 --> 00:13:50,340
have a training example that's
如果你有一个训练实例

374
00:13:50,890 --> 00:13:52,390
close to L2, say,
它非常接近于L2

375
00:13:52,970 --> 00:13:55,730
then at this point we'll also predict Y equals one.
那么通过这个点预测的Y值就是1

376
00:13:56,800 --> 00:13:58,110
And in fact, what you end
实际上

377
00:13:58,240 --> 00:13:59,300
up doing is, you know,
你最后得到的结果是

378
00:13:59,350 --> 00:14:00,920
if you look around this boundary, this
如果你看看这个边界线

379
00:14:01,140 --> 00:14:02,300
space, what we'll find is that
这个区域 我们会发现

380
00:14:02,820 --> 00:14:03,900
for points near l1
对于接近L1和L2的点

381
00:14:04,090 --> 00:14:05,560
and l2 we end up predicting positive.
我们的预测值是1

382
00:14:06,550 --> 00:14:07,780
And for points far away from
对于远离

383
00:14:08,050 --> 00:14:09,260
l1 and l2, that's for
L1和L2的店

384
00:14:09,470 --> 00:14:12,220
points far away from these two
对于离这两个标记点非常远的点

385
00:14:12,480 --> 00:14:13,780
landmarks, we end up predicting
我们最后预测的结果

386
00:14:14,390 --> 00:14:15,560
that the class is equal to 0.
是等于0的

387
00:14:16,510 --> 00:14:17,380
As so, what we end up doing,is
我们最后会得到

388
00:14:17,890 --> 00:14:20,270
that the decision boundary of
这个预测函数的

389
00:14:20,400 --> 00:14:22,110
this hypothesis would end
判别边界

390
00:14:22,280 --> 00:14:24,210
up looking something like this where
会像这样

391
00:14:24,370 --> 00:14:25,630
inside this red decision boundary
在这个红色的判别边界里面

392
00:14:26,580 --> 00:14:28,240
would predict Y equals
预测的Y值等于1

393
00:14:28,630 --> 00:14:30,250
1 and outside we predict
在这外面预测的Y值

394
00:14:32,570 --> 00:14:32,570
Y equals 0.
等于0

395
00:14:33,020 --> 00:14:34,770
And so this is
这就是

396
00:14:34,850 --> 00:14:36,010
how with this definition
我们如何通过标记点

397
00:14:36,870 --> 00:14:38,560
of the landmarks and of the kernel function.
以及核函数

398
00:14:39,370 --> 00:14:40,940
We can learn pretty complex non-linear
来训练出非常复杂的非线性

399
00:14:41,420 --> 00:14:42,800
decision boundary, like what I
判别边界的方法

400
00:14:42,930 --> 00:14:44,150
just drew where we predict
就像我刚才画的那个判别边界

401
00:14:44,560 --> 00:14:46,990
positive when we're close to either one of the two landmarks.
当我们接近两个标记点中任意一个时 预测值就会等于1

402
00:14:47,570 --> 00:14:48,880
And we predict negative when we're
否则预测值等于0

403
00:14:49,260 --> 00:14:50,680
very far away from any
如果这些点离标记点

404
00:14:50,950 --> 00:14:52,990
of the landmarks.
非常远

405
00:14:53,440 --> 00:14:55,000
And so this is part of
这就是核函数这部分

406
00:14:55,050 --> 00:14:57,300
the idea of kernels of and
的概念

407
00:14:57,600 --> 00:14:58,620
how we use them with the
以及我们如何

408
00:14:58,770 --> 00:14:59,810
support vector machine, which is that
在支持向量机中使用它们

409
00:14:59,990 --> 00:15:01,720
we define these extra features using
我们通过标记点和相似性函数

410
00:15:02,040 --> 00:15:03,900
landmarks and similarity functions
来定义新的特征变量

411
00:15:04,770 --> 00:15:06,730
to learn more complex nonlinear classifiers.
从而训练复杂的非线性边界

412
00:15:08,210 --> 00:15:09,290
So hopefully that gives you
我希望刚才讲的内容能够

413
00:15:09,390 --> 00:15:10,410
a sense of the idea of
帮助你更好的理解核函数的概念

414
00:15:10,590 --> 00:15:11,680
kernels and how we could
以及我们如何使用它

415
00:15:11,890 --> 00:15:14,110
use it to define new features for the Support Vector Machine.
在支持向量机中定义新的特征变量

416
00:15:15,510 --> 00:15:17,670
But there are a couple of questions that we haven't answered yet.
但是还有一些问题我们并没有做出回答

417
00:15:18,010 --> 00:15:19,550
One is, how do we get these landmarks?
其中一个是 我们如何得到这些标记点

418
00:15:20,120 --> 00:15:20,930
How do we choose these landmarks?
我们怎么来选择这些标记点

419
00:15:21,050 --> 00:15:22,910
And another is, what
另一个是

420
00:15:23,090 --> 00:15:24,500
other similarity functions, if any,
其他的相似度方程是什么样的 如果有其他的话

421
00:15:24,750 --> 00:15:25,680
can we use other than the
我们能够用其他的相似度方程

422
00:15:25,780 --> 00:15:29,000
one we talked about, which is called the Gaussian kernel.
来代替我们所讲的这个高斯核函数吗

423
00:15:29,190 --> 00:15:29,970
In the next video we give
在下一个视频中

424
00:15:29,990 --> 00:15:31,290
answers to these questions and put
我们会回答这些问题

425
00:15:31,490 --> 00:15:33,150
everything together to show how
然后把所有东西都整合到一起

426
00:15:33,740 --> 00:15:35,060
support vector machines with kernels
来看看支持向量机如何通过核函数的定义

427
00:15:35,720 --> 00:15:36,960
can be a powerful way
有效的学习

428
00:15:37,200 --> 00:15:38,610
to learn complex nonlinear functions.
复杂非线性函数