2 - 2 - Cost Function (8 min).srt
1
00:00:00,000 --> 00:00:05,399
In this video we'll define something
called the cost function. This will let us
在这段视频中我们将定义代价函数的概念 这有助于我们
(Subtitles compiled by: Huang Haiguang, Ocean University of China, haiguang2000@qq.com)
2
00:00:05,399 --> 00:00:10,688
figure out how to fit the best possible
straight line to our data. In linear
弄清楚如何把最有可能的直线与我们的数据相拟合
3
00:00:10,688 --> 00:00:16,758
regression we have a training set like
that shown here. Remember our notation M
在线性回归中我们有一个像这样的训练集 记住
4
00:00:16,758 --> 00:00:21,972
was the number of training examples. So
maybe M=47. And the form of the
M代表了训练样本的数量 所以 比如 M = 47
5
00:00:21,972 --> 00:00:27,731
hypothesis, which we use to make
predictions, is this linear function. To
而我们的假设函数 也就是用来进行预测的函数 是这样的线性函数形式
6
00:00:27,731 --> 00:00:33,723
introduce a little bit more terminology,
these theta zero and theta one, right,
接下来我们会引入一些术语 这些θ0和θ1
7
00:00:33,723 --> 00:00:39,759
these theta i's are what I call the
parameters of the model. What we're
这些θi我把它们称为模型参数 在这个视频中
8
00:00:39,759 --> 00:00:44,578
going to do in this video is talk about
how to go about choosing these two
我们要做的就是谈谈如何选择这两个参数值θ0和θ1
9
00:00:44,578 --> 00:00:49,654
parameter values, theta zero and theta
one. With different choices of parameters
选择不同的参数θ0和θ1
10
00:00:49,654 --> 00:00:54,408
theta zero and theta one we get different
hypotheses, different hypothesis
我们会得到不同的假设 不同的假设函数
11
00:00:54,408 --> 00:00:59,355
functions. I know some of you will
probably be already familiar with what I'm
我知道你们中的有些人可能已经知道我在这张幻灯片上要讲的
12
00:00:59,355 --> 00:01:04,367
going to do on this slide, but just to
review here are a few examples. If theta
但我们还是用这几个例子来复习回顾一下
13
00:01:04,367 --> 00:01:09,378
zero is 1.5 and theta one is 0, then
the hypothesis function will look like
如果θ0是1.5 θ1是0 那么假设函数会看起来是这样
14
00:01:09,378 --> 00:01:15,701
this. Right, because your hypothesis
function will be h(x) equals 1.5 plus
是吧 因为你的假设函数是h(x)=1.5+0*x
15
00:01:15,701 --> 00:01:22,645
0 times x which is this constant value
function, this is flat at 1.5. If
是这样一个常数函数 恒等于1.5
16
00:01:22,645 --> 00:01:29,332
theta zero equals 0 and theta one
equals 0.5, then the hypothesis will look
如果θ0=0并且θ1=0.5 那么假设会看起来像这样
17
00:01:29,332 --> 00:01:35,536
like this. And it should pass through this
point (2, 1), so you now have h(x) or
它会通过点(2,1) 这样你又得到了h(x)
18
00:01:35,536 --> 00:01:40,666
really some h_theta(x), but
sometimes I'll just omit theta for
或者hθ(x) 但是有时我们为了简洁会省略θ
19
00:01:40,666 --> 00:01:46,518
brevity. So, h(x) will be equal to just
0.5 times x which looks like that. And
因此 h(x)将等于0.5倍的x 就像这样
20
00:01:46,518 --> 00:01:52,443
finally if theta zero equals 1 and theta
one equals 0.5 then we end up with the
最后 如果θ0=1并且θ1=0.5 我们最后得到的假设会看起来像这样
21
00:01:52,443 --> 00:01:58,598
hypothesis that looks like this. Let's
see, it should pass through the (2, 2)
让我们来看看 它应该通过点(2,2)
22
00:01:58,598 --> 00:02:04,468
point like so. And this is my new h(x)
or my new h_theta(x). All right? Well
这是我的新的h(x)或者写作hθ(x) 对吧?
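For reference, the slide plots themselves are not included in these captions; written out, the three example hypotheses just described, and the general form they share, are:

h_\theta(x) = \theta_0 + \theta_1 x

\theta_0 = 1.5,\ \theta_1 = 0:\quad h(x) = 1.5 \quad \text{(flat line at 1.5)}
\theta_0 = 0,\ \theta_1 = 0.5:\quad h(x) = 0.5\,x \quad \text{(passes through (2, 1))}
\theta_0 = 1,\ \theta_1 = 0.5:\quad h(x) = 1 + 0.5\,x \quad \text{(passes through (2, 2))}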
23
00:02:04,468 --> 00:02:09,980
you remember that this is
h_theta(x), but as a shorthand
你还记得之前我们提到过hθ(x)的 但作为简写 我们通常只把它写作h(x)
24
00:02:09,980 --> 00:02:16,584
sometimes I just write this as h(x). In
linear regression we have a training set,
在线性回归中 我们有一个训练集
25
00:02:16,584 --> 00:02:22,439
like maybe the one I've plotted here. What
we want to do is come up with values for
可能就像我在这里绘制的 我们要做的就是
26
00:02:22,439 --> 00:02:28,295
the parameters theta zero and theta one.
So that the straight line we get out
得出θ0 θ1这两个参数的值 来让假设函数表示的直线
27
00:02:28,295 --> 00:02:33,799
of this corresponds to a straight line
that somehow fits the data well. Like
尽量地与这些数据点很好的拟合
28
00:02:33,799 --> 00:02:39,756
maybe that line over there. So how do we
come up with values theta zero, theta one
也许就像这里的这条线一样 那么我们如何得出θ0 θ1的值
29
00:02:39,756 --> 00:02:45,350
that corresponds to a good fit to the
data? The idea is we're going to choose
来使它很好地拟合数据的呢?我们的想法是 我们要选择
30
00:02:45,350 --> 00:02:51,162
our parameters theta zero, theta one so
that h(x), meaning the value we predict
能使h(x) 也就是 输入x时我们预测的值
31
00:02:51,162 --> 00:02:56,330
on input x, is at least close to
the values y for the examples in our
最接近该样本对应的y值的参数θ0 θ1
32
00:02:56,330 --> 00:03:01,133
training set, for our training examples.
So, in our training set we're given a
所以 在我们的训练集中我们会得到一定数量的样本
33
00:03:01,133 --> 00:03:06,505
number of examples where we know x, the size of
the house, and we know the actual price
我们知道x表示卖出哪所房子 并且知道这所房子的实际价格
34
00:03:06,505 --> 00:03:11,796
it sold for. So let's try to
choose values for the parameters so that
所以 我们要尽量选择参数值 使得
35
00:03:11,796 --> 00:03:17,302
at least in the training set, given the
x's in the training set, we make
在训练集中 给出训练集中的x值
36
00:03:17,302 --> 00:03:23,507
reasonably accurate predictions for the y
values. Let's formalize this. So linear
我们能合理准确地预测y的值
37
00:03:23,507 --> 00:03:29,401
regression, what we're going to do is
solve a minimization
让我们给出标准的定义 在线性回归中 我们要解决的是一个最小化问题
38
00:03:29,401 --> 00:03:38,787
problem. So I'm going to write minimize over theta
zero, theta one. And, I want this to be
所以我要写出关于θ0 θ1的最小化 而且
39
00:03:38,787 --> 00:03:44,379
small, right, I want the difference
between h(x) and y to be small. And one
我希望这个式子极其小 是吧 我想要h(x)和y之间的差异要小
40
00:03:44,379 --> 00:03:50,493
thing I'm gonna do is try to minimize the
square difference between the output of
我要做的事情是尽量减少假设的输出与房子真实价格
41
00:03:50,493 --> 00:03:56,159
the hypothesis and the actual price of the
house. Okay? So let's fill in some
之间的差的平方 明白吗?接下来我会详细的阐述
42
00:03:56,159 --> 00:04:01,379
details. Remember that I was using the
notation (x^(i), y^(i)) to represent the
别忘了 我用符号( x(i),y(i) )代表第i个样本
43
00:04:01,379 --> 00:04:07,418
ith training example. So what I
want really is to sum over my training
所以我想要做的是对所有训练样本进行一个求和
44
00:04:07,418 --> 00:04:13,202
set. Sum from i equals 1 to M of
the square difference between
对i=1到i=M的样本 将对假设进行预测得到的结果
45
00:04:13,202 --> 00:04:18,896
this is the prediction of my hypothesis
when it is input the size of house number
此时的输入是第i号房子的面积 对吧
46
00:04:18,896 --> 00:04:24,380
i, right, minus the actual price that
house number i will sell for and I want to
将第i号对应的预测结果 减去第i号房子的实际价格 所得的差的平方相加得到总和
47
00:04:24,380 --> 00:04:29,588
minimize the sum of my training set sum
from i equals 1 through M of the
而我希望尽量减小这个值
48
00:04:29,588 --> 00:04:35,281
difference of this squared error,
square difference between the predicted
也就是预测值和实际值的差的平方误差和 或者说预测价格和
49
00:04:35,281 --> 00:04:41,091
price of the house and the price
that it will actually sell for. And just
实际卖出价格的差的平方
50
00:04:41,091 --> 00:04:47,723
to remind you, the notation M here was
the size of my training set, right,
我说了这里的m指的是训练集的样本容量
51
00:04:47,723 --> 00:04:53,347
so the M there is my number of training
examples. Right? That hash sign is the
对吧
52
00:04:53,347 --> 00:04:59,045
abbreviation for "number" of training
examples. Okay? And to
这个井号是训练样本“个数”的缩写 对吧 而为了让表达式的数学意义
53
00:04:59,045 --> 00:05:04,888
make the math a little bit easier, I'm
going to actually look at, you know, 1
变得容易理解一点 我们实际上考虑的是
54
00:05:04,888 --> 00:05:09,578
over M times that. So we're going to try
to minimize my average error, which we're
这个数的1/m 因此我们要尝试尽量减少我们的平均误差
55
00:05:09,578 --> 00:05:13,926
going to minimize one over 2M.
Putting the 2, the constant one half, in
也就是尽量减少其1/2m 通常是这个数的一半
56
00:05:13,926 --> 00:05:18,386
front it just makes some of the math a
little easier. So minimizing one half of
前面的这些只是为了使数学更直白一点 因此对这个求和值的二分之一求最小值
57
00:05:18,386 --> 00:05:23,073
something, right, should give you the same
values of the parameters theta zero, theta
应该得出相同的θ0值和相同的θ1值来
58
00:05:23,073 --> 00:05:27,647
one as minimizing that function. And just
to make sure this equation is
请大家一定弄清楚这个道理
59
00:05:27,647 --> 00:05:35,569
clear, right? This expression in here,
h_theta(x), this is my, this is
没问题吧?在这里hθ(x)的这种表达 这是我们的假设
60
00:05:35,569 --> 00:05:44,880
our usual, right? That's equal to theta zero
plus theta one times x(i). And this notation,
它等于θ0加上θ1与x(i)的乘积 而这个表达
61
00:05:44,880 --> 00:05:49,814
minimize over theta zero and theta one,
this means find me the values of theta
表示关于θ0和θ1的最小化过程 这意味着我们要找到θ0和θ1
62
00:05:49,814 --> 00:05:54,369
zero and theta one that causes this
expression to be minimized. And this
的值来使这个表达式的值最小
63
00:05:54,369 --> 00:05:59,557
expression depends on theta zero and theta
one. Okay? So just to recap, we're posing
这个表达式因θ0和θ1的变化而变化对吧?
64
00:05:59,557 --> 00:06:04,382
this problem as find me the values of
theta zero and theta one so that the
因此 简单地说 我们正在把这个问题变成 找到能使
65
00:06:04,575 --> 00:06:09,292
average, the one over two M, times the
sum of square errors between my
我的训练集中预测值和真实值的差的平方的和
66
00:06:09,292 --> 00:06:14,590
predictions on the training set minus the
actual values of the houses on the
的1/2M最小的θ0和θ1的值
67
00:06:14,590 --> 00:06:19,694
training set is minimized. So this is
going to be my overall objective function
因此 这将是我的线性回归的整体目标函数
68
00:06:19,694 --> 00:06:25,127
for linear regression. And just to, you
know, rewrite this out a little bit more
为了使它更明确一点 我们要改写这个函数
69
00:06:25,127 --> 00:06:30,580
cleanly what I'm going to do by convention
is we usually define a cost function.
按照惯例 我要定义一个代价函数
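For reference, the on-screen formula is not reproduced in these captions; written out, the cost function and the minimization goal being described are:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}

\min_{\theta_0,\, \theta_1} \; J(\theta_0, \theta_1)

where m is the number of training examples.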
70
00:06:30,860 --> 00:06:38,965
Which is going to be exactly this. That
formula that I have up here. And what I
正如屏幕中所示 这里的这个公式
71
00:06:38,965 --> 00:06:48,388
want to do is minimize over theta zero and
theta one my function J of theta zero
我们想要做的就是关于θ0和θ1 对函数J(θ0,θ1)求最小值
72
00:06:48,388 --> 00:06:57,428
comma theta one. Just write this
out, this is my cost function. So, this
这就是我的代价函数
73
00:06:57,428 --> 00:07:06,943
cost function is also called the squared
error function or sometimes called the
代价函数也被称作平方误差函数 有时也被称为
74
00:07:06,943 --> 00:07:14,461
squared error cost function. Now, why do we,
you know, take
平方误差代价函数 事实上 我们之所以要求出
75
00:07:14,461 --> 00:07:19,006
the squares of the errors? It turns out
that the squared error cost function is a
误差的平方和 是因为误差平方代价函数
76
00:07:19,006 --> 00:07:23,214
reasonable choice and will work well for
most problems, for most regression
对于大多数问题 特别是回归问题 都是一个合理的选择
77
00:07:23,214 --> 00:07:27,815
problems. There are other cost functions
that will work pretty well, but the squared
还有其他的代价函数也能很好地发挥作用
78
00:07:27,815 --> 00:07:32,473
error cost function is probably the most
commonly used one for regression problems.
但是平方误差代价函数可能是解决回归问题最常用的手段了
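As an illustration that is not part of the lecture (the function and variable names below are illustrative only), a minimal Python sketch of the squared-error cost just described might look like this:

def compute_cost(x, y, theta0, theta1):
    """Return J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(x)                        # m = number of training examples
    total = 0.0
    for xi, yi in zip(x, y):
        h = theta0 + theta1 * xi      # hypothesis h_theta(x) = theta0 + theta1 * x
        total += (h - yi) ** 2        # squared error for this example
    return total / (2 * m)            # average, with the convenient 1/2 factor

# The hypothesis with theta0 = 0, theta1 = 0.5 passes through (2, 1),
# so its cost on that single training example is zero:
print(compute_cost([2], [1], 0, 0.5))   # -> 0.0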
79
00:07:32,473 --> 00:07:36,793
Later in this class we'll also talk about alternative
cost functions, but this
在后续课程中 我们还会谈论其他的代价函数
80
00:07:36,793 --> 00:07:41,282
choice that we just had should be a
pretty reasonable thing to try for
但我们刚刚讲的选择是对于大多数线性回归问题非常合理的
81
00:07:41,282 --> 00:07:45,706
most linear regression problems. Okay. So
that's the cost function. So far we've
好吧 所以这是代价函数 到目前为止 我们已经
82
00:07:45,706 --> 00:07:50,899
just seen a mathematical definition of, you
know, this cost function, and in case this
介绍了代价函数的数学定义
83
00:07:50,899 --> 00:07:55,973
function J of theta zero theta one in case
this function seems a little bit abstract
也许这个函数J(θ0,θ1)有点抽象
84
00:07:55,973 --> 00:08:00,808
and you still don't have a good sense of
what it's doing, in the next video, in the
可能你仍然不知道它的内涵
85
00:08:00,808 --> 00:08:05,882
next couple videos we're actually going to
go a little bit deeper into what the cost
在接下来的几个视频里 我们要更进一步解释
86
00:08:05,882 --> 00:08:10,776
function J is doing and try to give you
better intuition about what it's computing
代价函数J的工作原理 并尝试更直观地解释它在计算什么
87
00:08:10,776 --> 00:08:12,329
and why we want to use it.
以及我们使用它的目的
[Guokr Education Without Borders Subtitle Group] Translation: antis  Proofreading: cheerzzh  Review: 所罗门捷列夫