2 - 2 - Cost Function (8 min).srt
1
00:00:00,000 --> 00:00:05,399
In this video we'll define something
called the cost function. This will let us
在这段视频中我们将定义代价函数的概念 这有助于我们
(Subtitles compiled by: Huang Haiguang, Ocean University of China, haiguang2000@qq.com)
2
00:00:05,399 --> 00:00:10,688
figure out how to fit the best possible
straight line to our data. In linear
弄清楚如何把最有可能的直线与我们的数据相拟合
3
00:00:10,688 --> 00:00:16,758
regression we have a training set like
that shown here. Remember our notation M
在线性回归中我们有一个像这样的训练集 记住
4
00:00:16,758 --> 00:00:21,972
was the number of training examples. So
maybe M=47. And the form of the
M代表了训练样本的数量 所以 比如 M = 47
5
00:00:21,972 --> 00:00:27,731
hypothesis, which we use to make
predictions, is this linear function. To
而我们的假设函数 也就是用来进行预测的函数 是这样的线性函数形式
6
00:00:27,731 --> 00:00:33,723
introduce a little bit more terminology,
these theta zero and theta one, right,
接下来我们会引入一些术语 这些θ0和θ1
7
00:00:33,723 --> 00:00:39,759
these theta i's are what I call the
parameters of the model. What we're
这些θi我把它们称为模型参数 在这个视频中
8
00:00:39,759 --> 00:00:44,578
going to do in this video is talk about
how to go about choosing these two
我们要做的就是谈谈如何选择这两个参数值θ0和θ1
9
00:00:44,578 --> 00:00:49,654
parameter values, theta zero and theta
one. With different choices of parameters
选择不同的参数θ0和θ1
10
00:00:49,654 --> 00:00:54,408
theta zero and theta one we get different
hypotheses, different hypothesis
我们会得到不同的假设 不同的假设函数
11
00:00:54,408 --> 00:00:59,355
functions. I know some of you will
probably be already familiar with what I'm
我知道你们中的有些人可能已经知道我在这张幻灯片上要讲的
12
00:00:59,355 --> 00:01:04,367
going to do on this slide, but just to
review here are a few examples. If theta
但我们还是用这几个例子来复习回顾一下
13
00:01:04,367 --> 00:01:09,378
zero is 1.5 and theta one is 0, then
the hypothesis function will look like
如果θ0是1.5 θ1是0 那么假设函数会看起来是这样
14
00:01:09,378 --> 00:01:15,701
this. Right, because your hypothesis
function will be h(x) equals 1.5 plus
是吧 因为你的假设函数是h(x)=1.5+0*x
15
00:01:15,701 --> 00:01:22,645
0 times x which is this constant value
function, this is flat at 1.5. If
是这样一个常数函数 恒等于1.5
16
00:01:22,645 --> 00:01:29,332
theta zero equals 0 and theta one
equals 0.5, then the hypothesis will look
如果θ0=0并且θ1=0.5 那么假设会看起来像这样
17
00:01:29,332 --> 00:01:35,536
like this. And it should pass through this
point (2, 1), so you now have h(x) or
它会通过点(2,1) 这样你又得到了h(x)
18
00:01:35,536 --> 00:01:40,666
really some h_theta(x), but
sometimes I'll just omit theta for
或者hθ(x) 但是有时我们为了简洁会省略θ
19
00:01:40,666 --> 00:01:46,518
brevity. So, h(x) will be equal to just
0.5 times x which looks like that. And
因此 h(x)将等于0.5倍的x 就像这样
20
00:01:46,518 --> 00:01:52,443
finally if theta zero equals 1 and theta
one equals 0.5 then we end up with the
最后 如果θ0=1并且θ1=0.5 我们最后得到的假设会看起来像这样
21
00:01:52,443 --> 00:01:58,598
hypothesis that looks like this. Let's
see, it should pass through the (2, 2)
让我们来看看 它应该通过点(2,2)
22
00:01:58,598 --> 00:02:04,468
point like so. And this is my new h(x)
or my new h_theta(x). All right? Well
这是我的新的h(x)或者写作hθ(x) 对吧?
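For reference, the slide plots themselves are not included in these captions; written out, the three example hypotheses just described, and the general form they share, are:

h_\theta(x) = \theta_0 + \theta_1 x

\theta_0 = 1.5,\ \theta_1 = 0:\quad h(x) = 1.5 \quad \text{(flat line at 1.5)}
\theta_0 = 0,\ \theta_1 = 0.5:\quad h(x) = 0.5\,x \quad \text{(passes through (2, 1))}
\theta_0 = 1,\ \theta_1 = 0.5:\quad h(x) = 1 + 0.5\,x \quad \text{(passes through (2, 2))}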
23
00:02:04,468 --> 00:02:09,980
you remember that this is
h_theta(x), but as a shorthand
你还记得之前我们提到过hθ(x)的 但作为简写 我们通常只把它写作h(x)
24
00:02:09,980 --> 00:02:16,584
sometimes I just write this as h(x). In
linear regression we have a training set,
在线性回归中 我们有一个训练集
25
00:02:16,584 --> 00:02:22,439
like maybe the one I've plotted here. What
we want to do is come up with values for
可能就像我在这里绘制的 我们要做的就是
26
00:02:22,439 --> 00:02:28,295
the parameters theta zero and theta one.
So that the straight line we get out
得出θ0 θ1这两个参数的值 来让假设函数表示的直线
27
00:02:28,295 --> 00:02:33,799
of this corresponds to a straight line
that somehow fits the data well. Like
尽量地与这些数据点很好的拟合
28
00:02:33,799 --> 00:02:39,756
maybe that line over there. So how do we
come up with values theta zero, theta one
也许就像这里的这条线一样 那么我们如何得出θ0 θ1的值
29
00:02:39,756 --> 00:02:45,350
that corresponds to a good fit to the
data? The idea is we're going to choose
来使它很好地拟合数据的呢?我们的想法是 我们要选择
30
00:02:45,350 --> 00:02:51,162
our parameters theta zero, theta one so
that h(x), meaning the value we predict
能使h(x) 也就是 输入x时我们预测的值
31
00:02:51,162 --> 00:02:56,330
on input x, is at least close to
the values y for the examples in our
最接近该样本对应的y值的参数θ0 θ1
32
00:02:56,330 --> 00:03:01,133
training set, for our training examples.
So, in our training set we're given a
所以 在我们的训练集中我们会得到一定数量的样本
33
00:03:01,133 --> 00:03:06,505
number of examples where we know x, the size of
the house, and we know the actual price
我们知道x表示卖出哪所房子 并且知道这所房子的实际价格
34
00:03:06,505 --> 00:03:11,796
it sold for. So let's try to
choose values for the parameters so that
所以 我们要尽量选择参数值 使得
35
00:03:11,796 --> 00:03:17,302
at least in the training set, given the
x's in the training set, we make
在训练集中 给出训练集中的x值
36
00:03:17,302 --> 00:03:23,507
reasonably accurate predictions for the y
values. Let's formalize this. So linear
我们能合理准确地预测y的值
37
00:03:23,507 --> 00:03:29,401
regression, what we're going to do is
solve a minimization
让我们给出标准的定义 在线性回归中 我们要解决的是一个最小化问题
38
00:03:29,401 --> 00:03:38,787
problem. So I'm going to write minimize over theta
zero, theta one. And, I want this to be
所以我要写出关于θ0 θ1的最小化 而且
39
00:03:38,787 --> 00:03:44,379
small, right, I want the difference
between h(x) and y to be small. And one
我希望这个式子极其小 是吧 我想要h(x)和y之间的差异要小
40
00:03:44,379 --> 00:03:50,493
thing I'm gonna do is try to minimize the
square difference between the output of
我要做的事情是尽量减少假设的输出与房子真实价格
41
00:03:50,493 --> 00:03:56,159
the hypothesis and the actual price of the
house. Okay? So let's fill in some
之间的差的平方 明白吗?接下来我会详细的阐述
42
00:03:56,159 --> 00:04:01,379
details. Remember that I was using the
notation (x^(i), y^(i)) to represent the
别忘了 我用符号( x(i),y(i) )代表第i个样本
43
00:04:01,379 --> 00:04:07,418
ith training example. So what I
want really is to sum over my training
所以我想要做的是对所有训练样本进行一个求和
44
00:04:07,418 --> 00:04:13,202
set. Sum from i equals 1 to M of
the square difference between
对i=1到i=M的样本 将对假设进行预测得到的结果
45
00:04:13,202 --> 00:04:18,896
this is the prediction of my hypothesis
when it is input the size of house number
此时的输入是第i号房子的面积 对吧
46
00:04:18,896 --> 00:04:24,380
i, right, minus the actual price that
house number i will sell for and I want to
将第i号对应的预测结果 减去第i号房子的实际价格 所得的差的平方相加得到总和
47
00:04:24,380 --> 00:04:29,588
minimize the sum of my training set sum
from i equals 1 through M of the
而我希望尽量减小这个值
48
00:04:29,588 --> 00:04:35,281
difference of this squared error,
square difference between the predicted
也就是预测值和实际值的差的平方误差和 或者说预测价格和
49
00:04:35,281 --> 00:04:41,091
price of the house and the price
that it will actually sell for. And just
实际卖出价格的差的平方
50
00:04:41,091 --> 00:04:47,723
to remind you, the notation M here was
the size of my training set, right,
我说了这里的m指的是训练集的样本容量
51
00:04:47,723 --> 00:04:53,347
so the M there is my number of training
examples. Right? That hash sign is the
对吧
52
00:04:53,347 --> 00:04:59,045
abbreviation for "number" of training
examples. Okay? And to
这个井号是训练样本“个数”的缩写 对吧 而为了让表达式的数学意义
53
00:04:59,045 --> 00:05:04,888
make the math a little bit easier, I'm
going to actually look at, you know, 1
变得容易理解一点 我们实际上考虑的是
54
00:05:04,888 --> 00:05:09,578
over M times that. So we're going to try
to minimize my average error, which we're
这个数的1/m 因此我们要尝试尽量减少我们的平均误差
55
00:05:09,578 --> 00:05:13,926
going to minimize one over 2M.
Putting the 2, the constant one half, in
也就是尽量减少其1/2m 通常是这个数的一半
56
00:05:13,926 --> 00:05:18,386
front it just makes some of the math a
little easier. So minimizing one half of
前面的这些只是为了使数学更直白一点 因此对这个求和值的二分之一求最小值
57
00:05:18,386 --> 00:05:23,073
something, right, should give you the same
values of the parameters theta zero, theta
应该得出相同的θ0值和相同的θ1值来
58
00:05:23,073 --> 00:05:27,647
one as minimizing that function. And just
to make sure this equation is
请大家一定弄清楚这个道理
59
00:05:27,647 --> 00:05:35,569
clear, right? This expression in here,
h_theta(x), this is my, this is
没问题吧?在这里hθ(x)的这种表达 这是我们的假设
60
00:05:35,569 --> 00:05:44,880
our usual, right? That's equal to theta zero
plus theta one times x(i). And this notation,
它等于θ0加上θ1与x(i)的乘积 而这个表达
61
00:05:44,880 --> 00:05:49,814
minimize over theta zero and theta one,
this means find me the values of theta
表示关于θ0和θ1的最小化过程 这意味着我们要找到θ0和θ1
62
00:05:49,814 --> 00:05:54,369
zero and theta one that causes this
expression to be minimized. And this
的值来使这个表达式的值最小
63
00:05:54,369 --> 00:05:59,557
expression depends on theta zero and theta
one. Okay? So just to recap, we're posing
这个表达式因θ0和θ1的变化而变化对吧?
64
00:05:59,557 --> 00:06:04,382
this problem as find me the values of
theta zero and theta one so that the
因此 简单地说 我们正在把这个问题变成 找到能使
65
00:06:04,575 --> 00:06:09,292
average, the one over two M, times the
sum of square errors between my
我的训练集中预测值和真实值的差的平方的和
66
00:06:09,292 --> 00:06:14,590
predictions on the training set minus the
actual values of the houses on the
的1/2M最小的θ0和θ1的值
67
00:06:14,590 --> 00:06:19,694
training set is minimized. So this is
going to be my overall objective function
因此 这将是我的线性回归的整体目标函数
68
00:06:19,694 --> 00:06:25,127
for linear regression. And just to, you
know, rewrite this out a little bit more
为了使它更明确一点 我们要改写这个函数
69
00:06:25,127 --> 00:06:30,580
cleanly what I'm going to do by convention
is we usually define a cost function.
按照惯例 我要定义一个代价函数
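For reference, the on-screen formula is not reproduced in these captions; written out, the cost function and the minimization goal being described are:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}

\min_{\theta_0,\, \theta_1} \; J(\theta_0, \theta_1)

where m is the number of training examples.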
70
00:06:30,860 --> 00:06:38,965
Which is going to be exactly this. That
formula that I have up here. And what I
正如屏幕中所示 这里的这个公式
71
00:06:38,965 --> 00:06:48,388
want to do is minimize over theta zero and
theta one my function J of theta zero
我们想要做的就是关于θ0和θ1 对函数J(θ0,θ1)求最小值
72
00:06:48,388 --> 00:06:57,428
comma theta one. Just write this
out, this is my cost function. So, this
这就是我的代价函数
73
00:06:57,428 --> 00:07:06,943
cost function is also called the squared
error function or sometimes called the
代价函数也被称作平方误差函数 有时也被称为
74
00:07:06,943 --> 00:07:14,461
squared error cost function. Now, why do we,
you know, take
平方误差代价函数 事实上 我们之所以要求出
75
00:07:14,461 --> 00:07:19,006
the squares of the errors? It turns out
that the squared error cost function is a
误差的平方和 是因为误差平方代价函数
76
00:07:19,006 --> 00:07:23,214
reasonable choice and will work well for
most problems, for most regression
对于大多数问题 特别是回归问题 都是一个合理的选择
77
00:07:23,214 --> 00:07:27,815
problems. There are other cost functions
that will work pretty well, but the squared
还有其他的代价函数也能很好地发挥作用
78
00:07:27,815 --> 00:07:32,473
error cost function is probably the most
commonly used one for regression problems.
但是平方误差代价函数可能是解决回归问题最常用的手段了
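As an illustration that is not part of the lecture (the function and variable names below are illustrative only), a minimal Python sketch of the squared-error cost just described might look like this:

def compute_cost(x, y, theta0, theta1):
    """Return J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(x)                        # m = number of training examples
    total = 0.0
    for xi, yi in zip(x, y):
        h = theta0 + theta1 * xi      # hypothesis h_theta(x) = theta0 + theta1 * x
        total += (h - yi) ** 2        # squared error for this example
    return total / (2 * m)            # average, with the convenient 1/2 factor

# The hypothesis with theta0 = 0, theta1 = 0.5 passes through (2, 1),
# so its cost on that single training example is zero:
print(compute_cost([2], [1], 0, 0.5))   # -> 0.0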
79
00:07:32,473 --> 00:07:36,793
Later in this class we'll also talk about alternative
cost functions, but this
在后续课程中 我们还会谈论其他的代价函数
80
00:07:36,793 --> 00:07:41,282
choice that we just had should be a
pretty reasonable thing to try for
但我们刚刚讲的选择是对于大多数线性回归问题非常合理的
81
00:07:41,282 --> 00:07:45,706
most linear regression problems. Okay. So
that's the cost function. So far we've
好吧 所以这是代价函数 到目前为止 我们已经
82
00:07:45,706 --> 00:07:50,899
just seen a mathematical definition of, you
know, this cost function, and in case this
介绍了代价函数的数学定义
83
00:07:50,899 --> 00:07:55,973
function J of theta zero theta one in case
this function seems a little bit abstract
也许这个函数J(θ0,θ1)有点抽象
84
00:07:55,973 --> 00:08:00,808
and you still don't have a good sense of
what it's doing, in the next video, in the
可能你仍然不知道它的内涵
85
00:08:00,808 --> 00:08:05,882
next couple videos we're actually going to
go a little bit deeper into what the cost
在接下来的几个视频里 我们要更进一步解释
86
00:08:05,882 --> 00:08:10,776
function J is doing and try to give you
better intuition about what it's computing
代价函数J的工作原理 并尝试更直观地解释它在计算什么
87
00:08:10,776 --> 00:08:12,329
and why we want to use it.
以及我们使用它的目的
[Guokr Education Without Borders Subtitle Group] Translation: antis  Proofreading: cheerzzh  Review: 所罗门捷列夫