srt/16 - 3 - Collaborative Filtering (10 min).srt

﻿1
00:00:01,060 --> 00:00:02,420
In this video we'll talk about
在这段视频中 我们要讲
(字幕整理：中国海洋大学 黄海广,haiguang2000@qq.com )

2
00:00:02,620 --> 00:00:03,900
an approach to building a
一种构建推荐系统的方法

3
00:00:03,970 --> 00:00:06,390
recommender system that's called collaborative filtering.
叫做协同过滤(collaborative filtering)

4
00:00:07,540 --> 00:00:08,880
The algorithm that we're talking
我们所讲的算法

5
00:00:09,180 --> 00:00:10,400
about has a very interesting
有一个值得一提的

6
00:00:10,680 --> 00:00:11,830
property that it does
特点 那就是它能实现

7
00:00:12,120 --> 00:00:13,290
what is called feature learning and
对特征的学习

8
00:00:13,790 --> 00:00:14,800
by that I mean that this
我的意思是

9
00:00:14,960 --> 00:00:16,270
will be an algorithm that can
这种算法能够

10
00:00:16,450 --> 00:00:19,010
start to learn for itself what features to use.
自行学习所要使用的特征

11
00:00:21,130 --> 00:00:22,100
Here was the data set that
我们建一个数据集

12
00:00:22,220 --> 00:00:23,440
we had and we had
假定是为每一部电影准备的

13
00:00:23,720 --> 00:00:25,030
assumed that for each movie,
对每一部电影

14
00:00:25,690 --> 00:00:27,000
someone had come and told
我们找一些人来

15
00:00:27,320 --> 00:00:28,640
us how romantic that
告诉我们这部电影

16
00:00:28,840 --> 00:00:30,550
movie was and how much action there was in that movie.
浪漫指数是多少 动作指数是多少

17
00:00:31,680 --> 00:00:32,880
But as you can imagine it
但想一下就知道

18
00:00:33,020 --> 00:00:34,320
can be very difficult and time
这样做难度很大

19
00:00:34,500 --> 00:00:36,390
consuming and expensive to actually try
也很花费时间

20
00:00:36,490 --> 00:00:37,860
to get someone to, you know,
你想想 要让每个人

21
00:00:38,050 --> 00:00:39,440
watch each movie and tell
看完每一部电影

22
00:00:39,700 --> 00:00:40,880
you how romantic each movie and
告诉你你每一部电影有多浪漫 多动作

23
00:00:41,410 --> 00:00:42,570
how action packed is each
这是一件不容易的事情

24
00:00:42,660 --> 00:00:44,270
movie, and often you'll
而且通常

25
00:00:44,390 --> 00:00:46,760
want even more features than just these two.
你还会希望得到除这两个特征之外的其他指数

26
00:00:46,980 --> 00:00:48,130
So where do you get these features from?
那么你怎样才能得到这些特征呢？

27
00:00:49,890 --> 00:00:50,920
So let's change the problem
所以 让我们转移一下问题

28
00:00:51,500 --> 00:00:53,220
a bit and suppose that
假如我们

29
00:00:53,980 --> 00:00:55,160
we have a data set where
有某一个数据集

30
00:00:55,410 --> 00:00:57,980
we do not know the values of these features.
我们并不知道特征的值是多少

31
00:00:58,380 --> 00:00:59,280
So we're given the data set
所以比如我们得到一些

32
00:00:59,640 --> 00:01:01,140
of movies and of
关于电影的数据

33
00:01:01,270 --> 00:01:03,550
how the users rated them, but we
不同用户对电影的评分

34
00:01:03,760 --> 00:01:05,190
have no idea how romantic each
我们并不知道每部电影

35
00:01:05,370 --> 00:01:06,140
movie is and we have no
到底有多少浪漫的成分

36
00:01:06,310 --> 00:01:07,660
idea how action packed each
也不知道到底每部电影里面动作成分是多少

37
00:01:07,820 --> 00:01:09,940
movie is so I've replaced all of these things with question marks.
于是我把所有的问题都打上问号

38
00:01:11,310 --> 00:01:12,330
But now let's make a slightly different assumption.
现在我们稍稍改变一下这个假设

39
00:01:13,870 --> 00:01:15,570
Let's say we've gone to each of our users, and each of our users has told has told us
假设我们采访了每一位用户 而且每一位用户都告诉我们

40
00:01:15,980 --> 00:01:18,510
how much they like the
他们是否喜欢

41
00:01:18,820 --> 00:01:20,040
romantic movies and how much
爱情电影 以及

42
00:01:20,220 --> 00:01:22,320
they like action packed movies.
他们是否喜欢动作电影

43
00:01:22,830 --> 00:01:26,090
So Alice has associated a current of theta 1.
这样 Alice 就有了对应的参数 θ(1)

44
00:01:26,820 --> 00:01:27,470
Bob theta 2.
Bob 的是 θ(2)

45
00:01:27,910 --> 00:01:28,440
Carol theta 3.
Carol 的是 θ(3)

46
00:01:28,970 --> 00:01:30,330
Dave theta 4.
Dave 的是 θ(4)

47
00:01:30,500 --> 00:01:31,530
And let's say we also use this
我们还有这样的假设

48
00:01:31,780 --> 00:01:33,040
and that Alice tells us
假如 Alice 告诉我们

49
00:01:33,380 --> 00:01:35,340
that she really
她十分喜欢

50
00:01:35,610 --> 00:01:36,960
likes romantic movies and so
爱情电影

51
00:01:37,140 --> 00:01:38,780
there's a five there which
于是 Alice 的特征 x1

52
00:01:39,280 --> 00:01:41,210
is the multiplier associated with X1 and lets
对应的值就是5

53
00:01:41,570 --> 00:01:42,680
say that Alice tells us she
假设 Alice 告诉我们

54
00:01:42,840 --> 00:01:45,030
really doesn't like action movies and so there's a 0 there.
她非常不喜欢动作电影 于是这一个特征就是0

55
00:01:46,060 --> 00:01:47,190
And Bob tells us something similar
Bob 也有相似的喜好

56
00:01:48,660 --> 00:01:49,770
so we have theta 2 over here.
所以也就有了 θ(2) 的数据

57
00:01:50,630 --> 00:01:52,460
Whereas Carol tells us that
但 Carol 说

58
00:01:53,570 --> 00:01:54,720
she really likes action movies
她非常喜欢动作电影

59
00:01:55,240 --> 00:01:56,450
which is why there's a 5 there,
于是这个特征就被记录为5

60
00:01:56,900 --> 00:01:58,600
that's the multiplier associated with X2,
也就是 x2 的值

61
00:01:58,980 --> 00:02:00,160
and remember there's also
别忘了

62
00:02:01,210 --> 00:02:03,490
X0 equals 1 and let's
我们仍然有等于1的 x0

63
00:02:03,770 --> 00:02:05,390
say that Carol tells us
假设 Carol 告诉我们

64
00:02:05,610 --> 00:02:07,000
she doesn't like romantic
她不喜欢爱情电影之类的

65
00:02:07,390 --> 00:02:09,640
movies and so on, similarly for Dave.
而且戴夫也是这样

66
00:02:09,840 --> 00:02:11,030
So let's assume that somehow
于是 我们假定 某种程度上

67
00:02:11,440 --> 00:02:12,830
we can go to users and
我们就可以着眼于用户

68
00:02:13,290 --> 00:02:14,600
each user J just tells
看看任意的用户 j

69
00:02:15,020 --> 00:02:16,160
us what is the value
对应的 θ(j) 是怎样的

70
00:02:17,090 --> 00:02:18,870
of theta J for them.
这样就明确地告诉了我们

71
00:02:19,450 --> 00:02:22,190
And so basically specifies to us of how much they like different types of movies.
他们对不同题材电影的喜欢程度

72
00:02:24,060 --> 00:02:25,280
If we can get these parameters
如果我们能够从用户那里

73
00:02:25,990 --> 00:02:27,890
theta from our users then it
得到这些 θ 参考值

74
00:02:28,050 --> 00:02:29,820
turns out that it becomes possible to
那么我们理论上就能

75
00:02:29,960 --> 00:02:31,210
try to infer what are the
推测出每部电影的

76
00:02:31,310 --> 00:02:33,710
values of x1 and x2 for each movie.
x1 以及 x2 的值

77
00:02:34,800 --> 00:02:35,140
Let's look at an example.
举例来说

78
00:02:35,730 --> 00:02:36,560
Let's look at movie 1.
假如我们看电影1

79
00:02:38,690 --> 00:02:39,790
So that movie 1 has associated
于是电影1就对应于

80
00:02:40,580 --> 00:02:42,050
with it a feature vector x1.
表示特征的向量 x1 联系在一起了

81
00:02:42,890 --> 00:02:45,420
And you know this movie is called Love at last but let's ignore that.
这部电影的名字叫《爱到最后》 但这不重要

82
00:02:45,770 --> 00:02:46,750
Let's pretend we don't know what
假设我们不知道

83
00:02:46,870 --> 00:02:49,060
this movie is about, so let's ignore the title of this movie.
这部电影的主要内容 所以也不要在意电影的名字

84
00:02:50,180 --> 00:02:52,270
All we know is that Alice loved this move.
我们知道的就是 Alice 喜欢这部电影

85
00:02:52,450 --> 00:02:53,110
Bob loved this movie.
Bob 喜欢这部电影

86
00:02:53,750 --> 00:02:55,370
Carol and Dave hated this movie.
Carol 和 Dave 不喜欢它

87
00:02:56,450 --> 00:02:57,450
So what can we infer?
那么我们能推断出什么呢？

88
00:02:57,830 --> 00:02:58,900
Well, we know from the
好的 我们从

89
00:02:58,990 --> 00:03:00,510
feature vectors that Alice
特征向量知道了

90
00:03:00,780 --> 00:03:03,190
and Bob love romantic movies
Alice 和 Bob 喜欢爱情电影

91
00:03:03,700 --> 00:03:05,660
because they told us that there's a 5 here.
因为他们都在这里评了5分

92
00:03:06,290 --> 00:03:07,480
Whereas Carol and Dave,
然而 Carol 和 Dave

93
00:03:08,380 --> 00:03:10,150
we know that they hate
我们知道他们不喜欢

94
00:03:10,510 --> 00:03:11,920
romantic movies and that
爱情电影

95
00:03:12,300 --> 00:03:13,990
they love action movies. So
但喜欢动作电影

96
00:03:14,730 --> 00:03:16,050
because those are the parameter
由于你知道这些

97
00:03:16,340 --> 00:03:18,830
vectors that you know, uses 3 and 4, Carol and Dave, gave us.
是可以从 第3和第4个参数看出来的

98
00:03:20,110 --> 00:03:20,950
And so based on the fact
同时 由于我们知道

99
00:03:21,390 --> 00:03:22,340
that movie 1 is loved
Alice 和 Bob

100
00:03:22,880 --> 00:03:24,120
by Alice and Bob and
喜欢电影1

101
00:03:24,340 --> 00:03:26,460
hated by Carol and Dave, we might
而 Carol 和 Dave 不喜欢它

102
00:03:26,910 --> 00:03:30,810
reasonably conclude that this is probably a romantic movie,
我们可以推断 这可能是一部爱情片

103
00:03:31,180 --> 00:03:34,240
it is probably not much of an action movie.
而不太可能是动作片

104
00:03:35,290 --> 00:03:36,360
this example is a little
这个例子在数学上

105
00:03:36,520 --> 00:03:38,090
bit mathematically simplified but what
可能某种程度上简化了

106
00:03:38,260 --> 00:03:40,330
we're really asking is what
但我们真正需要的是

107
00:03:40,590 --> 00:03:42,010
feature vector should X1 be
特征向量 x(1) 应该是什么

108
00:03:42,840 --> 00:03:45,370
so that theta 1 transpose
才能让 θ(1) 的转置

109
00:03:46,030 --> 00:03:48,940
x1 is approximately equal to 5,
乘以x(1) 约等于5

110
00:03:49,660 --> 00:03:51,700
that's Alice's rating, and
也就是 Alice 的评分值

111
00:03:51,920 --> 00:03:55,360
theta 2 transpose x1 is
然后 θ(2) 的转置乘以 x(1)

112
00:03:55,510 --> 00:03:56,660
also approximately equal to 5,
也近似于5

113
00:03:57,670 --> 00:03:59,100
and theta 3 transpose x1 is
而 θ(3) 的转置 乘以 x(1)

114
00:03:59,310 --> 00:04:02,180
approximately equal to 0,
约等于0

115
00:04:03,020 --> 00:04:05,250
so this would be Carol's rating, and
这是 Carol 的评分

116
00:04:06,970 --> 00:04:09,780
theta 4 transpose X1
而 θ(4) 的转置乘以 x(1)

117
00:04:10,740 --> 00:04:11,630
is approximately equal to 0.
也约等于0

118
00:04:12,590 --> 00:04:13,520
And from this it looks
由此可知

119
00:04:13,770 --> 00:04:16,000
like, you know, X1 equals
x(1) 应该用

120
00:04:16,870 --> 00:04:18,770
one that's the intercept term, and
[1 1.0 0.0] 这个向量表示

121
00:04:19,080 --> 00:04:20,960
then 1.0, 0.0, that makes sense
第一个1 是截距项

122
00:04:21,310 --> 00:04:22,390
given what we know of Alice,
这样才能得出

123
00:04:22,790 --> 00:04:24,110
Bob, Carol, and Dave's preferences
Alice Bob Carol 和 Dave 四个人

124
00:04:24,770 --> 00:04:25,940
for movies and the way they rated this movie.
对电影评分的结果

125
00:04:27,700 --> 00:04:29,080
And so more generally, we can
由此及之 我们可以

126
00:04:29,220 --> 00:04:30,210
go down this list and try
继续列举 试着

127
00:04:30,430 --> 00:04:31,520
to figure out what might
弄明白

128
00:04:31,700 --> 00:04:35,260
be reasonable features for these other movies as well.
其他电影的合理特征

129
00:04:39,160 --> 00:04:41,890
Let's formalize this problem of learning the features XI.
让我们将这一学习问题标准化到任意特征 x(i)

130
00:04:42,410 --> 00:04:44,220
Let's say that our
假设我们的用户

131
00:04:44,340 --> 00:04:45,860
users have given us their preferences.
告诉了我们的偏好

132
00:04:46,580 --> 00:04:47,950
So let's say that our users have
就是说用户们

133
00:04:48,130 --> 00:04:49,100
come and, you know, told us
已经给我们提供了

134
00:04:49,330 --> 00:04:50,800
these values for theta 1
θ(1) 到 θ(nu) 的值

135
00:04:50,890 --> 00:04:52,990
through theta of NU
θ(1) 到 θ(nu) 的值

136
00:04:53,280 --> 00:04:54,430
and we want to learn the
而我们想知道

137
00:04:54,790 --> 00:04:56,130
feature vector XI for movie
电影 i 的

138
00:04:56,540 --> 00:04:58,020
number I. What we can
特征向量 x(i) 我们能做的

139
00:04:58,200 --> 00:05:00,830
do is therefore pose the following optimization problem.
是列出以下的最优化的问题

140
00:05:01,220 --> 00:05:02,210
So we want to sum over
所以 我们想要把

141
00:05:02,840 --> 00:05:04,600
all the indices J for
所有指数 j 相加

142
00:05:04,930 --> 00:05:06,280
which we have a rating
得到对电影 i 的评分

143
00:05:06,950 --> 00:05:08,340
for movie I because
因为我们

144
00:05:08,750 --> 00:05:10,040
we're trying to learn the features
想要求得电影 i 的特征

145
00:05:10,950 --> 00:05:13,560
for movie I that is this feature vector XI.
也就是向量 x(i)

146
00:05:14,650 --> 00:05:15,660
So and then what we
所以现在我们

147
00:05:15,780 --> 00:05:18,450
want to do is minimize this squared
要做的是最小化这个平方误差

148
00:05:19,020 --> 00:05:20,160
error, so we want to choose
我们要选择

149
00:05:20,420 --> 00:05:22,430
features XI, so that,
特征 x(i)

150
00:05:22,900 --> 00:05:25,000
you know, the predictive value of
使得 我们预测的用户 j

151
00:05:25,200 --> 00:05:26,820
how user J rates movie
对该电影 i 评分的预测值评分值

152
00:05:27,110 --> 00:05:28,170
I will be similar,
跟我们从用户 j 处

153
00:05:28,900 --> 00:05:30,130
will be not too far in the
实际得到的评分值

154
00:05:30,440 --> 00:05:31,910
squared error sense of the actual
不会相差太远

155
00:05:32,530 --> 00:05:35,330
value YIJ that we actually observe in
也就是这个差值

156
00:05:35,530 --> 00:05:37,130
the rating of user j
不要太大

157
00:05:38,310 --> 00:05:40,790
on movie I. So, just to
所以 总结一下

158
00:05:41,040 --> 00:05:42,320
summarize what this term does
这一阶段要做的

159
00:05:42,840 --> 00:05:44,060
is it tries to choose features
就是为所有

160
00:05:45,040 --> 00:05:46,590
XI so that for
为电影评分的

161
00:05:46,960 --> 00:05:48,210
all the users J that
用户 j

162
00:05:48,360 --> 00:05:50,190
have rated that movie, the
选择特征 x(i)

163
00:05:50,860 --> 00:05:52,830
algorithm also predicts a
这一算法同样也预测出一个值

164
00:05:52,900 --> 00:05:55,490
value for how that user would have rated that movie
表示该用户将会如何评价某部电影

165
00:05:56,170 --> 00:05:57,720
that is not too far, in
而这个预测值

166
00:05:57,810 --> 00:05:59,730
the squared error sense, from the
在平方误差的形式中

167
00:06:00,000 --> 00:06:02,310
actual value that the user had rated that movie.
与用户对该电影评分的实际值尽量接近

168
00:06:03,380 --> 00:06:04,560
So that's the squared error term.
这就是那个平方误差项了

169
00:06:05,420 --> 00:06:07,200
As usual, we can
和之前一样

170
00:06:07,310 --> 00:06:08,430
also add this sort of
我们可以加上一个正则化项

171
00:06:08,520 --> 00:06:09,850
regularization term to prevent
来防止特征的数值

172
00:06:10,300 --> 00:06:11,870
the features from becoming too big.
变得过大

173
00:06:13,720 --> 00:06:15,610
So this is how we
这就是我们

174
00:06:15,760 --> 00:06:16,910
would learn the features
如何从一部特定的电影中

175
00:06:17,420 --> 00:06:19,140
for one specific movie but
学习到特征的方法

176
00:06:19,690 --> 00:06:20,480
what we want to do is
但我们要做的是

177
00:06:20,740 --> 00:06:22,060
learn all the features for all
学习出所有电影的

178
00:06:22,230 --> 00:06:23,820
the movies and so what
所有特征

179
00:06:24,080 --> 00:06:25,050
I'm going to do is add this
所以我现在要做的是

180
00:06:25,240 --> 00:06:26,620
extra summation here so
在此加上另外的一个求和

181
00:06:26,780 --> 00:06:28,840
I'm going to sum over all Nm
我要对所有的电影 nm 求和

182
00:06:29,260 --> 00:06:33,140
movies, N subscript m movies, and minimize
n 下标 m 个电影

183
00:06:33,830 --> 00:06:34,670
this objective on top
然后最小化整个这个目标函数

184
00:06:35,010 --> 00:06:37,080
that sums of all movies.
针对所有的电影

185
00:06:37,410 --> 00:06:39,930
And if you do that, you end up with the following optimization problem.
这样你就会得到如下的最优化的问题

186
00:06:40,950 --> 00:06:42,320
And if you minimize
如果你将这个最小化

187
00:06:42,890 --> 00:06:44,520
this, you have hopefully a
就应该能得到所有电影的

188
00:06:44,680 --> 00:06:47,440
reasonable set of features for all of your movies.
一系列合理的特征

189
00:06:48,650 --> 00:06:50,080
So putting everything together, what
好的 把我们

190
00:06:50,210 --> 00:06:51,050
we, the algorithm we talked
前一个视频讨论的算法

191
00:06:51,330 --> 00:06:52,730
about in the previous video and
以及我们刚刚

192
00:06:53,180 --> 00:06:54,810
the algorithm that we just talked about in this video.
在这个视频中讲过的算法合在一起

193
00:06:55,730 --> 00:06:57,070
In the previous video, what we
上一个视频中

194
00:06:57,180 --> 00:06:58,710
showed was that you know,
我们讲的是

195
00:06:58,820 --> 00:06:59,700
if you have a set of
如果你有一系列

196
00:06:59,790 --> 00:07:00,640
movie ratings, so if you
对电影的评分 那么如果你

197
00:07:00,640 --> 00:07:03,960
have the data the rij's and
有r(i,j) 和 y(i,j)

198
00:07:04,090 --> 00:07:06,100
then you have the yij's that will be the movie ratings.
也就是对电影的评分

199
00:07:08,500 --> 00:07:09,650
Then given features for your
于是 根据不同电影的特征

200
00:07:09,800 --> 00:07:11,800
different movies we can learn these parameters theta.
我们可以得到参数 θ

201
00:07:12,340 --> 00:07:13,110
So if you knew the features,
这样 如果你知道了特征

202
00:07:13,830 --> 00:07:15,000
you can learn the parameters
你就能学习出不同用户的

203
00:07:15,650 --> 00:07:16,850
theta for your different users.
参数 θ 值

204
00:07:18,250 --> 00:07:19,770
And what we showed earlier in
我们之前

205
00:07:19,930 --> 00:07:21,400
this video is that if
这个视频中讲的是

206
00:07:21,790 --> 00:07:22,860
your users are willing to
如果用户愿意

207
00:07:23,000 --> 00:07:25,450
give you parameters, then you
为你提供参数 那么你就

208
00:07:25,560 --> 00:07:28,060
can estimate features for the different movies.
可以为不同的电影估计特征

209
00:07:29,270 --> 00:07:31,490
So this is kind of a chicken and egg problem.
这有点像鸡和蛋的问题

210
00:07:31,770 --> 00:07:32,290
Which comes first?
到底先有鸡还是先有蛋？

211
00:07:32,900 --> 00:07:35,570
You know, do we want if we can get the thetas, we can know the Xs.
就是说 如果我们能知道 θ 就能学习到 x

212
00:07:36,060 --> 00:07:38,160
If we have the Xs, we can learn the thetas.
如果我们知道 x 也会学出 θ 来

213
00:07:39,500 --> 00:07:40,500
And what you can
而这样一来 你能做的

214
00:07:40,680 --> 00:07:41,790
do is, and then
就是

215
00:07:41,910 --> 00:07:43,000
this actually works, what you
如果这真的可行的话

216
00:07:43,110 --> 00:07:44,530
can do is in fact randomly
实际上你能做的就是

217
00:07:45,170 --> 00:07:47,160
guess some value of the thetas.
随机猜de θ 的值

218
00:07:48,210 --> 00:07:49,200
Now based on your initial random
基于你一开始随机

219
00:07:49,530 --> 00:07:50,630
guess for the thetas, you can
猜测出的 θ 的值

220
00:07:50,940 --> 00:07:52,530
then go ahead and use
继你可以继续下去

221
00:07:53,160 --> 00:07:54,210
the procedure that we just talked
运用我们刚刚讲到的

222
00:07:54,460 --> 00:07:55,810
about in order to
步骤 我们可以学习出

223
00:07:56,060 --> 00:07:57,740
learn features for your different movies.
不同电影的特征

224
00:07:58,800 --> 00:07:59,990
Now given some initial set
给出已有的一些电影的

225
00:08:00,130 --> 00:08:01,160
of features for your movies you
原始特征

226
00:08:01,240 --> 00:08:02,730
can then use this first
你可以运用

227
00:08:03,050 --> 00:08:04,060
method that we talked about
我们在上一个视频中讨论过的

228
00:08:04,130 --> 00:08:06,180
in the previous video to try to get
第一种方法 可以得到

229
00:08:06,360 --> 00:08:08,590
an even better estimate for your parameters theta.
对参数 θ 的更好估计

230
00:08:09,560 --> 00:08:12,420
Now that you have a better setting of the parameters theta for your users,
这样就会为用户提供更好的参数 θ 集

231
00:08:12,860 --> 00:08:13,850
we can use that to maybe
我们就可以用这些

232
00:08:14,070 --> 00:08:15,140
even get a better set of
得到更好的

233
00:08:15,240 --> 00:08:17,110
features and so on.
特征集或者其他数据

234
00:08:17,380 --> 00:08:18,400
We can sort of keep
然后我们可以继续

235
00:08:18,600 --> 00:08:19,440
iterating, going back and forth
迭代 不停重复

236
00:08:19,790 --> 00:08:21,270
and optimizing theta, x theta,
优化θ x θ

237
00:08:21,560 --> 00:08:24,000
x theta, nd this
x θ 这非常有效

238
00:08:24,270 --> 00:08:25,290
actually works and if you
如果你

239
00:08:25,410 --> 00:08:26,340
do this, this will actually
这样做的话

240
00:08:26,800 --> 00:08:28,360
cause your album to converge
你的算法将会收敛到

241
00:08:28,930 --> 00:08:30,430
to a reasonable set of
一组合理的电影的特征

242
00:08:31,340 --> 00:08:32,650
features for you movies and a
以及一组对合理的

243
00:08:32,790 --> 00:08:34,880
reasonable set of parameters for your different users.
对不同用户参数的估计

244
00:08:36,080 --> 00:08:38,870
So this is a basic collaborative filtering algorithm.
这就是基本的协同过滤算法

245
00:08:39,770 --> 00:08:40,850
This isn't actually the final
这实际并不是最后

246
00:08:41,020 --> 00:08:42,890
algorithm that we're going to use. In the next
我们将要使用的算法

247
00:08:43,120 --> 00:08:44,100
video we are going to be able to improve
下一个视频中

248
00:08:44,790 --> 00:08:45,610
on this algorithm and make
我们将改进这个算法

249
00:08:45,920 --> 00:08:47,430
it quite a bit more computationally efficient.
让其在计算时更为高效

250
00:08:48,390 --> 00:08:49,510
But, hopefully this gives you
但是这节课希望能让你

251
00:08:49,640 --> 00:08:50,600
a sense of how you
基本了解如何

252
00:08:50,680 --> 00:08:51,980
can formulate a
构建一个问题

253
00:08:52,040 --> 00:08:52,990
problem where you can simultaneously
在这个问题中

254
00:08:53,930 --> 00:08:57,200
learn the parameters and simultaneously learn the features from the different movies.
从不同的电影处学到参数以及特征

255
00:08:58,440 --> 00:08:59,660
And for this problem, for the
对于这个问题

256
00:08:59,740 --> 00:09:01,100
recommender system problem, this is
对于推荐系统

257
00:09:01,390 --> 00:09:02,950
possible only because each user
可能就根据每个用户

258
00:09:03,490 --> 00:09:04,840
rates multiple movies and hopefully
对多部电影的评分

259
00:09:05,100 --> 00:09:06,410
each movie is rated
以及每部电影由

260
00:09:06,790 --> 00:09:08,710
by multiple users. And so
由不同用户的评分

261
00:09:09,280 --> 00:09:10,150
you can do this back and
这样你就可以反复进行这样的过程

262
00:09:10,270 --> 00:09:11,150
forth process to estimate theta
来估计出 θ 和 x

263
00:09:11,200 --> 00:09:14,400
and x.  So to
总结一下

264
00:09:14,830 --> 00:09:15,910
summarize, in this video we've
在这个视频中

265
00:09:16,140 --> 00:09:18,710
seen an initial collaborative filtering algorithm.
我们了解了最基本的协同过滤算法

266
00:09:19,680 --> 00:09:21,550
The term collaborative filtering refers
协同过滤算法指的是

267
00:09:22,020 --> 00:09:23,620
to the observation that when
当你执行这个算法时

268
00:09:23,760 --> 00:09:25,020
you run this algorithm with a
你通过一大堆用户

269
00:09:25,210 --> 00:09:26,790
large set of users, what all
得到的数据

270
00:09:26,960 --> 00:09:28,410
of these users are effectively doing are sort of
这些用户实际上在高效地

271
00:09:29,070 --> 00:09:31,300
collaboratively--or collaborating to
进行了协同合作

272
00:09:31,490 --> 00:09:32,770
get better movie ratings for
来得到每个人

273
00:09:33,010 --> 00:09:34,610
everyone because with every
对电影的评分值

274
00:09:34,840 --> 00:09:36,540
user rating some subset with the movies,
只要用户对某几部电影进行评分

275
00:09:37,350 --> 00:09:39,040
every user is helping the
每个用户就都在帮助算法

276
00:09:39,300 --> 00:09:41,490
algorithm a little bit to learn better features,
更好的学习出特征

277
00:09:42,900 --> 00:09:44,390
and then by helping--
这样 通过自己

278
00:09:44,490 --> 00:09:46,690
by rating a few movies myself, I will be helping
对几部电影评分之后

279
00:09:47,810 --> 00:09:49,550
the system learn better features and
我就能帮助系统更好的学习到特征

280
00:09:49,680 --> 00:09:50,750
then these features can be used
这些特征可以

281
00:09:50,930 --> 00:09:52,610
by the system to make better
被系统运用 为其他人

282
00:09:52,890 --> 00:09:54,380
movie predictions for everyone else.
做出更准确的电影预测

283
00:09:54,640 --> 00:09:55,450
And so there is a sense of
协同的另一层意思

284
00:09:55,530 --> 00:09:56,980
collaboration where every user is
是说每位用户

285
00:09:57,370 --> 00:09:58,980
helping the system learn better features
都在为了大家的利益

286
00:09:59,360 --> 00:10:00,740
for the common good. This
学习出更好的特征

287
00:10:00,810 --> 00:10:03,450
is this collaborative filtering.
这就是协同过滤

288
00:10:04,070 --> 00:10:04,990
And, in the next video what we
在下一个视频中

289
00:10:05,140 --> 00:10:07,490
going to do is take the ideas that
我们要把这些想法

290
00:10:07,740 --> 00:10:08,850
have worked out, and try to
付诸实施

291
00:10:09,090 --> 00:10:09,910
develop a better an even
尝试开发一种

292
00:10:10,170 --> 00:10:11,920
better algorithm, a slightly better
更完美的算法

293
00:10:12,180 --> 00:10:13,640
technique for collaborative filtering.
为协同过滤算法做出一点改进