forked from fengdu78/Coursera-ML-AndrewNg-Notes
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path13 - 1 - Unsupervised Learninguction (3 min).srt
501 lines (401 loc) · 9.58 KB
/
13 - 1 - Unsupervised Learninguction (3 min).srt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
1
00:00:00,090 --> 00:00:02,320
In this video, I'd like to start to talk about clustering.
在这个视频中 我将开始介绍聚类算法
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:03,420 --> 00:00:04,850
This will be exciting because this
这将是一个激动人心的时刻
3
00:00:04,930 --> 00:00:06,910
is our first unsupervised learning algorithm
因为这是我们学习的第一个非监督学习算法
4
00:00:07,350 --> 00:00:09,080
where we learn from unlabeled data
我们将要让计算机学习无标签数据
5
00:00:09,840 --> 00:00:10,730
instead of the label data.
而不是此前的标签数据
6
00:00:11,900 --> 00:00:13,300
So, what is unsupervised learning?
那么 什么是非监督学习呢
7
00:00:14,390 --> 00:00:15,630
I briefly talked about unsupervised
在课程的一开始
8
00:00:16,350 --> 00:00:17,470
learning at the beginning of
我曾简单的介绍过非监督学习
9
00:00:17,550 --> 00:00:18,560
the class but it's useful
然而 我们还是有必要
10
00:00:19,030 --> 00:00:20,320
to contrast it with supervised learning. So, here's
将其与监督学习做一下比较
11
00:00:21,760 --> 00:00:23,750
a typical supervisory problem where
在一个典型的监督学习中
12
00:00:24,030 --> 00:00:25,470
we are given a label training sets
我们有一个有标签的训练集
13
00:00:25,770 --> 00:00:27,470
and the goal is to find
我们的目标是找到
14
00:00:27,980 --> 00:00:29,420
the decision boundary that separates the
能够区分正样本和负样本的
15
00:00:29,530 --> 00:00:31,310
positive label examples and the negative label examples.
决策边界
16
00:00:33,100 --> 00:00:34,400
The supervised learning problem in
在这里的监督学习中
17
00:00:34,460 --> 00:00:35,710
this case is given a
我们有一系列标签
18
00:00:35,850 --> 00:00:38,270
set of labels to fit a hypothesis to it.
我们需要据此拟合一个假设函数
19
00:00:39,160 --> 00:00:40,560
In contrast, in the unsupervised
与此不同的是
20
00:00:41,080 --> 00:00:42,420
learning problem, we're given
在非监督学习中
21
00:00:42,710 --> 00:00:43,740
data that does not have
我们的数据没有
22
00:00:43,890 --> 00:00:45,270
any labels associated with it.
附带任何标签
23
00:00:46,730 --> 00:00:47,940
So we're given data that looks
我们拿到的数据就是这样的
24
00:00:48,100 --> 00:00:49,090
like this, here's a set
在这里
25
00:00:49,180 --> 00:00:50,470
of points and then no labels.
我们有一系列点 却没有标签
26
00:00:51,800 --> 00:00:52,860
And so our training set is written
因此 我们的训练集可以写成
27
00:00:53,220 --> 00:00:54,720
just x1, x2 and
只有x(1) x(2)
28
00:00:55,210 --> 00:00:56,890
so on up to x(m)
一直到 x(m)
29
00:00:57,450 --> 00:00:58,720
and we don't get any labels y.
我们没有任何标签y
30
00:00:59,540 --> 00:01:00,800
And that's why the points plotted
因此 图上画的这些点
31
00:01:01,160 --> 00:01:02,300
up on the figure don't have
没有
32
00:01:02,430 --> 00:01:04,330
any labels on them.
标签信息
33
00:01:04,490 --> 00:01:05,510
So in unsupervised learning, what
也就是说 在非监督学习中
34
00:01:05,710 --> 00:01:06,860
we do is, we give this sort of
我们需要将一系列
35
00:01:07,280 --> 00:01:09,150
unlabeled training set to
无标签的训练数据
36
00:01:09,250 --> 00:01:10,510
an algorithm and we just
输入到一个算法中
37
00:01:10,600 --> 00:01:12,220
ask the algorithm: find some
然后我们告诉这个算法
38
00:01:12,430 --> 00:01:14,130
structure in the data for us.
快去为我们找找这个数据的内在结构
39
00:01:15,420 --> 00:01:16,490
Given this data set, one
给定数据
40
00:01:16,650 --> 00:01:17,810
type of structure we might
我们可能需要某种算法
41
00:01:18,010 --> 00:01:19,540
have an algorithm find, is that
帮助我们寻找一种结构
42
00:01:19,810 --> 00:01:21,440
it looks like this data set has
图上的数据看起来
43
00:01:21,620 --> 00:01:23,740
points grouped into two
可以分成两个
44
00:01:24,030 --> 00:01:25,500
separate clusters and so
分开的点集(称为簇)
45
00:01:25,800 --> 00:01:28,230
an algorithm that finds that
一个能够找到
46
00:01:28,360 --> 00:01:29,230
clusters like the ones I just
我圈出的这些点集的算法
47
00:01:29,450 --> 00:01:30,610
circled, is called a clustering
就被称为
48
00:01:32,440 --> 00:01:32,440
algorithm.
聚类算法
49
00:01:33,160 --> 00:01:34,620
And this will be our first type of
这将是我们介绍的
50
00:01:34,720 --> 00:01:36,590
unsupervised learning, although there
第一个非监督学习算法
51
00:01:36,790 --> 00:01:38,320
will be other types of unsupervised
当然 此后我们还将提到
52
00:01:39,020 --> 00:01:40,200
learning algorithms that we'll talk
其他类型的非监督学习算法
53
00:01:40,320 --> 00:01:41,710
about later that finds other
它们可以为我们
54
00:01:42,130 --> 00:01:43,710
types of structure or other
找到其他类型的结构
55
00:01:43,920 --> 00:01:46,000
types of patterns in the data other than clusters.
或者其他的一些模式 而不只是簇
56
00:01:46,900 --> 00:01:48,360
We'll talk about this afterwards, we will talk about clustering.
我们将先介绍聚类算法 此后 我们将陆续介绍其他算法
57
00:01:50,020 --> 00:01:51,210
So what is clustering good for?
好啦 那么聚类算法一般用来做什么呢?
58
00:01:51,380 --> 00:01:54,350
Early in this class I had already mentioned a few applications.
在这门课程的早些时候 我曾经列举过一些应用
59
00:01:54,950 --> 00:01:56,540
One is market segmentation, where
比如市场分割
60
00:01:56,670 --> 00:01:57,690
you may have a database of
也许你在数据库中
61
00:01:57,770 --> 00:01:58,840
customers and want to group
存储了许多客户的信息
62
00:01:59,070 --> 00:02:00,380
them into different market segments.
而你希望将他们分成不同的客户群
63
00:02:00,950 --> 00:02:02,590
So, you can sell to
这样你可以对不同类型的客户分别销售产品
64
00:02:02,720 --> 00:02:05,570
them separately or serve your different market segments better.
或者分别提供更适合的服务
65
00:02:06,730 --> 00:02:08,370
Social network analysis, there are
社交网络分析
66
00:02:08,580 --> 00:02:10,090
actually, you know, groups that have done
事实上 有许多研究人员
67
00:02:10,320 --> 00:02:12,590
this, things like looking at a
正在研究这样一些内容
68
00:02:12,730 --> 00:02:14,540
group of people, social networks,
他们关注一群人 关注社交网络
69
00:02:15,070 --> 00:02:16,390
so things like Facebook, Google plus
例如 Facebook Google+
70
00:02:16,710 --> 00:02:18,260
or maybe information about who
或者是其他的一些信息
71
00:02:18,430 --> 00:02:19,710
are the people that you
比如说
72
00:02:20,030 --> 00:02:21,110
email the most frequently and who
你经常跟哪些人联系
73
00:02:21,230 --> 00:02:22,170
are the people that they email
而这些人
74
00:02:22,310 --> 00:02:23,600
the most frequently, and
又经常给哪些人发邮件
75
00:02:23,750 --> 00:02:25,400
to find coherent groups of people.
由此找到关系密切的人群
76
00:02:26,500 --> 00:02:27,600
So, this would be another maybe
因此 这可能需要另一个
77
00:02:28,180 --> 00:02:28,850
clustering algorithm where, you know, you'd want
聚类算法
78
00:02:29,080 --> 00:02:32,200
to find who other coherent groups of friends in a social network.
你希望用它发现社交网络中关系密切的朋友
79
00:02:33,140 --> 00:02:33,990
Here's something that one of my
我有一个朋友
80
00:02:34,140 --> 00:02:35,170
friends actually worked on, which is,
正在研究这个问题
81
00:02:35,920 --> 00:02:37,200
use clustering to organize compute
他希望使用聚类算法
82
00:02:37,670 --> 00:02:39,220
clusters or to organize data
来更好的组织计算机集群
83
00:02:39,440 --> 00:02:40,600
centers better because, if you
或者更好的管理数据中心
84
00:02:40,800 --> 00:02:42,450
know which computers in the
因为如果你知道数据中心中
85
00:02:42,520 --> 00:02:44,990
data center are in the cluster there tend to work together.
那些计算机经常协作工作
86
00:02:45,400 --> 00:02:46,270
You can use that to reorganize
那么 你可以
87
00:02:46,950 --> 00:02:48,390
your resources and how you
重新分配资源
88
00:02:48,570 --> 00:02:50,120
lay out its network and
重新布局网络
89
00:02:50,260 --> 00:02:52,040
how design your data center and communications.
由此优化数据中心 优化数据通信
90
00:02:53,140 --> 00:02:54,540
And lastly something that, actually
最后 我实际上
91
00:02:54,850 --> 00:02:55,910
another thing I worked on, using
还在研究
92
00:02:56,130 --> 00:02:57,810
clustering algorithms to understand
如何利用聚类算法
93
00:02:58,400 --> 00:03:00,030
galaxy formation and using
了解星系的形成
94
00:03:00,280 --> 00:03:02,260
that to understand how, to
然后用这个知识
95
00:03:02,600 --> 00:03:03,860
understand astronomical detail.
了解一些天文学上的细节问题
96
00:03:06,550 --> 00:03:08,580
So, that's clustering which
好的 这就是聚类算法
97
00:03:08,890 --> 00:03:10,450
is our first example of
这将是我们介绍的第一个
98
00:03:10,530 --> 00:03:12,650
an unsupervised learning algorithm.
非监督学习算法
99
00:03:13,090 --> 00:03:14,200
In the next video, we'll start to
在下一个视频中 我们将开始介绍
100
00:03:14,370 --> 00:03:16,250
talk about a specific clustering algorithm.
一个具体的聚类算法