forked from fengdu78/Coursera-ML-AndrewNg-Notes
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path13 - 5 - Choosing the Number of Clusters (8 min).srt
1306 lines (1045 loc) · 24.2 KB
/
13 - 5 - Choosing the Number of Clusters (8 min).srt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1
00:00:00,200 --> 00:00:01,390
In this video I'd like to
在这个视频中
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:01,570 --> 00:00:02,780
talk about one last detail
详细讨论一下
3
00:00:03,350 --> 00:00:04,950
of K-means clustering which is
K均值方法聚类中
4
00:00:05,450 --> 00:00:06,680
how to choose the number of
如何去选择聚类
5
00:00:06,770 --> 00:00:07,890
clusters, or how to choose
类别数目 或者说是如何去选择
6
00:00:08,290 --> 00:00:09,160
the value of the parameter
参数K的值
7
00:00:10,230 --> 00:00:12,310
capsule K. To be
说实话
8
00:00:12,390 --> 00:00:13,690
honest, there actually isn't a
其实上没有一个
9
00:00:13,760 --> 00:00:15,420
great way of answering this
非常好的方法回答这个问题
10
00:00:15,680 --> 00:00:17,150
or doing this automatically and
或者能够自动做这件事情
11
00:00:17,820 --> 00:00:18,930
by far the most common way
到目前为止 用来决定聚类
12
00:00:19,110 --> 00:00:20,380
of choosing the number of clusters,
数目最常用的方法
13
00:00:20,520 --> 00:00:22,040
is still choosing it manually
任然是通过
14
00:00:22,710 --> 00:00:24,380
by looking at visualizations or by
看可视化的图或者通过
15
00:00:24,450 --> 00:00:26,070
looking at the output of the clustering algorithm or something else.
查看聚类算法的输出结果或者其他一些东西来手动地决定聚类的类别数量
16
00:00:27,340 --> 00:00:28,270
But I do get asked
但是 我也确实经常被问及
17
00:00:28,600 --> 00:00:29,460
this question quite a lot of
这样的问题
18
00:00:29,650 --> 00:00:30,510
how do you choose the number of
你是如何来选择聚类的数量的
19
00:00:30,810 --> 00:00:31,930
clusters, and so I just want
我只是想
20
00:00:32,240 --> 00:00:33,650
to tell you know what
告诉你
21
00:00:33,850 --> 00:00:35,020
are peoples' current thinking on
现在人们所思考的
22
00:00:35,230 --> 00:00:36,480
it although, the most
最为常见的
23
00:00:36,740 --> 00:00:38,060
common thing is actually to
一件事实际上是
24
00:00:38,180 --> 00:00:40,130
choose the number of clusters by hand.
手动去选择聚类的数目
25
00:00:42,230 --> 00:00:43,680
A large part of
其中一大部分
26
00:00:43,800 --> 00:00:45,020
why it might not always
为什么选择聚类
27
00:00:45,390 --> 00:00:46,530
be easy to choose the
数量并不容易
28
00:00:46,640 --> 00:00:47,940
number of clusters is that
的原因是
29
00:00:48,190 --> 00:00:51,920
it is often generally ambiguous how many clusters there are in the data.
通常在数据集中有多少个聚类是不清楚的
30
00:00:52,940 --> 00:00:53,890
Looking at this data set
看这样一个数据集
31
00:00:54,080 --> 00:00:55,110
some of you may see
有些人可能会看到
32
00:00:55,380 --> 00:00:56,830
four clusters and that
四个聚类
33
00:00:57,020 --> 00:00:59,440
would suggest using K equals 4.
那么这就意味着需要使用K=4
34
00:00:59,620 --> 00:01:00,650
Or some of you may
或者有些人可能
35
00:01:00,870 --> 00:01:02,620
see two clusters and
会看到两个聚类
36
00:01:02,730 --> 00:01:04,460
that will suggest K equals
这个条件下就意味着K等于
37
00:01:04,870 --> 00:01:06,630
2 and now this may see three clusters.
2 现在这有可能是3个聚类
38
00:01:08,070 --> 00:01:09,710
And so, looking at the
那么
39
00:01:09,820 --> 00:01:10,750
data set like this, the
相类似于这样的数据集
40
00:01:10,920 --> 00:01:12,390
true number of clusters, it actually
其真实的类别数对我来说实际上
41
00:01:12,810 --> 00:01:14,560
seems genuinely ambiguous to me,
相当地模棱两可的
42
00:01:14,690 --> 00:01:17,160
and I don't think there is one right answer.
我并不认为只有一个正确的答案
43
00:01:18,100 --> 00:01:19,500
And this is part of our supervised learning.
这就是我们监督学习的一部分
44
00:01:20,250 --> 00:01:21,450
We are aren't given labels, and
我们不知道分类
45
00:01:21,550 --> 00:01:23,950
so there isn't always a clear cut answer.
因此总是没有一个清晰的答案
46
00:01:24,830 --> 00:01:25,730
And this is one of the
这是其中的一个
47
00:01:25,850 --> 00:01:26,710
things that makes it more difficult
使得(决定聚类数目)
48
00:01:27,340 --> 00:01:28,530
to say, have an automatic
拥有一个自动化
49
00:01:29,160 --> 00:01:30,860
algorithm for choosing how many clusters to have.
算法来决定聚类数目变得困难的原因
50
00:01:32,100 --> 00:01:33,250
When people talk about ways of
当人们在讨论
51
00:01:33,320 --> 00:01:34,270
choosing the number of clusters,
选择聚类数目的方法时
52
00:01:34,840 --> 00:01:36,050
one method that people sometimes
有一个可能会
53
00:01:36,440 --> 00:01:39,150
talk about is something called the Elbow Method.
谈及的方法叫作“肘部法则”
54
00:01:39,630 --> 00:01:40,490
Let me just tell you a little bit about that,
让我告诉你一些关于这个法则的内容
55
00:01:40,800 --> 00:01:43,760
and then mention some of its advantages but also shortcomings.
之后会提及到它的一些优点和缺点
56
00:01:44,690 --> 00:01:45,980
So the Elbow Method,
关于“肘部法则”
57
00:01:46,420 --> 00:01:47,570
what we're going to do is vary
我们所需要做的是改变
58
00:01:48,340 --> 00:01:49,860
K, which is the total number of clusters.
K值 也就是聚类类别数目的总数
59
00:01:50,250 --> 00:01:51,570
So, we're going to run K-means
我们用一个聚类来运行K均值聚类方法
60
00:01:52,050 --> 00:01:53,340
with one cluster, that means really,
这就意味着
61
00:01:53,630 --> 00:01:54,840
everything gets grouped into a
所有的数据都会分到一个
62
00:01:54,980 --> 00:01:56,530
single cluster and compute the
聚类里 然后计算
63
00:01:56,660 --> 00:01:57,850
cost function or compute the distortion
成本函数或者计算畸变函数
64
00:01:58,460 --> 00:01:59,490
J and plot that here.
J 并将其画在这儿
65
00:02:00,410 --> 00:02:01,090
And then we're going to run K
之后我们又用两个聚类来运行K
66
00:02:01,310 --> 00:02:03,270
means with two clusters, maybe
均值聚类 也许
67
00:02:03,610 --> 00:02:05,430
with multiple random initial agents, maybe not.
有多个随机的初始中心 也许没有
68
00:02:06,140 --> 00:02:07,150
But then, you know,
但是之后 你知道的
69
00:02:07,280 --> 00:02:08,280
with two clusters we should
有两个聚类 我们
70
00:02:08,500 --> 00:02:09,510
get, hopefully, a smaller distortion,
所期望得到的是一个较小的畸变值
71
00:02:10,710 --> 00:02:11,820
and so plot that there.
把它画在这儿
72
00:02:11,950 --> 00:02:13,100
And then run K-means with three
之后用选用3个聚类来运行K均值聚类
73
00:02:13,310 --> 00:02:14,590
clusters, hopefully, you get even
你期望得到一个更
74
00:02:14,760 --> 00:02:16,680
smaller distortion and plot that there.
小的畸变值 并把它画在这儿
75
00:02:16,990 --> 00:02:19,710
I'm gonna run K-means with four, five and so on.
之后再用4,5等聚类数来运行均值聚类
76
00:02:19,780 --> 00:02:20,790
And so we end up with
最后我们就能得到
77
00:02:20,970 --> 00:02:22,840
a curve showing how the
一条曲线显示
78
00:02:23,240 --> 00:02:24,560
distortion, you know, goes
随着我们的聚类数量的增多
79
00:02:24,800 --> 00:02:27,170
down as we increase the number of clusters.
畸变值是如何下降的
80
00:02:27,440 --> 00:02:29,870
And so we get a curve that maybe looks like this.
我们可能会得到一条类似于这样的曲线
81
00:02:31,390 --> 00:02:32,210
And if you look at this
看看这条
82
00:02:32,300 --> 00:02:33,400
curve, what the Elbow Method does
曲线 这就是“肘部法则”所做的
83
00:02:33,720 --> 00:02:35,770
it says "Well, let's look at this plot.
让我们来看这样一个图
84
00:02:36,450 --> 00:02:39,340
Looks like there's a clear elbow there".
看起来就好像有一个很清楚的肘在那儿
85
00:02:40,230 --> 00:02:41,620
Right, this is, would be by
对吧 这就
86
00:02:41,830 --> 00:02:43,210
analogy to the human arm where,
类比于人的手臂
87
00:02:43,550 --> 00:02:44,620
you know, if you imagine that
想象一下
88
00:02:45,370 --> 00:02:46,460
you reach out your arm,
如果你伸出你的胳膊
89
00:02:47,240 --> 00:02:48,940
then, this is your
那么这就是你的
90
00:02:49,160 --> 00:02:50,340
shoulder joint, this is your
肩关节 这就是你的
91
00:02:50,550 --> 00:02:52,960
elbow joint and I guess, your hand is at the end over here.
肘关节 我猜你的手就在这里的终端
92
00:02:53,260 --> 00:02:54,170
And so this is the Elbow Method.
这就是“肘部法则”
93
00:02:54,490 --> 00:02:55,930
Then you find this sort of pattern
你会发现这种模式
94
00:02:56,250 --> 00:02:57,630
where the distortion goes down rapidly
它的畸变值会迅速下降
95
00:02:58,550 --> 00:02:59,120
from 1 to 2, and 2 to
从1到2 从2到
96
00:02:59,280 --> 00:03:01,330
3, and then you reach an
3 之后你会
97
00:03:01,520 --> 00:03:03,160
elbow at 3, and then
在3的时候达到一个肘点 在此之后
98
00:03:03,330 --> 00:03:05,260
the distortion goes down very slowly after that.
畸变值就下降的非常慢
99
00:03:05,430 --> 00:03:06,520
And then it looks like, you
看起来就像
100
00:03:06,580 --> 00:03:08,700
know what, maybe using three
使用3个聚类来
101
00:03:08,960 --> 00:03:09,920
clusters is the right
进行聚类是正确的
102
00:03:10,040 --> 00:03:11,340
number of clusters, because that's
这是因为那个点
103
00:03:12,020 --> 00:03:14,430
the elbow of this curve, right?
是曲线的肘点
104
00:03:14,700 --> 00:03:16,040
That it goes down, distortion goes
畸变值下降
105
00:03:16,250 --> 00:03:17,290
down rapidly until K equals
得很快知道K等于
106
00:03:17,610 --> 00:03:19,700
3, really goes down very slowly after that.
3 之后就下降得很慢
107
00:03:19,820 --> 00:03:20,850
So let's pick K equals 3.
那么我们就选K等于3
108
00:03:23,460 --> 00:03:24,570
If you apply the Elbow Method,
当你应用“肘部法则” 的时候
109
00:03:25,110 --> 00:03:26,240
and if you get a plot
如果你得到了一个
110
00:03:26,540 --> 00:03:27,450
that actually looks like this,
像这样的图
111
00:03:27,890 --> 00:03:29,120
then, that's pretty good, and
那么这非常好
112
00:03:29,240 --> 00:03:30,160
this would be a reasonable way
这将是一种用来
113
00:03:30,700 --> 00:03:32,590
of choosing the number of clusters.
选择聚类个数的合理方法
114
00:03:33,620 --> 00:03:34,600
It turns out the Elbow Method
而事实证明“肘部法则”
115
00:03:35,040 --> 00:03:37,170
isn't used that often, and one
并不那么常用 其中一个
116
00:03:37,340 --> 00:03:38,270
reason is that, if you
原因是如果你
117
00:03:38,350 --> 00:03:39,470
actually use this on
把这种方法用到
118
00:03:39,720 --> 00:03:41,060
a clustering problem, it turns out that
一个聚类问题上 事实证明
119
00:03:41,210 --> 00:03:42,640
fairly often, you know,
这种想象是相当常见的
120
00:03:42,740 --> 00:03:43,610
you end up with a curve
你最后得到了一条
121
00:03:43,910 --> 00:03:46,940
that looks much more ambiguous, maybe something like this.
看上去相当模棱两可的曲线 也许就像这样
122
00:03:47,700 --> 00:03:48,370
And if you look at this,
请看这个
123
00:03:48,920 --> 00:03:50,220
I don't know, maybe there's
我不知道 也许没有
124
00:03:50,390 --> 00:03:51,580
no clear elbow, but it
一个清晰的肘点 但是
125
00:03:51,720 --> 00:03:53,090
looks like distortion continuously goes down,
看上去畸变值是连续下降的
126
00:03:53,440 --> 00:03:54,570
maybe 3 is a
也许3是比较
127
00:03:54,620 --> 00:03:55,680
good number, maybe 4 is
好的一个数字 也许4是
128
00:03:55,750 --> 00:03:58,180
a good number, maybe 5 is also not bad.
一个比较好的数字 也许5也并不糟糕
129
00:03:58,390 --> 00:03:59,190
And so, if you actually
那么如果你在
130
00:03:59,600 --> 00:04:00,710
do this in a practice, you know,
实际操作中做这样一个事情的话
131
00:04:00,820 --> 00:04:02,690
if your plot looks like the one on the left and that's great.
如果你的图像左边这个的话 那么就太好了
132
00:04:03,400 --> 00:04:04,990
It gives you a clear answer, but
它会给你一个清晰的答案 但是
133
00:04:05,490 --> 00:04:06,550
just as often, you end
通常 你最终
134
00:04:06,740 --> 00:04:07,580
up with a plot that looks
得到的图是像
135
00:04:07,750 --> 00:04:09,020
like the one on the right and
右边的那个
136
00:04:09,110 --> 00:04:11,000
is not clear where the
并不能清晰指定
137
00:04:11,790 --> 00:04:13,230
ready location of the elbow
肘点合适的位置
138
00:04:13,490 --> 00:04:14,440
is. It makes it harder to
这使得
139
00:04:14,640 --> 00:04:16,700
choose a number of clusters using this method.
用这个方法来选择聚类数目变得较为困难
140
00:04:17,370 --> 00:04:18,220
So maybe the quick summary
对于“肘部法则”快速的小结
141
00:04:18,700 --> 00:04:20,500
of the Elbow Method is that is worth the shot
就是它使一个值得尝试的方法
142
00:04:21,010 --> 00:04:22,350
but I wouldn't necessarily,
但是我不会必然地
143
00:04:23,610 --> 00:04:24,700
you know, have a very high
对它有很高的
144
00:04:24,870 --> 00:04:27,360
expectation of it working for any particular problem.
期望来解决任何一个特定的问题
145
00:04:29,880 --> 00:04:31,030
Finally, here's one other way
最后 有另外一种方法
146
00:04:31,300 --> 00:04:32,850
of how, thinking about how
引导你如何
147
00:04:32,990 --> 00:04:33,980
you choose the value of K,
选择K值
148
00:04:34,930 --> 00:04:36,030
very often people are running
通常人们运行
149
00:04:36,310 --> 00:04:37,380
K-means in order you
K均值聚类方法是为了
150
00:04:37,530 --> 00:04:38,770
get clusters for some later
得到一些聚类用于后面的
151
00:04:39,240 --> 00:04:40,880
purpose, or for some sort of downstream purpose.
一些用途 或者是一些下游的目的
152
00:04:41,460 --> 00:04:42,900
Maybe you want to use K-means
也许你会用K均值聚类方法
153
00:04:43,380 --> 00:04:44,460
in order to do market segmentation,
来做市场分割
154
00:04:45,310 --> 00:04:47,600
like in the T-shirt sizing example that we talked about.
如我们之前谈论的T恤尺寸的例子
155
00:04:48,140 --> 00:04:50,570
Maybe you want K-means to organize
也许你会用K均值聚类来使得
156
00:04:51,130 --> 00:04:52,300
a computer cluster better, or
电脑的聚类变得更好 或者
157
00:04:52,480 --> 00:04:53,430
maybe a learning cluster for some
也有可能是用于某种不同目的一个
158
00:04:53,630 --> 00:04:55,070
different purpose, and so,
学习聚类 等等
159
00:04:55,450 --> 00:04:57,020
if that later, downstream purpose,
如果是后续下游的目的
160
00:04:57,510 --> 00:04:59,050
such as market segmentation. If
如市场分割 如果
161
00:04:59,180 --> 00:05:00,420
that gives you an evaluation metric,
那能给你一个评估标准
162
00:05:01,310 --> 00:05:02,670
then often, a better
那么通常 更好
163
00:05:02,800 --> 00:05:03,890
way to determine the number of
的方式是决定聚类的数量
164
00:05:03,960 --> 00:05:05,680
clusters, is to see
来看
165
00:05:06,010 --> 00:05:07,740
how well different numbers of
不同的聚类数值能
166
00:05:07,930 --> 00:05:10,140
clusters serve that later downstream purpose.
为后续下游的目的提供多好的结果
167
00:05:11,230 --> 00:05:13,050
Let me step through a specific example.
让我们来涉足一个特别的例子
168
00:05:14,190 --> 00:05:15,080
Let me go through the T-shirt
让我重新举例T恤尺寸
169
00:05:15,440 --> 00:05:17,420
size example again, and I'm
的例子 我
170
00:05:17,570 --> 00:05:19,700
trying to decide, do I want three T-shirt sizes?
尝试决定我是否需要3种T恤尺寸
171
00:05:20,330 --> 00:05:22,320
So, I choose K equals 3, then
因此我选择K等于3 那么
172
00:05:22,560 --> 00:05:25,360
I might have small, medium and large T-shirts.
可能会有小号 中号 大号三类T恤
173
00:05:26,320 --> 00:05:27,250
Or maybe, I want to choose
或者我可以选择
174
00:05:27,470 --> 00:05:28,240
K equals 5, and then I
K等于5 那么我就
175
00:05:29,030 --> 00:05:30,140
might have, you know, extra
可能会有 特小
176
00:05:30,390 --> 00:05:33,130
small, small, medium, large
号 小号 中号 大号
177
00:05:33,620 --> 00:05:35,070
and extra large T-shirt sizes.
和特大号尺寸的T恤
178
00:05:35,860 --> 00:05:38,580
So, you can have like 3 T-shirt sizes or four or five T-shirt sizes.
所以 你可能有3种T恤尺寸或者4种或者5种
179
00:05:39,270 --> 00:05:40,100
We could also have four T-shirt
我们也可以有四种T恤
180
00:05:40,430 --> 00:05:41,740
sizes, but I'm just
尺寸,但是我只是
181
00:05:41,930 --> 00:05:43,240
showing three and five here,
在这里展示了3和5这两种情况
182
00:05:43,490 --> 00:05:45,670
just to simplify this slide for now.
只是为了是这张幻灯片变得简洁一些
183
00:05:46,930 --> 00:05:49,020
So, if I run K-means with
因此如果我用K等于
184
00:05:49,130 --> 00:05:50,290
K equals 3, maybe I end
3来运行K均值方法 最后我可能会得到
185
00:05:50,670 --> 00:05:51,860
up with, that's my small
这是小号
186
00:05:53,100 --> 00:05:55,020
and that's my
这是
187
00:05:55,140 --> 00:05:56,720
medium and that's my large.
中号 这是大号
188
00:05:58,580 --> 00:06:00,370
Whereas, if I run K-means with
然而 如果我用
189
00:06:00,650 --> 00:06:03,540
5 clusters, maybe I
5个聚类来运行K均值方法 也许我
190
00:06:03,700 --> 00:06:05,170
end up with, those are
最后会得到 这些是
191
00:06:05,330 --> 00:06:07,400
my extra small T-shirts, these
超小号T恤 这些
192
00:06:07,740 --> 00:06:10,920
are my small, these are
是小号 这些是
193
00:06:11,050 --> 00:06:13,740
my medium, these are my
中号 这些是
194
00:06:13,990 --> 00:06:17,110
large and these are my extra large.
大号 这些是超大号
195
00:06:19,320 --> 00:06:20,150
And the nice thing about this
这个例子的一个亮点
196
00:06:20,320 --> 00:06:21,510
example is that, this then
是这之后
197
00:06:21,810 --> 00:06:22,940
maybe gives us another way
可能会给我们一种方法
198
00:06:23,550 --> 00:06:24,730
to choose whether we want
来选择究竟我们想要的
199
00:06:24,970 --> 00:06:26,070
3 or 4 or 5 clusters,
聚类数目是3或4 还是5
200
00:06:28,570 --> 00:06:29,630
and in particular, what you can
特别的 你所能