forked from fengdu78/Coursera-ML-AndrewNg-Notes
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path15 - 1 - Problem Motivation (8 min).srt
1166 lines (933 loc) · 21.3 KB
/
15 - 1 - Problem Motivation (8 min).srt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1
00:00:00,170 --> 00:00:01,190
In this next set of videos,
在接下来的一系列视频中
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:01,720 --> 00:00:02,680
I'd like to tell you about
我将向大家介绍
3
00:00:03,050 --> 00:00:04,560
a problem called Anomaly Detection.
异常检测(Anomaly detection)问题
4
00:00:05,710 --> 00:00:07,220
This is a reasonably commonly
这是机器学习算法
5
00:00:07,870 --> 00:00:08,740
use you type machine learning.
的一个常见应用
6
00:00:09,580 --> 00:00:10,990
And one of the interesting aspects
这种算法的一个有趣之处在于
7
00:00:11,580 --> 00:00:13,250
is that it's mainly for
它虽然主要用于
8
00:00:14,020 --> 00:00:15,860
unsupervised problem, that there's some
非监督学习问题
9
00:00:16,320 --> 00:00:17,240
aspects of it that are
但从某些角度看
10
00:00:17,510 --> 00:00:20,000
also very similar to sort of the supervised learning problem.
它又类似于一些监督学习问题
11
00:00:21,160 --> 00:00:22,440
So, what is anomaly detection?
那么 什么是异常检测呢?
12
00:00:23,380 --> 00:00:25,000
To explain it. Let me use
为了解释这个概念
13
00:00:25,240 --> 00:00:27,780
the motivating example of: Imagine
让我举一个例子吧
14
00:00:28,440 --> 00:00:30,040
that you're a manufacturer of
假想你是一个
15
00:00:30,330 --> 00:00:32,370
aircraft engines, and let's
飞机引擎制造商
16
00:00:32,600 --> 00:00:33,850
say that as your aircraft
当你生产的飞机引擎
17
00:00:34,280 --> 00:00:35,330
engines roll off the assembly
从生产线上流出时
18
00:00:35,620 --> 00:00:37,580
line, you're doing, you know, QA or
你需要进行
19
00:00:37,820 --> 00:00:39,850
quality assurance testing, and as
QA (质量控制测试)
20
00:00:40,030 --> 00:00:41,340
part of that testing you
而作为这个测试的一部分
21
00:00:41,410 --> 00:00:43,140
measure features of your
你测量了飞机引擎的一些特征变量
22
00:00:43,510 --> 00:00:44,900
aircraft engine, like maybe, you measure
比如 你可能测量了
23
00:00:45,180 --> 00:00:46,820
the heat generated, things like
引擎运转时产生的热量
24
00:00:46,860 --> 00:00:48,340
the vibrations and so on.
或者引擎的振动等等
25
00:00:48,630 --> 00:00:49,570
I share some friends that worked
我有一些朋友
26
00:00:49,860 --> 00:00:50,940
on this problem a long time
很早之前就开始进行这类工作
27
00:00:51,010 --> 00:00:52,610
ago, and these were actually the
在实际工作中
28
00:00:52,710 --> 00:00:53,960
sorts of features that they were
他们确实会从真实的飞机引擎
29
00:00:54,470 --> 00:00:55,910
collecting off actual aircraft
采集这些特征变量
30
00:00:56,280 --> 00:00:58,540
engines so you
这样一来
31
00:00:58,630 --> 00:00:59,570
now have a data set of
你就有了一个数据集
32
00:00:59,700 --> 00:01:01,000
X1 through Xm, if you have
从x(1)到x(m)
33
00:01:01,760 --> 00:01:04,490
manufactured m aircraft engines,
如果你生产了m个引擎的话
34
00:01:05,030 --> 00:01:06,740
and if you plot your data, maybe it looks like this.
也许你会将这些数据绘制成图表 看起来就是这个样子
35
00:01:07,130 --> 00:01:08,640
So, each point here, each cross
这里的每个点 每个叉
36
00:01:08,770 --> 00:01:10,580
here as one of your unlabeled examples.
都是你的无标签数据
37
00:01:11,990 --> 00:01:15,220
So, the anomaly detection problem is the following.
这样 异常检测问题可以定义如下
38
00:01:16,450 --> 00:01:17,770
Let's say that on, you
我们假设
39
00:01:17,880 --> 00:01:18,970
know, the next day, you
后来有一天
40
00:01:19,140 --> 00:01:20,390
have a new aircraft engine
你有一个新的飞机引擎
41
00:01:20,810 --> 00:01:21,860
that rolls off the assembly line
从生产线上流出
42
00:01:22,320 --> 00:01:23,890
and your new aircraft engine has
而你的新飞机引擎
43
00:01:24,160 --> 00:01:25,440
some set of features x-test.
有特征变量x-test
44
00:01:26,290 --> 00:01:27,680
What the anomaly detection problem is,
所谓的异常检测问题就是
45
00:01:27,930 --> 00:01:29,070
we want to know if this
我们希望知道
46
00:01:29,420 --> 00:01:31,310
aircraft engine is anomalous in
这个新的飞机引擎是否有某种异常
47
00:01:31,520 --> 00:01:32,480
any way, in other words, we want
或者说
48
00:01:32,740 --> 00:01:34,110
to know if, maybe, this engine
我们希望判断
49
00:01:34,570 --> 00:01:36,290
should undergo further testing
这个引擎是否需要进一步测试
50
00:01:37,330 --> 00:01:38,370
because, or if it looks
因为 如果它看起来
51
00:01:38,710 --> 00:01:40,560
like an okay engine, and
像一个正常的引擎
52
00:01:40,740 --> 00:01:41,700
so it's okay to just ship
那么我们可以直接将它运送到客户那里
53
00:01:41,880 --> 00:01:43,260
it to a customer without further testing.
而不需要进一步的测试
54
00:01:44,560 --> 00:01:45,670
So, if your new
比如说
55
00:01:45,840 --> 00:01:47,330
aircraft engine looks like
如果你的新引擎
56
00:01:47,540 --> 00:01:49,150
a point over there, well, you
对应的点落在这里
57
00:01:49,260 --> 00:01:50,200
know, that looks a lot
那么 你可以认为
58
00:01:50,360 --> 00:01:51,440
like the aircraft engines we've seen
它看起来像我们之前见过的引擎
59
00:01:51,650 --> 00:01:53,860
before, and so maybe we'll say that it looks okay.
因此我们可以直接认为它是正常的
60
00:01:54,750 --> 00:01:55,740
Whereas, if your new aircraft
然而 如果你的新飞机引擎
61
00:01:56,200 --> 00:01:59,390
engine, if x-test, you know, were
如果x-test
62
00:01:59,620 --> 00:02:00,430
a point that were out here,
对应的点在这外面
63
00:02:00,910 --> 00:02:02,270
so that if X1 and
这里x1和
64
00:02:02,410 --> 00:02:04,800
X2 are the features of this new example.
x2是这个新引擎对应的特征变量
65
00:02:05,360 --> 00:02:06,530
If x-tests were all the
如果x-test在外面这么远的地方
66
00:02:06,590 --> 00:02:08,930
way out there, then we would call that an anomaly.
那么我们可以认为这是一个异常
67
00:02:10,420 --> 00:02:11,640
and maybe send that aircraft engine
也许我们需要在向客户发货之前
68
00:02:12,070 --> 00:02:13,720
for further testing before we
进一步检测
69
00:02:13,870 --> 00:02:15,130
ship it to a customer, since
这个引擎
70
00:02:16,010 --> 00:02:18,340
it looks very different than
因为它和我们之前见过的
71
00:02:18,600 --> 00:02:20,350
the rest of the aircraft engines we've seen before.
其他飞机引擎看起来不一样
72
00:02:21,000 --> 00:02:22,560
More formally in the anomaly
如果更正式的定义
73
00:02:22,960 --> 00:02:24,230
detection problem, we're give
异常检测问题
74
00:02:24,900 --> 00:02:26,160
some data sets, x1 through
那么我们有一些数据
75
00:02:26,280 --> 00:02:28,340
Xm of examples, and we
从x(1)到x(m)
76
00:02:28,460 --> 00:02:29,720
usually assume that these end
我们通常假定这m个样本
77
00:02:29,880 --> 00:02:32,250
examples are normal or
都是正常的
78
00:02:33,120 --> 00:02:34,910
non-anomalous examples, and we
或者说都不是异常的
79
00:02:34,980 --> 00:02:36,100
want an algorithm to tell us
然后我们需要一个算法来告诉我们
80
00:02:36,290 --> 00:02:38,300
if some new example x-test is anomalous.
一个新的样本数据x-test是否是异常
81
00:02:38,850 --> 00:02:40,080
The approach that we're going
我们要采取的方法是
82
00:02:40,130 --> 00:02:41,670
to take is that given this training
给定训练集
83
00:02:42,060 --> 00:02:43,300
set, given the unlabeled training
给定无标签的训练集
84
00:02:43,690 --> 00:02:45,280
set, we're going to
我们将
85
00:02:45,420 --> 00:02:46,920
build a model for p of
对数据建一个模型p(x)
86
00:02:47,020 --> 00:02:48,060
x. In other words, we're
也就是说
87
00:02:48,140 --> 00:02:49,320
going to build a model for the
我们将对
88
00:02:49,520 --> 00:02:51,230
probability of x, where
x的分布概率建模
89
00:02:51,390 --> 00:02:53,330
x are these features of, say, aircraft engines.
其中x是这些特征变量 例如飞机引擎
90
00:02:54,620 --> 00:02:56,290
And so, having built a
因此 当我们
91
00:02:56,530 --> 00:02:57,350
model of the probability of x
建立了x的概率模型之后
92
00:02:58,070 --> 00:02:59,230
we're then going to say that
我们就会说
93
00:02:59,820 --> 00:03:01,280
for the new aircraft engine, if
对于新的飞机引擎
94
00:03:01,520 --> 00:03:04,670
p of x-test is less
也就是x-test 如果概率p
95
00:03:04,920 --> 00:03:07,180
than some epsilon then
低于阈值ε
96
00:03:07,930 --> 00:03:09,170
we flag this as an anomaly.
那么就将其标记为异常
97
00:03:11,410 --> 00:03:12,260
So we see a new engine
因此当我们看到一个新的引擎
98
00:03:12,660 --> 00:03:13,960
that, you know, has very low probability
在我们根据训练数据
99
00:03:14,850 --> 00:03:15,900
under a model p of
得到的p(x)模型中
100
00:03:16,020 --> 00:03:17,130
x that we estimate from the data,
概率非常低时
101
00:03:17,790 --> 00:03:19,370
then we flag this anomaly, whereas
我们就将其标记为异常
102
00:03:19,730 --> 00:03:21,880
if p of x-test is, say,
反之 如果x-test的概率p
103
00:03:22,320 --> 00:03:24,110
greater than or equal to some small threshold.
大于给定的阈值ε
104
00:03:25,120 --> 00:03:26,620
Then we say that, you know, okay, it looks okay.
我们就认为它是正常的
105
00:03:27,780 --> 00:03:28,740
And so, given the training set,
因此 给定图中的
106
00:03:28,980 --> 00:03:30,890
like that plotted here, if
这个训练集
107
00:03:31,060 --> 00:03:31,940
you build a model, hopefully
如果你建立了一个模型
108
00:03:32,560 --> 00:03:34,020
you will find that aircraft engines,
你将很可能发现飞机引擎
109
00:03:34,470 --> 00:03:35,500
or hopefully the model p of
很可能发现模型p(x)
110
00:03:35,560 --> 00:03:37,070
x will say that points that
将会认为
111
00:03:37,260 --> 00:03:38,540
lie, you know, somewhere in the
在中心区域的这些点
112
00:03:38,580 --> 00:03:39,550
middle, that's pretty high probability,
有很大的概率值
113
00:03:40,720 --> 00:03:42,830
whereas points a little bit further out have lower probability.
而稍微远离中心区域的点概率会小一些
114
00:03:43,850 --> 00:03:45,050
Points that are even further out
更远的地方的点
115
00:03:45,530 --> 00:03:47,220
have somewhat lower probability, and the
它们的概率将更小
116
00:03:47,480 --> 00:03:48,420
point that's way out here,
这外面的点
117
00:03:49,080 --> 00:03:50,400
the point that's way
和这外面的点
118
00:03:50,520 --> 00:03:52,100
out there, would be an anomaly.
将成为异常点
119
00:03:54,150 --> 00:03:55,280
Whereas the point that's way in
而这边的点
120
00:03:55,470 --> 00:03:56,460
there, right in the
正好在中心区域的点
121
00:03:56,520 --> 00:03:57,720
middle, this would be
这些点将是正常的
122
00:03:57,830 --> 00:03:59,080
okay because p of x
因为在中心区域
123
00:03:59,370 --> 00:04:00,300
right in the middle of that
p(x)概率值
124
00:04:00,460 --> 00:04:01,320
would be very high cause we've
会非常大
125
00:04:01,520 --> 00:04:03,320
seen a lot of points in that region.
因为我们看到很多点都落在了这个区域
126
00:04:04,620 --> 00:04:07,580
Here are some examples of applications of anomaly detection.
异常检测算法有如下应用案例
127
00:04:08,450 --> 00:04:09,990
Perhaps the most common application of
也许异常检测
128
00:04:10,080 --> 00:04:11,420
anomaly detection is actually
最常见的应用是
129
00:04:11,560 --> 00:04:13,260
for detection if you
是欺诈检测
130
00:04:13,360 --> 00:04:14,820
have many users, and if
假设你有很多用户
131
00:04:15,070 --> 00:04:16,360
each of your users take different
你的每个用户
132
00:04:16,670 --> 00:04:17,740
activities, you know maybe
都在从事不同的活动
133
00:04:17,920 --> 00:04:18,560
on your website or in the
也许是在你的网站上
134
00:04:18,630 --> 00:04:20,180
physical plant or something, you
也许是在一个实体工厂之类的地方
135
00:04:20,300 --> 00:04:23,670
can compute features of the different users activities.
你可以对不同的用户活动计算特征变量
136
00:04:24,830 --> 00:04:25,730
And what you can do is build
然后 你可以
137
00:04:25,940 --> 00:04:27,240
a model to say, you know,
建立一个模型
138
00:04:27,310 --> 00:04:28,960
what is the probability of different
用来表示用户表现出
139
00:04:29,170 --> 00:04:30,730
users behaving different ways.
各种行为的可能性
140
00:04:30,890 --> 00:04:32,280
What is the probability of a particular vector
用来表示用户行为
141
00:04:32,460 --> 00:04:34,590
of features of a
对应的特征向量
142
00:04:34,840 --> 00:04:36,750
users behavior so you
出现的概率
143
00:04:36,900 --> 00:04:38,360
know examples of features of
因此 你看到
144
00:04:38,450 --> 00:04:40,480
a users activity may be on
某个用户在网站上行为
145
00:04:40,650 --> 00:04:41,650
the website it'd be things like,
的特征变量是这样的
146
00:04:42,710 --> 00:04:44,350
maybe x1 is how often does
也许x1是用户登陆的频率
147
00:04:44,840 --> 00:04:46,460
this user log in, x2, you know, maybe
x2也许是
148
00:04:46,850 --> 00:04:47,920
the number of what
用户访问
149
00:04:48,130 --> 00:04:49,330
pages visited, or the
某个页面的次数
150
00:04:49,730 --> 00:04:51,420
number of transactions, maybe x3
或者交易次数
151
00:04:51,440 --> 00:04:52,820
is, you know, the number of
也许x3是
152
00:04:53,120 --> 00:04:53,990
posts of the users on the
用户在论坛上发贴的次数
153
00:04:54,130 --> 00:04:55,850
forum, feature x4 could
x4是
154
00:04:56,000 --> 00:04:56,910
be what is the typing
用户的
155
00:04:57,440 --> 00:04:58,660
speed of the user and some
打字速度
156
00:04:58,920 --> 00:04:59,980
websites can actually track that
有些网站是可以记录
157
00:05:00,280 --> 00:05:01,410
was the typing speed of this
用户每秒
158
00:05:01,600 --> 00:05:03,010
user in characters per second.
打了多少个字母的
159
00:05:03,730 --> 00:05:06,610
And so you can model p of x based on this sort of data.
因此你可以根据这些数据建一个模型p(x)
160
00:05:08,150 --> 00:05:09,140
And finally having your model
最后你将得到
161
00:05:09,270 --> 00:05:10,530
p of x, you can
你的模型p(x)
162
00:05:10,790 --> 00:05:12,570
try to identify users that
然后你可以用它来发现
163
00:05:12,760 --> 00:05:14,210
are behaving very strangely on your
你网站上的行为奇怪的用户
164
00:05:14,350 --> 00:05:15,590
website by checking which ones have
你只需要
165
00:05:16,320 --> 00:05:18,100
probably effects less than epsilon and
看哪些用户的p(x)概率小于ε
166
00:05:18,240 --> 00:05:21,140
maybe send the profiles of those users for further review.
接下来 你拿来这些用户的档案 做进一步筛选
167
00:05:22,330 --> 00:05:24,560
Or demand additional identification from
或者要求这些用户
168
00:05:24,740 --> 00:05:26,190
those users, or some such
验证他们的身份
169
00:05:26,650 --> 00:05:28,370
to guard against you know,
从而让你的网站防御
170
00:05:29,200 --> 00:05:31,650
strange behavior or fraudulent behavior on your website.
异常行为或者欺诈行为
171
00:05:33,030 --> 00:05:34,960
This sort of technique will tend
这样的技术将会找到
172
00:05:35,160 --> 00:05:36,470
of flag the users that are
行为不寻常的用户
173
00:05:36,720 --> 00:05:38,250
behaving unusually, not just
而不只是
174
00:05:39,480 --> 00:05:41,420
users that maybe behaving fraudulently.
有欺诈行为的用户
175
00:05:42,190 --> 00:05:44,030
So not just constantly having
也不只是那些
176
00:05:44,370 --> 00:05:45,670
stolen or users that are
被盗号的用户
177
00:05:45,780 --> 00:05:47,780
trying to do funny things, or just find unusual users.
或者有滑稽行为的用户 而是行为不寻常的用户
178
00:05:48,560 --> 00:05:49,770
But this is actually the technique
然而这就是许多
179
00:05:50,040 --> 00:05:51,430
that is used by many online
许多在线购物网站
180
00:05:52,500 --> 00:05:53,570
websites that sell things to
常用来识别异常用户
181
00:05:53,750 --> 00:05:55,860
try identify users behaving
的技术
182
00:05:56,240 --> 00:05:57,900
strangely that might be
这些用户行为奇怪
183
00:05:58,040 --> 00:05:59,160
indicative of either fraudulent
可能是表示他们有欺诈行为
184
00:05:59,760 --> 00:06:02,420
behavior or of computer accounts that have been stolen.
或者是被盗号
185
00:06:03,580 --> 00:06:06,410
Another example of anomaly detection is manufacturing.
异常检测的另一个例子是在工业生产领域
186
00:06:07,180 --> 00:06:08,470
So, already talked about the
事实上 我们之前已经谈到过
187
00:06:08,530 --> 00:06:09,770
aircraft engine thing where you can
飞机引擎的问题
188
00:06:10,030 --> 00:06:11,460
find unusual, say, aircraft
你可以找到异常的飞机引擎
189
00:06:11,900 --> 00:06:13,600
engines and send those for further review.
然后要求进一步细查这些引擎的质量
190
00:06:15,430 --> 00:06:16,740
A third application would be
第三个应用是
191
00:06:17,070 --> 00:06:19,210
monitoring computers in a data center.
数据中心的计算机监控
192
00:06:19,390 --> 00:06:20,410
I actually have some friends who work on this too.
实际上 我有些朋友正在从事这类工作
193
00:06:21,260 --> 00:06:22,280
So if you have a lot
如果你管理一个
194
00:06:22,580 --> 00:06:23,550
of machines in a computer
计算机集群
195
00:06:23,730 --> 00:06:24,690
cluster or in a
或者一个数据中心
196
00:06:24,780 --> 00:06:25,710
data center, we can do
其中有许多计算机
197
00:06:25,920 --> 00:06:28,560
things like compute features at each machine.
那么我们可以为每台计算机计算特征变量
198
00:06:29,020 --> 00:06:30,650
So maybe some features capturing
也许某些特征衡量
199
00:06:31,170 --> 00:06:32,730
you know, how much memory used, number of
计算机的内存消耗
200
00:06:32,870 --> 00:06:34,280
disc accesses, CPU load.
或者硬盘访问量 CPU负载