17 - 6 - Map Reduce and Data Parallelism (14 min).srt

1
00:00:00,320 --> 00:00:01,510
In the last few videos, we talked
在前面几个视频中,我们讨论了
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:01,810 --> 00:00:03,430
about stochastic gradient descent, and,
随机梯度下降
3
00:00:03,620 --> 00:00:05,020
you know, other variations of the
以及梯度下降算法的
4
00:00:05,120 --> 00:00:06,530
stochastic gradient descent algorithm,
其他一些变种
5
00:00:06,910 --> 00:00:09,150
including those adaptations to online
包括如何将其
6
00:00:09,490 --> 00:00:10,420
learning, but all of those
运用于在线学习
7
00:00:10,610 --> 00:00:11,810
algorithms could be run on
然而所有这些算法
8
00:00:12,110 --> 00:00:13,740
one machine, or could be run on one computer.
都只能在一台计算机上运行
9
00:00:14,800 --> 00:00:15,870
And some machine learning problems
但是 有些机器学习问题
10
00:00:16,310 --> 00:00:17,270
are just too big to run
太大以至于不可能
11
00:00:17,520 --> 00:00:19,160
on one machine, sometimes maybe
只在一台计算机上运行
12
00:00:19,300 --> 00:00:21,050
you just have so much data you
有时候 它涉及的数据量如此巨大
13
00:00:21,170 --> 00:00:22,350
just don't ever want to run
不论你使用何种算法
14
00:00:22,670 --> 00:00:23,980
all that data through a
你都不希望只使用
15
00:00:24,100 --> 00:00:26,270
single computer, no matter what algorithm you would use on that computer.
一台计算机来处理这些数据
16
00:00:28,470 --> 00:00:29,640
So in this video I'd
因此 在这个视频中
17
00:00:29,740 --> 00:00:31,240
like to talk about a different approach
我希望介绍
18
00:00:31,770 --> 00:00:33,610
to large scale machine learning, called
进行大规模机器学习的另一种方法
19
00:00:34,010 --> 00:00:36,190
the map reduce approach.
称为map reduce (映射 化简) 方法
20
00:00:37,030 --> 00:00:38,080
And even though we have
尽管我们
21
00:00:38,380 --> 00:00:39,400
quite a few videos on stochastic
用了多个视频讲解
22
00:00:39,970 --> 00:00:41,230
gradient descent and we're going
随机梯度下降算法
23
00:00:41,550 --> 00:00:43,100
to spend relatively less time
而我们将只用少量时间
24
00:00:43,460 --> 00:00:45,350
on map reduce--don't judge the
介绍map reduce
25
00:00:45,560 --> 00:00:46,750
relative importance of map reduce
但是请不要根据
26
00:00:47,160 --> 00:00:48,240
versus the gradient descent
我们所花的时间长短
27
00:00:48,690 --> 00:00:49,590
based on the amount of
来判断哪一种技术
28
00:00:49,660 --> 00:00:51,480
time I spend on these ideas in particular.
更加重要
29
00:00:52,230 --> 00:00:53,380
Many people will say that
事实上 许多人认为
30
00:00:53,790 --> 00:00:54,840
map reduce is at least
map reduce方法至少是
31
00:00:55,090 --> 00:00:56,330
an equally important, and some
同等重要的
32
00:00:56,580 --> 00:00:57,850
would say an even more important idea
还有人认为map reduce方法
33
00:00:58,500 --> 00:01:00,620
compared to gradient descent, only
甚至比梯度下降方法更重要
34
00:01:01,460 --> 00:01:03,040
it's relatively simpler to
我们之所以只在
35
00:01:03,160 --> 00:01:04,620
explain, which is why I'm
map reduce上花少量时间
36
00:01:04,720 --> 00:01:05,580
going to spend less time on
只是因为它相对简单 容易解释
37
00:01:05,830 --> 00:01:07,040
it, but using these ideas
然而 实际上
38
00:01:07,670 --> 00:01:08,400
you might be able to scale
相比于随机梯度下降方法
39
00:01:09,070 --> 00:01:10,640
learning algorithms to even
map reduce方法
40
00:01:10,880 --> 00:01:12,520
far larger problems than is
能够处理
41
00:01:12,630 --> 00:01:14,530
possible using stochastic gradient descent.
更大规模的问题
42
00:01:18,720 --> 00:01:19,000
Here's the idea.
它的想法是这样的
43
00:01:19,810 --> 00:01:21,020
Let's say we want to fit
假设我们要
44
00:01:21,490 --> 00:01:22,960
a linear regression model or
拟合一个线性回归模型
45
00:01:23,140 --> 00:01:24,440
a logistic regression model or some
或者Logistic回归模型
46
00:01:24,540 --> 00:01:26,100
such, and let's start again
或者其他的什么模型
47
00:01:26,430 --> 00:01:27,660
with batch gradient descent, so
让我们再次从批量梯度下降算法开始吧
48
00:01:27,840 --> 00:01:30,300
that's our batch gradient descent learning rule.
这就是我们的批量梯度下降学习算法
49
00:01:31,240 --> 00:01:32,430
And to keep the writing
为了让幻灯片上的文字
50
00:01:32,850 --> 00:01:34,170
on this slide tractable, I'm going
更容易理解
51
00:01:34,340 --> 00:01:36,990
to assume throughout that we have m equals 400 examples.
我们将假定m固定为400个样本
52
00:01:37,530 --> 00:01:39,560
Of course, by our
当然 根据
53
00:01:39,750 --> 00:01:40,850
standards, in terms of large scale
大规模机器学习的标准
54
00:01:41,090 --> 00:01:42,050
machine learning, you know m
m等于400
55
00:01:42,170 --> 00:01:43,210
might be pretty small and so,
实在是太小了
56
00:01:43,770 --> 00:01:45,390
this might be more commonly
也许在实际问题中
57
00:01:45,870 --> 00:01:46,920
applied to problems, where you
你更有可能遇到
58
00:01:47,050 --> 00:01:48,190
have maybe closer to 400
样本大小为4亿
59
00:01:48,740 --> 00:01:49,940
million examples, or some
的数据
60
00:01:50,080 --> 00:01:51,310
such, but just to
或者其他差不多的大小
61
00:01:51,390 --> 00:01:52,330
make the writing on the slide
但是 为了使我们的讲解更加简单和清晰
62
00:01:52,770 --> 00:01:55,000
simpler, I'm going to pretend we have 400 examples.
我们假定我们只有400个样本
63
00:01:55,690 --> 00:01:57,460
So in that case, the
这样一来
64
00:01:57,790 --> 00:01:59,080
batch gradient descent learning rule
批量梯度下降学习算法中
65
00:01:59,570 --> 00:02:00,930
has this 400 and the
这里是400
66
00:02:01,500 --> 00:02:02,930
sum from i equals 1 through
以及400个样本的求和
67
00:02:03,330 --> 00:02:05,050
400 through my 400 examples
这里i从1取到400
68
00:02:05,590 --> 00:02:06,890
here, and if m
如果m很大
69
00:02:07,050 --> 00:02:09,780
is large, then this is a computationally expensive step.
那么这一步的计算量将会很大
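
For reference, the batch gradient descent update being described here, with m = 400 training examples, can be written as:

\[
\theta_j := \theta_j - \alpha \frac{1}{400} \sum_{i=1}^{400} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}
\]
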
70
00:02:10,890 --> 00:02:12,830
So, what the MapReduce idea
因此 下面我们来介绍
71
00:02:13,250 --> 00:02:14,470
does is the following, and
map reduce算法
72
00:02:14,890 --> 00:02:15,740
I should say the map
这里我必须指出
73
00:02:15,950 --> 00:02:16,940
reduce idea is due to
map reduce算法的基本思想
74
00:02:17,680 --> 00:02:20,190
two researchers, Jeff Dean
来自Jeff Dean和Sanjay Ghemawat
75
00:02:20,700 --> 00:02:22,060
and Sanjay Ghemawat.
这两位研究者
76
00:02:22,640 --> 00:02:23,490
Jeff Dean, by the way, is
Jeff Dean是硅谷
77
00:02:24,190 --> 00:02:26,520
one of the most legendary engineers in
最为传奇般的
78
00:02:26,660 --> 00:02:28,300
all of Silicon Valley and he
一位工程师
79
00:02:28,420 --> 00:02:29,530
kind of built a large
今天谷歌 (Google) 所有的服务
80
00:02:29,820 --> 00:02:31,670
fraction of the architectural
所依赖的后台基础架构
81
00:02:32,310 --> 00:02:34,770
infrastructure that all of Google runs on today.
有很大一部分是他创建的
82
00:02:36,000 --> 00:02:37,320
But here's the map reduce idea.
接下来我们回到 map reduce 的基本想法
83
00:02:37,850 --> 00:02:38,570
So, let's say I have
假设我们有一个
84
00:02:38,700 --> 00:02:39,840
some training set, if we
训练样本
85
00:02:39,900 --> 00:02:41,220
want to denote by this box here
我们将它表示为
86
00:02:41,610 --> 00:02:42,760
of X Y pairs,
这个方框中的一系列X~Y数据对
87
00:02:44,250 --> 00:02:47,730
where it's X1, Y1, down
从X1~Y1开始
88
00:02:47,990 --> 00:02:49,640
to my 400 examples,
涵盖我所有的400个样本
89
00:02:50,520 --> 00:02:51,660
Xm, Ym.
直到X400~Y400
90
00:02:52,190 --> 00:02:53,780
So, that's my training set with 400 training examples.
总之 这就是我的400个训练样本
91
00:02:55,060 --> 00:02:56,550
In the MapReduce idea, one way
根据map reduce思想
92
00:02:56,690 --> 00:02:58,190
to do this is to split this training
一种解决方案是
93
00:02:58,570 --> 00:03:00,510
set into different subsets.
将训练集划分成几个不同的子集
94
00:03:01,890 --> 00:03:02,590
I'm going to
在这个例子中
95
00:03:02,950 --> 00:03:04,150
assume for this example that
我假定我有
96
00:03:04,290 --> 00:03:05,530
I have 4 computers,
4台计算机
97
00:03:06,160 --> 00:03:07,160
or 4 machines to run in
它们并行的
98
00:03:07,300 --> 00:03:08,670
parallel on my training set,
处理我的训练数据
99
00:03:08,890 --> 00:03:10,570
which is why I'm splitting this into 4 machines.
因此我要将数据划分成4份 分给这4台计算机
100
00:03:10,920 --> 00:03:12,290
If you have 10 machines or
如果你有10台计算机
101
00:03:12,400 --> 00:03:13,810
100 machines, then you would
或者100台计算机
102
00:03:13,970 --> 00:03:15,890
split your training set into 10 pieces or 100 pieces or what have you.
那么你可能会将训练数据划分成10份或者100份
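
As a minimal sketch of the split being described (the function name split_training_set and the contiguous-chunk scheme are illustrative assumptions, not from the lecture), the training set can be cut into k roughly equal pieces, one per machine:

def split_training_set(X, y, k):
    # Cut the m examples into k roughly equal, contiguous chunks, one per machine.
    # With m = 400 and k = 4, each chunk holds 100 examples.
    m = len(X)
    bounds = [i * m // k for i in range(k + 1)]
    return [(X[bounds[i]:bounds[i + 1]], y[bounds[i]:bounds[i + 1]])
            for i in range(k)]
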
103
00:03:18,040 --> 00:03:19,710
And what the first of my
我的4台计算机中
104
00:03:19,850 --> 00:03:20,840
4 machines is going to do,
第一台
105
00:03:21,100 --> 00:03:23,170
say, is use just the
将处理第一个
106
00:03:23,270 --> 00:03:25,170
first one quarter of my
四分之一训练数据
107
00:03:25,300 --> 00:03:28,680
training set--so use just the first 100 training examples.
也就是前100个训练样本
108
00:03:30,020 --> 00:03:31,440
And in particular, what it's
具体来说
109
00:03:31,480 --> 00:03:32,520
going to do is look at
这台计算机
110
00:03:32,630 --> 00:03:34,800
this summation, and compute
将参与处理这个求和
111
00:03:35,490 --> 00:03:38,560
that summation for just the first 100 training examples.
它将对前100个训练样本进行求和运算
112
00:03:40,030 --> 00:03:40,960
So let me write that up
让我把公式写下来吧
113
00:03:41,110 --> 00:03:42,530
I'm going to compute a variable
我将计算临时变量
114
00:03:43,560 --> 00:03:46,230
temp superscript 1, for
temp 1 这里的上标1
115
00:03:46,320 --> 00:03:49,410
the first machine, subscript J, equals
表示第一台计算机
116
00:03:50,450 --> 00:03:52,150
sum from i equals 1 through
其下标为j 该变量等于从1到100的求和
117
00:03:52,260 --> 00:03:53,160
100, and then I'm going to plug
然后我在这里写的部分
118
00:03:53,500 --> 00:03:56,610
in exactly that term there--so I have
和这里的完全相同
119
00:03:57,260 --> 00:04:00,140
H of theta of Xi, minus Yi
也就是h θ Xi减Yi
120
00:04:01,800 --> 00:04:03,230
times Xij, right?
乘以Xij
121
00:04:03,740 --> 00:04:05,680
So that's just that
这其实就是
122
00:04:05,910 --> 00:04:07,460
gradient descent term up there.
这里的梯度下降公式中的这一项
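
In symbols, the quantity the first machine computes over its 100 training examples is:

\[
\mathrm{temp}^{(1)}_j = \sum_{i=1}^{100} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}
\]
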
123
00:04:08,300 --> 00:04:09,780
And then similarly, I'm going
然后 类似的
124
00:04:10,010 --> 00:04:11,330
to take the second quarter
我将用第二台计算机
125
00:04:11,600 --> 00:04:13,130
of my data and send it
处理我的
126
00:04:13,320 --> 00:04:14,520
to my second machine, and
第二个四分之一数据
127
00:04:14,690 --> 00:04:15,680
my second machine will use
也就是说 我的第二台计算机
128
00:04:15,900 --> 00:04:18,750
training examples 101 through 200
将使用第101到200号训练样本
129
00:04:19,350 --> 00:04:21,170
and it will compute a similar variable
类似的 我们用它
130
00:04:21,720 --> 00:04:22,880
called temp 2 j, which
计算临时变量 temp 2 j
131
00:04:23,110 --> 00:04:24,450
is the same sum for index
也就是从101到200号
132
00:04:24,890 --> 00:04:26,620
from examples 101 through 200.
数据的求和
133
00:04:26,840 --> 00:04:29,680
And similarly machines 3
类似的 第三台和第四台
134
00:04:29,830 --> 00:04:32,720
and 4 will use the
计算机将会使用
135
00:04:32,830 --> 00:04:34,110
third quarter and the fourth
第三个和第四个
136
00:04:34,570 --> 00:04:36,550
quarter of my training set.
四分之一训练样本
137
00:04:37,530 --> 00:04:38,950
So now each machine has
这样 现在每台计算机
138
00:04:39,190 --> 00:04:40,580
to sum over 100 instead
不用处理400个样本
139
00:04:41,060 --> 00:04:42,570
of over 400 examples and so
而只用处理100个样本
140
00:04:42,760 --> 00:04:43,750
has to do only a quarter
它们只用完成
141
00:04:44,050 --> 00:04:45,220
of the work and thus presumably
四分之一的工作量
142
00:04:45,900 --> 00:04:48,000
it could do it about four times as fast.
这样 也许可以将运算速度提高到原来的四倍
143
00:04:49,380 --> 00:04:50,630
Finally, after all these machines
最后 当这些计算机
144
00:04:50,990 --> 00:04:51,740
have done this work, I am
全都完成了各自的工作
145
00:04:51,850 --> 00:04:53,560
going to take these temp variables
我会将这些临时变量
146
00:04:55,350 --> 00:04:56,480
and put them back together.
收集到一起
147
00:04:56,870 --> 00:04:58,400
So I take these variables and
我会将它们
148
00:04:58,530 --> 00:04:59,950
send them all to a, you
送到一个
149
00:05:00,090 --> 00:05:03,080
know, centralized master server, and
中心计算服务器
150
00:05:03,300 --> 00:05:04,750
what the master will do
这台服务器会
151
00:05:05,140 --> 00:05:06,720
is combine these results together.
将这些临时变量合并起来
152
00:05:07,360 --> 00:05:08,470
and in particular, it will
具体来说
153
00:05:08,780 --> 00:05:10,780
update my parameters theta
它将根据以下公式
154
00:05:11,000 --> 00:05:13,160
j according to theta
来更新参数θj
155
00:05:13,410 --> 00:05:14,720
j gets updated as theta j
新的θj将等于
156
00:05:15,730 --> 00:05:17,560
minus the
旧的θj减去
157
00:05:17,680 --> 00:05:19,510
learning rate alpha times one
学习速率α乘以
158
00:05:20,120 --> 00:05:22,940
over 400 times temp,
400分之一
159
00:05:23,300 --> 00:05:27,410
1, J, plus temp
乘以临时变量 temp 1 j
160
00:05:27,760 --> 00:05:30,290
2j plus temp 3j
加temp 2j 加temp 3j
161
00:05:32,400 --> 00:05:35,470
plus temp 4j and
加temp 4j
162
00:05:35,560 --> 00:05:37,890
of course we have to do this separately for J equals 0.
当然 对于j等于0的情况我们需要单独处理
163
00:05:37,980 --> 00:05:39,570
You know, up to
这里 j从0
164
00:05:39,820 --> 00:05:41,220
n, the total number of features.
取到特征总数n
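
Written out, the combination step performed by the master server is:

\[
\theta_j := \theta_j - \alpha \frac{1}{400} \Bigl( \mathrm{temp}^{(1)}_j + \mathrm{temp}^{(2)}_j + \mathrm{temp}^{(3)}_j + \mathrm{temp}^{(4)}_j \Bigr), \qquad j = 0, 1, \ldots, n
\]
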
165
00:05:42,550 --> 00:05:45,420
So, writing this equation out over several lines, I hope it's clear.
通过将这个公式拆成多行讲解 我希望大家已经理解了
166
00:05:45,670 --> 00:05:47,870
So what this equation
其实 这个公式计算的数值
167
00:05:50,930 --> 00:05:53,220
is doing is exactly the
和原先的梯度下降公式计算的数值
168
00:05:53,290 --> 00:05:54,570
same as when you
是完全一样的
169
00:05:54,660 --> 00:05:56,140
have a centralized master server
只不过 现在我们有一个中心运算服务器
170
00:05:56,680 --> 00:05:57,950
that takes the results, the temp
它收集了一些部分计算结果
171
00:05:58,040 --> 00:05:58,780
one j, the temp two j,
temp 1j temp 2j
172
00:05:59,000 --> 00:05:59,850
temp three j and temp four
temp 3j 和 temp4j
173
00:05:59,970 --> 00:06:01,760
j and adds them up
把它们加了起来
174
00:06:02,030 --> 00:06:03,430
and so of course the sum
很显然 这四个
175
00:06:04,090 --> 00:06:04,960
of these four things.
临时变量的和
176
00:06:06,360 --> 00:06:07,810
Right, that's just the sum of
就是这个求和
177
00:06:08,060 --> 00:06:09,440
this, plus the sum
加上这个求和
178
00:06:09,760 --> 00:06:11,490
of this, plus the sum
加上这个求和
179
00:06:11,630 --> 00:06:13,000
of this, plus the sum
再加上这个求和
180
00:06:13,120 --> 00:06:14,290
of that, and those four
它们加起来的和
181
00:06:14,470 --> 00:06:15,830
things just add up to
其实和原先
182
00:06:15,920 --> 00:06:17,740
be equal to this sum that
我们使用批量梯度下降公式
183
00:06:17,880 --> 00:06:19,580
we were originally computing with batch gradient descent.
计算的结果是一样的
184
00:06:20,590 --> 00:06:21,550
And then we have the alpha times
接下来 我们有
185
00:06:21,860 --> 00:06:22,910
1 over 400, alpha times 1
α乘以400分之一
186
00:06:23,350 --> 00:06:24,690
over 400, and this is
这里也是α乘以400分之一
187
00:06:25,020 --> 00:06:27,020
exactly equivalent to the
因此这个公式
188
00:06:27,140 --> 00:06:29,390
batch gradient descent algorithm, only,
完全等同于批量梯度下降公式
189
00:06:29,910 --> 00:06:30,880
instead of needing to sum
唯一的不同是
190
00:06:31,290 --> 00:06:32,540
over all four hundred training
我们原本需要在一台计算机上
191
00:06:32,810 --> 00:06:33,900
examples on just one
完成400个训练样本的求和
192
00:06:34,040 --> 00:06:35,280
machine, we can instead
而现在
193
00:06:35,760 --> 00:06:37,460
divide up the work load on four machines.
我们将这个工作分给了4台计算机
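
As a rough Python sketch of one such parallelized gradient step for linear regression (the function names and the in-process loop are illustrative assumptions; in a real deployment each partial sum would be computed on its own machine):

import numpy as np

def partial_gradient(theta, X_chunk, y_chunk):
    # What one machine computes on its share of the data:
    # sum_i (h_theta(x_i) - y_i) * x_i, with one entry per parameter theta_j.
    return X_chunk.T @ (X_chunk @ theta - y_chunk)

def mapreduce_gradient_step(theta, chunks, alpha, m):
    # "Map": each chunk would normally be shipped to a separate machine;
    # here the partial sums are simply computed in a loop.
    temps = [partial_gradient(theta, Xc, yc) for Xc, yc in chunks]
    # "Reduce": the master adds the partial sums and applies the same
    # update as batch gradient descent over all m examples.
    return theta - (alpha / m) * sum(temps)

Called with the four chunks from the earlier split and m = 400, this produces exactly the same update as running batch gradient descent on a single machine.
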
194
00:06:39,090 --> 00:06:40,190
So, here's what the general
总结来说
195
00:06:40,630 --> 00:06:43,410
picture of the MapReduce technique looks like.
map reduce技术是这么工作的
196
00:06:45,060 --> 00:06:46,510
We have some training sets, and
我们有一些训练样本
197
00:06:46,670 --> 00:06:48,200
if we want to parallelize across four
如果我们希望使用4台计算机
198
00:06:48,420 --> 00:06:49,100
machines, we are going to
并行的运行机器学习算法
199
00:06:49,170 --> 00:06:51,670
take the training set and split it, you know, equally.
那么我们将训练样本等分
200
00:06:52,120 --> 00:06:54,640
Split it as evenly as we can into four subsets.
尽量均匀的分成4份