7 - 2 - Cost Function (10 min).srt
1
00:00:00,144 --> 00:00:02,011
In this video, I'd like to
在这段视频中 我想要
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:02,011 --> 00:00:03,990
convey to you, the main intuitions
传达给你一个直观的感受
3
00:00:03,990 --> 00:00:05,771
behind how regularization works.
告诉你正规化是如何进行的
4
00:00:05,771 --> 00:00:07,386
And, we'll also write down
而且 我们还要写出
5
00:00:07,386 --> 00:00:11,724
the cost function that we'll use when we're using regularization.
我们使用正规化时 需要使用的代价函数
6
00:00:11,780 --> 00:00:13,327
With the hand drawn examples that
根据我们幻灯片上的
7
00:00:13,327 --> 00:00:14,916
we have on these slides, I
这些例子
8
00:00:14,950 --> 00:00:17,642
think I'll be able to convey part of the intuition.
我想我可以给你一个直观的感受
9
00:00:17,700 --> 00:00:19,608
But, an even better
但是 一个更好的
10
00:00:19,608 --> 00:00:21,192
way to see for yourself, how
让你自己去理解正规化
11
00:00:21,192 --> 00:00:22,643
regularization works, is if
如何工作的方法是
12
00:00:22,643 --> 00:00:25,869
you implement it, and, see it work for yourself.
你自己亲自去实现它 并且看看它是如何工作的
13
00:00:25,869 --> 00:00:26,888
And, if you do the
如果在这节课后
14
00:00:26,888 --> 00:00:28,603
appropriate exercises after this,
你进行一些适当的练习
15
00:00:28,603 --> 00:00:30,053
you get the chance
你就有机会亲自体验一下
16
00:00:30,053 --> 00:00:33,927
to see regularization in action for yourself.
正规化到底是怎么工作的
17
00:00:33,930 --> 00:00:36,519
So, here is the intuition.
那么 这里就是一些直观解释
18
00:00:36,519 --> 00:00:38,233
In the previous video, we saw
在前面的视频中 我们看到了
19
00:00:38,233 --> 00:00:39,771
that, if we were to fit
如果说我们要
20
00:00:39,771 --> 00:00:41,420
a quadratic function to this
用一个二次函数来
21
00:00:41,420 --> 00:00:44,283
data, it gives us a pretty good fit to the data.
拟合这些数据 它给了我们一个对数据很好的拟合
22
00:00:44,283 --> 00:00:45,286
Whereas, if we were to
然而 如果我们
23
00:00:45,310 --> 00:00:47,175
fit an overly high order
用一个更高次的
24
00:00:47,210 --> 00:00:48,823
degree polynomial, we end
多项式去拟合 我们最终
25
00:00:48,850 --> 00:00:50,111
up with a curve that may fit
可能得到一个曲线
26
00:00:50,111 --> 00:00:51,760
the training set very well, but,
能非常好地拟合训练集 但是
27
00:00:51,760 --> 00:00:53,381
really not be a good result,
这真的不是一个好的结果
28
00:00:53,420 --> 00:00:54,497
but overfit the data
它过度拟合了数据
29
00:00:54,497 --> 00:00:57,225
and not generalize well.
因此 一般性并不是很好
30
00:00:57,900 --> 00:01:00,453
Consider the following, suppose we
让我们考虑下面的假设
31
00:01:00,453 --> 00:01:02,088
were to penalize, and, make
我们想要加上惩罚项 从而使
32
00:01:02,088 --> 00:01:04,753
the parameters theta 3 and theta 4 really small.
参数 θ3 和 θ4 足够的小
33
00:01:04,753 --> 00:01:06,543
Here's what I
这里我的意思就是
34
00:01:06,543 --> 00:01:09,676
mean, here is our optimization
这是我们的优化目标
35
00:01:09,690 --> 00:01:10,859
objective, or here is our
或者说 这就是我们需要
36
00:01:10,870 --> 00:01:12,574
optimization problem, where we minimize
优化的问题 我们需要尽量减少
37
00:01:12,580 --> 00:01:15,526
our usual squared error cost function.
通常的平方误差代价函数
38
00:01:15,526 --> 00:01:17,350
Let's say I take this objective
对于这个函数
39
00:01:17,370 --> 00:01:19,125
and modify it and add
我们对它进行一些 添加一些项
40
00:01:19,160 --> 00:01:23,291
to it, plus 1000 theta
加上 1000 乘以 θ3 的平方
41
00:01:23,291 --> 00:01:28,334
3 squared, plus 1000 theta 4 squared.
再加上 1000 乘以 θ4 的平方
42
00:01:28,334 --> 00:01:32,354
1000 I am just writing down as some huge number.
1000 只是我随便写的某个较大的数字而已
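(Writing the modified objective just described as a LaTeX sketch; the 1/2m squared-error term is the course's usual cost function and is assumed here, and 1000 is just the arbitrary large multiplier mentioned above:

\min_{\theta} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2 )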
43
00:01:32,354 --> 00:01:33,538
Now, if we were to
现在 如果我们要
44
00:01:33,540 --> 00:01:35,127
minimize this function, the
最小化这个函数
45
00:01:35,140 --> 00:01:36,688
only way to make this
为了使这个
46
00:01:36,710 --> 00:01:38,620
new cost function small is
新的代价函数最小化
47
00:01:38,620 --> 00:01:40,769
if theta 3 and theta
我们要让 θ3 和 θ4
48
00:01:40,769 --> 00:01:42,133
4 are small, right?
尽可能小 对吧?
49
00:01:42,133 --> 00:01:43,264
Because otherwise, if you have
因为 如果你有
50
00:01:43,264 --> 00:01:44,956
a thousand times theta 3, this
1000 乘以 θ3 这个
51
00:01:44,970 --> 00:01:48,103
new cost function's gonna be big.
新的代价函数将会是很大的
52
00:01:48,140 --> 00:01:49,245
So when we minimize this
所以 当我们最小化
53
00:01:49,245 --> 00:01:50,402
new function we are going
这个新的函数时 我们将使
54
00:01:50,402 --> 00:01:52,107
to end up with theta 3
θ3 的值
55
00:01:52,110 --> 00:01:53,776
close to 0 and theta
接近于0
56
00:01:53,776 --> 00:01:56,700
4 close to 0, and as
θ4 的值也接近于0
57
00:01:56,700 --> 00:01:59,691
if we're getting rid
就像我们忽略了
58
00:01:59,691 --> 00:02:03,206
of these two terms over there.
这两个值一样
59
00:02:03,710 --> 00:02:05,282
And if we do that, well then,
如果我们做到这一点
60
00:02:05,290 --> 00:02:06,783
if theta 3 and theta 4
如果 θ3 和 θ4
61
00:02:06,783 --> 00:02:07,973
close to 0 then we are
接近0 那么我们
62
00:02:07,973 --> 00:02:09,643
being left with a quadratic function,
将得到一个近似的二次函数
63
00:02:09,643 --> 00:02:11,089
and, so, we end up with
所以 我们最终
64
00:02:11,110 --> 00:02:13,343
a fit to the data, that's, you know, quadratic
恰当地拟合了数据 你知道
65
00:02:13,343 --> 00:02:15,463
function plus maybe, tiny
二次函数加上一些项
66
00:02:15,463 --> 00:02:17,856
contributions from small terms,
这些很小的项 贡献很小
67
00:02:17,860 --> 00:02:20,207
theta 3, theta 4, that they may be very close to 0.
因为 θ3 θ4 它们是非常接近于0的
68
00:02:20,207 --> 00:02:27,293
And, so, we end up with
所以 我们最终得到了
69
00:02:27,293 --> 00:02:29,386
essentially, a quadratic function, which is good.
实际上 很好的一个二次函数
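(In other words, assuming the fourth-order polynomial model from the previous video's example, the fitted hypothesis is essentially quadratic; a LaTeX sketch:

h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \;\approx\; \theta_0 + \theta_1 x + \theta_2 x^2 \quad (\theta_3 \approx 0,\ \theta_4 \approx 0) )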
70
00:02:29,386 --> 00:02:30,544
Because this is a
因为这是一个
71
00:02:30,544 --> 00:02:34,060
much better hypothesis.
更好的假设
72
00:02:34,104 --> 00:02:36,666
In this particular example, we looked at the effect
在这个具体的例子中 我们看到了
73
00:02:36,700 --> 00:02:39,023
of penalizing two of
惩罚这两个
74
00:02:39,023 --> 00:02:41,446
the parameter values being large.
大的参数值的效果
75
00:02:41,446 --> 00:02:46,510
More generally, here is the idea behind regularization.
更一般地 这里给出了正规化背后的思路
76
00:02:46,980 --> 00:02:48,924
The idea is that, if we
这种思路就是 如果我们
77
00:02:48,924 --> 00:02:50,303
have small values for the
的参数值
78
00:02:50,303 --> 00:02:53,083
parameters, then, having
对应一个较小值的话
79
00:02:53,083 --> 00:02:55,250
small values for the parameters,
就是说 参数值比较小
80
00:02:55,250 --> 00:02:57,866
will somehow, will usually correspond
那么往往我们会得到一个
81
00:02:57,866 --> 00:03:00,386
to having a simpler hypothesis.
形式更简单的假设
82
00:03:00,386 --> 00:03:02,279
So, for our last example, we
所以 我们最后一个例子中
83
00:03:02,279 --> 00:03:04,024
penalize just theta 3 and
我们惩罚的只是 θ3 和
84
00:03:04,024 --> 00:03:05,666
theta 4 and when both
θ4 使这两个
85
00:03:05,666 --> 00:03:07,046
of these were close to zero,
值均接近于零
86
00:03:07,046 --> 00:03:08,450
we wound up with a much simpler
我们得到了一个更简单的假设
87
00:03:08,480 --> 00:03:12,549
hypothesis that was essentially a quadratic function.
也即这个假设大抵上是一个二次函数
88
00:03:12,549 --> 00:03:13,991
But more broadly, if we penalize all
但更一般地说 如果我们就像这样
89
00:03:13,991 --> 00:03:15,989
the parameters, usually we
惩罚所有这些参数 通常我们
90
00:03:15,989 --> 00:03:17,416
can think of that, as trying
可以把它们都想成是
91
00:03:17,420 --> 00:03:19,076
to give us a simpler hypothesis
得到一个更简单的假设
92
00:03:19,110 --> 00:03:20,943
as well because when, you
因为你知道
93
00:03:20,943 --> 00:03:22,380
know, these parameters are
当这些参数越接近这个例子时
94
00:03:22,410 --> 00:03:23,700
as close to zero as in this
假设的结果越接近
95
00:03:23,700 --> 00:03:26,105
example, that gave us a quadratic function.
一个二次函数
96
00:03:26,105 --> 00:03:29,038
But more generally, it is
但更一般地
97
00:03:29,038 --> 00:03:30,493
possible to show that having
可以表明
98
00:03:30,530 --> 00:03:32,536
smaller values of the parameters
这些参数的值越小
99
00:03:32,540 --> 00:03:34,416
corresponds to usually smoother
通常对应于越光滑的函数
100
00:03:34,416 --> 00:03:36,780
functions as well, that is, simpler functions.
也就是更加简单的函数
101
00:03:36,780 --> 00:03:41,667
And which are therefore, also, less prone to overfitting.
因此 就不易发生过拟合的问题
102
00:03:41,680 --> 00:03:43,245
I realize that the reasoning for
我知道
103
00:03:43,245 --> 00:03:45,441
why having all the parameters be small,
为什么要让所有的参数都变小
104
00:03:45,441 --> 00:03:46,944
Why that corresponds to a simpler
为什么越小的参数对应于一个简单的假设
105
00:03:46,960 --> 00:03:48,916
hypothesis; I realize that
我知道这些原因
106
00:03:48,916 --> 00:03:51,572
reasoning may not be entirely clear to you right now.
对你来说现在不一定完全理解
107
00:03:51,590 --> 00:03:52,784
And it is kind of hard
但现在解释起来确实比较困难
108
00:03:52,784 --> 00:03:54,477
to explain unless you implement
除非你自己实现一下
109
00:03:54,480 --> 00:03:56,446
yourself and see it for yourself.
自己亲自运行了这部分
110
00:03:56,470 --> 00:03:58,247
But I hope that the example of
但是我希望 这个例子中
111
00:03:58,247 --> 00:03:59,610
having theta 3 and theta
使 θ3 和 θ4
112
00:03:59,650 --> 00:04:01,230
4 be small and how
很小 并且这样做
113
00:04:01,230 --> 00:04:02,535
that gave us a simpler
能给我们一个更加简单的
114
00:04:02,540 --> 00:04:04,776
hypothesis, I hope that
假设 我希望这个例子
115
00:04:04,800 --> 00:04:06,314
helps explain why, at least give
有助于解释原因 至少给了
116
00:04:06,330 --> 00:04:09,320
some intuition as to why this might be true.
我们一些直观感受 为什么这应该是这样的
117
00:04:09,320 --> 00:04:11,476
Lets look at the specific example.
来让我们看看具体的例子
118
00:04:12,010 --> 00:04:13,873
For housing price prediction we
对于房屋价格预测我们
119
00:04:13,873 --> 00:04:15,465
may have our hundred features
可能有上百种特征
120
00:04:15,480 --> 00:04:17,223
that we talked about where may
我们谈到了一些可能的特征
121
00:04:17,250 --> 00:04:18,756
be x1 is the size, x2
比如说 x1 是房屋的尺寸
122
00:04:18,756 --> 00:04:20,096
is the number of bedrooms, x3
x2 是卧室的数目
123
00:04:20,096 --> 00:04:21,963
is the number of floors and so on.
x3 是房屋的层数等等
124
00:04:21,963 --> 00:04:24,502
And we may we may have a hundred features.
那么我们可能就有一百个特征
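(With a hundred features, the linear hypothesis has a hundred and one parameters, theta 0 through theta 100; a LaTeX sketch, where x1, x2, x3 are the size, number of bedrooms, and number of floors, and the remaining features are left unspecified as in the lecture:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_{100} x_{100} )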
125
00:04:24,502 --> 00:04:26,896
And unlike the polynomial
跟前面的多项式例子不同
126
00:04:26,920 --> 00:04:28,459
example, we don't know, right,
我们是不知道的 对吧
127
00:04:28,460 --> 00:04:29,826
we don't know that theta 3,
我们不知道 θ3
128
00:04:29,826 --> 00:04:32,641
theta 4, are the high order polynomial terms.
θ4 是高阶多项式的项
129
00:04:32,641 --> 00:04:34,515
So, if we have just a
所以 如果我们有一个袋子
130
00:04:34,540 --> 00:04:35,863
bag, if we have just a
如果我们有一百个特征
131
00:04:35,863 --> 00:04:38,074
set of a hundred features, it's hard
在这个袋子里 我们是很难
132
00:04:38,100 --> 00:04:40,210
to pick in advance which are
提前选出那些
133
00:04:40,260 --> 00:04:42,729
the ones that are less likely to be relevant.
关联度更小的特征的
134
00:04:42,729 --> 00:04:45,773
So we have a hundred, or a hundred and one, parameters.
也就是说如果我们有一百或一百零一个参数
135
00:04:45,780 --> 00:04:47,340
And we don't know which
我们不知道
136
00:04:47,340 --> 00:04:48,987
ones to pick, we
挑选哪一个
137
00:04:49,010 --> 00:04:50,445
don't know which
我们并不知道
138
00:04:50,450 --> 00:04:54,272
parameters to try to pick, to try to shrink.
该挑选哪些参数来进行缩小
139
00:04:54,430 --> 00:04:56,237
So, in regularization, what we're
因此在正规化里
140
00:04:56,237 --> 00:04:58,438
going to do, is take our
我们要做的事情 就是把我们的
141
00:04:58,438 --> 00:05:01,213
cost function, here's my cost function for linear regression.
代价函数 这里就是线性回归的代价函数
142
00:05:01,213 --> 00:05:02,656
And what I'm going to do
我现在要做的就是
143
00:05:02,660 --> 00:05:04,326
is, modify this cost
来修改这个代价函数
144
00:05:04,340 --> 00:05:06,246
function to shrink all
从而缩小
145
00:05:06,270 --> 00:05:07,643
of my parameters, because, you know,
我所有的参数值 因为你知道
146
00:05:07,643 --> 00:05:09,059
I don't know which
我不知道是哪个
147
00:05:09,059 --> 00:05:10,440
one or two to try to shrink.
哪一个或两个要去缩小
148
00:05:10,440 --> 00:05:11,690
So I am going to modify my
所以我就修改我的
149
00:05:11,690 --> 00:05:16,732
cost function to add a term at the end.
代价函数 在这后面添加一项
150
00:05:17,390 --> 00:05:20,436
Like so we have square brackets here as well.
就像我们在方括号里的这项
151
00:05:20,440 --> 00:05:22,212
When I add an extra
当我添加一个额外的
152
00:05:22,212 --> 00:05:23,516
regularization term at the
正则化项的时候
153
00:05:23,530 --> 00:05:25,510
end to shrink every
我们收缩了每个
154
00:05:25,560 --> 00:05:27,286
single parameter and so this
参数 并且因此
155
00:05:27,320 --> 00:05:28,745
term we tend to shrink
我们会使
156
00:05:28,760 --> 00:05:30,747
all of my parameters theta 1,
我们所有的参数 θ1
157
00:05:30,747 --> 00:05:32,746
theta 2, theta 3 up
θ2 θ3
158
00:05:32,746 --> 00:05:35,490
to theta 100.
直到 θ100 的值变小
159
00:05:36,790 --> 00:05:39,629
By the way, by convention the summation
顺便说一下 按照惯例来讲
160
00:05:39,629 --> 00:05:41,007
here starts from one so I
这里的求和是从 1 开始的 所以我
161
00:05:41,007 --> 00:05:43,341
am not actually going to penalize theta
所以我实际上没有去惩罚 θ0
162
00:05:43,360 --> 00:05:45,416
zero being large.
取较大的值
163
00:05:45,470 --> 00:05:46,435
That's sort of the convention, that
这就是一个约定
164
00:05:46,435 --> 00:05:48,664
the sum I equals one through
从1到 n 的求和
165
00:05:48,664 --> 00:05:50,185
N, rather than I equals zero
而不是从0到 n 的求和
166
00:05:50,190 --> 00:05:51,953
through N. But in practice,
但其实在实践中
167
00:05:51,960 --> 00:05:53,464
it makes very little difference, and,
这只会有非常小的差异
168
00:05:53,490 --> 00:05:54,788
whether you include, you know,
无论你是否包括这项
169
00:05:54,788 --> 00:05:56,221
theta zero or not, in
就是 θ0 这项
170
00:05:56,221 --> 00:05:59,532
practice, make very little difference to the results.
实际上 结果只有非常小的差异
171
00:05:59,540 --> 00:06:01,804
But by convention, usually, we regularize
但是按照惯例 通常情况下我们还是只
172
00:06:01,804 --> 00:06:03,356
only theta 1 through theta
从 θ1 到 θ100 进行正规化
173
00:06:03,360 --> 00:06:06,084
100. Writing down
这里我们写下来
174
00:06:06,084 --> 00:06:08,978
our regularized optimization objective,
我们的正规化优化目标
175
00:06:08,978 --> 00:06:10,655
our regularized cost function again.
我们的正规化后的代价函数
176
00:06:10,655 --> 00:06:11,718
Here it is. Here's J of
就是这样的
177
00:06:11,718 --> 00:06:13,903
theta where, this term
J(θ) 这个项
178
00:06:13,970 --> 00:06:15,863
on the right is a regularization
右边的这项就是一个正则化项
179
00:06:15,863 --> 00:06:17,548
term and lambda
并且 λ
180
00:06:17,570 --> 00:06:23,950
here is called the regularization parameter and
在这里我们称做正规化参数
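(A LaTeX sketch of the regularized cost function J(θ) being described here; the 1/2m scaling follows the course's usual convention, and the regularization sum starts at j = 1 so that θ0 is not penalized, with n = 100 in this example:

J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right] )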
181
00:06:23,973 --> 00:06:26,334
what lambda does, is it
λ 要做的就是控制
182
00:06:26,334 --> 00:06:28,480
controls a trade off
在两个不同的目标中
183
00:06:28,510 --> 00:06:30,636
between two different goals.
的一个平衡关系
184
00:06:30,636 --> 00:06:32,478
The first goal, captured
第一个目标
185
00:06:32,500 --> 00:06:34,399
by the first term of the objective, is
第一个需要抓住的目标
186
00:06:34,399 --> 00:06:36,081
that we would like to train,
就是我们想要训练
187
00:06:36,090 --> 00:06:38,350
is that we would like to fit the training data well.
使假设更好地拟合训练数据
188
00:06:38,390 --> 00:06:41,083
We would like to fit the training set well.
我们希望假设能够很好的适应训练集
189
00:06:41,083 --> 00:06:42,954
And the second goal is,
而第二个目标是
190
00:06:42,954 --> 00:06:44,474
we want to keep the parameters
我们想要保持参数值较小
191
00:06:44,474 --> 00:06:46,053
small, and that's captured by
这就是第二项的目标
192
00:06:46,060 --> 00:06:49,103
the second term, by the regularization objective, by the regularization term.
通过正则化目标函数
193
00:06:49,103 --> 00:06:53,583
And what lambda, the regularization
这就是λ 这个正则化
194
00:06:53,583 --> 00:06:55,937
parameter does is it controls the trade-off
参数需要控制的
195
00:06:55,937 --> 00:06:57,694
between these two
它会控制这两者之间的平衡
196
00:06:57,694 --> 00:06:58,938
goals, between the goal of fitting the training set well
即很好地拟合训练集的目标
197
00:06:58,960 --> 00:07:00,562
and the
和
198
00:07:00,562 --> 00:07:02,043
goal of keeping the parameters
保持参数值较小的目的
199
00:07:02,080 --> 00:07:05,688
small and therefore keeping the hypothesis relatively
从而来保持假设的形式相对简单
200
00:07:05,688 --> 00:07:09,134
simple to avoid overfitting.
来避免过度的拟合
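(A minimal Python sketch of the trade-off that lambda controls, as described above; the function name, the toy data, and the specific lambda values are illustrative assumptions, not part of the lecture:

import numpy as np

def regularized_cost(theta, X, y, lam):
    # Squared-error cost plus an L2 penalty; theta[0] (the intercept) is not penalized.
    m = len(y)
    residuals = X @ theta - y                    # h_theta(x^(i)) - y^(i) for every example
    fit_term = np.sum(residuals ** 2)            # goal 1: fit the training set well
    penalty = lam * np.sum(theta[1:] ** 2)       # goal 2: keep the parameters small
    return (fit_term + penalty) / (2 * m)

# Toy illustration: as lambda grows, the penalty term dominates and the same
# theta is judged worse, pushing minimization toward smaller parameter values.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0]])   # first column is the intercept feature
y = np.array([1.0, 2.0, 2.5])
theta = np.array([0.2, 1.1])
for lam in (0.0, 1.0, 1000.0):
    print(lam, regularized_cost(theta, X, y, lam)) )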