forked from fengdu78/Coursera-ML-AndrewNg-Notes
12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20 min).srt
1
00:00:00,680 --> 00:00:01,740
In this video, I'd like to
在本节课中 我将
(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:01,900 --> 00:00:02,960
tell you a bit about the
介绍一些
3
00:00:03,210 --> 00:00:04,680
math behind large margin classification.
大间隔分类背后的数学原理
4
00:00:05,960 --> 00:00:08,390
This video is optional, so please feel free to skip it.
本节为选学部分 你完全可以跳过它
5
00:00:09,260 --> 00:00:10,380
It may also give you better
但是听听这节课可能让你对
6
00:00:10,650 --> 00:00:11,980
intuition about how the
支持向量机中的优化问题
7
00:00:12,460 --> 00:00:13,830
optimization problem of the
以及如何得到
8
00:00:13,940 --> 00:00:15,540
support vector machine, how that
大间距分类器
9
00:00:15,860 --> 00:00:17,150
leads to large margin classifiers.
产生更好的直观理解
10
00:00:21,180 --> 00:00:22,530
In order to get started, let
首先
11
00:00:22,600 --> 00:00:23,730
me first remind you of a
让我来给大家复习一下
12
00:00:23,970 --> 00:00:26,490
couple of properties of what vector inner products look like.
关于向量内积的知识
13
00:00:28,310 --> 00:00:29,280
Let's say I have two vectors
假设我有两个向量
14
00:00:29,900 --> 00:00:32,180
U and V, that look like this.
u 和 v 我将它们写在这里
15
00:00:32,950 --> 00:00:34,180
So both are two-dimensional vectors.
两个都是二维向量
16
00:00:35,460 --> 00:00:36,940
Then let's see what U
我们看一下
17
00:00:37,440 --> 00:00:39,550
transpose V looks like.
u 转置乘以 v 的结果
18
00:00:40,160 --> 00:00:42,180
And U transpose V is
u 转置乘以 v
19
00:00:42,300 --> 00:00:43,720
also called the inner product
也叫做向量 u 和 v
20
00:00:44,490 --> 00:00:45,880
between the vectors U and V.
之间的内积
21
00:00:48,360 --> 00:00:49,960
U is a two-dimensional vector, so
由于是二维向量 我可以
22
00:00:50,380 --> 00:00:51,940
I can plot it on this figure.
将它们画在这个图上
23
00:00:52,760 --> 00:00:53,860
So let's say
我们说
24
00:00:54,040 --> 00:00:55,850
that's the vector U. And
这就是向量 u
25
00:00:55,960 --> 00:00:56,930
what I mean by that is
即
26
00:00:57,110 --> 00:00:59,160
if on the horizontal axis that
在横轴上
27
00:00:59,360 --> 00:01:00,820
value takes whatever value
取值为某个u1
28
00:01:01,560 --> 00:01:03,280
U1 is and on the
而在纵轴上
29
00:01:03,350 --> 00:01:04,820
vertical axis the height
高度是
30
00:01:05,100 --> 00:01:06,360
of that is whatever U2
某个 u2 作为U的
31
00:01:07,340 --> 00:01:08,530
is the second component
第二个分量
32
00:01:08,990 --> 00:01:12,580
of the vector U. Now, one
现在
33
00:01:12,860 --> 00:01:13,760
quantity that will be nice
很容易计算的
34
00:01:14,040 --> 00:01:15,430
to have is the norm
一个量就是向量 u 的
35
00:01:16,500 --> 00:01:17,540
of the vector U. So, these
范数
36
00:01:17,860 --> 00:01:19,390
are, you know, double bars on
这是双竖线
37
00:01:19,540 --> 00:01:20,380
the left and right that denotes
左边一个 右边一个
38
00:01:20,800 --> 00:01:22,610
the norm or length of
表示 u 的范数
39
00:01:22,730 --> 00:01:23,930
U. So this just means, really, the
即 u 的长度
40
00:01:24,200 --> 00:01:27,330
Euclidean length of the
即向量 u 的欧几里得长度
41
00:01:27,410 --> 00:01:30,800
vector U. And this
根据
42
00:01:31,350 --> 00:01:33,600
by Pythagoras' theorem, is just
毕达哥拉斯定理 等于
43
00:01:33,940 --> 00:01:35,420
equal to U1
它等于 u1 平方
44
00:01:35,620 --> 00:01:37,300
squared plus U2
加上 u2 平方
45
00:01:37,530 --> 00:01:40,190
squared square root, right?
开根号
46
00:01:40,300 --> 00:01:42,780
And this is the length of the vector U. That's a real number.
这是向量 u 的长度 它是一个实数
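As a minimal sketch of the norm just described (the function name `norm` and the example values are assumptions made for illustration, not part of the lecture):

```python
import math

def norm(u):
    """Euclidean length of a 2-D vector u = (u1, u2), via Pythagoras' theorem."""
    u1, u2 = u
    return math.sqrt(u1 ** 2 + u2 ** 2)

# Hypothetical example vector, chosen so the length is a whole number.
u = (3.0, 4.0)
length = norm(u)
print(length)  # 5.0, a single real number
```

The result is a plain real number, which is what lets it be multiplied by the projection length later on.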
47
00:01:43,730 --> 00:01:44,750
Just say you know, what is the length
现在你知道了
48
00:01:45,080 --> 00:01:46,120
of this, what is the
这个的长度是多少
49
00:01:46,220 --> 00:01:48,900
length of this vector down here.
这个向量的长度写在这里了
50
00:01:49,680 --> 00:01:50,490
What is the length of this
我刚刚画的这个
51
00:01:50,760 --> 00:01:52,990
arrow that I just drew? That is the norm of U.
向量的长度就知道了
52
00:01:56,020 --> 00:01:57,300
Now let's go back and
现在让我们回头来看
53
00:01:57,450 --> 00:01:59,660
look at the vector V because we want to compute the inner product.
向量v 因为我们想计算内积
54
00:02:00,430 --> 00:02:01,380
So V will be some other
v 是另一个向量
55
00:02:01,520 --> 00:02:03,150
vector with, you know,
它的两个分量 v1 和 v2
56
00:02:03,310 --> 00:02:06,900
some value V1, V2.
是已知的
57
00:02:08,340 --> 00:02:10,490
And so, the vector
向量 v
58
00:02:10,880 --> 00:02:15,050
V will look like that, the vector V like so.
可以画在这里
59
00:02:16,920 --> 00:02:18,260
Now let's go back
现在让我们
60
00:02:18,640 --> 00:02:19,880
and look at how to compute
来看看如何计算
61
00:02:20,400 --> 00:02:21,610
the inner product between U
u 和 v 之间的内积
62
00:02:21,860 --> 00:02:23,320
and V. Here's how you can do it.
这就是具体做法
63
00:02:24,010 --> 00:02:25,780
Let me take the vector V and
我们将向量 v
64
00:02:26,200 --> 00:02:28,440
project it down onto the
投影到
65
00:02:28,550 --> 00:02:29,700
vector U. So I'm going
向量 u 上
66
00:02:29,930 --> 00:02:31,900
to take an orthogonal projection or
我们做一个直角投影
67
00:02:31,970 --> 00:02:33,700
a 90 degree projection, and project
或者说一个90度投影
68
00:02:33,920 --> 00:02:35,490
it down onto U like so.
将其投影到 u 上
69
00:02:36,650 --> 00:02:37,410
And what I'm going to do
接下来我度量
70
00:02:38,130 --> 00:02:39,480
is measure the length of this
这条红线的
71
00:02:40,210 --> 00:02:41,520
red line that I just drew here.
长度
72
00:02:41,720 --> 00:02:42,620
So, I'm going to call the length of
我称这条红线的
73
00:02:42,730 --> 00:02:44,670
that red line P. So, P
长度为 p 因此 p
74
00:02:45,530 --> 00:02:46,830
is the length or is
就是长度 或者说是
75
00:02:46,890 --> 00:02:48,230
the magnitude of the projection
向量 v 投影到
76
00:02:49,670 --> 00:02:51,670
of the vector V onto the
向量 u 上的量
77
00:02:51,790 --> 00:02:54,380
vector U. Let me just write that down.
我将它写下来
78
00:02:54,560 --> 00:02:55,600
So, P is the length
p 是 v
79
00:02:57,500 --> 00:03:02,150
of the projection of the
投影到
80
00:03:02,260 --> 00:03:05,800
vector V onto the
向量 u 上的
81
00:03:05,920 --> 00:03:08,210
vector U. And it is
长度
82
00:03:08,430 --> 00:03:10,510
possible to show that the inner
因此可以
83
00:03:10,790 --> 00:03:12,710
product U transpose V, that
将 u 转置乘以 v
84
00:03:12,870 --> 00:03:13,540
this is going to be equal
写作
85
00:03:13,840 --> 00:03:16,330
to P times the
p 乘以
86
00:03:16,430 --> 00:03:18,020
norm or the length of
u 的范数或者说
87
00:03:18,110 --> 00:03:21,130
the vector U. So, this
u的长度
88
00:03:21,460 --> 00:03:23,400
is one way to compute the inner product.
这是计算内积的一种方法
89
00:03:24,070 --> 00:03:25,590
And if you actually do
如果你从几何上
90
00:03:25,780 --> 00:03:27,160
the geometry figure out what
画出 p 的值
91
00:03:27,330 --> 00:03:29,280
P is and figure out what the norm of U is.
同时画出 u 的范数
92
00:03:29,900 --> 00:03:30,690
This should give you the same
你也会同样地
93
00:03:31,050 --> 00:03:32,330
way, the same answer as
计算出内积
94
00:03:32,680 --> 00:03:33,840
the other way of computing the inner product.
答案是一样的
95
00:03:34,860 --> 00:03:34,860
Right.
对吧
96
00:03:35,070 --> 00:03:36,140
Which is if you take U
另一个计算公式是
97
00:03:36,280 --> 00:03:38,150
transpose V, then U transpose is
u 转置乘以 v 就是
98
00:03:39,000 --> 00:03:40,930
this [U1 U2], it's a
[u1 u2] 这个一行两列的矩阵
99
00:03:41,090 --> 00:03:42,650
one-by-two matrix,
乘以
100
00:03:43,220 --> 00:03:45,250
times V. And so
v 因此
101
00:03:45,620 --> 00:03:46,930
this should actually give you
可以得到
102
00:03:47,490 --> 00:03:50,630
U1 V1 plus U2 V2.
u1×v1 加上 u2×v2
103
00:03:51,700 --> 00:03:53,140
And so it is a theorem of
根据线性代数的知识
104
00:03:53,310 --> 00:03:55,010
linear algebra that these two
这两个公式
105
00:03:55,180 --> 00:03:56,880
formulas give you the same answer.
会给出同样的结果
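The two ways of computing the inner product can be checked against each other numerically. This is a sketch under stated assumptions: the helper names and the example vectors are hypothetical, and the projection length p is obtained from the angle between the vectors so that it does not reuse the component formula.

```python
import math

def inner_by_components(u, v):
    """Inner product as u1*v1 + u2*v2."""
    return u[0] * v[0] + u[1] * v[1]

def signed_projection(v, u):
    """Signed length p of the orthogonal projection of v onto u.

    Computed geometrically as ||v|| * cos(angle between u and v).
    """
    angle = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
    return math.hypot(v[0], v[1]) * math.cos(angle)

# Hypothetical example vectors.
u = (4.0, 2.0)
v = (1.0, 3.0)

by_components = inner_by_components(u, v)                     # u1*v1 + u2*v2
by_projection = signed_projection(v, u) * math.hypot(*u)      # p * ||u||
reversed_roles = signed_projection(u, v) * math.hypot(*v)     # project u onto v instead

print(by_components, by_projection, reversed_roles)  # all agree up to rounding
```

Swapping the roles of u and v in the last line reflects the remark that u transpose v equals v transpose u.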
106
00:03:57,890 --> 00:03:58,720
And by the way, U transpose
顺便说一句
107
00:03:59,290 --> 00:04:01,010
V is also equal to
u 转置乘以 v
108
00:04:01,320 --> 00:04:03,490
V transpose U. So if
等于 v 转置乘以 u
109
00:04:03,650 --> 00:04:04,510
you were to do the same process
因此如果你将 u 和 v 交换位置
110
00:04:05,050 --> 00:04:06,860
in reverse, instead of projecting
将 u 投影到 v 上
111
00:04:07,120 --> 00:04:08,130
V onto U, you could project
而不是将 v 投影到 u 上
112
00:04:08,520 --> 00:04:09,940
U onto V. Then, you know, do
然后做同样地计算
113
00:04:10,160 --> 00:04:12,410
the same process, but with the roles of U and V reversed.
只是把 u 和 v 的位置交换一下
114
00:04:13,170 --> 00:04:14,390
And you would actually, you should
你事实上可以
115
00:04:14,710 --> 00:04:16,900
actually get the same number whatever that number is.
得到同样的结果
116
00:04:17,540 --> 00:04:18,790
And just to clarify what's
申明一点
117
00:04:18,990 --> 00:04:20,850
going on in this equation the
在这个等式中
118
00:04:21,030 --> 00:04:21,920
norm of U is a real
u 的范数是一个实数
119
00:04:22,100 --> 00:04:25,260
number and P is also a real number.
p也是一个实数
120
00:04:25,760 --> 00:04:28,720
And so U transpose V is
因此 u 转置乘以 v
121
00:04:29,410 --> 00:04:32,350
the regular multiplication of two real numbers:
就是两个实数
122
00:04:33,040 --> 00:04:34,440
the projection length P times the norm of U.
正常相乘
123
00:04:35,580 --> 00:04:36,960
Just one last detail, which is
最后一点
124
00:04:37,190 --> 00:04:38,240
if you look at the value of
需要注意的就是p值
125
00:04:38,330 --> 00:04:40,250
P, P is actually signed.
p事实上是有符号的
126
00:04:41,350 --> 00:04:43,240
And it can either be positive or negative.
即它可能是正值 也可能是负值
127
00:04:44,350 --> 00:04:45,530
So let me say what I mean
我的意思是说
128
00:04:45,650 --> 00:04:46,740
by that, if U
如果 u
129
00:04:47,170 --> 00:04:49,360
is a vector that looks like
是一个类似这样的向量
130
00:04:49,640 --> 00:04:51,360
this and V is a vector that looks like this.
v 是一个类似这样的向量
131
00:04:52,380 --> 00:04:53,890
So if the angle between U
u 和 v 之间的
132
00:04:54,130 --> 00:04:55,770
and V is greater than ninety degrees.
夹角大于90度
133
00:04:56,620 --> 00:04:57,960
Then if I project V onto
则如果将 v
134
00:04:58,270 --> 00:05:00,220
U, what I get
投影到 u 上 会得到
135
00:05:00,420 --> 00:05:01,590
is a projection it looks like
这样的一个投影
136
00:05:01,720 --> 00:05:03,860
this, and so that is the length
这是 p 的长度
137
00:05:04,110 --> 00:05:05,490
P. And in this
在这个情形下
138
00:05:05,670 --> 00:05:06,900
case, I will still have
我们仍然有
139
00:05:07,670 --> 00:05:09,510
that U transpose V is
u 转置乘以 v
140
00:05:09,660 --> 00:05:11,720
equal to P times the
是等于 p 乘以
141
00:05:11,800 --> 00:05:14,070
norm of U. Except in
u 的范数
142
00:05:14,200 --> 00:05:16,600
this example P will be negative.
唯一一点不同的是 p 在这里是负的
143
00:05:19,150 --> 00:05:20,990
So, you know, in inner products if the angle
在内积计算中 如果 u 和 v 之间的夹角
144
00:05:21,320 --> 00:05:22,540
between U and V is less
小于90度
145
00:05:22,790 --> 00:05:23,820
than ninety degrees, then P
那么那条红线的长度
146
00:05:24,100 --> 00:05:26,480
is the positive length for that red line
p 是正值
147
00:05:27,130 --> 00:05:28,420
whereas if the angle, this
然而如果
148
00:05:28,720 --> 00:05:29,640
angle here, is greater
这个夹角
149
00:05:30,000 --> 00:05:31,890
than 90 degrees then P
大于90度 则p
150
00:05:32,130 --> 00:05:33,880
here will be the negative of
将会是负的
151
00:05:34,130 --> 00:05:37,260
the length of that little line segment right over there.
就是这个小线段的长度是负的
152
00:05:37,650 --> 00:05:38,750
So the inner product between two
因此两个向量之间的内积
153
00:05:38,900 --> 00:05:40,130
vectors can also be negative
也是负的
154
00:05:40,820 --> 00:05:42,900
if the angle between them is greater than 90 degrees.
如果它们之间的夹角大于90度
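The sign of p can be illustrated with a small sketch. Here p is recovered by rearranging u transpose v = p times the norm of u; the function name and the example vectors are assumptions made for illustration.

```python
import math

def signed_projection(v, u):
    """Signed projection length p of v onto u, from p = (u . v) / ||u||."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / math.hypot(u[0], u[1])

u = (2.0, 0.0)            # hypothetical vector along the horizontal axis

v_acute = (1.0, 1.0)      # 45 degrees from u, less than 90
v_obtuse = (-1.0, 1.0)    # 135 degrees from u, greater than 90

p_acute = signed_projection(v_acute, u)
p_obtuse = signed_projection(v_obtuse, u)

print(p_acute)   # 1.0  -> positive, so the inner product is positive
print(p_obtuse)  # -1.0 -> negative, so the inner product is negative
```

Because the norm of u is never negative, the sign of the inner product is always the sign of p.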
155
00:05:43,770 --> 00:05:45,100
So that's how vector inner
这就是关于向量内积的知识
156
00:05:45,310 --> 00:05:46,490
products work. We're going to
我们接下来将会
157
00:05:46,930 --> 00:05:47,960
use these properties of vector
使用这些关于向量内积的
158
00:05:48,280 --> 00:05:49,610
inner product to try
性质 试图来
159
00:05:49,840 --> 00:05:51,880
to understand the support
理解支持向量机
160
00:05:52,400 --> 00:05:54,490
vector machine optimization objective over there. Here
中的目标函数
161
00:05:54,630 --> 00:05:58,620
is the optimization objective for the
这就是我们先前给出的
162
00:05:58,650 --> 00:06:00,900
support vector machine that we worked out earlier. Just for
支持向量机模型中的目标函数
163
00:06:01,100 --> 00:06:02,070
the purpose of this slide I
为了讲解方便
164
00:06:02,120 --> 00:06:04,520
am going to make one simplification,
我做一点简化
165
00:06:04,910 --> 00:06:08,220
just to make the objective easy
仅仅是为了让目标函数
166
00:06:08,670 --> 00:06:10,110
to analyze and what I'm going to do is
更容易被分析 我接下来忽略掉截距
167
00:06:10,270 --> 00:06:14,160
ignore the intercept term. So, we'll just ignore theta 0 and set that to be equal to 0. To
令 θ0 等于 0
168
00:06:16,510 --> 00:06:22,950
make things easier to plot, I'm also going to set N, the number of features, to be equal to 2. So, we have only 2 features,
这样更容易画示意图 我将特征数 n 置为2 因此我们仅有
169
00:06:23,980 --> 00:06:24,710
X1 and X2.
两个特征 x1 和 x2
170
00:06:26,510 --> 00:06:27,980
Now, let's look at the objective function.
现在 我们来看一下目标函数
171
00:06:28,470 --> 00:06:29,910
The optimization objective of the
支持向量机的优化目标函数
172
00:06:30,160 --> 00:06:32,130
SVM. When we have only two features.
当我们仅有两个特征
173
00:06:32,630 --> 00:06:33,710
When N is equal to 2.
即 n=2 时
174
00:06:34,170 --> 00:06:35,340
This can be written,
这个式子可以写作
175
00:06:36,130 --> 00:06:37,900
one half of
二分之一
176
00:06:38,040 --> 00:06:40,080
theta one squared plus theta two squared.
θ1 平方加上 θ2 平方
177
00:06:40,620 --> 00:06:42,870
Because we only have two parameters, theta one and theta two.
我们只有两个参数 θ1 和θ2
178
00:06:45,240 --> 00:06:46,730
What I'm going to do is rewrite this a bit.
接下来我重写一下
179
00:06:46,940 --> 00:06:47,900
I'm going to write this as one
我将其重写成
180
00:06:48,090 --> 00:06:49,980
half of theta one
二分之一 θ1 平方
181
00:06:50,190 --> 00:06:51,860
squared plus theta two squared and
加上 θ2 平方
182
00:06:52,050 --> 00:06:54,160
the square root squared.
开平方根后再平方
183
00:06:54,820 --> 00:06:55,760
And the reason I can do that,
我这么做的根据是
184
00:06:56,100 --> 00:06:58,990
is because for any number, you know, W, right, the
对于任何数 w
185
00:07:00,830 --> 00:07:02,480
square root of W, and
w的平方根 再取平方
186
00:07:02,570 --> 00:07:03,930
then squared, that's just equal
得到的就是
187
00:07:04,080 --> 00:07:05,650
to W. So taking the square root
w 本身 因此平方根 然后平方
188
00:07:05,840 --> 00:07:07,250
and then squaring should give you the same thing.
并不会改变值的大小
189
00:07:08,600 --> 00:07:09,500
What you may notice is that
你可能注意到
190
00:07:09,730 --> 00:07:11,870
this term inside is
括号里面的这一项
191
00:07:12,290 --> 00:07:13,450
equal to the norm
是向量 θ
192
00:07:14,530 --> 00:07:16,460
or the length of the
的范数
193
00:07:16,690 --> 00:07:18,250
vector theta and what
或者说是向量 θ 的长度
194
00:07:18,430 --> 00:07:20,020
I mean by that is that
我的意思是
195
00:07:20,200 --> 00:07:21,640
if we write out the
如果我们将
196
00:07:21,700 --> 00:07:22,590
vector theta like this, as
向量 θ 写出来
197
00:07:23,080 --> 00:07:24,320
you know theta one, theta two.
θ1 θ2
198
00:07:25,260 --> 00:07:26,260
Then this term that I've just
那么我刚刚画红线的这一项
199
00:07:26,690 --> 00:07:28,230
underlined in red, that's exactly
就是向量 θ
200
00:07:28,640 --> 00:07:30,480
the length, or the norm, of the vector theta.
的长度或范数
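Written out, the rewriting described above looks as follows. This is a sketch of the spoken derivation under the lecture's simplifications (theta 0 set to 0 and N equal to 2); the double-bar notation for the norm follows the earlier part of the video.

```latex
\frac{1}{2}\left(\theta_1^2 + \theta_2^2\right)
  = \frac{1}{2}\left(\sqrt{\theta_1^2 + \theta_2^2}\right)^{2}
  = \frac{1}{2}\,\lVert\theta\rVert^{2},
\qquad
\theta = \begin{bmatrix}\theta_1 \\ \theta_2\end{bmatrix}
```

The middle step uses only the fact that the square of the square root of W equals W, here with W being theta one squared plus theta two squared, which is never negative.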