14 - 6 - Reconstruction from Compressed Representation (4 min).srt
1
00:00:00,120 --> 00:00:01,020
In some of the earlier videos,

2
00:00:01,690 --> 00:00:03,300
I was talking about PCA as

3
00:00:03,410 --> 00:00:05,270
a compression algorithm where you

4
00:00:05,330 --> 00:00:06,760
may have say, a thousand dimensional

5
00:00:07,270 --> 00:00:08,760
data and compress it

6
00:00:09,100 --> 00:00:10,850
to a hundred dimensional feature vector,

7
00:00:11,010 --> 00:00:12,360
or have three dimensional

8
00:00:12,810 --> 00:00:14,980
data and compress it to a two dimensional representation.

9
00:00:16,360 --> 00:00:17,430
So, if this is a

10
00:00:17,620 --> 00:00:19,040
compression algorithm, there should

11
00:00:19,360 --> 00:00:20,440
be a way to go back from

12
00:00:20,660 --> 00:00:22,930
this compressed representation, back to

13
00:00:23,030 --> 00:00:25,560
an approximation of your original high dimensional data.

14
00:00:26,340 --> 00:00:28,070
So, given z(i), which may be

15
00:00:28,780 --> 00:00:30,250
a hundred dimensional, how do

16
00:00:30,320 --> 00:00:31,710
you go back to your original

17
00:00:32,050 --> 00:00:34,720
representation x(i), which was maybe a thousand dimensional?

18
00:00:35,760 --> 00:00:36,820
In this video, I'd like to

19
00:00:36,930 --> 00:00:40,350
describe how to do that.
20
00:00:40,500 --> 00:00:43,620
In the PCA algorithm, we may have an example like this.

21
00:00:43,940 --> 00:00:45,670
So maybe that's my example x1

22
00:00:45,910 --> 00:00:47,810
and maybe that's my example x2.

23
00:00:48,110 --> 00:00:49,340
And what we do

24
00:00:49,570 --> 00:00:51,010
is, we take these examples and

25
00:00:51,120 --> 00:00:54,160
we project them onto this one dimensional surface.

26
00:00:55,150 --> 00:00:56,280
And then now we need

27
00:00:56,450 --> 00:00:57,750
to use only a real number,

28
00:00:58,350 --> 00:01:00,500
say z1, to specify the

29
00:01:00,600 --> 00:01:01,950
location of these points after

30
00:01:02,300 --> 00:01:04,640
they've been projected onto this one dimensional surface.

31
00:01:04,890 --> 00:01:06,930
So, given a point

32
00:01:07,730 --> 00:01:08,730
like this, given a point z1,

33
00:01:08,980 --> 00:01:10,840
how can we go back to

34
00:01:11,000 --> 00:01:12,580
this original two-dimensional space?

35
00:01:13,290 --> 00:01:15,380
And in particular, given the

36
00:01:15,510 --> 00:01:16,510
point z, which is in

37
00:01:16,660 --> 00:01:17,840
R, can we map

38
00:01:18,160 --> 00:01:19,660
this back to some approximate

39
00:01:20,440 --> 00:01:22,060
representation x in R2

40
00:01:22,370 --> 00:01:24,970
of whatever the original value of the data was?
41
00:01:26,520 --> 00:01:28,090
So, whereas z equals Ureduce

42
00:01:28,400 --> 00:01:29,570
transpose x, if you

43
00:01:29,680 --> 00:01:30,640
want to go in the opposite

44
00:01:30,930 --> 00:01:33,620
direction, the equation for

45
00:01:33,780 --> 00:01:35,150
that is, we're going

46
00:01:35,290 --> 00:01:38,220
to write x approx equals

47
00:01:40,470 --> 00:01:43,570
Ureduce times z.

48
00:01:44,020 --> 00:01:44,880
Again, just to check the dimensions,

49
00:01:45,950 --> 00:01:47,760
here Ureduce is

50
00:01:47,970 --> 00:01:48,990
going to be an n by k

51
00:01:49,680 --> 00:01:51,260
dimensional matrix, z is

52
00:01:51,370 --> 00:01:53,270
going to be a k by 1 dimensional vector.

53
00:01:54,030 --> 00:01:56,280
So, we multiply these out and that's going to be n by one.

54
00:01:56,720 --> 00:01:58,270
So x approx is going

55
00:01:58,450 --> 00:01:59,990
to be an n dimensional vector.
56
00:02:00,310 --> 00:02:01,320
And so the intent of PCA,

57
00:02:01,390 --> 00:02:03,320
that is, if the squared projection error

58
00:02:03,620 --> 00:02:04,510
is not too big, is that

59
00:02:04,730 --> 00:02:06,050
this x approx will be

60
00:02:06,500 --> 00:02:08,640
close to whatever was

61
00:02:08,960 --> 00:02:10,090
the original value of x

62
00:02:10,270 --> 00:02:13,140
that you had used to derive z in the first place.

63
00:02:14,080 --> 00:02:16,630
To show a picture of what this looks like, this is what it looks like.
64
00:02:16,870 --> 00:02:17,820
What you get back from this

65
00:02:17,970 --> 00:02:19,640
procedure are points that lie

66
00:02:19,920 --> 00:02:22,860
on the projection of that onto the green line.

67
00:02:23,500 --> 00:02:24,580
So to take our earlier example,

68
00:02:24,920 --> 00:02:26,400
if we started off with

69
00:02:26,610 --> 00:02:28,570
this value of x1, and got

70
00:02:28,850 --> 00:02:29,690
this z1, if you plug

71
00:02:30,310 --> 00:02:32,760
z1 through this formula to get

72
00:02:33,440 --> 00:02:35,510
x1 approx, then this

73
00:02:35,730 --> 00:02:37,040
point here, that will be

74
00:02:37,590 --> 00:02:40,110
x1 approx, which is

75
00:02:40,260 --> 00:02:41,990
going to be in R2, and so on.

76
00:02:42,780 --> 00:02:44,060
And similarly, if you

77
00:02:44,180 --> 00:02:45,640
do the same procedure, this will

78
00:02:45,760 --> 00:02:47,840
be x2 approx.

79
00:02:49,640 --> 00:02:50,630
And you know, that's a pretty

80
00:02:50,780 --> 00:02:53,160
decent approximation to the original data.
81
00:02:53,670 --> 00:02:54,870
So, that's how you

82
00:02:55,060 --> 00:02:56,190
go back from your low dimensional

83
00:02:56,630 --> 00:02:58,350
representation z back to

84
00:02:58,700 --> 00:03:00,720
an uncompressed representation of

85
00:03:00,760 --> 00:03:01,990
the data, and we get back an

86
00:03:02,240 --> 00:03:03,480
approximation to your original

87
00:03:03,690 --> 00:03:05,400
data x, and we

88
00:03:05,500 --> 00:03:07,210
also call this process reconstruction

89
00:03:08,220 --> 00:03:08,900
of the original data,

90
00:03:09,530 --> 00:03:10,950
when we think of trying to reconstruct

91
00:03:11,310 --> 00:03:13,630
the original value of x from the compressed representation.
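
[Note: continuing the aside above, here is a short illustrative check, again an assumption rather than lecture material, of how close the reconstruction is. When the data really lies near a k-dimensional surface, the average squared reconstruction error is close to zero.]

# Illustrative check of reconstruction quality. The data is built to
# have exactly 3-dimensional structure inside a 10-dimensional space,
# so keeping k = 3 components reconstructs it almost perfectly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 10))
Sigma = (X.T @ X) / X.shape[0]
U, _, _ = np.linalg.svd(Sigma)
Ureduce = U[:, :3]                    # n = 10, k = 3

Z = X @ Ureduce                       # compress all 50 examples at once
X_approx = Z @ Ureduce.T              # reconstruct them
print(np.mean((X - X_approx) ** 2))   # ~0: near-perfect reconstruction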
92
00:03:16,770 --> 00:03:18,370
So, given an unlabeled data

93
00:03:18,610 --> 00:03:19,850
set, you now know how to

94
00:03:19,990 --> 00:03:21,710
apply PCA and take

95
00:03:21,970 --> 00:03:23,800
your high dimensional features x and

96
00:03:24,130 --> 00:03:25,440
map them to this

97
00:03:25,560 --> 00:03:27,200
lower dimensional representation z, and

98
00:03:27,400 --> 00:03:28,630
from this video, hopefully you now

99
00:03:28,910 --> 00:03:29,670
also know how to take

100
00:03:30,260 --> 00:03:31,690
this low dimensional representation z and

101
00:03:31,860 --> 00:03:32,810
map it back to an approximation

102
00:03:33,700 --> 00:03:35,780
of your original high dimensional data.

103
00:03:37,300 --> 00:03:38,180
Now that you know how to

104
00:03:38,460 --> 00:03:40,280
implement and apply PCA, what

105
00:03:40,470 --> 00:03:41,290
we would like to do next is to

106
00:03:41,390 --> 00:03:42,250
talk about some of the

107
00:03:42,290 --> 00:03:43,460
mechanics of how to

108
00:03:43,990 --> 00:03:45,240
actually use PCA well,

109
00:03:45,530 --> 00:03:46,670
and in particular, in the next

110
00:03:46,890 --> 00:03:47,610
video, I'd like to talk

111
00:03:48,090 --> 00:03:49,730
about how to choose k, which is,

112
00:03:49,910 --> 00:03:51,140
how to choose the dimension

113
00:03:51,560 --> 00:03:53,570
of this reduced representation vector z.