forked from fengdu78/Coursera-ML-AndrewNg-Notes
10 - 2 - Evaluating a Hypothesis (8 min).srt
1
00:00:00,146 --> 00:00:02,515
In this video, I would like to talk about how to
在本节视频中我想介绍一下
(Subtitles compiled by: Huang Haiguang, Ocean University of China, haiguang2000@qq.com)
2
00:00:02,523 --> 00:00:06,662
evaluate a hypothesis that has been learned by your algorithm.
怎样评价通过你的学习算法得到的一个假设
3
00:00:06,685 --> 00:00:09,200
In later videos, we will build on this
基于这节课的讨论 在之后的视频中
4
00:00:09,231 --> 00:00:11,846
to talk about how to prevent the problems of
我们还将讨论如何防止
5
00:00:11,869 --> 00:00:14,908
overfitting and underfitting as well.
过拟合和欠拟合的问题
6
00:00:15,615 --> 00:00:19,023
When we fit the parameters of our learning algorithm
当我们确定学习算法的参数时
7
00:00:19,038 --> 00:00:23,154
we think about choosing the parameters to minimize the training error.
我们考虑的是选择参数来使训练误差最小化
8
00:00:23,169 --> 00:00:26,077
One might think that getting a really low value of
有人认为 得到一个很小的训练误差
9
00:00:26,100 --> 00:00:28,108
training error might be a good thing,
一定是一件好事
10
00:00:28,108 --> 00:00:29,562
but we have already seen that
但我们已经知道
11
00:00:29,562 --> 00:00:32,400
just because a hypothesis has low training error,
仅仅是因为这个假设具有很小的训练误差
12
00:00:32,400 --> 00:00:35,254
that doesn't mean it is necessarily a good hypothesis.
并不能说明它一定是一个好的假设
13
00:00:35,254 --> 00:00:40,223
And we've already seen the example of how a hypothesis can overfit.
我们也学习了过拟合假设的例子
14
00:00:40,415 --> 00:00:45,785
And therefore fail to generalize to new examples not in the training set.
这时推广到新的训练样本上就不灵了
15
00:00:45,962 --> 00:00:50,000
So how do you tell if the hypothesis might be overfitting?
那么 你怎样判断一个假设是否是过拟合的呢
16
00:00:50,015 --> 00:00:54,346
In this simple example we could plot the hypothesis h of x
对于这个简单的例子 我们可以
17
00:00:54,365 --> 00:00:56,338
and just see what was going on.
画出假设函数h(x) 然后观察
18
00:00:56,346 --> 00:01:00,538
But in general for problems with more features than just one feature,
但对于更一般的情况 特征不止一个的例子
19
00:01:00,554 --> 00:01:03,531
for problems with a large number of features like these
就像这样有很多特征的问题
20
00:01:03,546 --> 00:01:06,692
it becomes hard or may be impossible
想要通过画出假设函数来观察
21
00:01:06,708 --> 00:01:09,515
to plot what the hypothesis looks like
就变得很难甚至不可能了
22
00:01:09,531 --> 00:01:13,046
and so we need some other way to evaluate our hypothesis.
因此 我们需要另一种评价假设函数的方法
23
00:01:13,062 --> 00:01:17,315
The standard way to evaluate a learned hypothesis is as follows.
如下给出了一种评价假设的标准方法
24
00:01:17,331 --> 00:01:19,308
Suppose we have a data set like this.
假如我们有这样一组数据组
25
00:01:19,323 --> 00:01:21,977
Here I have just shown 10 training examples,
在这里我只展示了10组训练样本
26
00:01:21,992 --> 00:01:23,969
but of course usually we may have
当然我们通常可以有
27
00:01:23,985 --> 00:01:27,254
dozens or hundreds or maybe thousands of training examples.
成百上千组训练样本
28
00:01:27,269 --> 00:01:30,246
In order to make sure we can evaluate our hypothesis,
为了确保我们可以评价我们的假设函数
29
00:01:30,262 --> 00:01:32,808
what we are going to do is split
我要做的是
30
00:01:32,823 --> 00:01:35,554
the data we have into two portions.
将这些数据分成两部分
31
00:01:35,569 --> 00:01:40,723
The first portion is going to be our usual training set
第一部分将成为我们的训练集
32
00:01:42,638 --> 00:01:47,446
and the second portion is going to be our test set,
第二部分将成为我们的测试集
33
00:01:47,462 --> 00:01:50,398
and a pretty typical split of this
将所有数据分成训练集和测试集
34
00:01:50,413 --> 00:01:53,482
all the data we have into a training set and test set
其中一种典型的分割方法是
35
00:01:53,498 --> 00:01:57,936
might be around say a 70%, 30% split.
按照7:3的比例
36
00:01:57,952 --> 00:02:00,052
with more of the data going to the training set
将70%的数据作为训练集
37
00:02:00,067 --> 00:02:02,367
and relatively less to the test set.
30%的数据作为测试集
38
00:02:02,382 --> 00:02:05,782
And so now, if we have some data set,
因此 现在如果我们有了一些数据
39
00:02:05,790 --> 00:02:08,459
we would assign, say, 70%
我们只用其中的70%
40
00:02:08,475 --> 00:02:11,529
of the data to be our training set where here "m"
作为我们的训练集
41
00:02:11,544 --> 00:02:14,336
is as usual our number of training examples
这里的m依然表示训练样本的总数
42
00:02:14,352 --> 00:02:16,913
and the remainder of our data
而剩下的那部分数据
43
00:02:16,929 --> 00:02:19,310
might then be assigned to become our test set.
将被用作测试集
44
00:02:19,325 --> 00:02:23,410
And here, I'm going to use the notation m subscript test
在这里 我使用m下标test
45
00:02:23,425 --> 00:02:27,187
to denote the number of test examples.
来表示测试样本的总数
46
00:02:27,202 --> 00:02:32,225
And so in general, this subscript test is going to denote
因此 这里的下标test将表示
47
00:02:32,241 --> 00:02:34,987
examples that come from a test set so that
这些样本是来自测试集
48
00:02:35,002 --> 00:02:40,810
x1 subscript test, y1 subscript test is my first
因此x(1)test y(1)test将成为我的
49
00:02:40,825 --> 00:02:43,648
test example which I guess in this example
第一组测试样本
50
00:02:43,664 --> 00:02:45,656
might be this example over here.
我想应该是这里的这一组样本
51
00:02:45,671 --> 00:02:47,495
Finally, one last detail
最后再提醒一点
52
00:02:47,510 --> 00:02:50,795
whereas here I've drawn this as though the first 70%
在这里我是选择了前70%的数据作为训练集
53
00:02:50,810 --> 00:02:54,479
goes to the training set and the last 30% to the test set.
后30%的数据作为测试集
54
00:02:54,495 --> 00:02:57,518
If there is any sort of ordering to the data,
但如果这组数据有某种规律或顺序的话
55
00:02:57,533 --> 00:03:01,048
then it would be better to send a random 70%
那么最好是
56
00:03:01,048 --> 00:03:02,948
of your data to the training set and a
随机选择70%作为训练集
57
00:03:02,964 --> 00:03:05,556
random 30% of your data to the test set.
剩下的30%作为测试集
58
00:03:05,571 --> 00:03:08,579
So if your data were already randomly sorted,
当然如果你的数据已经随机分布了
59
00:03:08,595 --> 00:03:12,110
you could just take the first 70% and last 30%
那你可以选择前70%和后30%
60
00:03:12,125 --> 00:03:14,718
but if your data were not randomly ordered,
但如果你的数据不是随机排列的
61
00:03:14,733 --> 00:03:16,756
it would be better to randomly shuffle or
最好还是打乱顺序
62
00:03:16,771 --> 00:03:19,718
to randomly reorder the examples in your training set.
或者使用一种随机的顺序来构建你的数据
63
00:03:19,733 --> 00:03:23,310
Before, you know, sending the first 70% to the training set
然后再取出前70%作为训练集
64
00:03:23,325 --> 00:03:26,669
and the last 30% to the test set.
后30%作为测试集
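The shuffle-then-split step described above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the helper name `train_test_split`, the `seed` argument, and the toy data are all assumptions made for the example.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples, then cut it into train and test parts."""
    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # random reorder guards against ordered data
    m = int(round(train_fraction * len(shuffled)))  # m = number of training examples
    return shuffled[:m], shuffled[m:]      # first 70% train, last 30% test

# Hypothetical toy data set of 10 (x, y) pairs.
data = [(x, 2 * x) for x in range(10)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```

Fixing the seed only makes the split repeatable; any random permutation serves the purpose described in the lecture.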
65
00:03:27,054 --> 00:03:30,169
Here then is a fairly typical procedure
接下来 这里展示了一种典型的方法
66
00:03:30,185 --> 00:03:32,008
for how you would train and test
你可以按照这些步骤训练和测试你的学习算法
67
00:03:32,023 --> 00:03:34,492
the learning algorithm, say linear regression.
比如线性回归算法
68
00:03:34,508 --> 00:03:38,115
First, you learn the parameters theta from the training set
首先 你需要对训练集进行学习得到参数theta
69
00:03:38,131 --> 00:03:41,798
so you minimize the usual training error objective j of theta,
具体来讲就是最小化训练误差J(θ)
70
00:03:41,813 --> 00:03:44,713
where j of theta here was defined using that
这里的J(θ)是使用那70%数据
71
00:03:44,729 --> 00:03:47,059
70% of all the data you have.
来定义得到的
72
00:03:47,075 --> 00:03:49,759
That is, only the training data.
也就是仅仅是训练数据
73
00:03:49,882 --> 00:03:52,167
And then you would compute the test error.
接下来 你要计算出测试误差
74
00:03:52,182 --> 00:03:56,298
And I am going to denote the test error as j subscript test.
我将用J下标test来表示测试误差
75
00:03:56,313 --> 00:03:59,229
And so what you do is take your parameter theta
那么你要做的就是
76
00:03:59,259 --> 00:04:02,190
that you have learned from the training set, and plug it in here
取出你之前从训练集中学习得到的参数theta放在这里
77
00:04:02,205 --> 00:04:04,875
and compute your test set error.
来计算你的测试误差
78
00:04:04,890 --> 00:04:08,529
Which I am going to write as follows.
可以写成如下的形式
79
00:04:08,698 --> 00:04:11,275
So this is basically
这实际上是测试集
80
00:04:11,290 --> 00:04:15,244
the average squared error
平方误差的
81
00:04:15,269 --> 00:04:18,154
as measured on your test set.
平均值
82
00:04:18,169 --> 00:04:19,915
It's pretty much what you'd expect.
这就是你期望得到的值
83
00:04:19,931 --> 00:04:23,415
So if we run every test example through your hypothesis
因此 我们使用包含参数theta的假设函数对每一个测试样本进行测试
84
00:04:23,431 --> 00:04:28,008
with parameter theta and just measure the squared error
然后通过假设函数和测试样本
85
00:04:28,023 --> 00:04:33,338
that your hypothesis has on your m subscript test, test examples.
计算出mtest个平方误差
86
00:04:33,354 --> 00:04:37,054
And of course, this is the definition of the
当然 这是当我们使用线性回归
87
00:04:37,069 --> 00:04:40,815
test set error if we are using linear regression
和平方误差标准时
88
00:04:40,831 --> 00:04:44,362
and using the squared error metric.
测试误差的定义
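As a rough sketch, the test error for linear regression could be computed like this in Python. The transcript only says "average squared error"; the 1/(2 * m_test) scaling used here follows the course's usual squared-error cost and is an assumption, as are the function names and the toy numbers.

```python
def predict(theta, x):
    """Linear hypothesis h_theta(x) = theta[0] + theta[1]*x[0] + ..."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

def j_test_linear(theta, test_set):
    """Squared-error test cost: (1 / (2 * m_test)) * sum of squared residuals."""
    m_test = len(test_set)
    total = sum((predict(theta, x) - y) ** 2 for x, y in test_set)
    return total / (2 * m_test)  # the 1/2 factor is an assumed convention

# theta is assumed to have been learned on the training set only.
theta = [1.0, 2.0]
test_examples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 6.0)]
print(j_test_linear(theta, test_examples))
```

The point of the sketch is that theta is held fixed here: nothing is fitted on the test examples, they are only used to measure the error.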
89
00:04:44,377 --> 00:04:47,477
How about if we were doing a classification problem
那么如果是考虑分类问题
90
00:04:47,492 --> 00:04:50,654
and say using logistic regression instead.
比如说使用逻辑回归的时候呢
91
00:04:50,669 --> 00:04:53,877
In that case, the procedure for training
训练和测试逻辑回归的步骤
92
00:04:53,892 --> 00:04:57,085
and testing say logistic regression is pretty similar
与之前所说的非常类似
93
00:04:57,100 --> 00:04:59,985
First we will learn the parameters from the training data,
首先我们要从训练数据 也就是所有数据的70%中
94
00:05:00,000 --> 00:05:02,331
that first 70% of the data.
学习得到参数theta
95
00:05:02,346 --> 00:05:05,115
And then we will compute the test error as follows.
然后用如下的方式计算测试误差
96
00:05:05,131 --> 00:05:07,015
It's the same objective function
目标函数和我们平常
97
00:05:07,031 --> 00:05:09,592
as we always use for logistic regression,
做逻辑回归的一样
98
00:05:09,608 --> 00:05:11,569
except that now it is defined using
唯一的区别是
99
00:05:11,585 --> 00:05:15,115
our m subscript test, test examples.
现在我们使用的是mtest个测试样本
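A minimal Python sketch of this logistic regression test cost, assuming the standard cross-entropy objective from earlier in the course. The function names and the toy test set are illustrative assumptions, not part of the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Logistic hypothesis h_theta(x) = sigmoid(theta[0] + theta[1]*x[0] + ...)."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))

def j_test_logistic(theta, test_set):
    """Average cross-entropy cost over the m_test test examples."""
    m_test = len(test_set)
    total = 0.0
    for x, y in test_set:
        p = h(theta, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m_test

# theta is assumed to have been learned on the training set only.
theta = [0.0, 0.0]
test_examples = [([1.0], 1), ([-1.0], 0)]
print(j_test_logistic(theta, test_examples))
```

With all-zero parameters every prediction is 0.5, so this toy call simply reports the cost of an uninformative hypothesis.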
100
00:05:15,131 --> 00:05:17,600
While this definition of the test set error
这里的测试误差Jtest(θ)
101
00:05:17,631 --> 00:05:20,238
j subscript test is perfectly reasonable.
其实不难理解
102
00:05:20,254 --> 00:05:22,231
Sometimes there is an alternative
有时这是另一种形式的测试集
103
00:05:22,246 --> 00:05:25,469
test set metric that might be easier to interpret,
更易于理解
104
00:05:25,485 --> 00:05:27,877
and that's the misclassification error.
这里的误差其实叫误分类率
105
00:05:27,892 --> 00:05:30,792
It's also called the zero one misclassification error,
也被称为0/1错分率
106
00:05:30,808 --> 00:05:32,692
with zero one denoting that
0/1表示了
107
00:05:32,708 --> 00:05:36,146
you either get an example right or you get an example wrong.
你预测到的正确或错误样本的情况
108
00:05:36,162 --> 00:05:37,910
Here's what I mean.
我想说的是这个意思
109
00:05:37,925 --> 00:05:41,795
Let me define the error of a prediction.
可以这样定义一次预测的误差
110
00:05:41,825 --> 00:05:44,202
That is h of x.
关于假设h(x)
111
00:05:44,218 --> 00:05:47,518
And given the label y as
和标签y的误差
112
00:05:47,533 --> 00:05:51,848
equal to one if my hypothesis
那么这个误差等于1
113
00:05:51,864 --> 00:05:54,633
outputs a value greater than or equal to 0.5
当你的假设函数h(x)的值大于等于0.5
114
00:05:54,641 --> 00:05:57,510
and Y is equal to zero
并且y的值等于0
115
00:05:57,525 --> 00:06:03,718
or if my hypothesis outputs a value of less than 0.5
或者当h(x)小于0.5
116
00:06:03,733 --> 00:06:05,402
and y is equal to one,
并且y的值等于1
117
00:06:05,418 --> 00:06:08,118
Right, so both of these cases basically correspond
因此 这两种情况都表明
118
00:06:08,133 --> 00:06:11,833
to if your hypothesis mislabeled the example
你的假设对样本进行了误判
119
00:06:11,833 --> 00:06:14,518
assuming you threshold at 0.5.
这里定义阈值为0.5
120
00:06:14,533 --> 00:06:18,171
So either it thought it was more likely to be 1, but it was actually 0,
那么也就是说 假设结果更趋向于1 但实际是0
121
00:06:18,187 --> 00:06:20,733
or your hypothesis thought it was more likely
或者说假设更趋向于0
122
00:06:20,748 --> 00:06:23,556
to be 0, but the label was actually 1.
但实际的标签却是1
123
00:06:23,571 --> 00:06:28,471
And otherwise, we define this error function to be zero.
否则 我们将误差值定义为0
124
00:06:28,487 --> 00:06:34,841
If your hypothesis basically classified the example y correctly.
此时你的假设值能够正确对样本y进行分类
125
00:06:34,864 --> 00:06:38,841
We could then define the test error,
然后 我们就能应用错分率误差
126
00:06:38,856 --> 00:06:42,371
using the misclassification error metric to be
来定义测试误差
127
00:06:42,387 --> 00:06:46,779
1 over m subscript test of the sum from i equals 1
也就是1/mtest 乘以
128
00:06:46,795 --> 00:06:49,941
to m subscript test of the
h(i)(xtest)和y(i)的错分率误差
129
00:06:49,956 --> 00:06:55,164
error of h of x(i) test
从i=1到mtest
130
00:06:55,179 --> 00:06:57,971
comma y(i).
的求和
131
00:06:57,987 --> 00:07:02,010
And so that's just my way of writing out that this is exactly
这样我就写出了我的定义方式
132
00:07:02,025 --> 00:07:05,587
the fraction of the examples in my test set
这实际上就是我的假设函数误标记的
133
00:07:05,602 --> 00:07:08,864
that my hypothesis has mislabeled.
那部分测试集中的样本
134
00:07:08,871 --> 00:07:10,602
And so that's the definition of
这也就是使用
135
00:07:10,618 --> 00:07:13,687
the test set error using the misclassification error
0/1错分率或误分类率
136
00:07:13,718 --> 00:07:16,948
or the 0/1 misclassification metric.
的准则来定义的测试误差
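The 0/1 misclassification test error defined above can be sketched in Python like this. The names `err` and `misclassification_test_error` and the sample predictions are assumptions made for illustration; the 0.5 threshold is the one stated in the lecture.

```python
def err(h_x, y, threshold=0.5):
    """Return 1 if the thresholded prediction disagrees with label y, else 0."""
    predicted = 1 if h_x >= threshold else 0
    return 0 if predicted == y else 1

def misclassification_test_error(predictions, labels):
    """Fraction of test examples the hypothesis mislabels."""
    m_test = len(labels)
    return sum(err(p, y) for p, y in zip(predictions, labels)) / m_test

# Hypothetical hypothesis outputs h(x_test) and the matching test labels.
predictions = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]
print(misclassification_test_error(predictions, labels))
```

In this toy case the last two examples are mislabeled, so the result is the fraction 2 out of 4.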
137
00:07:16,971 --> 00:07:19,995
So that's the standard technique for evaluating
以上我们介绍了一套标准技术
138
00:07:20,010 --> 00:07:22,833
how good a learned hypothesis is.
来评价一个已经学习过的假设
139
00:07:22,848 --> 00:07:25,579
In the next video, we will adapt these ideas
在下一段视频中我们要应用这些方法
140
00:07:25,595 --> 00:07:28,525
to helping us do things like choose what features
来帮助我们进行诸如特征选择一类的问题
141
00:07:28,541 --> 00:07:31,641
like the degree of polynomial to use with the learning algorithm
比如多项式次数的选择
142
00:07:31,656 --> 00:07:34,964
or choose the regularization parameter for learning algorithm.
或者正则化参数的选择