forked from fengdu78/Coursera-ML-AndrewNg-Notes
18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).srt
1
00:00:00,090 --> 00:00:01,140
In earlier videos, I have
在前面的视频中 (字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
2
00:00:01,260 --> 00:00:02,510
said over and over that, when
我不止一次地说过
3
00:00:02,650 --> 00:00:03,980
you are developing a machine learning system,
在你开发机器学习系统时
4
00:00:04,770 --> 00:00:06,630
one of the most valuable resources is
你最宝贵的资源
5
00:00:06,810 --> 00:00:08,050
your time as the developer
就是你的时间
6
00:00:08,490 --> 00:00:09,820
in terms of picking what
作为一个开发者
7
00:00:09,950 --> 00:00:11,520
to work on next.
你需要正确选择下一步的工作
8
00:00:11,950 --> 00:00:12,710
Or, if you have a team of developers
或者也许你有一个开发团队
9
00:00:13,300 --> 00:00:14,610
or a team of engineers working together
或者一个工程师小组
10
00:00:15,090 --> 00:00:16,620
on a machine learning system, again
共同开发一个机器学习系统
11
00:00:16,930 --> 00:00:18,420
one of the most valuable resources is
同样 最宝贵的还是
12
00:00:18,990 --> 00:00:20,790
the time of the engineers or the developers working on the system.
开发系统所花费的时间
13
00:00:22,420 --> 00:00:23,340
And what you really want to
你需要尽量避免的
14
00:00:23,430 --> 00:00:25,340
avoid is that you or
情况是你或者
15
00:00:25,360 --> 00:00:26,410
your colleagues or your friends spend
你的同事 你的朋友
16
00:00:26,680 --> 00:00:27,560
a lot of time working on
花费了大量时间
17
00:00:27,970 --> 00:00:29,510
some component, only to realize
在某一个模块上
18
00:00:30,470 --> 00:00:31,540
after weeks or months of
在几周甚至几个月的努力以后
19
00:00:31,620 --> 00:00:33,070
time spent, that all that
才意识到所有这些付出的劳动
20
00:00:33,310 --> 00:00:35,090
work, you know, just doesn't
都对你最终系统的表现
21
00:00:35,380 --> 00:00:38,120
make a huge difference on the performance of the final system.
并没有太大的帮助
22
00:00:39,350 --> 00:00:40,430
In this video, what I'd
在这段视频中
23
00:00:40,550 --> 00:00:42,960
like to do is talk about something called ceiling analysis.
我将介绍一下关于上限分析(ceiling analysis)的内容
24
00:00:44,510 --> 00:00:45,760
When you or your team
当你自己或你跟
25
00:00:46,280 --> 00:00:47,270
are working on a pipeline
你的团队在设计某个
26
00:00:47,520 --> 00:00:48,860
machine learning system, this can
机器学习系统的流水线时
27
00:00:49,020 --> 00:00:50,380
sometimes give you a very
这种方式通常能
28
00:00:50,630 --> 00:00:51,650
strong signal, a very strong
提供一种很有价值的信号
29
00:00:52,340 --> 00:00:53,730
guidance, on what parts
或者说很有用的导向
30
00:00:54,150 --> 00:00:56,550
of the pipeline might be the best use of your time to work on.
告诉你流水线中的哪个部分最值得你花时间去完成
31
00:00:59,740 --> 00:01:01,700
To talk about ceiling analysis, I'm
为了介绍上限分析
32
00:01:01,860 --> 00:01:03,140
going to keep on using the
我将继续使用之前用过的
33
00:01:03,690 --> 00:01:04,910
example of the photo
照片OCR流水线的例子
34
00:01:05,640 --> 00:01:06,870
OCR pipeline and I said
在之前的课程中
35
00:01:07,170 --> 00:01:08,270
earlier each of these
我讲过这些方框
36
00:01:08,480 --> 00:01:09,900
boxes text detection, character
文字检测、字符分割
37
00:01:10,200 --> 00:01:12,140
segmentation, character recognition, each
字符识别
38
00:01:12,310 --> 00:01:13,730
of these boxes can have even
这每一个方框都可能
39
00:01:14,100 --> 00:01:15,550
a small engineering team working
需要一个小团队来完成
40
00:01:15,920 --> 00:01:17,370
on it, or maybe the
当然也可能
41
00:01:17,690 --> 00:01:18,640
entire system is just built
你一个人来构建整个系统
42
00:01:18,800 --> 00:01:19,700
by you, either way, but
不管怎样
43
00:01:19,960 --> 00:01:22,340
the question is, where should you allocate resources?
问题是 你应该怎样分配资源呢?
44
00:01:22,730 --> 00:01:24,250
Which of these boxes is
哪一个方框最值得
45
00:01:24,430 --> 00:01:26,630
most worth your efforts, trying
你投入精力去做
46
00:01:26,920 --> 00:01:28,260
to improve the performance of.
投入时间去改善效果
47
00:01:29,070 --> 00:01:30,350
In order to explain the idea
(以下这段同前重复,译者注)
48
00:01:30,840 --> 00:01:32,560
of ceiling analysis, I'm going
为了解释上限分析的原理
49
00:01:32,730 --> 00:01:35,690
to keep using the example of our photo OCR pipeline.
我将继续使用照片OCR流水线的例子
50
00:01:37,000 --> 00:01:38,320
As I mentioned earlier, each of
在之前的视频中我讲过
51
00:01:38,430 --> 00:01:39,630
these boxes here, each of
这里的每个方框
52
00:01:39,850 --> 00:01:41,860
these machine learning components could be
都表示一个机器学习的组成部分
53
00:01:42,170 --> 00:01:43,270
the work of even a
需要一个小团队来完成
54
00:01:43,470 --> 00:01:44,720
small team of engineers, or
当然也可能
55
00:01:45,280 --> 00:01:48,110
maybe the whole system could be built by just one person.
整个系统都由一个人来完成
56
00:01:48,780 --> 00:01:49,920
But the question is, where should
但问题是
57
00:01:50,100 --> 00:01:51,990
you allocate scarce resources?
你应该如何分配资源呢?
58
00:01:52,130 --> 00:01:53,200
That is, which of these
也就是说
59
00:01:53,690 --> 00:01:54,860
components, or which one or
这些模块中
60
00:01:54,950 --> 00:01:56,250
two or maybe all three of these components
哪一个 或者哪两个、三个
61
00:01:57,080 --> 00:01:58,540
is most worth your time
是最值得你花更多的
62
00:01:59,200 --> 00:02:01,060
to try to improve the performance of.
精力去改善它的效果的?
63
00:02:01,660 --> 00:02:02,810
So here's the idea of ceiling analysis.
这便是上限分析要做的事
64
00:02:04,140 --> 00:02:05,520
As in the development process for
跟其他机器学习系统的
65
00:02:05,890 --> 00:02:07,170
other machine learning systems as
开发过程一样
66
00:02:07,340 --> 00:02:08,490
well, in order to make
为了决定
67
00:02:08,670 --> 00:02:09,740
decisions on what to do
要开发这个系统应该
68
00:02:09,970 --> 00:02:11,150
for developing the system
采取什么样的行动
69
00:02:11,710 --> 00:02:12,770
it's going to be
一个有效的方法是
70
00:02:12,900 --> 00:02:14,070
very helpful to have a
对学习系统使用一个
71
00:02:14,580 --> 00:02:17,650
single real number evaluation metric for this learning system.
数值评价量度
72
00:02:18,450 --> 00:02:19,390
So let's say we pick character-level accuracy.
所以假如我们用字符准确度作为这个量度
73
00:02:19,530 --> 00:02:21,140
So, you know, given a
因此 给定一个
74
00:02:21,570 --> 00:02:22,840
test set image, what is the
测试样本图像
75
00:02:22,860 --> 00:02:24,710
fraction of alphabets or
那么这个数值就表示
76
00:02:25,060 --> 00:02:26,570
characters in the test image that
我们对测试图像中的文字
77
00:02:28,980 --> 00:02:29,390
we recognize correctly?
识别正确的比例
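The character-level accuracy described above can be sketched in a few lines of Python. This is a minimal sketch, not the course's implementation: the function name character_accuracy and the position-by-position comparison are assumptions made for illustration, and a real OCR evaluation might align the strings with an edit distance instead.

```python
from typing import Sequence


def character_accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of ground-truth characters recognized correctly.

    Hypothetical helper: compares each predicted string with its
    ground-truth string position by position. Characters missing from
    a prediction count as errors; extra trailing characters are ignored.
    """
    if len(predicted) != len(truth):
        raise ValueError("need exactly one prediction per test image")
    correct = 0
    total = 0
    for pred, true in zip(predicted, truth):
        total += len(true)
        # zip stops at the shorter string, so characters the prediction
        # lacks can never be counted as correct.
        correct += sum(p == t for p, t in zip(pred, true))
    return correct / total if total else 0.0


# Two toy test images: 9 of the 11 ground-truth characters match.
print(character_accuracy(["ANTIQUE", "MA"], ["ANTIQUE", "MALL"]))
```

Whatever metric is used, the point made in the lecture is that it reduces the whole pipeline's output to one real number that can be compared across experiments.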
78
00:02:29,550 --> 00:02:30,830
Or you can pick some other single real
或者你也可以选择
79
00:02:31,030 --> 00:02:32,270
number evaluation metric, if you
其他的某个数值评价度量值
80
00:02:32,370 --> 00:02:33,740
want, but let's say that
随你选择
81
00:02:34,040 --> 00:02:35,820
whatever evaluation metric we
但不管选择什么评价量度值
82
00:02:35,920 --> 00:02:37,680
pick, we get that, we
我们只是假设
83
00:02:37,880 --> 00:02:40,090
find that the overall system currently has 72% accuracy.
整个系统的估计准确率为72%
84
00:02:40,350 --> 00:02:42,210
So, in other
所以换句话说
85
00:02:42,350 --> 00:02:43,380
words, we have some set
我们有一些测试集图像
86
00:02:43,520 --> 00:02:44,960
of test set images and for
并且对测试集中的
87
00:02:45,180 --> 00:02:46,460
each test set image, we
每一幅图像
88
00:02:46,640 --> 00:02:47,850
run it through text detection, then
我们都对其分别运行
89
00:02:47,980 --> 00:02:49,280
character segmentation, then character
文字检测、字符分割
90
00:02:49,560 --> 00:02:50,680
recognition, and we find
然后字符识别
91
00:02:51,010 --> 00:02:52,240
that on our test set, the
然后我们发现
92
00:02:52,370 --> 00:02:53,570
overall accuracy of the
整个测试集的准确率是72%
93
00:02:53,800 --> 00:02:56,220
entire system was 72% on whatever metric you chose.
不管你用什么度量值来度量
94
00:02:58,120 --> 00:02:59,700
Now here's the idea behind
下面是上限分析的
95
00:03:00,070 --> 00:03:01,610
ceiling analysis, which is that
主要思想
96
00:03:01,910 --> 00:03:03,530
we're going to go to, let's
首先我们关注
97
00:03:03,670 --> 00:03:05,100
say, the first module of our
这个机器学习流程中的
98
00:03:05,400 --> 00:03:06,810
machine learning pipeline, say text detection.
第一个模块 文字检测
99
00:03:07,270 --> 00:03:08,400
And what we are going
而我们要做的
100
00:03:08,420 --> 00:03:09,170
to do is we are going to
实际上是在
101
00:03:09,270 --> 00:03:11,310
monkey around with the test set.
给测试集样本捣点儿乱
102
00:03:11,980 --> 00:03:12,920
We are going to go to the
我们要对
103
00:03:12,990 --> 00:03:14,270
test set and for every test example
每一个测试集样本
104
00:03:14,830 --> 00:03:16,170
we are just going to provide it
都给它提供一个
105
00:03:16,380 --> 00:03:18,230
the correct text detection outputs.
正确的文字检测结果
106
00:03:19,210 --> 00:03:20,300
In other words, we are going
换句话说
107
00:03:20,560 --> 00:03:21,760
to the test set and just
我们要遍历每个测试集样本
108
00:03:21,960 --> 00:03:23,340
manually tell the algorithm
然后人为地告诉算法
109
00:03:24,350 --> 00:03:26,210
where the text is
每一个测试样本中
110
00:03:26,780 --> 00:03:27,940
in each of the test examples.
什么地方出现了文字
111
00:03:28,950 --> 00:03:29,960
So in other words, we
因此换句话说
112
00:03:30,030 --> 00:03:31,510
are going to simulate what happens
我们是要仿真出
113
00:03:32,030 --> 00:03:33,640
if we have a text detection
如果是100%
114
00:03:33,890 --> 00:03:35,350
system with a 100%
正确地检测出
115
00:03:35,610 --> 00:03:37,180
accuracy, for the purpose
图片中的文字信息
116
00:03:38,300 --> 00:03:40,410
of detecting text in an image.
应该是什么样的
117
00:03:42,050 --> 00:03:43,070
And really the way you
当然 要做到这个
118
00:03:43,110 --> 00:03:44,210
do that is very simple right, instead
是很容易的
119
00:03:44,620 --> 00:03:45,840
of letting your learning algorithm
现在不用你的学习算法
120
00:03:46,340 --> 00:03:47,630
detect the text in the images.
来检测图像中的文字了
121
00:03:48,180 --> 00:03:49,110
You would instead go to the
你只需要找到对应的图像
122
00:03:49,340 --> 00:03:51,230
images and just manually label what
然后人为地识别出
123
00:03:51,540 --> 00:03:53,620
is the location of the text in my test set image.
测试集图像中出现文字的区域
124
00:03:54,200 --> 00:03:55,040
And you would then let these
然后你要做的就是让这些
125
00:03:55,530 --> 00:03:56,620
correct, so let these ground
绝对正确的结果
126
00:03:56,990 --> 00:03:58,370
truth labels of where
这些绝对为真的标签
127
00:03:58,560 --> 00:04:00,010
the text is be part of
也就是告诉你
128
00:04:00,090 --> 00:04:01,330
your test set and use these
图像中哪些位置
129
00:04:01,580 --> 00:04:02,990
ground truth labels as what you
有文字信息的标签
130
00:04:03,110 --> 00:04:04,200
feed in to the next
把它们传给下一个模块
131
00:04:04,470 --> 00:04:07,550
stage of the pipeline, to the character segmentation stage.
也就是传给字符分割模块
132
00:04:07,710 --> 00:04:09,250
So, just to say it again, by
我再说一遍
133
00:04:09,680 --> 00:04:10,790
putting a checkmark over here,
这里打钩的地方
134
00:04:11,500 --> 00:04:12,590
what I mean is I'm going
我想做的是
135
00:04:12,750 --> 00:04:13,750
to go to my test set and
遍历我的测试集
136
00:04:13,860 --> 00:04:14,970
just give it the correct answers,
直接向它公布“标准答案”
137
00:04:15,480 --> 00:04:16,520
give it the correct labels, for
为这个流程中的文字检测部分
138
00:04:16,650 --> 00:04:18,250
the text detection part of the pipeline.
直接提供正确的标签
139
00:04:19,240 --> 00:04:20,280
So it's as if I have
这样好像我就会
140
00:04:20,410 --> 00:04:21,700
a perfect text detection system
有一个非常棒的文字检测系统
141
00:04:22,370 --> 00:04:24,270
on my test set. What we need to
能很好地检测我的测试样本
142
00:04:24,460 --> 00:04:26,570
do then is run this data
然后我们要做的是
143
00:04:27,190 --> 00:04:28,150
through the rest of the pipeline,
继续运行完接下来的几个模块
144
00:04:28,530 --> 00:04:29,860
through character segmentation and character recognition.
也就是字符分割和字符识别
145
00:04:30,680 --> 00:04:31,930
And then, use the same
然后使用跟之前一样的
146
00:04:32,300 --> 00:04:33,310
evaluation metric as before,
评价量度指标
147
00:04:34,000 --> 00:04:35,240
to measure what is the
来测量整个系统的
148
00:04:35,450 --> 00:04:36,900
overall accuracy of the entire system.
总体准确度
149
00:04:37,790 --> 00:04:39,890
And with perfect text detection, hopefully the performance goes up.
这样用准确的文字检测结果 系统的表现应该会有提升
150
00:04:40,330 --> 00:04:41,870
Let's say it
假如说 准确率
151
00:04:41,930 --> 00:04:44,550
goes up to 89%, and then
提高到89%
152
00:04:44,680 --> 00:04:45,830
we're going to keep going. Next, let's
然后我们继续进行
153
00:04:46,090 --> 00:04:47,120
go to the next stage of the
接着执行流水线中的下一模块 字符分割
154
00:04:47,330 --> 00:04:50,230
pipeline, to character segmentation, and again we're going to go to my test set.
同前面一样 我还是去找出我的测试集
155
00:04:50,540 --> 00:04:52,300
And now I'm going to
然后现在我不仅用
156
00:04:52,390 --> 00:04:54,140
give the correct text detection
标准的文字检测结果
157
00:04:54,900 --> 00:04:55,970
output and give the correct
我还同时用标准的
158
00:04:56,490 --> 00:04:58,220
character segmentation outputs and
字符分割结果
159
00:04:59,400 --> 00:05:00,780
manually label the correct
所以还是遍历测试样本
160
00:05:01,330 --> 00:05:03,710
segmentations of the text into individual characters.
人工地给出正确的字符分割结果
161
00:05:04,730 --> 00:05:05,560
And see how much that helps.
然后看看这样做以后 效果怎样变化
162
00:05:05,810 --> 00:05:06,670
And let's say it goes up to
假如我们这样做以后
163
00:05:06,800 --> 00:05:09,140
90% accuracy for the overall system.
整个系统准确率提高到90%
164
00:05:10,070 --> 00:05:11,060
All right, so as always, the accuracy is the
注意跟前面一样 这里说的准确率
165
00:05:11,340 --> 00:05:13,420
accuracy of the overall system.
是指整个系统的准确率
166
00:05:14,120 --> 00:05:15,460
So whatever the final output
所以无论最后一个模块
167
00:05:15,830 --> 00:05:17,450
of the character recognition system is.
字符识别模块给出的最终输出是什么
168
00:05:17,560 --> 00:05:18,870
Whatever the final output of
无论整个流水线的
169
00:05:19,040 --> 00:05:19,660
the overall pipeline is, it's going
最后输出结果是什么
170
00:05:19,930 --> 00:05:22,400
to measure the accuracy of that.
我们都是测出的整个系统的准确率
171
00:05:22,520 --> 00:05:23,720
And then finally I'm going to take the character recognition
最后我们还是执行最后一个模块 字符识别
172
00:05:24,170 --> 00:05:26,170
system and give that the correct labels as well.
同样也是人工给出这一模块的正确标签
173
00:05:26,780 --> 00:05:29,270
And if I do that too, then, no surprise, I should get 100% accuracy.
这样做以后 我应该理所当然得到100%准确率
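The procedure just walked through (feed ground truth to the first module, then the first two, and so on, measuring overall accuracy each time) can be sketched as a short loop. This is a minimal sketch under stated assumptions, not the lecture's code: the function ceiling_analysis, the toy stages detect, segment and recognize, and the integer "images" are all hypothetical stand-ins, and exact-match accuracy stands in for character-level accuracy.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[int], int]]


def ceiling_analysis(stages: Sequence[Stage],
                     examples: Sequence[Dict[str, int]]) -> List[Tuple[str, float]]:
    """Overall accuracy with the first k stages replaced by ground truth.

    Assumes each example stores its raw input under "input" and the
    hand-labeled correct output of every stage under that stage's name.
    """
    final_name = stages[-1][0]
    rows = []
    for k in range(len(stages) + 1):
        correct = 0
        for ex in examples:
            value = ex["input"]
            for i, (name, run_stage) in enumerate(stages):
                # Stages before position k get the hand-labeled answer;
                # the remaining stages run on whatever they are fed.
                value = ex[name] if i < k else run_stage(value)
            # Accuracy is always measured on the final pipeline output.
            correct += value == ex[final_name]
        label = "overall system" if k == 0 else "ground truth through " + stages[k - 1][0]
        rows.append((label, correct / len(examples)))
    return rows


# Toy stand-ins for the three modules; each makes a deliberate mistake.
def detect(x: int) -> int:
    return x if x % 5 == 0 else x + 1      # wrong on inputs 0 and 5


def segment(v: int) -> int:
    return 0 if v == 3 else v * 2          # wrong when fed 3


def recognize(v: int) -> int:
    return 0 if v == 8 else v - 3          # wrong when fed 8


stages = [("text detection", detect),
          ("character segmentation", segment),
          ("character recognition", recognize)]
examples = [{"input": i,
             "text detection": i + 1,
             "character segmentation": (i + 1) * 2,
             "character recognition": (i + 1) * 2 - 3}
            for i in range(10)]

for label, accuracy in ceiling_analysis(stages, examples):
    print(f"{label}: {accuracy:.0%}")
```

The design choice worth noting is that the substitution is cumulative: each row keeps the earlier modules perfect, so the difference between consecutive rows isolates the module that was just replaced.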
174
00:05:31,270 --> 00:05:32,530
Now, the nice thing about having
进行上限分析的
175
00:05:32,850 --> 00:05:34,340
done this analysis is we
一个好处是
176
00:05:34,450 --> 00:05:36,080
can now understand what is
我们现在就知道了
177
00:05:36,700 --> 00:05:40,250
the upside potential for improving each of these components.
如果对每一个模块进行改善 它们各自的上升空间是多大
178
00:05:41,390 --> 00:05:44,180
So we see that if we get perfect text detection,
所以 我们可以看到 如果我们拥有完美的文字检测模块
179
00:05:44,950 --> 00:05:46,360
our performance went up from
那么整个系统的表现将会从
180
00:05:46,710 --> 00:05:48,080
72 to 89 percent, so
准确率72%上升到89%
181
00:05:48,420 --> 00:05:50,670
that's a 17 percent performance gain.
因此效果的增益是17%
182
00:05:51,640 --> 00:05:52,680
So this means that if you
这就意味着
183
00:05:52,890 --> 00:05:54,030
take your current system and you
如果你在现有系统的基础上
184
00:05:54,160 --> 00:05:56,130
spend a lot of time improving text detection,
花费时间和精力改善文字检测模块的效果
185
00:05:57,330 --> 00:05:58,750
that means that we could potentially improve
那么系统的表现
186
00:05:59,200 --> 00:06:00,640
our system's performance by 17 percent.
可能会提高17%
187
00:06:01,020 --> 00:06:02,850
This seems like it's well worth our while.
看起来这还挺值得
188
00:06:03,770 --> 00:06:05,840
Whereas in contrast, when going
而相对来讲
189
00:06:06,200 --> 00:06:08,360
from perfect text detection to when we
如果我们取得完美的字符分割模块
190
00:06:08,640 --> 00:06:12,450
gave it perfect character segmentation, performance went up only by one percent.
那么最终系统表现只提升了1%
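The gains quoted so far are simple differences between consecutive rows of the ceiling analysis table (72%, 89%, 90%, 100%), so they are percentage points rather than relative improvements. A small sketch of that arithmetic, using only the numbers given in the lecture; the names accuracy and upside_per_component are hypothetical:

```python
# Overall accuracies quoted in the lecture, in pipeline order. Each row
# is the accuracy once that component (and all earlier ones) is perfect.
accuracy = [
    ("overall system", 72),
    ("text detection", 89),
    ("character segmentation", 90),
    ("character recognition", 100),
]


def upside_per_component(rows):
    """Gain in percentage points from making each component perfect in turn."""
    gains = {}
    for (_, before), (name, after) in zip(rows, rows[1:]):
        gains[name] = after - before
    return gains


gains = upside_per_component(accuracy)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

Read this way, the remaining 10 points (from 90% to 100%) are the upside of a perfect character recognition module, which follows directly from the same table.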
191
00:06:13,020 --> 00:06:14,820
So, that's a more sobering message.
这便提供了一个很重要的信息
192
00:06:15,250 --> 00:06:16,880
It means that no matter how
这就告诉我们
193
00:06:17,090 --> 00:06:18,510
much time you spend on character segmentation,
不管我们投入多大精力在字符分割上
194
00:06:19,800 --> 00:06:20,990
maybe the upside potential is
系统效果的潜在上升空间
195
00:06:21,080 --> 00:06:22,280
going to be pretty small, and maybe
也都是很小很小
196
00:06:22,460 --> 00:06:23,420
you do not want to
所以你就不会让一个
197
00:06:23,580 --> 00:06:24,340
have a large team of engineers
比较大的工程师团队
198
00:06:24,860 --> 00:06:26,860
working on character segmentation.
花时间忙于字符分割模块
199
00:06:26,990 --> 00:06:28,860
This sort of analysis shows that
因为通过上限分析我们知道了
200
00:06:29,150 --> 00:06:30,180
even when you give it the
即使你把字符分割模块做得再好