Description
On this image:
with configuration:
"hocr_font_info":"1",
"tessedit_pageseg_mode":"1"
A build compiled with -O1 produces:
"GI 96%:\n\n14:18\n\n \n\nno" Kyivstar ‘3 '3\n\n< 3aMeTKl/I\n\n27 anpenn 2016 r‘, 14:18\n\nLeben ham.\nnicht von Dritten Uberwacht\n99.! mir attain Md.-\n\ngehbrt und\nLeben haben,\n\n \n\n"
A build compiled with -O2 produces:
"GI 96%:\n\n14:18\n\n \n\nno" Kyivstar ‘3 '3\n\n< 3aMeTKl/I\n\n27 anpenn 2016 r‘, 14:18\n\nnicht von Dritten Uberwacht\n99.! mir attain Md.-\n\ngehbrt und\nLeben haben,\n\n \n\n"
The -O2 build loses the first line of the text block ("Leben haben", which the -O1 build recognizes as "Leben ham.").
This happens because of a page segmentation bug.
The issue appears to be in the ColPartition::VCoreOverlap method. It can cause signed integer overflow (which is undefined behavior) when median_top_ or median_bottom_ has not been computed, which would explain why the result depends on the optimization level.
This is the case for LeaderPartitions: their median_bottom_ and median_top_ are initialized to MAX_INT32 and -MAX_INT32 respectively and are never updated.
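
The arithmetic behind the overflow can be sketched as follows. This is a simplified stand-in, not the actual Tesseract code: `PartitionSketch`, `WideVCoreOverlap`, `FitsInInt32`, `GuardedVCoreOverlap` and `CheckLeaderCase` are hypothetical names, and the sketch assumes the overlap is computed as min(top) minus max(bottom) and that an unset partition holds median_bottom = MAX_INT32 and median_top = -MAX_INT32. The wide version evaluates the expression in 64 bits to show that the result does not fit in 32 bits; the guarded version shows one possible fix, which treats a partition with unset medians as having no overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the two ColPartition fields discussed above.
// The defaults mirror the assumed "not computed" sentinel values.
struct PartitionSketch {
  int32_t median_bottom = std::numeric_limits<int32_t>::max();
  int32_t median_top = -std::numeric_limits<int32_t>::max();
};

// Assumed overlap formula, min(top) - max(bottom), evaluated in 64 bits so
// the true result is visible without triggering signed overflow.
inline int64_t WideVCoreOverlap(const PartitionSketch& a,
                                const PartitionSketch& b) {
  const int64_t top = std::min(a.median_top, b.median_top);
  const int64_t bottom = std::max(a.median_bottom, b.median_bottom);
  return top - bottom;
}

// True if the value can be stored in a 32-bit signed integer.
inline bool FitsInInt32(int64_t value) {
  return value >= std::numeric_limits<int32_t>::min() &&
         value <= std::numeric_limits<int32_t>::max();
}

// One possible guard (an assumption, not the project's confirmed fix):
// report no overlap when either partition still holds the sentinel.
inline int32_t GuardedVCoreOverlap(const PartitionSketch& a,
                                   const PartitionSketch& b) {
  const int32_t unset = std::numeric_limits<int32_t>::max();
  if (a.median_bottom == unset || b.median_bottom == unset) {
    return 0;
  }
  // With real image coordinates on both sides the difference is small.
  return std::min(a.median_top, b.median_top) -
         std::max(a.median_bottom, b.median_bottom);
}

// Usage: a leader partition with unset medians against a text partition.
inline void CheckLeaderCase() {
  PartitionSketch leader;  // medians never computed
  PartitionSketch text;
  text.median_bottom = 100;
  text.median_top = 140;
  // -MAX_INT32 - MAX_INT32 is outside the int32_t range.
  assert(!FitsInInt32(WideVCoreOverlap(leader, text)));
  assert(GuardedVCoreOverlap(leader, text) == 0);
}
```

In the unguarded 32-bit form the same subtraction is undefined behavior, so the compiler is free to produce different results at -O1 and -O2.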