
Commit

FindAndRemoveLines/FindVerticalAlignment: decrease fixed vline min length

When detecting vertical separators, the blob aligner is used to glue
line segments (often fragmented by artificial cracks).
Unlike LineFinder, however, it has many parameters that are not
relative to pixel density/resolution.
This change decreases the minimum absolute length of vertical
separators from 500 to 300 pixels.
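Because the threshold is an absolute pixel count, its physical meaning depends on the scan resolution: at 300 dpi the old value of 500 px is about 1.67 in, while the new value of 300 px is exactly 1 in; at 150 dpi the new value already demands 2 in. The sketch below is illustrative only. `PixelsToInches` and `ScaledVLineMinLength` are hypothetical helpers, not Tesseract API, and the scaled variant merely shows what a resolution-relative threshold could look like.

```cpp
#include <cassert>
#include <cmath>

// Old and new values of kVLineMinLength from the diff (absolute pixels).
constexpr int kOldVLineMinLength = 500;
constexpr int kNewVLineMinLength = 300;

// Physical length in inches covered by a fixed pixel count at `dpi`.
double PixelsToInches(int pixels, int dpi) {
  assert(dpi > 0);
  return static_cast<double>(pixels) / dpi;
}

// Hypothetical resolution-relative threshold (NOT Tesseract code):
// derive the pixel minimum from a physical length so that it scales
// with the scan resolution instead of staying fixed.
int ScaledVLineMinLength(double min_inches, int dpi) {
  assert(dpi > 0 && min_inches > 0.0);
  return static_cast<int>(std::lround(min_inches * dpi));
}
```

For example, `ScaledVLineMinLength(1.0, 600)` yields 600 px, whereas the fixed constant stays at 300 px and would then correspond to only half an inch.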
bertsky committed Aug 24, 2020
1 parent 0228d93 commit 65a077d
Showing 1 changed file with 1 addition and 1 deletion.
src/textord/alignedblob.cpp (1 addition, 1 deletion)
@@ -53,7 +53,7 @@ const int kMinRaggedTabs = 5;
 // Min number of points to accept for an aligned tab stop.
 const int kMinAlignedTabs = 4;
 // Constant number of pixels minimum height of a vertical line.
-const int kVLineMinLength = 500;
+const int kVLineMinLength = 300;
 // Minimum gradient for a vertical tab vector. Used to prune away junk
 // tab vectors with what would be a ridiculously large skew angle.
 // Value corresponds to tan(90 - max allowed skew angle)
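The diff only lowers one constant, but that constant decides which glued candidates count as vertical lines. As a toy illustration (this is not Tesseract's blob aligner; `Segment`, `LongestGluedLength`, `IsVerticalSeparator` and the `max_gap` parameter are hypothetical), the sketch below glues cracked fragments whose gaps are small enough and then applies a fixed minimum length. A candidate whose two fragments glue to 400 px is rejected with a minimum of 500 and accepted with 300.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A vertical line fragment spanning pixel rows [top, bottom).
struct Segment {
  int top;
  int bottom;
};

// Toy model (NOT Tesseract's aligner): glue fragments whose vertical gap is
// at most `max_gap` pixels, then return the length of the longest glued run.
int LongestGluedLength(std::vector<Segment> segs, int max_gap) {
  if (segs.empty()) return 0;
  std::sort(segs.begin(), segs.end(),
            [](const Segment& a, const Segment& b) { return a.top < b.top; });
  int best = 0;
  int run_top = segs[0].top;
  int run_bottom = segs[0].bottom;
  for (size_t i = 1; i < segs.size(); ++i) {
    if (segs[i].top - run_bottom <= max_gap) {
      // Small crack: extend the current run.
      run_bottom = std::max(run_bottom, segs[i].bottom);
    } else {
      // Gap too large: close the current run and start a new one.
      best = std::max(best, run_bottom - run_top);
      run_top = segs[i].top;
      run_bottom = segs[i].bottom;
    }
  }
  return std::max(best, run_bottom - run_top);
}

// Accept the candidate as a vertical separator if its longest glued run
// reaches a fixed pixel minimum (roughly the role kVLineMinLength plays).
bool IsVerticalSeparator(const std::vector<Segment>& segs, int max_gap,
                         int min_length) {
  return LongestGluedLength(segs, max_gap) >= min_length;
}
```

In this toy model, lowering the fixed minimum lets shorter candidates qualify as vertical separators, which can change how a page region is partitioned into columns.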

4 comments on commit 65a077d


rmast commented on 65a077d Jun 16, 2024


git bisect identifies this commit as the first bad commit for issue #3906.
It is unclear whether this change merely exposes a pre-existing bug or actually causes it. I doubt it is the root cause.


rmast commented on 65a077d Jun 16, 2024


Anyway,

The 'Projection' view of the ScrollView debugger shows a real difference with this setting. With the value set back to 500, it looks like this, and the lowest line is complete:
[Screenshot: Projection view with kVLineMinLength = 500]
With the value 300, the line is broken into two parts, following the columns above:
tesseract --dpi 300 ~/Downloads/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg test1 segdemo inter -c textord_tabfind_show_strokewidths=1

[Screenshot: Projection view with kVLineMinLength = 300]


rmast commented on 65a077d Jun 16, 2024


What was the original finding that led to this commit?

stweil (Member) commented on 65a077d Jun 17, 2024

