resegment: skip empty line polygons #77
Good catch. What does the output look like with your fix?
    
          
No fix yet. I just avoid the combination
    
(Do not ignore the foreground of neighbouring regions/segments during page or line segmentation if they actually cover all of the image. This can happen when bad previous segmentation made neighbours overlap heavily.)
(Do not ignore the foreground of neighbouring regions during line segmentation when already using a clipped derived image for the region. Clipping applies to foreground components selectively, while suppression would mask out the whole segment's outline.)
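The two rules above amount to a small decision about what to mask out. A minimal sketch, assuming the region image and its neighbours are given as boolean NumPy masks; the function `neighbour_suppression_mask` and its parameters are hypothetical illustrations, not the project's actual code:

```python
import numpy as np


def neighbour_suppression_mask(neighbour_masks, shape, clipped):
    """Return a boolean mask of pixels to suppress (hypothetical helper).

    neighbour_masks: list of boolean arrays, one per neighbouring segment
    shape: shape of the region image
    clipped: True if a clipped derived image is already used for the region
    """
    if clipped:
        # Clipping already removed intruding foreground components selectively;
        # suppressing whole neighbour outlines on top would remove too much.
        return np.zeros(shape, dtype=bool)
    union = np.zeros(shape, dtype=bool)
    for mask in neighbour_masks:
        union |= mask
    if union.all():
        # Neighbours cover the whole image (e.g. heavy overlaps from a bad
        # earlier segmentation): suppressing them would leave nothing.
        return np.zeros(shape, dtype=bool)
    return union
```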
This pull request introduces 1 alert when merging 0549577 into c3fad1a (view on LGTM.com).
    
I am thinking about removing the  The only place where that would matter is resegmentation: we need to decide which Ocropy text line to keep per input text line (and cannot keep multiple in the current scheme). But as stated in the opening example, I'll have to rewrite resegmentation anyway...
    
(During line segmentation, when separators/neighbours of the text region are being suppressed, annotate a clipped derived image for that region, too. This is analogous to page segmentation, which already annotates a derived image with non-text suppressed.)
(When suppressing intruders from neighbours, make sure they are not suppressed in those neighbours as well.)
(When clipping intruders from neighbours, allow suppressing the foreground completely, without thresholding, but avoid clipping if the segment's outline is properly contained in the neighbour, since it cannot have independent foreground at all.)
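Selective clipping with the containment check might look roughly like this. It is a sketch only: the helper `clip_intruders` is hypothetical, and the criterion for what counts as an intruding component (here, at least `min_share` of its pixels lying inside the neighbour) is an assumption rather than the project's actual rule.

```python
import numpy as np
from scipy import ndimage


def clip_intruders(fg, segment, neighbour, min_share=0.5):
    """Remove neighbour foreground from a segment's foreground (hypothetical helper).

    fg, segment, neighbour: boolean arrays of equal shape
    min_share: fraction of a component that must lie in the neighbour for the
        component to count as an intruder (assumed criterion)
    """
    own_fg = fg & segment
    if np.all(neighbour[segment]):
        # The segment lies completely inside the neighbour, so it cannot have
        # independent foreground: do not clip anything.
        return own_fg
    components, num = ndimage.label(own_fg)
    clipped = own_fg.copy()
    for index in range(1, num + 1):
        component = components == index
        share = (component & neighbour).sum() / component.sum()
        if share >= min_share:
            # drop the whole connected component, not just the overlapping pixels
            clipped &= ~component
    return clipped
```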
Updated from 0549577 to b871e8f.
  
This pull request introduces 1 alert when merging b871e8f into c3fad1a (view on LGTM.com).
    
This pull request introduces 1 alert when merging 6714823 into c3fad1a (view on LGTM.com).
    
- when including existing segments in the recursive XY-cut (lines2regions) ordering/segmentation, avoid grouping them together with actual new text lines, and treat them properly like existing region assignments
- simplify decoding into PAGE regions
- when polygonalizing new regions from label masks, avoid creating hulls if these would create additional overlaps with existing regions
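The overlap check for region hulls can be illustrated with raster masks. In this sketch the filled bounding box stands in for a real (convex or concave) hull, and both `bbox_hull` and `region_shape` are hypothetical helpers; the point is only the decision rule: keep the plain label mask whenever the hull would overlap existing regions more than the mask itself does.

```python
import numpy as np


def bbox_hull(mask):
    """Filled bounding box of a mask: a crude stand-in for a real hull."""
    hull = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    hull[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return hull


def region_shape(label_mask, existing):
    """Use the hull only if it adds no overlap with existing regions."""
    hull = bbox_hull(label_mask)
    if (hull & existing).sum() > (label_mask & existing).sum():
        return label_mask  # the hull would create additional overlaps
    return hull
```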
          
Decided against that, because horizontal over-segmentation also introduces extra newlines. Improved the hmerge functionality instead.
    
This pull request fixes 1 alert when merging 1a33f94 into c3fad1a (view on LGTM.com).
    
This pull request fixes 1 alert when merging 6f8a612 into c3fad1a (view on LGTM.com).
    
in `finalize`, if predefined region labels are present, when re-ordering the slice's old and new zones and assigning textlines to them,
- calculate the order based on fg relationships, not bg
- make sure textlines are assigned to their majority zone
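Majority-zone assignment boils down to a vote over the pixels of each line. A minimal sketch with NumPy, assuming line and zone segmentations are given as integer label arrays with 0 as background; `assign_to_majority_zone` is a hypothetical name:

```python
import numpy as np


def assign_to_majority_zone(line_labels, zone_labels):
    """Map each line label to the zone label covering most of its pixels."""
    assignment = {}
    for line in np.unique(line_labels):
        if line == 0:
            continue  # background
        zones = zone_labels[line_labels == line]
        zones = zones[zones > 0]  # ignore pixels outside any zone
        # majority vote; 0 means the line touches no zone at all
        assignment[int(line)] = int(np.bincount(zones).argmax()) if zones.size else 0
    return assignment
```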
when merging textlines within text regions horizontally,
- respect not only existing regions and fg separators (in blocking merges), but also bg separators
- enlarge the region mask to the newly merged line bg
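A rough sketch of the blocking rule and the region-mask enlargement, assuming boolean masks and a single `blockers` mask that already combines existing regions with fg and bg separators. The function `try_hmerge` is hypothetical and only handles lines that do not overlap horizontally:

```python
import numpy as np


def try_hmerge(line_a, line_b, blockers, region):
    """Merge two horizontally adjacent lines unless something blocks the gap.

    line_a, line_b, region: boolean masks
    blockers: boolean mask of existing regions, fg separators and bg separators
    Returns (merged_line or None, region_mask).
    """
    cols_a = np.nonzero(line_a.any(axis=0))[0]
    cols_b = np.nonzero(line_b.any(axis=0))[0]
    if cols_a.max() < cols_b.min():
        left, right = cols_a, cols_b
    elif cols_b.max() < cols_a.min():
        left, right = cols_b, cols_a
    else:
        return None, region  # horizontally overlapping: not handled here
    rows = np.nonzero((line_a | line_b).any(axis=1))[0]
    gap = np.zeros_like(line_a)
    gap[rows.min():rows.max() + 1, left.max() + 1:right.min()] = True
    if (gap & blockers).any():
        return None, region  # a region or separator blocks the merge
    merged = line_a | line_b | gap
    # enlarge the region mask to cover the newly merged line
    return merged, region | merged
```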
when trying to partition slices by separators,
- also treat pre-existing regions like separators, and
- fix the condition on smallest allowed partitions (insignificant but complete lines)
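The partition condition can be sketched as follows, assuming integer line labels and boolean masks for separators and pre-existing regions. The helper `separator_partition` is hypothetical; the acceptance rule shown (every partition that contains text must hold at least one line completely, no matter how small) is one reading of the commit message, not the actual implementation:

```python
import numpy as np
from scipy import ndimage


def separator_partition(line_labels, separators, regions):
    """Split a slice along separators and pre-existing regions (hypothetical).

    Returns the partition label array, or None if the split is not acceptable.
    """
    blockers = separators | regions  # treat pre-existing regions like separators
    parts, num_parts = ndimage.label(~blockers)
    if num_parts < 2:
        return None
    occupied, complete = set(), set()
    for line in np.unique(line_labels[line_labels > 0]):
        covering = np.unique(parts[line_labels == line])
        covering = covering[covering > 0]  # ignore pixels on blockers
        occupied.update(int(part) for part in covering)
        if covering.size == 1:
            # the line lies entirely within one partition
            complete.add(int(covering[0]))
    # every occupied partition must hold at least one complete line,
    # however small (insignificant) that line is
    if len(occupied) < 2 or not occupied <= complete:
        return None
    return parts
```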
when no cut or separator-split partition can be found for the current slice, then attempt to find another separator-split by grouping lines along their mutual horizontal neighbourship with fg separators; repeatedly allow both kinds of partitioning, if interspersed
… must be softer than final criterion
This pull request introduces 2 alerts and fixes 1 when merging 529f7f5 into c3fad1a (view on LGTM.com).
    
- run connected component analysis
- calculate the distance transform of existing labels
- find new line seeds by flattening existing labels (via maximum distance)
- propagate line seeds across connected components (by majority in case of conflict)
- spread ccomp labels against each other into the background
- for each line,
  * if enough background and foreground will be retained,
  * find the hull polygon of the new line via alpha shape
  * annotate as new coordinates
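Most of these steps map directly onto `scipy.ndimage`. The sketch below assumes the existing (possibly overlapping) lines are given as a list of boolean masks and stops before the per-line checks and the alpha-shape polygonalization; `resegment_labels` is a hypothetical name:

```python
import numpy as np
from scipy import ndimage


def resegment_labels(fg, line_masks):
    """fg: boolean image; line_masks: list of boolean masks of existing lines."""
    if not line_masks:
        return np.zeros(fg.shape, dtype=int)
    # 1. connected component analysis
    ccomps, num_ccomps = ndimage.label(fg)
    # 2. distance transform of each existing label (distance to its own outline)
    dists = np.stack([ndimage.distance_transform_edt(m) for m in line_masks])
    # 3. flatten: each pixel goes to the line it lies deepest inside
    seeds = np.where(dists.max(axis=0) > 0, dists.argmax(axis=0) + 1, 0)
    # 4. propagate seeds across connected components by majority vote
    labels = np.zeros_like(seeds)
    for comp in range(1, num_ccomps + 1):
        votes = seeds[(ccomps == comp) & (seeds > 0)]
        if votes.size:
            labels[ccomps == comp] = np.bincount(votes).argmax()
    # 5. spread component labels into the background: each unlabelled pixel
    #    takes the label of its nearest labelled pixel
    if labels.any():
        _, (iy, ix) = ndimage.distance_transform_edt(labels == 0, return_indices=True)
        labels = labels[iy, ix]
    return labels
```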
- run connected component analysis
- find new line seeds based on the existing baselines (by applying dilation above)
- propagate line seeds across connected components (by majority in case of conflict)
- spread ccomp labels against each other into the background
- for each line,
  * if enough background and foreground will be retained,
  * find the hull polygon of the new line via alpha shape
  * annotate as new coordinates
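Only the seeding step differs from the distance-transform variant above. A minimal sketch, assuming each baseline is given as a list of `(y, x)` pixel positions and that "dilation above" means extending every baseline pixel upwards by a fixed `height`; `baseline_seeds` is hypothetical:

```python
import numpy as np


def baseline_seeds(shape, baselines, height):
    """Rasterise baselines and dilate them upwards into line seeds (hypothetical).

    baselines: list of baselines, each a list of (y, x) pixel positions
    height: how far (in pixels) to extend each seed above its baseline
    """
    seeds = np.zeros(shape, dtype=int)
    for label, points in enumerate(baselines, start=1):
        for y, x in points:
            # mark the column segment from the baseline up to `height` pixels above;
            # later baselines overwrite earlier ones where they collide
            seeds[max(0, y - height):y + 1, x] = label
    return seeds
```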
This pull request introduces 4 alerts and fixes 1 when merging 14e27e1 into c3fad1a (view on LGTM.com).
    
@finkf this is ready for merging (and releasing) IMO. If you want me to do it myself, just give me a cue.
    
This pull request introduces 2 alerts and fixes 1 when merging c2f9203 into c3fad1a (view on LGTM.com).
    
Fixes a bug when the input lines are already empty.
I'll shortly append more fixes in the follow-up.
For example, I found that `resegment` does not cope well with large overlaps. The current algorithm simply fetches the largest line (optimising locally) without checking the neighbouring choices (optimising globally). See:
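The difference between optimising locally and globally already shows up with two lines. The sketch below contrasts a per-line `argmax` (local) with an optimal one-to-one assignment via SciPy's `linear_sum_assignment` (global; `maximize=True` requires SciPy 1.4 or later). This assignment solver is only an illustration of the global alternative, not necessarily what `resegment` should adopt, and the overlap matrix is made up for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def local_choice(overlaps):
    """For each input line, pick the new line with the largest overlap."""
    return overlaps.argmax(axis=1)


def global_choice(overlaps):
    """Pick one distinct new line per input line, maximising total overlap."""
    rows, cols = linear_sum_assignment(overlaps, maximize=True)
    choice = np.full(overlaps.shape[0], -1)  # -1: no new line left for this input
    choice[rows] = cols
    return choice


# overlaps[i, j]: pixels shared by input line i and new line j
overlaps = np.array([[10, 8],
                     [9, 1]])
print(local_choice(overlaps))   # [0 0]: both input lines claim new line 0
print(global_choice(overlaps))  # [1 0]: total overlap 17 instead of a conflict
```

Local selection hands the same new line to both input lines, which the current scheme cannot represent; the global assignment resolves the conflict at a small cost for the first line.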