Hello, I see that in some cases the columns are not processed correctly.
The output jumps from text in one column to text in another, so the extracted information ends up out of order and incorrect.
Here is an example PDF where this happens: ejemplo.pdf
The problem is shown below:
When I process the same PDF with pymupdf, the columns are handled correctly:
I found the problem: the original text blocks are joined too aggressively. The page number at the bottom gets joined to the column text, and this recursively causes all the text on the page to be merged into one big block.
That is why the final output is garbled.
As a quick fix, you can use `margins=(0, 0, 0, 72)` to ignore the bottom 72 points of the page, where the page number block sits.
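To illustrate why one small block can break column detection, here is a toy model of transitive block joining. It is not PyMuPDF's actual algorithm: the coordinates, the 10-point tolerance, and the helpers `join_blocks` and `apply_margins` are all hypothetical. Two blocks are merged whenever their tolerance-expanded boxes intersect, and each merge replaces the pair with their union, so a page number sitting between the two columns near the bottom can bridge them. The margin tuple is read here as (left, top, right, bottom), which is an assumption about how `margins` is interpreted.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in points; y grows downward, as in PyMuPDF page coordinates
Rect = Tuple[float, float, float, float]

PAGE: Rect = (0, 0, 595, 842)            # hypothetical A4 page
LEFT: Rect = (50, 72, 280, 760)          # hypothetical left column
RIGHT: Rect = (310, 72, 545, 760)        # hypothetical right column
PAGE_NUMBER: Rect = (285, 775, 305, 787) # hypothetical centered page number


def expand(r: Rect, tol: float) -> Rect:
    """Grow a rectangle by tol on every side."""
    return (r[0] - tol, r[1] - tol, r[2] + tol, r[3] + tol)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def join_blocks(blocks: List[Rect], tol: float = 10) -> List[Rect]:
    """Repeatedly merge any two blocks whose expanded boxes intersect.

    Because a merged block is the union of its parts, it can reach
    blocks that neither part touched on its own (transitive joining).
    """
    result = list(blocks)
    merged = True
    while merged:
        merged = False
        for i in range(len(result)):
            for j in range(i + 1, len(result)):
                if intersects(expand(result[i], tol), expand(result[j], tol)):
                    result[i] = union(result[i], result[j])
                    del result[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan with the updated list
    return result


def apply_margins(blocks: List[Rect], page: Rect,
                  margins: Tuple[float, float, float, float]) -> List[Rect]:
    """Keep only blocks lying fully inside the page shrunk by the margins."""
    left, top, right, bottom = margins
    clip = (page[0] + left, page[1] + top, page[2] - right, page[3] - bottom)
    return [b for b in blocks
            if b[0] >= clip[0] and b[1] >= clip[1]
            and b[2] <= clip[2] and b[3] <= clip[3]]


if __name__ == "__main__":
    print(len(join_blocks([LEFT, RIGHT, PAGE_NUMBER])))  # columns collapse into one block
    kept = apply_margins([LEFT, RIGHT, PAGE_NUMBER], PAGE, (0, 0, 0, 72))
    print(len(join_blocks(kept)))                        # columns stay separate
```

With the page number present, the left column absorbs it, the enlarged union then overlaps the right column, and everything ends up in a single block. Clipping the bottom 72 points first drops the page number, so the two columns are never bridged.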