tabula-java and tabula give different results when auto-detecting tables.
For example, for the file example.pdf, tabula gives me this:
[
{
"page": 1,
"extraction_method": "stream",
"selection_id": "X1542028577619",
"x1": 30.001895919113164,
"x2": 568.0223525383758,
"y1": 90.9968955695343,
"y2": 169.13264653083803,
"width": 538.0204566192626,
"height": 78.13575096130371,
"spec_index": 0
},
{
"page": 1,
"extraction_method": "stream",
"selection_id": "B1542028577644",
"x1": 30.001895919113164,
"x2": 568.0223525383758,
"y1": 354.99676186752316,
"y2": 483.73471345138546,
"width": 538.0204566192626,
"height": 128.7379515838623,
"spec_index": 1
}
]
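One way to compare the two tools on the same regions is to feed tabula's detected selections back into tabula-java with its `-p` (page) and `-a` (area, given as top,left,bottom,right) options. The sketch below only builds those arguments. It assumes that tabula's `y1`, `x1`, `y2`, `x2` correspond to top, left, bottom, right in the units tabula-java expects, which this report does not confirm. The helper name `to_area_args` is made up for illustration.

```python
import json

# Selections copied from the tabula output above (only the fields used here).
selections = json.loads("""
[
  {"page": 1, "x1": 30.001895919113164, "x2": 568.0223525383758,
   "y1": 90.9968955695343, "y2": 169.13264653083803},
  {"page": 1, "x1": 30.001895919113164, "x2": 568.0223525383758,
   "y1": 354.99676186752316, "y2": 483.73471345138546}
]
""")


def to_area_args(selection):
    """Build tabula-java -p/-a arguments from one tabula selection.

    Assumption (not confirmed by the report): y1, x1, y2, x2 map to
    top, left, bottom, right.
    """
    area = ",".join(
        f"{selection[key]:.2f}" for key in ("y1", "x1", "y2", "x2")
    )
    return ["-p", str(selection["page"]), "-a", area]


# Print one argument string per detected table.
for sel in selections:
    print(" ".join(to_area_args(sel)))
```

Each printed line could be appended to the `java -jar ... example.pdf -f JSON` command in place of `-g`, so that tabula-java extracts from the region tabula detected rather than guessing its own.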
tabula-java, run as:
java -jar /usr/local/bin/tabula-1.0.2-jar-with-dependencies.jar -g example.pdf -f JSON
shows:
{
"extraction_method": "stream",
"top": 0,
"left": 0,
"width": 440.54998779296875,
"height": 167.16000366210938,
...
{
"extraction_method": "stream",
"top": 0,
"left": 0,
"width": 566.2899780273438,
"height": 482.2799987792969,
...