The scripts/code used to match the PDF miner outputs on documents to the XML representations #20
Comments
We cannot open source the code at the moment as it is related to our IP protection. |
Then how about publishing the alignment data themselves in some form? |
Hm, I had not thought of that before. Let me check with our legal approval chain. |
I assume this means that providing only the code for extracting annotations from XML representation is also not possible at the moment? |
@pollyMath Unfortunately that is what our IP lawyer told us. |
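@zhxgj Did your lawyers reach a verdict regarding the publication of PDF/XML alignment data? Note: This is relevant to a number of potential applications of this corpus, for which some choices made in the COCO format would be incompatible or suboptimal, e.g.
- definition/granularity of region classes
- not annotating headers and footers
- not including reading order of regions
- not including text lines (contours / baselines)
- not including text content (plain) and text style (formatting)
|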
Unfortunately not yet. I understand the benefits, but we cannot release it yet. Thanks for your understanding.
On Tue, Jan 12, 2021 at 3:49 AM Robert Sachunsky wrote:
@zhxgj <https://github.com/zhxgj> Did your lawyers reach a verdict
regarding the publication of PDF/XML alignment data?
Note: This is relevant to a number of potential applications of this
corpus, for which some choices made in the COCO format would be
incompatible or suboptimal, e.g.
- definition/granularity of region classes
- not annotating headers and footers
- not including reading order of regions
- not including text lines (contours / baselines)
- not including text content (plain) and text style (formatting)
|
Do you provide the scripts/code that you developed to match the PDFMiner outputs on the documents to the XML representation of the PDF page itself? Thanks