Parsing text from PDF #137
Comments
I'm personally looking to find some text and replace the "field"'s contents
Hello @nunofgs! It is not currently possible to parse plain text out of a document with pdf-lib (but you can extract the content of acroform fields). I'd suggest you consider using PDF.js to parse/extract text. Of course, this isn't an ideal solution since it requires two different libraries for a seemingly simple task. But it's the best approach I know of for now, until pdf-lib gains support for text parsing.
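For anyone landing here with the same "which page is this string on?" question, here is a rough sketch of the PDF.js route. It is not part of pdf-lib, and the helper name `findPagesContaining` and the small `PdfDocumentLike` interfaces are my own assumptions. They only mirror the parts of the PDF.js document object the sketch relies on (`numPages`, `getPage`, `getTextContent`, and `str` on text items), so the function can also be exercised with a stub.

```typescript
// Minimal structural types (assumed for illustration) that mirror the parts of
// the PDF.js document proxy used below.
interface TextItemLike { str?: string }
interface PageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // PDF.js pages are 1-indexed
}

// Hypothetical helper: returns the 1-based numbers of the pages whose
// extracted text contains `needle`.
async function findPagesContaining(
  doc: PdfDocumentLike,
  needle: string
): Promise<number[]> {
  const matches: number[] = [];
  for (let pageNumber = 1; pageNumber <= doc.numPages; pageNumber++) {
    const page = await doc.getPage(pageNumber);
    const content = await page.getTextContent();
    // Some items are marked-content entries without a `str`; skip those.
    const text = content.items.map((item) => item.str ?? "").join(" ");
    if (text.includes(needle)) matches.push(pageNumber);
  }
  return matches;
}

// With PDF.js itself, the document would come from something like:
//   import { getDocument } from "pdfjs-dist";
//   const doc = await getDocument({ data: pdfBytes }).promise;
//   const pages = await findPagesContaining(doc, "some string");
```

One caveat: PDF.js returns text in fragments, so a phrase that the PDF splits across several text items (or across lines) may not match with a plain `includes` check. Joining on a space is a simplification, and you may need to normalize whitespace for your documents.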
@dasilvacontin Is the field you are working with just plain text? Or is it an acroform field? If it is raw text, I'm afraid pdf-lib doesn't have the necessary features to parse it (but as I mentioned above, you could use PDF.js instead). However, if it's in an acroform, pdf-lib should be able to do what you need. pdf-lib's acroform support isn't currently well documented, so I'd suggest taking a look at some of the existing acroform issues. Please let me know if you have any questions!
Hi @Hopding, thank you for the great lib.
Apologies if this is a newbie question, but I can't seem to find a way to parse text out of an existing PDF. I'm looking to retrieve a string from a PDF in order to determine which page it's on.
Any idea how I could accomplish this?