page.getText('html') returns empty string #726
Comments
Interesting case!
Found the problem:
Nice one, thanks for the quick response!
More specifically, the non-UTF8 characters only occur in the fontnames. You can have a look into this by comparing
You can download a pre-release wheel from here.
Thanks for the quick fix! Just ran it locally on Linux and it is working fine now.
I think I will make that version public over the weekend.
New version available on PyPI:
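The maintainer's diagnosis is that the non-UTF-8 characters occur only in the font names. As a rough illustration, and not PyMuPDF's actual fix, the sketch below assumes the raw font names are available as bytes. It shows how such names can be detected and made safe for text-based output. The helper names find_non_utf8_fontnames and sanitize_fontname are hypothetical.

```python
def find_non_utf8_fontnames(fontnames):
    """Return the raw font names (bytes) that are not valid UTF-8."""
    bad = []
    for name in fontnames:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def sanitize_fontname(name: bytes) -> str:
    # Replace undecodable bytes with U+FFFD so the name can be embedded in
    # HTML or JSON output instead of aborting the whole conversion.
    return name.decode("utf-8", errors="replace")


names = [b"Arial", b"F\xffont"]
print(find_non_utf8_fontnames(names))           # [b'F\xffont']
print([sanitize_fontname(n) for n in names])
```

Replacing bad bytes keeps the rest of the page output intact. A Latin-1 fallback would be an alternative, since every byte sequence decodes under Latin-1.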
Describe the bug (mandatory)
page.getText('html') returns an empty string for some files. Interestingly, page.getText('text') does return content, so it is unclear why the HTML extraction fails.
To Reproduce (mandatory)
Code:
When using the URL tagged "# Working file", everything works fine. When using the URL tagged "# Broken file", the HTML output is empty while the text output has content.
Expected behavior (optional)
I expected the file to be converted to HTML or, if there is a problem parsing it, some sort of error message.
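The expected behavior can be expressed as a small wrapper. This is a hypothetical sketch, not part of PyMuPDF. It assumes get_text is any callable that takes an output format string, such as page.getText in the version from this report. Instead of silently returning empty HTML when plain text is present, it raises an error.

```python
def extract_html_or_raise(get_text):
    """Return HTML from get_text("html"), or raise if it is empty while text is not.

    get_text: a callable taking an output format such as "html" or "text"
    and returning a string (hypothetical interface for illustration).
    """
    html = get_text("html")
    if html.strip():
        return html
    # HTML came back empty: check whether the page actually has text.
    text = get_text("text")
    if text.strip():
        raise ValueError(
            "HTML extraction returned nothing although plain text is present"
        )
    # Genuinely empty page: empty HTML is a legitimate result.
    return html
```

With PyMuPDF one might call extract_html_or_raise(page.getText); newer releases use the snake_case name page.get_text.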
Your configuration (mandatory)