feat/parse_html_embed_objects #2233
Comments
@scanny - What do you think about this? I think I'd rather avoid dynamically linked videos or images in HTML files. For images at least, converting the HTML to PDF could work to extract the images. I don't think we're likely to do anything with iframes.
tl;dr: We could potentially capture those links but probably not traverse them to actually capture the image or video bytes.
Yeah, downloading malicious content from the link was my main concern as well. I like the idea of treating
I am trying to parse HTML documents that contain embedded images and YouTube videos inside iframes. I can use the partition_html function to get the text elements, as well as a metadata object containing the links from a href tags. However, the image elements and the iframe elements are being missed.
I would like these data points to be made available either as separate elements, such as HTMLImage and HTMLIframe, or by attaching their URLs to the metadata object's link_urls.
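To make the request concrete, here is a minimal sketch of the kind of output I have in mind. It only records the src URLs of img and iframe tags and never fetches them, which matches the "capture the links but don't traverse them" idea discussed in the comments. It uses only the Python standard library and is not part of the library's API; the names EmbedLinkCollector and extract_embed_urls, and the image_urls / iframe_urls keys, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbedLinkCollector(HTMLParser):
    """Hypothetical helper: records src URLs of <img> and <iframe> tags.

    Nothing is downloaded; the URLs are only collected as strings.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []
        self.iframe_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative URLs only when a base URL was supplied.
        url = urljoin(self.base_url, src) if self.base_url else src
        if tag == "img":
            self.image_urls.append(url)
        elif tag == "iframe":
            self.iframe_urls.append(url)


def extract_embed_urls(html, base_url=""):
    """Return the image and iframe URLs found in an HTML string."""
    collector = EmbedLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return {
        "image_urls": collector.image_urls,
        "iframe_urls": collector.iframe_urls,
    }
```

Running this next to partition_html on the same HTML string would give the text elements from the library and the embed URLs from the helper, which could then be merged by the caller until something like this is supported natively.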