[BUG] "figure" or "picture" not parsed #654

Open
@jdavidlopez

Describe the bug
When an article has images embedded inside a <figure> element, the images are not parsed. All the cases seen so far are from Substack or Medium.

To Reproduce
Parse any article from Substack/Medium that contains images.

Expected behavior
When using keep_article_html=True, the images should be included in the returned article HTML.

Screenshots
N/A

System information

  • OS: Linux
  • Python version: 3.8
  • Library version: 0.9.3.1

Additional context
Example of an image that is not parsed:

<figure class="mi mj mk ml mm mn mf mg paragraph-image">
  <div role="button" tabindex="0" class="mo mp ed mq bh mr">
    <div class="mf mg mh">
      <picture>
        <source srcset="https://miro.medium.com/v2/resize:fit:640/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w, https://miro.medium.com/v2/resize:fit:720/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 720w, https://miro.medium.com/v2/resize:fit:750/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 750w, https://miro.medium.com/v2/resize:fit:786/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 786w, https://miro.medium.com/v2/resize:fit:828/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 828w, https://miro.medium.com/v2/resize:fit:1100/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1100w, https://miro.medium.com/v2/resize:fit:1400/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1400w" sizes="(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px" type="image/webp">
        <source data-testid="og" srcset="https://miro.medium.com/v2/resize:fit:640/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w, https://miro.medium.com/v2/resize:fit:720/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 720w, https://miro.medium.com/v2/resize:fit:750/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 750w, https://miro.medium.com/v2/resize:fit:786/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 786w, https://miro.medium.com/v2/resize:fit:828/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 828w, https://miro.medium.com/v2/resize:fit:1100/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1100w, https://miro.medium.com/v2/resize:fit:1400/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1400w" sizes="(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px">
        <img alt="Two hands reaching for one another, seen under pink light against a bright pink background" class="bh ko ms c" width="700" height="680" loading="eager" src="https://miro.medium.com/v2/resize:fit:1155/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg">
      </picture>
    </div>
  </div>
</figure>
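
To make the problem easier to confirm, here is a minimal sketch that uses only the Python standard library to compare the page's original HTML with the HTML returned when keep_article_html=True. It lists <img> URLs that sit inside a <figure> in the source but are missing from the extracted output. The names FigureImageCollector, image_urls, missing_figure_images, and SAMPLE are hypothetical helpers for illustration and are not part of the library. SAMPLE is a shortened version of the markup above.

```python
from html.parser import HTMLParser


class FigureImageCollector(HTMLParser):
    """Collect <img> src values, optionally only those nested in a <figure>."""

    def __init__(self, require_figure):
        super().__init__()
        self.require_figure = require_figure
        self.figure_depth = 0  # number of <figure> elements currently open
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.figure_depth += 1
        elif tag == "img":
            if self.require_figure and self.figure_depth == 0:
                return
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)

    def handle_endtag(self, tag):
        if tag == "figure" and self.figure_depth > 0:
            self.figure_depth -= 1


def image_urls(html, require_figure=False):
    """Return the src of each <img> in html, in document order."""
    parser = FigureImageCollector(require_figure)
    parser.feed(html)
    parser.close()
    return parser.urls


def missing_figure_images(source_html, extracted_html):
    """Return figure image URLs in the source that the extracted HTML lacks."""
    kept = set(image_urls(extracted_html))
    return [u for u in image_urls(source_html, require_figure=True) if u not in kept]


# Shortened version of the <figure> markup from this report.
SAMPLE = """
<figure class="paragraph-image">
  <div><picture>
    <source srcset="https://miro.medium.com/v2/resize:fit:640/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w">
    <img alt="Two hands" src="https://miro.medium.com/v2/resize:fit:1155/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg">
  </picture></div>
</figure>
"""

if __name__ == "__main__":
    # An extracted body that kept only text would report the image as missing.
    print(missing_figure_images(SAMPLE, "<div><p>Article text only.</p></div>"))
```

In practice, source_html would be the downloaded page and extracted_html the string the library returns with keep_article_html=True. A non-empty result means images inside <figure> were dropped.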
