Autoclosing / Non-Autoclosing HTML tags support #205

Hi 👋 

While working on Confluence page parsing related tasks, we found that `markdownify` had various behaviors depending on how `<img>` tags are used in the DOM.

**TL;DR: `markdownify` doesn't handle `<img>` tags mixed with `<img />` tags, which causes some images to be omitted from the Markdown output.**

## What Specifications Say

Before diving into examples, I checked the RFC specifications to make sure that both `<img />` and `<img>` are valid tags. They are: XHTML/XML uses `<img />` and HTML5 uses `<img>`, as specified in [Section 9.5.3 of RFC 7992](https://datatracker.ietf.org/doc/html/rfc7992#section-9.5.3).

![Image](https://github.com/user-attachments/assets/98373653-89b0-419b-b7c1-6ae35e84dce5)
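To make the syntactic difference concrete, here is a small sketch using Python's standard-library XML parser (it doesn't involve `markdownify` at all). Only the `/>` spelling is well-formed XML; the bare `<img>` is valid HTML5, where `img` is a void element, but an XML parser rejects it as an unclosed tag. `is_well_formed_xml` is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

# XHTML/XML style: the trailing "/>" closes the element, so this is well-formed XML.
xhtml_fragment = '<p><img src="https://placehold.co/600x400/gray/white" alt="(gray)"/></p>'

# HTML5 style: <img> is a void element, valid HTML but NOT well-formed XML.
html5_fragment = '<p><img src="https://placehold.co/600x400/gray/white" alt="(gray)"></p>'


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == '__main__':
    print('XHTML-style <img />:', is_well_formed_xml(xhtml_fragment))
    print('HTML5-style <img>:', is_well_formed_xml(html5_fragment))
```

An HTML parser, by contrast, is expected to accept both spellings and produce the same element, which is why mixing them in one document should be harmless.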

## Examples

When images are present in the DOM with both autoclosing and non-autoclosing tags, issues occur.

### Base Test

As a reference, we use an example where every image is an autoclosing `<img />` tag; this is the ideal case for `markdownify`.

```python
from bs4 import BeautifulSoup
from markdownify import markdownify


if __name__ == '__main__':
    content = '''<table>
        <tbody>
            <tr>
                <td>
                    <span><img src="https://placehold.co/600x400/gray/white" alt="(gray)" name=":gray:"/></span>
                </td>
                <td>
                    <p>
                        <img src="https://placehold.co/600x400/orange/white" alt="(orange)" name=":orange:"/> / <img src="https://placehold.co/600x400/blue/white" alt="(blue)" name=":blue:"/>
                    </p>
                </td>
            </tr>
        </tbody>
    </table>'''

    print('BeautifulSoup Output:\n')
    print(BeautifulSoup(content, "html.parser"), end='\n' * 3)

    print('Markdownify Output:\n')
    print(markdownify(content, heading_style="ATX"))
```

This produces the following expected output:

```
BeautifulSoup Output:

<table>
<tbody>
<tr>
<td>
<span><img alt="(gray)" name=":gray:" src="https://placehold.co/600x400/gray/white"/></span>
</td>
<td>
<p>
<img alt="(orange)" name=":orange:" src="https://placehold.co/600x400/orange/white"/> / <img alt="(blue)" name=":blue:" src="https://placehold.co/600x400/blue/white"/>
</p>
</td>
</tr>
</tbody>
</table>


Markdownify Output:

|  |  |
| --- | --- |
| (gray) | (orange) / (blue) |
```

As you can see, every image is converted to its `alt` attribute value and every `alt` is present.
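Since each image should end up as its `alt` text, a regression check can compare the `alt` values in the input HTML with the Markdown output. Below is a minimal sketch using only the standard library; `AltCollector` and `missing_alts` are hypothetical helper names, and the plain substring check assumes each `alt` text is distinctive enough not to appear elsewhere in the output by accident.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collect the alt attribute of every <img>, whichever way it is spelled."""

    def __init__(self) -> None:
        super().__init__()
        self.alts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Called for "<img ...>". HTMLParser's default handle_startendtag
        # forwards "<img ... />" here too, so both spellings are collected.
        if tag == 'img':
            alt = dict(attrs).get('alt')
            if alt:
                self.alts.append(alt)


def missing_alts(html: str, markdown: str) -> list[str]:
    """Return the alt texts present in `html` but absent from `markdown`."""
    collector = AltCollector()
    collector.feed(html)
    collector.close()
    return [alt for alt in collector.alts if alt not in markdown]
```

For instance, `missing_alts(content, markdownify(content, heading_style="ATX"))` should return an empty list for this base test, and would be expected to report `['(blue)']` for the failing examples below.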

### Example 1

If we spice things up and change the first image, `(gray)`, to a non-autoclosing `<img>` tag:

```python
from bs4 import BeautifulSoup
from markdownify import markdownify


if __name__ == '__main__':
    content = '''<table>
        <tbody>
            <tr>
                <td>
                    <span><img src="https://placehold.co/600x400/gray/white" alt="(gray)" name=":gray:"></span>
                </td>
                <td>
                    <p>
                        <img src="https://placehold.co/600x400/orange/white" alt="(orange)" name=":orange:"/> / <img src="https://placehold.co/600x400/blue/white" alt="(blue)" name=":blue:"/>
                    </p>
                </td>
            </tr>
        </tbody>
    </table>'''

    print('BeautifulSoup Output:\n')
    print(BeautifulSoup(content, "html.parser"), end='\n' * 3)

    print('Markdownify Output:\n')
    print(markdownify(content, heading_style="ATX"))
```

The `(blue)` image disappears from the output:

```
BeautifulSoup Output:

<table>
<tbody>
<tr>
<td>
<span><img alt="(gray)" name=":gray:" src="https://placehold.co/600x400/gray/white"/></span>
</td>
<td>
<p>
<img alt="(orange)" name=":orange:" src="https://placehold.co/600x400/orange/white"> / <img alt="(blue)" name=":blue:" src="https://placehold.co/600x400/blue/white"/>
</img></p>
</td>
</tr>
</tbody>
</table>


Markdownify Output:

|  |  |
| --- | --- |
| (gray) | (orange) |
```

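Notice that `BeautifulSoup` interpreted the first `<img>` tag, `(gray)`, as autoclosing and the second, `<img />` tag, `(orange)`, as non-autoclosing 🤔 As a result, the `(blue)` image ends up nested inside the `(orange)` one, which is only closed by the `</img>` right before `</p>`.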
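BeautifulSoup's `"html.parser"` builder sits on top of Python's standard-library `html.parser.HTMLParser`, which reports the two spellings through different callbacks: `handle_starttag` for `<img>` and `handle_startendtag` for `<img />`. The sketch below logs those callbacks for a condensed version of the Example 1 markup. `TagTracer` is a hypothetical class written for this illustration; it only shows that the tree builder receives two different event shapes, which seems a plausible place for the mismatch to start, and does not pinpoint the bug inside BeautifulSoup.

```python
from html.parser import HTMLParser


class TagTracer(HTMLParser):
    """Record which HTMLParser callback fires for each tag."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_startendtag(self, tag, attrs):
        # Overridden so "<img />" is logged as a single event instead of
        # being split into start + end by the default implementation.
        self.events.append(('startend', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))


# Condensed from Example 1: gray is "<img>", orange and blue are "<img />".
snippet = (
    '<span><img alt="(gray)"></span>'
    '<p><img alt="(orange)"/> / <img alt="(blue)"/></p>'
)

tracer = TagTracer()
tracer.feed(snippet)
tracer.close()
for kind, tag in tracer.events:
    print(kind, tag)
```

The `(gray)` image arrives as a lone `start` event with no matching `end`, while `(orange)` and `(blue)` arrive as `startend` events; the tree builder has to reconcile both shapes for the same void element.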

### Example 2

We can go further and leave only the `(orange)` image as autoclosing:

```python
from bs4 import BeautifulSoup
from markdownify import markdownify


if __name__ == '__main__':
    content = '''<table>
        <tbody>
            <tr>
                <td>
                    <span><img src="https://placehold.co/600x400/gray/white" alt="(gray)" name=":gray:"></span>
                </td>
                <td>
                    <p>
                        <img src="https://placehold.co/600x400/orange/white" alt="(orange)" name=":orange:"/> / <img src="https://placehold.co/600x400/blue/white" alt="(blue)" name=":blue:">
                    </p>
                </td>
            </tr>
        </tbody>
    </table>'''

    print('BeautifulSoup Output:\n')
    print(BeautifulSoup(content, "html.parser"), end='\n' * 3)

    print('Markdownify Output:\n')
    print(markdownify(content, heading_style="ATX"))
```

This produces the same result as Example 1:

```
BeautifulSoup Output:

<table>
<tbody>
<tr>
<td>
<span><img alt="(gray)" name=":gray:" src="https://placehold.co/600x400/gray/white"/></span>
</td>
<td>
<p>
<img alt="(orange)" name=":orange:" src="https://placehold.co/600x400/orange/white"> / <img alt="(blue)" name=":blue:" src="https://placehold.co/600x400/blue/white"/>
</img></p>
</td>
</tr>
</tbody>
</table>


Markdownify Output:

|  |  |
| --- | --- |
| (gray) | (orange) |
```

## Combinations

Other combinations seem to work fine.
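With three images and two spellings there are 2³ = 8 variants. A minimal sketch for generating all of them from the same template as the examples above, and checking any `str -> str` converter against them, could look like this. `build_cases` and `run_matrix` are hypothetical helpers; the converter is passed in so the sketch doesn't depend on a particular `markdownify` version.

```python
from itertools import product
from typing import Callable

OPEN = '>'           # HTML5 style: <img ...>
SELF_CLOSING = '/>'  # XHTML style: <img ... />

TEMPLATE = '''<table>
    <tbody>
        <tr>
            <td>
                <span><img src="https://placehold.co/600x400/gray/white" alt="(gray)" name=":gray:"{gray}</span>
            </td>
            <td>
                <p>
                    <img src="https://placehold.co/600x400/orange/white" alt="(orange)" name=":orange:"{orange} / <img src="https://placehold.co/600x400/blue/white" alt="(blue)" name=":blue:"{blue}
                </p>
            </td>
        </tr>
    </tbody>
</table>'''

ALTS = ('(gray)', '(orange)', '(blue)')


def build_cases():
    """Yield (label, html) for every spelling of the gray/orange/blue images."""
    for endings in product((OPEN, SELF_CLOSING), repeat=3):
        label = tuple('<img>' if e == OPEN else '<img />' for e in endings)
        gray, orange, blue = endings
        yield label, TEMPLATE.format(gray=gray, orange=orange, blue=blue)


def run_matrix(convert: Callable[[str], str]) -> dict:
    """Map each label to the alt texts missing from convert(html)."""
    results = {}
    for label, html in build_cases():
        output = convert(html)  # convert once per case
        results[label] = [alt for alt in ALTS if alt not in output]
    return results
```

The cases come out in the same order as the rows of the table below. With `markdownify` installed, `run_matrix(lambda html: markdownify(html, heading_style="ATX"))` should reproduce that table, assuming the behavior is as reported here.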

Here is a table of all eight combinations I tested:

| Gray | Orange | Blue | Result |
| --- | --- | --- | --- |
| `<img>` | `<img>` | `<img>` | ✅ |
| `<img>` | `<img>` | `<img />` | ✅ |
| `<img>` | `<img />` | `<img>` | ❌ `(blue)` is missing |
| `<img>` | `<img />` | `<img />` | ❌ `(blue)` is missing |
| `<img />` | `<img>` | `<img>` | ✅ |
| `<img />` | `<img>` | `<img />` | ✅ |
| `<img />` | `<img />` | `<img>` | ✅ |
| `<img />` | `<img />` | `<img />` | ✅ |


Thanks for your help 🙏 
