Ignore <p> tags in table rows (#354)
Closes #198
Co-authored-by: Alireza Savand <591113+Alir3z4@users.noreply.github.com>
gpanders authored Jan 16, 2024
1 parent 1e7cb73 commit 7ba8431
Showing 4 changed files with 31 additions and 0 deletions.
2 changes: 2 additions & 0 deletions ChangeLog.rst
@@ -8,6 +8,8 @@ UNRELEASED
* Fix extra line breaks inside html link text (between '[' and ']')
* Fix #344: indent ``<ul>`` inside ``<ol>`` three spaces instead of two to comply with CommonMark, GFM, etc.
* Fix #324: unnecessary spaces around ``<b>``, ``<em>``, and ``strike`` tags.
* Don't wrap tables by default and add a ``--wrap-tables`` config option.
* Feature #198: Ignore ``<p>`` tags inside table rows.
* Don't wrap tables by default and add a ``--wrap-tables`` config option
* Remove support for Python ≤ 3.5. Now requires Python 3.6+.
* Support for Python 3.10.
2 changes: 2 additions & 0 deletions html2text/__init__.py
@@ -367,6 +367,8 @@ def handle_tag(
                    self.soft_br()
            elif self.astack:
                pass
            elif self.split_next_td:
                pass
            else:
                self.p()

12 changes: 12 additions & 0 deletions test/no_p_in_table.html
@@ -0,0 +1,12 @@
<!DOCTYPE html> <html>
<head lang="en"> <meta charset="UTF-8"> <title></title> </head>
<body> <h1>This is a test document</h1> With some text, <code>code</code>, <b>bolds</b> and <i>italics</i>. <h2>This is second header</h2> <p style="display: none">Displaynone text</p>
<table>
<tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr>
<tr> <td><p>Content 1</p></td> <td><p>2</p></td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr>
<tr> <td><p>Content 1 longer</p></td> <td><p>Content 2</p></td> <td><p>blah</p></td> </tr>
<tr> <td><p>Content </p></td> <td><p>Content 2</p></td> <td><p>blah</p></td> </tr>
<tr> <td><p>t </p></td> <td><p>Content 2</p></td> <td><p>blah blah blah</p></td> </tr>
</table>

</body> </html>
15 changes: 15 additions & 0 deletions test/no_p_in_table.md
@@ -0,0 +1,15 @@
# This is a test document

With some text, `code`, **bolds** and _italics_.

## This is second header

Displaynone text

Header 1 | Header 2 | Header 3
---|---|---
Content 1 | 2 | ![200](http://lorempixel.com/200/200) Image!
Content 1 longer | Content 2 | blah
Content | Content 2 | blah
t | Content 2 | blah blah blah
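
The behavior this test exercises can be sketched with the standard library alone. The following is a toy converter, not html2text's implementation: the class `TableAwareConverter`, its `in_cell` flag (a hypothetical stand-in for the role `split_next_td` plays in the hunk above), and the `convert` helper are all illustrative names, and the sketch omits the `---|---` header separator. It only shows the idea of skipping the paragraph break when a `<p>` appears inside a table cell.

```python
from html.parser import HTMLParser


class TableAwareConverter(HTMLParser):
    """Toy HTML-to-text converter; an illustration, not html2text itself."""

    def __init__(self):
        super().__init__()
        self.lines = []       # finished output lines
        self.cells = []       # cells of the table row being built
        self.buf = []         # text fragments of the current block or cell
        self.in_cell = False  # hypothetical flag: are we inside <td>/<th>?

    def flush_paragraph(self):
        # Emit buffered text as a paragraph followed by a blank line.
        text = " ".join("".join(self.buf).split())
        if text:
            self.lines.append(text)
            self.lines.append("")
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.buf = []
        elif tag == "tr":
            self.flush_paragraph()
            self.cells = []
        elif tag == "p":
            if self.in_cell:
                pass  # inside a table cell: ignore the paragraph break
            else:
                self.flush_paragraph()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.cells.append(" ".join("".join(self.buf).split()))
            self.buf = []
            self.in_cell = False
        elif tag == "tr":
            self.lines.append(" | ".join(self.cells))
        elif tag == "p" and not self.in_cell:
            self.flush_paragraph()

    def handle_data(self, data):
        self.buf.append(data)


def convert(html):
    """Convert an HTML fragment to text with pipe-separated table rows."""
    parser = TableAwareConverter()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()
    return "\n".join(parser.lines).strip()


if __name__ == "__main__":
    sample = (
        "<p>Intro</p>"
        "<table><tr><td><p>Content 1</p></td><td><p>2</p></td></tr></table>"
    )
    print(convert(sample))
    # Intro
    #
    # Content 1 | 2
```

Without the `in_cell` check, each `<p>` inside a cell would start a new paragraph and break the row across several lines, which is the symptom reported in #198.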
