Commit bdc9983

Workaround BeautifulSoup not handling empty byte array correctly
1 parent e17023e commit bdc9983

1 file changed: +6 -0 lines changed

cardinal_pythonlib/extract_text.py

Lines changed: 6 additions & 0 deletions
@@ -1140,6 +1140,12 @@ def convert_html_to_text(
     """
     Converts HTML to text.
     """
+
+    # beautifulsoup4==4.13.4 returns "b''" for an empty bytes array,
+    # so we work around that here:
+    if blob is not None and len(blob) == 0:
+        return ""
+
     with get_filelikeobject(filename, blob) as fp:
         soup = bs4.BeautifulSoup(fp, "html.parser")
         return soup.get_text()
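
The guard in this change can be sketched in isolation as follows. This is a minimal illustration, not the library's implementation: `html_bytes_to_text` and `_TextCollector` are hypothetical names, the standard-library `html.parser` stands in for BeautifulSoup so the snippet has no third-party dependency, and the `filename` argument of the real function is omitted (so `None` is simply treated like empty input here).

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TextCollector(HTMLParser):
    """Collects text nodes; a stdlib stand-in for BeautifulSoup's get_text()."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_bytes_to_text(blob: Optional[bytes], encoding: str = "utf-8") -> str:
    """Hypothetical helper: extract text from HTML bytes, "" for empty input."""
    # Guard first, and test the argument itself rather than the builtin
    # `bytes` type. `not blob` covers both None and b"", so len() is never
    # called on None and the parser never sees an empty input.
    if not blob:
        return ""
    parser = _TextCollector()
    parser.feed(blob.decode(encoding, errors="replace"))
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)
```

Returning early keeps the empty-input case independent of how any particular parser version represents empty bytes.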
