Extra "\" slashes before specific numeric #133

Open
SubhamDyno opened this issue Apr 7, 2023 · 1 comment

@SubhamDyno

  • html2text version: 2020.1.16
  • Test script
  • Python version: 3.9

Hello Team,

Whenever we pass the following input to html2text:
Input: <p>1. Hello My name is Subham</p>
Output: 1\. Hello My name is Subham

The extra "\" after the number is not needed. It appears specifically after a number that is followed by a "." (dot) and then whitespace.
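
A minimal Python sketch of the behaviour described above. The pattern `DOT_MATCHER` and the function `escape_ordered_list_dots` are hypothetical approximations, not html2text's actual code. They assume the escape is applied only to a number at the start of a line followed by a dot and whitespace, which is the shape Markdown uses for an ordered-list item (so `1. Hello` would otherwise render as a list).

```python
import re

# Hypothetical approximation (an assumption, not html2text's exact source):
# optional leading whitespace and a number, then a dot, then whitespace,
# anchored at the start of each line.
DOT_MATCHER = re.compile(r"^(\s*\d+)(\.)(?=\s)", re.MULTILINE)


def escape_ordered_list_dots(text: str) -> str:
    # Insert a literal backslash between the number and the dot so that
    # Markdown renderers do not treat the line as an ordered-list item.
    return DOT_MATCHER.sub(r"\1\\\2", text)


print(escape_ordered_list_dots("1. Hello My name is Subham"))
# prints: 1\. Hello My name is Subham
```

Under these assumptions, text such as `1.5 apples` (no whitespace after the dot) or `I have 2. cats` (number not at the start of the line) is left unchanged.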

Could you please help us avoid this escape?

@rajkumar-jangid-macmillan

The issue originates in the package's utils.py file (Python37\Lib\site-packages\html2text\utils.py), at line 210.
Here is the line that works:
text = config.RE_MD_DOT_MATCHER.sub(r"\1\2", text)

This line originally has two extra backslashes; replacing it with the line above should fix the issue. I am not sure whether it could break something else.
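
If patching the installed utils.py is undesirable, a minimal sketch of a workaround is to post-process the converted text instead. `unescape_list_dots` is a hypothetical helper, not part of html2text. It assumes the escape always takes the form `<number>\.` followed by whitespace at the start of a line, and it undoes it with the same `\1\2` replacement proposed above. Note that without the backslash, Markdown renderers will treat such lines as ordered-list items, which is presumably why html2text adds the escape in the first place.

```python
import re

# Hypothetical helper (not part of html2text): matches optional leading
# whitespace and a number, then the escaped dot "\.", then whitespace,
# at the start of each line.
ESCAPED_DOT = re.compile(r"^(\s*\d+)\\(\.)(?=\s)", re.MULTILINE)


def unescape_list_dots(markdown: str) -> str:
    # Rebuild each match from the number and the dot only, dropping the
    # backslash in between (the same effect as the "\1\2" replacement).
    return ESCAPED_DOT.sub(r"\1\2", markdown)


converted = "1\\. Hello My name is Subham"  # output reported in this issue
print(unescape_list_dots(converted))
# prints: 1. Hello My name is Subham
```

In practice it would be applied to the string returned by `html2text.html2text(html)`, leaving the installed package untouched.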
