HTML2PHPBBCode is a Python 3 package that can be used to parse HTML code and convert it to phpBB-compatible BBCode.
>>> from html2phpbbcode.parser import HTML2PHPBBCode
>>> parser = HTML2PHPBBCode()
>>> parser.feed('<ul><li>Hello</li><li>World</li></ul>')
'[list][*]Hello[*]World[/list]'
>>> parser.feed('<ol><li>one<li>two</ol>')
'[list=1][*]one[*]two[/list]'
>>> parser.feed('<a href="https://water.org">Water.org</a>')
'[url=https://water.org]Water.org[/url]'
>>> parser.feed('<a href="mailto:info@water.org">Mail Water.org</a>')
'[email=info@water.org]Mail Water.org[/email]'
>>> parser.feed("<strong>Hello <i>World</i>. It's a wonderful world</strong>")
"[b]Hello [i]World[/i]. It's a wonderful world[/b]"
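To illustrate the kind of tag-to-BBCode mapping shown in the session above, here is a minimal sketch built only on the standard library's html.parser. It is not the html2phpbbcode implementation: the class TinyConverter and the function convert are hypothetical names, and only a handful of tags are covered.

```python
from html.parser import HTMLParser


class TinyConverter(HTMLParser):
    """Hypothetical illustration only; not html2phpbbcode's parser."""

    # BBCode emitted when a tag opens or closes (assumed subset of tags).
    OPEN = {"ul": "[list]", "ol": "[list=1]", "li": "[*]",
            "b": "[b]", "strong": "[b]", "i": "[i]"}
    CLOSE = {"ul": "[/list]", "ol": "[/list]",
             "b": "[/b]", "strong": "[/b]", "i": "[/i]"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags contribute nothing to the output.
        self.out.append(self.OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        # </li> has no BBCode counterpart, so it maps to an empty string.
        self.out.append(self.CLOSE.get(tag, ""))

    def handle_data(self, data):
        self.out.append(data)


def convert(html):
    """Feed one HTML fragment and return the collected BBCode."""
    parser = TinyConverter()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

Because [*] is emitted when an li tag opens and nothing is emitted when it closes, the sketch also copes with the unclosed list items in the ol example.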
HTML2PHPBBCode is based on the html2bbcode package by Vladimir Korsun, which is available under the BSD License.
The regex package by Matthew Barnett is also used, available under the Python Software Foundation License.
The code includes some regular expressions from the phpBB bulletin board software as well. Minor changes have been made for Python compatibility. phpBB code is available under GNU GPL v2.0.
This package differs from html2bbcode in the following ways:
- The generated BBCode follows the syntax described in phpBB's BBCode guide.
- <b>, <i>, <u>, <s>, <ol> HTML tags are also supported.
- <font>'s size attribute handling has been changed so that it maps to reasonable BBCode size values.
- If the href attribute of an <a> link uses the mailto: protocol, then the [email] BBCode tag is used.
- If the href attribute of an <a> link is neither an email nor a valid http/https URL, the link is converted to plain-text in BBCode.
- The parser removes excessive whitespace such as newlines between tags: <p>Hello</p>\n<p>World</p> (TODO: Use the W3C spec rules)
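The two link rules and the whitespace rule above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the package's code: the helper names link_to_bbcode and strip_whitespace_between_tags are hypothetical, and urllib.parse stands in for whatever URL validation the package actually performs.

```python
import re
from urllib.parse import urlsplit


def link_to_bbcode(href, text):
    """Hypothetical helper: pick [email], [url], or plain text for a link."""
    href = href.strip()
    parts = urlsplit(href)
    # mailto: links become [email]; the address is the part after the scheme.
    if parts.scheme == "mailto" and parts.path:
        return f"[email={parts.path}]{text}[/email]"
    # Only http/https links with a host are kept as [url] (assumed check).
    if parts.scheme in ("http", "https") and parts.netloc:
        return f"[url={href}]{text}[/url]"
    # Anything else (relative paths, javascript:, ...) degrades to plain text.
    return text


def strip_whitespace_between_tags(html):
    """Hypothetical helper: drop whitespace-only runs between two tags."""
    return re.sub(r">\s+<", "><", html)
```

The fallback to plain text means an unsupported link never produces malformed BBCode; the link text is kept and only the target is dropped.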
The package is available at PyPI and can be installed with the following command:
pip install html2phpbbcode
Installing from source is also an option:
python3 setup.py install
pytest is used for testing. Run pytest in the project directory to execute the tests.