diff --git a/.gitignore b/.gitignore index 26bbdf8e..c4abbd13 100644 --- a/.gitignore +++ b/.gitignore @@ -4,10 +4,14 @@ pip-log.txt .coverage dist *.egg-info -.noseids build .tox docs/_build/ .cache/ .eggs/ .*env*/ +.pytest_cache/ +.python-version +*~ +*.swp +__pycache__ diff --git a/.travis.yml b/.travis.yml index 14015378..f315b9bd 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,34 +1,32 @@ +# Note: If you update this, make sure to update tox.ini, too. sudo: false language: python cache: directories: - "~/.cache/pip" + python: -- "2.7" -- "3.3" -- "3.4" -- "3.5" -- "3.6" -- "pypy" -env: -- HTML5LIB=0.99999999 # 8 -- HTML5LIB=0.999999999 # 9 + - "2.7" + - "3.4" + - "3.5" + - "3.6" + - "pypy" + install: - # html5lib 0.99999999 (8 9s) requires at least setuptools 18.5 - pip install -U pip setuptools>=18.5 - - pip install -r requirements.txt - # stomp on html5lib install with the specified one - - pip install html5lib==$HTML5LIB + - pip install -r requirements-dev.txt + +matrix: + include: + - python: "2.7" + env: MODE=lint + - python: "2.7" + env: MODE=vendorverify + - python: "3.4" + env: MODE=lint + - python: "3.7" + sudo: required + dist: xenial + script: -- py.test -- flake8 bleach/ -deploy: - provider: pypi - user: jezdez - distributions: sdist bdist_wheel - password: - secure: TTLpnNBAmRBPe4qITwtM6MRXw3CvGpflnkG6V97oKYL1RJhDXmxIxxImkGyVoT2IR4Oy/jqEikWUCCC3aDoqDnIkkDVriTPmo5PGnS2WgvEmYdcaTIp+RXdKwKhpCVX8ITEuye0iCXYu28vDaySGjnxjlYAP4S0PGPUzh/tn4DY= - on: - tags: true - repo: mozilla/bleach - python: "2.7" + - ./scripts/run_tests.sh $MODE diff --git a/CHANGES b/CHANGES index 25789814..6cf295e1 100644 --- a/CHANGES +++ b/CHANGES @@ -1,6 +1,183 @@ Bleach changes ============== +Version 3.1.1 (February 13th, 2020) +----------------------------------- + +**Security fixes** + +* ``bleach.clean`` behavior parsing ``noscript`` tags did not match + browser behavior. + + Calls to ``bleach.clean`` allowing ``noscript`` and one or more of + the raw text tags (``title``, ``textarea``, ``script``, ``style``, + ``noembed``, ``noframes``, ``iframe``, and ``xmp``) were vulnerable + to a mutation XSS. + + This security issue was confirmed in Bleach versions v2.1.4, v3.0.2, + and v3.1.0. Earlier versions are probably affected too. + + Anyone using Bleach <=v3.1.0 is highly encouraged to upgrade. + + https://bugzilla.mozilla.org/show_bug.cgi?id=1615315 + +**Backwards incompatible changes** + +None + +**Features** + +None + +**Bug fixes** + +None + +Bleach changes +============== + +Version 3.1.0 (January 9th, 2019) +--------------------------------- + +**Security fixes** + +None + +**Backwards incompatible changes** + +None + +**Features** + +* Add ``recognized_tags`` argument to the linkify ``Linker`` class. This + fixes issues when linkifying on its own and having some tags get escaped. + It defaults to a list of HTML5 tags. Thank you, Chad Birch! (#409) + +**Bug fixes** + +* Add ``six>=1.9`` to requirements. Thank you, Dave Shawley (#416) + +* Fix cases where attribute names could have invalid characters in them. + (#419) + +* Fix problems with ``LinkifyFilter`` not being able to match links + across ``&``. (#422) + +* Fix ``InputStreamWithMemory`` when the ``BleachHTMLParser`` is + parsing ``meta`` tags. (#431) + +* Fix doctests. (#357) + + +Version 3.0.2 (October 11th, 2018) +---------------------------------- + +**Security fixes** + +None + +**Backwards incompatible changes** + +None + +**Features** + +None + +**Bug fixes** + +* Merge ``Characters`` tokens after sanitizing them. This fixes issues in the + ``LinkifyFilter`` where it was only linkifying parts of urls. (#374) + + +Version 3.0.1 (October 9th, 2018) +--------------------------------- + +**Security fixes** + +None + +**Backwards incompatible changes** + +None + +**Features** + +* Support Python 3.7. It supported Python 3.7 just fine, but we added 3.7 to + the list of Python environments we test so this is now officially supported. + (#377) + +**Bug fixes** + +* Fix ``list`` object has no attribute ``lower`` in ``clean``. (#398) +* Fix ``abbr`` getting escaped in ``linkify``. (#400) + + +Version 3.0.0 (October 3rd, 2018) +--------------------------------- + +**Security fixes** + +None + +**Backwards incompatible changes** + +* A bunch of functions were moved from one module to another. + + These were moved from ``bleach.sanitizer`` to ``bleach.html5lib_shim``: + + * ``convert_entity`` + * ``convert_entities`` + * ``match_entity`` + * ``next_possible_entity`` + * ``BleachHTMLSerializer`` + * ``BleachHTMLTokenizer`` + * ``BleachHTMLParser`` + + These functions and classes weren't documented and aren't part of the + public API, but people read code and might be using them so we're + considering it an incompatible API change. + + If you're using them, you'll need to update your code. + +**Features** + +* Bleach no longer depends on html5lib. html5lib==1.0.1 is now vendored into + Bleach. You can remove it from your requirements file if none of your other + requirements require html5lib. + + This means Bleach will now work fine with other libraries that depend on + html5lib regardless of what version of html5lib they require. (#386) + +**Bug fixes** + +* Fixed tags getting added when using clean or linkify. This was a + long-standing regression from the Bleach 2.0 rewrite. (#280, #392) + +* Fixed ```` getting replaced with a string. Now it gets escaped or + stripped depending on whether it's in the allowed tags or not. (#279) + + +Version 2.1.4 (August 16th, 2018) +--------------------------------- + +**Security fixes** + +None + +**Backwards incompatible changes** + +* Dropped support for Python 3.3. (#328) + +**Features** + +None + +**Bug fixes** + +* Handle ambiguous ampersands in correctly. (#359) + + Version 2.1.3 (March 5th, 2018) ------------------------------- @@ -14,6 +191,7 @@ Version 2.1.3 (March 5th, 2018) This security issue was introduced in Bleach 2.1. Anyone using Bleach 2.1 is highly encouraged to upgrade. + https://bugzilla.mozilla.org/show_bug.cgi?id=1442745 **Backwards incompatible changes** diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 94276246..2b0137d0 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -18,20 +18,25 @@ Contributors: - Adam Lofts - Adrian "ThiefMaster" - Alek -- Alexandre Macabies -- Alexandr N. Zamaraev - Alex Defsen - Alex Ehlke +- Alexandre Macabies +- Alexandr N. Zamaraev - Alireza Savand - Andreas Malecki - Andy Freeland +- Antoine Leclair +- Anton Backer - Anton Kovalyov +- Chad Birch - Chris Beaven - Dan Gayle +- dave-shawley - Erik Rose - Gaurav Dadhania - Geoffrey Sneddon - Greg Guthe +- hugovk - Istvan Albert - Jaime Irurzun - James Socol @@ -48,6 +53,7 @@ Contributors: - Mark Lee - Mark Paschal - mdxs +- Nikita Sobolev - nikolas - Oh Jinkyun - Paul Craciunoiu @@ -55,8 +61,11 @@ Contributors: - Ryan Niemeyer - Sébastien Fievet - sedrubal +- Stephane Blondon +- Stu Cox - Tim Dumol - Timothy Fitz +- Vadim Kotov - Vitaly Volkov - Will Kahn-Greene - Zoltán diff --git a/MANIFEST.in b/MANIFEST.in index 14ad79c7..70e794fd 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -2,7 +2,7 @@ include CHANGES include CONTRIBUTORS include CONTRIBUTING.rst include CODE_OF_CONDUCT.rst -include requirements.txt +include requirements-dev.txt include tox.ini include LICENSE include README.rst @@ -10,8 +10,9 @@ include README.rst include docs/conf.py include docs/Makefile -recursive-include docs *.rst +include scripts/* +recursive-include bleach *.py *.json *.rst *.sh *.txt INSTALLER METADATA RECORD WHEEL +recursive-include docs *.rst recursive-include tests *.py *.test - recursive-include tests_website *.html *.py *.rst diff --git a/README.rst b/README.rst index 5f151dc7..21670e9f 100644 --- a/README.rst +++ b/README.rst @@ -51,6 +51,21 @@ please read our wiki page at ``_. +Security +======== + +Bleach is a security-focused library. + +We have a responsible security vulnerability reporting process. Please use +that if you're reporting a security issue. + +Security issues are fixed in private. After we land such a fix, we'll do a +release. + +For every release, we mark security issues we've fixed in the ``CHANGES`` in +the **Security issues** section. We include any relevant CVE links. + + Installing Bleach ================= @@ -58,10 +73,6 @@ Bleach is available on PyPI_, so you can install it with ``pip``:: $ pip install bleach -Or with ``easy_install``:: - - $ easy_install bleach - Upgrading Bleach ================ @@ -89,21 +100,6 @@ The simplest way to use Bleach is: u'an http://example.com url -Security -======== - -Bleach is a security-related library. - -We have a responsible security vulnerability reporting process. Please use -that if you're reporting a security issue. - -Security issues are fixed in private. After we land such a fix, we'll do a -release. - -For every release, we mark security issues we've fixed in the ``CHANGES`` in -the **Security issues** section. We include relevant CVE links. - - Code of conduct =============== diff --git a/bleach/__init__.py b/bleach/__init__.py index b81b0bbe..30f8fb84 100644 --- a/bleach/__init__.py +++ b/bleach/__init__.py @@ -2,7 +2,6 @@ from __future__ import unicode_literals -import warnings from pkg_resources import parse_version from bleach.linkifier import ( @@ -18,24 +17,10 @@ ) -import html5lib -try: - _html5lib_version = html5lib.__version__.split('.') - if len(_html5lib_version) < 2: - _html5lib_version = _html5lib_version + ['0'] -except Exception: - _h5ml5lib_version = ['unknown', 'unknown'] - - -# Bleach 3.0.0 won't support html5lib-python < 1.0.0. -if _html5lib_version < ['1', '0'] or 'b' in _html5lib_version[1]: - warnings.warn('Support for html5lib-python < 1.0.0 is deprecated.', DeprecationWarning) - - # yyyymmdd -__releasedate__ = '20180305' +__releasedate__ = '20200213' # x.y.z or x.y.z.dev0 -- semver -__version__ = '2.1.3' +__version__ = '3.1.1' VERSION = parse_version(__version__) diff --git a/bleach/_vendor/README.rst b/bleach/_vendor/README.rst new file mode 100644 index 00000000..a8a42d68 --- /dev/null +++ b/bleach/_vendor/README.rst @@ -0,0 +1,37 @@ +======================= +Vendored library policy +======================= + +To simplify Bleach development, we're now vendoring certain libraries that +we use. + +Vendored libraries must follow these rules: + +1. Vendored libraries must be pure Python--no compiling. +2. Source code for the libary is included in this directory. +3. License must be included in this repo and in the Bleach distribution. +4. Requirements of the library become requirements of Bleach. +5. No modifications to the library may be made. + + +Adding/Updating a vendored library +================================== + +Way to vendor a library or update a version: + +1. Update ``vendor.txt`` with the library, version, and hash. You can use + `hashin `_. +2. Remove all old files and directories of the old version. +3. Run ``pip_install_vendor.sh`` and check everything it produced in including + the ``.dist-info`` directory and contents. + + +Reviewing a change involving a vendored library +=============================================== + +Way to verify a vendored library addition/update: + +1. Pull down the branch. +2. Delete all the old files and directories of the old version. +3. Run ``pip_install_vendor.sh``. +4. Run ``git diff`` and verify there are no changes. diff --git a/bleach/_vendor/__init__.py b/bleach/_vendor/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/bleach/_vendor/html5lib-1.0.1.dist-info/DESCRIPTION.rst b/bleach/_vendor/html5lib-1.0.1.dist-info/DESCRIPTION.rst new file mode 100644 index 00000000..c05f8c00 --- /dev/null +++ b/bleach/_vendor/html5lib-1.0.1.dist-info/DESCRIPTION.rst @@ -0,0 +1,489 @@ +html5lib +======== + +.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master + :target: https://travis-ci.org/html5lib/html5lib-python + +html5lib is a pure-python library for parsing HTML. It is designed to +conform to the WHATWG HTML specification, as is implemented by all major +web browsers. + + +Usage +----- + +Simple usage follows this pattern: + +.. code-block:: python + + import html5lib + with open("mydocument.html", "rb") as f: + document = html5lib.parse(f) + +or: + +.. code-block:: python + + import html5lib + document = html5lib.parse("

Hello World!") + +By default, the ``document`` will be an ``xml.etree`` element instance. +Whenever possible, html5lib chooses the accelerated ``ElementTree`` +implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x). + +Two other tree types are supported: ``xml.dom.minidom`` and +``lxml.etree``. To use an alternative format, specify the name of +a treebuilder: + +.. code-block:: python + + import html5lib + with open("mydocument.html", "rb") as f: + lxml_etree_document = html5lib.parse(f, treebuilder="lxml") + +When using with ``urllib2`` (Python 2), the charset from HTTP should be +pass into html5lib as follows: + +.. code-block:: python + + from contextlib import closing + from urllib2 import urlopen + import html5lib + + with closing(urlopen("http://example.com/")) as f: + document = html5lib.parse(f, transport_encoding=f.info().getparam("charset")) + +When using with ``urllib.request`` (Python 3), the charset from HTTP +should be pass into html5lib as follows: + +.. code-block:: python + + from urllib.request import urlopen + import html5lib + + with urlopen("http://example.com/") as f: + document = html5lib.parse(f, transport_encoding=f.info().get_content_charset()) + +To have more control over the parser, create a parser object explicitly. +For instance, to make the parser raise exceptions on parse errors, use: + +.. code-block:: python + + import html5lib + with open("mydocument.html", "rb") as f: + parser = html5lib.HTMLParser(strict=True) + document = parser.parse(f) + +When you're instantiating parser objects explicitly, pass a treebuilder +class as the ``tree`` keyword argument to use an alternative document +format: + +.. code-block:: python + + import html5lib + parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom")) + minidom_document = parser.parse("

Hello World!") + +More documentation is available at https://html5lib.readthedocs.io/. + + +Installation +------------ + +html5lib works on CPython 2.7+, CPython 3.3+ and PyPy. To install it, +use: + +.. code-block:: bash + + $ pip install html5lib + + +Optional Dependencies +--------------------- + +The following third-party libraries may be used for additional +functionality: + +- ``datrie`` can be used under CPython to improve parsing performance + (though in almost all cases the improvement is marginal); + +- ``lxml`` is supported as a tree format (for both building and + walking) under CPython (but *not* PyPy where it is known to cause + segfaults); + +- ``genshi`` has a treewalker (but not builder); and + +- ``chardet`` can be used as a fallback when character encoding cannot + be determined. + + +Bugs +---- + +Please report any bugs on the `issue tracker +`_. + + +Tests +----- + +Unit tests require the ``pytest`` and ``mock`` libraries and can be +run using the ``py.test`` command in the root directory. + +Test data are contained in a separate `html5lib-tests +`_ repository and included +as a submodule, thus for git checkouts they must be initialized:: + + $ git submodule init + $ git submodule update + +If you have all compatible Python implementations available on your +system, you can run tests on all of them using the ``tox`` utility, +which can be found on PyPI. + + +Questions? +---------- + +There's a mailing list available for support on Google Groups, +`html5lib-discuss `_, +though you may get a quicker response asking on IRC in `#whatwg on +irc.freenode.net `_. + +Change Log +---------- + +1.0.1 +~~~~~ + +Released on December 7, 2017 + +Breaking changes: + +* Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!) +* Remove ``utils/spider.py`` (#353) (Thank you, Jon Dufresne!) + +Features: + +* Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most, + Will Kahn-Greene!) +* Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!) +* Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!) +* Support Python 3.6. (#333) (Thank you, Jon Dufresne!) +* Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!) +* Improve testing and CI and add code coverage (#323, #334), (Thank you, Jon + Dufresne, John Vandenberg, Geoffrey Sneddon, Will Kahn-Greene!) +* Semver-compliant version number. + +Bug fixes: + +* Add support for setuptools < 18.5 to support environment markers. (Thank you, + John Vandenberg!) +* Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!) +* Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank + you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!) +* Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will + Kahn-Greene!) +* Include license file in generated wheel package. (#350) (Thank you, Jon + Dufresne!) +* Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!) +* Allow uppercase hex chararcters in CSS colour check. (#377) (Thank you, + Komal Dembla, Hugo!) + + +1.0 +~~~ + +Released and unreleased on December 7, 2017. Badly packaged release. + + +0.999999999/1.0b10 +~~~~~~~~~~~~~~~~~~ + +Released on July 15, 2016 + +* Fix attribute order going to the tree builder to be document order + instead of reverse document order(!). + + +0.99999999/1.0b9 +~~~~~~~~~~~~~~~~ + +Released on July 14, 2016 + +* **Added ordereddict as a mandatory dependency on Python 2.6.** + +* Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all`` + extras that will do the right thing based on the specific + interpreter implementation. + +* Now requires the ``mock`` package for the testsuite. + +* Cease supporting DATrie under PyPy. + +* **Remove PullDOM support, as this hasn't ever been properly + tested, doesn't entirely work, and as far as I can tell is + completely unused by anyone.** + +* Move testsuite to ``py.test``. + +* **Fix #124: move to webencodings for decoding the input byte stream; + this makes html5lib compliant with the Encoding Standard, and + introduces a required dependency on webencodings.** + +* **Cease supporting Python 3.2 (in both CPython and PyPy forms).** + +* **Fix comments containing double-dash with lxml 3.5 and above.** + +* **Use scripting disabled by default (as we don't implement + scripting).** + +* **Fix #11, avoiding the XSS bug potentially caused by serializer + allowing attribute values to be escaped out of in old browser versions, + changing the quote_attr_values option on serializer to take one of + three values, "always" (the old True value), "legacy" (the new option, + and the new default), and "spec" (the old False value, and the old + default).** + +* **Fix #72 by rewriting the sanitizer to apply only to treewalkers + (instead of the tokenizer); as such, this will require amending all + callers of it to use it via the treewalker API.** + +* **Drop support of charade, now that chardet is supported once more.** + +* **Replace the charset keyword argument on parse and related methods + with a set of keyword arguments: override_encoding, transport_encoding, + same_origin_parent_encoding, likely_encoding, and default_encoding.** + +* **Move filters._base, treebuilder._base, and treewalkers._base to .base + to clarify their status as public.** + +* **Get rid of the sanitizer package. Merge sanitizer.sanitize into the + sanitizer.htmlsanitizer module and move that to sanitizer. This means + anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no + code changes.** + +* **Rename treewalkers.lxmletree to .etree_lxml and + treewalkers.genshistream to .genshi to have a consistent API.** + +* Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, + utils) to be underscore prefixed to clarify their status as private. + + +0.9999999/1.0b8 +~~~~~~~~~~~~~~~ + +Released on September 10, 2015 + +* Fix #195: fix the sanitizer to drop broken URLs (it threw an + exception between 0.9999 and 0.999999). + + +0.999999/1.0b7 +~~~~~~~~~~~~~~ + +Released on July 7, 2015 + +* Fix #189: fix the sanitizer to allow relative URLs again (as it did + prior to 0.9999/1.0b5). + + +0.99999/1.0b6 +~~~~~~~~~~~~~ + +Released on April 30, 2015 + +* Fix #188: fix the sanitizer to not throw an exception when sanitizing + bogus data URLs. + + +0.9999/1.0b5 +~~~~~~~~~~~~ + +Released on April 29, 2015 + +* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how + this sounds, this has no known security implications. No known version + of IE (5.5 to current), Firefox (3 to current), Safari (6 to current), + Chrome (1 to current), or Opera (12 to current) will run any script + provided in these attributes. + +* Pass error message to the ParseError exception in strict parsing mode. + +* Allow data URIs in the sanitizer, with a whitelist of content-types. + +* Add support for Python implementations that don't support lone + surrogates (read: Jython). Fixes #2. + +* Remove localization of error messages. This functionality was totally + unused (and untested that everything was localizable), so we may as + well follow numerous browsers in not supporting translating technical + strings. + +* Expose treewalkers.pprint as a public API. + +* Add a documentEncoding property to HTML5Parser, fix #121. + + +0.999 +~~~~~ + +Released on December 23, 2013 + +* Fix #127: add work-around for CPython issue #20007: .read(0) on + http.client.HTTPResponse drops the rest of the content. + +* Fix #115: lxml treewalker can now deal with fragments containing, at + their root level, text nodes with non-ASCII characters on Python 2. + + +0.99 +~~~~ + +Released on September 10, 2013 + +* No library changes from 1.0b3; released as 0.99 as pip has changed + behaviour from 1.4 to avoid installing pre-release versions per + PEP 440. + + +1.0b3 +~~~~~ + +Released on July 24, 2013 + +* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any + implementation using it should be moved to + ``NonRecursiveTreeWalker``, as everything bundled with html5lib has + for years. + +* Fix #67 so that ``BufferedStream`` to correctly returns a bytes + object, thereby fixing any case where html5lib is passed a + non-seekable RawIOBase-like object. + + +1.0b2 +~~~~~ + +Released on June 27, 2013 + +* Removed reordering of attributes within the serializer. There is now + an ``alphabetical_attributes`` option which preserves the previous + behaviour through a new filter. This allows attribute order to be + preserved through html5lib if the tree builder preserves order. + +* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by + ``treeadapters.sax.to_sax`` which is generic and supports any + treewalker; it also resolves all known bugs with ``dom2sax``. + +* Fix treewalker assertions on hitting bytes strings on + Python 2. Previous to 1.0b1, treewalkers coped with mixed + bytes/unicode data on Python 2; this reintroduces this prior + behaviour on Python 2. Behaviour is unchanged on Python 3. + + +1.0b1 +~~~~~ + +Released on May 17, 2013 + +* Implementation updated to implement the `HTML specification + `_ as of 5th May + 2013 (`SVN `_ revision r7867). + +* Python 3.2+ supported in a single codebase using the ``six`` library. + +* Removed support for Python 2.5 and older. + +* Removed the deprecated Beautiful Soup 3 treebuilder. + ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that + since it doesn't support namespaces, foreign content like SVG and + MathML is parsed incorrectly. + +* Removed ``simpletree`` from the package. The default tree builder is + now ``etree`` (using the ``xml.etree.cElementTree`` implementation if + available, and ``xml.etree.ElementTree`` otherwise). + +* Removed the ``XHTMLSerializer`` as it never actually guaranteed its + output was well-formed XML, and hence provided little of use. + +* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no + longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will + return the default DOM treebuilder, which uses ``xml.dom.minidom``. + +* Optional heuristic character encoding detection now based on + ``charade`` for Python 2.6 - 3.3 compatibility. + +* Optional ``Genshi`` treewalker support fixed. + +* Many bugfixes, including: + + * #33: null in attribute value breaks XML AttValue; + + * #4: nested, indirect descendant,