1.14.0 / 2023-01-12
Notable Changes
Ruby
This release introduces native gem support for Ruby 3.2. (Also see "Technical note" under "Changed" below.)
This release ends support for:
- Ruby 2.6, for which upstream support ended 2022-04-12.
- JRuby 9.3, which is not fully compatible with Ruby 2.7+
Faster, more reliable installation: Native Gem for aarch64-linux (aka linux/arm64/v8)
This version of Nokogiri ships official native gem support for the aarch64-linux platform, which should support AWS Graviton and other ARM64 Linux platforms. Please note that glibc >= 2.29 is required for aarch64-linux systems; see Supported Platforms for more information.
Faster, more reliable installation: Native Gem for arm-linux (aka linux/arm/v7)
This version of Nokogiri ships experimental native gem support for the arm-linux platform. Please note that glibc >= 2.29 is required for arm-linux systems; see Supported Platforms for more information.
Pattern matching
This version introduces an experimental pattern matching API for XML::Attr, XML::Document, XML::DocumentFragment, XML::Namespace, XML::Node, and XML::NodeSet (and their subclasses).
Some documentation on what can be matched:
- XML::Attr#deconstruct_keys
- XML::Document#deconstruct_keys
- XML::Namespace#deconstruct_keys
- XML::Node#deconstruct_keys
- XML::DocumentFragment#deconstruct
- XML::NodeSet#deconstruct
We welcome feedback on this API at #2360.
Dependencies
CRuby
- Vendored libiconv is updated to v1.17
JRuby
- This version of Nokogiri uses jar-dependencies to manage most of the vendored Java dependencies. nokogiri -v now outputs maven metadata for all Java dependencies, and Nokogiri::VERSION_INFO also contains this metadata. [#2432]
- HTML parsing is now provided by net.sourceforge.htmlunit:neko-htmlunit:2.61.0 (previously Nokogiri used a fork of org.cyberneko.html:nekohtml).
- Vendored Jing is updated from com.thaiopensource:jing:20091111 to nu.validator:jing:20200702VNU.
- New dependency on net.sf.saxon:Saxon-HE:9.6.0-4 (via nu.validator:jing:20200702VNU).
Added
- Node#wrap and NodeSet#wrap now also accept a Node type argument, which will be dup'ed for each wrapper. For cases where many nodes are being wrapped, creating a Node once using Document#create_element and passing that Node multiple times is significantly faster than re-parsing markup on each call. [#2657]
- [CRuby] Invocation of custom XPath or CSS handler functions may now use the nokogiri namespace prefix. Historically, the JRuby implementation required this namespace but the CRuby implementation did not support it. It's recommended that all XPath and CSS queries use the nokogiri namespace going forward. Invocation without the namespace is planned for deprecation in v1.15.0 and removal in a future release. [#2147]
- HTML5::Document#quirks_mode and HTML5::DocumentFragment#quirks_mode expose the quirks mode used by the parser.
Improved
Functional
- HTML5 parser update to reflect changes to the living specification:
Performance
- Serialization of HTML5 documents and fragments has been re-implemented and is ~10x faster than previous versions. [#2596, #2569]
- Parsing of HTML5 documents is ~90% faster thanks to additional compiler optimizations being applied. [#2639]
- Compare Encoding objects rather than compare their names. This is a slight performance improvement and is future-proof. [#2454] (Thanks, @casperisfine!)
Error handling
- Document#canonicalize now raises an exception if inclusive_namespaces is non-nil and the mode is inclusive, i.e. XML_C14N_1_0 or XML_C14N_1_1. inclusive_namespaces can only be passed with exclusive modes, and previously this silently failed.
- Empty CSS selectors now raise a clearer Nokogiri::CSS::SyntaxError message, "empty CSS selector". Previously the exception raised from the bowels of racc was "unexpected '$' after ''". [#2700]
- [CRuby] XML::Reader parsing errors encountered during Reader#attribute_hash and Reader#namespaces now raise an XML::SyntaxError. Previously these methods would return nil and users would generally experience NoMethodErrors from elsewhere in the code.
- Prefer ruby_xmalloc to malloc within the C extension. [#2480] (Thanks, @Garfield96!)
Installation
- Avoid compile-time conflict with system-installed gumbo.h on OpenBSD. [#2464]
- Remove calls to vasprintf in favor of platform-independent rb_vsprintf
- Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [#2505]
- [CRuby+OSX] Compiling from source on MacOS will use the clang option -Wno-unknown-warning-option to avoid errors when Ruby injects options that clang doesn't know about. [#2689]
Fixed
- SAX::Parser's encoding attribute will not be clobbered when an alternative encoding is passed into SAX::Parser#parse_io. [#1942] (Thanks, @kp666!)
- Serialized HTML4::DocumentFragment will now be properly encoded. Previously this empty string was encoded as US-ASCII. [#2649]
- Node#wrap now uses the parent as the context node for parsing wrapper markup, falling back to the document for unparented nodes. Previously the document was always used.
- [CRuby] UTF-16-encoded documents longer than ~4000 code points now serialize properly. Previously the serialized document was corrupted when it exceeded the length of libxml2's internal string buffer. [#752]
- [CRuby] The HTML5 parser now correctly handles text at the end of form elements.
- [CRuby] HTML5::Document#fragment now always uses body as the parsing context. Previously, fragments were parsed in the context of the associated document's root node, which allowed for inconsistent parsing. [#2553]
- [CRuby] Nokogiri::HTML5::Document#url now correctly returns the URL passed to the constructor method. Previously it always returned nil. [#2583]
- [CRuby] HTML5 encoding detection is now case-insensitive with respect to meta tag charset declaration. [#2693]
- [CRuby] HTML5 fragment parsing in context of an annotation-xml node now works. Previously this rarely-used path invoked rb_funcall with incorrect parameters, resulting in an exception, a fatal error, or potentially a segfault. [#2692]
- [CRuby] HTML5 quirks mode during fragment parsing more closely matches document parsing. [#2646]
- [JRuby] Fixed a bug with adding the same namespace to multiple nodes via #add_namespace_definition. [#1247]
- [JRuby] NodeSet#[] now raises a TypeError if passed an invalid parameter type. [#2211]
Deprecated
- Nokogiri.install_default_aliases is deprecated in favor of Nokogiri::EncodingHandler.install_default_aliases. This is part of a private API and is probably not called by anybody, but we'll go through a deprecation cycle before removal anyway. [#2643, #2446]
Changed
- [CRuby+OSX] Technical note: On MacOS Ruby 3.2, the symbols from libxml2 and libxslt are no longer exported. Ruby 3.2 adopted new features from the Darwin toolchain that make it challenging to continue to support this rarely-used binary API. A future minor release of Nokogiri may remove these symbols (and others) entirely. Feedback from downstream gem maintainers is welcome at #2746, where you'll also be able to read deeper context on this decision.
Thank you!
The following people and organizations were kind enough to sponsor @flavorjones or the Nokogiri project during the development of v1.14.0:
- Götz Görisch @GoetzGoerisch
- Airbnb @airbnb
- Kyohei Nanba @kyo-nanba
- Maxime Gauthier @biximilien
- @renuo
- @dbootyfvrt
- YOSHIDA Katsuhiko @kyoshidajp
- Homebrew @Homebrew
- David Vrensk @dvrensk
- Alex Daragiu @daragiu
- GitHub @github
- Julian Joseph @Julian88Tex
- Charles Simon-Meunier @csimonmeunier
- Ben Slaughter @benSlaughter
- Garen Torikian @gjtorikian
- Frank Groeneveld @frenkel
- Hiroshi SHIBATA @hsbt
sha256 checksums:
c87564f5f8fbfb72fbcb7ed9781f6472ceabe2f288ede6b9c37071dc32320ba6 nokogiri-1.14.0-aarch64-linux.gem
33617e8a94993b8130a50bd59d6141a8d4d2aa4d4053f5c7874c71608e6e6dcc nokogiri-1.14.0-arm-linux.gem
5c0cd4eeb8501526e7e2aaba93b60ebf3dda37bfda665691196d4e9bb87adb1a nokogiri-1.14.0-arm64-darwin.gem
772936bf635b33b99bc89828de8e7077de47009638fe5ff11795f8b1d578465c nokogiri-1.14.0-java.gem
ee11c092b2cf2b137e71f623746162c578b53483dccf4c6209c80f5ba47927fe nokogiri-1.14.0-x64-mingw-ucrt.gem
9b91eede6155eb8891d7d95d8087d514f3007dd19813982104ed77452a2a7ace nokogiri-1.14.0-x64-mingw32.gem
649019d961b0ea8aee1bc8aa2573ab8ffb77d3f5e9c333aa2462a79fc56745fc nokogiri-1.14.0-x86-linux.gem
40985fc46315ea3d33ed900a649c0bb77484035ea882b7c9e55aef436b1958a8 nokogiri-1.14.0-x86-mingw32.gem
5d328c0d0c5f6f37a26c75b0282f9014c9686d4c10578ec8dfbbfcbea7da8b95 nokogiri-1.14.0-x86_64-darwin.gem
faa88b2bca46adaa3420c6e27eb8eb71f5b8d9f454ed7488a194a00c5ef52fbe nokogiri-1.14.0-x86_64-linux.gem
55ca6e87ae85e944a5901dd5a6cacbb961eaaf8b8dd3901b57475665396914bb nokogiri-1.14.0.gem