Skip to content

Latest commit

 

History

History
83 lines (69 loc) · 5.66 KB

CHANGES.md

File metadata and controls

83 lines (69 loc) · 5.66 KB

jsoup Changelog

1.18.1 (Pending)

Improvements

  • Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse, for e.g. to conserve memory, providing a mechanism to parse an input document that would otherwise be too large to fit into memory, yet still providing a DOM interface to the document and its elements. Additionally, the parser provides a selectFirst(String query) / selectNext(String query), which will run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator() methods. 2096
  • Download Progress: added a Response Progress event interface, which reports progress and URLs are downloaded (and parsed). Supported on both a session and a single connection level. 2164, 656
  • Added Path accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser), etc. 2055
  • Updated the button tag configuration to include a space between multiple button elements in the Element.text() method. 2105
  • Added support for the ns|* all elements in namespace Selector. 1811

Changes

  • Removed previously deprecated internal classes and methods. 2094
  • Build change: the built jar's OSGi manifest no longer imports itself. 2158

Bug Fixes

  • When tracking source positions, if the first node was a TextNode, its position was incorrectly set to -1. 2106
  • When connecting (or redirecting) to URLs with characters such as {, } in the path, a Malformed URL exception would be thrown (if in development), or the URL might otherwise not be escaped correctly (if in production). The URL encoding process has been improved to handle these characters correctly. 2142
  • When using W3CDom with a custom output Document, a Null Pointer Exception would be thrown. 2114
  • The :has() selector did not match correctly when using sibling combinators (like e.g.: h1:has(+h2)). 2137
  • The :empty selector incorrectly matched elements that started with a blank text node and were followed by non-empty nodes, due to an incorrect short-circuit. 2130
  • Fuzz: a Stack Overflow exception could occur when resolving a crafted <base href> URL, in the normalizing regex. 2165

1.17.2 (2023-Dec-29)

Improvements

  • Attribute object accessors: Added Element.attribute(String) and Attributes.attribute(String) to more simply obtain an Attribute object. 2069
  • Attribute source tracking: If source tracking is on, and an Attribute's key is changed ( via Attribute.setKey(String)), the source range is now still tracked in Attribute.sourceRange(). 2070
  • Wildcard attribute selector: Added support for the [*] element with any attribute selector. And also restored support for selecting by an empty attribute name prefix ([^]). 2079

Bug Fixes

  • Mixed-cased source position: When tracking the source position of attributes, if the source attribute name was mix-cased but the parser was lower-case normalizing attribute names, the source position for that attribute was not tracked correctly. 2067
  • Source position NPE: When tracking the source position of a body fragment parse, a null pointer exception was thrown. 2068
  • Multi-point emoji entity: A multi-point encoded emoji entity may be incorrectly decoded to the replacement character. 2074
  • Selector sub-expressions: (Regression) in a selector like parent [attr=va], other, the , OR was binding to [attr=va] instead of parent [attr=va], causing incorrect selections. The fix includes a EvaluatorDebug class that generates a sexpr to represent the query, allowing simpler and more thorough query parse tests. 2073
  • XML CData output: When generating XML-syntax output from parsed HTML, script nodes containing (pseudo) CData sections would have an extraneous CData section added, causing script execution errors. Now, the data content is emitted in a HTML/XML/XHTML polyglot format, if the data is not already within a CData section. 2078
  • Thread safety: The :has evaluator held a non-thread-safe Iterator, and so if an Evaluator object was shared across multiple concurrent threads, a NoSuchElement exception may be thrown, and the selected results may be incorrect. Now, the iterator object is a thread-local. 2088

Older changes for versions 0.1.1 (2010-Jan-31) through 1.17.1 (2023-Nov-27) may be found in change-archive.txt.