Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy the CSS parser API #3218

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
* [CRuby] `Nokogiri::HTML5::Builder` is similar to `HTML4::Builder` but returns an `HTML5::Document`. @flavorjones
* [CRuby] Attributes in an HTML5 document can be serialized individually, something that has always been supported by the HTML4 serializer. [#3125, #3127] @flavorjones
* [CRuby] When compiling packaged libraries from source, allow users' `AR` and `LD` environment variables to set the archiver and linker commands, respectively. This augments the existing `CC` environment variable to set the compiler command. [#3165] @ziggythehamster
* `XPathVisitor` now handles the xpath prefix as a constructor argument.
* `XPathVisitor` exposes `#prefix`, `#builtins`, and `#doctype` attributes.


### Fixed
Expand All @@ -24,6 +26,18 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
### Changed

* [CRuby] `Nokogiri::XML::CData.new` no longer accepts `nil` as the content argument, making `CData` behave like other character data classes (like `Comment` and `Text`). This change was necessitated by behavioral changes in the upcoming libxml 2.13.0 release. If you wish to create an empty CDATA node, pass an empty string. [#3156] @flavorjones
* The CSS selector cache can now be disabled with a boolean `cache:` keyword argument to `CSS.xpath_for` which defaults to `true`. Note that this keyword replaces the previous global methods `Parser.set_cache` and `Parser.without_cache(&block)`.


### Deprecated

* Passing an options hash to `XPathVisitor.new` is now deprecated and will generate a warning. Use keyword arguments instead. This will become an error in a future version of Nokogiri.
* The undocumented and unused method `Nokogiri::CSS.parse` is now deprecated and will generate a warning. Use `CSS::Parser#parse` instead. This will be removed in a future version of Nokogiri.


### Removed

* The internal-only CSS parser classes have changed dramatically. These classes have always been internal-only and not intended to be part of the public API, as mentioned in the CHANGELOG entry for v1.13.0. Public APIs are available for all supported functionality. If your code has been using these undocumented classes directly and you need assistance upgrading, feel free to open an issue and we'll try to help.


## v1.16.5
Expand Down
40 changes: 33 additions & 7 deletions lib/nokogiri/css.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ class << self
# TODO: Deprecate this method ahead of 2.0 and delete it in 2.0.
# It is not used by Nokogiri and shouldn't be part of the public API.
def parse(selector) # :nodoc:
warn("Nokogiri::CSS.parse is deprecated and will be removed in a future version of Nokogiri. Use Nokogiri::CSS::Parser#parse instead.", uplevel: 1, category: :deprecated)
Parser.new.parse(selector)
end

Expand Down Expand Up @@ -35,21 +36,45 @@ def parse(selector) # :nodoc:
# The namespaces that are referenced in the query, if any. This is a hash where the keys are
# the namespace prefix and the values are the namespace URIs. Default is an empty Hash.
#
# - +cache:+ (Boolean)
#
# Whether to use the SelectorCache for the translated query. Default is +true+ to ensure
# that repeated queries don't incur the overhead of re-parsing the selector.
#
# [Returns] (String) The equivalent XPath query for +selector+
#
# 💡 Note that translated queries are cached for performance concerns.
#
def xpath_for(selector, options = {})
raise TypeError, "no implicit conversion of #{selector.inspect} to String" unless selector.respond_to?(:to_str)
def xpath_for(
selector, options = nil,
prefix: options&.delete(:prefix),
visitor: options&.delete(:visitor),
ns: options&.delete(:ns),
cache: true
)
unless options.nil?
warn("Passing options as an explicit hash is deprecated. Use keyword arguments instead. This will become an error in a future release.", uplevel: 1, category: :deprecated)
end

raise(TypeError, "no implicit conversion of #{selector.inspect} to String") unless selector.respond_to?(:to_str)

selector = selector.to_str
raise Nokogiri::CSS::SyntaxError, "empty CSS selector" if selector.empty?
raise(Nokogiri::CSS::SyntaxError, "empty CSS selector") if selector.empty?

raise ArgumentError, "cannot provide both :prefix and :visitor" if prefix && visitor

prefix = options.fetch(:prefix, Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX)
visitor = options.fetch(:visitor) { Nokogiri::CSS::XPathVisitor.new }
ns = options.fetch(:ns, {})
visitor ||= if prefix
Nokogiri::CSS::XPathVisitor.new(prefix: prefix)
else
Nokogiri::CSS::XPathVisitor.new
end

Parser.new(ns).xpath_for(selector, prefix, visitor)
if cache
key = SelectorCache.key(ns: ns, selector: selector, visitor: visitor)
SelectorCache[key] ||= Parser.new(selector, ns: ns).xpath(visitor)
else
Parser.new(selector, ns: ns).xpath(visitor)
end
end
end
end
Expand All @@ -62,5 +87,6 @@ def xpath_for(selector, options = {})
require_relative "css/parser"
$-w = x

require_relative "css/selector_cache"
require_relative "css/tokenizer"
require_relative "css/syntax_error"
8 changes: 6 additions & 2 deletions lib/nokogiri/css/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,12 @@ def accept(visitor)

###
# Convert this CSS node to xpath with +prefix+ using +visitor+
def to_xpath(prefix, visitor)
prefix = "." if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil?
def to_xpath(visitor)
prefix = if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil?
"."
else
visitor.prefix
end
prefix + visitor.accept(self)
end

Expand Down
4 changes: 2 additions & 2 deletions lib/nokogiri/css/parser.rb
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# frozen_string_literal: true
#
# DO NOT MODIFY!!!!
# This file is automatically generated by Racc 1.7.3
# This file is automatically generated by Racc 1.8.0
# from Racc grammar file "parser.y".
#

Expand Down Expand Up @@ -480,7 +480,7 @@ def _reduce_24(val, _values, result)
end

def _reduce_25(val, _values, result)
name = @namespaces.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
name = @namespaces&.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
result = Node.new(:ELEMENT_NAME, [name])

result
Expand Down
2 changes: 1 addition & 1 deletion lib/nokogiri/css/parser.y
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ rule
namespaced_ident:
namespace '|' IDENT { result = Node.new(:ELEMENT_NAME, [[val[0], val[2]].compact.join(':')]) }
| IDENT {
name = @namespaces.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
name = @namespaces&.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
result = Node.new(:ELEMENT_NAME, [name])
}
;
Expand Down
76 changes: 9 additions & 67 deletions lib/nokogiri/css/parser_extras.rb
Original file line number Diff line number Diff line change
@@ -1,70 +1,19 @@
# frozen_string_literal: true

require "thread"

module Nokogiri
module CSS
class Parser < Racc::Parser # :nodoc:
CACHE_SWITCH_NAME = :nokogiri_css_parser_cache_is_off

@cache = {}
@mutex = Mutex.new

class << self
# Return a thread-local boolean indicating whether the CSS-to-XPath cache is active. (Default is `true`.)
def cache_on?
!Thread.current[CACHE_SWITCH_NAME]
end

# Set a thread-local boolean to turn caching on and off. Truthy values turn the cache on, falsey values turn the cache off.
def set_cache(value) # rubocop:disable Naming/AccessorMethodName
Thread.current[CACHE_SWITCH_NAME] = !value
end

# Get the css selector in +string+ from the cache
def [](string)
return unless cache_on?

@mutex.synchronize { @cache[string] }
end

# Set the css selector in +string+ in the cache to +value+
def []=(string, value)
return value unless cache_on?

@mutex.synchronize { @cache[string] = value }
end

# Clear the cache
def clear_cache(create_new_object = false)
@mutex.synchronize do
if create_new_object
@cache = {}
else
@cache.clear
end
end
end

# Execute +block+ without cache
def without_cache(&block)
original_cache_setting = cache_on?
set_cache(false)
yield
ensure
set_cache(original_cache_setting)
end
end

# Create a new CSS parser with respect to +namespaces+
def initialize(namespaces = {})
def initialize(selector, ns: nil)
@selector = selector
@namespaces = ns
@tokenizer = Tokenizer.new
@namespaces = namespaces

super()
end

def parse(string)
@tokenizer.scan_setup(string)
def parse
@tokenizer.scan_setup(@selector)
do_parse
end

Expand All @@ -73,10 +22,9 @@ def next_token
end

# Get the xpath for +string+ using +options+
def xpath_for(string, prefix, visitor)
key = cache_key(string, prefix, visitor)
self.class[key] ||= parse(string).map do |ast|
ast.to_xpath(prefix, visitor)
def xpath(visitor)
parse.map do |ast|
ast.to_xpath(visitor)
end
end

Expand All @@ -85,12 +33,6 @@ def on_error(error_token_id, error_value, value_stack)
after = value_stack.compact.last
raise SyntaxError, "unexpected '#{error_value}' after '#{after}'"
end

def cache_key(query, prefix, visitor)
if self.class.cache_on?
[query, prefix, @namespaces, visitor.config]
end
end
end
end
end
40 changes: 40 additions & 0 deletions lib/nokogiri/css/selector_cache.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# frozen_string_literal: true

require "thread"

module Nokogiri
module CSS
module SelectorCache # :nodoc:
@cache = {}
@mutex = Mutex.new

class << self
# Get the css selector in +string+ from the cache
def [](key)
@mutex.synchronize { @cache[key] }
end

# Set the css selector in +string+ in the cache to +value+
def []=(key, value)
@mutex.synchronize { @cache[key] = value }
end

# Clear the cache
def clear_cache(create_new_object = false)
@mutex.synchronize do
if create_new_object
@cache = {}
else
@cache.clear
end
end
end

# Construct a unique key cache key
def key(ns:, selector:, visitor:)
[ns, selector, visitor.config]
end
end
end
end
end
18 changes: 16 additions & 2 deletions lib/nokogiri/css/xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,15 @@ module DoctypeConfig
VALUES = [XML, HTML4, HTML5]
end

# The visitor configuration set via the +builtins:+ keyword argument to XPathVisitor.new.
attr_reader :builtins

# The visitor configuration set via the +doctype:+ keyword argument to XPathVisitor.new.
attr_reader :doctype

# The visitor configuration set via the +prefix:+ keyword argument to XPathVisitor.new.
attr_reader :prefix

# :call-seq:
# new() → XPathVisitor
# new(builtins:, doctype:) → XPathVisitor
Expand All @@ -54,7 +63,11 @@ module DoctypeConfig
#
# [Returns] XPathVisitor
#
def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)
def initialize(
builtins: BuiltinsConfig::NEVER,
doctype: DoctypeConfig::XML,
prefix: Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX
)
unless BuiltinsConfig::VALUES.include?(builtins)
raise(ArgumentError, "Invalid values #{builtins.inspect} for builtins: keyword parameter")
end
Expand All @@ -64,6 +77,7 @@ def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)

@builtins = builtins
@doctype = doctype
@prefix = prefix
end

# :call-seq: config() → Hash
Expand All @@ -72,7 +86,7 @@ def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)
# a Hash representing the configuration of the XPathVisitor, suitable for use as
# part of the CSS cache key.
def config
{ builtins: @builtins, doctype: @doctype }
{ builtins: @builtins, doctype: @doctype, prefix: @prefix }
end

# :stopdoc:
Expand Down
8 changes: 3 additions & 5 deletions lib/nokogiri/decorators/slop.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,9 @@ def method_missing(name, *args, &block)
list = xpath("#{XPATH_PREFIX}#{name}[#{conds}]")
end
else
CSS::Parser.without_cache do
list = xpath(
*CSS.xpath_for("#{name}#{args.first}", prefix: XPATH_PREFIX),
)
end
list = xpath(
*CSS.xpath_for("#{name}#{args.first}", prefix: XPATH_PREFIX, cache: false),
)
end

super if list.empty?
Expand Down
13 changes: 5 additions & 8 deletions lib/nokogiri/xml/searchable.rb
Original file line number Diff line number Diff line change
Expand Up @@ -243,16 +243,13 @@ def css_rules_to_xpath(rules, ns)
end

def xpath_query_from_css_rule(rule, ns)
visitor = Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::OPTIMAL,
doctype: document.xpath_doctype,
)
self.class::IMPLIED_XPATH_CONTEXTS.map do |implied_xpath_context|
CSS.xpath_for(rule.to_s, {
visitor = Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::OPTIMAL,
doctype: document.xpath_doctype,
prefix: implied_xpath_context,
ns: ns,
visitor: visitor,
})
)
CSS.xpath_for(rule.to_s, ns: ns, visitor: visitor)
end.join(" | ")
end

Expand Down
1 change: 1 addition & 0 deletions nokogiri.gemspec
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,7 @@ Gem::Specification.new do |spec|
"lib/nokogiri/css/parser.rb",
"lib/nokogiri/css/parser.y",
"lib/nokogiri/css/parser_extras.rb",
"lib/nokogiri/css/selector_cache.rb",
"lib/nokogiri/css/syntax_error.rb",
"lib/nokogiri/css/tokenizer.rb",
"lib/nokogiri/css/tokenizer.rex",
Expand Down
Loading