Skip to content

Make URI friendly to Ractor #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/uri.rb
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
# class RSYNC < Generic
# DEFAULT_PORT = 873
# end
# @@schemes['RSYNC'] = RSYNC
# register_scheme('RSYNC', RSYNC)
# end
# #=> URI::RSYNC
#
Expand Down
25 changes: 22 additions & 3 deletions lib/uri/common.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ module URI
Parser = RFC2396_Parser
RFC3986_PARSER = RFC3986_Parser.new

Ractor.make_shareable(RFC3986_PARSER) if defined?(Ractor)

# URI::Parser.new
DEFAULT_PARSER = Parser.new
DEFAULT_PARSER.pattern.each_pair do |sym, str|
Expand All @@ -27,6 +29,7 @@ module URI
DEFAULT_PARSER.regexp.each_pair do |sym, str|
const_set(sym, str)
end
Ractor.make_shareable(DEFAULT_PARSER) if defined?(Ractor)

module Util # :nodoc:
def make_components_hash(klass, array_hash)
Expand Down Expand Up @@ -62,10 +65,26 @@ def make_components_hash(klass, array_hash)

include REGEXP

@@schemes = {}
SCHEMES = {}.freeze
private_constant :SCHEMES

# Returns a Hash of the defined schemes.
def self.scheme_list
@@schemes
SCHEMES
end

# Registers a new scheme when adding custom URIs.
# Example:
# module URI
# class RSYNC < Generic
# DEFAULT_PORT = 873
# end
# register_scheme('RSYNC', RSYNC)
# end
def self.register_scheme(name, mod)
updated = SCHEMES.merge(name.to_s => mod).freeze
remove_const(:SCHEMES)
const_set(:SCHEMES, updated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite hacky and is extremely inefficient as well: for every call to register_scheme you will typically invalidate all constant lookup caches under URI, and potentially cause deoptimizations for JITed code.
Also the fact it's exposed as a public API seems a huge footgun given this drawback.

There must be a better way to handle this.
If there isn't, then something should be improved in Ractor.

For instance maybe having SCHEMES initially mutable and at the end of lib/uri.rb make it shareable?
I guess that doesn't work for "Adding custom URIs" though.

I think using an instance variable on URI would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's obviously not thread-safe and could lead to errors like no constant URI::SCHEMES.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points. I agree that we might need to come up with something better.

To me this problem sounds generic to all Ruby apps: there's a boot phase when some things might get mutated, and there's a runtime phase where multiple threads/ractors will run where we want all global state to be frozen.

It would be ideal if we could somehow make that a pattern, and then we could freeze SCHEMES only once the code is fully loaded and everyone is done adding custom URIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for every call to register_scheme

which is going to be called a half dozen of times when loading the library...

the fact it's exposed as a public API

why is that an issue though? It looks to me as it is a feature that there's a API for this, although it need to be documented.

If it's not to be public, then it could be made private.

Also it's obviously not thread-safe

Right. That should be in the docs. I don't see this as an issue, do not change the configuration after your multi-thread/ractor app has started.

I think using an instance variable on URI would work.

It won't. You can't access instance variables from shareable objects (like Module or Ractor)

ruby -e 'String.singleton_class.attr_accessor :foo; String.foo = 42; Ractor.new { p String.foo }.take'
# => IsolationError

Copy link
Member

@marcandre marcandre Jan 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal if we could somehow make that a pattern

I'm planning on having a small ractor-config gem to do this but using the same trick of using const_set (with the same caveats, i.e. config only at load time coz it's not thread/ractor safe).

For anything that requires being mutable and shared between ractors, there is no other choice than to involve ractor communication. My gem ractor-server makes it easy, but it seems way overkill. A huge issue is also that all apps are currently single-ractor and that the first time another Ractor is started it changes things (I think it lowers performance too). So making a Ractor-compatible & safe config without starting a new Ractor is tricky and would require a dedicated Thread using Ractor.receive_if, but this also adds a constraint that the main Ractor can not ever do Ractor.receive (or else it could intercept communication for that config thread).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clear to everyone this is pretty ugly hack.
And IMHO it has no place in any stdlib or gem.
Rather, I think it shows a general problem of Ractor programming being too limiting, for no fundamental reason.
And I agree creating an extra Ractor for this is complete overkill (in terms of code, complexity, extra native thread, extra footprint).

@ko1 What do you think? This is a hack, yes?
There must be a better way.

What about allowing to read/write instance variables of modules, as long as the value is shareable?
Then it would be similar to constants, but it would actually make sense to change it multiple times.
And it would be thread-safe (no time where the ivar is not set except initially).

Another idea is we could have a hook just before the first non-main Ractor is started.
Then you could do something like Ractor.on_first_ractor { Ractor.make_shareable(URI::SCHEMES) }.
But I'm not sure it would work well, e.g., if some code creates a Ractor while code is still loading (and that code might be autoloaded so the order is not predictable, and even if loaded explicitly, gems are often loaded "in the middle of loading the app" which could still be problematic).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is going to be called a half dozen of times when loading the library...

Yes, maybe not a performance issue in this specific case, but the pattern clearly could be.
Do you know any other constant being redefined 6 times?
I think the more important point is redefining constants many times does not make sense conceptually.
It goes against the concept of constants itself.

Copy link
Member

@marcandre marcandre Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about allowing to read/write instance variables of modules, as long as the value is shareable

That would be really nice 👍 (at minimum reading)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcandre Could file a issue on Redmine to suggest supporting reading/writing @ivars of Module? I'll support it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! https://bugs.ruby-lang.org/issues/17592
I've asked for read-only, I'll let you argue for read-write if it feels best to you.

end

#
Expand All @@ -74,7 +93,7 @@ def self.scheme_list
#
def self.for(scheme, *arguments, default: Generic)
if scheme
uri_class = @@schemes[scheme.upcase] || default
uri_class = SCHEMES[scheme.upcase] || default
else
uri_class = default
end
Expand Down
2 changes: 1 addition & 1 deletion lib/uri/file.rb
Original file line number Diff line number Diff line change
Expand Up @@ -90,5 +90,5 @@ def set_password(v)
end
end

@@schemes['FILE'] = File
register_scheme('FILE', File)
end
2 changes: 1 addition & 1 deletion lib/uri/ftp.rb
Original file line number Diff line number Diff line change
Expand Up @@ -262,5 +262,5 @@ def to_s
return str
end
end
@@schemes['FTP'] = FTP
register_scheme('FTP', FTP)
end
2 changes: 1 addition & 1 deletion lib/uri/http.rb
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,6 @@ def request_uri
end
end

@@schemes['HTTP'] = HTTP
register_scheme('HTTP', HTTP)

end
2 changes: 1 addition & 1 deletion lib/uri/https.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,5 @@ class HTTPS < HTTP
# A Default port of 443 for URI::HTTPS
DEFAULT_PORT = 443
end
@@schemes['HTTPS'] = HTTPS
register_scheme('HTTPS', HTTPS)
end
2 changes: 1 addition & 1 deletion lib/uri/ldap.rb
Original file line number Diff line number Diff line change
Expand Up @@ -257,5 +257,5 @@ def hierarchical?
end
end

@@schemes['LDAP'] = LDAP
register_scheme('LDAP', LDAP)
end
2 changes: 1 addition & 1 deletion lib/uri/ldaps.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,5 +17,5 @@ class LDAPS < LDAP
# A Default port of 636 for URI::LDAPS
DEFAULT_PORT = 636
end
@@schemes['LDAPS'] = LDAPS
register_scheme('LDAPS', LDAPS)
end
2 changes: 1 addition & 1 deletion lib/uri/mailto.rb
Original file line number Diff line number Diff line change
Expand Up @@ -289,5 +289,5 @@ def to_mailtext
alias to_rfc822text to_mailtext
end

@@schemes['MAILTO'] = MailTo
register_scheme('MAILTO', MailTo)
end
2 changes: 1 addition & 1 deletion lib/uri/ws.rb
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,6 @@ def request_uri
end
end

@@schemes['WS'] = WS
register_scheme('WS', WS)

end
2 changes: 1 addition & 1 deletion lib/uri/wss.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,5 @@ class WSS < WS
# A Default port of 443 for URI::WSS
DEFAULT_PORT = 443
end
@@schemes['WSS'] = WSS
register_scheme('WSS', WSS)
end
19 changes: 19 additions & 0 deletions test/uri/test_common.rb
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,25 @@ def test_extract
end
end

def test_ractor
return unless defined?(Ractor)
r = Ractor.new { URI.parse("https://ruby-lang.org/").inspect }
assert_equal(URI.parse("https://ruby-lang.org/").inspect, r.take)
end

def test_register_scheme
assert_equal(["FILE", "FTP", "HTTP", "HTTPS", "LDAP", "LDAPS", "MAILTO", "WS"].sort, URI.scheme_list.keys.sort)

original = URI.scheme_list.dup
begin
URI.register_scheme('FOOBAR', Module.new)
assert_equal(["FILE", "FTP", "HTTP", "HTTPS", "LDAP", "LDAPS", "MAILTO", "WS", "FOOBAR"].sort, URI.scheme_list.keys.sort)
ensure
URI.send(:remove_const, :SCHEMES)
URI.send(:const_set, :SCHEMES, original.freeze)
end
end

def test_regexp
EnvUtil.suppress_warning do
assert_instance_of Regexp, URI.regexp
Expand Down