Skip to content

Make URI friendly to Ractor #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

kirs
Copy link

@kirs kirs commented Jan 6, 2021

This PR is an updated version of ruby/ruby#4007.

Right now, something as basic as parsing an URI cannot be done from a Ractor (I raised that earlier in https://bugs.ruby-lang.org/issues/17180):

r = Ractor.new do
  res = URI.parse("https://ruby-lang.org/")
  puts res.inspect
end

Ractor.select(r)

fails with:

#<Thread:0x00007fbdac057028 run> terminated with exception (report_on_exception is true):
/opt/rubies/3.0.0/lib/ruby/3.0.0/uri/common.rb:68:in `scheme_list': can not access class variables from non-main Ractors (Ractor::IsolationError)
	from notractor.rb:31:in `block in <main>'
<internal:ractor>:345:in `select': thrown by remote Ractor. (Ractor::RemoteError)
	from notractor.rb:37:in `<main>'
/opt/rubies/3.0.0/lib/ruby/3.0.0/uri/common.rb:68:in `scheme_list': can not access class variables from non-main Ractors (Ractor::IsolationError)
	from notractor.rb:31:in `block in <main>'

IMO, something as tiny and safe as URI::parse should be allowed to be called from a Ractor.

This PR changes the URI class to no longer use class instance variables, and makes it a frozen Hash that is safe to be used by multiple Ractors.

I'm open to other ideas how we can make it work.

@ko1 @marcandre @akr

Related issue on ruby-lang: https://bugs.ruby-lang.org/issues/17569

@kirs kirs force-pushed the ractor-friendly branch 2 times, most recently from 41ac183 to a57e1a5 Compare January 6, 2021 15:05
@kirs kirs force-pushed the ractor-friendly branch from a57e1a5 to 6bb0c09 Compare January 7, 2021 11:42
@kirs
Copy link
Author

kirs commented Jan 22, 2021

Ping @ko1 @marcandre @akr - I'd love to make URI working on Ractor and I'd appreciate your reviews. Thanks!

@marcandre
Copy link
Member

marcandre commented Jan 22, 2021

Looks good. Did you check the inspect methods? I don't understand why they are written this way, or why they are needed at all tbh, but I imagine that calling inspect from a Ractor would not work, right?

    @@to_s = Kernel.instance_method(:to_s)
    def inspect
      @@to_s.bind_call(self)
    end

@kirs
Copy link
Author

kirs commented Jan 22, 2021

URI#inspect does seem to work. I've added a test just in case.

@kirs
Copy link
Author

kirs commented Jan 22, 2021

Oh, I see that you're referring to URI::RFC2396_Parser and URI::RFC3986_Parser. They probably won't work, but I haven't seen anything calling inspect on a parser class in real life (e.g. in Sinatra stack)

@marcandre
Copy link
Member

Oh, I see that you're referring to URI::RFC2396_Parser and URI::RFC3986_Parser. They probably won't work, but I haven't seen anything calling inspect on a parser class in real life (e.g. in Sinatra stack)

It is always quite difficult to know what happens in real life; they would be best fixed, although it could be in a different PR if you don't want to do it.

def self.register_scheme(name, mod)
updated = SCHEMES.merge(name.to_s => mod).freeze
remove_const(:SCHEMES)
const_set(:SCHEMES, updated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite hacky and is extremely inefficient as well: for every call to register_scheme you will typically invalidate all constant lookup caches under URI, and potentially cause deoptimizations for JITed code.
Also the fact it's exposed as a public API seems a huge footgun given this drawback.

There must be a better way to handle this.
If there isn't, then something should be improved in Ractor.

For instance maybe having SCHEMES initially mutable and at the end of lib/uri.rb make it shareable?
I guess that doesn't work for "Adding custom URIs" though.

I think using an instance variable on URI would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's obviously not thread-safe and could lead to errors like no constant URI::SCHEMES.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points. I agree that we might need to come up with something better.

To me this problem sounds generic to all Ruby apps: there's a boot phase when some things might get mutated, and there's a runtime phase where multiple threads/ractors will run where we want all global state to be frozen.

It would be ideal if we could somehow make that a pattern, and then we could freeze SCHEMES only once the code is fully loaded and everyone is done adding custom URIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for every call to register_scheme

which is going to be called a half dozen of times when loading the library...

the fact it's exposed as a public API

why is that an issue though? It looks to me as it is a feature that there's a API for this, although it need to be documented.

If it's not to be public, then it could be made private.

Also it's obviously not thread-safe

Right. That should be in the docs. I don't see this as an issue, do not change the configuration after your multi-thread/ractor app has started.

I think using an instance variable on URI would work.

It won't. You can't access instance variables from shareable objects (like Module or Ractor)

ruby -e 'String.singleton_class.attr_accessor :foo; String.foo = 42; Ractor.new { p String.foo }.take'
# => IsolationError

Copy link
Member

@marcandre marcandre Jan 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal if we could somehow make that a pattern

I'm planning on having a small ractor-config gem to do this but using the same trick of using const_set (with the same caveats, i.e. config only at load time coz it's not thread/ractor safe).

For anything that requires being mutable and shared between ractors, there is no other choice than to involve ractor communication. My gem ractor-server makes it easy, but it seems way overkill. A huge issue is also that all apps are currently single-ractor and that the first time another Ractor is started it changes things (I think it lowers performance too). So making a Ractor-compatible & safe config without starting a new Ractor is tricky and would require a dedicated Thread using Ractor.receive_if, but this also adds a constraint that the main Ractor can not ever do Ractor.receive (or else it could intercept communication for that config thread).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clear to everyone this is pretty ugly hack.
And IMHO it has no place in any stdlib or gem.
Rather, I think it shows a general problem of Ractor programming being too limiting, for no fundamental reason.
And I agree creating an extra Ractor for this is complete overkill (in terms of code, complexity, extra native thread, extra footprint).

@ko1 What do you think? This is a hack, yes?
There must be a better way.

What about allowing to read/write instance variables of modules, as long as the value is shareable?
Then it would be similar to constants, but it would actually make sense to change it multiple times.
And it would be thread-safe (no time where the ivar is not set except initially).

Another idea is we could have a hook just before the first non-main Ractor is started.
Then you could do something like Ractor.on_first_ractor { Ractor.make_shareable(URI::SCHEMES) }.
But I'm not sure it would work well, e.g., if some code creates a Ractor while code is still loading (and that code might be autoloaded so the order is not predictable, and even if loaded explicitly, gems are often loaded "in the middle of loading the app" which could still be problematic).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is going to be called a half dozen of times when loading the library...

Yes, maybe not a performance issue in this specific case, but the pattern clearly could be.
Do you know any other constant being redefined 6 times?
I think the more important point is redefining constants many times does not make sense conceptually.
It goes against the concept of constants itself.

Copy link
Member

@marcandre marcandre Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about allowing to read/write instance variables of modules, as long as the value is shareable

That would be really nice 👍 (at minimum reading)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcandre Could file a issue on Redmine to suggest supporting reading/writing @ivars of Module? I'll support it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! https://bugs.ruby-lang.org/issues/17592
I've asked for read-only, I'll let you argue for read-write if it feels best to you.

@hsbt
Copy link
Member

hsbt commented Jun 25, 2021

I've merged #22 . If you have any concerns about this, Can you file a new issue or pull-request?

Thanks.

@eregon
Copy link
Member

eregon commented Jun 25, 2021

I found a clean solution to support Ractor in #26 by using a module to map scheme names to scheme classes, which also works with Ractor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants