Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-entrancy and RefCells #1068

Closed
jneem opened this issue Jun 29, 2020 · 12 comments
Closed

Re-entrancy and RefCells #1068

jneem opened this issue Jun 29, 2020 · 12 comments
Labels
architecture changes the architecture, usually breaking discussion needs feedback and ideas

Comments

@jneem
Copy link
Collaborator

jneem commented Jun 29, 2020

druid-shell has an issue with re-entrancy, particularly with file dialogs. The background is probably familiar to many of you, but let me include it anyway.

The main entry point from druid-shell to druid is the WinHandler trait, which druid-shell defines and druid implements. In the other direction, druid-shell provides druid with a WindowHandle, which druid can use to interact with the shell in various ways. An important difference between WinHandler and WindowHandle is that WinHandler's methods take &mut self, while WindowHandle's take &self. In particular, WindowHandle is cloneable and shareable and generally uses interior mutability to get stuff done.

A typical call stack looks like:

  1. the system calls into druid-shell, maybe because it got a mouse event or something
  2. druid-shell calls into druid through some method on WinHandler, like maybe WinHandler::mouse_up. Note that this usually involves borrowing the WinHandler using RefCell::borrow_mut.
  3. druid does a bunch of complicated stuff, including maybe calling some methods in druid-shell like WindowHandle::invalidate or WindowHandle::request_timer

All of this is fine, because (for example) WindowHandle::request_timer does its interior mutation quickly and returns right away. But WindowHandle has two problematic functions, open_file_sync and save_as_sync. As the names suggest, these are blocking methods. In particular, they return control to the system immediately. So now the call stack can continue like:
3) druid does a bunch of complicated stuff, including calling WindowHandle::open_file_sync
4) the system calls into druid-shell, maybe because it got a timer event or something
5) druid-shell wants to call some method on WinHandler, but it can't because its reference to WinHandler has already been RefCell::borrow_muted.

The current behavior is to ignore any events that arrive while the WinHandler is borrowed, but this isn't great. This problem comes up on several of the backends. I think only x11 and web are immune, and only because they don't implement the problematic methods on WindowHandle yet.

I can think of two ways to solve this issue. One is to queue the events instead of dropping them, but I want to advocate for another solution here: removing all blocking methods from WindowHandle. I think that we can do this while preserving the illusion of synchronicity by adding a notion of an "active" callback (as opposed to an idle callback). An idle callback is scheduled at some unspecified future time, and so in particular there might be several more mouse events that get processed after the idle request gets made and before it executes. An active callback would be guaranteed to get run before druid-shell returns control to the system. In the example call-stack above, the call to WinHandler in part 2 will return, then druid-shell will drop any std::cell::RefMuts and std::cell::Refs that it's holding, and then will call any queued active callbacks. Because of the dropping, there won't be issues with re-entrancy; because druid-shell doesn't return control to the system, the file dialog will still appear to be displayed synchronously.

What do you think?

@jneem jneem added discussion needs feedback and ideas architecture changes the architecture, usually breaking labels Jun 29, 2020
@raphlinus
Copy link
Contributor

This has been discussed before, probably the best place is #95. And yes, I think we should migrate those blocking calls to deferred calls. That can be effectively synchronous wrt the OS runloop. In other words, if in our event processing handler we request a file dialog, then the file dialog can be opened before the wndproc actually returns (but of course after the refcell returns). It would be easier, of course, to stick those in an idle handler, so "some future time" really is loose.

Incidentally, we had similar issues around the quit handler, and I believe that is just done as an idle callback at the moment. (Can search up the issues and PR's but it shouldn't be hard to find them)

@rhzk
Copy link
Collaborator

rhzk commented Jun 30, 2020

Just to add information to this issue. I have created a new function set_size() inside WindowHandle, that can be called by for example:
ctx.window().set_size(Size::new(200.0,200.0));
This code generates a WM_SIZE message and I am assuming a context switch into:

impl WndProc for MyWndProc {
    #[allow(clippy::cognitive_complexity)]
    fn window_proc(
        &self,
        hwnd: HWND,
        msg: UINT,
        wparam: WPARAM,
        lparam: LPARAM,
    ) -> Option<LRESULT> {
        //println!("wndproc msg: {}", msg);
        match msg {
            .................................
            WM_SIZE => unsafe {
                if let Ok(mut s) = self.state.try_borrow_mut() {
                    ........................
                } else {
                    self.log_dropped_msg(hwnd, msg, wparam, lparam);
                }
                Some(0)
            },

That always hits the else statement, causing the message to be "dropped" instead of handled. I am assuming this is somehow caused by a borrow lock on WndState caused by WindowHandle, maybe because of a circular reference between those 3?

@raphlinus
Copy link
Contributor

Yes, setting window size is another example of this reentrancy problem. I think the best solution is to defer calling that function until the RefCell borrow is dropped, same as quit and file dialog. Another possibility to consider is queuing messages that are delivered while the RefCell is borrowed, and processing that queue after the RefCell is dropped (at the same time as invoking the deferred functions in plan A). If it was just window size, the latter might be preferable, but for file dialogs, plan A is decidedly better, because in that case the RefCell borrow will be held for a very long time.

@raphlinus
Copy link
Contributor

raphlinus commented Sep 1, 2020

#1037 is one of many that's running into reentrancy issues. I have some comments in that PR, but want to follow up to the more general question of reentrancy, which is tracked in this issue.

I suggested in that issue the basic strategy of putting deferred ops into a queue, and I see that's what #1037 does, but there's the question of when the queue will be run. That PR adds a queue that can only be run from the application mainloop, which is problematic for reasons I've described inline.

There might be a solid solution to this problem: run the queue synchronously after the borrow of the mutable state is dropped. I've been sketching such a thing out, and think it might look something like this:

    fn with_state<T>(&self, mut f: impl FnMut(&mut WndState) -> Option<T>) -> Option<T> {
        let result = if let Ok(mut s) = self.state.try_borrow_mut() {
            let s = s.as_mut().unwrap();
            f(s)
        } else {
            ...log dropped message (may benefit from having at least msg passed in as an arg)...
            return None;
        };
        ...run queue here...
    }

I think that also can cut down on some of the boilerplate in the message handler (I wrestled with this in the past and was not satisfied with my solution, but I think my previous attempts were also complicated by the desire to thread a mutable WinCtx down, which is now removed). With that change, the WM_SETFOCUS handler (just to pick a simple one) becomes:

            WM_SETFOCUS => {
                self.with_state(|s| {
                    s.handler.got_focus();
                    Some(0)
                })
            }

I think this approach might be best for running the file dialog as well. We've already discussed an async interface for that (having it be a blocking function is problematic), but I think it might be nicer to run it quickly, before the wndproc returns, than to idle-schedule it so that it may run at some time in the future - there's a bunch of stuff that can happen in the meantime, and I think it reduces the risk of races.

What do people think?

@jneem
Copy link
Collaborator Author

jneem commented Oct 2, 2020

I ended up writing something similar while trying out a GTK implementation of alerts. It seems to work pretty well, so I'm going to have a stab at doing it for file dialogs too.

@jneem
Copy link
Collaborator Author

jneem commented Oct 15, 2020

#1302 solves this for GTK. The other backends needing support are windows and mac (since x11 and web don't support file dialogs yet anyway). I can probably do windows in the near future, but someone else will need to do mac

  • support non-synchronous file dialogs in gtk
  • support non-synchronous file dialogs in windows
  • support non-synchronous file dialogs on mac
  • remove the sync/non-sync compatibility layer

@cmyr
Copy link
Member

cmyr commented Oct 16, 2020

Thanks for taking this up @jneem, much appreciated! I can do the mac impl in the next week or so.

cmyr added a commit that referenced this issue Oct 19, 2020
This implements the new save/open (#1068) API on macOS.

The implementation here is slightly different; Cocoa uses the idea
of 'blocks' (a c extension for callbacks) in most of their async API,
so we don't need to to manually defer the work.
cmyr added a commit that referenced this issue Oct 20, 2020
This implements the new save/open (#1068) API on macOS.

The implementation here is slightly different; Cocoa uses the idea
of 'blocks' (a c extension for callbacks) in most of their async API,
so we don't need to to manually defer the work.
@jneem
Copy link
Collaborator Author

jneem commented Oct 29, 2020

The open and save dialogs are done. The only other thing I'm aware of is the context menus, but they should be easier because it doesn't involve API changes to druid-shell.

  • defer windows context menus
  • defer gtk context menus
  • defer mac context menus

(the mac one might be ok already? I can't check...)

@cmyr
Copy link
Member

cmyr commented Nov 3, 2020

I think mac is fine.

@jneem
Copy link
Collaborator Author

jneem commented Nov 19, 2020

With #1359 in, I don't think there's anything left here.

@jneem jneem closed this as completed Nov 19, 2020
@cmyr
Copy link
Member

cmyr commented Nov 19, 2020

@jneem awesome, thanks for all your work on this!

@raphlinus
Copy link
Contributor

Yes, great stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architecture changes the architecture, usually breaking discussion needs feedback and ideas
Projects
None yet
Development

No branches or pull requests

4 participants