
Support for user-level flat combining #1612

Closed · catern opened this issue Jun 12, 2020 · 8 comments

@catern (Contributor) commented Jun 12, 2020

Flat combining is a name coined by the paper "Flat Combining and the Synchronization-Parallelism Tradeoff" for a synchronization technique in which a single thread dynamically takes on the role of handling requests from other threads to perform mutations on a data structure. This can improve performance, and it simplifies the implementation of the data structure, since no fine-grained concurrency is needed.

In trio, threads (tasks) are cheap, so this might seem pointless: just spawn a dedicated task to service requests to mutate the data structure. But spawning a task requires a nursery for it, which complicates object management. It's much simpler for a concurrently accessed object to just pick one of the tasks accessing it to perform mutations, as in flat combining.

I've implemented flat combining manually many times in libraries using trio. It would be nice if trio had a built-in way to do this. It might also combine nicely with support for kill-safe synchronization abstractions, which I've likewise implemented in trio, painfully and manually.
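
Concretely, the manual pattern tends to look something like this minimal sketch (a made-up example, not code from any of my libraries): whichever task arrives while no combiner is active services everyone's queued requests before returning.

```python
import trio
from collections import deque

class FlatCombiningCounter:
    """Minimal sketch: a counter where whichever task shows up while no
    combiner is active becomes the combiner and drains the request queue.
    Hypothetical example, not code from any real library."""

    def __init__(self) -> None:
        self._pending: deque = deque()   # queued (amount, event, result-box)
        self._combining = False
        self._value = 0

    async def add(self, amount: int) -> int:
        done = trio.Event()
        box: list = []
        self._pending.append((amount, done, box))
        if self._combining:
            # Someone else is combining; wait for them to serve us.
            await done.wait()
        else:
            # We become the combiner and drain the queue, including
            # requests that arrive while we're running.
            self._combining = True
            try:
                while self._pending:
                    amt, evt, out = self._pending.popleft()
                    self._value += amt          # the actual mutation
                    out.append(self._value)
                    evt.set()
                    # Checkpoint so other tasks can enqueue requests.
                    # NB: if we get cancelled here, requests still queued
                    # are stranded until some later caller combines; that's
                    # the kill-safety problem mentioned above.
                    await trio.sleep(0)
            finally:
                self._combining = False
        return box[0]
```

The appeal is that every caller of `add()` is a potential combiner, so no dedicated background task (and hence no nursery) is needed.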

@njsmith (Member) commented Jun 12, 2020

Can you post an example of what your manual version looks like, versus what it might look like with library support?

@catern (Contributor, Author) commented Jun 12, 2020

Sure. One example of usage is https://github.com/catern/rsyscall/blob/master/python/rsyscall/epoller.py#L160, which uses https://github.com/catern/rsyscall/blob/master/python/rsyscall/concurrency.py#L7, whose comment briefly says:

    A better API would be a "shared coroutine" which runs whenever any
    other coroutine is waiting on it, and is suspended if no other
    coroutine is waiting on it. A shared coroutine also must not
    require entering a contextmanager to create. We should try to get
    that kind of API merged into trio/asyncio.

I don't have a concrete suggestion for how to fit that into the trio API.
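
For illustration only, the semantics can be roughly emulated today with a background task gated on a waiter count (all names here are made up, and it's approximate): the loop only makes progress while at least one task is blocked in `wait()`. Note that it needs a nursery to start, which is exactly the overhead I'd like to avoid.

```python
import trio

class SharedService:
    """Sketch emulating the quoted semantics: `step` makes progress only
    while at least one task is blocked in wait(). Hypothetical names;
    best-effort (a departing waiter can let one extra step through)."""

    def __init__(self, step):
        self._step = step                    # async callable: one unit of work
        self._waiters = 0
        self._has_waiters = trio.Event()
        self._result_ready = trio.Event()
        self._result = None

    async def run(self):                     # spawn with nursery.start_soon
        while True:
            await self._has_waiters.wait()   # suspended while nobody waits
            if self._waiters == 0:           # waiter left at the last moment;
                continue                     # re-check the fresh event
            self._result = await self._step()
            old, self._result_ready = self._result_ready, trio.Event()
            old.set()                        # wake everyone waiting this round

    async def wait(self):
        evt = self._result_ready
        self._waiters += 1
        if self._waiters == 1:
            self._has_waiters.set()
        try:
            await evt.wait()
            return self._result
        finally:
            self._waiters -= 1
            if self._waiters == 0:
                self._has_waiters = trio.Event()   # back to suspended
```

With first-class support, `run()` wouldn't need to be spawned anywhere; the first `wait()` would drive the loop.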

@njsmith (Member) commented Jun 15, 2020

Ah, in that case I think this is a duplicate of #266

@oremanj (Member) commented Jun 26, 2020

Closing in favor of #266.

oremanj closed this as completed Jun 26, 2020
@catern (Contributor, Author) commented Jun 26, 2020

Er, I didn't comment before, but this is not a duplicate. I said one possible way to implement it would be #266, but that's just one implementation strategy. There are other ways this could be done; it's a completely independent feature.

@njsmith (Member) commented Jun 26, 2020

@catern In that case, can you give more details about how this is different?

@catern (Contributor, Author) commented Jun 26, 2020

Well, the paper coining the term (mentioned in my original post) describes an implementation based on a lock and a list of requesters:

    1.1 Flat Combining in a nutshell

    In its simplest form, the idea behind flat combining is to have a given
    sequential data structure D be protected by a lock and have an associated
    dynamic publication list of a size proportional to the number of threads
    that are concurrently accessing it (see Figure 1). Each thread accessing D
    for the first time adds a thread-local publication record to the list, and
    publishes all its successive access/modification requests using a simple
    write to the request field of its publication record (there is no need for
    a store-load memory barrier). In each access, after writing its request,
    it checks if the shared lock is free, and if so attempts to acquire it
    using a compare-and-set (CAS) operation. A thread that successfully
    acquires the lock becomes a combiner: it scans the list, collects pending
    requests, applies the combined requests to D, writes the results back to
    the threads’ request fields in the associated publication records, and
    releases the lock. A thread that detects that some other thread already
    owns the lock spins on its record, waiting for the owner to return a
    response in the request field, at which point it knows the published
    request has been applied to D. Once in a while, a combining thread will
    perform a cleanup operation on the publication list. During this cleanup
    it will remove records not recently used (we will later explain in more
    detail the mechanism for determining use times), so as to shorten the
    length of the combining traversals. Thus, in each repeated access request,
    if a thread has an active publication record, it will use it, and if not,
    it will create a new record and insert it into the list.

That's quite a different implementation, and it could also be made easier by support in trio.
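
For comparison, here's a rough Python rendering of that scheme using OS threads (names made up; it leans on the GIL in place of the paper's CAS and memory barriers, and omits the publication-list cleanup):

```python
import threading

class FlatCombined:
    """Rough sketch of the paper's lock-plus-publication-list scheme.
    Relies on the GIL instead of real atomics, and omits the record-aging
    cleanup the paper describes."""

    def __init__(self, structure):
        self._d = structure                   # the sequential structure D
        self._lock = threading.Lock()         # the combiner lock
        self._records = []                    # the publication list
        self._local = threading.local()

    def _my_record(self):
        rec = getattr(self._local, "rec", None)
        if rec is None:
            rec = {"request": None, "response": None,
                   "done": threading.Event()}
            self._local.rec = rec
            self._records.append(rec)         # atomic enough under the GIL
        return rec

    def apply(self, op):
        """Publish `op` (a callable taking D) and wait for its result."""
        rec = self._my_record()
        rec["done"].clear()
        rec["request"] = op                   # publish the request
        while True:
            if self._lock.acquire(blocking=False):   # stands in for CAS
                try:
                    # We are the combiner: service every pending record,
                    # including our own.
                    for r in self._records:
                        req = r["request"]
                        if req is not None:
                            r["response"] = req(self._d)
                            r["request"] = None
                            r["done"].set()
                finally:
                    self._lock.release()
                return rec["response"]
            # Someone else holds the lock: wait briefly on our record,
            # then retry in case the combiner missed our request.
            if rec["done"].wait(timeout=0.001):
                return rec["response"]
```

Usage would be something like `fc = FlatCombined([])` and then `fc.apply(lambda d: d.append(1))` from any thread.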

@njsmith (Member) commented Jun 26, 2020

Yeah, I clicked through to the paper :-). My trouble is that they're talking about a very different scenario: implementing low-level thread-safe data structures with minimal locking overhead. In Trio that part is trivial, because everything runs single-threaded. OTOH, we have to deal with cancellation, which they don't. So it seems like our challenges are totally different, and I'm not quite seeing how to translate their ideas into something that makes sense in the Trio context.
