@polytypic polytypic commented Jan 31, 2023

This PR proposes a complete redesign of the API that divides it into four modules: Backoff, Loc, Op, and Tx. The diffs show how to change client code to use the new API.

+module Loc = Kcas.Loc
+module Op = Kcas.Op
+module Tx = Kcas.Tx

As proposed in this PR, Loc is a module whose signature is essentially compatible with the signature of the stdlib Atomic module. The main difference is that several operations take an optional backoff (and that there is a Backoff module). Client code wishing to perform multiple compare_and_set operations atomically could in principle just switch to using Loc instead of Atomic.

-Atomic
+Loc
-Kcas.cas
+Loc.compare_and_set

The constructor function ref is also renamed to make.

-Kcas.ref initial
+Loc.make initial

The name ref, while not a keyword in OCaml, has a long history and it is probably better to avoid it. The term "location" is used e.g. in the papers on the OCaml memory model.

The previous API contained two related operations named try_map and map. I believe this is a mistake in a couple of ways. First of all, the mnemonic map is generally used for functional updates while kcas is fundamentally imperative. Second, the map operations had arguably cumbersome signatures and the signature of map was also imprecise. This PR proposes a single simpler update operation that also requires fewer allocations. Note how in this PR several other operations are implemented in terms of update.

-match Kcas.map r (fun c -> if condition c then None else Some (next c)) with
-| Success c -> (* success *)
-| Failed | Aborted -> (* aborted *)
+match Loc.update r (fun c -> if condition c then raise Exit else next c) with
+| c -> (* success *)
+| exception Exit -> (* aborted *)

-match Kcas.try_map r (fun c -> if condition c then None else Some (next c)) with
-| Success c -> (* success *)
-| Failed -> (* failed *)
-| Aborted -> (* aborted *)
+let c = Loc.get r in
+if condition c then (* aborted *)
+else if Loc.compare_and_set r c (next c) then (* success *)
+else (* failure *)
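
As a rough illustration of why a single update operation is enough, here is a minimal sketch of such a loop written against the stdlib Atomic rather than Loc. It is not the code in this PR: there is no backoff, and returning the replaced value is an assumption based on the diff above.

```ocaml
(* Sketch only: an update loop over the stdlib Atomic, not the kcas code.
   Assumption: it returns the value that was replaced. *)
let rec update (r : 'a Atomic.t) (f : 'a -> 'a) : 'a =
  let before = Atomic.get r in
  (* [f] may raise (e.g. [Exit]) to abort; the exception simply propagates
     and the location is left untouched. *)
  let after = f before in
  if Atomic.compare_and_set r before after then before
  else update r f (* the location changed in between: retry with the new value *)
```

Aborting by raising an exception means no option value has to be allocated on the successful path.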

The previous API had a W1 module. That has been removed, because it didn't provide anything substantial beyond what the stdlib Atomic already provides.

-Kcas.W1
+Atomic

Actually the W1 module did provide map and try_map. It would probably make sense to move the Backoff module to the stdlib and also to add an Atomic.update function. I leave that to future work.

This PR also includes a redesign of the Backoff module to use an internal representation as a single immutable int and a once operation of type t -> t. This way Backoff does not require memory allocations during a backoff loop and the possibility of false sharing is also reduced. The default lower and upper backoff values are chosen to approximate what is currently in the lockfree library Backoff module.
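
To make the packed representation concrete, here is a small self-contained sketch of a backoff stored in one immutable int. The field width (bits), the value of max_wait_log, the accessor names, and the spin loop are assumptions chosen for illustration. Only the defaults 4 and 17 and the overall packing shape follow the create function in this PR.

```ocaml
(* Hypothetical sketch: three small fields packed into one immutable int.
   Layout (assumed): [ upper | lower | current ], [bits] bits each. *)
let bits = 5
let mask = (1 lsl bits) - 1
let max_wait_log = 30

type t = int

let create ?(lower_wait_log = 4) ?(upper_wait_log = 17) () : t =
  assert (
    0 <= lower_wait_log
    && lower_wait_log <= upper_wait_log
    && upper_wait_log <= max_wait_log);
  (upper_wait_log lsl (bits * 2))
  lor (lower_wait_log lsl bits) lor lower_wait_log

let get_upper (t : t) = t lsr (bits * 2)
let get_lower (t : t) = (t lsr bits) land mask
let get_wait_log (t : t) = t land mask

(* [once t] spins for a random number of iterations bounded by the current
   wait log and returns the next backoff value; nothing is allocated. *)
let once (t : t) : t =
  let wait_log = get_wait_log t in
  let spins = 1 + (Random.bits () land ((1 lsl wait_log) - 1)) in
  for _i = 1 to spins do
    (* Stand-in for a CPU relax hint. *)
    ignore (Sys.opaque_identity spins)
  done;
  (* The current field is the lowest one, so [t + 1] bumps it without
     touching the other fields as long as it stays below [upper]. *)
  if wait_log < get_upper t then t + 1 else t

(* [reset t] restores the current wait log to the lower bound. *)
let reset (t : t) : t = (t land (lnot mask)) lor get_lower t
```

Because t is an unboxed int, a retry loop can thread the value returned by once through its recursion without writing to any shared memory.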

The atypically named kCAS operation has been renamed to atomically and commit is now called atomic. They are now inside the Op module.

-Kcas.kCAS
+Op.atomically
-Kcas.commit
+Op.atomic
-Kcas.mk_cas
+Op.mk_cas
-Kcas.is_on_ref
+Op.is_on_loc
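
For readers new to the Op style, the following is a single-threaded model of what atomically performing a list of compare-and-set operations means. It uses plain refs and offers no lock-freedom or thread safety; the names mirror mk_cas and atomically, but the bool result and everything else here is an assumption for illustration, not the library's implementation.

```ocaml
(* Sequential model of k-CAS semantics; NOT thread safe and NOT the kcas
   algorithm. A CAS descriptor hides the element type of its location. *)
type cas = CAS : 'a ref * 'a * 'a -> cas

let mk_cas loc expected desired = CAS (loc, expected, desired)

(* All comparisons must succeed before any write is performed. *)
let atomically (ops : cas list) : bool =
  let ok =
    List.for_all (function CAS (loc, expected, _) -> !loc == expected) ops
  in
  if ok then List.iter (function CAS (loc, _, desired) -> loc := desired) ops;
  ok
```

If any location does not hold its expected value, no location is modified and the result is false.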

This PR also adds an API as the Tx module for performing transactions on shared memory locations. One may consider the transaction API as a higher-level alternative to constructing a list of operations as with the Op module. Transactions can be composed sequentially and conditionally. This PR includes two different transactional queues and a stack and demonstrates that one can transfer elements between different data structures atomically.

Here is an example of committing a transaction that swaps the values of the two shared memory references x_loc and y_loc and returns their sum:

Tx.(
  commit begin
    let* x = get x_loc
    and* y = get y_loc in
    let+ () = set y_loc x
    and+ () = set x_loc y in
    x + y
  end
)
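
The binding operators in the example are ordinary OCaml functions. The toy module below, a sketch with no transaction log and no atomicity, defines just enough of them over plain refs to run the same swap. Toy_tx and its representation are hypothetical and say nothing about how Tx is implemented in this PR.

```ocaml
(* Toy stand-in for Tx: a "transaction" is just a deferred computation. *)
module Toy_tx = struct
  type 'a t = unit -> 'a

  let ( let* ) (tx : 'a t) (f : 'a -> 'b t) : 'b t = fun () -> f (tx ()) ()

  let ( and* ) (a : 'a t) (b : 'b t) : ('a * 'b) t =
   fun () ->
    let x = a () in
    let y = b () in
    (x, y)

  let ( let+ ) (tx : 'a t) (f : 'a -> 'b) : 'b t = fun () -> f (tx ())
  let ( and+ ) = ( and* )
  let get (r : 'a ref) : 'a t = fun () -> !r
  let set (r : 'a ref) (v : 'a) : unit t = fun () -> r := v
  let commit (tx : 'a t) : 'a = tx ()
end

let x_loc = ref 1
let y_loc = ref 2

(* Same shape as the example above: swap the two values, return their sum. *)
let sum =
  Toy_tx.(
    commit begin
      let* x = get x_loc
      and* y = get y_loc in
      let+ () = set y_loc x
      and+ () = set x_loc y in
      x + y
    end)
```

Here let* sequences transactions, let+ maps a pure function over a result, and the and forms pair up transactions that do not depend on each other's results.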

One potentially interesting avenue for further work would be to extend the algorithms and this library to support efficient compare-only operations. One can express a kind of compare operation as mk_cas loc expected expected. While that works, it writes to the location, potentially causing contention and resulting in poor performance. Efficient read-only transactions, for example, could be a useful addition to kcas. As mentioned in the paper "Nonblocking k-compare-single-swap" by Luchangco, Moir, and Shavit, an operation with multiple compares and a single swap also has uses. I'll leave this to further work.

@polytypic polytypic changed the title from "Refine the API of the library" to "Complete redesign of the kcas library API" Jan 31, 2023
@polytypic polytypic linked an issue Jan 31, 2023 that may be closed by this pull request
@polytypic polytypic force-pushed the refine-api branch 3 times, most recently from e82c756 to 8352001 Compare January 31, 2023 19:35
@polytypic polytypic marked this pull request as ready for review January 31, 2023 19:40
@polytypic polytypic force-pushed the refine-api branch 5 times, most recently from 1602567 to 5f785bd Compare January 31, 2023 20:31
@polytypic polytypic force-pushed the refine-api branch 17 times, most recently from d016e86 to 822abf3 Compare February 1, 2023 16:50
@polytypic polytypic force-pushed the refine-api branch 7 times, most recently from 34ddbe7 to 6150f30 Compare February 6, 2023 07:41
polytypic commented Feb 6, 2023

The Tx mechanism is fundamentally limited as it does not support non-busy wait or blocking. Unfortunately, adding full support for blocking seems like it would be outside the scope of the underlying algorithms as it would add significant dependencies and/or significant overheads.

However, there might be practical ways to extend the kcas API to allow it to support low overhead blocking transactions on top of the underlying transaction log mechanism.

To support blocking, one essentially needs a way to signal waiters. After mutating some locations, the mutator signals waiters. For a scalable mechanism, that signal needs to be selective and only wake up those waiters that are interested in the mutated locations.

To associate waiters with locations in a truly low-overhead fashion, one possibility would be to allow locations to be "tagged":

module Loc : sig
  type ('tag, 'a) tagged

  val make_tagged: 'tag -> 'a -> ('tag, 'a) tagged
  val get_tag : ('tag, 'a) tagged -> 'tag

  type 'a t = (unit, 'a) tagged
  
  (* ... *)

In a blocking transaction mechanism that 'tag could be a bag of the waiters of changes to the location.

Additionally, a scalable blocking mechanism also needs to be able to efficiently figure out which locations have been read and which have been written. A waiter needs to add itself to the read locations and a mutator needs to signal waiters of written locations.
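
A minimal sketch of the tagging idea, assuming a location is simply a stdlib Atomic paired with a tag and that the tag is an atomic list of wake-up callbacks. The Tagged module, the waiter type, add_waiter, and signal_waiters are hypothetical names used only to illustrate selective signaling.

```ocaml
(* Hypothetical tagged location: a tag stored next to an atomic value. *)
module Tagged = struct
  type ('tag, 'a) tagged = { tag : 'tag; state : 'a Atomic.t }

  let make_tagged tag value = { tag; state = Atomic.make value }
  let get_tag t = t.tag

  (* An untagged location is a location tagged with unit. *)
  type 'a t = (unit, 'a) tagged

  let make value : 'a t = make_tagged () value
  let get t = Atomic.get t.state
end

(* Assumption: a waiter is a callback that wakes up a blocked party. *)
type waiter = unit -> unit

(* Add a waiter to the bag stored in the tag of the location. *)
let add_waiter (loc : (waiter list Atomic.t, 'a) Tagged.tagged) (w : waiter) =
  let bag = Tagged.get_tag loc in
  let rec loop () =
    let ws = Atomic.get bag in
    if not (Atomic.compare_and_set bag ws (w :: ws)) then loop ()
  in
  loop ()

(* Take all waiters of this one location and wake them up. *)
let signal_waiters (loc : (waiter list Atomic.t, 'a) Tagged.tagged) =
  List.iter (fun w -> w ()) (Atomic.exchange (Tagged.get_tag loc) [])
```

Only the waiters registered on the signaled location are woken, which is the selectivity described above. In this sketch the bag is updated separately from the location's value, so it does not provide the atomicity that the transaction-based proposal is after.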

module Tx : sig
  (* ... *)

  module Log : sig
    type t

    type 'r reducer = {
      one : 't 'a. ('t, 'a) Loc.tagged -> 'r;
      zero : 'r;
      plus : 'r -> 'r -> 'r;
    }

    val reduce : 'r reducer -> t -> 'r
    (** [reduce reducer] performs a fold over the transaction log. *)
  end

  exception Retry of unit t
  (** Exception raised by {!reset_and_retry}. *)

  val reset_and_retry : (Log.t -> unit t) -> 'a t
  (** [reset_and_retry on_read] returns a transaction that resets the current
      transaction such that it only reads from the accessed locations.  The
      [on_read] function is then called with the internal transaction log to
      construct a transaction that is then composed after the current
      transaction.  The composed transaction [tx] is then raised as a
      [Retry tx] exception. *)

  val written: (Log.t -> unit t) -> 'a t -> 'a t
  (** [written on_written tx] returns a transaction that executes as [tx] and
      then calls the given function with a view of the internal transaction log
      restricted to the written locations.  The returned transaction is then
      composed after the transaction [tx].

      The intended use case for [written] is to extend a transaction to signal
      waiters in a blocking transaction mechanism:

      {[
        let rec blocking_tx tx =
          let all_waiters = Loc.make [] in
          match
            tx
            |> written (fun log ->
                 (* remove all waiters of all written locations
                    and add them to the [all_waiters] list. *)
               )
            |> attempt
          with
          | result ->
            (* signal [all_waiters] *)
            result
          | exception Exit ->
            blocking_tx tx
          | exception Retry add_waiters_tx -> (
            match attempt add_waiters_tx with
            | () ->
              (* blocking wait *)
              blocking_tx tx
            | exception Exit ->
              (* Locations were already mutated before waiters could be added *)
              blocking_tx tx)
      ]} *)

The idea of (resetting and) extending transactions with the waiter operations is that the kcas mechanism itself then checks whether the waiters should be added or signaled. Waiters are added only if the read locations did not change during the original transaction and the addition of the waiters; if either fails, the transaction can simply be retried without blocking. Waiters are signaled only once the mutations, including taking all the waiters, have completed successfully.

The above is only a preliminary idea. I have not yet fully implemented the above to verify it in practice.
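
To show how a reducer record with a polymorphic one field can fold over locations of different types, here is a toy sketch in which the log is modeled as a plain list of existentially packed locations. The entry type, the list representation, and the count reducer are assumptions; the actual transaction log is internal and is not shown in this discussion.

```ocaml
(* Toy model of a log of tagged locations with heterogeneous types. *)
type ('tag, 'a) tagged = { tag : 'tag; value : 'a }

(* An entry hides the tag and value types of the location it holds. *)
type entry = Entry : ('t, 'a) tagged -> entry
type log = entry list

type 'r reducer = {
  one : 't 'a. ('t, 'a) tagged -> 'r;  (* must work for every location type *)
  zero : 'r;
  plus : 'r -> 'r -> 'r;
}

let reduce (r : 'r reducer) (log : log) : 'r =
  List.fold_left
    (fun acc entry -> match entry with Entry loc -> r.plus acc (r.one loc))
    r.zero log

(* Example reducer: count the locations in the log. *)
let count : int reducer = { one = (fun _ -> 1); zero = 0; plus = ( + ) }
```

The zero and plus fields form a monoid, so the same interface could in principle be folded in any grouping over a log that is not a flat list.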

Here is the blocking_tx example with proper highlighting:

let rec blocking_tx tx =
  let all_waiters = Loc.make [] in
  match
    tx
    |> written (fun log ->
         (* remove all waiters of all written locations
            and add them to the [all_waiters] list. *)
       )
    |> attempt
  with
  | result ->
    (* signal [all_waiters] *)
    result
  | exception Exit ->
    blocking_tx tx
  | exception Retry add_waiters_tx -> (
    match attempt add_waiters_tx with
    | () ->
      (* blocking wait *)
      blocking_tx tx
    | exception Exit ->
      (* Locations were already mutated before waiters could be added *)
      blocking_tx tx)

Of course, a proper implementation would be a bit more complicated with things like backoff.

@bartoszmodelski bartoszmodelski left a comment

Thanks for the PR. The new interface is a lot cleaner. I left a few comments with questions and minor suggestions.

I experimented a bit with transactions and they look solid to me.

Comment on lines +26 to +32
let create ?(lower_wait_log = 4) ?(upper_wait_log = 17) () =
  assert (
    0 <= lower_wait_log
    && lower_wait_log <= upper_wait_log
    && upper_wait_log <= max_wait_log);
  (upper_wait_log lsl (bits * 2))
  lor (lower_wait_log lsl bits) lor lower_wait_log
Contributor

AFAICT there's no way in OCaml to pack this efficiently into a record or tuple of bytes, which justifies this representation. Do you think it'd be useful to have a separate lib for packing multiple shorts into an int?

executed by {!once}. *)

val default : t
(** [default] is equivalent to [create ()]. *)
Contributor

Is this exposed for performance reasons?

@polytypic
Collaborator Author

BTW, one thing that I'd love to see is documentation from different library versions simultaneously available in the gh-pages. I worked on such a thing for one of my own projects the other weekend and I could prepare some scripts for kcas to do the same.

@bartoszmodelski
Contributor

This sounds good for API reference (with all other info being in the readme, eio-style). Do you have it up for the other project already?

@polytypic polytypic force-pushed the refine-api branch 5 times, most recently from 344c2e4 to 903ed8e Compare February 9, 2023 10:09
@polytypic
Collaborator Author

> This sounds good for API reference (with all other info being in the readme, eio-style). Do you have it up for the other project already?

Yes and no. I was planning to use it e.g. in par-ml and idle-domains, because they have multiple branches with different approaches. I developed the scripts to do it, but I have yet to actually use it on those projects.

But the basic idea is very simple. The script first clones the gh-pages branch and clears it completely (locally). It then iterates through all the git (branches or) tags, runs the commands to generate documentation, and if generation was successful, copies the generated documentation to the gh-pages clone under the name of the (branch or) tag. So, you'll have a directory per doc version. To link to the documentation you just add the (branch or) tag name to the URL path:

-https://ocaml-multicore.github.io/kcas/doc/
+https://ocaml-multicore.github.io/kcas/main/

@bartoszmodelski bartoszmodelski left a comment

LGTM

@polytypic polytypic force-pushed the refine-api branch 2 times, most recently from df244b3 to 1b782f8 Compare February 9, 2023 13:12
Development

Successfully merging this pull request may close these issues.

First-impression notes on the API
