Skip to content

Task: avoid double handler installation #101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gasche
Copy link
Contributor

@gasche gasche commented Jan 5, 2023

Explanation of the suspected issue:

(This suspected issue was noticed by @clef-men; he convinced me that this looks credible and we wrote the proposed fix together.)

The Task implementation seems to be slightly wasteful in that, if we understand correctly, it may install its effect handler twice. This occurs in the following scenario:

  • call Task.run, under which you
  • create a promise with async
  • await this promise
  • when await gives control back, two step handlers are on the stack

Two handlers are on the stack because:

  • when the current task is paused by await, the continuation k contains the deep handler of the surrounding step call:

    let cont v (k, c) = Multi_channel.send c (Work (fun _ -> continue k v))

    invoking this Work function will thus reinstate the surrounding deep handler.

  • when this continuation is received by another worker, step is called again, installing a second handler

    let rec worker task_chan =
      match Multi_channel.recv task_chan with
      | Quit -> Multi_channel.clear_local_state task_chan
      | Work f -> step f (); worker task_chan

At this point, if we understand correctly, there are two step handlers on the call stack. Note that this does not grow to an unbounded number of nested handlers: on the next Wait effect, the inner handler either continues immediately (still 2 handlers) or pushes the current continuation to a Pending queue and returns, popping the two handlers. Once this continuation is invoked again under step, we are back to 2 handlers.

Note that in some case the current implementation does need the step call in worker, because the function argument of Work f does not systematically include a handler:

let async pool f =
  let pd = get_pool_data pool in
  let p = Atomic.make (Pending []) in
  Multi_channel.send pd.task_chan (Work (fun _ -> do_task f p));
  p

Explanation of our proposed fix:

This PR proposes to fix the issue by enforcing the invariant that the Work function always includes its own handler for Task effects: for the Work functions that use continue and discontinue there is nothing to do, for the Work function of async we call step at this point.

Another approach would be to use a shallow handler, and use step in the same way as currently (around each Work function), but this may be slightly slower -- we would be exactly encoding a deep handler using a shallow handler. (The current code reads like the authors had the shallow-handler semantics in mind.)

Co-authored-by: Clement Allain clement.allain@inria.fr

Explanation of the suspected issue:

(This suspected issue was noticed by @clef-men; he convinced me that
this looks credible and we wrote the proposed fix together.)

The Task implementation seems to be slightly wasteful in that, if we
understand correctly, it may install its effect handler twice. This
occurs in the following scenario:

- call Task.run, under which you
- create a promise with `async`
- `await` this promise
- when `await` gives control back, two `step` handlers are on the stack

Two handlers are on the stack because:

- when the current task is paused by `await`,
  the continuation `k` contains the deep handler of the surrounding
  `step` call:

   ```ocaml
   let cont v (k, c) = Multi_channel.send c (Work (fun _ -> continue k v))
   ```

  invoking this Work function will thus reinstate the surrounding deep handler.

- when this continuation is received by another worker, `step` is
  called again, installing a second handler

   ```ocaml
   let rec worker task_chan =
     match Multi_channel.recv task_chan with
     | Quit -> Multi_channel.clear_local_state task_chan
     | Work f -> step f (); worker task_chan
   ```

At this point, if we understand correctly, there are two `step`
handlers on the call stack. Note that this does not grow to an
unbounded number of nested handlers: on the next Wait effect, the
inner handler either continues immediately (still 2 handlers) or
pushes the current continuation to a Pending queue and returns,
popping the two handlers. Once this continuation is invoked again
under `step`, we are back to 2 handlers.

Note that in some case the current implementation does need the `step`
call in `worker`, because the function argument of `Work f` does not
systematically include a handler:

```ocaml
let async pool f =
  let pd = get_pool_data pool in
  let p = Atomic.make (Pending []) in
  Multi_channel.send pd.task_chan (Work (fun _ -> do_task f p));
  p
```

Explanation of our proposed fix:

This PR proposes to fix the issue by enforcing the invariant that the
Work function always includes its own handler for Task effects: for
the Work functions that use `continue` and `discontinue` there is
nothing to do, for the Work function of `async` we call `step` at this
point.

Another approach would be to use a shallow handler, and use `step` in
the same way as currently (around each Work function), but this may be
slightly slower -- we would be exactly encoding a deep handler using
a shallow handler. (The current code reads like the authors had the
shallow-handler semantics in mind.)

Co-authored-by: Clement Allain <clement.allain@inria.fr>
Copy link
Contributor

@Sudha247 Sudha247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @gasche @clef-men. The explanation makes sense to me.

I don't have a good mental model of how the handlers are implemented under the hood, I could confirm this by walking through it in gdb. (@kayceesrk or others would know about this)

@Sudha247 Sudha247 merged commit 5cfd180 into ocaml-multicore:master Jan 23, 2023
Sudha247 added a commit to Sudha247/opam-repository that referenced this pull request Jul 18, 2023
CHANGES:

* Add parallel_find (ocaml-multicore/domainslib#90, @gasche)
* Update CI (ocaml-multicore/domainslib#93, @Sudha247)
* Optimisation to work-stealing (ocaml-multicore/domainslib#96, @art-w)
* Improve docs presentation (ocaml-multicore/domainslib#99, @metanivek)
* Property based tests (ocaml-multicore/domainslib#100, jmid)
* Task: avoid double handler installation (ocaml-multicore/domainslib#101, @gasche & @clef-men)
* Fix a benign data-race in Chan reported by ocaml-tsan (ocaml-multicore/domainslib#103, @art-w)
* Dune, opam, and GitHub Actions fixes (ocaml-multicore/domainslib#105, @MisterDA)
* domain local await support (ocaml-multicore/domainslib#107, @polytypic)
* Windows run on GitHub Actions (ocaml-multicore/domainslib#110, @Sudha247)
* Adjust PBTs based on recommended_domain_count (ocaml-multicore/domainslib#112, @jmid)
* Test condition tweaks (ocaml-multicore/domainslib#113, @jmid)
nberth pushed a commit to nberth/opam-repository that referenced this pull request Jun 18, 2024
CHANGES:

* Add parallel_find (ocaml-multicore/domainslib#90, @gasche)
* Update CI (ocaml-multicore/domainslib#93, @Sudha247)
* Optimisation to work-stealing (ocaml-multicore/domainslib#96, @art-w)
* Improve docs presentation (ocaml-multicore/domainslib#99, @metanivek)
* Property based tests (ocaml-multicore/domainslib#100, jmid)
* Task: avoid double handler installation (ocaml-multicore/domainslib#101, @gasche & @clef-men)
* Fix a benign data-race in Chan reported by ocaml-tsan (ocaml-multicore/domainslib#103, @art-w)
* Dune, opam, and GitHub Actions fixes (ocaml-multicore/domainslib#105, @MisterDA)
* domain local await support (ocaml-multicore/domainslib#107, @polytypic)
* Windows run on GitHub Actions (ocaml-multicore/domainslib#110, @Sudha247)
* Adjust PBTs based on recommended_domain_count (ocaml-multicore/domainslib#112, @jmid)
* Test condition tweaks (ocaml-multicore/domainslib#113, @jmid)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants