Task: avoid double handler installation #101

gasche · 2023-01-05T16:48:22Z

Explanation of the suspected issue:

(This suspected issue was noticed by @clef-men; he convinced me that this looks credible and we wrote the proposed fix together.)

The Task implementation seems to be slightly wasteful in that, if we understand correctly, it may install its effect handler twice. This occurs in the following scenario:

call Task.run, under which you
create a promise with async
await this promise
when await gives control back, two step handlers are on the stack

Two handlers are on the stack because:

when the current task is paused by await, the continuation k contains the deep handler of the surrounding step call:
```
let cont v (k, c) = Multi_channel.send c (Work (fun _ -> continue k v))
```
invoking this Work function will thus reinstate the surrounding deep handler.

when this continuation is received by another worker, step is called again, installing a second handler

let rec worker task_chan =
  match Multi_channel.recv task_chan with
  | Quit -> Multi_channel.clear_local_state task_chan
  | Work f -> step f (); worker task_chan

At this point, if we understand correctly, there are two step handlers on the call stack. Note that this does not grow to an unbounded number of nested handlers: on the next Wait effect, the inner handler either continues immediately (still 2 handlers) or pushes the current continuation to a Pending queue and returns, popping the two handlers. Once this continuation is invoked again under step, we are back to 2 handlers.

Note that in some case the current implementation does need the step call in worker, because the function argument of Work f does not systematically include a handler:

let async pool f =
  let pd = get_pool_data pool in
  let p = Atomic.make (Pending []) in
  Multi_channel.send pd.task_chan (Work (fun _ -> do_task f p));
  p

Explanation of our proposed fix:

This PR proposes to fix the issue by enforcing the invariant that the Work function always includes its own handler for Task effects: for the Work functions that use continue and discontinue there is nothing to do, for the Work function of async we call step at this point.

Another approach would be to use a shallow handler, and use step in the same way as currently (around each Work function), but this may be slightly slower -- we would be exactly encoding a deep handler using a shallow handler. (The current code reads like the authors had the shallow-handler semantics in mind.)

Co-authored-by: Clement Allain clement.allain@inria.fr

@clef-men

Explanation of the suspected issue: (This suspected issue was noticed by @clef-men; he convinced me that this looks credible and we wrote the proposed fix together.) The Task implementation seems to be slightly wasteful in that, if we understand correctly, it may install its effect handler twice. This occurs in the following scenario: - call Task.run, under which you - create a promise with `async` - `await` this promise - when `await` gives control back, two `step` handlers are on the stack Two handlers are on the stack because: - when the current task is paused by `await`, the continuation `k` contains the deep handler of the surrounding `step` call: ```ocaml let cont v (k, c) = Multi_channel.send c (Work (fun _ -> continue k v)) ``` invoking this Work function will thus reinstate the surrounding deep handler. - when this continuation is received by another worker, `step` is called again, installing a second handler ```ocaml let rec worker task_chan = match Multi_channel.recv task_chan with | Quit -> Multi_channel.clear_local_state task_chan | Work f -> step f (); worker task_chan ``` At this point, if we understand correctly, there are two `step` handlers on the call stack. Note that this does not grow to an unbounded number of nested handlers: on the next Wait effect, the inner handler either continues immediately (still 2 handlers) or pushes the current continuation to a Pending queue and returns, popping the two handlers. Once this continuation is invoked again under `step`, we are back to 2 handlers. Note that in some case the current implementation does need the `step` call in `worker`, because the function argument of `Work f` does not systematically include a handler: ```ocaml let async pool f = let pd = get_pool_data pool in let p = Atomic.make (Pending []) in Multi_channel.send pd.task_chan (Work (fun _ -> do_task f p)); p ``` Explanation of our proposed fix: This PR proposes to fix the issue by enforcing the invariant that the Work function always includes its own handler for Task effects: for the Work functions that use `continue` and `discontinue` there is nothing to do, for the Work function of `async` we call `step` at this point. Another approach would be to use a shallow handler, and use `step` in the same way as currently (around each Work function), but this may be slightly slower -- we would be exactly encoding a deep handler using a shallow handler. (The current code reads like the authors had the shallow-handler semantics in mind.) Co-authored-by: Clement Allain <clement.allain@inria.fr>

Sudha247

Thanks for the PR @gasche @clef-men. The explanation makes sense to me.

I don't have a good mental model of how the handlers are implemented under the hood, I could confirm this by walking through it in gdb. (@kayceesrk or others would know about this)

@gasche

CHANGES: * Add parallel_find (ocaml-multicore/domainslib#90, @gasche) * Update CI (ocaml-multicore/domainslib#93, @Sudha247) * Optimisation to work-stealing (ocaml-multicore/domainslib#96, @art-w) * Improve docs presentation (ocaml-multicore/domainslib#99, @metanivek) * Property based tests (ocaml-multicore/domainslib#100, jmid) * Task: avoid double handler installation (ocaml-multicore/domainslib#101, @gasche & @clef-men) * Fix a benign data-race in Chan reported by ocaml-tsan (ocaml-multicore/domainslib#103, @art-w) * Dune, opam, and GitHub Actions fixes (ocaml-multicore/domainslib#105, @MisterDA) * domain local await support (ocaml-multicore/domainslib#107, @polytypic) * Windows run on GitHub Actions (ocaml-multicore/domainslib#110, @Sudha247) * Adjust PBTs based on recommended_domain_count (ocaml-multicore/domainslib#112, @jmid) * Test condition tweaks (ocaml-multicore/domainslib#113, @jmid)

@gasche

CHANGES: * Add parallel_find (ocaml-multicore/domainslib#90, @gasche) * Update CI (ocaml-multicore/domainslib#93, @Sudha247) * Optimisation to work-stealing (ocaml-multicore/domainslib#96, @art-w) * Improve docs presentation (ocaml-multicore/domainslib#99, @metanivek) * Property based tests (ocaml-multicore/domainslib#100, jmid) * Task: avoid double handler installation (ocaml-multicore/domainslib#101, @gasche & @clef-men) * Fix a benign data-race in Chan reported by ocaml-tsan (ocaml-multicore/domainslib#103, @art-w) * Dune, opam, and GitHub Actions fixes (ocaml-multicore/domainslib#105, @MisterDA) * domain local await support (ocaml-multicore/domainslib#107, @polytypic) * Windows run on GitHub Actions (ocaml-multicore/domainslib#110, @Sudha247) * Adjust PBTs based on recommended_domain_count (ocaml-multicore/domainslib#112, @jmid) * Test condition tweaks (ocaml-multicore/domainslib#113, @jmid)

Sudha247 approved these changes Jan 9, 2023

View reviewed changes

Sudha247 merged commit 5cfd180 into ocaml-multicore:master Jan 23, 2023

Sudha247 mentioned this pull request Jul 18, 2023

[new release] domainslib (0.5.1) ocaml/opam-repository#24119

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Task: avoid double handler installation #101

Task: avoid double handler installation #101

Uh oh!

gasche commented Jan 5, 2023

Uh oh!

Sudha247 left a comment

Uh oh!

Uh oh!

Task: avoid double handler installation #101

Task: avoid double handler installation #101

Uh oh!

Conversation

gasche commented Jan 5, 2023

Uh oh!

Sudha247 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!