Skip to content

std/tasks: "small functions" optimization #443

Open
@mratsim

Description

@mratsim

Currently std/tasks is implemented the following way:

type
  Task* = object ## `Task` contains the callback and its arguments.
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}

Any function call with an argument will trigger an allocation. This is something that runtimes that use std/task can't avoid.
Given that runtimes also have to allocate a future/flowvar, scheduling a task always incur allocation overhead even if there are just 1 or 2 arguments that needs to be passed.[1]

GCC and Clang have an optimization where std:function can store up to 8 (?) bytes intrusive to the data structure and uses the heap otherwise.

It would significantly reduce memory overhead and memory fragmentation for long-running tasking runtimes if this was implemented:

  • Tasks are most useful in a multithreaded context and they are moved around threads.
    Reclaiming memory allocated from another thread is non-trivial for allocators and requires synchronization and advanced techniques to avoid contention like extreme freelist sharding in the Mimalloc allocator
  • Tasks may be quickly allocated and deallocated leading to fragmentation especially if args are small.
  • As the number of cores grows, fine-grained tasks become more important to exploit multiple cores, leading to more std/tasks overhead

--

[1]: Runtime actually don't have to allocate a future/flowvar if those don't escape, they can use the caller stackframe if the compiler helps with heap allocation elision: (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2477r0.html#design-and-example, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html)

Proposal

For a to be decided size N, change the type to:

type
  Task* = object ## `Task` contains the callback and its arguments.
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}
    argsBuffer: array[N, byte]

args would point to either the buffer or the heap, change would be minimal and would mainly require changing this:

let scratchObjPtrType = quote do:
  cast[ptr `scratchObjType`](c_calloc(csize_t 1, csize_t sizeof(`scratchObjType`)))

to something in that vein.

let scratchObjPtrType = quote do:
  when sizeof(`scratchObjType`) <= N:
    cast[ptr `scratchObjType`](task.buffer.addr)
  else:
    cast[ptr `scratchObjType`](c_calloc(csize_t 1, csize_t sizeof(`scratchObjType`)))

at https://github.com/nim-lang/Nim/blob/d102b2f54c41f19a0545f28109d0a93550b5d886/lib/std/tasks.nim#L187

Discussion of the buffer size

The current task object has a size of 24 bytes.

  • Buffer of size 8: this rounds up the total size to 32, a nice power of 2. However the majority of function need more than 1 argument.
  • Buffer of size 16 and 24: many functions have the form proc(a, b): result or proc(r, a, b), this would cover those efficiently.
  • Buffer of size 40:
    • this rounds up the total size to 64, a nice power of 2.
    • 64 bytes is the cache-line size on most CPUs (when it's not 128), if a task is aligned, it can be loaded from L1 cache in a single cache read.
    • This buffer size allows to store up to 5 ints/pointer arguments or 2 openarrays + 1 pointer/int (for example a callback when doing parallel map or a pointer to a result location)
      Working on large openarrays is often a very compelling reason to divide work in tasks.
    • x86-64 can pass up to 6 arguments through registers and ARM64 up to 8 so there won't be stack spilling with that buffer size.
    • Also due to false sharing a runtime that stores tasks contiguously for example in a shared task queue must make sure that they each start 64 bytes (or even 128 bytes due to CPU prefetching 2 cache-lines at a time).
      Even if tasks were smaller, space would be unused.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions