
std/tasks: "small functions" optimization #443

Open

mratsim opened this issue Jan 16, 2022 · 5 comments

@mratsim
Collaborator

mratsim commented Jan 16, 2022

Currently std/tasks is implemented the following way:

type
  Task* = object ## `Task` contains the callback and its arguments.
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}

Any function call with arguments triggers an allocation. This is something that runtimes using std/tasks can't avoid.
Given that runtimes also have to allocate a future/flowvar, scheduling a task always incurs allocation overhead, even if only 1 or 2 arguments need to be passed.[1]

GCC's and Clang's C++ standard libraries implement a small-buffer optimization for std::function: callables up to a couple of pointer-sized words (16 bytes in libstdc++, 24 in libc++) are stored inline in the data structure, and the heap is used otherwise.

It would significantly reduce memory overhead and memory fragmentation for long-running tasking runtimes if this were implemented:

  • Tasks are most useful in a multithreaded context, where they are moved between threads.
    Reclaiming memory allocated on another thread is non-trivial for allocators: it requires synchronization and advanced contention-avoidance techniques, like the extreme freelist sharding in the mimalloc allocator.
  • Tasks may be allocated and deallocated in quick succession, leading to fragmentation, especially if the args are small.
  • As the number of cores grows, fine-grained tasks become more important to exploit all cores, leading to more std/tasks overhead.

--

[1]: Runtimes actually don't have to allocate a future/flowvar if it doesn't escape; they can use the caller's stack frame if the compiler helps with heap allocation elision (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2477r0.html#design-and-example, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html).

Proposal

For a size N to be decided, change the type to:

type
  Task* = object ## `Task` contains the callback and its arguments.
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}
    argsBuffer: array[N, byte]

args would point either to the buffer or to the heap. The change would be minimal and would mainly require changing this:

let scratchObjPtrType = quote do:
  cast[ptr `scratchObjType`](c_calloc(csize_t 1, csize_t sizeof(`scratchObjType`)))

to something along these lines:

let scratchObjPtrType = quote do:
  when sizeof(`scratchObjType`) <= N:
    cast[ptr `scratchObjType`](task.buffer.addr)
  else:
    cast[ptr `scratchObjType`](c_calloc(csize_t 1, csize_t sizeof(`scratchObjType`)))

at https://github.com/nim-lang/Nim/blob/d102b2f54c41f19a0545f28109d0a93550b5d886/lib/std/tasks.nim#L187
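
Outside the macro machinery, the intended inline-vs-heap dispatch could look like the following standalone sketch (`storeArgs` is an illustrative helper, not part of std/tasks; N = 40 as in the size discussion below):

type
  Task = object
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}
    argsBuffer: array[40, byte]

proc storeArgs[T](task: var Task, value: T) =
  ## Copy `value` inline when it fits, fall back to the heap otherwise.
  when sizeof(T) <= task.argsBuffer.len:
    copyMem(task.argsBuffer.addr, value.unsafeAddr, sizeof(T))
    task.args = task.argsBuffer.addr
  else:
    task.args = alloc0(sizeof(T))
    copyMem(task.args, value.unsafeAddr, sizeof(T))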

Discussion of the buffer size

The current task object has a size of 24 bytes.

  • Buffer of size 8: this rounds the total size up to 32, a nice power of 2. However, the majority of functions need more than 1 argument.
  • Buffer of size 16 or 24: many functions have the form proc(a, b): result or proc(r, a, b); this would cover those efficiently.
  • Buffer of size 40:
    • This rounds the total size up to 64, a nice power of 2.
    • 64 bytes is the cache-line size on most CPUs (when it's not 128); if a task is aligned, it can be loaded from L1 cache in a single cache read.
    • This buffer size can store up to 5 int/pointer arguments, or 2 openArrays plus 1 pointer/int (for example a callback when doing a parallel map, or a pointer to a result location).
      Working on large openArrays is often a very compelling reason to divide work into tasks.
    • x86-64 can pass up to 6 arguments through registers and ARM64 up to 8, so there won't be stack spilling with that buffer size.
    • Also, due to false sharing, a runtime that stores tasks contiguously, for example in a shared task queue, must make sure that they each start 64 bytes apart (or even 128 bytes, due to CPUs prefetching 2 cache lines at a time).
      Even if tasks were smaller, that space would be unused anyway.
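
The arithmetic above can be checked directly with a standalone sketch (TaskSBO is an illustrative stand-in for the proposed layout, assuming a 64-bit target):

const N = 40

type
  TaskSBO = object  # illustrative stand-in for the proposed Task layout
    callback: proc (args: pointer) {.nimcall, gcsafe.}
    args: pointer
    destroy: proc (args: pointer) {.nimcall, gcsafe.}
    argsBuffer: array[N, byte]

doAssert sizeof(TaskSBO) == 64       # header (24) + buffer (40) = one cache line
doAssert N div sizeof(pointer) == 5  # up to 5 pointer/int arguments inline
doAssert 2 * (2 * sizeof(pointer)) + sizeof(pointer) == N  # 2 openArrays (ptr + len) + 1 pointer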
@Araq
Member

Araq commented Jan 17, 2022

This will probably only work with ARC/ORC but then so be it.

@mratsim
Collaborator Author

mratsim commented Jan 17, 2022

I think std/tasks already requires either non-thread-local GC (so Boehm would work as well) or value types.

Unless you use tasks in a non-multithreading context, but libraries that might want to do that (asyncdispatch/chronos) use closure iterators.

@ringabout
Member

ringabout commented Jan 19, 2022

In the current implementation, the task is allocated on the stack and cleaned up before toTask returns, so the address of task.argsBuffer is lost.

We should probably pass task.argsBuffer in the invoke procs, which requires changing their signature (a breaking change):

proc invoke*(task: var Task) {.inline, gcsafe.} =
  ## Invokes the `task`.
  assert task.callback != nil
  if task.args == nil and task.destroy != nil:
    # args == nil serves as the marker that the arguments
    # were stored inline in argsBuffer
    task.callback(addr task.argsBuffer)
  else:
    task.callback(task.args)

(though Task is always passed by reference internally)

ringabout added a commit to ringabout/Nim that referenced this issue Jan 19, 2022
mratsim added a commit to status-im/nim-taskpools that referenced this issue Jan 19, 2022
@mratsim
Collaborator Author

mratsim commented Jan 19, 2022

See discussion on Discord https://discord.com/channels/371759389889003530/768367394547957761/933307741462753300

The changes indeed work (status-im/nim-taskpools@76a8593), but only if the spawning scope doesn't exit before the task has run.

AKA, structured parallelism:
spec #347 (comment)

Structured parallelism [OPTIONAL]

Structured parallelism ensures that all tasks and their descendants created within a scope are completed before exiting that scope.

template syncScope(ex: MyExecutor, scope: untyped): untyped
template syncScope(scope: untyped): untyped

References:

  • X10 Parallel Programming Language (2004), the Async-Finish construct
  • Habanero multithreading runtime for Java and Scala, the Async-Finish construct
  • Habanero C, the Async-Finish construct

More recently, concurrency frameworks also converged to similar "structured concurrency" (https://en.wikipedia.org/wiki/Structured_concurrency).

One way to make it work is in `=sink`: if the task being moved points into its own argsBuffer, change args to point to the destination's argsBuffer:

proc `=sink`(dst: var Task, src: Task) =
  `=destroy`(dst)
  dst.callback = src.callback
  dst.destroy = src.destroy
  if src.args == cast[pointer](src.argsBuffer.unsafeAddr):
    # The arguments are stored inline in the task:
    # copy the buffer and repoint args at the destination's buffer
    dst.argsBuffer = src.argsBuffer
    dst.args = cast[pointer](dst.argsBuffer.addr)
  else:
    # The arguments are stored on the heap
    dst.args = src.args

This also implicitly assumes that Tasks are {.byref.} types.

@ringabout
Member

block:
  var num = 0
  proc hello(a: int) = inc num, a

  var b = toTask hello(13)
  b.invoke()
  echo num
  assert num == 13

doesn't work with the POC:

0
Error: execution of an external program failed

It generates C code like this:

[screenshot of the generated C code omitted]

task is zeroed after toTask is called, so task.argsBuffer becomes all zeros,
while b.args still points to the (now zeroed) task.argsBuffer.
That seems to be why echo num prints 0.
