Description
Currently, all run-time dispatch methods ultimately rely on a relaxed atomic read followed by a jump to the address that was read (either directly or via a jump table). This is mostly fine, as modern-day CPU branch target predictors are incredibly efficient, so beyond the first few calls each successive call should be fairly quick. However, the load and the indirect jump still cost CPU cycles on every call, no matter how well they are predicted.
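For illustration, here is a minimal C11 sketch of the pattern described above, assuming a function-pointer-based dispatcher rather than a jump table. The names `sum`, `sum_generic`, `sum_fast`, `sum_resolve` and `cpu_has_fast_path` are hypothetical and not taken from this project; `cpu_has_fast_path` stands in for real CPU feature detection. The first call goes through a resolver that picks an implementation and stores it; every later call pays for one relaxed load plus one indirect call.

```c
#include <stdatomic.h>

typedef int (*sum_fn)(const int *data, int len);

/* Portable fallback implementation. */
static int sum_generic(const int *data, int len) {
    int total = 0;
    for (int i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Stand-in for a feature-specific implementation (here just unrolled by 2). */
static int sum_fast(const int *data, int len) {
    int a = 0, b = 0, i = 0;
    for (; i + 1 < len; i += 2) {
        a += data[i];
        b += data[i + 1];
    }
    if (i < len)
        a += data[i];
    return a + b;
}

/* Hypothetical placeholder for real CPU feature detection. */
static int cpu_has_fast_path(void) { return 1; }

static int sum_resolve(const int *data, int len);

/* The dispatch slot starts out pointing at the resolver. */
static _Atomic(sum_fn) sum_ptr = sum_resolve;

/* First call: pick an implementation, cache it, then forward the call.
 * Relaxed ordering suffices because every value ever stored is a valid,
 * immutable function; a racing thread at worst runs the resolver twice. */
static int sum_resolve(const int *data, int len) {
    sum_fn chosen = cpu_has_fast_path() ? sum_fast : sum_generic;
    atomic_store_explicit(&sum_ptr, chosen, memory_order_relaxed);
    return chosen(data, len);
}

/* Public entry point: one relaxed atomic load plus one indirect call,
 * which is the per-call overhead that code patching would remove. */
int sum(const int *data, int len) {
    sum_fn f = atomic_load_explicit(&sum_ptr, memory_order_relaxed);
    return f(data, len);
}
```

Even once the branch target predictor has learned the destination, the `atomic_load_explicit` and the indirect call in `sum` remain on every invocation.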
The best solution would be to patch the jump address directly in the machine code at run-time, similar to the Linux kernel's static keys, or potentially even to rewrite every call site of the function (although I have no clue how to implement something like that). However, handling data races here is extremely complicated, since relaxed atomics are no longer sufficient once the code itself is being modified.
The problem, though, is that this would be heavily dependent on the architecture (each architecture has its own machine code for jump instructions and such) and on the operating system (certain operating systems are fussy about modifying executable memory). Adding features that would only benefit a limited number of operating systems and architectures is out of scope for what I am currently aiming to implement, but I will happily accept any PR that does this, so long as there are no regressions on other systems.