Improve performance of FFI calls with struct parameters #23
base: develop
There were a few concerns with the proposals.
Further optimizations that are proposed now:
One point of contention here: I'm personally not perfectly happy with how this is implemented. The function objects themselves are inlined into the forms, which is a bit unclean, since functions unfortunately aren't really data in Clojure. It works, but it's not the perfect solution. Ideally we would wrap the whole expression with a
Before the thread-local arena there was a lot of actual allocation when calling a native function: But with the Looking at memory profiling, the creation of and with
Hi,

In this pull request I propose some performance improvements that mainly target `ffi/defcfn`-defined functions taking struct arguments.

While developing a test application using the latest version of coffi, I noticed that repeatedly calling a function whose arguments are serialized via `defstruct`-defined serdes took a big performance hit. Profiling showed that the majority of the time is spent in `mem/size-of` and `mem/align-of`.

The reason is that the serde wrapper for FFI functions called `mem/alloc-instance` and `mem/serialize-into` for non-primitive arguments, which results in calls to `mem/size-of` and `mem/align-of`, which in turn go through `mem/c-layout`. `mem/c-layout` can be a pretty expensive function on top of being a multimethod, yet it is called multiple times for every argument.

One optimization proposed here is to memoize calls to `mem/size-of` and `mem/align-of` whose argument is not a `MemoryLayout`.

This improved performance, but unfortunately not by as much as I had hoped.
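The memoization idea can be illustrated with a minimal sketch. It is not coffi's code: `slow-layout`, `layout?`, `layout-calls` and this `size-of` are hypothetical stand-ins, and a plain map plays the role of an already-computed `MemoryLayout`. The assumption is only that computing a layout from a type descriptor is expensive, while reading the size from an existing layout is cheap.

```clojure
;; Counts how often the "expensive" layout computation actually runs.
(def layout-calls (atom 0))

(defn slow-layout
  "Hypothetical stand-in for an expensive layout lookup (not coffi's mem/c-layout)."
  [type]
  (swap! layout-calls inc)
  (case type
    :int  {:size 4 :align 4}
    :vec2 {:size 8 :align 4}))

(defn layout?
  "Stand-in for an `instance? MemoryLayout` check: here a layout is just a map."
  [x]
  (map? x))

;; Memoize only the path that goes through the expensive computation.
(def ^:private memo-size
  (memoize (fn [type] (:size (slow-layout type)))))

(defn size-of
  "Size in bytes. Layouts are read directly; type descriptors are memoized."
  [type]
  (if (layout? type)
    (:size type)
    (memo-size type)))
```

Calling `(size-of :vec2)` repeatedly runs `slow-layout` only once, while an argument that is already a layout bypasses the cache entirely. Each call still pays for the `layout?` check and the cache lookup, which may be one reason memoization alone did not help as much as hoped.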
Therefore, another improvement proposed here is generating a call to `mem/alloc` with the size and alignment baked in, instead of computing them every time the FFI call is made using `mem/alloc-instance`.

The next bottleneck was the call to `mem/serialize-into`, which had a similar issue to `mem/alloc-instance`, needing to dispatch on the multimethod `mem/type-dispatch` with the serde descriptor.

Leveraging the serde registry introduced with the defstruct macro, we can use an inline solution, should one exist in the registry. Together with the addition of some type hints for `defstruct` serdes, this eliminated all calls to `mem/size-of`, `mem/align-of` and `mem/type-dispatch` altogether and is my last proposed optimization.

With these changes in place, the actual allocation of the segments in the confined arena becomes the dominant cost of the whole FFI call, suggesting little room for further performance gains.
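To make the "baked in" allocation and the inline serializer more concrete, here is a rough sketch of the general technique. It does not use coffi's API: `registry`, `:inline-serialize` and `with-serialized` are hypothetical names, and a `java.nio.ByteBuffer` stands in for a native memory segment so that the example runs on any JDK. The point is that the macro looks up the size and the serialization form at macroexpansion time, so the expanded code contains a literal size and direct field writes, with no multimethod dispatch at call time.

```clojure
;; Hypothetical serde registry: type keyword -> size plus a function that
;; returns a serialization FORM (code), given symbols for the object and buffer.
(def registry
  (atom {:vec2 {:size 8
                :inline-serialize
                (fn [obj-sym buf-sym]
                  `(doto ~buf-sym
                     (.putFloat 0 (float (:x ~obj-sym)))
                     (.putFloat 4 (float (:y ~obj-sym)))))}}))

(defmacro with-serialized
  "Expands into code that allocates a buffer whose size is a literal and
  writes `obj` with the inline form found in the registry.
  `type` must be a literal keyword known at macroexpansion time."
  [type obj]
  (let [{:keys [size inline-serialize]} (get @registry type)
        o   (gensym "obj")
        buf (gensym "buf")]
    `(let [~o   ~obj
           ~buf (java.nio.ByteBuffer/allocate ~size)] ; size is baked in
       ~(inline-serialize o buf)
       ~buf)))
```

With these definitions, `(with-serialized :vec2 {:x 1.5 :y 2.0})` returns an 8-byte buffer holding the two floats. A real implementation would presumably fall back to the generic, dispatching path when the registry has no inline entry for a type; this sketch leaves that out.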
I unfortunately don't have a rigorous benchmark for the impact of the improvements, but in my private raylib example I started with an FPS of around 100 and ended somewhere over 4000.