-
-
Notifications
You must be signed in to change notification settings - Fork 367
List refcounting perf fix: List size on heap #6894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ASSERT(num_alloc <= rc_pointers_capacity, "Too many allocations %zd > %zd", num_alloc, rc_pointers_capacity); | ||
|
|
||
| size_t *rc_ptr = alloc_ptr_to_rc_ptr(allocated, alignment); | ||
| *rc_ptr = RC_LIST_INDICATOR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is something non-obvious happening here so I think it's definitely worth adding a code comment. (When you get to it. I know it's still WIP at the moment.)
Are we going to require all roc_alloc implementations to do this?
Also, I notice it is happening unconditionally for every allocation, not just lists. I assume it will get overridden for other types? But why isn't there some Roc code somewhere to initialise the refcount for the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is totally a hack that I should document better. It is so that we can figure out which allocations are lists and get their refcount. No other platforms will need this. It is strictly needed for the introspection without knowing the underlying data type. I'll add some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, removed that hack and just made it explicit in the tests:
assert_refcounts!(
indoc!(
r#"
s = Str.concat "A long enough string " "to be heap-allocated"
list = [s, s, s]
[list, list]
"#
),
RocList<RocList<RocStr>>,
&[
(StandardRC, Live(3)), // s
(AfterSize, Live(2)), // list
(AfterSize, Live(1)) // result
]
);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really glad to see that the code patterns here were clear enough to follow!
| // We need to be able to increment the refcount of elements loaded. | ||
| for el in arg_elem_layouts.iter() { | ||
| let ptr = build_refcount_element_fn(backend, *el, HelperOp::IndirectInc); | ||
| backend.code_builder.i32_const(ptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a comment in the old code about decrementing. I think for a map2 on two lists of different lengths, the last few elements of the long list end up with a different refcount than the elements that have a corresponding entry in the short list.
Do we not need to do that any more? I have not been keeping up with the discussions 100% so maybe something has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, with the new setup, we don't decrement the tail anymore.
In the new setup: The original List holds 1 reference to each element. For each element we copy out of the original list, we must increment the refcount of when copying the element. When the original list is deallocated, it will cleanup the tail.
| self.code_builder.get_local(local_id); | ||
| self.code_builder.i32_const(extra_bytes as i32); | ||
| self.code_builder.i32_add(); | ||
| self.call_host_fn_after_loading_args(bitcode::UTILS_ALLOCATE_WITH_REFCOUNT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
|
Ok, I expect this to pass CI. That said, I disabled two glue tests, those will fail when re-enabled. |
…s refcounted This also requires zig bitcode to have access to the dec functions for elements. This is needed so that zig will be able to free elements in lists.
This is required to properly handle refcounting of RocList. Without it, we can't tell if we need to get the length from the heap. That said, it isn't a pretty solution. I think dealing with generating bespoke type in glue would feel nicer than this but be much more work. It also would deal with the issue of implementations in the bitcode not matching external libraries. That said, it would require exposing way more symbols from roc for each monomophorphized list variant.
This increased the number of failing tests for some reason (more segfaults). That said, I think it is correct. Probably just exposing errors from elsewher that haven't been addressed (maybe causing a double free).
I needed to update RustGlue.roc to generate this. That said, I want to see and measure results. As such, I am just manually implementing this for now!
Just inc by 1. No need for inc by n.
This reverts commit 0854a1a.
This reverts commit a8b4d0e.
a51630b to
c5c2617
Compare
rtfeldman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is AMAZING!!!!! 🤩 🤩 🤩 🤩 🤩
Thank you so much for making this happen @bhansconnect!!! I'm really stoked to see how many situations get massively faster from this. Easily one of the most impactful performance-improving PRs in the history of the project - this is really, really excellent! 😍
Currently, when we are refcounting a list, we also refcount every single element. This is absolutely horrid from a performance perspective. It turns what could be an O(1) operation into O(n). This PR changes us to instead refcount the list separate from the elements. We only refcount elements when they are inserted or removed from a list. Thus, the only time we do O(n) refcounts is when freeing the underlying allocation for a list.
This work does make refcounting a bit more finicky and requires a bit of orchestration. Fundamentally, all zig builtins now have to think about refcounts in a new way. As such, we end up passing around more refcounting functions.
As part of this work, all refcounted lists must store their size on the heap. This is required so that a seamless slice referencing a list can free the underlying list. The seamless slice needs to be able to decrement the refcount of all elements in the underlying allocation if it needs to free the list as a whole.
Currently, this PR seems to be working for the llvm backend. It is likely to have some incorrect refcounting in places that leads to memory leaks, but seems functional otherwise.
That said, this PR has some hard to debug issues with the dev and wasm backend. They both have many failing tests. On top of that, many of the failures are hitting test cases that aren't even related to lists (This actually brings my hopes up. I probably changed something unrelated to lists and just need to undo that to get these tests working). Any help in debugging is greatly appreciated. Run
cargo nextest-gen-wasm --release --no-fail-fastorcargo nextest-gen-dev --release --no-fail-fastto repro the current issues.Note: Before merge, we also should update
RustGlueto generate theRocRefcountedtrait for all types it generates.Fixes #6455