Conversation

@bhansconnect (Member) commented Jul 10, 2024

Currently, when we refcount a list, we also refcount every single element. This is absolutely horrid from a performance perspective: it turns what could be an O(1) operation into an O(n) one. This PR changes us to instead refcount the list separately from the elements. We only refcount elements when they are inserted into or removed from a list. Thus, the only time we do O(n) refcounts is when freeing the underlying allocation for a list.
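To make the difference concrete, here is a minimal sketch in Rust, assuming a layout where a usize refcount sits directly before the elements (all names here are illustrative, not the compiler's actual codegen):

    trait Refcounted {
        fn inc(&mut self);
    }

    struct List<T> {
        elements: *mut T,
        length: usize,
        capacity: usize,
    }

    // Hypothetical helper: bump the refcount stored directly before the data.
    unsafe fn inc_allocation<T>(data: *mut T) {
        let rc = (data as *mut usize).sub(1);
        *rc += 1;
    }

    // Old scheme: incrementing a list touched every element -- O(n).
    unsafe fn inc_list_old<T: Refcounted>(list: &List<T>) {
        inc_allocation(list.elements);
        for i in 0..list.length {
            (*list.elements.add(i)).inc();
        }
    }

    // New scheme: only the list's own refcount is touched -- O(1).
    unsafe fn inc_list_new<T>(list: &List<T>) {
        inc_allocation(list.elements);
    }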

This work does make refcounting a bit more finicky and requires some orchestration. Fundamentally, all zig builtins now have to think about refcounts in a new way, so we end up passing around more refcounting functions.

As part of this work, all refcounted lists must store their size on the heap. This is required so that a seamless slice referencing a list can free the underlying allocation: to do so, it needs to decrement the refcount of every element in that allocation, which means it must be able to recover the original list's size.
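Roughly, the allocation needs a shape like the following (a sketch consistent with the AfterSize refcount location used by the tests later in this thread; the exact bitcode layout may differ):

    // Assumed heap layout for a refcounted list:
    //
    //   [ size | refcount | elem 0 | elem 1 | ... ]
    //                       ^
    //                       list and slice data pointers point here
    //
    // A seamless slice only sees a window of the elements, so without the
    // size on the heap it could not tell how many elements the underlying
    // allocation holds when it performs the final free.
    unsafe fn underlying_size(first_elem: *mut u8) -> usize {
        // Walk back over the refcount word to the heap-stored size.
        let alloc_start = first_elem.sub(2 * core::mem::size_of::<usize>());
        *(alloc_start as *const usize)
    }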

Currently, this PR seems to be working for the llvm backend. It likely still has incorrect refcounting in places, leading to memory leaks, but it seems functional otherwise.

That said, this PR has some hard-to-debug issues with the dev and wasm backends. Both have many failing tests. On top of that, many of the failures are in test cases that aren't even related to lists (this actually raises my hopes: I probably changed something unrelated to lists and just need to undo it to get these tests passing). Any help in debugging is greatly appreciated. Run cargo nextest-gen-wasm --release --no-fail-fast or cargo nextest-gen-dev --release --no-fail-fast to repro the current issues.

Note: Before merge, we should also update RustGlue to generate the RocRefcounted trait for all types it generates.

Fixes #6455 (Store Shared List Size on the Heap)
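For context on the RustGlue note above, here is a minimal sketch of what a RocRefcounted-style trait could look like on the Rust side (the method names and shape are assumptions about the design, not a confirmed API):

    pub trait RocRefcounted {
        /// Increment the refcount(s) owned by this value.
        fn inc(&mut self);
        /// Decrement the refcount(s); may free the underlying allocation.
        fn dec(&mut self);
        /// Whether this type contains anything refcounted at all, so
        /// generated code can skip no-op traversals (e.g. a list of ints).
        fn is_refcounted() -> bool;
    }

    // Plain scalars need no work:
    impl RocRefcounted for i64 {
        fn inc(&mut self) {}
        fn dec(&mut self) {}
        fn is_refcounted() -> bool {
            false
        }
    }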

    ASSERT(num_alloc <= rc_pointers_capacity, "Too many allocations %zd > %zd", num_alloc, rc_pointers_capacity);

    size_t *rc_ptr = alloc_ptr_to_rc_ptr(allocated, alignment);
    *rc_ptr = RC_LIST_INDICATOR;
Contributor commented:

There is something non-obvious happening here so I think it's definitely worth adding a code comment. (When you get to it. I know it's still WIP at the moment.)
Are we going to require all roc_alloc implementations to do this?
Also, I notice it is happening unconditionally for every allocation, not just lists. I assume it will get overridden for other types? But why isn't there some Roc code somewhere to initialise the refcount for the list?

@bhansconnect (Member, Author) commented Jul 11, 2024

Yeah, this is totally a hack that I should document better. It lets us figure out which allocations are lists and get their refcount. No other platform will need this; it is strictly needed for test introspection without knowing the underlying data type. I'll add some comments.
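For the record, a hedged sketch of what the hack amounts to (the real code is C in the test host; the pointer math and the sentinel value here are guesses at the intent, not the actual implementation):

    // Sketch: the test allocator reserves room ahead of the data, so the
    // refcount slot is the word just before the aligned data pointer.
    // Writing a sentinel there lets the test harness later recognize
    // which allocations were lists without knowing the element type.
    const RC_LIST_INDICATOR: usize = usize::MAX; // placeholder sentinel

    unsafe fn alloc_ptr_to_rc_ptr(allocated: *mut u8, alignment: usize) -> *mut usize {
        // Data begins at allocated + alignment; the refcount sits just before it.
        allocated.add(alignment).cast::<usize>().sub(1)
    }

    unsafe fn tag_allocation(allocated: *mut u8, alignment: usize) {
        *alloc_ptr_to_rc_ptr(allocated, alignment) = RC_LIST_INDICATOR;
    }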

@bhansconnect (Member, Author) commented Jul 11, 2024

Ok, removed that hack and just made it explicit in the tests:

    assert_refcounts!(
        indoc!(
            r#"
                s = Str.concat "A long enough string " "to be heap-allocated"
                list = [s, s, s]
                [list, list]
            "#
        ),
        RocList<RocList<RocStr>>,
        &[
            (StandardRC, Live(3)), // s
            (AfterSize, Live(2)),  // list
            (AfterSize, Live(1))   // result
        ]
    );

Contributor commented:

Really glad to see that the code patterns here were clear enough to follow!

Comment on lines 2961 to 2965
    // We need to be able to increment the refcount of elements loaded.
    for el in arg_elem_layouts.iter() {
        let ptr = build_refcount_element_fn(backend, *el, HelperOp::IndirectInc);
        backend.code_builder.i32_const(ptr);
    }
Contributor commented:

I see a comment in the old code about decrementing. I think for a map2 on two lists of different lengths, the last few elements of the longer list end up with a different refcount than the elements that have a corresponding entry in the shorter list.

Do we not need to do that any more? I have not been keeping up with the discussions 100% so maybe something has changed.

@bhansconnect (Member, Author) commented:

Yeah, with the new setup, we don't decrement the tail anymore.

In the new setup, the original list holds one reference to each of its elements. For each element we copy out of the original list, we increment that element's refcount at the moment we copy it. When the original list is deallocated, it cleans up the tail elements itself.
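A sketch of that discipline for a map2-style builtin (hypothetical helper names; the real builtin lives in the zig bitcode):

    // Copy-out discipline: every element read out of an input list is
    // incremented at copy time. The tail of the longer list is left
    // alone, because the input list still owns one reference to each of
    // its elements and releases them when its allocation is freed.
    unsafe fn map2_copy_out<A, B, C>(
        xs: *mut A,
        ys: *mut B,
        shorter_len: usize,
        inc_a: unsafe fn(*mut A),
        inc_b: unsafe fn(*mut B),
        f: fn(*mut A, *mut B) -> C,
        out: *mut C,
    ) {
        for i in 0..shorter_len {
            inc_a(xs.add(i)); // the callee now owns a new reference
            inc_b(ys.add(i));
            out.add(i).write(f(xs.add(i), ys.add(i)));
        }
        // No decrements for elements beyond shorter_len.
    }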

    self.code_builder.get_local(local_id);
    self.code_builder.i32_const(extra_bytes as i32);
    self.code_builder.i32_add();
    self.call_host_fn_after_loading_args(bitcode::UTILS_ALLOCATE_WITH_REFCOUNT);
Contributor commented:

Nice

@bhansconnect (Member, Author) commented:

Ok, I expect this to pass CI. That said, I disabled two glue tests; those will fail when re-enabled.

@bhansconnect bhansconnect marked this pull request as ready for review July 13, 2024 06:29
Commit messages (extracted; the first title is truncated):

…s refcounted

This also requires the zig bitcode to have access to the dec functions for elements. This is needed so that zig will be able to free elements in lists (see the sketch after these notes).

This is required to properly handle refcounting of RocList. Without it, we can't tell if we need to get the length from the heap. That said, it isn't a pretty solution.

I think dealing with generating bespoke types in glue would feel nicer than this, but it would be much more work. It also would deal with the issue of implementations in the bitcode not matching external libraries. That said, it would require exposing way more symbols from roc for each monomorphized list variant.

This increased the number of failing tests for some reason (more segfaults). That said, I think it is correct. It is probably just exposing errors from elsewhere that haven't been addressed (maybe causing a double free).

I needed to update RustGlue.roc to generate this. That said, I want to see and measure results first, so I am just manually implementing this for now!

Just inc by 1. No need for inc by n.
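As the first note above says, the zig builtins cannot know element types themselves, so compiled code hands them monomorphized refcount helpers, the same idea as the earlier wasm snippet that pushes IndirectInc function pointers before calling a builtin. A hedged sketch of that calling convention, in Rust rather than the actual zig (all names and signatures are illustrative):

    // Hypothetical builtin: receives a per-element dec function because
    // the bitcode cannot know the element type.
    type DecFn = unsafe fn(*mut u8);

    unsafe fn list_decref(alloc_start: *mut u8, len: usize, elem_width: usize, dec_elem: DecFn) {
        // Only the final free pays the O(n) cost of touching elements.
        let elems = alloc_start.add(2 * core::mem::size_of::<usize>());
        for i in 0..len {
            dec_elem(elems.add(i * elem_width));
        }
        // ...then release the allocation itself.
    }

    // Caller side, e.g. for a list of strings: pass a monomorphized helper.
    unsafe fn dec_roc_str(_elem: *mut u8) { /* decrement one RocStr */ }

    unsafe fn example(alloc_start: *mut u8, len: usize) {
        list_decref(alloc_start, len, 3 * core::mem::size_of::<usize>(), dec_roc_str);
    }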
@rtfeldman (Contributor) left a comment

This is AMAZING!!!!! 🤩 🤩 🤩 🤩 🤩

Thank you so much for making this happen @bhansconnect!!! I'm really stoked to see how many situations get massively faster from this. Easily one of the most impactful performance-improving PRs in the history of the project - this is really, really excellent! 😍
