Smaller Refcounts #23

Merged
merged 1 commit into from
Apr 29, 2014

@cgaebel cgaebel commented Mar 29, 2014

No description provided.

@alexcrichton
Member

This was discussed at great length on the original pr as well as in a meeting.

This RFC is quite terse and has very little statistical information to base any claims on. It would likely be more amenable with concrete real-world statistics, better-thought-out names, and more analysis of why the original decisions should be overturned. The rationale of "but there's one extra word" is not precise enough to be actionable.

@cgaebel
Author

cgaebel commented Mar 29, 2014

Thanks for the links! I didn't know this was previously discussed.

I'll get more data.

@huonw
Member

huonw commented Mar 29, 2014

With allocators with size classes, how often does the weak count actually promote an allocation to the next class?

(Also, rustc probably won't be using just Rc. The AST etc. is likely better suited to an arena.)

@eddyb
Member

eddyb commented Mar 30, 2014

I actually switched the focus of my P experiment to an owning AST model instead of a sharing one, after I ended up with an i1 ref-count and realized the ast_map can do without sharing.

There are a few good reasons against sharing AST nodes, and if there were currently a way to sneak duplicates past the last folding stage (ast_map itself), you could use safe code to bypass borrowck.

@huonw
Member

huonw commented Apr 13, 2014

Something like this came up on /r/rust. Maybe we could have:

pub struct Rc<T, Strong=uint, Weak=uint> {
    data: *mut RcBox<T, Strong, Weak>
}
struct RcBox<T, Strong, Weak> {
    data: T,
    strong: Cell<Strong>,
    weak: Cell<Weak>
}

impl<T, Strong: Num, Weak: Num> Rc<T, Strong, Weak> {
    fn downgrade(&self) -> Weak<T, Strong, Weak> { ... }
}

// no assumptions on Weak here
impl<T, Strong: Copy + Num, Weak> Clone for Rc<T, Strong, Weak> {
    fn clone(&self) -> Rc<T, Strong, Weak> {
        ...
    }
}

Notably this would allow instantiating something like Rc<T, uint, ()> to avoid weak counts statically (the downgrade method would be designed to not work with (), by having a trait bound it doesn't satisfy), without needing to duplicate the functionality.

This would presumably also require CheckedAdd, so that Rc<T, u8, u8> worked correctly, i.e. didn't wrap for something with 256 references. One might be concerned about this from a performance perspective for the common case of Rc<T> (i.e. both uint) but theoretically we could have something like

fn clone(&self) -> Rc<T, Strong, Weak> {
    if Strong::max_value() < uint::max_value() {
        // use checked_add
    } else {
        // range is large enough, just add directly
    }
}

(which trades a little bit of no-opt performance for opt performance.)

(NB. this is mostly a chain of thought, so I haven't worked through the details.)
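
As a rough illustration of the idea above, here is a minimal sketch in present-day Rust syntax of just the counter part, not a full `Rc`. The `Count` and `WeakCount` traits and the `Counts` struct are hypothetical names invented for this sketch (they stand in for the `Num`/`CheckedAdd` bounds mentioned in the comment), and overflow is reported through `Option` rather than a failure.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the `Num` / `CheckedAdd` bounds mentioned above.
trait Count: Copy {
    const ONE: Self;
    /// Adds one, returning `None` instead of wrapping when the counter is full.
    fn checked_inc(self) -> Option<Self>;
}

impl Count for u8 {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> {
        self.checked_add(1)
    }
}

impl Count for usize {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> {
        self.checked_add(1)
    }
}

/// Hypothetical marker for counters that can track weak references.
/// `()` deliberately does not implement it.
trait WeakCount: Count {}
impl WeakCount for u8 {}
impl WeakCount for usize {}

/// The two counters that would sit next to the data in the shared box.
struct Counts<S, W> {
    strong: Cell<S>,
    weak: Cell<W>,
}

// No assumptions on `W` here, so `Counts<S, ()>` still supports cloning.
impl<S: Count, W> Counts<S, W> {
    fn new(initial_weak: W) -> Self {
        Counts { strong: Cell::new(S::ONE), weak: Cell::new(initial_weak) }
    }

    /// What `clone` would call: bump the strong count, refusing to wrap.
    fn inc_strong(&self) -> Option<S> {
        let next = self.strong.get().checked_inc()?;
        self.strong.set(next);
        Some(next)
    }
}

// Only available when `W` is a real counter, so `Counts<S, ()>` has no weak API.
impl<S: Count, W: WeakCount> Counts<S, W> {
    /// What `downgrade` would call.
    fn inc_weak(&self) -> Option<W> {
        let next = self.weak.get().checked_inc()?;
        self.weak.set(next);
        Some(next)
    }
}
```

With these definitions, `Counts<u8, ()>` occupies a single byte because `Cell<()>` is zero-sized, and calling `inc_weak` on it should be rejected at compile time since `()` does not implement `WeakCount`.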

@esummers

I'm not sure if this is doable in Rust, but Objective-C implements smart pointers (via Automatic Reference Counting) with the reference count stored in the most significant 16 bits of the pointer on 64-bit platforms. That allows them to use a single word to represent a smart pointer, since weak references are stored in the object itself (weak references are a bit different in Obj-C, since they are tracked by pointer and zeroed when the original goes away). Apparently it also helps with atomic updates to the reference count, since everything is in a single word.

@huonw
Member

huonw commented Apr 13, 2014

Doesn't that mean they're storing the reference count at the non-shared end of pointers? i.e. copying/cloning one reference isn't reflected in the counts of other references.

@dobkeratops

Maybe it's embedded in their object class info pointer or something.

@thestinger

I think it would be better to consider exposing a subset of the type without weak pointer support before adding another failure case.

@esummers

My bad, I obviously was not thinking clearly. They store the ref count in the isa pointer, not the object pointer. That technique would only work for some sort of smart pointer to structs that have virtual inheritance. https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and-you.html
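
To make the "count packed into the spare high bits of one word" idea concrete, here is a small sketch of the bit arithmetic only. It assumes addresses fit in the low 48 bits and that 16 bits are available for the count; the constants and function names are invented for illustration and this is not the actual Objective-C isa layout.

```rust
/// Assumed number of meaningful address bits (an assumption of this sketch).
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;

/// Stores a 16-bit count in the bits above the address.
fn pack(addr: u64, count: u16) -> u64 {
    debug_assert!(addr <= ADDR_MASK, "address must fit in the low 48 bits");
    ((count as u64) << ADDR_BITS) | addr
}

/// Recovers the address by masking off the count bits.
fn addr_of(word: u64) -> u64 {
    word & ADDR_MASK
}

/// Recovers the count from the high 16 bits.
fn count_of(word: u64) -> u16 {
    (word >> ADDR_BITS) as u16
}
```

Because the address and the count share one word, both can be read or updated together, which is the property the comment attributes to the single-word design.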

@dobkeratops

To throw another idea in there: could a compact refcount switch to an extended area when it reaches a certain value? Probably too much complexity. I think anyone actually selecting a compact refcount would be doing it for a good reason (e.g., I know I'm not going to have over 4 billion objects referring to a texture, because the vast majority of the 8 GB of memory is actually storing textures, not objects).

@esummers

Why are two words used anyway? Can't the weak ref and strong ref be 32-bit on 64-bit platforms and 16-bit on 32-bit platforms?

EDIT: OK, answer is safety.

@thestinger

You can certainly make more than 65536 references on 32-bit, and more than 4294967296 on 64-bit. In addition to making the type less scalable, it would break memory safety without adding overflow checks.

@huonw
Member

huonw commented Apr 13, 2014

You can easily have more than 65536 references to an object on a 32-bit platform:

let x = Rc::new(1);
let v = range(0, 1_000_000).map(|_| x.clone()).collect::<Vec<Rc<int>>>();

(You may regard this as unlikely, but it's still entirely possible. uint is the smallest type guaranteed to be large enough to store the maximum number of references.)
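
For readers on a current toolchain, the snippet above can be approximated as follows. This is a sketch in today's syntax (`i32` and `usize` instead of `int` and `uint`, a range expression instead of `range`); the helper name `clone_many` is made up for the example.

```rust
use std::rc::Rc;

/// Builds `n` extra handles to one shared value and reports the strong count
/// observed while all of them are alive.
fn clone_many(n: usize) -> usize {
    let x = Rc::new(1);
    let v: Vec<Rc<i32>> = (0..n).map(|_| Rc::clone(&x)).collect();
    // `x` itself plus the `n` clones held in `v`.
    let count = Rc::strong_count(&x);
    drop(v);
    count
}
```

Calling `clone_many(1_000_000)` yields a count well above what a 16-bit counter could hold, which is the point being made.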

@thestinger

Creating more than 2^32 references at 8 bytes a pop only requires 32GiB of memory.
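
The arithmetic behind that figure, written out as a tiny sketch (the function and constant names are illustrative only):

```rust
/// One gibibyte in bytes.
const GIB: u64 = 1 << 30;

/// Bytes needed to store `refs` references of `bytes_per_ref` bytes each.
fn bytes_for_refs(refs: u64, bytes_per_ref: u64) -> u64 {
    refs * bytes_per_ref
}
```

With `refs = 1 << 32` and `bytes_per_ref = 8`, the result is 2^35 bytes, i.e. 32 GiB, so a 32-bit counter can in principle be exhausted on a 64-bit machine with enough memory.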

@cgaebel
Author

cgaebel commented Apr 13, 2014

How expensive would it be to just have the refcount overflow check, where the "handling" code is statically predicted as unlikely? Wouldn't this just be a well-predicted jump conditional on the overflow flag? Isn't that crazy cheap?

Also, this is a cost that's only paid when creating and destroying a bunch of references to something, which shouldn't be especially common. As far as I know, the "best practice" when dealing with refcounts is to borrow it and use the borrowed pointer as much as possible.

@dobkeratops

I can see that the default most useful to most people is uint refcounts. There are situations where you control the number of objects, though, e.g. when you explicitly manage textures and numbers of objects to fit within memory and frame-rate budgets. There are plenty of situations where you might be using a 64-bit address space but 32 or even 16 bits' worth of 'count' handles any 'management'. In games the majority of memory is textures, then vertex arrays, and the CPU doesn't traverse these at fine grain; it just tells the GPU what to do with large batches. But back on the Xbox 360 and PS3 we were kept very busy shaving bytes off control structures to prevent the cache misses that crippled the CPU, and fiddling with alignment to keep things on cache-line boundaries. (We were also reworking things to avoid branches, which sadly crippled its pipeline too. Worst of all worlds: even extra checks wouldn't have been acceptable, so you'd have needed the option to compile them out.)

@esummers

I think if you have a specialty case, you would just make your own Rc for your crate. It probably doesn't make sense to have something in std unless it is safe.

@thestinger

@cgaebel: Adding new sources of unwinding is never cheap. It breaks many optimization passes all the way up the stack. If it called abort then sure, it would likely only result in wasted instruction cache space. However, that's not what Rust does when it encounters failures like this.
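
The two ways of handling an overflowing count that are being contrasted here can be sketched as below, in present-day Rust. The function names are invented for illustration and a `u8` counter is used only to make overflow easy to reach; nothing here is taken from the actual `Rc` implementation.

```rust
use std::cell::Cell;

/// Increments the counter, ending the process on overflow.
/// No unwinding happens on this path.
fn inc_or_abort(count: &Cell<u8>) {
    match count.get().checked_add(1) {
        Some(n) => count.set(n),
        None => std::process::abort(),
    }
}

/// Same check, but overflow is reported by panicking, which unwinds the
/// stack by default and therefore adds an unwinding edge to every caller.
fn inc_or_panic(count: &Cell<u8>) {
    let n = count.get().checked_add(1).expect("refcount overflow");
    count.set(n);
}
```

Both versions pay for the same comparison on the common path; the difference raised in the comment is what the rare path forces the optimizer to assume about the surrounding code.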

@dobkeratops

I guess if the language gets HKT in the future, algorithms will be able to abstract over custom pointer types :)

@dobkeratops

Would you consider the same thing for vectors and slices?

struct Vec<T, IndexType=uint> {
    len: IndexType,
    cap: IndexType,
    ptr: *mut T
}
impl<T, IndexType> Index for Vec<T, IndexType> {
    fn index(&self, i: IndexType) -> &T { ... }
}

That would end, in the right way, a lot of the pain I was having with casting indices.

I gather the Rust compiler itself has u32 node IDs. It's this middle ground of machines with 4, 8, or 16 GB where 64-bit addressing is overkill but 32 bits is insufficient, and segmenting things into multiple 32-bit spaces per resource works well.

u32 indexing would be my most common case.

I know the Servo people also perceive problems with pointer overhead. They want to express a node hierarchy, and I suspect their use case might suit this sort of thing: an array of nodes with 32-bit indexing, or 32-bit offsets within an arena with a maximum size of 4 GB for the DOM when running on phones.

With objects of 16-byte alignment (which you want for SIMD vec4 types), a 32-bit index is sufficient to cover 64 GB, and it's more likely that your memory is divided between different classes of resource anyway.
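
A rough sketch of the "array of nodes plus 32-bit index" approach, in present-day Rust. `NodeId`, `Node`, `Arena`, and `addressable_bytes` are hypothetical names made up for this example; nothing here is taken from rustc or Servo.

```rust
/// Hypothetical 4-byte handle used in place of a pointer-sized reference.
#[derive(Clone, Copy, PartialEq, Debug)]
struct NodeId(u32);

struct Node {
    value: i32,
    parent: Option<NodeId>,
}

/// All nodes live in one vector; links between them are `u32` indices.
#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Stores a node and returns its index, or `None` once the index
    /// would no longer fit in 32 bits.
    fn alloc(&mut self, value: i32, parent: Option<NodeId>) -> Option<NodeId> {
        let len = self.nodes.len();
        if len > u32::MAX as usize {
            return None;
        }
        self.nodes.push(Node { value, parent });
        Some(NodeId(len as u32))
    }

    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0 as usize]
    }
}

/// Bytes reachable through a `u32` index over slots of `slot_bytes` bytes each.
fn addressable_bytes(slot_bytes: u64) -> u64 {
    (1u64 << 32) * slot_bytes
}
```

Here `addressable_bytes(16)` works out to 2^36 bytes, which matches the 64 GB figure quoted for 16-byte slots, and a `NodeId` takes 4 bytes where a pointer would take 8 on a 64-bit target.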

I've also heard talk of a 'smallvector' elsewhere. Might parameterizing the index (and allocator) mean that Vec can do that job?

@thestinger

> I've also heard talk of a 'smallvector' elsewhere. Might parameterizing the index (and allocator) mean that Vec can do that job?

The small vector optimization is the opposite of what you're proposing.

@dobkeratops

Is there a link describing the 'small vector', then? I'd also heard slices 'might fill a niche a bit like small vectors', but I think slices can be slices into large vectors...
Do I just need to roll my own to be content?

@thestinger

Slices and smaller index fields are both unrelated to the small vector optimization.

The libc++ implementation of std::string is still 24 bytes on x86_64 (pointer, length, capacity) but is capable of storing strings of up to 22 characters (23 bytes including the null terminator) directly in the object itself without performing dynamic allocation. One byte is used to distinguish between large and small strings and to record the small string length. In this case, reducing the size of the length and capacity fields would even be counter-productive.
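
To show the shape of the small-string optimization being described, here is a simplified sketch in Rust. It is not the libc++ layout: it uses an ordinary enum, so the object may be larger than 24 bytes, and since Rust strings carry no terminator all 23 inline bytes hold data. `SmallStr` and `INLINE_CAP` are names invented for this example.

```rust
/// Inline capacity: the 24-byte object described above minus one bookkeeping byte.
const INLINE_CAP: usize = 23;

/// A string that lives inside the object when short, on the heap otherwise.
enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(String),
}

impl SmallStr {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Short enough: copy the bytes into the object, no allocation.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            SmallStr::Inline { len, buf } => {
                // The bytes were copied whole from a valid `&str`.
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SmallStr::Heap(s) => s.as_str(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}
```

The sketch also hints at why shrinking the length and capacity fields would not help here: the inline buffer reuses exactly the space those fields occupy in the heap case.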

A more general small vector: http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

@brson brson merged commit a1d6e9e into rust-lang:master Apr 29, 2014
@brson
Contributor

brson commented Apr 29, 2014

Merged as RFC 13, but not accepted. This discussion was had previously when the original decision was made to merge Rc with weak refcounting. Although it's a difficult tradeoff, the extra word was seen as worth it to reduce the number of refcounted types.

@dobkeratops
Copy link

Was the idea of parameterizing the refcount type itself also rejected? I know there are details around introducing a failure case if you use a smaller refcount type, but this would save the community from implementing their own variations to get the desired behaviour (something that will happen many times, independently). For those of us who target machines with 4-16 GB of RAM, uint counts and indices everywhere are wasteful, and 32-bit builds are insufficient.

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
@Centril Centril added the A-allocation Proposals relating to allocation. label Nov 23, 2018
wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019