Smaller Refcounts #23
This was discussed at great length on the original PR as well as in a meeting. This RFC is quite terse and has very little statistical information to base any claims on. It would likely be more amenable with concrete real-world statistics, more thought-out names, and more analysis of why the original decisions should be overturned. The rationale of "but there's one extra word" is not precise enough to make this actionable. |
Thanks for the links! I didn't know this was previously discussed. I'll get more data. |
With allocators with size classes, how often does the weak count actually promote an allocation to the next class? (Also, rustc probably won't be using just |
I actually switched the focus of my There's a few good reasons against sharing AST nodes - and if there currently was a way to sneak duplicates past the last folding stage ( |
Something like this came up on /r/rust. Maybe we could have:

    pub struct Rc<T, Strong=uint, Weak=uint> {
        data: *mut RcBox<T, Strong, Weak>
    }
    struct RcBox<T, Strong, Weak> {
        data: T,
        strong: Cell<Strong>,
        weak: Cell<Weak>
    }
    impl<T, Strong: Num, Weak: Num> Rc<T, Strong, Weak> {
        fn downgrade(&self) -> Weak<T, Strong, Weak> { ... }
    }
    // no assumptions on Weak here
    impl<T, Strong: Copy + Num, Weak> Clone for Rc<T, Strong, Weak> {
        fn clone(&self) -> Rc<T, Strong, Weak> {
            ...
        }
    }

Notably this would allow instantiating something like
This would presumably also require

    fn clone(&self) -> Rc<T, Strong, Weak> {
        if Strong::max_value() < uint::max_value() {
            // use checked_add
        } else {
            // range is large enough, just add directly
        }
    }

(which trades a little bit of no-opt performance for opt performance.) (NB. this is mostly a chain of thought, so I haven't worked through the details.) |
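For readers on current Rust, here is a minimal sketch of the parameterized-counter idea above. It is not how `std::rc::Rc` is implemented: the `Count` trait, the `Counts` header, and the `Overflow` error are hypothetical names introduced only for illustration, and unlike the sketch above it always uses the checked path and bounds both counters.

```rust
use std::cell::Cell;

/// Hypothetical trait: an unsigned integer usable as a reference count.
pub trait Count: Copy {
    const ZERO: Self;
    const ONE: Self;
    /// Adds one, returning `None` if the counter would overflow.
    fn checked_inc(self) -> Option<Self>;
}

macro_rules! impl_count {
    ($($t:ty),*) => {$(
        impl Count for $t {
            const ZERO: Self = 0;
            const ONE: Self = 1;
            fn checked_inc(self) -> Option<Self> {
                self.checked_add(1)
            }
        }
    )*};
}
impl_count!(u8, u16, u32, usize);

/// Hypothetical error returned when a counter is already at its maximum.
#[derive(Debug, PartialEq, Eq)]
pub struct Overflow;

/// Count header parameterized over both counter types,
/// mirroring the `strong`/`weak` fields of the `RcBox` sketch.
pub struct Counts<S: Count, W: Count> {
    strong: Cell<S>,
    weak: Cell<W>,
}

impl<S: Count, W: Count> Counts<S, W> {
    /// Starts with one strong reference and no weak references.
    pub fn new() -> Self {
        Counts { strong: Cell::new(S::ONE), weak: Cell::new(W::ZERO) }
    }

    /// Checked increment: leaves the count untouched on overflow.
    pub fn inc_strong(&self) -> Result<(), Overflow> {
        let next = self.strong.get().checked_inc().ok_or(Overflow)?;
        self.strong.set(next);
        Ok(())
    }

    /// Same as `inc_strong`, for the weak count.
    pub fn inc_weak(&self) -> Result<(), Overflow> {
        let next = self.weak.get().checked_inc().ok_or(Overflow)?;
        self.weak.set(next);
        Ok(())
    }

    pub fn strong(&self) -> S {
        self.strong.get()
    }

    pub fn weak(&self) -> W {
        self.weak.get()
    }
}
```

With `Counts<u16, u16>` the header takes 4 bytes instead of two machine words, at the price of a fallible increment that every caller has to handle.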
I'm not sure if this is doable in Rust, but Objective-C does smart pointers (via Automatic Reference Counting) with the reference count stored in the most significant 16 bits of the pointer on 64-bit platforms. That allows them to use a single word to represent a smart pointer, since weak references are stored in the object itself (weak references are a bit different in Obj-C since they are tracked by pointer and zeroed when the original goes away). Apparently it also helps with atomic updates to the reference count, since everything is in a single word. |
Doesn't that mean they're storing the reference count at the non-shared end of pointers? i.e. copying/cloning one reference isn't reflected in the counts of other references. |
Maybe it's embedded in their object class info pointer or something. |
I think it would be better to consider exposing a subset of the type without weak pointer support before adding another failure case. |
My bad, I obviously was not thinking clearly. They store the ref count in the |
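A rough sketch of the bit-packing being described, assuming the packed word lives in the shared object rather than in each reference, and assuming addresses fit in the low 48 bits of a 64-bit word. The function names and the exact 16/48 split are illustrative assumptions, not the Objective-C runtime's actual layout.

```rust
/// Assumed layout: the top 16 bits hold the count, the low 48 bits hold an address.
const COUNT_BITS: u32 = 16;
const COUNT_SHIFT: u32 = 64 - COUNT_BITS;
const ADDR_MASK: u64 = (1u64 << COUNT_SHIFT) - 1;

/// Packs an address and a 16-bit count into one word.
pub fn pack(addr: u64, count: u16) -> u64 {
    debug_assert!(addr & !ADDR_MASK == 0, "address must fit in 48 bits");
    (addr & ADDR_MASK) | ((count as u64) << COUNT_SHIFT)
}

/// Extracts the address bits.
pub fn addr(word: u64) -> u64 {
    word & ADDR_MASK
}

/// Extracts the count bits.
pub fn count(word: u64) -> u16 {
    (word >> COUNT_SHIFT) as u16
}

/// Increments the packed count, or returns `None` if the 16-bit field is full.
pub fn retain(word: u64) -> Option<u64> {
    count(word).checked_add(1).map(|c| pack(addr(word), c))
}
```

Because count and address share one word, a single compare-and-swap could update both, which is the atomicity benefit mentioned earlier in the thread. A 16-bit field still overflows after 65535 references, so it needs the same overflow handling discussed elsewhere here.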
To throw another idea in there: could a compact refcount switch to an extended area when it reaches a certain value? Probably too much complexity. I think anyone actually selecting a compact refcount would be doing it for a good reason (e.g. I know I'm not going to have over 4 billion objects referring to a texture, because the vast majority of the 8 GB of memory is actually storing textures, not objects). |
Why are two words used anyway? Can't the weak ref and strong ref be 32-bit on 64-bit platforms and 16-bit on 32-bit platforms? EDIT: OK, answer is safety. |
You can certainly make more than 65536 references on 32-bit, and more than 4294967296 on 64-bit. In addition to making the type less scalable, it would break memory safety without adding overflow checks. |
You can easily have more than 65536 references to an object on a 32-bit platform:

    let x = Rc::new(1);
    let v = range(0, 1_000_000).map(|_| x.clone()).collect::<Vec<Rc<int>>>();

(You may regard this as unlikely, but it's still entirely possible. |
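The snippet above uses 2014-era syntax (`range`, `int`). A minimal equivalent in current Rust is sketched below; the helper name `many_clones` is made up for illustration.

```rust
use std::rc::Rc;

/// Builds `n` extra strong references to a single `Rc<i32>`.
/// Returns the original handle together with the clones so they stay alive.
pub fn many_clones(n: usize) -> (Rc<i32>, Vec<Rc<i32>>) {
    let x = Rc::new(1);
    let v: Vec<Rc<i32>> = (0..n).map(|_| Rc::clone(&x)).collect();
    (x, v)
}
```

Calling `many_clones(1_000_000)` leaves `Rc::strong_count` at 1_000_001, well past what a 16-bit counter could represent, while the vector of clones costs only about 8 MB on a 64-bit target.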
Creating more than 2^32 references at 8 bytes a pop only requires 32GiB of memory. |
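The arithmetic behind that figure, written out as constants (assuming 8-byte references, as stated above):

```rust
/// Number of references needed to overflow a 32-bit counter.
pub const REFS: u64 = 1 << 32;
/// Size of one reference on a 64-bit platform, in bytes.
pub const BYTES_PER_REF: u64 = 8;
/// Total memory occupied by the references alone: 2^32 * 8 = 2^35 bytes.
pub const TOTAL_BYTES: u64 = REFS * BYTES_PER_REF;
/// One gibibyte is 2^30 bytes.
pub const GIB: u64 = 1 << 30;
/// 2^35 / 2^30 = 2^5 = 32 GiB.
pub const TOTAL_GIB: u64 = TOTAL_BYTES / GIB;
```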
How expensive would it be to just have the refcount overflow check, where the "handling" code is just statically predicted as unlikely? Wouldn't this just be a well-predicted jump conditional on the overflow flag? Isn't that crazy cheap? Also, this is a cost that's only paid when creating and destroying a bunch of references to something, which shouldn't be especially common. As far as I know, the "best practice" when dealing with refcounts is to borrow it and use the borrowed pointer as much as possible. |
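One way such a check could look in current Rust, as a sketch only: the overflow handler is marked `#[cold]` so the compiler treats that branch as unlikely, and it is assumed here to abort the process rather than unwind. The function names and the `u32` counter width are illustrative assumptions.

```rust
use std::cell::Cell;

/// Out-of-line handler; `#[cold]` hints that calls to it are unlikely.
/// Assumption: abort instead of panicking, so no unwinding path is added.
#[cold]
#[inline(never)]
fn refcount_overflow() -> ! {
    std::process::abort()
}

/// Tries to bump the count; returns `false` and leaves it unchanged on overflow.
#[inline]
pub fn try_inc(count: &Cell<u32>) -> bool {
    match count.get().checked_add(1) {
        Some(next) => {
            count.set(next);
            true
        }
        None => false,
    }
}

/// Bumps the count, diverting to the cold handler if it would overflow.
#[inline]
pub fn inc_or_abort(count: &Cell<u32>) {
    if !try_inc(count) {
        refcount_overflow();
    }
}
```

On the hot path this is an add plus one conditional branch; whether that cost is acceptable is exactly what the rest of the thread debates.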
I can see that the default most useful to most people is uint refcounts. There are situations where you control the number of objects, though, e.g. when you explicitly manage textures and numbers of objects to fit within memory and frame-rate budgets. There are plenty of situations where you might be using a 64-bit address space but 32 or even 16 bits' worth of 'count' handles any 'management'. In games the majority of memory is textures, then vertex arrays, and the CPU doesn't traverse these at fine grain; it just tells the GPU what to do with large batches. But back on the Xbox 360 and PS3 we were kept very busy shaving bytes off control structures to prevent the cache misses that crippled the CPU, and fiddling with alignment to keep things on cache-line boundaries. (... and reworking things to avoid branches, which also crippled its pipeline, sadly. Worst of all worlds: even extra checks wouldn't have been acceptable; you'd have needed the option to compile them out.) |
I think if you have a specialty case, you would just make your own |
@cgaebel: Adding new sources of unwinding is never cheap. It breaks many optimization passes all the way up the stack. If it called |
I guess if the language gets HKT in the future, algorithms will be able to abstract over custom pointer types :) |
Would you consider the same thing for vectors and slices?
That would end a lot of the pain I was having with casting indices, in the right way. I gather the Rust compiler itself has u32 node IDs. It's this middle ground of machines with 4, 8, or 16 GB where 64-bit addressing is overkill but 32 bits is insufficient, and segmenting things into multiple 32-bit spaces per resource works well. u32 indexing would be my most common case. I know the Servo people also perceive problems with pointer overhead: they want to express a node hierarchy, and I suspect their use case might suit this sort of thing: an array of nodes with 32-bit indexing, or 32-bit offsets within an arena with a max size of 4 GB for the DOM when running on phones. With objects of 16-byte alignment (which you want for SIMD vec4 types) a 32-bit index is sufficient to cover 64 GB, and it's more likely your memory is divided between different classes of resource anyway. I've also heard talk of a 'smallvector' elsewhere. Might parameterizing the index (and allocator) mean the Vec can do that job? |
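The "array of nodes with 32-bit indexing" idea can be approximated in user code today. The sketch below is an illustration under assumptions, not Servo's or rustc's actual design: `NodeId`, `Arena`, and `Node` are hypothetical names.

```rust
/// A 4-byte handle into an arena, used in place of an 8-byte pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

/// A growable arena whose elements are addressed by `NodeId`.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores `value` and returns its handle, or `None` once the
    /// arena holds more items than a `u32` index can address.
    pub fn alloc(&mut self, value: T) -> Option<NodeId> {
        if self.items.len() > u32::MAX as usize {
            return None;
        }
        let id = NodeId(self.items.len() as u32);
        self.items.push(value);
        Some(id)
    }

    /// Looks up a handle; panics if the handle is out of bounds.
    pub fn get(&self, id: NodeId) -> &T {
        &self.items[id.0 as usize]
    }
}

/// Example tree node that links to its parent by index.
pub struct Node {
    pub parent: Option<NodeId>,
    pub value: i32,
}

/// Bytes addressable with a 32-bit index over 16-byte elements: 2^32 * 16.
pub const COVERED_BYTES: u64 = (1u64 << 32) * 16;
```

`COVERED_BYTES` works out to 64 GiB, matching the estimate above. The trade-off is that every access goes through the arena and a handle is only meaningful for the arena that issued it.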
The small vector optimization is the opposite of what you're proposing. |
Is there a link describing the 'small vector' then? I'd also heard slices 'might fill a niche a bit like small vectors', but I think slices can be slices into large vectors... |
Slices and smaller index fields are both unrelated to the small vector optimization. The libc++ implementation of A more general small vector: http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h |
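To make the distinction concrete, here is a toy version of the small vector optimization: the first `N` elements live inline in the value itself, and the container spills to the heap only when it grows past that. It is fixed to `u32` elements for brevity and is not modeled on the libc++ or LLVM code; the `SmallVec` name and layout here are illustrative assumptions.

```rust
/// Toy small vector: up to `N` elements inline, then a heap `Vec`.
pub enum SmallVec<const N: usize> {
    Inline { len: usize, buf: [u32; N] },
    Heap(Vec<u32>),
}

impl<const N: usize> SmallVec<N> {
    /// Starts empty, with no heap allocation.
    pub fn new() -> Self {
        SmallVec::Inline { len: 0, buf: [0; N] }
    }

    /// Appends a value, moving everything to the heap when the inline buffer is full.
    pub fn push(&mut self, value: u32) {
        match self {
            SmallVec::Inline { len, buf } => {
                if *len < N {
                    buf[*len] = value;
                    *len += 1;
                } else {
                    // Inline storage exhausted: copy out and switch representation.
                    let mut spilled = buf[..*len].to_vec();
                    spilled.push(value);
                    *self = SmallVec::Heap(spilled);
                }
            }
            SmallVec::Heap(v) => v.push(value),
        }
    }

    /// Views the stored elements regardless of where they live.
    pub fn as_slice(&self) -> &[u32] {
        match self {
            SmallVec::Inline { len, buf } => &buf[..*len],
            SmallVec::Heap(v) => v.as_slice(),
        }
    }

    /// True while no heap allocation has been made.
    pub fn is_inline(&self) -> bool {
        matches!(self, SmallVec::Inline { .. })
    }
}
```

Note that this makes the handle itself larger in order to avoid allocating, whereas narrower index fields aim to make the handle smaller.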
Merged as RFC 13, but not accepted. This discussion was had previously when the original decision was made to merge |
Was the idea of parameterizing the refcount type itself also rejected? I know there are details with introducing a fail case if you use a smaller refcount type, but this would save the community from implementing their own variations to get the desired behaviour (something that will happen many times, independently). For those of us who target machines with 4-16 GB of RAM, uint counts and indices everywhere are wasteful, and 32-bit builds are insufficient.