Skip to content

WIP: Add Faux allocator API #19320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 25, 2014

This adds a alloc::faux module that implements an interpretation of @pnkfelix's proposed high-level allocator API (rust-lang/rfcs#244), optimized for the libstd usecases. This moves all of the tedious oom/zst/capacity-overflow logic into a central place, so that collections and other things don't need to think about it.

As part of this PR I have migrated RingBuf to this API as a proof of concept. Migrating Vec will likely be more tricky, because it does a lot more special tricks based on ZSTs, making the logic more tangled. It is possible that migrating to this API would cause a perf regression for duplicated checks/extra branching. However most of the checks are largely based on compile-time-constants, so I am optimistic about LLVM's ability to optimize these to death.

I guessed out the best semantics I could for some iffy things, like how the alloc_array API should handle len = 0.

I also am considering adding triple_alloc_array/triple_dealloc_array methods to handle the HashMap usecase (which BTree can also benefit from longterm).

@rust-highfive
Copy link
Contributor

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

@pnkfelix
Copy link
Member

cc me

@Gankra
Copy link
Contributor Author

Gankra commented Nov 25, 2014

CC @thestinger @aturon @huonw

@Gankra
Copy link
Contributor Author

Gankra commented Nov 25, 2014

@pczarn looking at HashMap now, you only store the one pointer in the RawTable. Did you look at the relative perf of keeping the ptrs around? Would like the API to just return/take 3 ptrs. Not expose the details.

// else UB anyway.
let result_size = heap::reallocate_inplace(ptr as *mut u8, size * old_len,
desired_size, align);
if result_size == usable_size(desired_size, align) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add a lot of overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was under the impression it was necessary to check. Or is it only for shrinks? If that's the case, it might make sense to split this into grow and shrink?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Growth only requires a check for new_size >= requested_size. The usable_size check is only necessary to see if shrinking would have succeeded if heap::reallocate had been used instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is it reasonable to forbid new_len == old_len?
  • Is it then possible to just check new_size != old_size unconditionally?

The docs seem unclear to me on how to evaluate if a shrink is successful, whatever that really means. Or is that just something we shouldn't care about? Like if a shrink fails "oh well"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to forbid new_len == old_len?

That's not going to work around the issue. There's a whole range of sizes corresponding to the same real size.

Is it then possible to just check new_size != old_size unconditionally?

No, because the returned size is the real size.

The docs seem unclear to me on how to evaluate if a shrink is successful, whatever that really means. Or is that just something we shouldn't care about? Like if a shrink fails "oh well"?

You can check for a shrink failure with the usable_size check. I don't think it's a common use case.

@thestinger
Copy link
Contributor

This would significantly regress vector performance with all of the extra branches / code bloat so it's not going to be useful there. I don't think it makes sense for hash tables and ring buffers to use the low-level allocator API if they're going to be accepting needless overhead.

It would make more sense to reuse Vec<T> (perhaps stored as Box<[T]>) with a call to set_len(0) at the end. It's already a relatively low-level wrapper around buffer allocation / reallocation.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 26, 2014

How can HashMap reasonably use Vec to request the kind of buffer it wants? It wants a single buffer to store three different arrays of different kinds of elements.

Also I don't think RingBuf should ultimately use Vec as it wants finer control over reallocation than Vec can provide (though it doesn't do this yet).

@Gankra
Copy link
Contributor Author

Gankra commented Nov 26, 2014

Benches:

On my crappy laptop (x64 windows in mingw):

test bench::ring_buf::bench_faux::bench_push_10        ... bench:       209 ns/iter (+/- 6)
test bench::ring_buf::bench_faux::bench_push_100       ... bench:      1323 ns/iter (+/- 20)
test bench::ring_buf::bench_faux::bench_push_10000     ... bench:    120749 ns/iter (+/- 9442)
test bench::ring_buf::bench_faux::bench_push_cap_10    ... bench:       134 ns/iter (+/- 19)
test bench::ring_buf::bench_faux::bench_push_cap_100   ... bench:       971 ns/iter (+/- 46)
test bench::ring_buf::bench_faux::bench_push_cap_10000 ... bench:    109398 ns/iter (+/- 51145)

test bench::ring_buf::bench_old::bench_push_10         ... bench:       223 ns/iter (+/- 8)
test bench::ring_buf::bench_old::bench_push_100        ... bench:      1387 ns/iter (+/- 42)
test bench::ring_buf::bench_old::bench_push_10000      ... bench:    122797 ns/iter (+/- 3298)
test bench::ring_buf::bench_old::bench_push_cap_10     ... bench:       142 ns/iter (+/- 8)
test bench::ring_buf::bench_old::bench_push_cap_100    ... bench:      1016 ns/iter (+/- 14)
test bench::ring_buf::bench_old::bench_push_cap_10000  ... bench:    111685 ns/iter (+/- 4981)

No perf regression visible. Benches visible here, tested with cargo bench. You can run them yourself by pulling down this branch and using the latest nightly.

Note that this is with the len == 0 check removed as per the version of faux in that branch.

Feel free to suggest changes to my benches, or run it on your own hardware.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 27, 2014

Removed the len = 0 checks in favour of UB. doc'd UB better. Tentatively added try_grow_inplace and try_shrink_inplace in favour of realloc_array_inplace. Not clear if we have any usecases where the caller is unclear if they want to grow/shrink the allocation.

EMPTY as *mut T
} else {
let ptr = heap::allocate(size, min_align_of::<T>()) as *mut T;
if ptr == null_mut() { ::oom(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr.is_null()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derp

@cgaebel
Copy link
Contributor

cgaebel commented Dec 3, 2014

LGTM. Although ideally I'd like to see that hashmap+btree integration instead of abstracting with only one consumer.

@thestinger
Copy link
Contributor

I don't think it's going to work for HashMap. It doesn't allocate N elements of the same type in a single array allocation.

@thestinger
Copy link
Contributor

I don't see why this should abort but the heap API should not. It doesn't make any sense. This crate is meant to be usable in cases where allocation errors need to be handled. Ignoring allocation failure by aborting is not supposed to be convenient. It is not a good way of handling the error, and the fact that the standard libraries approach it that way is a problem. It doesn't make sense to encode that anti-pattern into this low-level crate.

@thestinger
Copy link
Contributor

The fact that the higher level crates cannot be used in cases where robustness in the situation of memory exhaustion is required is not an excuse to ruin more of this crate too. It's a problem that should be addressed, not a good design pattern to emulate.

@cgaebel
Copy link
Contributor

cgaebel commented Dec 3, 2014

The work as-is isn't enough for HashMap, but is a valuable starting point. @gankro is still working on a nice API for that use case.

meta: "ruin" is a little melodramatic, don'tcha think?

@thestinger
Copy link
Contributor

I don't think it's melodramatic to refer to objectively bad design that way. It's not acceptable to abort on runtime errors. In general, out-of-memory is a runtime error. There are many situations where it needs to be handled and this crate is intended to be used there. You can't write truly robust code with the higher level standard libraries.

@thestinger
Copy link
Contributor

Simply aborting on a runtime error and then pointing out that the API is more convenient is lazy. There are two or three places in the standard libraries where it's going to be used, and explicitly aborting there is not a big deal. On the other hand, it makes the library totally inappropriate for use cases like a kernel or embedded where robustness is required.

@Gankra
Copy link
Contributor Author

Gankra commented Dec 3, 2014

@thestinger Would an in-between API that returns a nullable ptr that this API wraps, checks, and panics for be acceptable?

@Gankra
Copy link
Contributor Author

Gankra commented Dec 3, 2014

Basically I agree that always panicing/aborting isn't appropriate for all usecases, but it's a common enough strategy that having some API that does it for you seems reasonable to me.

@thestinger
Copy link
Contributor

I don't think it's a common use case. There are not meant to be many users of this API, and the fact that more and more code is calling it directly is bad design. There isn't a good excuse for reinventing the same wheels over and over via low-level unsafe code when there's no performance overhead to reusing a safe or semi-safe abstraction.

@thestinger
Copy link
Contributor

There should only be one user of this API, like a version of Box<[T]> not automatically destroying the inner elements and not exposing a way to read the elements beyond a method returning a raw pointer. It can safely allow writes to them.

@thestinger
Copy link
Contributor

(and with only one user of the API, it doesn't make sense to have it)

@thestinger
Copy link
Contributor

Even the existence of a heap::reallocate function making the new allocation, copying to it and deallocating the original is questionable. It's there because of the C standard library legacy, but in Vec<T> it probably makes more sense to do the copy by hand so it doesn't copy the uninitialized slack at the end from size class rounding or unusable reserved space. It has a big impact (up to 40%) on string / vector concatenation due to the power of 2 rounding.

@thestinger
Copy link
Contributor

The same thing applies to every other user of heap::reallocate that I can think of. The only advantage of the heap::reallocate API is the fact that it can potentially avoid some duplicate API entry point costs.

@thestinger
Copy link
Contributor

The only reason I can think of is the ability to use mremap, but TCMalloc / jemalloc already can't use it and a better mremap API like what I proposed would be usable by the caller.

@eddyb
Copy link
Member

eddyb commented Dec 10, 2014

To enforce non-zero length values, you could use @luqmana's NonZero - given the intended optimizations for it, Option<NonZero<uint>> should behave just like an uint where 0 is special wrt pattern-matching - for which responsibility would be moved to the callee without introducing unsafety.

@Gankra
Copy link
Contributor Author

Gankra commented Dec 18, 2014

Closing this for now. @pnkfelix would rather a more robust longterm solution. I have moved this to my raw-rs project.

@Gankra Gankra closed this Dec 18, 2014
lnicola pushed a commit to lnicola/rust that referenced this pull request Apr 28, 2025
Observe unsafeness when generating manual impls of former derives
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants