-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesing of PooledArray internals #64
Comments
I'd split this into separate issues as these are quite orthogonal. I think locking is worth investigating, but I'm less convinced by using Storing |
I agree, but I want to first have a roadmap and then implement things separately. However, they are all needed to reach good performance.
The point is that we need to avoid having to reallocate the pool if we go between non-missing and allow-missing arrays. But indeed we could always keep
so how would you "detach" the pool and invpool of one |
What would you say if we decided that |
Alternatively internal |
I have drafted the changes for storing |
Things to do:
missing
as a special value that is not pooled, probably with level0
. This would work the same as in CategoricalArrays.jl; the benefit is that twoPooledArrays
differing only in the fact if they allowMissing
or not could share poolsetindex!
but make sure that we support batch operations of adding levels (both insetindex!
and in e.g.copyto!
); this will allow to fully drop Copy-On-Write and never copy pool and invpool by default; tentativelyunsafe_setindex!
would be an alternative that does not use lockinvpool
is not safe if potentially other threads are modifying it (this should not be a problem)droplevels!
to DataAPI.jl and to PooledArrays.jl (this requires also a change in CategoricalArrays.jl); this function would reduce pool and invpool to only used levels and also at the same time make a fresh copy of them (as a way to detach pool and invpool between PooledArray-s)I think this design is better than global pool. It will still cost us a bit in H2O benchmarks, but at least we avoid a global pool that is not reclaimable.
@nalimilan + @quinnj : any additional comments on this?
The text was updated successfully, but these errors were encountered: