Redesign of PooledArray internals #64

Things to do:
* treat `missing` as a special value that is not pooled, probably with level `0`. This would work the same way as in CategoricalArrays.jl; the benefit is that two `PooledArray`s that differ only in whether they allow `Missing` could share a pool
* add locking for `setindex!`, but make sure that we support batch operations that add levels (both in `setindex!` and in e.g. `copyto!`); this would allow us to fully drop copy-on-write and never copy `pool` and `invpool` by default; tentatively, `unsafe_setindex!` would be an alternative that does not take the lock
* stress in the documentation that using `invpool` is not safe if other threads may be modifying it (this should not be a problem)
* add `droplevels!` to DataAPI.jl and to PooledArrays.jl (this also requires a change in CategoricalArrays.jl); this function would reduce `pool` and `invpool` to only the levels in use and, at the same time, make a fresh copy of them (as a way to detach `pool` and `invpool` between `PooledArray`s)
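
To make the list above more concrete, here is a rough sketch of how the pieces could fit together. It is only an illustration: `SharedPool`, `ToyPooled`, `getcode!`, `toy_setindex!`, `toy_getindex` and `toy_droplevels!` are hypothetical names made up for this example, not the current or planned PooledArrays.jl API, and batch insertion and `unsafe_setindex!` are left out.

```julia
# Hypothetical toy types; NOT the PooledArrays.jl implementation.
struct SharedPool{T}
    pool::Vector{T}            # level code k (k >= 1) maps to pool[k]
    invpool::Dict{T,UInt32}    # value => level code
    lock::ReentrantLock        # guards `pool` and `invpool`
end
SharedPool{T}() where {T} = SharedPool{T}(T[], Dict{T,UInt32}(), ReentrantLock())

mutable struct ToyPooled{T}
    refs::Vector{UInt32}       # 0 encodes `missing`, which is never pooled
    shared::SharedPool{T}      # may be shared by several arrays, never copied on write
end

# Look up the code of `v`, adding a new level if needed, while holding the lock.
function getcode!(sp::SharedPool{T}, v::T) where {T}
    lock(sp.lock) do
        get!(sp.invpool, v) do
            push!(sp.pool, v)
            UInt32(length(sp.pool))
        end
    end
end

# Locked assignment; `missing` is stored as code 0 and does not touch the pool.
function toy_setindex!(a::ToyPooled{T}, v, i::Integer) where {T}
    a.refs[i] = v === missing ? UInt32(0) : getcode!(a.shared, convert(T, v))
    return a
end

function toy_getindex(a::ToyPooled, i::Integer)
    r = a.refs[i]
    return r == 0 ? missing : a.shared.pool[r]
end

# Keep only the levels used by `a` and detach it from the pool it shared so far.
function toy_droplevels!(a::ToyPooled{T}) where {T}
    old = a.shared
    fresh = SharedPool{T}()
    newrefs = zeros(UInt32, length(a.refs))
    lock(old.lock) do
        for (i, r) in pairs(a.refs)
            r == 0 && continue            # `missing` stays at code 0
            newrefs[i] = getcode!(fresh, old.pool[r])
        end
    end
    a.refs = newrefs
    a.shared = fresh
    return a
end

sp = SharedPool{String}()
a = ToyPooled{String}(zeros(UInt32, 3), sp)   # starts as all `missing`
b = ToyPooled{String}(zeros(UInt32, 2), sp)   # shares the pool with `a`
toy_setindex!(a, "x", 1)
toy_setindex!(b, "y", 1)
toy_setindex!(a, "y", 2)                      # reuses the level added through `b`
toy_droplevels!(b)                            # `b` now owns a pool with only "y"
```

In this sketch, the lock lives next to `pool` and `invpool` rather than in each array, so every array sharing the pool goes through the same lock. A batch operation such as `copyto!` would presumably take the lock once for the whole batch instead of once per element.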

I think this design is better than a global pool. It will still cost us a bit in the H2O benchmarks, but at least we avoid a global pool that is not reclaimable.

@nalimilan + @quinnj : any additional comments on this?
