Redesign of PooledArray internals #64

Open
@bkamins

Things to do:

  • treat missing as a special value that is not pooled, probably with level 0. This would work the same way as in CategoricalArrays.jl; the benefit is that two PooledArrays that differ only in whether they allow Missing could share a pool
  • add locking to setindex!, but make sure we support batch operations that add levels (both in setindex! and in e.g. copyto!); this will allow us to drop Copy-On-Write entirely and never copy pool and invpool by default; tentatively, unsafe_setindex! would be an alternative that does not take the lock
  • stress in the documentation that using invpool is not safe if other threads may be modifying it (this should not be a problem)
  • add droplevels! to DataAPI.jl and to PooledArrays.jl (this also requires a change in CategoricalArrays.jl); this function would reduce pool and invpool to only the levels in use and at the same time make a fresh copy of them (as a way to detach pool and invpool between PooledArrays)
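
To make the list above concrete, here is a rough sketch of how the pieces could fit together. It is a toy model in plain Base Julia, not the PooledArrays.jl implementation: the type ToyPooled, the helpers emptytoy, sharepool, getvalue, setvalue! and toy_droplevels!, the UInt32 ref type, and the idea of storing the lock next to the pool are all assumptions made for illustration. Ref 0 stands for missing, adding a level takes the lock, and dropping levels rebuilds pool and invpool so the array is detached from whatever it shared them with.

```julia
# Toy model only: every name below is hypothetical, not PooledArrays.jl API.
mutable struct ToyPooled{T}
    refs::Vector{UInt32}        # 0 means missing; k > 0 points at pool[k]
    pool::Vector{T}             # levels, possibly shared between arrays
    invpool::Dict{T,UInt32}     # level => index into pool
    lk::ReentrantLock           # guards pool and invpool while levels are added
end

# New array of length n, all missing, with its own empty pool.
emptytoy(::Type{T}, n::Integer) where {T} =
    ToyPooled{T}(zeros(UInt32, n), T[], Dict{T,UInt32}(), ReentrantLock())

# New array of length n that shares pool, invpool and lock with `a` (no copy).
sharepool(a::ToyPooled{T}, n::Integer) where {T} =
    ToyPooled{T}(zeros(UInt32, n), a.pool, a.invpool, a.lk)

getvalue(a::ToyPooled, i::Integer) =
    a.refs[i] == 0 ? missing : a.pool[a.refs[i]]

function setvalue!(a::ToyPooled{T}, v, i::Integer) where {T}
    if v === missing
        a.refs[i] = 0           # missing never enters the pool
        return a
    end
    x = convert(T, v)
    lock(a.lk) do
        # get! appends the level only if it is new; existing indices never move,
        # so other arrays sharing the pool stay valid.
        a.refs[i] = get!(a.invpool, x) do
            push!(a.pool, x)
            UInt32(length(a.pool))
        end
    end
    return a
end

# Keep only the levels in use and give `a` fresh, unshared pool and invpool.
function toy_droplevels!(a::ToyPooled{T}) where {T}
    lock(a.lk) do
        used = sort!(unique(filter(!iszero, a.refs)))
        remap = Dict{UInt32,UInt32}(old => UInt32(new) for (new, old) in enumerate(used))
        pool = T[a.pool[old] for old in used]
        invpool = Dict{T,UInt32}(pool[k] => UInt32(k) for k in eachindex(pool))
        for i in eachindex(a.refs)
            r = a.refs[i]
            r == 0 || (a.refs[i] = remap[r])
        end
        a.pool = pool
        a.invpool = invpool
    end
    a.lk = ReentrantLock()      # detached pool gets its own lock
    return a
end

a = emptytoy(String, 3)
setvalue!(a, "x", 1); setvalue!(a, "y", 2); setvalue!(a, missing, 3)
b = sharepool(a, 2)             # same pool object as `a`
setvalue!(b, "y", 1)            # reuses the existing level, nothing is copied
shared_before = a.pool === b.pool
toy_droplevels!(b)              # `b` now owns a pool with just the level it uses
```

In this sketch a batch operation such as copyto! would take the lock once around the whole loop instead of once per element, and an unsafe_setindex!-style variant would simply skip the lock call.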

I think this design is better than a global pool. It will still cost us a bit in the H2O benchmarks, but at least we avoid a global pool that is not reclaimable.

@nalimilan + @quinnj: any additional comments on this?
