Constructor to create PooledArray sharing a pool from another PooledArray #69
Comments
This should never be done.
The signature is up for discussion; the implementation should be something like (probably something more efficient, but I am showing that we already almost have it):
and now |
Forgive my ignorance, but would renaming a pool entry corrupt the PooledArray? What would be the idiomatic and safe way to change all occurrences of a single pool item?
Note, though, that the points described in step 3 are implementation details that might change in the future. We do not expose this as an official API because we do copy-on-write of the pool, which means that several arrays can share the same pool, so exposing such functionality would be very error prone.
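To make the copy-on-write idea concrete, here is a minimal toy sketch. It is not the PooledArrays.jl implementation and it is not thread-safe; `CowPooled`, `share`, `setvalue!` and the `owners` counter are hypothetical names invented for illustration. It only shows why two arrays can share a pool safely as long as the pool is copied before one of them adds a value.

```julia
# Toy model of copy-on-write pool sharing. NOT the PooledArrays.jl code,
# and NOT thread-safe. All names are hypothetical.
mutable struct CowPooled{T}
    refs::Vector{Int}            # per-element indices into `pool`
    pool::Vector{T}              # unique values, possibly shared
    owners::Base.RefValue{Int}   # number of arrays currently sharing `pool`
end

function CowPooled(xs::Vector{T}) where {T}
    pool = unique(xs)
    refs = Int[findfirst(==(x), pool) for x in xs]
    return CowPooled{T}(refs, pool, Ref(1))
end

# Sharing is cheap: copy only the codes and bump the owner count.
function share(a::CowPooled{T}) where {T}
    a.owners[] += 1
    return CowPooled{T}(copy(a.refs), a.pool, a.owners)
end

# Adding a new value must not be visible to the other owners,
# so a shared pool is copied before it is extended.
function setvalue!(a::CowPooled{T}, v::T, i::Int) where {T}
    idx = findfirst(==(v), a.pool)
    if idx === nothing
        if a.owners[] > 1
            a.owners[] -= 1          # leave the shared pool to the others
            a.pool = copy(a.pool)
            a.owners = Ref(1)
        end
        push!(a.pool, v)
        idx = length(a.pool)
    end
    a.refs[i] = idx
    return a
end

x = CowPooled(["a", "b", "a"])
y = share(x)
shared_before = x.pool === y.pool   # both arrays alias one pool here
setvalue!(y, "c", 1)                # triggers the copy; x is unaffected
```

In this sketch, mutating a shared pool in place (for example renaming an entry) would silently change every array that aliases it, which is the hazard described above.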
Could you describe your use case? It would make sense to provide a constructor to share pools between arrays, but that would only have an effect on performance. What you seem to be asking for is a way to synchronize arrays (a kind of "spooky action at a distance"? :-D).
In PopGen.jl the main data struct is
The |
I see. That sounds like a legitimate use case for this, but it's tricky in particular because of thread safety issues. @bkamins worked hard to find a way to implement copy-on-write of the pools, ensuring that while two arrays can share their pools (e.g. with

This wouldn't be a problem if we had a way to make pools thread-safe, but that appears to be impossible to do without completely killing performance of

That said, if you really don't care about thread safety, it should be relatively easy to allow creating two arrays that totally share their pools. But then your users may get trapped by this if they mutate the columns of the data frames. Or maybe they are not supposed to do it at all?
It is already possible - just use an inner constructor explicitly.
This is not that bad, as currently we do not allow removing levels from the pool: you can only add levels, so essentially only thread safety is an issue. However, this might change in the future, see JuliaData/DataAPI.jl#31.
In the case where you have 2+ PooledArrays and you want them to share a single pool, it would be useful to have a constructor that's something like
where `bar` is a PooledArray of 100 elements that shares the `pool` of `foo`. Therefore, if one were to change `foo.pool[1] = "zebra"`, every occurrence of `"a"` in `bar` would become `"zebra"`.
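The requested behaviour can be illustrated with a toy type that only uses Base. This is a sketch of the semantics being asked for, not PooledArrays.jl code; `ToyPooled` is a hypothetical name. The real `PooledArray` also keeps an inverse lookup (`invpool`) that a rename like this would leave out of date, which is one reason direct pool mutation is not an official API.

```julia
# Toy model, NOT the PooledArrays.jl implementation: integer codes plus a
# pool of unique values that the codes index into.
struct ToyPooled{T} <: AbstractVector{T}
    refs::Vector{Int}   # refs[i] is the index of element i in `pool`
    pool::Vector{T}     # unique values; may be aliased by several arrays
end

Base.size(a::ToyPooled) = size(a.refs)
Base.IndexStyle(::Type{<:ToyPooled}) = IndexLinear()
Base.getindex(a::ToyPooled, i::Int) = a.pool[a.refs[i]]

pool = ["a", "b", "c"]
foo = ToyPooled([1, 2, 3, 1], pool)
bar = ToyPooled(rand(1:3, 100), pool)   # same pool object, no copy

n_a = count(==("a"), bar)   # occurrences of "a" before the rename
foo.pool[1] = "zebra"       # rename the first pool entry through `foo`
n_zebra = count(==("zebra"), bar)
```

Because `foo` and `bar` alias one `Vector`, the rename is visible through both without touching any of the 100 codes in `bar`.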
The constructor would look something like
I'm not sure of what a clean and efficient process for that would look like. Perhaps since the pool already exists, replacing every occurrence of a unique element with the corresponding pool index?
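One way to read "replacing every occurrence of a unique element with the corresponding pool index" is sketched below. `encode_with_pool` is a hypothetical helper, not part of PooledArrays.jl; it assumes every value is already present in the pool and raises an error otherwise.

```julia
# Hypothetical helper (not part of PooledArrays.jl): encode `vals` as
# integer codes into an already existing pool of unique values.
function encode_with_pool(vals::AbstractVector{T}, pool::Vector{T}) where {T}
    # Inverse lookup built once: value => index in `pool`.
    invpool = Dict{T,Int}(v => i for (i, v) in enumerate(pool))
    refs = Vector{Int}(undef, length(vals))
    for (i, v) in enumerate(vals)
        # Fail loudly on values the pool does not contain.
        refs[i] = get(invpool, v) do
            throw(ArgumentError("value $(repr(v)) is not in the pool"))
        end
    end
    return refs
end

pool = ["a", "b", "c"]
refs = encode_with_pool(["c", "a", "a", "b"], pool)
decoded = pool[refs]   # indexing the pool with the codes restores the values
```

Building the dictionary once keeps the encoding pass linear in the number of elements, instead of searching the pool for every element.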