-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate the (new) _array methods by allowing both parent(u) and parent[u] access? #2015
Comments
It might be worth some perf testing. If doing it this way makes |
Great idea @hyanwong, very clever! I doubt many people are using However, we do need to be careful with performance here. Counterintuitively, doing We could do something like class QuintuplyLInkedTreeArray(np.ndarray):
def __call__(self, index):
return self._get_method(index)
class Tree
def __init__(self, ll_tree):
....
self._parent = self.ll_tree.parent_array.view(QuintuplyLInkedTreeArray)
self._parent._get_method = ll_tree.get_parent
# Ideally we'd pass the get_method in with the contstructor, but not sure if that's possible.
@property
def parent(self):
return self._parent This should be fully compatible and have no loss of perf then? |
So I guess we want to say "if you are accessing elements of the array, you might want to use round braces for speed, rather than access using |
It probably wouldn't matter the vast majority of the time, but yes, we'd end up doing this I'd say. |
Another thing we need to check as well before going through with this - does the subclassed numpy array still work well with numba? Can it get direct memory access from a view, or do we end up invoking the Python interpreter in order to unravel things? |
Taking this out of 0.4.0 for now, as I don't think we want to rush into it. The cat's out of the bag for the |
Here's a bit of a test for numba, based on one of your previous examples. It seems not to make much difference adding the import msprime
import numba
import numpy as np
import tskit
ts1m = msprime.sim_ancestry(1e6, ploidy=1, random_seed=42)
ts = ts1m
class QuintuplyLinkedTreeArray(np.ndarray):
def __call__(self, index):
return self._get_method(index)
tree = ts.first()
time = ts.tables.nodes.time
@numba.njit()
def total_branch_length_numba(root):
tbl = 0
stack = [root]
while len(stack) > 0:
u = stack.pop()
v = left_child[u]
while v != tskit.NULL:
tbl += time[u] - time[v]
stack.append(v)
v = right_sib[v]
return tbl
@numba.njit()
def total_branch_length_numba_recursive(u):
tbl = 0
v = left_child[u]
while v != tskit.NULL:
tbl += (time[u] - time[v]) + total_branch_length_numba_recursive(v)
v = right_sib[v]
return tbl |
It's not obvious that this will do what it looks like it should do, numba could be doing something odd with the references. But, it looks good and I think we should make the change once we've done the due diligence in making sure we're not dropping any performance. Let's pick this up again once 0.4.0 is released and we're thinking about numba on arrays again for the paper. |
Interesting, thanks @hyanwong will do a proper think about this after 0.4.0 |
I wonder if it is time to revisit this, before people start using |
I'm not particularly motivated tbh, there's a lot of critical path stuff we need to get done for the tskit paper first. |
In #1320 we bemoaned the introduction of new
Tree.parent_array[u]
,Tree.left_sib_array[u]
, etc. notation, which are the array-based (direct memory access) equivalent ofTree.parent(u)
,Tree.left_sib(u)
, etc. Nevertheless, this made it into tskit 0.3.6.However, it seems like we can use the
__call__
dunder method to determine what happens when we callclass_instance(u)
. So we could have bothTree.parent(u)
andTree.parent[u]
as valid access methods, by creating an overloaded numpyndarray
class, and returning a numpy array using this as a view:If this doesn't seem too hacky, we could thus have
That would mean all the following are equivalent:
And we would strongly recommend that new code uses the first (standard numpy) method. Using
__call__
to keep the old behaviour too seems a bit hacky, but would mean that old code would (hopefully) switch seamlessly to the new.The text was updated successfully, but these errors were encountered: