Description
In my work I find myself making frequent CUDA calls to torch_cluster.nearest of the form nearest(different_every_time, same_every_time), without providing a batch_x or batch_y. different_every_time has a shape on the order of (40000, 3) and same_every_time is (2000, 3). If this could be accelerated by an order of magnitude, that would have significant value to me.
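To make the call pattern concrete, here is a minimal NumPy sketch of what I understand the operation to be: for each row of the first argument, return the index of the closest row of the second. The name nearest_bruteforce and the chunk parameter are hypothetical, and this is a CPU reference for the semantics, not how torch_cluster implements it.

```python
import numpy as np

def nearest_bruteforce(x, y, chunk=8192):
    """For each row of x, return the index of the closest row of y (Euclidean)."""
    # ||y||^2 depends only on y, so it could be cached across calls.
    y_sq = (y * y).sum(axis=1)
    out = np.empty(len(x), dtype=np.int64)
    for start in range(0, len(x), chunk):
        xc = x[start:start + chunk]
        # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2; the ||x||^2 term is
        # constant within a row, so it does not affect the argmin.
        d = y_sq[None, :] - 2.0 * (xc @ y.T)
        out[start:start + chunk] = d.argmin(axis=1)
    return out

rng = np.random.default_rng(0)
same_every_time = rng.standard_normal((2000, 3))        # fixed point set
different_every_time = rng.standard_normal((40000, 3))  # changes on every call
idx = nearest_bruteforce(different_every_time, same_every_time)
```

Every call here still touches all 40000 * 2000 pairs; only the squared norms of same_every_time are reusable between calls.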
Any suggestions?
Does anyone else find themselves in a similar situation?
Do they have a solution?
Would a solution have significant value to the community?
I assume the strategy would be to precompute some kind of tree data structure once, and then pass it to something like nearest_with_tree(different_every_time, precomputed_tree_structure).
On CPU I guess this would be a kd-tree, and it would be orders of magnitude faster than computing the 40000 * 2000 pairwise distances. On CUDA, I think the current torch_cluster.nearest computes all of those distances, but of course the parallelisation and memory access patterns on CUDA might change how much there is to gain from a tree structure?
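For the CPU side of this idea, a minimal sketch using SciPy's cKDTree might look like the following. The function name nearest_with_tree is the hypothetical one from above, not an existing torch_cluster function, and the random arrays only stand in for my real data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
same_every_time = rng.standard_normal((2000, 3))

# Build the tree once, since same_every_time never changes.
tree = cKDTree(same_every_time)

def nearest_with_tree(x, tree):
    """Hypothetical helper: index of the nearest tree point for each row of x."""
    _, idx = tree.query(x, k=1)  # query returns (distances, indices)
    return idx

# Reuse the same tree for many different query sets.
for _ in range(3):
    different_every_time = rng.standard_normal((40000, 3))
    idx = nearest_with_tree(different_every_time, tree)
```

If the points live on the GPU, this route also pays for a device-to-host copy on every call, so whether it actually beats the brute-force CUDA kernel is something I would have to measure.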