Closed as not planned
Thank you for your excellent work.
I am wondering whether it would be feasible to accelerate the aggregation further on a GPU (for example, via CuPy).
As mentioned in #63, using CuPy should speed up some operators, but my attempt showed no speedup; in fact, it was significantly slower. (I simply replaced numpy with cupy in aggregate_numpy.py.)
Do you have any ideas on how a GPU implementation could be approached?
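
For context, the kind of comparison I have in mind can be sketched as below. This is not the code from aggregate_numpy.py: `group_sum` and `time_group_sum` are hypothetical helpers, and a `bincount`-based grouped sum is only a stand-in for a real aggregation. The sketch assumes the inputs are moved to the device once, outside the timed region, and that the GPU is synchronized before the clock stops, since a naive drop-in replacement can easily end up measuring transfers or unfinished kernels instead.

```python
import time

import numpy as np

try:
    import cupy as cp  # optional; only needed for the GPU path
except ImportError:
    cp = None


def group_sum(xp, group_idx, a, size):
    """Grouped sum via bincount; xp is either the numpy or the cupy module."""
    return xp.bincount(group_idx, weights=a, minlength=size)


def _sync(xp):
    # GPU kernels launch asynchronously, so wait for them to finish.
    if cp is not None and xp is cp:
        cp.cuda.Stream.null.synchronize()


def time_group_sum(xp, group_idx, a, size, repeats=5):
    """Return the best wall-clock time over several runs, in seconds."""
    # Move inputs to the backend once, outside the timed region.
    group_idx = xp.asarray(group_idx)
    a = xp.asarray(a)
    group_sum(xp, group_idx, a, size)  # warm-up run (excluded from timing)
    _sync(xp)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        group_sum(xp, group_idx, a, size)
        _sync(xp)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, size = 1_000_000, 1_000  # arbitrary illustrative sizes
    group_idx = rng.integers(0, size, n)
    a = rng.random(n)
    print("numpy:", time_group_sum(np, group_idx, a, size))
    if cp is not None:
        print("cupy: ", time_group_sum(cp, group_idx, a, size))
```

If the CuPy path is still slower with a setup like this, that would suggest the bottleneck is in the operations themselves rather than in transfers or measurement.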