Description
The following changes to the Python Nways materials were suggested by Robert Searles and Jonathan Dursi.
JIT kernels
• Can we move this before CUDA kernels?
• Maybe add Numba Vectorize as an introduction? The flow Vectorize -> JIT -> CuPy CUDA makes more sense than CuPy CUDA -> JIT.
• In fact, is CuPy first and then Numba the right order? Can we flip those sections?
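To make the proposed Vectorize -> JIT ordering concrete, here is a minimal sketch of how the first two steps could build on each other. The function names `scaled_add` and `scaled_add_loop` are hypothetical and not taken from the notebooks, and the CPU target is used so the sketch does not need a GPU. The `except ImportError` branch is only a stand-in so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, vectorize
except ImportError:
    # Assumption: plain-Python stand-ins so the sketch runs without Numba.
    def vectorize(*_args, **_kwargs):
        return lambda fn: np.vectorize(fn)

    def njit(fn):
        return fn


# Step 1 (Vectorize): write only the scalar operation; the loop is generated.
@vectorize(["float64(float64, float64)"])
def scaled_add(x, y):
    return 2.0 * x + y


# Step 2 (JIT): the same arithmetic, with the loop written out by hand.
@njit
def scaled_add_loop(x, y, out):
    for i in range(x.shape[0]):
        out[i] = 2.0 * x[i] + y[i]


x = np.arange(8, dtype=np.float64)
y = np.ones(8, dtype=np.float64)

result_vec = scaled_add(x, y)

result_loop = np.empty_like(x)
scaled_add_loop(x, y, result_loop)
```

Starting from the scalar function and then exposing the loop would give students one idea at a time before they have to think about thread indices in a CUDA kernel.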
Numba notebook:
Exercise 1
• Again, the exercise is too easy; students will just copy and paste. Could we make them change it to float and multiply? Or some slightly deeper change?
Thread re-use: this comes out of nowhere.
Matrix multiply:
• Same idea, could we do a naïve matrix transpose instead?
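For reference, a naive transpose exercise could look roughly like the sketch below. This is an assumption about the shape of the exercise, not code from the notebook: `transpose_kernel` and `transpose` are hypothetical names, the 16x16 block size is arbitrary, and the NumPy branch is only a stand-in for machines without Numba or a CUDA device.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_CUDA = cuda.is_available()
except ImportError:
    HAVE_CUDA = False

if HAVE_CUDA:
    @cuda.jit
    def transpose_kernel(src, dst):
        # One thread per element: x indexes columns, y indexes rows.
        col, row = cuda.grid(2)
        if row < src.shape[0] and col < src.shape[1]:
            dst[col, row] = src[row, col]


def transpose(a):
    """Transpose a 2D array on the GPU, or with NumPy if no GPU is usable."""
    if not HAVE_CUDA:
        return np.ascontiguousarray(a.T)  # CPU stand-in (assumption)
    rows, cols = a.shape
    threads = (16, 16)
    blocks = ((cols + 15) // 16, (rows + 15) // 16)
    d_src = cuda.to_device(a)
    d_dst = cuda.device_array((cols, rows), dtype=a.dtype)
    transpose_kernel[blocks, threads](d_src, d_dst)
    return d_dst.copy_to_host()


a = np.arange(12, dtype=np.float32).reshape(3, 4)
a_t = transpose(a)
```

Compared with matrix multiply, students would have to work out the index swap and the bounds check themselves, which is harder to copy and paste.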
Numba vectorize/ufuncs
• This seems out of place. It doesn't make sense to me to have this come before Numba CUDA kernels, where it interrupts the flow between Numba CUDA kernels and atomics.
Atomic
• It would be nice if the atomic reduction example built on an earlier example, e.g. calculating the average matrix element after the multiplication.
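One way the atomic example could build on the matrix multiply is sketched below: every thread adds its element of the product to a single accumulator with `cuda.atomic.add`, and the host divides by the element count. The names `sum_kernel` and `mean_element` are hypothetical, the product here is computed with NumPy purely for brevity, and the NumPy branch is a stand-in for machines without Numba or a CUDA device.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_CUDA = cuda.is_available()
except ImportError:
    HAVE_CUDA = False

if HAVE_CUDA:
    @cuda.jit
    def sum_kernel(mat, total):
        col, row = cuda.grid(2)
        if row < mat.shape[0] and col < mat.shape[1]:
            # Many threads update total[0], so the add must be atomic.
            cuda.atomic.add(total, 0, mat[row, col])


def mean_element(mat):
    """Average of all matrix elements via an atomic sum on the GPU."""
    if not HAVE_CUDA:
        return float(mat.mean())  # CPU stand-in (assumption)
    rows, cols = mat.shape
    threads = (16, 16)
    blocks = ((cols + 15) // 16, (rows + 15) // 16)
    d_total = cuda.to_device(np.zeros(1, dtype=np.float32))
    sum_kernel[blocks, threads](cuda.to_device(mat), d_total)
    return float(d_total.copy_to_host()[0]) / mat.size


# Stand-in for the result of the earlier matrix multiply exercise.
a = np.full((8, 8), 0.5, dtype=np.float32)
b = np.full((8, 8), 2.0, dtype=np.float32)
c = a @ b

avg = mean_element(c)
```

Reusing the product matrix would give the reduction a purpose, and it also motivates why a plain `total[0] += ...` would race.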