### Summary

This issue proposes merging the norm and encode steps into a single kernel, to reduce kernel launch overhead and global memory reads/writes (to the norm buffer).
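
To make the proposal concrete, here is a rough sketch of what a fused kernel could look like. It is not the project's actual code. The semantics are assumptions made for illustration: "norm" is taken to be a per-row L2 norm, "encode" is taken to be scaling each element by `127 / norm` and rounding to `int8_t`, and the name `fused_norm_encode` and the row-major layout are hypothetical. Each block handles one row, so the norm lives only in shared memory and is never written to a global norm buffer, and a single launch replaces two.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical fused kernel (assumed semantics, see the text above).
// One block per row; blockDim.x must be a power of two.
__global__ void fused_norm_encode(const float* __restrict__ in,
                                  int8_t* __restrict__ out,
                                  int cols) {
    extern __shared__ float partial[];  // one float per thread
    const float* row = in + static_cast<size_t>(blockIdx.x) * cols;
    int8_t* out_row = out + static_cast<size_t>(blockIdx.x) * cols;

    // Step 1 (norm): each thread accumulates a partial sum of squares.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < cols; i += blockDim.x) {
        sum += row[i] * row[i];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction in shared memory; the result ends up in partial[0].
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            partial[threadIdx.x] += partial[threadIdx.x + s];
        }
        __syncthreads();
    }
    const float norm = sqrtf(partial[0]);
    const float scale = norm > 0.0f ? 127.0f / norm : 0.0f;

    // Step 2 (encode): reuse the norm directly, with no global norm buffer.
    for (int i = threadIdx.x; i < cols; i += blockDim.x) {
        out_row[i] = static_cast<int8_t>(__float2int_rn(row[i] * scale));
    }
}

int main() {
    const int rows = 2, cols = 8, threads = 256;
    std::vector<float> h_in(rows * cols);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = 0.5f * (i % cols) - 1.0f;

    float* d_in = nullptr;
    int8_t* d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, h_in.size() * sizeof(int8_t));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    // A single launch covers both the norm and the encode step.
    fused_norm_encode<<<rows, threads, threads * sizeof(float)>>>(d_in, d_out, cols);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<int8_t> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(int8_t),
               cudaMemcpyDeviceToHost);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) std::printf("%d ", h_out[r * cols + c]);
        std::printf("\n");
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The one-block-per-row layout is what makes the fusion simple here, because the reduction and the encode can be separated by `__syncthreads()` alone. If the norm has to span data handled by more than one block, a single-kernel version would need a grid-wide synchronization (for example, cooperative groups) or a different decomposition.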