Initially "Understand why moving npar to CPPProcess.h increases the number of registers" [later on: understood this is int vs size_t]
This is a follow-up to MR #668, which I will probably merge in any case before understanding/fixing this issue
To support processes with P1 and P2 with different numbers of particles (e.g. gg to tt or ttg), some parameters like npar must be moved from src to P1 (issue #667, addressed in MR #668). However, this leads to a higher number of registers in CUDA. This is to be understood.
Note: unfortunately I had the bad idea of mixing this with a change from int to size_t for the parameters. I wonder if this is the cause of the issue?
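A minimal sketch of why the int to size_t change is a plausible suspect, assuming a 64-bit platform where int is 32 bits and size_t is 64 bits. CUDA registers are 32 bits wide, so a 64-bit index that stays live occupies a register pair, and index arithmetic done in size_t can need more registers than the same arithmetic in int. The names `registers32`, `int_like`, `size_t_like` and `momentumIndex` are hypothetical and not taken from the repository. This only illustrates the width difference; the real register count depends on what the compiler keeps live, and should be checked in the ptxas report (e.g. `nvcc -Xptxas -v`).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the repository): number of 32-bit registers
// needed to hold one value of the given size in bytes.
constexpr std::size_t registers32( std::size_t bytes ) { return ( bytes + 3 ) / 4; }

// Stand-ins for the two candidate parameter types.
// Assumption: 64-bit platform, where int is 32 bits and size_t is 64 bits.
using int_like = std::int32_t;
using size_t_like = std::uint64_t;

// Hypothetical flat index into a [npar][np4] layout, templated on the index type.
template<typename T>
constexpr T momentumIndex( T ipar, T ip4, T np4 ) { return ipar * np4 + ip4; }

// A 32-bit index fits in one register, a 64-bit index needs two.
static_assert( registers32( sizeof( int_like ) ) == 1, "32-bit index: one register" );
static_assert( registers32( sizeof( size_t_like ) ) == 2, "64-bit index: two registers" );

// Both types give the same index value; only the storage width differs.
static_assert( momentumIndex<int_like>( 3, 2, 4 ) == 14, "int-like index" );
static_assert( momentumIndex<size_t_like>( 3, 2, 4 ) == 14, "size_t-like index" );
```

If this is the cause, reverting the parameters to int while keeping them in P1 should bring the register count back down, which would separate the effect of the type change from the effect of the move.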