cudaMalloc/cudaMallocManaged support

This is pretty awesome, as-is. Thank you so much for this class.

However, I wonder if it would be possible to let the caller choose between cudaMalloc() and cudaMallocManaged(). In the cudaMallocManaged() case, the caller could also choose the attach flag, cudaMemAttachGlobal or cudaMemAttachHost.

I realize that handling non-managed allocation would go against the name of the class, though.
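
To make the request concrete, here is a rough sketch of the kind of switch I have in mind. The names `AllocKind` and `alloc_bytes` are made up for illustration and are not part of the existing class. The only CUDA calls used are `cudaMalloc`, `cudaMallocManaged`, and `cudaFree`. The host-only stand-ins at the top are an assumption of mine so the snippet also compiles on a machine without the CUDA toolkit; they would not be needed in real code.

```cpp
#include <cstddef>

#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define SKETCH_HAVE_CUDA 1
#  endif
#endif

#ifndef SKETCH_HAVE_CUDA
// ASSUMPTION: host-only stand-ins, used only when cuda_runtime.h is missing.
// They mimic the runtime API signatures with malloc/free so the sketch builds.
#include <cstdlib>
using cudaError_t = int;
constexpr cudaError_t cudaSuccess = 0;
constexpr cudaError_t cudaErrorInvalidValue = 1;
constexpr cudaError_t cudaErrorMemoryAllocation = 2;
constexpr unsigned int cudaMemAttachGlobal = 0x01;
constexpr unsigned int cudaMemAttachHost = 0x02;
inline cudaError_t cudaMalloc(void** p, std::size_t n) {
    *p = std::malloc(n);
    return *p ? cudaSuccess : cudaErrorMemoryAllocation;
}
inline cudaError_t cudaMallocManaged(void** p, std::size_t n, unsigned int) {
    *p = std::malloc(n);
    return *p ? cudaSuccess : cudaErrorMemoryAllocation;
}
inline cudaError_t cudaFree(void* p) {
    std::free(p);
    return cudaSuccess;
}
#endif

// Hypothetical selector: which allocation call (and attach flag) to use.
enum class AllocKind { Device, ManagedGlobal, ManagedHost };

// Hypothetical helper: allocate `bytes` bytes according to `kind`.
// Returns the CUDA error code from the underlying allocation call.
inline cudaError_t alloc_bytes(void** ptr, std::size_t bytes, AllocKind kind) {
    switch (kind) {
    case AllocKind::Device:
        // Plain device memory; not accessible from host code.
        return cudaMalloc(ptr, bytes);
    case AllocKind::ManagedGlobal:
        // Managed memory, accessible from any stream on any device.
        return cudaMallocManaged(ptr, bytes, cudaMemAttachGlobal);
    case AllocKind::ManagedHost:
        // Managed memory, initially attached to the host.
        return cudaMallocManaged(ptr, bytes, cudaMemAttachHost);
    }
    return cudaErrorInvalidValue;  // unreachable for valid enum values
}
```

All three kinds are released with `cudaFree()`. One catch with `AllocKind::Device` is that the returned pointer must not be dereferenced on the host, so the class would probably need to disable or guard its host-side accessors in that mode.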