Vecpar implementation #1

georgi-mania · 2022-12-21T14:09:41Z

Provide implementations for the following scenarios:

run in parallel using vecpar offloading lambda (CUDA and OpenMP)
run in parallel using single-source algorithms (CUDA and OpenMP)
ability to choose the memory resource for GPU (managed memory and host-device memory)
ability to choose a different offloading backend (CUDA vs OpenMP target) -- requires vecpar::map/mmap/map_reduce with 1, 2, or 3 collections
scripts for automatic testing (update/adapt existing)
BUG - dot kernel fails for 100 repetitions (works for 50); I assume some chunks of memory are not released
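
For reference, a minimal sketch of what the OpenMP target variant of the dot kernel could look like, written with plain OpenMP directives rather than the vecpar API. The names `dot_omp_target` and `run_repetitions` are hypothetical and do not come from this PR. The sketch assumes the buffers are allocated fresh on each repetition and owned by `std::vector`, so they are released at the end of every iteration; it does not reproduce or diagnose the failure listed above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of vecpar): dot product offloaded with
// OpenMP target directives. When the compiler has no offloading support,
// the pragma is ignored or falls back to the host, and the result is the same.
double dot_omp_target(const std::vector<double>& x,
                      const std::vector<double>& y) {
    const std::size_t n = x.size();
    const double* xp = x.data();
    const double* yp = y.data();
    double sum = 0.0;

    // Copy both inputs to the device, reduce into `sum`, copy `sum` back.
    #pragma omp target teams distribute parallel for \
        reduction(+ : sum) map(to : xp[0:n], yp[0:n]) map(tofrom : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += xp[i] * yp[i];
    }
    return sum;
}

// Hypothetical driver: runs the kernel `repetitions` times. The vectors are
// scoped to the loop body, so host memory is freed after every repetition.
double run_repetitions(std::size_t n, int repetitions) {
    double last = 0.0;
    for (int r = 0; r < repetitions; ++r) {
        std::vector<double> x(n, 1.0);
        std::vector<double> y(n, 2.0);
        last = dot_omp_target(x, y);
    }
    return last;
}
```

Calling `run_repetitions(1000, 100)` exercises the same 100-repetition count mentioned in the bug item. Comparing the behavior of such a scoped-allocation baseline against the vecpar variants may help narrow down where memory is retained.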

initial commit

57671b0

georgi-mania self-assigned this Dec 21, 2022

georgi-mania added 3 commits February 27, 2023 17:24

add vecpar ompt

8c0114a

update vecpar functionality

0587972

add new kernel for vecpar-cuda-mm case

967a7df
