dmitry-mikushin commented Oct 11, 2025

Summary

Fixes #1

This PR restores the HPCG_MEMMGMT functionality broken by commit abb79b3, which replaced the custom deviceMalloc/deviceFree calls with direct gpuMalloc/gpuFree (HIP/CUDA API) calls.

The custom memory management system (gpuAllocator_t) requires that all GPU memory be allocated through deviceMalloc to properly manage memory defragmentation via deviceDefrag and reallocation via deviceRealloc.
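
The failure mode described above can be illustrated with a host-only C++ sketch. It is not the real gpuAllocator_t: the class `PoolSketch`, its methods, and `count_rejected_frees` are hypothetical names chosen for illustration, and a `std::vector` stands in for the pre-allocated device buffer. The point it shows is that a pool allocator only knows about blocks it handed out itself, so a pointer obtained elsewhere (the analogue of a raw gpuMalloc result) cannot be freed or otherwise managed by it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a pool allocator; NOT the real gpuAllocator_t.
class PoolSketch {
public:
    explicit PoolSketch(std::size_t bytes) : pool_(bytes), top_(0) {}

    // Analogue of deviceMalloc: carve a block out of the pre-allocated pool
    // and record it so later operations can recognise the pointer.
    void* allocate(std::size_t bytes) {
        if (bytes == 0 || bytes > pool_.size() - top_) return nullptr;
        void* p = pool_.data() + top_;
        blocks_[p] = bytes;
        top_ += bytes;
        return p;
    }

    // True only for pointers that allocate() handed out and that are still live.
    bool owns(const void* p) const { return blocks_.count(p) != 0; }

    // Analogue of deviceFree: returns false for pointers the pool never issued.
    bool release(void* p) { return blocks_.erase(p) == 1; }

private:
    std::vector<unsigned char> pool_;            // stands in for device memory
    std::size_t top_;                            // bump offset into the pool
    std::map<const void*, std::size_t> blocks_;  // live blocks and their sizes
};

// Frees one pool-owned pointer and one foreign pointer; counts the rejections.
int count_rejected_frees() {
    PoolSketch pool(1024);
    void* owned = pool.allocate(256);
    assert(owned != nullptr);

    std::vector<unsigned char> outside(256);  // memory the pool knows nothing about
    void* foreign = outside.data();

    int rejected = 0;
    if (!pool.release(foreign)) ++rejected;  // foreign pointer: rejected
    if (!pool.release(owned)) ++rejected;    // pool-owned pointer: accepted
    return rejected;
}
```

In this sketch the rejection is a clean `false` return; in the real code the equivalent mismatch surfaced as "invalid device pointer" errors from the GPU runtime.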

Changes

1. First commit: Restore HPCG_MEMMGMT core functionality

  • Replaced 7 gpuMalloc → deviceMalloc calls in GenerateProblem.cu
  • Replaced 2 gpuMalloc → deviceMalloc calls in GenerateCoarseProblem.cu
  • Replaced 5 gpuMalloc → deviceMalloc calls in MultiColoring.cu
  • Re-enabled 2 deviceRealloc calls in SparseMatrix.cu (lines 284, 315)
  • Fixed corresponding gpuFree → deviceFree calls in all affected files

2. Second commit: Fix remaining deviceDefrag errors

  • Fixed ExtractDiagonal in SparseMatrix.cu: Changed gpuMalloc → deviceMalloc for diag_idx and inv_diag allocations
  • Fixed PermuteVector in Permute.cu: Changed gpuMalloc → deviceMalloc for buffer allocation and gpuFree → deviceFree for deallocation
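
Why a defragmentation pass needs every buffer to come from the custom allocator can be sketched as follows. This is a host-only illustration under stated assumptions, not the project's deviceDefrag: `CompactingPoolSketch`, its `defrag` method (which takes the address of the caller's pointer), and `defrag_demo` are hypothetical. Compaction moves a live block into a lower free gap and rewrites the caller's pointer, which is only possible when the block's size and position are in the allocator's own bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical compacting pool; NOT the project's deviceDefrag implementation.
class CompactingPoolSketch {
public:
    explicit CompactingPoolSketch(std::size_t bytes) : pool_(bytes), top_(0) {}

    // Bump-allocate a block and record its size, keyed by address.
    unsigned char* allocate(std::size_t bytes) {
        if (bytes == 0 || bytes > pool_.size() - top_) return nullptr;
        unsigned char* p = pool_.data() + top_;
        sizes_[p] = bytes;
        top_ += bytes;
        return p;
    }

    // Drop a block from the bookkeeping; false for unknown pointers.
    bool release(unsigned char* p) { return sizes_.erase(p) == 1; }

    // Slide the block at *ptr down to the end of the nearest live block below
    // it (or to the pool start) and update *ptr. Returns false for pointers
    // the pool does not own, since their size and position are unknown.
    bool defrag(unsigned char** ptr) {
        auto it = sizes_.find(*ptr);
        if (it == sizes_.end()) return false;  // foreign pointer

        const std::size_t bytes = it->second;
        unsigned char* dst = pool_.data();
        if (it != sizes_.begin()) {
            auto prev = std::prev(it);
            dst = prev->first + prev->second;
        }
        if (dst != *ptr) {
            const bool was_top = (*ptr + bytes == pool_.data() + top_);
            std::memmove(dst, *ptr, bytes);  // ranges may overlap
            sizes_.erase(it);
            sizes_[dst] = bytes;
            if (was_top) top_ = static_cast<std::size_t>(dst - pool_.data()) + bytes;
            *ptr = dst;
        }
        return true;
    }

private:
    std::vector<unsigned char> pool_;               // stands in for device memory
    std::size_t top_;                               // bump offset into the pool
    std::map<unsigned char*, std::size_t> sizes_;   // live blocks, ordered by address
};

// Returns true if an owned block is compacted and a foreign one is rejected.
bool defrag_demo() {
    CompactingPoolSketch pool(64);
    unsigned char* a = pool.allocate(16);
    unsigned char* b = pool.allocate(16);
    assert(a != nullptr && b != nullptr);
    std::memset(b, 0x5A, 16);
    unsigned char* old_b = b;

    pool.release(a);  // leaves a 16-byte hole at the front of the pool
    const bool moved = pool.defrag(&b) && b != old_b && b[0] == 0x5A;

    unsigned char outside[16] = {0};
    unsigned char* foreign = outside;  // analogue of a raw gpuMalloc pointer
    const bool rejected = !pool.defrag(&foreign);

    return moved && rejected;
}
```

Under these assumptions, any array later passed to a defrag or realloc routine has to be allocated and freed through the same allocator, which is the rule the two commits restore.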

Testing

Tested with nx=104 ny=104 nz=104 on HIP backend:

  • All GPU operations completed successfully
  • No "invalid device pointer" or "invalid argument" errors
  • HPCG benchmark ran to completion with correct results

Test plan

  • Compile with HPCG_MEMMGMT enabled
  • Run HPCG benchmark with test problem size
  • Verify no GPU memory errors occur
  • Verify deviceDefrag and deviceRealloc work correctly

🤖 Generated with Claude Code

dmikushin and others added 2 commits October 11, 2025 16:21
…eallocation

This commit fixes the HPCG_MEMMGMT custom memory management system that was
broken by commit abb79b3. The issue was that gpuMalloc (direct HIP/CUDA API)
was incorrectly used instead of deviceMalloc (custom allocator), causing
"invalid device pointer" errors when deviceRealloc/deviceDefrag tried to
manage memory not allocated through the custom allocator.

Changes:
- Replace gpuMalloc with deviceMalloc in GenerateProblem.cu (7 allocations)
- Replace gpuMalloc with deviceMalloc in GenerateCoarseProblem.cu (2 allocations)
- Replace gpuMalloc with deviceMalloc in MultiColoring.cu (5 allocations)
- Replace gpuFree with deviceFree for all memory allocated via deviceMalloc
- Re-enable deviceRealloc calls in SparseMatrix.cu (previously commented out)

The custom memory management system (gpuAllocator_t) provides performance
optimizations through pre-allocation, memory defragmentation, and improved
memory locality. This fix restores that functionality.

Resolves issue #1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…Vector

This commit fixes additional "invalid device pointer" errors that occurred
when deviceDefrag tried to manage memory allocated with gpuMalloc instead of
deviceMalloc. These errors were discovered after fixing the initial
HPCG_MEMMGMT issues.

Changes:
- Replace gpuMalloc with deviceMalloc in ExtractDiagonal (SparseMatrix.cu)
  for A.diag_idx and A.inv_diag arrays
- Replace gpuMalloc with deviceMalloc in PermuteVector (Permute.cu) for
  vector buffer allocation
- Replace gpuFree with deviceFree in PermuteVector for memory allocated
  through deviceMalloc (v.d_values)

These arrays are used with deviceDefrag in OptimizeProblem.cu, so they must
be allocated through the custom memory allocator.

After this fix, the program runs successfully without crashes and produces
correct results.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Linked issue: HPCG_MEMMGMT broken by dual-target CUDA/HIP refactoring