Refactor: Device module modernization and GPU initialization consolidation #6936

dzzz2001 · 2026-01-30T11:58:38Z

Summary

This PR modernizes the device module architecture, introducing a centralized GPU initialization mechanism and unifying error handling across the codebase. The refactoring improves code maintainability, reduces code duplication, and establishes consistent patterns for GPU resource management.

Motivation

The previous device module had several issues:

Scattered cudaGetDeviceCount calls across multiple files
Inconsistent error checking macros (cudaErrcheck, CAL_CHECK, cufftErrcheck, etc.)
Tightly coupled code with mixed responsibilities in single files
Legacy compatibility code that was no longer needed

Key Changes

1. DeviceContext Singleton

Introduced a DeviceContext singleton class that provides:

Centralized GPU initialization and device selection
Unified access to GPU count and current device information
Single point of control for multi-GPU environments

2. Unified Error Handling (`device_check.h`)

Consolidated all CUDA/HIP/cuFFT error checking into a single header with consistent naming:

CHECK_CUDA() - CUDA runtime API calls
CHECK_CUBLAS() - cuBLAS library calls
CHECK_CUSOLVER() - cuSOLVER library calls
CHECK_CUFFT() - cuFFT library calls
Equivalent CHECK_HIP* macros for ROCm support

3. Modular Architecture

Reorganized device module with single responsibility per file:

device.h/cpp - Core device abstraction and DeviceContext
device_check.h - Error checking macros
device_helpers.h/cpp - Utility functions
kernel_compat.h - Kernel compatibility layer (separated from device.h)
gpu_runtime.h - GPU runtime abstractions

4. Code Cleanup

Removed deprecated and redundant code:

Removed helper_cuda.h and helper_string.h (NVIDIA samples legacy)
Removed CAL_CHECK compatibility alias
Removed cufftGetErrorStringCompat from cuda_compat
Removed redundant cudaGetDeviceCount calls in snap_psibeta_gpu

5. Logic Reorganization

Moved get_device_kpar logic to read_input_item_system for better cohesion
Renamed initialize_gpu_resources to init_snap_psibeta_gpu for clarity
Simplified parakSolve_cusolver using DeviceContext

Files Changed

Major changes in source/source_base/module_device/:

device.h/cpp - Refactored with DeviceContext singleton
device_check.h - New unified error checking header
device_helpers.h/cpp - New helper functions
kernel_compat.h - New separated kernel compatibility
gpu_runtime.h - New GPU runtime abstractions
cuda_compat.cpp/h - Removed deprecated functions
output_device.cpp - Simplified using new architecture

🤖 Generated with Claude Code

…ation - Add DeviceContext singleton class (device_context.h/cpp) to manage GPU device binding with thread-safe initialization using std::mutex - Move GPU initialization from get_device_kpar() side-effect to explicit DeviceContext::init() call in read_input.cpp after INPUT parsing - Use MPI_COMM_TYPE_SHARED for modern node-local rank detection - Update callers to use DeviceContext::instance().get_device_id(): - hsolver_lcao.cpp: parakSolve_cusolver() - diag_cusolvermp.cu: constructor - gint_gpu_vars.cpp: constructor - td_nonlocal_lcao.cpp: remove redundant set_device_by_rank() call This is Phase 1 of GPU resource initialization refactoring, establishing a single entry point for GPU device binding instead of scattered calls. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Phase 2 of GPU device module refactoring: - Remove deprecated functions: stringCmp, get_node_rank(), set_device_by_rank() - Merge device_context.h/cpp into device.h/cpp - Update include statements in dependent files - Remove device_context.cpp from CMakeLists.txt This reduces ~70 lines of dead code and consolidates the device module into fewer files with a unified interface. Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Create kernel_compat.h: move atomicAdd polyfill for pre-Pascal GPUs - Create device_helpers.h/cpp: extract get_device_type and get_current_precision templates - Create gpu_runtime.h: unified CUDA/ROCm API macros for portable GPU code - Refactor output_device.cpp: unify duplicated CUDA/ROCm implementations (~150 lines reduced) - Update device.h: clean interface with only DeviceContext and information namespace - Update CMakeLists.txt: add device_helpers.cpp to build Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Utilize DeviceContext for node-local rank and device count - Remove redundant MPI_Comm_split_type and manual CUDA calls - Cleanup unused variables and communicator management

- Remove manual device count check in initialize_gpu_resources - Rely on DeviceContext to handle device availability and binding - Simplify GPU initialization logic in module_rt

- Rename function to reflect its module-specific scope - Update comments to clarify that general GPU setup is handled by DeviceContext - Remove unused finalize_gpu_resources declaration - Update caller in td_nonlocal_lcao.cpp

Remove unconditional include of kernel_compat.h from device.h to properly separate Host/Device code. The kernel_compat.h header contains CUDA-specific code (__device__ keyword) and should only be included by .cu files that actually use atomicAdd. Add explicit includes to the 4 CUDA files that need it: - stress_op.cu - force_op.cu - exx_cal_energy_op.cu - phi_operator_kernel.cu Co-Authored-By: Claude Opus 4.5 <[email protected]>

Move GPU kpar validation from device module to kpar's reset_value in read_input_item_system.cpp. This keeps parameter validation logic with its related input item rather than in the device module. Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Update device_check.h with proper error handling (exit on error, stderr output) - Add cuSOLVER error string function with full IRS status codes - Add CHECK_LAST_CUDA_ERROR, CHECK_CUDA_SYNC, CHECK_CAL macros - Add ROCm CHECK_CUSOLVER support with hipsolver - Remove error checking code from module_container/base/macros/cuda.h - Remove error checking macros from helper_cuda.h - Delete helper_cusolver.h (functionality merged into device_check.h) - Update diag_cusolver.cu and diag_cusolvermp.cu to use new macros Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Replace cudaErrcheck -> CHECK_CUDA - Replace cublasErrcheck -> CHECK_CUBLAS - Replace cusolverErrcheck -> CHECK_CUSOLVER - Replace checkCudaErrors -> CHECK_CUDA - Replace CUSOLVER_CHECK -> CHECK_CUSOLVER - Replace CAL_CHECK -> CHECK_CAL - Replace cudaCheckOnDebug -> CHECK_CUDA_SYNC - Replace getLastCudaError -> CHECK_LAST_CUDA_ERROR - Update gpuErrcheck alias in gpu_runtime.h - Remove deprecated compatibility aliases from device_check.h Co-Authored-By: Claude Opus 4.5 <[email protected]>

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add _cufftGetErrorString static function to device_check.h - Remove dependency on cuda_compat.h for cufft error strings - Update CHECK_CUFFT macro to use the local function Co-Authored-By: Claude Opus 4.5 <[email protected]>

The function is now unified into device_check.h as _cufftGetErrorString. Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Replaced helper_cuda.h with device_check.h in hegvd_op.cu and diag_cusolvermp.cu - Removed helper_cuda.h and helper_string.h as they are no longer used Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add __CUDA compile definition to MODULE_HSOLVER_LCAO_cusolver test target (fixes undefined CHECK_CUDA/CHECK_CUSOLVER macros after remove_definitions) - Add device_helpers.cpp to pyabacus _base_pack module (fixes undefined symbol get_device_type in PyTest) Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add device_helpers.cpp to hsolver and ModuleNAO pyabacus modules (fixes undefined symbol get_device_type in libdiagopack.so and libnaopack.so) - Add diag(Hamilt*, Psi&, Real*) overload to DiagoCusolver (fixes test compilation error due to interface mismatch) Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Always provide both init() and init(MPI_Comm) overloads - init() version for non-MPI builds (local_rank = 0) - init(MPI_Comm) version for MPI builds (uses MPI_COMM_TYPE_SHARED) - Fixes undefined reference error in MODULE_IO_read_item_serial test Co-Authored-By: Claude Opus 4.5 <[email protected]>

dzzz2001 and others added 13 commits January 28, 2026 19:47

Refactor: simplify parakSolve_cusolver using DeviceContext

f578dd4

- Utilize DeviceContext for node-local rank and device count - Remove redundant MPI_Comm_split_type and manual CUDA calls - Cleanup unused variables and communicator management

Refactor: remove redundant cudaGetDeviceCount in snap_psibeta_gpu

47b3092

- Remove manual device count check in initialize_gpu_resources - Rely on DeviceContext to handle device availability and binding - Simplify GPU initialization logic in module_rt

Refactor: remove CAL_CHECK compatibility alias from device_check.h

51e508f

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Refactor: unify cufftGetErrorString into device_check.h

5bae880

- Add _cufftGetErrorString static function to device_check.h - Remove dependency on cuda_compat.h for cufft error strings - Update CHECK_CUFFT macro to use the local function Co-Authored-By: Claude Opus 4.5 <[email protected]>

Refactor: remove cufftGetErrorStringCompat from cuda_compat

1530dfb

The function is now unified into device_check.h as _cufftGetErrorString. Co-Authored-By: Claude Opus 4.5 <[email protected]>

dzzz2001 closed this Jan 30, 2026

Refactor: remove helper_cuda.h and helper_string.h

93a9b04

- Replaced helper_cuda.h with device_check.h in hegvd_op.cu and diag_cusolvermp.cu - Removed helper_cuda.h and helper_string.h as they are no longer used Co-Authored-By: Claude Opus 4.5 <[email protected]>

dzzz2001 reopened this Jan 30, 2026

Merge branch 'develop' into device-refactor

957c076

dzzz2001 changed the title ~~device refactor~~ Refactor: Device module modernization and GPU initialization consolidation Jan 30, 2026

dzzz2001 and others added 4 commits January 30, 2026 21:27

Merge branch 'develop' into device-refactor

a7f30e8

Merge branch 'develop' into device-refactor

0bad958

dzzz2001 marked this pull request as draft January 30, 2026 13:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: Device module modernization and GPU initialization consolidation #6936

Refactor: Device module modernization and GPU initialization consolidation #6936

dzzz2001 commented Jan 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Refactor: Device module modernization and GPU initialization consolidation #6936

Are you sure you want to change the base?

Refactor: Device module modernization and GPU initialization consolidation #6936

Conversation

dzzz2001 commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Key Changes

1. DeviceContext Singleton

2. Unified Error Handling (device_check.h)

3. Modular Architecture

4. Code Cleanup

5. Logic Reorganization

Files Changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dzzz2001 commented Jan 30, 2026 •

edited

Loading

2. Unified Error Handling (`device_check.h`)