forked from abacusmodeling/abacus-develop
Refactor: Device module modernization and GPU initialization consolidation #6936
Draft: dzzz2001 wants to merge 20 commits into develop from device-refactor
+1,551
−2,793
…ation
- Add DeviceContext singleton class (device_context.h/cpp) to manage GPU device binding with thread-safe initialization using std::mutex
- Move GPU initialization from get_device_kpar() side-effect to explicit DeviceContext::init() call in read_input.cpp after INPUT parsing
- Use MPI_COMM_TYPE_SHARED for modern node-local rank detection
- Update callers to use DeviceContext::instance().get_device_id():
  - hsolver_lcao.cpp: parakSolve_cusolver()
  - diag_cusolvermp.cu: constructor
  - gint_gpu_vars.cpp: constructor
  - td_nonlocal_lcao.cpp: remove redundant set_device_by_rank() call
This is Phase 1 of GPU resource initialization refactoring, establishing a single entry point for GPU device binding instead of scattered calls.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
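The singleton described in this commit can be pictured with a minimal sketch. The names DeviceContext, instance(), init() and get_device_id() come from the commit message; everything else is an assumption made for illustration. In particular, the init(int, int) signature, the round-robin mapping from node-local rank to device, and the idempotent behaviour on repeated calls are hypothetical, and the real class presumably queries MPI and the GPU runtime itself rather than taking these values as arguments.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical sketch of a mutex-guarded device-binding singleton.
// It is NOT the class from this PR; rank and device count are passed in
// so the example runs without MPI or CUDA.
class DeviceContext {
public:
    // Function-local static: initialization is thread-safe since C++11.
    static DeviceContext& instance() {
        static DeviceContext ctx;
        return ctx;
    }

    // Bind this process to one device. Repeated calls are no-ops (assumption).
    void init(int local_rank, int device_count) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (initialized_) {
            return;
        }
        if (device_count <= 0) {
            throw std::runtime_error("no GPU device available");
        }
        // Assumed mapping: round-robin over the devices visible on the node.
        device_id_ = local_rank % device_count;
        // A real implementation would call cudaSetDevice(device_id_) here.
        initialized_ = true;
    }

    // Returns -1 until init() has succeeded.
    int get_device_id() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return device_id_;
    }

    DeviceContext(const DeviceContext&) = delete;
    DeviceContext& operator=(const DeviceContext&) = delete;

private:
    DeviceContext() = default;

    mutable std::mutex mutex_;
    bool initialized_ = false;
    int device_id_ = -1;
};
```

With this shape, callers such as the ones listed in the commit only need DeviceContext::instance().get_device_id() and never touch device selection directly.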
Phase 2 of GPU device module refactoring:
- Remove deprecated functions: stringCmp, get_node_rank(), set_device_by_rank()
- Merge device_context.h/cpp into device.h/cpp
- Update include statements in dependent files
- Remove device_context.cpp from CMakeLists.txt
This reduces ~70 lines of dead code and consolidates the device module into fewer files with a unified interface.
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Create kernel_compat.h: move atomicAdd polyfill for pre-Pascal GPUs
- Create device_helpers.h/cpp: extract get_device_type and get_current_precision templates
- Create gpu_runtime.h: unified CUDA/ROCm API macros for portable GPU code
- Refactor output_device.cpp: unify duplicated CUDA/ROCm implementations (~150 lines reduced)
- Update device.h: clean interface with only DeviceContext and information namespace
- Update CMakeLists.txt: add device_helpers.cpp to build
Co-Authored-By: Claude Opus 4.5 <[email protected]>
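A header of "unified CUDA/ROCm API macros" usually works by mapping one neutral name onto whichever backend is compiled in. The sketch below shows that idea only; it is not the content of gpu_runtime.h. The gpu* names, the alloc_doubles helper, the __ROCM guard and the host-only fallback branch are all assumptions added so the example compiles without a GPU toolchain. Only the __CUDA definition is mentioned elsewhere in this PR.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical backend mapping; the macro and guard names are illustrative.
#if defined(__CUDA)
#include <cuda_runtime.h>
#define gpuError_t cudaError_t
#define gpuSuccess cudaSuccess
#define gpuMalloc cudaMalloc
#define gpuFree cudaFree
#define gpuGetErrorString cudaGetErrorString
#elif defined(__ROCM)
#include <hip/hip_runtime.h>
#define gpuError_t hipError_t
#define gpuSuccess hipSuccess
#define gpuMalloc hipMalloc
#define gpuFree hipFree
#define gpuGetErrorString hipGetErrorString
#else
// Host-only stand-ins so the sketch builds and runs without any GPU SDK.
using gpuError_t = int;
constexpr gpuError_t gpuSuccess = 0;
inline gpuError_t gpuMalloc(void** ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    return *ptr != nullptr ? gpuSuccess : 1;
}
inline gpuError_t gpuFree(void* ptr) {
    std::free(ptr);
    return gpuSuccess;
}
inline const char* gpuGetErrorString(gpuError_t err) {
    return err == gpuSuccess ? "success" : "allocation failed";
}
#endif

// Backend-agnostic code is written once against the gpu* names.
inline double* alloc_doubles(std::size_t n) {
    void* raw = nullptr;
    if (gpuMalloc(&raw, n * sizeof(double)) != gpuSuccess) {
        return nullptr;
    }
    return static_cast<double*>(raw);
}
```

Code written against the neutral names, like alloc_doubles here, does not need separate CUDA and ROCm copies, which is the kind of duplication the output_device.cpp item in this commit removes.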
- Utilize DeviceContext for node-local rank and device count
- Remove redundant MPI_Comm_split_type and manual CUDA calls
- Cleanup unused variables and communicator management

- Remove manual device count check in initialize_gpu_resources
- Rely on DeviceContext to handle device availability and binding
- Simplify GPU initialization logic in module_rt

- Rename function to reflect its module-specific scope
- Update comments to clarify that general GPU setup is handled by DeviceContext
- Remove unused finalize_gpu_resources declaration
- Update caller in td_nonlocal_lcao.cpp

Remove unconditional include of kernel_compat.h from device.h to properly separate Host/Device code. The kernel_compat.h header contains CUDA-specific code (__device__ keyword) and should only be included by .cu files that actually use atomicAdd.
Add explicit includes to the 4 CUDA files that need it:
- stress_op.cu
- force_op.cu
- exx_cal_energy_op.cu
- phi_operator_kernel.cu
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Move GPU kpar validation from device module to kpar's reset_value in read_input_item_system.cpp. This keeps parameter validation logic with its related input item rather than in the device module.
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Update device_check.h with proper error handling (exit on error, stderr output)
- Add cuSOLVER error string function with full IRS status codes
- Add CHECK_LAST_CUDA_ERROR, CHECK_CUDA_SYNC, CHECK_CAL macros
- Add ROCm CHECK_CUSOLVER support with hipsolver
- Remove error checking code from module_container/base/macros/cuda.h
- Remove error checking macros from helper_cuda.h
- Delete helper_cusolver.h (functionality merged into device_check.h)
- Update diag_cusolver.cu and diag_cusolvermp.cu to use new macros
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Replace cudaErrcheck -> CHECK_CUDA
- Replace cublasErrcheck -> CHECK_CUBLAS
- Replace cusolverErrcheck -> CHECK_CUSOLVER
- Replace checkCudaErrors -> CHECK_CUDA
- Replace CUSOLVER_CHECK -> CHECK_CUSOLVER
- Replace CAL_CHECK -> CHECK_CAL
- Replace cudaCheckOnDebug -> CHECK_CUDA_SYNC
- Replace getLastCudaError -> CHECK_LAST_CUDA_ERROR
- Update gpuErrcheck alias in gpu_runtime.h
- Remove deprecated compatibility aliases from device_check.h
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add _cufftGetErrorString static function to device_check.h
- Remove dependency on cuda_compat.h for cufft error strings
- Update CHECK_CUFFT macro to use the local function
Co-Authored-By: Claude Opus 4.5 <[email protected]>

The function is now unified into device_check.h as _cufftGetErrorString.
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Replaced helper_cuda.h with device_check.h in hegvd_op.cu and diag_cusolvermp.cu
- Removed helper_cuda.h and helper_string.h as they are no longer used
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add __CUDA compile definition to MODULE_HSOLVER_LCAO_cusolver test target (fixes undefined CHECK_CUDA/CHECK_CUSOLVER macros after remove_definitions)
- Add device_helpers.cpp to pyabacus _base_pack module (fixes undefined symbol get_device_type in PyTest)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add device_helpers.cpp to hsolver and ModuleNAO pyabacus modules (fixes undefined symbol get_device_type in libdiagopack.so and libnaopack.so)
- Add diag(Hamilt*, Psi&, Real*) overload to DiagoCusolver (fixes test compilation error due to interface mismatch)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Always provide both init() and init(MPI_Comm) overloads
- init() version for non-MPI builds (local_rank = 0)
- init(MPI_Comm) version for MPI builds (uses MPI_COMM_TYPE_SHARED)
- Fixes undefined reference error in MODULE_IO_read_item_serial test
Co-Authored-By: Claude Opus 4.5 <[email protected]>
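The node-local rank that the MPI overload relies on can be obtained by splitting the communicator with MPI_COMM_TYPE_SHARED, which groups ranks that share a node. The sketch below shows the two-overload idea on a hypothetical helper named node_local_rank, not on DeviceContext::init() itself. The __MPI guard is an assumed build macro, and the MPI branch is compiled only when it is defined.

```cpp
#ifdef __MPI
#include <mpi.h>

// MPI build: rank of this process among the ranks on the same node.
inline int node_local_rank(MPI_Comm comm) {
    MPI_Comm node_comm;
    // Group ranks that can share memory, i.e. ranks on one node.
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int rank = 0;
    MPI_Comm_rank(node_comm, &rank);
    MPI_Comm_free(&node_comm);
    return rank;
}
#endif

// Always available: a serial run has exactly one process, so local rank is 0.
inline int node_local_rank() {
    return 0;
}
```

Keeping the argument-free overload available in every build is what lets serial test targets link without an MPI-specific symbol.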
Summary
This PR modernizes the device module architecture, introducing a centralized GPU initialization mechanism and unifying error handling across the codebase. The refactoring improves code maintainability, reduces code duplication, and establishes consistent patterns for GPU resource management.
Motivation
The previous device module had several issues:
- cudaGetDeviceCount calls across multiple files
- Several error-checking macros with inconsistent names (cudaErrcheck, CAL_CHECK, cufftErrcheck, etc.)
Key Changes
1. DeviceContext Singleton
Introduced a DeviceContext singleton class that provides a single, thread-safe entry point for GPU device binding.
2. Unified Error Handling (device_check.h)
Consolidated all CUDA/HIP/cuFFT error checking into a single header with consistent naming:
- CHECK_CUDA() - CUDA runtime API calls
- CHECK_CUBLAS() - cuBLAS library calls
- CHECK_CUSOLVER() - cuSOLVER library calls
- CHECK_CUFFT() - cuFFT library calls
- CHECK_HIP* macros for ROCm support
3. Modular Architecture
Reorganized device module with single responsibility per file:
- device.h/cpp - Core device abstraction and DeviceContext
- device_check.h - Error checking macros
- device_helpers.h/cpp - Utility functions
- kernel_compat.h - Kernel compatibility layer (separated from device.h)
- gpu_runtime.h - GPU runtime abstractions
4. Code Cleanup
Removed deprecated and redundant code:
- helper_cuda.h and helper_string.h (NVIDIA samples legacy)
- CAL_CHECK compatibility alias
- cufftGetErrorStringCompat from cuda_compat
- cudaGetDeviceCount calls in snap_psibeta_gpu
5. Logic Reorganization
- Moved get_device_kpar logic to read_input_item_system for better cohesion
- Renamed initialize_gpu_resources to init_snap_psibeta_gpu for clarity
- Updated parakSolve_cusolver using DeviceContext
Files Changed
Major changes in source/source_base/module_device/:
- device.h/cpp - Refactored with DeviceContext singleton
- device_check.h - New unified error checking header
- device_helpers.h/cpp - New helper functions
- kernel_compat.h - New separated kernel compatibility
- gpu_runtime.h - New GPU runtime abstractions
- cuda_compat.cpp/h - Removed deprecated functions
- output_device.cpp - Simplified using new architecture
🤖 Generated with Claude Code