feat: azurelinux add nvidia vgpu driver installation selection#7986
Pull request overview
Adds NVIDIA GRID (vGPU guest) driver selection for converged A10 SKUs on Azure Linux/Mariner by branching GPU driver installation based on NVIDIA_GPU_DRIVER_TYPE, and extends ShellSpec coverage for the routing logic.
Changes:
- Add `downloadGridDrivers()` and route converged SKUs (`NVIDIA_GPU_DRIVER_TYPE=grid`) to GRID installation in `downloadGPUDrivers()`.
- Add ShellSpec tests validating GRID vs CUDA vs CUDA-open routing behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `parts/linux/cloud-init/artifacts/mariner/cse_install_mariner.sh` | Introduces GRID driver install function and selects GRID vs CUDA driver flow based on `NVIDIA_GPU_DRIVER_TYPE`. |
| `spec/parts/linux/cloud-init/artifacts/cse_install_mariner_spec.sh` | Adds tests that validate the new selection/routing behavior without performing real downloads. |
Add GRID(vGPU) driver installation logic for azurelinux with A10 GPU VM sizes Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
    # Converged GPU sizes (NVads_A10_v5, NCads_A10_v4) require NVIDIA GRID (vGPU guest)
    # drivers instead of CUDA drivers.
    GRID_PACKAGE=$(dnf repoquery -y --available "nvidia-vgpu-guest-driver*" | \
        grep -E "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}" | sort -V | tail -n 1)
Just checked on a node, and I think the package name has the architecture at the end, like so: `nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64`. So I think you need another asterisk (`*`) at the end.
I see. Through AB-E2E I found that the grep regex `nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}` is not anchored with `$`, so it matches the kernel version substring regardless of what comes after it (`.x86_64`). It works with or without the extra asterisk.
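To make the anchoring point concrete, here is a minimal sketch that runs the same `grep | sort -V | tail` pipeline against canned input instead of `dnf repoquery`. The `select_grid_package` helper, the `550.90.07` entry, and the `6.6.999.9.9` kernel entry are hypothetical and exist only for illustration. The `570.195.03` package name and the kernel version are taken from the comment above.

```shell
#!/usr/bin/env bash
# Assumed kernel version, taken from the package name quoted in the review comment.
KERNEL_VERSION="6.6.126.1.1.azl3"

# Hypothetical helper: reads candidate package names on stdin and prints the
# highest-versioned one built for ${KERNEL_VERSION}. The pattern has no trailing
# `$`, so the ".x86_64" suffix does not prevent a match.
select_grid_package() {
  grep -E "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}" | sort -V | tail -n 1
}

# Canned stand-in for `dnf repoquery` output (illustrative data only).
candidates='nvidia-vgpu-guest-driver-0:550.90.07-1_6.6.126.1.1.azl3.x86_64
nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64
nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.999.9.9.azl3.x86_64'

selected=$(printf '%s\n' "$candidates" | select_grid_package)
echo "$selected"
```

One caveat this sketch does not address: the dots in `${KERNEL_VERSION}` are interpreted as regex wildcards, so the match is slightly looser than a literal string comparison.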
Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
What this PR does / why we need it:
Converged GPU sizes (NVads_A10_v5, NCads_A10_v4) require NVIDIA GRID vGPU guest drivers instead of standard CUDA drivers. Previously, AzureLinux 3.0 had no GRID driver support and so could not support these sizes. This PR adds Azure Linux GRID driver installation logic, routing converged sizes to the GRID driver path based on `NVIDIA_GPU_DRIVER_TYPE` while leaving the existing cuda/cuda-open selection unchanged for all other GPU SKUs.
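The routing described above could look roughly like the following sketch. It is not the code in `cse_install_mariner.sh`: `routeGpuDriverInstall` and `downloadCudaDrivers` are hypothetical names, both download functions are stubbed to print the chosen path, and falling back to `cuda` when the variable is unset is an assumption.

```shell
#!/usr/bin/env bash
# Stubs that only report which path was chosen; no real downloads happen here.
downloadGridDrivers() { echo "grid"; }
downloadCudaDrivers() { echo "cuda-path:$1"; }   # hypothetical helper name

# Hypothetical dispatcher: picks the install path from NVIDIA_GPU_DRIVER_TYPE.
routeGpuDriverInstall() {
  # Assumption: treat an unset variable as the plain CUDA flavor.
  local driver_type="${NVIDIA_GPU_DRIVER_TYPE:-cuda}"
  case "$driver_type" in
    grid)
      downloadGridDrivers
      ;;
    cuda|cuda-open)
      downloadCudaDrivers "$driver_type"
      ;;
    *)
      echo "unsupported NVIDIA_GPU_DRIVER_TYPE: $driver_type" >&2
      return 1
      ;;
  esac
}
```

Keeping the `grid` branch separate from the `cuda|cuda-open` branch mirrors the stated goal of leaving the existing selection untouched for non-converged SKUs.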
Which issue(s) this PR fixes:
Fixes #
Validation: