
feat: azurelinux add nvidia vgpu driver installation selection #7986

Merged
miz060 merged 3 commits into main from mitchzhu/azl-grid-gpu_driver-pr on Mar 11, 2026

miz060 (Member) commented Feb 27, 2026

What this PR does / why we need it:
Converged GPU sizes (NVads_A10_v5, NCads_A10_v4) require NVIDIA GRID vGPU guest drivers instead of the standard CUDA drivers. Previously, Azure Linux 3.0 had no GRID driver support and therefore could not support these sizes. This PR adds GRID driver installation logic for azurelinux, routing converged sizes to the GRID driver path based on NVIDIA_GPU_DRIVER_TYPE while leaving the existing cuda/cuda-open selection unchanged for all other GPU SKUs.
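The PR states that routing is driven by NVIDIA_GPU_DRIVER_TYPE but does not show how that value is derived. As a hypothetical illustration only, the helper below maps an Azure VM size string to a driver type. The function name, the glob patterns (inferred from the series names NVads_A10_v5 and NCads_A10_v4), and the plain "cuda" fallback are all assumptions, not code from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): classify an Azure VM size into the
# driver type an install script could branch on. Patterns are assumptions
# based on the NVads_A10_v5 / NCads_A10_v4 series names.
hypothetical_driver_type_for_size() {
    local size
    # Lowercase the input so matching does not depend on capitalization.
    size=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$size" in
        standard_nv*_a10_v5|standard_nc*_a10_v4)
            echo "grid" ;;   # converged A10 sizes: GRID (vGPU guest) drivers
        *)
            echo "cuda" ;;   # placeholder: the real cuda vs cuda-open choice is not modeled here
    esac
}

hypothetical_driver_type_for_size "Standard_NV36ads_A10_v5"   # prints grid
hypothetical_driver_type_for_size "Standard_NC24ads_A100_v4"  # prints cuda
```

Matching on the series suffix rather than listing every size keeps the sketch short; a real implementation might prefer an explicit allow-list so that a new size is never routed to GRID by accident.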

Which issue(s) this PR fixes:

Fixes #

Validation:

Copilot AI review requested due to automatic review settings February 27, 2026 20:59
miz060 force-pushed the mitchzhu/azl-grid-gpu_driver-pr branch from 8895c0f to f869d83 on February 27, 2026 21:03
Copilot AI left a comment

Pull request overview

Adds NVIDIA GRID (vGPU guest) driver selection for converged A10 SKUs on Azure Linux/Mariner by branching GPU driver installation based on NVIDIA_GPU_DRIVER_TYPE, and extends ShellSpec coverage for the routing logic.

Changes:

  • Add downloadGridDrivers() and route converged SKUs (NVIDIA_GPU_DRIVER_TYPE=grid) to GRID installation in downloadGPUDrivers().
  • Add ShellSpec tests validating GRID vs CUDA vs CUDA-open routing behavior.
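A minimal sketch of the branch described in the first bullet, assuming it is a simple test on NVIDIA_GPU_DRIVER_TYPE. Only downloadGridDrivers() and downloadGPUDrivers() are named in the PR; the stub functions, their echo bodies, and the fallback to "cuda" when the variable is unset are hypothetical, chosen so the snippet runs without touching a package manager.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified stand-in for the branch the PR adds to
# downloadGPUDrivers(). Both stubs are hypothetical placeholders that print
# which path was taken instead of downloading or installing anything.

stubDownloadGridDrivers() { echo "grid"; }     # stands in for downloadGridDrivers()
stubDownloadCudaDrivers() { echo "cuda:$1"; }  # stands in for the existing cuda/cuda-open path

sketchDownloadGPUDrivers() {
    if [ "${NVIDIA_GPU_DRIVER_TYPE:-}" = "grid" ]; then
        # Converged A10 sizes: GRID (vGPU guest) driver path.
        stubDownloadGridDrivers
    else
        # Every other GPU SKU: existing selection, unchanged.
        # Falling back to "cuda" when unset is an assumption of this sketch.
        stubDownloadCudaDrivers "${NVIDIA_GPU_DRIVER_TYPE:-cuda}"
    fi
}

NVIDIA_GPU_DRIVER_TYPE=grid
sketchDownloadGPUDrivers   # prints grid
NVIDIA_GPU_DRIVER_TYPE=cuda-open
sketchDownloadGPUDrivers   # prints cuda:cuda-open
```

Replacing the real download functions with stubs like these is one way to check routing without real downloads, which is the property the ShellSpec tests in this PR are described as validating.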

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
parts/linux/cloud-init/artifacts/mariner/cse_install_mariner.sh | Introduces GRID driver install function and selects GRID vs CUDA driver flow based on NVIDIA_GPU_DRIVER_TYPE.
spec/parts/linux/cloud-init/artifacts/cse_install_mariner_spec.sh | Adds tests that validate the new selection/routing behavior without performing real downloads.

Add GRID (vGPU) driver installation logic for azurelinux with A10 GPU VM sizes

Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
miz060 force-pushed the mitchzhu/azl-grid-gpu_driver-pr branch from f869d83 to 9730e58 on March 6, 2026 01:27
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

# Converged GPU sizes (NVads_A10_v5, NCads_A10_v4) require NVIDIA GRID (vGPU guest)
# drivers instead of CUDA drivers.
GRID_PACKAGE=$(dnf repoquery -y --available "nvidia-vgpu-guest-driver*" | \
    grep -E "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}" | sort -V | tail -n 1)
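The selection above keeps only package names that contain KERNEL_VERSION and then takes the highest version. A minimal sketch of that grep, sort -V, and tail stage, fed a hard-coded list in place of live dnf repoquery output so it needs no repository access. The fake_repoquery helper, the package list (other than its format, which matches the name a reviewer reported seeing on a node), and the KERNEL_VERSION value are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: run the PR's grep | sort -V | tail -n 1 selection against canned
# input instead of live `dnf repoquery` output. The package list and
# KERNEL_VERSION below are illustrative values, not real repository contents.

KERNEL_VERSION="6.6.126.1.1.azl3"

fake_repoquery() {
    # Hypothetical stand-in for: dnf repoquery -y --available "nvidia-vgpu-guest-driver*"
    printf '%s\n' \
        "nvidia-vgpu-guest-driver-0:550.90.07-1_6.6.126.1.1.azl3.x86_64" \
        "nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64" \
        "nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.100.1.1.azl3.x86_64"
}

select_grid_package() {
    # Keep only builds for KERNEL_VERSION, then pick the highest version.
    fake_repoquery |
        grep -E "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}" |
        sort -V | tail -n 1
}

GRID_PACKAGE=$(select_grid_package)
echo "$GRID_PACKAGE"   # prints nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64
```

The third entry is dropped by grep because it was built for a different kernel, and sort -V orders the remaining two by driver version (550 before 570), so tail -n 1 returns the newest matching build.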
Contributor

Just checked on a node, and I think the package has the architecture at the end, like so: nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64. So I think you need another asterisk (*) at the end.
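The reported name appears to follow a name-epoch:version-release.arch layout, with the kernel version embedded in the release field after an underscore and the architecture trailing it. A small sketch that splits the quoted string into those fields with POSIX parameter expansion; the split points are inferred from this single example and are an assumption, not a documented naming contract.

```shell
#!/usr/bin/env bash
# Sketch: split the package string quoted in the review comment into its parts.
# Assumes the name-epoch:version-release.arch layout seen there, with the
# kernel version appended to the release after "_".
pkg="nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64"

arch=${pkg##*.}                        # text after the last dot
rest=${pkg%.*}                         # everything before the architecture
kernel=${rest##*_}                     # text after the last underscore
evr=${rest#nvidia-vgpu-guest-driver-}  # epoch:version-release
epoch=${evr%%:*}                       # text before the colon
version=${evr#*:}                      # drop the epoch
version=${version%%-*}                 # drop the release

echo "epoch=$epoch version=$version kernel=$kernel arch=$arch"
# prints epoch=0 version=570.195.03 kernel=6.6.126.1.1.azl3 arch=x86_64
```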

Member Author

I see.
Through AB-E2E, I found that the grep regex nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION} is not anchored with $, so it matches the kernel version substring regardless of what follows it (here, .x86_64). It therefore works with or without the extra asterisk.
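The anchoring point can be checked directly. This sketch tests the PR's unanchored pattern, an end-anchored variant, and an end-anchored variant with a trailing .* against the package name quoted in the review; KERNEL_VERSION is set to that name's kernel suffix purely for illustration, and the matches helper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the PR's unanchored grep pattern with end-anchored variants
# on the package name quoted in the review. KERNEL_VERSION is an example value.
KERNEL_VERSION="6.6.126.1.1.azl3"
pkg="nvidia-vgpu-guest-driver-0:570.195.03-1_6.6.126.1.1.azl3.x86_64"

matches() {
    # Print "yes" when the extended regex $1 matches anywhere in $pkg.
    if printf '%s\n' "$pkg" | grep -Eq "$1"; then echo yes; else echo no; fi
}

matches "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}"      # yes: unanchored
matches "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}\$"    # no: .x86_64 follows
matches "nvidia-vgpu-guest-driver-[0-9]+.*_${KERNEL_VERSION}.*\$"  # yes: trailing .* allowed
```

Only the anchored form is sensitive to the architecture suffix, which is why the extra asterisk makes no difference to the pattern as merged.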

miz060 merged commit abede5c into main on Mar 11, 2026
33 of 34 checks passed
miz060 deleted the mitchzhu/azl-grid-gpu_driver-pr branch on March 11, 2026 00:06
janenotjung-hue pushed a commit that referenced this pull request Mar 11, 2026
Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>

Labels: None yet
Projects: None yet
4 participants