Benchmarks: Micro benchmark - add nvbench based kernel-launch & sleep-kernel #750
WenqingLan1 wants to merge 44 commits into microsoft:main from
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.70%   86.05%   +0.35%
==========================================
  Files         102      106       +4
  Lines        7703     7926     +223
==========================================
+ Hits         6602     6821     +219
- Misses       1101     1105       +4
```
Flags with carried forward coverage won't be shown.
Pull request overview
Adds NVBench-based CUDA GPU micro-benchmarks to SuperBench, including build integration, result parsing, tests, examples, and documentation updates.
Changes:
- Adds NVBench submodule integration and a `cuda_nvbench` third-party build target.
- Introduces two new micro-benchmarks (`nvbench-sleep-kernel`, `nvbench-kernel-launch`) with parsing + unit tests.
- Updates Docker images, docs, and CI workflow to support required tooling (notably newer CMake for NVBench).
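The Python wrappers' main job is turning NVBench's table-style stdout into flat SuperBench metrics. A minimal sketch of that parsing step, assuming an NVBench-like markdown results table — the helper name and exact column layout here are illustrative, not the PR's actual code; the numeric values match the test fixture quoted further down:

```python
# Illustrative NVBench-style result row; the table format is an assumption
# modeled on NVBench's markdown output, values taken from the test fixture.
SAMPLE_LOG = """\
| Duration (us) | Samples |  CPU Time |  Noise |  GPU Time | Noise | Batch GPU |  Batch |
|---------------|---------|-----------|--------|-----------|-------|-----------|--------|
|            25 |  10175x | 42.123 us | 69.78% | 25.321 us | 0.93% | 23.456 us | 17448x |
"""


def parse_nvbench_table(log: str) -> dict:
    """Flatten NVBench table rows into SuperBench-style metric names."""
    metrics = {}
    for line in log.splitlines():
        cells = [c.strip() for c in line.strip().strip('|').split('|')]
        # Skip header and separator rows: data rows start with the axis value.
        if len(cells) < 8 or not cells[0].isdigit():
            continue
        prefix = f'duration_us_{cells[0]}'
        metrics[f'{prefix}_cpu_time'] = float(cells[2].split()[0])
        metrics[f'{prefix}_gpu_time'] = float(cells[4].split()[0])
        metrics[f'{prefix}_batch_gpu_time'] = float(cells[6].split()[0])
    return metrics
```

With the sample row above, this yields `duration_us_25_cpu_time == 42.123`, `duration_us_25_gpu_time == 25.321`, and `duration_us_25_batch_gpu_time == 23.456`, matching the unit-test expectations.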
Reviewed changes
Copilot reviewed 20 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| third_party/nvbench | Adds NVBench as a git submodule dependency. |
| third_party/Makefile | Adds cuda_nvbench build/install target and adjusts recipe indentation. |
| tests/data/nvbench_sleep_kernel.log | Adds a sample NVBench sleep-kernel output fixture for parsing tests. |
| tests/data/nvbench_kernel_launch.log | Adds a sample NVBench kernel-launch output fixture for parsing tests. |
| tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py | Adds unit tests for sleep-kernel preprocess and parsing. |
| tests/benchmarks/micro_benchmarks/test_nvbench_kernel_launch.py | Adds unit tests for kernel-launch preprocess and parsing. |
| superbench/benchmarks/micro_benchmarks/nvbench_sleep_kernel.py | Implements the NVBench sleep-kernel benchmark wrapper + output parser. |
| superbench/benchmarks/micro_benchmarks/nvbench_kernel_launch.py | Implements the NVBench kernel-launch benchmark wrapper + output parser. |
| superbench/benchmarks/micro_benchmarks/nvbench_base.py | Adds a shared NVBench benchmark base class (CLI args, parsing helpers). |
| superbench/benchmarks/micro_benchmarks/nvbench/sleep_kernel.cu | Adds NVBench CUDA benchmark implementing a sleep/busy-wait kernel. |
| superbench/benchmarks/micro_benchmarks/nvbench/kernel_launch.cu | Adds NVBench CUDA benchmark for empty-kernel launch overhead. |
| superbench/benchmarks/micro_benchmarks/nvbench/CMakeLists.txt | Adds CMake build for NVBench-based benchmark executables. |
| superbench/benchmarks/micro_benchmarks/__init__.py | Exports the new NVBench benchmarks from the micro-benchmarks package. |
| examples/benchmarks/nvbench_sleep_kernel.py | Adds an example runner for the sleep-kernel benchmark. |
| examples/benchmarks/nvbench_kernel_launch.py | Adds an example runner for the kernel-launch benchmark. |
| docs/user-tutorial/benchmarks/micro-benchmarks.md | Documents the new NVBench benchmarks and their metrics. |
| dockerfile/rocm5.0.x.dockerfile | Updates Intel MLC download version used in the ROCm image. |
| dockerfile/cuda13.0.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| dockerfile/cuda12.9.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| dockerfile/cuda12.8.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| .gitmodules | Registers the third_party/nvbench submodule. |
| .gitignore | Ignores compile_commands.json. |
| .github/workflows/codeql-analysis.yml | Upgrades CodeQL actions to v3 and adds CMake setup for the C++ job. |
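The sleep kernel in `sleep_kernel.cu` busy-waits until a requested duration elapses (on the device this is typically a spin loop against the GPU clock). A host-side Python analogy of the same busy-wait pattern — illustrative only, not the CUDA code in this PR:

```python
import time


def busy_wait_us(target_us: int) -> float:
    """Spin until target_us microseconds elapse; return the actual elapsed us.

    Mirrors the device-side pattern: read a clock, spin until the requested
    duration has passed, and report what was actually waited.
    """
    start = time.perf_counter_ns()
    target_ns = target_us * 1000
    while time.perf_counter_ns() - start < target_ns:
        pass  # burn cycles, just like the kernel's busy-wait loop
    return (time.perf_counter_ns() - start) / 1000.0


elapsed = busy_wait_us(25)  # roughly 25 us, plus scheduling noise
```

The benchmark then sweeps the target duration as an NVBench axis (e.g. 25/50/75 us) and reports per-axis timing metrics.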
From .github/workflows/codeql-analysis.yml:

```diff
-          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo
+          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo build-essential
+      - name: Setup CMake
+        uses: lukka/get-cmake@latest
```

Using @latest for third-party GitHub Actions is a supply-chain risk and can lead to non-reproducible CI behavior. Pin this action to a specific tagged version or commit SHA.

Suggested change:

```diff
-        uses: lukka/get-cmake@latest
+        uses: lukka/get-cmake@v3.20.0
```
```python
# assert benchmark.result['duration_us_25_samples'][0] == 10175
self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
# self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_noise'][0], 69.78)
self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
# self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_noise'][0], 0.93)
# assert benchmark.result['duration_us_25_batch_samples'][0] == 17448
self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
# assert benchmark.result['duration_us_50_samples'][0] == 8187
# assert benchmark.result['duration_us_75_samples'][0] == 6279
```
There are several commented-out assertions in this test. If these metrics are intentionally not produced by the benchmark, consider removing the commented assertions; otherwise, consider emitting those metrics (samples/noise) and asserting on them to keep the test expectations complete and avoid stale commented code.
Suggested change:

```python
self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
```
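Applied to a self-contained stub, the trimmed test reads as follows. `FakeBenchmark` is a hypothetical stand-in for the parsed benchmark object, not SuperBench's actual class; the values come from the fixture:

```python
import unittest


class FakeBenchmark:
    """Hypothetical stand-in for the parsed nvbench-sleep-kernel result."""
    result = {
        'duration_us_25_cpu_time': [42.123],
        'duration_us_25_gpu_time': [25.321],
        'duration_us_25_batch_gpu_time': [23.456],
    }


class TestSleepKernelMetrics(unittest.TestCase):
    def test_emitted_metrics(self):
        benchmark = FakeBenchmark()
        # Assert only on metrics the parser actually emits; stale
        # commented-out checks are dropped rather than kept as comments.
        self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
        self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
        self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
```

Keeping only live assertions means the test fails loudly if an emitted metric disappears, instead of silently documenting metrics that were never produced.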
This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.
- `nvbench-sleep-kernel`
- `nvbench-kernel-launch`

Example config: