Benchmarks: Micro benchmark - add nvbench based kernel-launch & sleep-kernel #750
WenqingLan1 wants to merge 44 commits into microsoft:main from
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.70%   86.05%   +0.35%
==========================================
  Files         102      106       +4
  Lines        7703     7926     +223
==========================================
+ Hits         6602     6821     +219
- Misses       1101     1105       +4
```
Flags with carried forward coverage won't be shown.
Pull request overview
Adds NVBench-based CUDA GPU micro-benchmarks to SuperBench, including build integration, result parsing, tests, examples, and documentation updates.
Changes:
- Adds NVBench submodule integration and a `cuda_nvbench` third-party build target.
- Introduces two new micro-benchmarks (`nvbench-sleep-kernel`, `nvbench-kernel-launch`) with parsing + unit tests.
- Updates Docker images, docs, and CI workflow to support required tooling (notably newer CMake for NVBench).
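The Python wrappers' main job is turning NVBench's table-style stdout into flat SuperBench metrics. A minimal sketch of that parsing step, assuming an NVBench-like markdown results table — the helper name and exact column layout here are illustrative, not the PR's actual code; the numeric values match the test fixture quoted further down:

```python
# Illustrative NVBench-style result row; the table format is an assumption
# modeled on NVBench's markdown output, values taken from the test fixture.
SAMPLE_LOG = """\
| Duration (us) | Samples |  CPU Time |  Noise |  GPU Time | Noise | Batch GPU |  Batch |
|---------------|---------|-----------|--------|-----------|-------|-----------|--------|
|            25 |  10175x | 42.123 us | 69.78% | 25.321 us | 0.93% | 23.456 us | 17448x |
"""


def parse_nvbench_table(log: str) -> dict:
    """Flatten NVBench table rows into SuperBench-style metric names."""
    metrics = {}
    for line in log.splitlines():
        cells = [c.strip() for c in line.strip().strip('|').split('|')]
        # Skip header and separator rows: data rows start with the axis value.
        if len(cells) < 8 or not cells[0].isdigit():
            continue
        prefix = f'duration_us_{cells[0]}'
        metrics[f'{prefix}_cpu_time'] = float(cells[2].split()[0])
        metrics[f'{prefix}_gpu_time'] = float(cells[4].split()[0])
        metrics[f'{prefix}_batch_gpu_time'] = float(cells[6].split()[0])
    return metrics
```

With the sample row above, this yields `duration_us_25_cpu_time == 42.123`, `duration_us_25_gpu_time == 25.321`, and `duration_us_25_batch_gpu_time == 23.456`, matching the unit-test expectations.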
Reviewed changes
Copilot reviewed 20 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| third_party/nvbench | Adds NVBench as a git submodule dependency. |
| third_party/Makefile | Adds cuda_nvbench build/install target and adjusts recipe indentation. |
| tests/data/nvbench_sleep_kernel.log | Adds a sample NVBench sleep-kernel output fixture for parsing tests. |
| tests/data/nvbench_kernel_launch.log | Adds a sample NVBench kernel-launch output fixture for parsing tests. |
| tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py | Adds unit tests for sleep-kernel preprocess and parsing. |
| tests/benchmarks/micro_benchmarks/test_nvbench_kernel_launch.py | Adds unit tests for kernel-launch preprocess and parsing. |
| superbench/benchmarks/micro_benchmarks/nvbench_sleep_kernel.py | Implements the NVBench sleep-kernel benchmark wrapper + output parser. |
| superbench/benchmarks/micro_benchmarks/nvbench_kernel_launch.py | Implements the NVBench kernel-launch benchmark wrapper + output parser. |
| superbench/benchmarks/micro_benchmarks/nvbench_base.py | Adds a shared NVBench benchmark base class (CLI args, parsing helpers). |
| superbench/benchmarks/micro_benchmarks/nvbench/sleep_kernel.cu | Adds NVBench CUDA benchmark implementing a sleep/busy-wait kernel. |
| superbench/benchmarks/micro_benchmarks/nvbench/kernel_launch.cu | Adds NVBench CUDA benchmark for empty-kernel launch overhead. |
| superbench/benchmarks/micro_benchmarks/nvbench/CMakeLists.txt | Adds CMake build for NVBench-based benchmark executables. |
| superbench/benchmarks/micro_benchmarks/__init__.py | Exports the new NVBench benchmarks from the micro-benchmarks package. |
| examples/benchmarks/nvbench_sleep_kernel.py | Adds an example runner for the sleep-kernel benchmark. |
| examples/benchmarks/nvbench_kernel_launch.py | Adds an example runner for the kernel-launch benchmark. |
| docs/user-tutorial/benchmarks/micro-benchmarks.md | Documents the new NVBench benchmarks and their metrics. |
| dockerfile/rocm5.0.x.dockerfile | Updates Intel MLC download version used in the ROCm image. |
| dockerfile/cuda13.0.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| dockerfile/cuda12.9.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| dockerfile/cuda12.8.dockerfile | Installs newer CMake and builds cuda_nvbench in the CUDA image. |
| .gitmodules | Registers the third_party/nvbench submodule. |
| .gitignore | Ignores compile_commands.json. |
| .github/workflows/codeql-analysis.yml | Upgrades CodeQL actions to v3 and adds CMake setup for the C++ job. |
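The sleep kernel in `sleep_kernel.cu` busy-waits until a requested duration elapses (on the device this is typically a spin loop against the GPU clock). A host-side Python analogy of the same busy-wait pattern — illustrative only, not the CUDA code in this PR:

```python
import time


def busy_wait_us(target_us: int) -> float:
    """Spin until target_us microseconds elapse; return the actual elapsed us.

    Mirrors the device-side pattern: read a clock, spin until the requested
    duration has passed, and report what was actually waited.
    """
    start = time.perf_counter_ns()
    target_ns = target_us * 1000
    while time.perf_counter_ns() - start < target_ns:
        pass  # burn cycles, just like the kernel's busy-wait loop
    return (time.perf_counter_ns() - start) / 1000.0


elapsed = busy_wait_us(25)  # roughly 25 us, plus scheduling noise
```

The benchmark then sweeps the target duration as an NVBench axis (e.g. 25/50/75 us) and reports per-axis timing metrics.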
From .github/workflows/codeql-analysis.yml:

```diff
-          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo
+          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo build-essential
+      - name: Setup CMake
+        uses: lukka/get-cmake@latest
```

Using @latest for third-party GitHub Actions is a supply-chain risk and can lead to non-reproducible CI behavior. Pin this action to a specific tagged version or commit SHA.

Suggested change:

```diff
-        uses: lukka/get-cmake@latest
+        uses: lukka/get-cmake@v3.20.0
```
```python
# assert benchmark.result['duration_us_25_samples'][0] == 10175
self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
# self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_noise'][0], 69.78)
self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
# self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_noise'][0], 0.93)
# assert benchmark.result['duration_us_25_batch_samples'][0] == 17448
self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
# assert benchmark.result['duration_us_50_samples'][0] == 8187
# assert benchmark.result['duration_us_75_samples'][0] == 6279
```
There are several commented-out assertions in this test. If these metrics are intentionally not produced by the benchmark, consider removing the commented assertions; otherwise, consider emitting those metrics (samples/noise) and asserting on them to keep the test expectations complete and avoid stale commented code.
Suggested change:

```python
self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
```
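Applied to a self-contained stub, the trimmed test reads as follows. `FakeBenchmark` is a hypothetical stand-in for the parsed benchmark object, not SuperBench's actual class; the values come from the fixture:

```python
import unittest


class FakeBenchmark:
    """Hypothetical stand-in for the parsed nvbench-sleep-kernel result."""
    result = {
        'duration_us_25_cpu_time': [42.123],
        'duration_us_25_gpu_time': [25.321],
        'duration_us_25_batch_gpu_time': [23.456],
    }


class TestSleepKernelMetrics(unittest.TestCase):
    def test_emitted_metrics(self):
        benchmark = FakeBenchmark()
        # Assert only on metrics the parser actually emits; stale
        # commented-out checks are dropped rather than kept as comments.
        self.assertAlmostEqual(benchmark.result['duration_us_25_cpu_time'][0], 42.123)
        self.assertAlmostEqual(benchmark.result['duration_us_25_gpu_time'][0], 25.321)
        self.assertAlmostEqual(benchmark.result['duration_us_25_batch_gpu_time'][0], 23.456)
```

Keeping only live assertions means the test fails loudly if an emitted metric disappears, instead of silently documenting metrics that were never produced.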
This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.
- `nvbench-sleep-kernel`
- `nvbench-kernel-launch`

Example config: