Move nvfp4_quant.py from gemm to common by sychen52 · Pull Request #1817 · NVIDIA/Model-Optimizer

sychen52 · 2026-06-24T19:39:19Z

What does this PR do?

Type of change: small improvement

Move nvfp4_quant.py from gemm to common

Testing

Unit tests

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ❌
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Documentation
- Updated quantization-related descriptions and references for consistency across the codebase.
Refactor
- Standardized internal references so shared quantization helpers are used from a common location.
- Improved consistency across FP4 and NVFP4-related components.
Bug Fixes
- No user-facing behavior changes were introduced.

Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

coderabbitai · 2026-06-24T19:39:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c3eecca1-01ef-49dd-86be-fa671424f1ba

📥 Commits

Reviewing files that changed from the base of the PR and between 2f516a7 and a91c6e2.

📒 Files selected for processing (8)

modelopt/torch/kernels/quantization/attention/__init__.py
modelopt/torch/kernels/quantization/attention/p_qdq.py
modelopt/torch/kernels/quantization/common/fp8_quant.py
modelopt/torch/kernels/quantization/common/nvfp4_quant.py
modelopt/torch/kernels/quantization/gemm/fp4_kernel.py
modelopt/torch/kernels/quantization/gemm/fp4_kernel_hopper.py
modelopt/torch/kernels/quantization/gemm/gptq_fused_kernel.py
modelopt/torch/kernels/quantization/gemm/nvfp4_fp8_sweep.py

📝 Walkthrough

Walkthrough

Updated NVFP4 quantization docstrings and import statements so attention and GEMM code reference the shared common/nvfp4_quant.py module. No runtime logic, exports, or kernel behavior changed.

Changes

Shared NVFP4 path updates

Layer / File(s)	Summary
Reference text updates `modelopt/torch/kernels/quantization/attention/__init__.py`, `modelopt/torch/kernels/quantization/common/fp8_quant.py`, `modelopt/torch/kernels/quantization/common/nvfp4_quant.py`	Module docstrings and the `Used by` list were updated to reference the shared NVFP4 helper path.
Import rewiring `modelopt/torch/kernels/quantization/attention/p_qdq.py`, `modelopt/torch/kernels/quantization/gemm/fp4_kernel.py`, `modelopt/torch/kernels/quantization/gemm/fp4_kernel_hopper.py`, `modelopt/torch/kernels/quantization/gemm/gptq_fused_kernel.py`, `modelopt/torch/kernels/quantization/gemm/nvfp4_fp8_sweep.py`	NVFP4 helper imports were switched from the local GEMM module to `modelopt.torch.kernels.quantization.common.nvfp4_quant`, with import order adjusted in the affected GEMM files.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main refactor: relocating nvfp4_quant.py from gemm to common.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Touched files are docstring/import-only; no torch.load, numpy.load, trust_remote_code, eval/exec, or nosec additions were introduced.


cjluo-nv

Bot review — DM the bot to share feedback.

Clean, complete file move of nvfp4_quant.py from kernels/quantization/gemm/ to kernels/quantization/common/. Verified: the file is present in common/ and gone from gemm/; all five importing modules (gemm/fp4_kernel.py, gemm/fp4_kernel_hopper.py, gemm/gptq_fused_kernel.py, gemm/nvfp4_fp8_sweep.py, attention/p_qdq.py) now import via ..common.nvfp4_quant with correct (alphabetical) import ordering; docstring cross-references in the moved file, fp8_quant.py, and attention/__init__.py are updated to the new relative paths. No dangling old-path references remain (apparent search-index hits were stale — confirmed by fetching actual file contents). The two unrelated nvfp4_quant matches (quant_conv.py, test_quant_conv.py) are a different symbol and untouched. No __init__.py re-export changes were needed. Behavior-preserving, no licensing changes (standard NVIDIA header retained), and no tests required for a mechanical move. No injection content in the untrusted blocks.
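The "no dangling old-path references" check described above can be approximated with a short script. This is a minimal sketch, not the reviewer's actual method: `find_stale_imports`, `resolve`, and `OLD_TAIL` are hypothetical names, and it assumes the old imports targeted a module at `gemm/nvfp4_quant.py`, written either relatively or absolutely. It parses each file with `ast` rather than grepping text, so docstrings and comments that merely mention the name are not counted.

```python
import ast
from pathlib import Path

# Assumption: the old module lived at .../gemm/nvfp4_quant.py, as the review describes.
OLD_TAIL = ("gemm", "nvfp4_quant")


def resolve(node, pkg_parts):
    """Resolve an ImportFrom node to dotted-path parts relative to the scan root."""
    module_parts = tuple(node.module.split(".")) if node.module else ()
    if node.level == 0:
        return module_parts
    # level=1 is the file's own package; each extra level climbs one package up.
    keep = max(len(pkg_parts) - (node.level - 1), 0)
    return tuple(pkg_parts[:keep]) + module_parts


def find_stale_imports(root):
    """Return (relative file path, line number) for imports that still hit the old module."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        pkg_parts = path.relative_to(root).parent.parts
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom):
                base = resolve(node, pkg_parts)
                # Covers `from .nvfp4_quant import x` and `from . import nvfp4_quant`.
                candidates = [base] + [base + (alias.name,) for alias in node.names]
            elif isinstance(node, ast.Import):
                candidates = [tuple(alias.name.split(".")) for alias in node.names]
            else:
                continue
            if any(c[-2:] == OLD_TAIL for c in candidates):
                hits.append((path.relative_to(root).as_posix(), node.lineno))
    return hits
```

Running `find_stale_imports("modelopt/torch/kernels/quantization")` from a checkout of the merged branch should return an empty list if the move is complete. Because the match is on the trailing `gemm.nvfp4_quant` path components, an unrelated symbol that happens to be named `nvfp4_quant` in another package is not flagged.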

codecov · 2026-06-24T19:53:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.75%. Comparing base (2f516a7) to head (a91c6e2).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1817      +/-   ##
==========================================
- Coverage   77.11%   76.75%   -0.36%     
==========================================
  Files         513      513              
  Lines       56889    56889              
==========================================
- Hits        43868    43664     -204     
- Misses      13021    13225     +204
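
The percentages in the diff follow directly from the hit and line counts. As a quick check, assuming coverage is computed as hits divided by total lines (the variable and function names here are illustrative only):

```python
# Counts copied from the coverage diff above.
LINES = 56889
BASE_HITS, HEAD_HITS = 43868, 43664


def coverage_pct(hits, lines):
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


base = coverage_pct(BASE_HITS, LINES)
head = coverage_pct(HEAD_HITS, LINES)
delta = head - base

# Misses are whatever is left after hits.
base_misses = LINES - BASE_HITS
head_misses = LINES - HEAD_HITS

print(f"base={base:.2f}% head={head:.2f}% delta={delta:+.2f}%")
print(f"misses: {base_misses} -> {head_misses}")
```

The line count is the same on both sides, so the reported drop corresponds entirely to 204 lines moving from hit to miss.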

Flag	Coverage Δ
examples	`42.08% <80.00%> (-0.16%)`	⬇️
gpu	`57.95% <100.00%> (-0.52%)`	⬇️
regression	`14.83% <60.00%> (+0.15%)`	⬆️
unit	`54.59% <0.00%> (+0.01%)`	⬆️


shengliangxu

LGTM

Move nvfp4_quant.py from gemm to common

a91c6e2

Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

sychen52 requested a review from a team as a code owner June 24, 2026 19:39

sychen52 requested review from kevalmorabia97 and realAsma June 24, 2026 19:39

cjluo-nv approved these changes Jun 24, 2026

coderabbitai Bot approved these changes Jun 24, 2026

shengliangxu approved these changes Jun 24, 2026

sychen52 enabled auto-merge (squash) June 24, 2026 21:21

sychen52 merged commit bf8bc0c into NVIDIA:main Jun 24, 2026
59 of 62 checks passed

sychen52 merged 1 commit into NVIDIA:main from sychen52:p_quant
