
[OMNIML-3776]: add clear docs restrict the model types #1105

Open
shengliangxu wants to merge 1 commit into main from shengliangx/docs-update

Conversation

@shengliangxu
Contributor

@shengliangxu shengliangxu commented Mar 23, 2026

What does this PR do?

Our current library does not support loading quantized models, which makes QA confusing. Let's document this clearly.

More detail in the NVBug:

https://nvbugspro.nvidia.com/bug/5993598
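For context, the workflows this PR documents take the original Hugging Face checkpoint and quantize it in-process. A hedged sketch of the documented usage is below; `--quant_cfg` is quoted from this PR's review thread, while the model-path flag name, the placeholder path, and the config value are illustrative assumptions, not values from this PR:

```shell
# Hypothetical sketch: point the script at the ORIGINAL unquantized checkpoint;
# the script performs simulated quantization in-process before evaluation.
# (Flag names other than --quant_cfg, and the config value, are illustrative.)
python lm_eval_hf.py \
    --model <path/to/original/unquantized/model> \
    --quant_cfg FP8_DEFAULT_CFG

# Passing an already-quantized checkpoint here is not supported by the library.
```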

Summary by CodeRabbit

  • Documentation
    • Clarified input requirements for simulated quantization workflows in LM-Eval-Harness, MMLU, and MT-Bench examples. Model path arguments must reference the original unquantized model, which then undergoes simulated quantization during evaluation.


Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Documentation updates to examples/llm_eval/README.md clarifying that simulated quantization workflows require the original unquantized model path as input. The scripts then perform simulated quantization internally before evaluation across LM-Eval-Harness, MMLU, and MT-Bench workflows.

Changes

Cohort / File(s): Documentation — examples/llm_eval/README.md
Summary: Added clarification that input model paths must reference the original unquantized model for the LM-Eval-Harness, MMLU, and MT-Bench scripts, with simulated quantization performed internally before evaluation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Title check ⚠️ Warning — The title mentions 'restrict the model types', but the changes clarify input expectations for simulated quantization workflows by documenting that model paths must reference unquantized models; they do not restrict or prevent model types. Resolution: update the title to accurately reflect the changes, e.g. 'Clarify documentation for unquantized model inputs in evaluation scripts' or 'Add documentation clarifying model input requirements for quantization workflows'.
✅ Passed checks (3 passed)

  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Docstring Coverage ✅ Passed — No functions found in the changed files; docstring coverage check skipped.
  • Security Anti-Patterns ✅ Passed — PR contains only documentation changes to the README; no Python source files in the modelopt package or examples were modified.



@shengliangxu shengliangxu marked this pull request as ready for review March 23, 2026 21:57
@shengliangxu shengliangxu requested a review from a team as a code owner March 23, 2026 21:57
@shengliangxu shengliangxu requested a review from realAsma March 23, 2026 21:57
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1105/

Built to branch gh-pages at 2026-03-23 22:00 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
examples/llm_eval/README.md (1)

62-82: Consider adding the same clarification to auto_quantize sections.

The auto_quantize sections (both here and in MMLU at lines 131-142) also perform simulated quantization and use the same scripts (lm_eval_hf.py, mmlu.py) with similar --quant_cfg parameters. For consistency and to fully address the PR objective of reducing confusion, consider adding the same clarification about requiring the original unquantized model as input.

Similarly, the "Customize quantization method for evaluation" section (lines 200-210) could benefit from the same clarification.
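The same constraint the reviewer raises applies to auto_quantize: it simulates per-layer quantization while searching for a format mix, so it also needs the original weights. A hedged sketch follows; `--quant_cfg` and `--auto_quantize_bits` are quoted from the review comment above, while the model-path flag name and all values are illustrative assumptions:

```shell
# Hypothetical sketch: auto_quantize also starts from the ORIGINAL unquantized
# model, since it simulates per-layer quantization during its search; an
# already-quantized checkpoint cannot be used as input.
# (The model-path flag name and all values below are illustrative.)
python mmlu.py \
    --model <path/to/original/unquantized/model> \
    --quant_cfg FP8_DEFAULT_CFG,INT4_AWQ_CFG \
    --auto_quantize_bits 4.5
```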

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_eval/README.md` around lines 62-82: add a short clarifying
sentence to the auto_quantize sections (the blocks referencing lm_eval_hf.py and
mmlu.py and flags like --quant_cfg and --auto_quantize_bits) stating that
auto_quantize performs simulated per-layer quantization and therefore requires
the original unquantized pretrained model as input (not an already-quantized
checkpoint); also add the same clarification to the "Customize quantization
method for evaluation" section that describes using --quant_cfg so readers know
to provide the original model for these simulated quantization workflows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc0822e5-bb27-48a9-9cb0-8822c4c8258b

📥 Commits

Reviewing files that changed from the base of the PR and between c425524 and 3709acd.

📒 Files selected for processing (1)
  • examples/llm_eval/README.md

@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.23%. Comparing base (b61fb4e) to head (3709acd).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1105      +/-   ##
==========================================
- Coverage   70.24%   70.23%   -0.02%     
==========================================
  Files         227      227              
  Lines       25909    25909              
==========================================
- Hits        18201    18198       -3     
- Misses       7708     7711       +3     

☔ View full report in Codecov by Sentry.

@shengliangxu shengliangxu requested a review from meenchen March 23, 2026 23:42
