[OMNIML-3776]: add clear docs restrict the model types (#1105)
shengliangxu wants to merge 1 commit into main
Conversation
Our current library does not support loading quantized models, which makes QA confusing. Let's document this clearly. More detail in the NVBug: https://nvbugspro.nvidia.com/bug/5993598

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
🧹 Nitpick comments (1)
examples/llm_eval/README.md (1)
62-82: Consider adding the same clarification to the auto_quantize sections. The auto_quantize sections (both here and in the MMLU section at lines 131-142) also perform simulated quantization and use the same scripts (lm_eval_hf.py, mmlu.py) with similar --quant_cfg parameters. For consistency, and to fully address the PR objective of reducing confusion, consider adding the same clarification that the original unquantized model is required as input. Similarly, the "Customize quantization method for evaluation" section (lines 200-210) could benefit from the same clarification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/llm_eval/README.md` around lines 62-82, add a short clarifying sentence to the auto_quantize sections (the blocks referencing lm_eval_hf.py and mmlu.py and flags like --quant_cfg and --auto_quantize_bits) stating that auto_quantize performs simulated per-layer quantization and therefore requires the original unquantized pretrained model as input (not an already-quantized checkpoint); also add the same clarification to the "Customize quantization method for evaluation" section that describes using --quant_cfg, so readers know to provide the original model for these simulated quantization workflows.
📒 Files selected for processing (1): examples/llm_eval/README.md
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #1105    +/-  ##
========================================
- Coverage   70.24%   70.23%   -0.02%
========================================
  Files         227      227
  Lines       25909    25909
========================================
- Hits        18201    18198       -3
- Misses       7708     7711       +3
```