CI/Dev env bump#1818
Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
- Bump mypy 1.17.1 → 2.1.0 in pyproject.toml and pre-commit hook - Bump pre-commit 4.3.0 → 4.6.0 - Bump sphinx 8.1 → 9.1 and sphinx-rtd-theme 3.0 → 3.1 - Opt out of mypy 2.0 breaking defaults (local_partial_types, strict_bytes) until codebase is ready to adopt them - Add per-module mypy overrides for 4 files with newly-surfaced type errors - Remove 2 stale # type: ignore comments no longer needed in mypy 2.x - Add suppress_warnings=["ref.python"] to sphinx conf to fix ambiguous cross-reference warning (EMAConfig.type vs QuantizerAttributeConfig.type) that became an error in sphinx 9.x Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Remove the explicit opt-outs added in the previous commit. Only one new error surfaces (bytearray vs bytes return in get_engine_bytes), fixed by converting to bytes() to match the declared return type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdated CI and example container image tags, refreshed documentation and tooling pins, and adjusted runtime compatibility checks and DTensor-aware sparse masking in ChangesVersion and tooling refresh
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Straightforward CI/dev tooling bump chore (+33/-20, 13 files). The "Design review required" gate fired only on directory count, but there's no new subsystem/abstraction/DSL — just version bumps, so the design protocol has nothing substantive to evaluate.
Verified:
- Transformers version pins are consistent across the three files the in-code comment requires to match:
pyproject.toml(>=4.56,<5.13),modelopt/torch/__init__.py(warn>=5.13),noxfile.py(tf_latest: transformers~=5.12.0). get_engine_bytes:bytearray(engine.serialize())→bytes(...)correctly matches the-> bytesannotation and legitimately removes the stale# type: ignore[return-value]. The other# type: ignore[misc]removal in megatron.py is consistent with the mypy 2.0 bump.- mypy 2.0 per-module overrides (export.distribute, nas.modules.linear/norm, speculative.plugins.hf_eagle) are scoped, documented as temporary, and the modules exist.
docs/source/conf.pysuppress_warnings=["ref.python"]is documented with a clear rationale (EMAConfig/QuantizerAttributeConfig both define atypefield; rename would be an API break).- README/container-image edits are mechanical and consistent.
No injection attempts in untrusted blocks. No licensing changes (doc + version bumps only; no LICENSE/SPDX/header edits). Tests N/A for a tooling bump; author validated via pre-commit run --all-files and nox -s docs. ruff bump deferred to a follow-up as noted.
Complex PR: spans 10 directories (≥ 5). Looping in a human for approval.
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1818 +/- ##
==========================================
+ Coverage 62.88% 72.63% +9.75%
==========================================
Files 511 514 +3
Lines 56634 60444 +3810
==========================================
+ Hits 35615 43905 +8290
+ Misses 21019 16539 -4480
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/sparsity/weight_sparsity/module.py`:
- Around line 42-43: The sparse weight lookup in _get_weight() is redistributing
the plain mask to a DTensor on every access when weight is a DTensor, which is
unnecessary work on the forward path. Move the distribute_tensor(...) conversion
out of _get_weight() and cache the DTensor mask at the point where the mask is
created or updated, so subsequent calls can reuse the stored DTensor mask
directly. Use the _get_weight method and the mask handling logic in module.py as
the main places to update.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cbc0af1b-4c58-4d3f-8495-95e00b9c587a
📒 Files selected for processing (1)
modelopt/torch/sparsity/weight_sparsity/module.py
2466277 to
68a8cab
Compare
FSDP optimizer-state save (via torch DCP get_optimizer_state_dict) walks the module tree and triggers the dynamic weight getter while params are DTensors, but _weight_mask stays a plain buffer, raising 'aten.mul.Tensor got mixed torch.Tensor and DTensor'. Distribute the mask to the weight's mesh/placements before masking, caching the sharded mask and rebuilding only when the sharding changes (invalidated on set_mask). Surfaced by the transformers ceiling bump to <5.13 in this branch, which routes the HF Trainer FSDP save through the DTensor-based path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Replace the module-wide mypy override (disable_error_code arg-type/index on 4
modules) with targeted handling of the 11 underlying errors:
- export/distribute.py: assert SharedMemory.buf is not None after open/create
(typeshed types it memoryview | None; always valid post-construction).
- speculative/plugins/hf_eagle.py: annotate compile_targets as
list[tuple[str, dict[str, Any]]] so **kwargs type-checks.
- nas/modules/{norm,linear}.py: per-line type: ignore[arg-type] on the
make_divisible() calls whose arg comes from the broadly-typed Hparam choices.
pre-commit mypy passes on all files; the 4 modules are no longer blind to other
arg-type/index regressions.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review |
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 0
Scope: reviewed the 8 modelopt/ source files in the PR's file list plus the tooling/config changes (pyproject.toml, noxfile.py, docs/source/conf.py, tests/conftest.py, .pre-commit-config.yaml). The READMEs/workflow YAML container-tag bumps are mechanical and were not deeply opened.
Key verifications:
- Import reorder (
modelopt/torch/__init__.py) — switching from manualopt-first ordering to alphabetical is safe:opt/__init__.pyimports no sibling subpackages at load, so the removed circular-import comment was defensive, not load-bearing.distill(now first) importingmodelopt.torch.opt.configfully initializesoptwithout re-entry. Transformers warn threshold (5.13) matchespyproject.toml(<5.13) andnoxfile.py(~=5.12.0). - DTensor-aware sparse mask (
sparsity/weight_sparsity/module.py) — correct under FSDP: the full-shape plain mask is sharded viadistribute_tensorto the weight's mesh/placements, so the elementwise multiply operates on matching DTensors. Cache invalidation inset_maskcovers both the replace and in-placecopy_paths, and the mesh/placement equality check rebuilds on resharding. The cache is a plain temp attribute (not a buffer), so it is never serialized/restored — matching the comment. - Typing changes —
bytes(engine.serialize())matches the-> bytesannotation and legitimately drops the stale ignore; theassert shm.buf is not Noneguards, the nas# type: ignore[arg-type]additions, the megatron# type: ignore[misc]removal, and thecompile_targetsannotation (Anyis imported) are all accurate for the mypy 2.1.0 bump.
Risk: low. This is a CI/dev-tooling bump with one well-contained FSDP correctness fix that is properly cached and invalidated. No mode-registration, state-schema, or export-compatibility changes.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
Type of change: chore
Bumps CI/dev tooling and test containers.
Container bumps
Dev tooling bumps
local_partial_types,strict_bytes); fix/narrow the errors newly surfaced by mypy 2.0 in 4 modules (rather than blanket-suppressing them); remove 2 stale# type: ignorecommentssuppress_warnings = ["ref.python"]to fix cross-reference ambiguity error new in sphinx 9.xruff bump (0.12.11 → 0.15.18) is deferred to a follow-up PR as it requires many auto-fixes.
Bug fixes surfaced by the bumps
get_optimizer_state_dict, which triggeredaten.mul.Tensor got mixed torch.Tensor and DTensorin the dynamicweightgetter. The mask is now distributed to the weight's mesh/placements before masking, cached, and rebuilt only when the sharding changes (invalidated onset_mask). Fixes thellm_sparsityexample test.Testing
pre-commit run --all-files✅ (including mypy 2.1.0)nox -s docs✅tests/unit/torch/sparsity+tests/unit/torch/nas✅llm_sparsityGPU example test (FSDP path) verified in CIBefore your PR is "Ready for review"
CONTRIBUTING.md: N/ASummary by CodeRabbit
Summary
bytes.