[OMNIML-5084] cell_t0_d7#1788
Draft
ChenhanYu wants to merge 4 commits into
…nfra fixes
modelopt/torch/quantization/plugins/transformer_engine.py:
MODELOPT_TEGROUPED_PER_EXPERT_QUANTIZER=1 opts into per-gemm
weight_quantizer_0..N-1 inside _QuantTEGroupedLinear (deepcopied from
the shared weight_quantizer). Lets TEGroupedMLP recover per-expert
amax granularity, matching SequentialMLP's default behavior.
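The opt-in can be pictured with a toy model. This is a minimal sketch, not the code in `_QuantTEGroupedLinear`: `ToyQuantizer` and `ToyGroupedLinear` are hypothetical stand-ins, amax is a per-tensor scalar, and treating only the exact value `1` as "enabled" is an assumption about how the flag is parsed.

```python
import copy
import os


class ToyQuantizer:
    """Hypothetical stand-in for a weight quantizer; it only tracks a running amax."""

    def __init__(self):
        self.amax = None

    def calibrate(self, weights):
        peak = max(abs(w) for w in weights)
        self.amax = peak if self.amax is None else max(self.amax, peak)


class ToyGroupedLinear:
    """Hypothetical stand-in for a grouped linear holding num_gemms expert weights."""

    def __init__(self, num_gemms, env=None):
        env = os.environ if env is None else env
        self.num_gemms = num_gemms
        self.weight_quantizer = ToyQuantizer()
        # Assumption: the flag is considered set only when it equals "1".
        self.per_expert = env.get("MODELOPT_TEGROUPED_PER_EXPERT_QUANTIZER", "0") == "1"
        if self.per_expert:
            # One independent copy per gemm, cloned from the shared quantizer.
            for i in range(num_gemms):
                setattr(self, f"weight_quantizer_{i}", copy.deepcopy(self.weight_quantizer))

    def quantizer_for(self, i):
        if self.per_expert:
            return getattr(self, f"weight_quantizer_{i}")
        return self.weight_quantizer

    def calibrate(self, expert_weights):
        for i, weights in enumerate(expert_weights):
            self.quantizer_for(i).calibrate(weights)


experts = [[0.5, -1.0], [3.0, 0.25]]

shared = ToyGroupedLinear(2, env={})
shared.calibrate(experts)

per_expert = ToyGroupedLinear(2, env={"MODELOPT_TEGROUPED_PER_EXPERT_QUANTIZER": "1"})
per_expert.calibrate(experts)
```

With the flag unset, both experts feed one quantizer and the smaller expert inherits the larger expert's amax; with it set, each `weight_quantizer_{i}` keeps its own value.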
modelopt/torch/distill/plugins/megatron.py:
LogitsKLLoss.forward prints student/teacher logit stats (mean/std/
min/max/shape) on rank 0 each call. Diagnostic for the QAD loss-spike
investigation — confirms which spec produces which logits without
changing the KL math.
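As a rough illustration of that kind of diagnostic, the sketch below summarizes a flat list of logit values and only produces output on rank 0. It is an assumption-laden stand-in: the real change operates on tensors inside `LogitsKLLoss.forward`, while `logit_stats` and its arguments are hypothetical and use only the standard library.

```python
import statistics


def logit_stats(name, values, shape, rank=0):
    """Return a one-line summary on rank 0 and None on every other rank."""
    if rank != 0:
        return None
    return (
        f"{name}: mean={statistics.fmean(values):.4f} "
        f"std={statistics.stdev(values):.4f} "
        f"min={min(values):.4f} max={max(values):.4f} shape={shape}"
    )


student_line = logit_stats("student", [1.0, 2.0, 3.0], (1, 3))
teacher_line = logit_stats("teacher", [1.0, 2.0, 3.0], (1, 3), rank=1)
```

Because the summary is computed from the values alone and returned as a string, it has no effect on the loss itself.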
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py:
New test_te_grouped_vs_sequential_default_amax + ..._default_loss
cover the structural amax asymmetry between TEGroupedMLP and
SequentialMLP (TEGrouped per-linear amax = max-over-Sequential-experts
amax) and a finiteness sanity check on the resulting quant error.
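The asymmetry these tests cover can be reproduced in a few lines. This is a sketch under simplifying assumptions (per-tensor scalar amax, a symmetric 127-level grid, hand-written weights); `amax` and `fake_quant_error` are hypothetical helpers, not functions from the test file.

```python
import math


def amax(rows):
    """Largest absolute value across a list of weight rows."""
    return max(abs(w) for row in rows for w in row)


def fake_quant_error(rows, amax_value, levels=127):
    """Sum of squared error after rounding to a symmetric grid scaled by amax_value."""
    scale = amax_value / levels
    return sum((w - round(w / scale) * scale) ** 2 for row in rows for w in row)


expert_weights = [
    [[0.5, -1.0], [0.25, 0.75]],
    [[2.0, -0.5], [-3.0, 1.5]],
    [[0.125, 0.0], [-0.25, 0.5]],
]

# Sequential-style: one amax per expert.
sequential_amax = [amax(w) for w in expert_weights]

# Grouped-style with a shared quantizer: one amax over all experts' weights.
grouped_amax = amax([row for w in expert_weights for row in w])

# Quantization error of each expert under the shared amax.
grouped_errors = [fake_quant_error(w, grouped_amax) for w in expert_weights]
```

Here `grouped_amax` equals `max(sequential_amax)`, which is the structural relationship the amax test asserts, and the errors stay finite, which mirrors the sanity check.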
tools/launcher/common/service_utils.sh:
- Fall back to SLURM_PROCID / SLURM_LOCALID when PMIX_*/OMPI_* are
unset, so `[[ "$mpi_local_rank" -eq 0 ]]` doesn't silently pass on
every rank under plain srun.
- util_install_extra_dep: per-node marker so concurrent ranks wait
for rank 0 to finish installing (concurrent pip on a shared FS
leaves a broken state); also installs nvidia-resiliency-ext.
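The rank fallback in the first bullet can be sketched as a lookup order. The script itself is shell; this Python version only illustrates the logic. Which `PMIX_*`/`OMPI_*` variables the script actually reads is not stated above, so the names `PMIX_RANK`, `OMPI_COMM_WORLD_RANK`, and `OMPI_COMM_WORLD_LOCAL_RANK` are assumptions, and `resolve_ranks` is a hypothetical helper.

```python
def resolve_ranks(env):
    """Return (global_rank, local_rank), preferring MPI variables over SLURM ones."""

    def first_set(*names):
        for name in names:
            value = env.get(name, "")
            if value != "":
                return int(value)
        # Nothing set: report that explicitly instead of defaulting to rank 0.
        return None

    rank = first_set("PMIX_RANK", "OMPI_COMM_WORLD_RANK", "SLURM_PROCID")
    local_rank = first_set("OMPI_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID")
    return rank, local_rank


plain_srun = resolve_ranks({"SLURM_PROCID": "5", "SLURM_LOCALID": "1"})
under_mpi = resolve_ranks(
    {
        "OMPI_COMM_WORLD_RANK": "2",
        "OMPI_COMM_WORLD_LOCAL_RANK": "0",
        "SLURM_PROCID": "9",
        "SLURM_LOCALID": "9",
    }
)
no_launcher = resolve_ranks({})
```

Returning `None` when nothing is set avoids the failure mode described above, where an empty value compares equal to 0 and every rank behaves as local rank 0.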
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
- transformer_engine.py: dedup `import copy`/`import os` left over from the
rebase, sort the four imports alphabetically.
- transformer_engine.py: comment near the per-expert weight_quantizer setup
explaining that base modelopt_post_restore won't re-calibrate the
weight_quantizer_{i} modules, so save/restore is only safe when TP/EP is
unchanged. Per-channel _amax shape depends on the TP-sliced output dim.
- service_utils.sh: drop the duplicated mpi_rank / mpi_local_rank
re-assignments — main already carries the SLURM fallback, the extra two
lines were leftover rebase noise.
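The TP dependence noted in the second bullet comes down to shape arithmetic. The sketch below assumes the output dimension is split evenly across TP ranks and that a per-channel `_amax` has one entry per local output channel; `per_channel_amax_shape` and `can_restore` are hypothetical helpers, not modelopt APIs.

```python
def per_channel_amax_shape(out_features, tp_size):
    """Shape of a per-output-channel amax on one TP rank (assumed layout)."""
    if out_features % tp_size:
        raise ValueError("out_features must be divisible by tp_size")
    return (out_features // tp_size, 1)


def can_restore(saved_shape, out_features, tp_size):
    """A saved amax only fits if the local output dimension is unchanged."""
    return saved_shape == per_channel_amax_shape(out_features, tp_size)


saved = per_channel_amax_shape(4096, tp_size=2)
same_tp = can_restore(saved, 4096, tp_size=2)
changed_tp = can_restore(saved, 4096, tp_size=4)
```

An amax saved at TP=2 has 2048 rows, while a TP=4 restore expects 1024, so without re-calibration the saved state cannot be reused.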
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Draft PR opened by pensieve-intern for OMNIML-5084.
Stage cell_t0_d7 of Epic OMNIML-5081. The agent ran from the SPEC on the ticket description; review every change before marking ready.
Always-draft is enforced; the bot never auto-merges.