Draft model update params #4452

Open
CUHKSZzxy wants to merge 6 commits into InternLM:main from CUHKSZzxy:draft-model-update-params

Collaborator

CUHKSZzxy commented Mar 24, 2026

  • Support update_params for Qwen3.5 models.
  • Make get_schedule_metrics async (previously sync) to avoid a race condition in the ZMQ RPC client while a synchronous update_params call is in flight.

CUHKSZzxy requested a review from RunningLeon March 24, 2026 04:28
CUHKSZzxy marked this pull request as ready for review March 24, 2026 09:51
Copilot AI review requested due to automatic review settings March 24, 2026 09:51
Contributor

Copilot AI left a comment

Pull request overview

This PR extends runtime parameter/weight updates to also cover the speculative-decoding draft model, and changes schedule-metrics retrieval to be async-safe to avoid ZMQ RPC client race conditions during synchronous update_params calls.

Changes:

  • Make get_schedule_metrics async through AsyncEngine and the OpenAI API server metrics loop.
  • Switch MP engine schedule-metrics RPC to an async RPC path (_collective_rpc_async).
  • Update ModelAgent to apply update_params, sleep, and wakeup to the speculative draft model as well as the main model.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| lmdeploy/serve/openai/api_server.py | Await schedule-metrics fetch in the periodic metrics logging task. |
| lmdeploy/serve/core/async_engine.py | Convert get_schedule_metrics to async and support both sync/async engine implementations. |
| lmdeploy/pytorch/engine/mp_engine/base.py | Make schedule-metrics retrieval async via _collective_rpc_async. |
| lmdeploy/pytorch/engine/model_agent/agent.py | Add draft-model support for update_params and include draft model in sleep/wakeup flows. |
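
For readers following along, here is a minimal sketch of one way an async wrapper could support both sync and async engine implementations, as the async_engine.py row describes. The two engine classes and the metric values are hypothetical stand-ins and are not lmdeploy code; only the name get_schedule_metrics comes from the PR, and the actual implementation may differ.

```python
import asyncio
import inspect


class SyncEngine:
    """Hypothetical engine whose metrics call returns a plain value."""

    def get_schedule_metrics(self):
        return {'active_seqs': 1}


class AsyncEngineImpl:
    """Hypothetical engine whose metrics call is a coroutine."""

    async def get_schedule_metrics(self):
        await asyncio.sleep(0)  # stand-in for an async RPC round trip
        return {'active_seqs': 2}


async def get_schedule_metrics(engine):
    """Return metrics from either kind of engine without blocking the loop."""
    result = engine.get_schedule_metrics()
    # Await only when the engine handed back a coroutine or other awaitable.
    if inspect.isawaitable(result):
        result = await result
    return result


sync_metrics = asyncio.run(get_schedule_metrics(SyncEngine()))
async_metrics = asyncio.run(get_schedule_metrics(AsyncEngineImpl()))
```

Callers such as a periodic metrics task would then await the wrapper uniformly, regardless of which engine backs it.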

if not self.spec_agent.is_enabled():
    return weights, []
main = [(name, weight) for name, weight in weights if not name.startswith('mtp.')]
draft = [(name, weight) for name, weight in weights if name.startswith('mtp.')]

Copilot AI Mar 24, 2026

Draft-model weight updates will likely fail because the draft weights retain the 'mtp.' prefix, whereas the spec draft model built by spec_agent is a standalone patched model whose parameter names typically do not include that outer prefix. This can cause a missing-key KeyError inside load_weights when it indexes params_dict[name]. Consider stripping the 'mtp.' prefix (or applying an explicit name mapping) before passing the weights through _rename_weights_iterator/load_weights for spec_model.

Suggested change
- draft = [(name, weight) for name, weight in weights if name.startswith('mtp.')]
+ # For the draft (spec) model, strip the outer "mtp." prefix from parameter names
+ draft = [(name[len('mtp.'):], weight)
+          for name, weight in weights
+          if name.startswith('mtp.')]
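
A self-contained sketch of the split discussed in this thread, written as a single pass so it also works when weights is a one-shot iterator (two separate list comprehensions would exhaust a generator on the first pass). The helper name split_main_and_draft, the strip_prefix flag, and the sample weight names are hypothetical. Whether the prefix should actually be stripped depends on the draft model's real parameter names, which this thread does not confirm.

```python
DRAFT_PREFIX = 'mtp.'


def split_main_and_draft(weights, strip_prefix=True):
    """Split (name, weight) pairs into main-model and draft-model lists.

    Assumption: draft weights are identified by the 'mtp.' name prefix,
    as in the snippet under review.
    """
    main, draft = [], []
    for name, weight in weights:
        if name.startswith(DRAFT_PREFIX):
            # Optionally drop the outer prefix so names match a standalone model.
            draft_name = name[len(DRAFT_PREFIX):] if strip_prefix else name
            draft.append((draft_name, weight))
        else:
            main.append((name, weight))
    return main, draft


# Placeholder integers stand in for tensors.
sample = [('model.layers.0.weight', 1), ('mtp.fc.weight', 2)]
main, draft = split_main_and_draft(iter(sample))
kept_main, kept_draft = split_main_and_draft(sample, strip_prefix=False)
```

Keeping strip_prefix as a flag makes it easy to test both behaviors against the draft model's params_dict before settling on one.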

self.spec_agent.reset_graph_runner()

def _get_spec_model(self):
Collaborator

We may want to move this method into the spec_agent class.

return [(k, _construct(v)) for k, v in raw]

def _split_main_and_draft(weights):
    if not self.spec_agent.is_enabled() or self.spec_agent.method != 'qwen3_5_mtp':
Collaborator

Consider adding a TODO or a warning message here.
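
One possible shape for that suggestion, as a standalone sketch: log a warning and leave a TODO when speculative decoding is enabled with a method that has no draft-weight update path. The function signature, the logger name, the SUPPORTED_DRAFT_METHODS tuple, and the message text are all hypothetical; only the 'qwen3_5_mtp' method name and the 'mtp.' prefix come from the snippet above.

```python
import logging

logger = logging.getLogger('draft_update_sketch')

# Assumption: only this method currently has a draft-weight update path.
SUPPORTED_DRAFT_METHODS = ('qwen3_5_mtp',)


def split_main_and_draft(weights, spec_enabled, method):
    """Return (main, draft) weights, warning when draft updates are skipped."""
    weights = list(weights)
    if not spec_enabled:
        return weights, []
    if method not in SUPPORTED_DRAFT_METHODS:
        # TODO: extend draft-weight updates to other speculative methods.
        logger.warning(
            'update_params: draft weights are not updated for method %r', method)
        return weights, []
    main = [(n, w) for n, w in weights if not n.startswith('mtp.')]
    draft = [(n, w) for n, w in weights if n.startswith('mtp.')]
    return main, draft


sample = [('model.embed.weight', 0), ('mtp.fc.weight', 1)]
fallback = split_main_and_draft(sample, spec_enabled=True, method='other_method')
split = split_main_and_draft(sample, spec_enabled=True, method='qwen3_5_mtp')
```

Separating the "disabled" case from the "unsupported method" case keeps the warning from firing when speculative decoding is simply turned off.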

3 participants