Pull request overview
This PR extends runtime parameter/weight updates to also cover the speculative-decoding draft model, and changes schedule-metrics retrieval to be async-safe to avoid ZMQ RPC client race conditions during synchronous update_params calls.
Changes:
- Make `get_schedule_metrics` async through `AsyncEngine` and the OpenAI API server metrics loop.
- Switch MP engine schedule-metrics RPC to an async RPC path (`_collective_rpc_async`).
- Update `ModelAgent` to apply `update_params`, `sleep`, and `wakeup` to the speculative draft model as well as the main model.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `lmdeploy/serve/openai/api_server.py` | Await schedule-metrics fetch in the periodic metrics logging task. |
| `lmdeploy/serve/core/async_engine.py` | Convert `get_schedule_metrics` to async and support both sync/async engine implementations. |
| `lmdeploy/pytorch/engine/mp_engine/base.py` | Make schedule-metrics retrieval async via `_collective_rpc_async`. |
| `lmdeploy/pytorch/engine/model_agent/agent.py` | Add draft-model support for `update_params` and include draft model in sleep/wakeup flows. |
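
For readers following the `async_engine.py` row, here is a minimal sketch of one way an async wrapper could accept both sync and async engine implementations. The two engine classes and the metric values are hypothetical stand-ins, not lmdeploy's actual classes; only the method name `get_schedule_metrics` comes from the PR.

```python
import asyncio
import inspect


class HypotheticalSyncEngine:
    """Stand-in engine whose metrics call returns a plain value."""

    def get_schedule_metrics(self):
        return {'active_seqs': 1}


class HypotheticalAsyncRpcEngine:
    """Stand-in engine whose metrics call is a coroutine (e.g. an async RPC)."""

    async def get_schedule_metrics(self):
        await asyncio.sleep(0)  # placeholder for an async RPC round trip
        return {'active_seqs': 2}


async def get_schedule_metrics(engine):
    """Fetch metrics from either kind of engine.

    The call is made once; if it returned an awaitable, await it,
    otherwise pass the plain value through unchanged.
    """
    result = engine.get_schedule_metrics()
    if inspect.isawaitable(result):
        result = await result
    return result


async def main():
    return [
        await get_schedule_metrics(HypotheticalSyncEngine()),
        await get_schedule_metrics(HypotheticalAsyncRpcEngine()),
    ]


results = asyncio.run(main())
```

Checking the returned value with `inspect.isawaitable` lets callers always `await` the wrapper, whichever engine sits behind it.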
    if not self.spec_agent.is_enabled():
        return weights, []
    main = [(name, weight) for name, weight in weights if not name.startswith('mtp.')]
    draft = [(name, weight) for name, weight in weights if name.startswith('mtp.')]
Draft-model weight updates will likely fail because `draft_weights` retain the `'mtp.'` prefix, but the spec draft model built by `spec_agent` is a standalone patched model whose parameter names typically do not include that outer prefix. This can lead to a missing-key `KeyError` inside `load_weights` when indexing `params_dict[name]`. Consider stripping the `'mtp.'` prefix (and/or applying an explicit mapping) before passing weights through `_rename_weights_iterator`/`load_weights` for `spec_model`.
    - draft = [(name, weight) for name, weight in weights if name.startswith('mtp.')]
    + # For the draft (spec) model, strip the outer "mtp." prefix from parameter names
    + draft = [(name[len('mtp.'):], weight)
    +          for name, weight in weights
    +          if name.startswith('mtp.')]
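
The suggestion can also be written as a standalone function and checked in isolation. This is only a sketch under assumptions: the function is lifted out of `ModelAgent`, the `spec_enabled` flag stands in for `self.spec_agent.is_enabled()`, and the parameter names and integer "weights" are made up for illustration.

```python
DRAFT_PREFIX = 'mtp.'  # prefix taken from the diff under review


def split_main_and_draft(weights, spec_enabled=True):
    """Split (name, weight) pairs into main-model and draft-model lists.

    Draft entries have the outer 'mtp.' prefix stripped so that their names
    can match a standalone draft model's parameter names (an assumption
    about how the draft model names its parameters).
    """
    weights = list(weights)
    if not spec_enabled:
        return weights, []
    main, draft = [], []
    for name, weight in weights:
        if name.startswith(DRAFT_PREFIX):
            draft.append((name[len(DRAFT_PREFIX):], weight))
        else:
            main.append((name, weight))
    return main, draft


# Hypothetical parameter names; integers stand in for tensors.
weights = [('model.layers.0.weight', 1), ('mtp.fc.weight', 2)]
main, draft = split_main_and_draft(weights)
```

A single pass over `weights` also avoids iterating the input twice, which matters if the caller passes a generator rather than a list.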
        self.spec_agent.reset_graph_runner()

    def _get_spec_model(self):
We could move this method into the `spec_agent` class.
        return [(k, _construct(v)) for k, v in raw]

    def _split_main_and_draft(weights):
        if not self.spec_agent.is_enabled() or self.spec_agent.method != 'qwen3_5_mtp':
Consider adding a TODO or a warning message here.
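
One possible shape for that warning, as a sketch only: the helper name, the `SUPPORTED_DRAFT_METHODS` set, and the message wording are hypothetical, and `'qwen3_5_mtp'` is the only method name taken from the diff.

```python
import logging

logger = logging.getLogger(__name__)

# Only this method is named in the diff; any other value is treated as unsupported.
SUPPORTED_DRAFT_METHODS = {'qwen3_5_mtp'}


def draft_update_supported(spec_enabled, method):
    """Return True when draft weights should be split out and updated."""
    if not spec_enabled:
        return False
    if method not in SUPPORTED_DRAFT_METHODS:
        # TODO: extend draft-weight updates to other speculative methods.
        logger.warning(
            'update_params: draft weights are not updated for speculative method %r',
            method)
        return False
    return True
```

Logging the skipped method makes it visible when speculative decoding is enabled but the draft model silently keeps its old weights.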
Change `get_schedule_metrics` from sync to async to avoid a race condition in the ZMQ RPC client while the synchronous `update_params` is running.