Conversation
There was a problem hiding this comment.
Pull request overview
This PR binds an AsyncEngine “epoch” to each HTTP-bound Session so that /stop_all_session (and related abort flows) can reliably invalidate in-flight work that was bound before the stop/abort, reducing races between request binding and generation.
Changes:
- Stamp
session.epochin the OpenAI API server when resolving/binding a session for a request. - Add
epochtoSessionobjects (with reset behavior) and improve abort logging to include epoch. - In
AsyncEngine.generate(), drop “stale” sessions whenstop_all_session()has bumped the engine epoch since the request bound the session; also adjust some metrics accounting and/sleepbehavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lmdeploy/serve/openai/api_server.py | Binds AsyncEngine.epoch to sessions on request bind; /sleep now stops all sessions before sleeping. |
| lmdeploy/serve/managers/session_manager.py | Adds Session.epoch state, resets it, and logs epoch during abort. |
| lmdeploy/serve/core/async_engine.py | Adds stale-session detection based on epoch, bumps epoch on stop-all, and updates metrics/abort handling in generate(). |
Comments suppressed due to low confidence (2)
lmdeploy/serve/core/async_engine.py:413
- This branch yields
finish_reason='length'(normal completion due to token limit) but incrementsincrease_failed_requests('error'). That will skew scheduler metrics by counting expected length-limited completions as errors; consider incrementing succeeded requests (or not marking as failed) in this case.
if gen_config.max_new_tokens == 0:
logger.info(f'run out of tokens. session={session_id}.')
metrics_processor.increase_failed_requests('error')
yield GenOut(response='',
history_token_len=session.step,
input_token_len=len(input_ids),
generate_token_len=0,
finish_reason='length',
token_ids=[])
lmdeploy/serve/core/async_engine.py:462
- This pre-inference abort path yields
finish_reason='abort'even thoughGenOut.finish_reasonis not typed to allow'abort'. Align the finish_reason enum/typing acrossGenOutand response models so metrics/logging and downstream code don't see unexpected values.
if session.epoch is not None and session.epoch != self.epoch:
logger.warning(f'[generate] session {session_id} got aborted before starting inference, '
f'session.epoch={session.epoch}, epoch={self.epoch}')
metrics_processor.increase_failed_requests('abort')
yield GenOut(response='',
history_token_len=0,
input_token_len=len(input_ids),
generate_token_len=0,
finish_reason='abort',
token_ids=[])
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """If session is stamped ``http_bind_epoch`` by api_server and | ||
| ``stop_all_session`` ran since then (epoch changed), drop the | ||
| session.""" |
There was a problem hiding this comment.
_if_session_stale docstring references http_bind_epoch, but the actual field being checked is session.epoch (stamped in api_server). Consider renaming the docstring terminology (or the attribute) to avoid confusion when debugging epoch-based abort behavior.
| """If session is stamped ``http_bind_epoch`` by api_server and | |
| ``stop_all_session`` ran since then (epoch changed), drop the | |
| session.""" | |
| """If ``session.epoch`` was stamped by api_server and | |
| ``stop_all_session`` ran since then (the engine epoch changed), | |
| drop the session.""" |
| history_token_len=session.step, | ||
| input_token_len=input_token_len, | ||
| generate_token_len=0, | ||
| finish_reason='abort', |
There was a problem hiding this comment.
GenOut.finish_reason is typed as Literal['stop','length','error'], but this new stale-session path returns finish_reason='abort'. This introduces an additional inconsistent value; please update the allowed finish_reason type(s) (and any downstream consumers) to include 'abort' or map this case to an existing reason consistently.
| finish_reason='abort', | |
| finish_reason='error', |
No description provided.