bind epoch to session by lvhan028 · Pull Request #4496 · InternLM/lmdeploy

lvhan028 · 2026-04-04T09:55:30Z

No description provided.

Copilot

Pull request overview

This PR binds an AsyncEngine “epoch” to each HTTP-bound Session so that /stop_all_session (and related abort flows) can reliably invalidate in-flight work that was bound before the stop/abort, reducing races between request binding and generation.

Changes:

Stamp session.epoch in the OpenAI API server when resolving/binding a session for a request.
Add epoch to Session objects (with reset behavior) and improve abort logging to include epoch.
In AsyncEngine.generate(), drop “stale” sessions when stop_all_session() has bumped the engine epoch since the request bound the session; also adjust some metrics accounting and /sleep behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
lmdeploy/serve/openai/api_server.py	Binds `AsyncEngine.epoch` to sessions on request bind; `/sleep` now stops all sessions before sleeping.
lmdeploy/serve/managers/session_manager.py	Adds `Session.epoch` state, resets it, and logs epoch during abort.
lmdeploy/serve/core/async_engine.py	Adds stale-session detection based on epoch, bumps epoch on stop-all, and updates metrics/abort handling in `generate()`.

Comments suppressed due to low confidence (2)

lmdeploy/serve/core/async_engine.py:413

This branch yields finish_reason='length' (normal completion due to token limit) but increments increase_failed_requests('error'). That will skew scheduler metrics by counting expected length-limited completions as errors; consider incrementing succeeded requests (or not marking as failed) in this case.

        if gen_config.max_new_tokens == 0:
            logger.info(f'run out of tokens. session={session_id}.')
            metrics_processor.increase_failed_requests('error')
            yield GenOut(response='',
                         history_token_len=session.step,
                         input_token_len=len(input_ids),
                         generate_token_len=0,
                         finish_reason='length',
                         token_ids=[])

lmdeploy/serve/core/async_engine.py:462

This pre-inference abort path yields finish_reason='abort' even though GenOut.finish_reason is not typed to allow 'abort'. Align the finish_reason enum/typing across GenOut and response models so metrics/logging and downstream code don't see unexpected values.

            if session.epoch is not None and session.epoch != self.epoch:
                logger.warning(f'[generate] session {session_id} got aborted before starting inference, '
                               f'session.epoch={session.epoch}, epoch={self.epoch}')
                metrics_processor.increase_failed_requests('abort')
                yield GenOut(response='',
                             history_token_len=0,
                             input_token_len=len(input_ids),
                             generate_token_len=0,
                             finish_reason='abort',
                             token_ids=[])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-04T09:59:37Z

lmdeploy/serve/core/async_engine.py

+        """If session is stamped ``http_bind_epoch`` by api_server and
+        ``stop_all_session`` ran since then (epoch changed), drop the
+        session."""


_if_session_stale docstring references http_bind_epoch, but the actual field being checked is session.epoch (stamped in api_server). Consider renaming the docstring terminology (or the attribute) to avoid confusion when debugging epoch-based abort behavior.

Suggested change

"""If session is stamped ``http_bind_epoch`` by api_server and

``stop_all_session`` ran since then (epoch changed), drop the

session."""

"""If ``session.epoch`` was stamped by api_server and

``stop_all_session`` ran since then (the engine epoch changed),

drop the session."""

Copilot · 2026-04-04T09:59:37Z

lmdeploy/serve/core/async_engine.py

+                      history_token_len=session.step,
+                      input_token_len=input_token_len,
+                      generate_token_len=0,
+                      finish_reason='abort',


GenOut.finish_reason is typed as Literal['stop','length','error'], but this new stale-session path returns finish_reason='abort'. This introduces an additional inconsistent value; please update the allowed finish_reason type(s) (and any downstream consumers) to include 'abort' or map this case to an existing reason consistently.

Suggested change

finish_reason='abort',

finish_reason='error',

bind epoch to session

e0ceca3

lvhan028 requested review from Copilot and removed request for Copilot April 4, 2026 09:55

Copilot started reviewing on behalf of lvhan028 April 4, 2026 09:56 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bind epoch to session#4496

bind epoch to session#4496
lvhan028 wants to merge 1 commit intoInternLM:mainfrom
lvhan028:fix-abort

lvhan028 commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lvhan028 commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants