Skip to content

bind epoch to session#4496

Open
lvhan028 wants to merge 1 commit intoInternLM:mainfrom
lvhan028:fix-abort
Open

bind epoch to session#4496
lvhan028 wants to merge 1 commit intoInternLM:mainfrom
lvhan028:fix-abort

Conversation

@lvhan028
Copy link
Copy Markdown
Collaborator

@lvhan028 lvhan028 commented Apr 4, 2026

No description provided.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR binds an AsyncEngine “epoch” to each HTTP-bound Session so that /stop_all_session (and related abort flows) can reliably invalidate in-flight work that was bound before the stop/abort, reducing races between request binding and generation.

Changes:

  • Stamp session.epoch in the OpenAI API server when resolving/binding a session for a request.
  • Add epoch to Session objects (with reset behavior) and improve abort logging to include epoch.
  • In AsyncEngine.generate(), drop “stale” sessions when stop_all_session() has bumped the engine epoch since the request bound the session; also adjust some metrics accounting and /sleep behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
lmdeploy/serve/openai/api_server.py Binds AsyncEngine.epoch to sessions on request bind; /sleep now stops all sessions before sleeping.
lmdeploy/serve/managers/session_manager.py Adds Session.epoch state, resets it, and logs epoch during abort.
lmdeploy/serve/core/async_engine.py Adds stale-session detection based on epoch, bumps epoch on stop-all, and updates metrics/abort handling in generate().
Comments suppressed due to low confidence (2)

lmdeploy/serve/core/async_engine.py:413

  • This branch yields finish_reason='length' (normal completion due to token limit) but increments increase_failed_requests('error'). That will skew scheduler metrics by counting expected length-limited completions as errors; consider incrementing succeeded requests (or not marking as failed) in this case.
        if gen_config.max_new_tokens == 0:
            logger.info(f'run out of tokens. session={session_id}.')
            metrics_processor.increase_failed_requests('error')
            yield GenOut(response='',
                         history_token_len=session.step,
                         input_token_len=len(input_ids),
                         generate_token_len=0,
                         finish_reason='length',
                         token_ids=[])

lmdeploy/serve/core/async_engine.py:462

  • This pre-inference abort path yields finish_reason='abort' even though GenOut.finish_reason is not typed to allow 'abort'. Align the finish_reason enum/typing across GenOut and response models so metrics/logging and downstream code don't see unexpected values.
            if session.epoch is not None and session.epoch != self.epoch:
                logger.warning(f'[generate] session {session_id} got aborted before starting inference, '
                               f'session.epoch={session.epoch}, epoch={self.epoch}')
                metrics_processor.increase_failed_requests('abort')
                yield GenOut(response='',
                             history_token_len=0,
                             input_token_len=len(input_ids),
                             generate_token_len=0,
                             finish_reason='abort',
                             token_ids=[])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +206 to +208
"""If session is stamped ``http_bind_epoch`` by api_server and
``stop_all_session`` ran since then (epoch changed), drop the
session."""
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_if_session_stale docstring references http_bind_epoch, but the actual field being checked is session.epoch (stamped in api_server). Consider renaming the docstring terminology (or the attribute) to avoid confusion when debugging epoch-based abort behavior.

Suggested change
"""If session is stamped ``http_bind_epoch`` by api_server and
``stop_all_session`` ran since then (epoch changed), drop the
session."""
"""If ``session.epoch`` was stamped by api_server and
``stop_all_session`` ran since then (the engine epoch changed),
drop the session."""

Copilot uses AI. Check for mistakes.
history_token_len=session.step,
input_token_len=input_token_len,
generate_token_len=0,
finish_reason='abort',
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenOut.finish_reason is typed as Literal['stop','length','error'], but this new stale-session path returns finish_reason='abort'. This introduces an additional inconsistent value; please update the allowed finish_reason type(s) (and any downstream consumers) to include 'abort' or map this case to an existing reason consistently.

Suggested change
finish_reason='abort',
finish_reason='error',

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants