-
Notifications
You must be signed in to change notification settings - Fork 2.5k
migrate to turn_handling #4502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: chenghaomou/v1.4.0
Are you sure you want to change the base?
migrate to turn_handling #4502
Conversation
d5e0f00 to
06e482c
Compare
887213e to
de1d9e8
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
https://github.com/codex review |
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bb5db358d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self._interruption_detection = turn_handling.interruption_cfg.mode | ||
| self._allow_interruptions = turn_handling.interruption_cfg.mode is not False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve session-level interruption defaults
Setting _allow_interruptions from turn_handling.interruption_cfg.mode here makes the agent’s allow_interruptions always a concrete bool (defaulting to True), so AgentActivity.allow_interruptions will always pick the agent value and never fall back to AgentSession.options.allow_interruptions. This means a session configured with turn_handling.interruption.mode=False (or deprecated allow_interruptions=False) will still allow interruptions unless every agent explicitly opts out. That’s a regression in session-level override behavior and will surprise users who only configure the session.
Useful? React with 👍 / 👎.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThis pull request performs a terminology migration from "barge-in" to "interruption" throughout the voice agent codebase, introduces TurnHandlingConfig to centralize turn-handling configuration, adds deprecation support utilities, and enhances Pydantic integration for type utilities. The changes affect public APIs, internal detectors, configuration structures, and event types. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
livekit-agents/livekit/agents/voice/events.py (1)
84-92: Add event classes foruser_interruption_detectedanduser_non_interruption_detectedto the AgentEvent union.These two event types are declared in
EventTypesand are being emitted fromagent_activity.pywithInterruptionEventpayloads, but they lack corresponding event classes in theAgentEventunion. Consumers relying onAgentEventtype discrimination will be unable to deserialize or validate these events. CreateUserInterruptionDetectedEventandUserNonInterruptionDetectedEventclasses wrapping the relevant fields fromInterruptionEventwith the appropriate type discriminators, and add them to theAgentEventunion.
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/inference/interruption.py`:
- Around line 64-68: The docstring and default value mismatch: fix the typo
"tnterruption" to "interruption" in the InterruptionEvent class docstring and
ensure the detection_interval default matches the constant (use 0.1s) wherever
it's declared and used (e.g., the detection_interval parameter/default in the
interruption detection function or class at the later occurrence around the same
module); update both places so the wording and numeric default are consistent
and accurate.
In `@livekit-agents/livekit/agents/utils/deprecation.py`:
- Around line 16-27: The docstring example for deprecate_params is misleading
because the shown output claims print(my_function(new_param=1)) prints "1" even
though my_function only prints old_param and returns None; update the example in
deprecation.py to be consistent by either changing my_function to return the
printed value or removing the erroneous print/returned output line — edit the
example that uses the decorator deprecate_params and the function my_function
(and references NOT_GIVEN/NotGivenOr) so the displayed output matches the actual
behavior (e.g., show only the WARNING and printed value, or change my_function
to return the value and then show the returned output).
In `@livekit-agents/livekit/agents/voice/agent_session.py`:
- Around line 217-220: The docstring in agent_session.AgentSession.__init__ (or
the constructor docstring near the parameters block) references a non-existent
parameter named "interruption_handling"; update that docstring to reflect the
actual parameter name "turn_handling" (or remove the stale entry) and ensure the
description matches the semantics of turn_handling (e.g., allowed values,
defaults, and requirements). Locate the parameter description currently labeled
"interruption_handling" and either rename it to "turn_handling" and adjust its
details to match the signature, or delete the stale paragraph so the docstring
matches the function signature.
- Around line 138-153: The deprecation decorator deprecate_params is currently
applied at the class level; move it onto the AgentSession.__init__ method so
deprecation warnings are emitted per-construction, and update the deprecation
message entries that reference the wrong class name—replace any
"TurnDetectionConfig" mentions with the correct "TurnHandlingConfig" for keys
like "min_endpointing_delay" and "turn_detection" to match the imported
TurnHandlingConfig and the other mappings; ensure the decorator mapping stays
identical otherwise and is attached directly above the __init__ signature.
In `@livekit-agents/livekit/agents/voice/agent.py`:
- Around line 690-710: The legacy min_endpointing_delay and
max_endpointing_delay parameters are never forwarded to Agent.__init__, so
per-agent endpointing overrides are dropped; update the super().__init__ call to
pass min_endpointing_delay and max_endpointing_delay (or otherwise ensure these
values are applied when turn_handling is provided) so that Agent.__init__
receives them; locate the turn_handling computation (TurnHandlingConfig.migrate
and is_given usage) and the subsequent super().__init__ invocation and add
forwarding of min_endpointing_delay and max_endpointing_delay (or incorporate
them into turn_handling before calling super) so legacy callers keep their
overrides.
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around line 249-258: The else branch currently unconditionally sets
trace_types.ATTR_IS_INTERRUPTION="false" for non-ReadableSpan types which can
overwrite an existing "true"; update the logic in the user_speaking_span
handling (the block using user_speaking_span.is_recording(), ReadableSpan, and
set_attribute) to first attempt to read the current attribute on
non-ReadableSpan spans (e.g., check getattr(user_speaking_span, "attributes",
None) or call a safe get_attribute method inside a try/except) and only call
set_attribute("false") if the attribute is absent/None; if the span type truly
cannot expose its attributes, skip setting the attribute (or document why it’s
safe) instead of unconditionally writing "false".
In `@livekit-agents/livekit/agents/voice/turn.py`:
- Around line 133-179: migrate currently only maps allow_interruptions=False, so
an explicit allow_interruptions=True ends up as NOT_GIVEN and can be overridden
by session defaults; update migrate (the interruption_mode handling in
TurnHandlingConfig.migrate) to check is_given(allow_interruptions) and if
allow_interruptions is True set interruption_mode to "adaptive" (or the intended
enabling mode) and if False set it to False, so an explicit True is preserved
rather than lost.
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)
991-994: Channel cleanup in finally block references potentially stale channel.The cleanup logic at lines 991-994 accesses
self._interruption_ch, but this could beNoneifupdate_interruption_detection(None)was called concurrently. The guardif self._interruption_ch:helps, but there's a potential TOCTOU issue between the check and the operations.Consider using the local
audio_inputparameter (which is the channel) for cleanup instead:Suggested improvement
finally: await aio.cancel_and_wait(forward_task) await stream.aclose() - if self._interruption_ch: - while not self._interruption_ch.empty(): - self._interruption_ch.recv_nowait() - self._interruption_ch.close() + # audio_input is the channel passed to this task + ch = audio_input if hasattr(audio_input, 'close') else None + if ch and hasattr(ch, 'empty'): + while not ch.empty(): + ch.recv_nowait() + ch.close()Alternatively, since
audio_inputis typed asAsyncIterable, consider whether draining is necessary at all, or pass the channel explicitly.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
livekit-agents/livekit/agents/cli/cli.pylivekit-agents/livekit/agents/inference/__init__.pylivekit-agents/livekit/agents/inference/interruption.pylivekit-agents/livekit/agents/telemetry/trace_types.pylivekit-agents/livekit/agents/types.pylivekit-agents/livekit/agents/utils/deprecation.pylivekit-agents/livekit/agents/voice/agent.pylivekit-agents/livekit/agents/voice/agent_activity.pylivekit-agents/livekit/agents/voice/agent_session.pylivekit-agents/livekit/agents/voice/audio_recognition.pylivekit-agents/livekit/agents/voice/events.pylivekit-agents/livekit/agents/voice/room_io/_output.pylivekit-agents/livekit/agents/voice/turn.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/room_io/_output.pylivekit-agents/livekit/agents/voice/events.pylivekit-agents/livekit/agents/inference/__init__.pylivekit-agents/livekit/agents/telemetry/trace_types.pylivekit-agents/livekit/agents/voice/agent.pylivekit-agents/livekit/agents/types.pylivekit-agents/livekit/agents/cli/cli.pylivekit-agents/livekit/agents/inference/interruption.pylivekit-agents/livekit/agents/utils/deprecation.pylivekit-agents/livekit/agents/voice/agent_activity.pylivekit-agents/livekit/agents/voice/audio_recognition.pylivekit-agents/livekit/agents/voice/agent_session.pylivekit-agents/livekit/agents/voice/turn.py
🧬 Code graph analysis (8)
livekit-agents/livekit/agents/voice/room_io/_output.py (2)
livekit-agents/livekit/agents/ipc/proc_client.py (1)
send(82-83)livekit-agents/livekit/agents/utils/aio/channel.py (2)
send(30-30)send(71-88)
livekit-agents/livekit/agents/inference/__init__.py (1)
livekit-agents/livekit/agents/inference/interruption.py (4)
AdaptiveInterruptionDetector(168-328)InterruptionDetectionError(101-107)InterruptionEvent(64-98)InterruptionEventType(59-61)
livekit-agents/livekit/agents/voice/agent.py (1)
livekit-agents/livekit/agents/voice/turn.py (3)
TurnHandlingConfig(80-205)migrate(133-193)interruption_cfg(196-199)
livekit-agents/livekit/agents/cli/cli.py (1)
livekit-agents/livekit/agents/voice/io.py (1)
on_playback_started(188-189)
livekit-agents/livekit/agents/utils/deprecation.py (1)
livekit-agents/livekit/agents/utils/misc.py (1)
is_given(25-26)
livekit-agents/livekit/agents/voice/agent_activity.py (5)
livekit-agents/livekit/agents/inference/interruption.py (2)
AdaptiveInterruptionDetector(168-328)InterruptionEvent(64-98)livekit-agents/livekit/agents/voice/agent.py (2)
interruption_detection(147-148)vad(582-592)livekit-agents/livekit/agents/voice/agent_session.py (2)
interruption_detection(1281-1282)vad(1277-1278)livekit-agents/livekit/agents/voice/audio_recognition.py (1)
on_interruption(87-87)livekit-agents/livekit/agents/utils/misc.py (1)
is_given(25-26)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)
livekit-agents/livekit/agents/inference/interruption.py (9)
InterruptionStreamBase(331-525)InterruptionEvent(64-98)AdaptiveInterruptionDetector(168-328)_AgentSpeechStartedSentinel(332-333)_AgentSpeechEndedSentinel(335-336)_OverlapSpeechStartedSentinel(338-345)_OverlapSpeechEndedSentinel(347-348)InterruptionEventType(59-61)stream(303-312)
livekit-agents/livekit/agents/voice/turn.py (3)
livekit-agents/livekit/agents/utils/misc.py (1)
is_given(25-26)livekit-agents/livekit/agents/voice/agent.py (5)
turn_detection(510-520)turn_detection(523-527)min_endpointing_delay(608-618)max_endpointing_delay(621-627)allow_interruptions(595-605)livekit-agents/livekit/agents/voice/agent_session.py (1)
turn_detection(391-392)
🔇 Additional comments (28)
livekit-agents/livekit/agents/cli/cli.py (1)
165-169: LGTM!The new
on_playback_startedcallback is correctly invoked when the first audio frame is captured. The call signature properly uses the keyword-only argumentcreated_atas expected by the parent class method, andtime.time()provides the appropriate wall-clock timestamp.livekit-agents/livekit/agents/voice/room_io/_output.py (1)
101-103: Good ordering to keep_pushed_durationin sync with queued frames.Incrementing the duration before enqueueing avoids a race where
_forward_audiocould see a frame while_pushed_durationis still 0.livekit-agents/livekit/agents/telemetry/trace_types.py (1)
91-96: Interruption telemetry keys updated consistently.Names and namespaces align with the terminology migration.
livekit-agents/livekit/agents/utils/deprecation.py (1)
30-47: Decorator logic looks solid.Signature binding plus grouped warnings keeps behavior predictable and avoids duplicate logs.
livekit-agents/livekit/agents/types.py (1)
2-6: No action needed—Pydantic v2 is already the required minimum version.The project declares
pydantic>=2.0,<3inlivekit-agents/pyproject.toml, which explicitly requires Pydantic v2 and excludes v1. The__get_pydantic_core_schema__hook is appropriate and required for the declared Pydantic version.Likely an incorrect or invalid review comment.
livekit-agents/livekit/agents/inference/__init__.py (1)
1-5: Exports align with the interruption renaming.
Public inference exports now match the new interruption types and keep the API surface consistent.Also applies to: 19-22
livekit-agents/livekit/agents/voice/agent.py (1)
45-96: Turn-handling migration wiring looks consistent.
Agent state now derives turn detection and interruption settings fromTurnHandlingConfigas intended.Also applies to: 146-148
livekit-agents/livekit/agents/voice/turn.py (1)
11-131: Config models and dict coercion look solid.
Typed dict inputs plusmodel_post_initcoercion make the config ergonomics clear and safe.Also applies to: 195-205
livekit-agents/livekit/agents/voice/agent_activity.py (3)
157-161: Interruption detector wiring into AudioRecognition is clean.
The detector is resolved once and passed through toAudioRecognitionwithout extra side effects.Also applies to: 600-606
1129-1152: Overlap-speech hooks are consistently gated byinterruption_enabled.
Guarding these calls avoids unnecessary detector work when adaptive interruption is disabled.Also applies to: 1235-1243, 1251-1275, 1306-1314, 1635-1643, 1732-1740, 1842-1849, 2143-2150, 2175-2184, 2817-2824
2868-2908: Interruption detection resolution logic is clear.
The capability checks and explicit mode handling make detector selection easy to follow.livekit-agents/livekit/agents/inference/interruption.py (3)
58-329: Interruption model/enum refactor is consistent.
The event types, options, cache entries, and detector now align with interruption terminology without altering behavior.
331-1079: Stream wiring and event emission remain cohesive.
HTTP/WebSocket stream flows still emit the expected interruption events with the updated naming.
1199-1229: Deprecation helper is straightforward.
The guard rails around mixed legacy/new parameters are clear and appropriately strict.livekit-agents/livekit/agents/voice/agent_session.py (4)
260-277: Legacy parameters are silently ignored whenturn_handlingis provided.When both
turn_handlingand legacy parameters (e.g.,min_endpointing_delay) are provided, the legacy parameters are completely ignored sinceturn_handlingis used directly. While the@deprecate_paramsdecorator will emit warnings, users might expect their legacy values to be merged or to see an explicit error.Consider adding a warning when both are provided, or document this precedence behavior clearly.
254-258: LGTM!The
video_samplerinitialization correctly usesis_given()to distinguish betweenNOT_GIVEN(use default) and an explicitNone(disable sampling).
281-303: LGTM!The mapping from
TurnHandlingConfigsub-configurations toAgentSessionOptionsis well-structured and correctly extracts values frominterruption_cfgandendpointing_cfg.
1280-1282: LGTM!The
interruption_detectionproperty correctly exposes the renamed internal state with appropriate type annotation.livekit-agents/livekit/agents/voice/audio_recognition.py (10)
18-18: LGTM!Import of
InterruptionStreamBasefrom the new location is correct.
149-165: LGTM!The interruption channel initialization correctly types all sentinel classes from
InterruptionStreamBase, and_interruption_enabledis properly gated on bothinterruption_detectionandvadbeing present.
210-219: LGTM!Proper guard checking
_interruption_enabled, channel existence, and closed state before sending the sentinel.
221-238: LGTM!The method correctly guards the sentinel send with
_agent_speakingcheck, ensuring overlap speech is only tracked during agent speech.
446-473: LGTM!The
update_interruption_detectionmethod properly manages the lifecycle of the interruption task and channel, including cleanup of the previous task.
764-766: LGTM!The event handler correctly filters for
InterruptionEventType.INTERRUPTIONevents and forwards to the hooks.
915-916: LGTM!Local import of
ModelSettingswithin_stt_taskis appropriate for avoiding circular dependencies.
340-346: LGTM!The
_flush_held_transcriptsand_should_hold_stt_eventmethods consistently check_interruption_enabledand use appropriate log levels.Also applies to: 348-351
584-588: LGTM!The comment correctly reflects the updated terminology, and the logic properly gates interruption handling on
_interruption_enabled.
86-87: Breaking change toRecognitionHooksprotocol.The
on_interruptionmethod replaceson_bargein. All implementations ofRecognitionHookshave been updated to the new signature.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
turn_handling, supporting both syntax:or
Summary by CodeRabbit
Refactor
TurnHandlingConfigstructure for simplified configuration.New Features
interruption_detectionproperty to Agent to expose current interruption mode.✏️ Tip: You can customize this high-level summary in your review settings.