fix: reconnect to platform events websocket after connection drop #967

Merged: vdusek merged 7 commits into master from fix/events-websocket-reconnect on Jun 16, 2026

@vdusek vdusek commented Jun 11, 2026

  • Previously, `ApifyEventManager._process_platform_messages` connected to the platform events websocket exactly once. Any dropped connection raised `ConnectionClosedError` and permanently ended the processing task, while a graceful server-side close exited silently. In both cases, the Actor missed all subsequent platform events for the rest of the run.
  • The connection now uses the `websockets` reconnect iterator, which re-establishes the connection after every drop or graceful close, with backoff on failed attempts. An abnormal drop is logged as a warning and a graceful close as info, both with the close code and reason, and a successful reconnect is logged as well.
  • A `process_exception` callback keeps errors fatal before the first successful connection, so `Actor.init` still fails fast on a misconfigured URL instead of hanging. After the first connection, the default `websockets` classification decides which errors are transient and retried.
  • `__aexit__` now cancels the processing task before closing the websocket. Otherwise, every clean shutdown would trigger a spurious reconnect attempt.
  • Added a regression test, parametrized over graceful and abnormal closes, that drops the connection server-side and asserts the drop is logged, the client reconnects, and events keep arriving. It replaces the obsolete mid-stream disconnect test, whose premise (the task ends after a drop) no longer holds.
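
The shutdown ordering from the `__aexit__` bullet can be illustrated with a small asyncio sketch. It uses neither the SDK nor the `websockets` library: `FakeWebSocket`, `ReconnectingListener`, and `connections_made` are hypothetical names invented for this illustration, and the reconnect loop is a simplified stand-in for the real reconnect iterator.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in for a websocket connection (not the real API)."""

    def __init__(self):
        self._closed = asyncio.Event()

    async def wait_closed(self):
        await self._closed.wait()

    async def close(self):
        self._closed.set()


class ReconnectingListener:
    """Hypothetical listener that reconnects whenever its connection closes."""

    def __init__(self):
        self.connect_count = 0
        self.websocket = None
        self._task = None

    async def _run(self):
        # Simplified reconnect loop: open a new connection after every close.
        while True:
            self.connect_count += 1
            self.websocket = FakeWebSocket()
            await self.websocket.wait_closed()

    async def start(self):
        self._task = asyncio.create_task(self._run())
        await asyncio.sleep(0)  # let the task open its first connection

    async def _cancel_task(self):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

    async def stop(self, *, cancel_first):
        if cancel_first:
            # Ordering described in the PR: stop the loop, then close.
            await self._cancel_task()
            await self.websocket.close()
        else:
            # Old ordering: the loop sees the close and reconnects once more.
            await self.websocket.close()
            await asyncio.sleep(0)
            await self._cancel_task()


async def connections_made(*, cancel_first):
    listener = ReconnectingListener()
    await listener.start()
    await listener.stop(cancel_first=cancel_first)
    return listener.connect_count


if __name__ == "__main__":
    print(asyncio.run(connections_made(cancel_first=True)))
    print(asyncio.run(connections_made(cancel_first=False)))
```

In this toy model, cancelling first leaves the connection count at one, while closing first lets the loop open a second, spurious connection before it is cancelled.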

@vdusek vdusek added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Jun 11, 2026
@vdusek vdusek self-assigned this Jun 11, 2026

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.43%. Comparing base (0daca28) to head (3ba124d).
⚠️ Report is 18 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/apify/events/_apify_event_manager.py | 98.21% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
+ Coverage   89.90%   90.43%   +0.53%     
==========================================
  Files          49       49              
  Lines        3091     3127      +36     
==========================================
+ Hits         2779     2828      +49     
+ Misses        312      299      -13     
| Flag | Coverage Δ |
|---|---|
| e2e | 36.16% <21.42%> (+0.25%) ⬆️ |
| integration | 57.11% <21.42%> (+0.24%) ⬆️ |
| unit | 79.18% <98.21%> (+0.43%) ⬆️ |

vdusek added 2 commits June 11, 2026 15:56
After the first successful connection, delegate to the default `websockets` transient/fatal classification
instead of retrying every error. Log graceful and abnormal closes (with close code and reason) as well as
reconnect success, and cover both close paths with parametrized tests.
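
A minimal sketch of the delegation described in this commit message, assuming the callback convention documented for `websockets` (return `None` to retry, return an exception to make it fatal). `make_process_exception`, `mark_connected`, and `stub_classifier` are hypothetical names, not the code from this PR; the stub stands in for the library's default classifier so the example runs without the dependency.

```python
from collections.abc import Callable
from typing import Optional

# Convention assumed here: None means "transient, retry";
# a returned exception means "fatal, raise it".
Classifier = Callable[[Exception], Optional[Exception]]


def make_process_exception(default_classifier: Classifier):
    """Build a callback that is strict until the first successful connection."""
    connected_once = False

    def mark_connected() -> None:
        # Call this once the first connection has been established.
        nonlocal connected_once
        connected_once = True

    def process_exception(exc: Exception) -> Optional[Exception]:
        if not connected_once:
            # Before the first connection, every error is fatal (fail fast).
            return exc
        # Afterwards, defer to the default transient/fatal classification.
        return default_classifier(exc)

    return process_exception, mark_connected


def stub_classifier(exc: Exception) -> Optional[Exception]:
    """Hypothetical default: treat OSError as transient, the rest as fatal."""
    return None if isinstance(exc, OSError) else exc


if __name__ == "__main__":
    process_exception, mark_connected = make_process_exception(stub_classifier)
    error = OSError("connection refused")
    print(process_exception(error) is error)  # fatal before first connection
    mark_connected()
    print(process_exception(error) is None)  # retried after first connection
```

With the real library, the resulting callback would presumably be passed as the `process_exception` argument of `websockets.connect`, with the library's own default classifier in place of the stub.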
@github-actions github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 12, 2026
@github-actions github-actions Bot added the tested Temporary label used only programmatically for some analytics. label Jun 12, 2026
@vdusek vdusek requested a review from Pijukatel June 16, 2026 07:18
@vdusek vdusek marked this pull request as ready for review June 16, 2026 07:18

@Pijukatel Pijukatel left a comment

Some Claude spam...

Comment thread src/apify/events/_apify_event_manager.py
Comment thread src/apify/events/_apify_event_manager.py
Comment thread src/apify/events/_apify_event_manager.py Outdated
Comment thread tests/unit/events/test_apify_event_manager.py Outdated

@Pijukatel Pijukatel left a comment

Is this implemented in JS? If not, is there a parity issue?

Comment thread src/apify/events/_apify_event_manager.py Outdated
@vdusek vdusek requested a review from Pijukatel June 16, 2026 09:55

vdusek commented Jun 16, 2026

> Is this implemented in JS? If not, is there a parity issue?

I'll open it.

@vdusek vdusek merged commit 5653a22 into master Jun 16, 2026
28 checks passed
@vdusek vdusek deleted the fix/events-websocket-reconnect branch June 16, 2026 11:49

3 participants