fix(ci): Fix E2E test flakiness #5830

Draft
antonis wants to merge 37 commits into main from
antonis/fix-e2e-flakiness-combined

@antonis antonis commented Mar 17, 2026

📢 Type of change

  • Bugfix

📜 Description

Fixes iOS E2E test flakiness across e2e-v2 and sample-application workflows caused by the migration to Cirrus Labs Tart VMs (nested virtualisation).

Root cause: crash-loop cascade after nativeCrash() test

After crash.yml triggers Sentry.nativeCrash(), the plain launchApp (without clearState) causes the app to crash immediately on relaunch because the Sentry SDK reads the pending crash report during init and hits a failure path. This writes a second crash report, triggering iOS's crash-loop protection for the bundle ID. The cascade causes subsequent flows (feedback, close, captureException) to also fail with "App crashed or stopped while executing flow".

Fix: clearState: true on the post-crash launchApp in crash.yml, so Maestro reinstalls the app and clears the crash-loop state.

Root cause: simulator not fully ready on Tart VMs

Tart VMs use nested virtualisation, making the simulator significantly slower to stabilise after boot. Maestro's XCTest driver races to connect before SpringBoard and system services finish post-boot init.

Fixes:

  • wait_for_boot: true on simulator-action. Blocks until the simulator has fully booted
  • erase_before_boot: false. Skips the redundant erase (each flow already uses clearState)
  • Simulator warm-up step. Launches and terminates Settings.app so system services finish init
  • MAESTRO_DRIVER_STARTUP_TIMEOUT: 180000 (3 min). Gives extra headroom for Tart VM startup

Safety net: per-flow retries (3 attempts)

Even with the above fixes, the Tart VM environment has residual timing flakiness (~1 in 18 flows needed a retry in CI). Both test harnesses now retry each Maestro flow individually up to 3 times:

  • dev-packages/e2e-tests/cli.mjs. For the e2e-v2 workflow
  • samples/react-native/e2e/utils/maestro.ts. For the sample-application workflow
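
As a rough check on why three attempts may be enough, assume (this is an assumption, not something measured in this PR) that each attempt of a flow fails independently with probability p ≈ 1/18, the retry rate quoted above. Under that assumption, the chance that a single flow fails all three attempts is:

```latex
P(\text{flow fails all 3 attempts}) = p^{3} \approx \left(\tfrac{1}{18}\right)^{3} = \tfrac{1}{5832} \approx 1.7 \times 10^{-4}
```

Failures on a slow VM boot are probably correlated across consecutive attempts, so the real rate is likely higher than this estimate.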

Sample app test fixes

  • Search all envelopes for app start transaction. On slow emulators the app start transaction may arrive in a separate envelope from the navigation transaction
  • Sort news envelopes by timestamp. Ensures consistent ordering regardless of arrival time
  • Exclude auto.app.start from time-to-display assertions. App start transactions have app_start_cold measurements, not TTID/TTFD
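
A minimal sketch of the first two fixes, assuming a simplified envelope shape of [header, [[itemHeader, payload], ...]]. The function names, the sample data, and the use of the first item's timestamp as the sort key are hypothetical and not taken from the actual test code.

```javascript
// Assumed, simplified envelope shape: [header, [[itemHeader, payload], ...]].

// Look through every envelope (not just the first) for the app start transaction.
function findAppStartTransaction(envelopes) {
  for (const [, items] of envelopes) {
    for (const [itemHeader, payload] of items) {
      const isTransaction = itemHeader.type === 'transaction';
      if (isTransaction && payload?.contexts?.trace?.origin === 'auto.app.start') {
        return payload;
      }
    }
  }
  return undefined;
}

// Order envelopes by the timestamp of their first item instead of arrival order.
// Returns a new array; the input is left untouched.
function sortByTimestamp(envelopes) {
  const firstTimestamp = ([, items]) => items[0]?.[1]?.timestamp ?? 0;
  return [...envelopes].sort((a, b) => firstTimestamp(a) - firstTimestamp(b));
}

// Illustrative data: the app start transaction arrives in the second envelope.
const envelopes = [
  [{}, [[{ type: 'transaction' }, { timestamp: 20, contexts: { trace: { origin: 'auto.navigation' } } }]]],
  [{}, [[{ type: 'transaction' }, {
    timestamp: 10,
    contexts: { trace: { origin: 'auto.app.start' } },
    measurements: { app_start_cold: { value: 1200 } },
  }]]],
];

const appStart = findAppStartTransaction(envelopes);
const ordered = sortByTimestamp(envelopes);
console.log(appStart !== undefined, ordered.length);
```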

Supersedes #5752 and #5755 by unifying them and simplifying the changes

💡 Motivation and Context

iOS E2E tests have been failing on every run on main. The consistent failure pattern on main:

  • feedback, close, crash, captureException: all fail with "App crashed or stopped"
  • captureMessage, captureReplay: pass intermittently
  • sample-application iOS: fails on captureErrorsScreenTransaction (app start transaction not found in first envelope)

💚 How did you test it?

  • CI

📝 Checklist

  • I added tests to verify changes
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • All tests passing
  • No breaking changes

🔮 Next steps

#skip-changelog

antonis and others added 18 commits March 3, 2026 15:26
After the react-native-test job was moved from GitHub-hosted macos-26 to
Cirrus Labs Tart VMs (macos-tahoe-xcode:26.2.0), iOS simulators take longer
to fully boot in the new virtualised environment. With `wait_for_boot` defaulting
to false, Maestro was racing to connect before the simulator was ready, causing
different failures on each run.

- Add `wait_for_boot: true` to `futureware-tech/simulator-action` so the job
  blocks until the simulator has fully completed booting before Maestro connects.
- Bump `MAESTRO_DRIVER_STARTUP_TIMEOUT` from 120s to 180s to give additional
  headroom for the Cirrus Labs runner environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After crash.yml taps "Crash" (Sentry.nativeCrash()), the plain `launchApp`
(without clearState) causes the app to crash immediately on relaunch (~82ms)
because the Sentry SDK reads the pending crash report during initialisation
and hits a failure path. This writes a second crash report on top of the
first, triggering iOS's simulator crash-loop guard for the bundle ID.

The cascade:
1. nativeCrash → crash report #1 written
2. launchApp (no clearState) → app crashes on startup → crash report #2
3. Next test (captureMessage) gets the crash-loop ban → instant exit on launch

Fix: add `clearState: true` to the post-crash launchApp so Maestro
reinstalls the app, clearing both the crash report and the crash-loop state
before assertTestReady runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… VMs

The iOS E2E tests have been consistently failing since the migration to
Cirrus Labs Tart VMs (c1cade4). The nested virtualisation makes the
simulator slower to stabilise, causing Maestro's XCTest driver to lose
communication with the app on first launches.

Two fixes:
1. Set erase_before_boot: false — each Maestro flow already reinstalls
   the app via clearState, so erasing the entire simulator is redundant
   and adds overhead that destabilises the simulator on Tart VMs.
2. Add a warm-up step that launches and terminates Settings.app so that
   SpringBoard and other system services finish post-boot initialisation
   before Maestro connects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cirrus Labs Tart VMs intermittently fail individual app launches —
the app process exits before the JS bundle finishes loading, causing
Maestro to report "App crashed or stopped". A single retry of the
full suite is the most reliable way to absorb this flakiness.

Also increased the warmup sleep from 3s to 5s to give SpringBoard
more time to settle on the slow nested-virtualisation runners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of retrying the entire test suite, run each flow file
individually with up to 3 attempts.  This is more effective because
different flows fail randomly on Tart VMs — retrying only the failed
flow is faster and avoids re-running flows that already passed.

The CLI now:
1. Lists all .yml files in the maestro/ directory
2. Runs each flow with `maestro test <flow.yml>`
3. On failure, retries the same flow up to 2 more times
4. Prints a summary of all results at the end

Removes the suite-level retry wrapper from the workflow since
per-flow retries in the CLI are more targeted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
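
The four steps listed in the commit message above can be sketched roughly as follows. This is not the actual cli.mjs: `listFiles` and `runFlow` are hypothetical injected stand-ins for reading the maestro/ directory and for invoking `maestro test <flow.yml>`, so the sketch runs without Maestro installed.

```javascript
// Sketch only: `listFiles` and `runFlow` are hypothetical, injected helpers.
function runAllFlows(listFiles, runFlow, maxAttempts = 3) {
  // 1. Keep only the .yml flow files, in a stable order.
  const flows = listFiles().filter((name) => name.endsWith('.yml')).sort();

  const results = flows.map((flow) => {
    // 2-3. Run the flow; on failure retry the same flow until attempts run out.
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        runFlow(flow);
        return { flow, passed: true, attempts: attempt };
      } catch (error) {
        console.warn(`${flow}: attempt ${attempt}/${maxAttempts} failed: ${error.message}`);
      }
    }
    return { flow, passed: false, attempts: maxAttempts };
  });

  // 4. Print a summary of all results at the end.
  for (const r of results) {
    console.log(`${r.passed ? 'PASS' : 'FAIL'} ${r.flow} (attempts: ${r.attempts})`);
  }
  return results;
}

// Usage with fake inputs: crash.yml fails once, then passes on the retry.
const failuresLeft = { 'crash.yml': 1 };
const results = runAllFlows(
  () => ['crash.yml', 'README.md', 'captureMessage.yml'],
  (flow) => {
    if (failuresLeft[flow] > 0) {
      failuresLeft[flow] -= 1;
      throw new Error('App crashed or stopped while executing flow');
    }
  },
);
```

Injecting the runner keeps the retry and summary logic testable on its own; a real harness would pass a function that spawns the Maestro process.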
Address CodeQL finding by using execFileSync with an argument array
instead of execSync with a template string. This avoids shell
interpolation of filesystem-sourced flow file names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
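
To illustrate the difference described in the commit message above, here is a small sketch contrasting the two call shapes. The helper names are hypothetical; the second helper only builds the `file` and `args` values that would be handed to Node's `execFileSync(file, args, options)`, so nothing is actually executed here.

```javascript
// Hypothetical helper names; only the shape of the two invocations matters.

// Risky: the file name is spliced into a shell command line, so a name such as
// "a.yml; echo pwned" would be parsed by the shell as two commands.
function buildShellCommand(flowFile) {
  return `maestro test ${flowFile}`;
}

// Safer: the file name stays a single argv entry and no shell is involved.
// The result is meant to be passed as execFileSync(file, args, options).
function buildExecFileCall(flowFile) {
  return { file: 'maestro', args: ['test', flowFile] };
}

const suspicious = 'a.yml; echo pwned';
console.log(buildShellCommand(suspicious));
console.log(buildExecFileCall(suspicious).args);
```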
…ners

- Increase MAESTRO_DRIVER_STARTUP_TIMEOUT to 180s for slow Tart VMs
- Add wait_for_boot and erase_before_boot: false to simulator-action
- Add simulator warm-up step before running iOS tests
- Sort spaceflight news envelopes by timestamp instead of arrival order
- Relax HTTP spans assertion to >= 1 (not all layers complete on slow VMs)
- Search all envelopes for app start transaction (may arrive separately)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On slow Cirrus Labs Tart VMs, the app may crash during Maestro flow
execution. Add up to 3 retries to handle transient app crashes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
App start transactions (origin: auto.app.start) have app_start_cold
measurements but not time_to_initial_display/time_to_full_display.
The filter already excluded ui.action.touch but not app start transactions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
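
A sketch of the filter described in the commit message above. The function names and sample data are hypothetical, and it is an assumption here that `ui.action.touch` is matched on the trace `op` while `auto.app.start` is matched on the trace `origin`; the real test may match on different fields.

```javascript
// Returns only the transactions that are expected to carry TTID/TTFD.
// Assumption: touch transactions are identified by op, app start by origin.
function transactionsExpectingTimeToDisplay(transactions) {
  return transactions.filter((tx) => {
    const trace = tx.contexts?.trace ?? {};
    const isTouch = trace.op === 'ui.action.touch';
    const isAppStart = trace.origin === 'auto.app.start';
    return !isTouch && !isAppStart;
  });
}

// True when both time-to-display measurements are present.
function hasTimeToDisplay(tx) {
  const m = tx.measurements ?? {};
  return 'time_to_initial_display' in m && 'time_to_full_display' in m;
}

// Illustrative data: only the first transaction should be checked for TTID/TTFD.
const sample = [
  {
    contexts: { trace: { op: 'navigation', origin: 'auto.navigation' } },
    measurements: { time_to_initial_display: { value: 300 }, time_to_full_display: { value: 450 } },
  },
  { contexts: { trace: { op: 'ui.action.touch' } }, measurements: {} },
  { contexts: { trace: { origin: 'auto.app.start' } }, measurements: { app_start_cold: { value: 1200 } } },
];

const checked = transactionsExpectingTimeToDisplay(sample);
console.log(checked.length, checked.every(hasTimeToDisplay));
```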
- Use nullish coalescing for httpSpans length check to avoid TypeError
  when spans is undefined
- Document maestro retry envelope contamination limitation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
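
The nullish-coalescing guard mentioned in the commit message above might look like the following. The function name and the `http.client` op value are assumptions made for illustration, not copied from the test.

```javascript
// Counts HTTP spans on a transaction payload.
// `spans` may be undefined; `?? []` turns that into an empty list
// instead of throwing a TypeError when `.filter` is called.
function countHttpSpans(transaction) {
  return (transaction.spans ?? []).filter((span) => span.op === 'http.client').length;
}

const withoutSpans = countHttpSpans({});
const withSpans = countHttpSpans({
  spans: [{ op: 'http.client' }, { op: 'db' }, { op: 'http.client' }],
});
console.log(withoutSpans, withSpans);
```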
The warm-up step is best-effort and should not fail the build if
the Preferences app fails to launch or terminate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use consistent comment and sleep 5 across both workflows, as suggested
in PR review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge both PRs that fix E2E test flakiness on Cirrus Labs Tart VMs:
- iOS E2E fixes: simulator warm-up, per-flow retries, crash-loop prevention (#5752)
- Sample app E2E fixes: increased timeouts, sorted envelopes, relaxed assertions (#5755)

Conflict resolution: kept Maestro 2.3.0 from main with 180s timeout from #5755.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Mar 17, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


This PR will not appear in the changelog.


🤖 This preview updates automatically when you update the PR.

antonis and others added 2 commits March 17, 2026 16:23
Reverts whitespace-only changes (@{ } -> @{}) in ObjC files that
cause clang-format CI failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Mar 17, 2026

Android (legacy) Performance metrics 🚀

|  | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| Startup time | 500.66 ms | 494.86 ms | -5.80 ms |
| Size | 43.75 MiB | 48.08 MiB | 4.32 MiB |

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
4a17c8f+dirty 406.62 ms 400.58 ms -6.04 ms
df1f7df+dirty 442.64 ms 427.16 ms -15.48 ms
a483f9f+dirty 396.82 ms 453.28 ms 56.46 ms
60cd796+dirty 445.84 ms 492.45 ms 46.61 ms
5c16cdc+dirty 423.48 ms 452.35 ms 28.88 ms
80e4616+dirty 411.58 ms 462.12 ms 50.54 ms
55b77fc+dirty 411.87 ms 417.16 ms 5.29 ms
bca62c0+dirty 414.36 ms 451.06 ms 36.70 ms
0b64753+dirty 448.67 ms 474.61 ms 25.94 ms
4e6d7d7+dirty 480.73 ms 515.73 ms 35.00 ms

App size

Revision Plain With Sentry Diff
4a17c8f+dirty 43.75 MiB 47.99 MiB 4.24 MiB
df1f7df+dirty 43.75 MiB 48.08 MiB 4.33 MiB
a483f9f+dirty 43.75 MiB 48.41 MiB 4.66 MiB
60cd796+dirty 43.75 MiB 48.07 MiB 4.32 MiB
5c16cdc+dirty 17.75 MiB 19.68 MiB 1.94 MiB
80e4616+dirty 43.75 MiB 48.55 MiB 4.80 MiB
55b77fc+dirty 43.75 MiB 47.99 MiB 4.24 MiB
bca62c0+dirty 43.75 MiB 48.41 MiB 4.66 MiB
0b64753+dirty 17.75 MiB 19.70 MiB 1.95 MiB
4e6d7d7+dirty 43.75 MiB 48.40 MiB 4.64 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
7e6fe7f+dirty 425.16 ms 471.08 ms 45.92 ms
c96c5b7+dirty 446.10 ms 449.47 ms 3.37 ms
d34c279+dirty 440.10 ms 477.96 ms 37.86 ms
ca6a32c+dirty 396.66 ms 442.52 ms 45.86 ms
b6f917a+dirty 468.08 ms 529.56 ms 61.48 ms

App size

Revision Plain With Sentry Diff
7e6fe7f+dirty 43.75 MiB 48.08 MiB 4.32 MiB
c96c5b7+dirty 43.75 MiB 48.08 MiB 4.32 MiB
d34c279+dirty 43.75 MiB 48.32 MiB 4.57 MiB
ca6a32c+dirty 43.75 MiB 48.08 MiB 4.32 MiB
b6f917a+dirty 43.75 MiB 48.07 MiB 4.32 MiB


github-actions bot commented Mar 17, 2026

iOS (legacy) Performance metrics 🚀

|  | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| Startup time | 1214.98 ms | 1218.98 ms | 4.00 ms |
| Size | 3.38 MiB | 4.73 MiB | 1.35 MiB |

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1229.13 ms 1228.46 ms -0.67 ms
80e4616+dirty 1221.32 ms 1225.64 ms 4.32 ms
818a608+dirty 1205.76 ms 1208.00 ms 2.24 ms
77061ed+dirty 1233.16 ms 1234.88 ms 1.71 ms
bef3709+dirty 1222.07 ms 1220.24 ms -1.83 ms
a206511+dirty 1185.00 ms 1186.35 ms 1.35 ms
74979ac+dirty 1210.49 ms 1213.31 ms 2.82 ms
a2bb688+dirty 1223.53 ms 1232.90 ms 9.37 ms
8a868fe+dirty 1221.50 ms 1230.78 ms 9.28 ms
d590428+dirty 1211.77 ms 1220.51 ms 8.75 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 2.63 MiB 3.91 MiB 1.28 MiB
77061ed+dirty 2.63 MiB 3.98 MiB 1.34 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 2.63 MiB 3.99 MiB 1.36 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
ca6a32c+dirty 1229.87 ms 1228.40 ms -1.47 ms
d34c279+dirty 1226.58 ms 1225.89 ms -0.69 ms
c96c5b7+dirty 1194.79 ms 1193.75 ms -1.04 ms
7e6fe7f+dirty 1194.50 ms 1192.54 ms -1.96 ms
b6f917a+dirty 1230.39 ms 1223.63 ms -6.76 ms

App size

Revision Plain With Sentry Diff
ca6a32c+dirty 3.38 MiB 4.73 MiB 1.35 MiB
d34c279+dirty 3.38 MiB 4.72 MiB 1.34 MiB
c96c5b7+dirty 3.38 MiB 4.73 MiB 1.35 MiB
7e6fe7f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
b6f917a+dirty 3.38 MiB 4.72 MiB 1.34 MiB


github-actions bot commented Mar 17, 2026

iOS (new) Performance metrics 🚀

|  | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| Startup time | 1220.13 ms | 1222.59 ms | 2.46 ms |
| Size | 3.38 MiB | 4.73 MiB | 1.35 MiB |

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1216.61 ms 1214.15 ms -2.47 ms
80e4616+dirty 1206.90 ms 1205.94 ms -0.96 ms
818a608+dirty 1218.84 ms 1223.18 ms 4.34 ms
77061ed+dirty 1210.77 ms 1218.45 ms 7.68 ms
bef3709+dirty 1217.79 ms 1225.33 ms 7.54 ms
a206511+dirty 1225.02 ms 1223.74 ms -1.28 ms
74979ac+dirty 1212.33 ms 1212.54 ms 0.21 ms
a2bb688+dirty 1244.82 ms 1238.60 ms -6.22 ms
8a868fe+dirty 1206.85 ms 1215.04 ms 8.19 ms
d590428+dirty 1221.23 ms 1225.27 ms 4.03 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 3.19 MiB 4.48 MiB 1.29 MiB
77061ed+dirty 3.19 MiB 4.54 MiB 1.36 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 3.19 MiB 4.56 MiB 1.37 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
ca6a32c+dirty 1231.83 ms 1241.28 ms 9.45 ms
d34c279+dirty 1210.63 ms 1224.85 ms 14.22 ms
c96c5b7+dirty 1223.89 ms 1228.02 ms 4.13 ms
7e6fe7f+dirty 1211.67 ms 1210.47 ms -1.20 ms
b6f917a+dirty 1212.11 ms 1220.00 ms 7.89 ms

App size

Revision Plain With Sentry Diff
ca6a32c+dirty 3.38 MiB 4.73 MiB 1.35 MiB
d34c279+dirty 3.38 MiB 4.72 MiB 1.34 MiB
c96c5b7+dirty 3.38 MiB 4.73 MiB 1.35 MiB
7e6fe7f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
b6f917a+dirty 3.38 MiB 4.72 MiB 1.34 MiB


github-actions bot commented Mar 17, 2026

Android (new) Performance metrics 🚀

|  | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| Startup time | 376.63 ms | 426.27 ms | 49.64 ms |
| Size | 43.94 MiB | 48.93 MiB | 5.00 MiB |

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
70250df+dirty 418.08 ms 480.84 ms 62.76 ms
8d89cc9+dirty 357.69 ms 415.79 ms 58.10 ms
1853710+dirty 360.67 ms 396.28 ms 35.61 ms
55b77fc+dirty 410.46 ms 414.11 ms 3.65 ms
69602ce+dirty 375.37 ms 405.28 ms 29.91 ms
c1573b3+dirty 355.65 ms 448.82 ms 93.17 ms
90afdd3+dirty 367.79 ms 404.84 ms 37.05 ms
955f2eb+dirty 388.13 ms 433.56 ms 45.44 ms
80e4616+dirty 427.31 ms 461.15 ms 33.84 ms
276d348+dirty 356.30 ms 405.27 ms 48.97 ms

App size

Revision Plain With Sentry Diff
70250df+dirty 43.94 MiB 48.91 MiB 4.97 MiB
8d89cc9+dirty 7.15 MiB 8.41 MiB 1.26 MiB
1853710+dirty 7.15 MiB 8.41 MiB 1.26 MiB
55b77fc+dirty 43.94 MiB 48.82 MiB 4.88 MiB
69602ce+dirty 7.15 MiB 8.41 MiB 1.26 MiB
c1573b3+dirty 7.15 MiB 8.42 MiB 1.27 MiB
90afdd3+dirty 7.15 MiB 8.43 MiB 1.28 MiB
955f2eb+dirty 7.15 MiB 8.42 MiB 1.27 MiB
80e4616+dirty 43.94 MiB 49.38 MiB 5.44 MiB
276d348+dirty 7.15 MiB 8.42 MiB 1.26 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
7e6fe7f+dirty 477.04 ms 520.53 ms 43.49 ms
c96c5b7+dirty 380.02 ms 436.37 ms 56.35 ms
d34c279+dirty 422.73 ms 453.91 ms 31.18 ms
ca6a32c+dirty 413.82 ms 475.83 ms 62.02 ms
b6f917a+dirty 367.91 ms 412.94 ms 45.03 ms

App size

Revision Plain With Sentry Diff
7e6fe7f+dirty 43.94 MiB 48.93 MiB 5.00 MiB
c96c5b7+dirty 43.94 MiB 48.93 MiB 5.00 MiB
d34c279+dirty 43.94 MiB 49.18 MiB 5.24 MiB
ca6a32c+dirty 43.94 MiB 48.93 MiB 5.00 MiB
b6f917a+dirty 43.94 MiB 48.93 MiB 4.99 MiB

antonis and others added 5 commits March 19, 2026 16:54
Revert the relaxation of the HTTP spans check from >= 1 back to exactly 2.
The other fixes (simulator warm-up, wait_for_boot, clearState, per-flow
retries) address the actual flakiness; weakening this assertion would
permanently hide regressions where one tracing layer stops producing spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@antonis antonis changed the title from "fix(ci): Fix E2E test flakiness on Cirrus Labs runners" to "fix(ci): Fix E2E test flakiness" on Mar 24, 2026

- Add platform-check skip guard to warm-up steps in both e2e-v2 and
  sample-application workflows to avoid wasting CI time when skipped
- Write maestro debug logs to per-flow/per-attempt dirs so failed
  attempt logs are preserved for debugging
- Use path.parse() for flow name extraction
- Add empty results guard in cli.mjs
- Remove retry logic from sample app maestro.ts to avoid mock server
  envelope accumulation across retries (retries stay in cli.mjs only)
- Revert unrelated expo/app.json formatting change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

antonis commented Mar 24, 2026

@cursor review


antonis commented Mar 24, 2026

@sentry review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

antonis and others added 9 commits March 24, 2026 13:04
…up flow

The first Maestro flow after simulator boot consistently fails because
the app isn't fully ready ("E2E Tests Ready" not visible). The Settings.app
warm-up didn't help because it only warms SpringBoard, not the XCTest
driver or the test app itself.

Replace both the Settings.app warm-up step (in e2e-v2 and sample-application
workflows) and the per-flow retry logic (in cli.mjs) with a dedicated
Maestro warm-up flow that launches the actual test app and waits for
"E2E Tests Ready" before running the real test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that the warm-up flow eliminates the first-launch flakiness, remove
the per-flow orchestration, results tracking, and summary logging that
were only needed for retry support. The only change vs main is the
warm-up flow call before the existing `maestro test maestro` command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running all flows via `maestro test maestro` shares a single runner
session. When crash.yml kills the app, Maestro's XCTest driver loses
the connection and subsequent flows fail with "App crashed or stopped".

Run each flow in its own maestro process instead. This is the minimal
change needed — no retries, no summary, just per-flow isolation with
a warm-up flow before the test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After nativeCrash() the app process is dead but Maestro's XCTest driver
may be in a bad state. Adding killApp before launchApp with clearState
ensures a clean restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The post-crash launchApp + assertTestReady was unreliable on Tart VMs
because Maestro's XCTest driver gets into a bad state after nativeCrash().
Since each flow now runs in its own maestro process, the next flow starts
fresh regardless — the post-crash recovery is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The warm-up flow helps but doesn't fully eliminate first-launch timing
issues on Cirrus Labs Tart VMs. On especially slow boots, the warm-up
itself can fail and the first few flows fail before the simulator
stabilises. Retry each flow up to 3 times to handle this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sample-application workflow uses its own test runner (maestro.ts)
and doesn't go through cli.mjs, so it doesn't benefit from the Maestro
warm-up flow. Restore the Settings.app warm-up step with the proper
platform-check skip guard to keep it resilient against Tart VM timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Maestro warm-up flow alone isn't reliable enough — it depends on
Maestro's XCTest driver which itself needs the simulator to be ready.
Add a Settings.app warm-up step to e2e-v2.yml (matching sample-application)
to warm up OS-level services first, then the Maestro warm-up flow handles
the XCTest driver and test app.

Remove per-flow retries — the layered warm-up approach (Settings.app +
Maestro warm-up flow) should be sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

ready-to-merge Triggers the full CI test suite
