chore: adjusting resources for e2e test + adding strict_scheduling #22395

Closed
mrzeszutko wants to merge 1 commit into merge-train/spartan from mr/e2e-test-resources

@mrzeszutko (Contributor) commented Apr 8, 2026

Summary

  • Increase CPU/memory allocation for E2E tests that spin up multiple nodes, prover nodes, or validators
  • Enable strict (CPU-aware) scheduling for E2E test execution to prevent oversubscription

Problem

All E2E tests currently run with the default CPUS=2, MEM=8g, regardless of how many processes they start. A standard test (1 Anvil + 1 Aztec Node + 1 PXE = 3 processes) fits comfortably in 2 CPUs. But P2P tests spin up 6-10+ full validator nodes, epoch tests run multiple validators plus prover nodes, and fee tests always start a prover node — all squeezed into the same 2 CPUs.

Meanwhile, GNU parallel runs up to num_cpus / 2 jobs concurrently (64 on a 128-core machine) with no awareness of per-test CPU needs. It will run 64 tests that each claim 2 CPUs even when some actually need 6-8. This causes CPU starvation, missed sequencer timing windows, and flaky timeout failures.
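To make the oversubscription concrete, here is a small worked calculation. The function name `cpu_demand` and the mix of 8 heavy tests are hypothetical, chosen only for illustration; the job limit of num_cpus / 2 and the 2-CPU default come from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimates the CPUs actually needed when the job
# limit is num_cpus / 2 and every job is assumed to use 2 CPUs.
# Usage: cpu_demand TOTAL_CORES HEAVY_JOBS HEAVY_CPUS
cpu_demand() {
  local total_cores=$1 heavy_jobs=$2 heavy_cpus=$3
  local jobs=$((total_cores / 2))          # concurrent jobs started by the runner
  local light_jobs=$((jobs - heavy_jobs))  # jobs that really fit in 2 CPUs
  echo $((light_jobs * 2 + heavy_jobs * heavy_cpus))
}

# Assumed mix: 8 of the 64 running tests each need 6 CPUs.
cpu_demand 128 8 6   # prints 160, i.e. 32 more CPUs than the 128 available
```

Even a small share of heavy tests pushes the real demand past the machine's core count, which is the starvation scenario described above.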

Changes

1. Per-test resource overrides (get_test_resources)

Added a function that maps test file paths to appropriate CPUS/MEM based on the number of processes each test actually runs. The values were chosen by analyzing the source code of each test to count processes at peak load:

| Category | Tests | CPUS | MEM | Processes at peak | Examples |
|---|---|---|---|---|---|
| Standard (1 node) | ~70 | 2 (default) | 8g | AN + AZ + PXE = 3 | e2e_token_contract/*, e2e_deploy_contract/* |
| Single node + prover | ~15 | 3 | 12g | AN + AZ + PN + PXE = 4 | e2e_fees/*, e2e_simple, e2e_epochs/epochs_multiple |
| Multi-validator (3-4 nodes) | ~4 | 4 | 16g | AN + 3-4 AZ + PXE = 5-6 | e2e_epochs/epochs_simple_block_building, epochs_multi_proof |
| P2P medium (4 validators, no prover) | ~10 | 4 | 16g | AN + BS + 4 AZ = 6 | e2e_p2p/duplicate_proposal_slash, rediscovery |
| Multi-validator + prover | ~12 | 6 | 24g | AN + 4-6 AZ + PN = 7-9 | e2e_p2p/gossip_network, epochs_mbps*, reqresp/* |
| Extremely heavy | 2 | 8 | 32g | 10-13 processes | e2e_p2p/preferred_gossip_network, add_rollup |
| Prover full fake | 1 | 3 | 12g | AN + AZ + PN + PXE = 4 | e2e_prover/full (non-CI_FULL) |
| Prover full real | 1 | 16 | 96g | (unchanged) | e2e_prover/full (CI_FULL) |

Process abbreviations: AN = Anvil, AZ = Aztec Node (sequencer + archiver + world-state + P2P + validator), PN = Prover Node, PXE = Private eXecution Environment, BS = Bootstrap Node.

Memory formula: MEM = CPUS * 4g (same ratio as the existing default of CPUS=2/MEM=8g), which provides headroom for each process's heap, native code, and world-state trees.
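As a rough illustration of how such a mapping could look, the sketch below matches test paths against a few of the example names from the table and derives MEM from CPUS using the 4g-per-CPU ratio. It is not the function from this PR: the glob patterns, the output format, and the example file names are assumptions, and the e2e_prover/full rows are left out because they depend on CI_FULL.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; patterns and output format are assumptions.
# Usage: get_test_resources TEST_PATH
get_test_resources() {
  local test_path=$1 cpus=2   # default: standard single-node test
  case "$test_path" in
    # Most specific patterns first, since the first match wins.
    *preferred_gossip_network*|*add_rollup*)               cpus=8 ;;  # extremely heavy
    *e2e_p2p/gossip_network*|*epochs_mbps*|*reqresp/*)     cpus=6 ;;  # multi-validator + prover
    *e2e_p2p/duplicate_proposal_slash*|*rediscovery*)      cpus=4 ;;  # P2P medium
    *epochs_simple_block_building*|*epochs_multi_proof*)   cpus=4 ;;  # multi-validator
    *e2e_fees/*|*e2e_simple*|*e2e_epochs/epochs_multiple*) cpus=3 ;;  # single node + prover
  esac
  # Memory follows the stated ratio: MEM = CPUS * 4g.
  echo "CPUS=$cpus MEM=$((cpus * 4))g"
}

# Hypothetical file name, used only to exercise the e2e_fees/* pattern.
get_test_resources e2e_fees/example.test.ts   # prints CPUS=3 MEM=12g
```

Keeping the memory derived from the CPU count means only one number per category has to be maintained.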

2. Strict scheduling for standalone E2E test runs

Changed the test (and test_and_collect_avm_inputs) function in yarn-project/end-to-end/bootstrap.sh to use STRICT_SCHEDULING=1. This only affects standalone execution of E2E tests — i.e., when running ./bootstrap.sh test directly (e.g., grind runs, local dev, or any workflow that invokes the test function). It does not affect normal CI runs, where the Makefile calls test_cmds and feeds commands to the test engine, which uses plain parallelize without strict scheduling.

The strict scheduler:

  • Tracks available CPU cores with a semaphore
  • Only starts a test when enough cores are free to satisfy its CPUS requirement
  • Pins each test to specific cores via CPU_LIST / taskset
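
The three behaviours listed above can be modelled as a single admission pass. This is a simplified sketch, not the scheduler in the repository: `admit_tests` is a hypothetical name, the first-fit allocation of contiguous core ranges is an assumption, and releasing cores when a test finishes is left out.

```bash
#!/usr/bin/env bash
# Simplified model of one admission pass (hypothetical, for illustration).
# Usage: admit_tests TOTAL_CORES CPUS_REQUEST...
admit_tests() {
  local total=$1 next=0 i=0 req
  shift
  for req in "$@"; do
    i=$((i + 1))
    if [ $((next + req)) -le "$total" ]; then
      # Enough cores are free: reserve a contiguous range. A real runner
      # could pass this range to `taskset -c` to pin the test.
      echo "test$i CPU_LIST=$next-$((next + req - 1))"
      next=$((next + req))
    else
      # Not enough free cores: the test has to wait for a release.
      echo "test$i waits"
    fi
  done
}

# 8 cores, three tests requesting 2, 6, and 2 CPUs.
admit_tests 8 2 6 2
# prints:
#   test1 CPU_LIST=0-1
#   test2 CPU_LIST=2-7
#   test3 waits
```

The third test is held back even though it only asks for the default 2 CPUs, because the first two already occupy every core.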

This prevents oversubscription in standalone runs: with 128 cores and tests requesting 2-8 CPUs each, the scheduler naturally limits concurrency to ~30-40 tests instead of the previous 64, with each test getting the cores it actually needs.

This is the same scheduler already used by benchmarks (bench function).

@mrzeszutko added the ci-full label ("Run all master checks.") Apr 8, 2026
@mrzeszutko (Contributor, Author) commented

After running more tests this week (results will be shared separately), I found that this does not solve the underlying issues with running e2e tests after the libP2P upgrade to v2.

@mrzeszutko closed this Apr 17, 2026