Fixes [A-657](https://linear.app/aztec-labs/issue/A-657/ensure-bots-are-funded-on-deployment)

## Summary

- **Early refuel on restart**: `ensureFeeJuiceBalance` runs before setup when the token already exists (`setupTokenWithOptionalEarlyRefuel`, `setupTokenContractWithOptionalEarlyRefuel`)
- **Bridge claim for low-balance deploys**: When balance < 100 FJ, use a bridge claim for deploys in `setupToken` and `registerOrDeployContract` so setup doesn't fail before refuel
- **Constants**: Hardcode threshold 100 FJ, target 10k FJ
- **L1**: Raise FeeAssetHandler initial mint from 1000 to 10000
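The threshold/target decision above can be sketched as follows. This is a hypothetical illustration of the funding rule described in the summary — `planFunding` and its return shape are invented names, not the actual bot code.

```typescript
// Hypothetical sketch of the refuel decision described above. The constants
// come from the PR summary; the helper names are illustrative only.
const FEE_JUICE_REFUEL_THRESHOLD = 100n * 10n ** 18n; // 100 FJ
const FEE_JUICE_REFUEL_TARGET = 10_000n * 10n ** 18n; // 10k FJ

type FundingPlan = { action: 'none' } | { action: 'bridge-claim' | 'refuel'; amount: bigint };

/** Decide how to fund the bot before token setup runs. */
export function planFunding(balance: bigint, deploying: boolean): FundingPlan {
  if (balance >= FEE_JUICE_REFUEL_TARGET) return { action: 'none' };
  const amount = FEE_JUICE_REFUEL_TARGET - balance;
  // Low-balance deploys use a bridge claim so setup doesn't fail before refuel.
  if (deploying && balance < FEE_JUICE_REFUEL_THRESHOLD) return { action: 'bridge-claim', amount };
  return { action: 'refuel', amount };
}
```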
`BatchQueue.stop()` was not flushing the current in-progress batch before ending the container queue. Any items accumulated below `maxBatchSize` whose timer hadn't fired were silently dropped. In `KVBrokerDatabase`, this could lose proving-job writes on graceful shutdown, requiring unnecessary re-computation on restart.

Co-authored-by: danielntmd <danielntmd@nethermind.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
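The failure mode is easy to reproduce in a stripped-down queue. This is a minimal sketch of the flush-on-stop behavior, not the real `BatchQueue` (which also has timers and error handling):

```typescript
// Minimal sketch: a batch queue that must flush its partial batch on stop(),
// otherwise items below maxBatchSize whose timer hasn't fired are lost.
export class BatchQueue<T> {
  private current: T[] = [];
  private readonly batches: T[][] = [];

  constructor(private readonly maxBatchSize: number) {}

  push(item: T) {
    this.current.push(item);
    if (this.current.length >= this.maxBatchSize) this.flush();
  }

  private flush() {
    if (this.current.length > 0) {
      this.batches.push(this.current);
      this.current = [];
    }
  }

  /** The fix: stop() flushes the in-progress batch before returning. */
  stop(): T[][] {
    this.flush();
    return this.batches;
  }
}
```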
…mption on error message (#22247)

Read the JSON body, then parse it, to avoid consuming the stream twice when building the error message.

Co-authored-by: danielntmd <danielntmd@nethermind.io>
## Summary

Fixes flakiness in `p2p_client.proposal_tx_collector.bench.test.ts` caused by three compounding issues:

1. **`chunkTxHashesRequest` defaulted to chunkSize=1**, creating 500 individual libp2p streams for the 500-tx `send-batch-request` case. The rapid stream churn overwhelms the connection, causing EPIPE cascades that kill the muxer. Bumped to chunkSize=8 as the existing TODO indicated.
2. **Peer scores persisted between benchmark cases**, so hundreds of HighToleranceError penalties from EPIPE failures in one case degraded peer selection in subsequent cases. Added `PeerScoring.resetAllScores()` and called it in the worker before each benchmark run.
3. **No connectivity check between cases**, so degraded connections from a previous case could silently affect the next. Added `waitForConnectivity()` to verify the aggregator has 80% of expected peers before each case starts.

Full analysis with CI log evidence: https://gist.github.com/AztecBot/e5af3238fbfefc29c51de2ee5deaa8ea

## Changes

- `protocols/tx.ts`: Change `chunkTxHashesRequest` default chunkSize from 1 to 8
- `peer_scoring.ts`: Add `resetAllScores()` method
- `p2p_client_testbench_worker.ts`: Reset peer scores before each bench case, add `GET_PEER_COUNT` IPC command
- `worker_client_manager.ts`: Add `waitForConnectivity()` and `getPeerCount()` methods
- `p2p_client.proposal_tx_collector.bench.test.ts`: Check connectivity in `beforeEach`

ClaudeBox log: https://claudebox.work/s/38590d3cfe6a7000?run=2

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
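The stream-count math behind point 1 can be shown with a generic chunking helper. This is an illustrative sketch, not the actual `chunkTxHashesRequest` implementation: with chunkSize=1 a 500-tx request opens 500 streams, while chunkSize=8 opens 63.

```typescript
// Illustrative chunking helper mirroring the chunkTxHashesRequest change:
// each chunk becomes one libp2p stream, so a larger chunk size cuts stream churn.
export function chunkTxHashes<T>(hashes: T[], chunkSize = 8): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < hashes.length; i += chunkSize) {
    chunks.push(hashes.slice(i, i + chunkSize));
  }
  return chunks;
}
```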
To avoid an incorrect genesis root when deploying with 4.1 contracts.
Removes an env var that wasn't being used anymore.
`urlJoin` now correctly appends single-character segments.

Co-authored-by: danielntmd <danielntmd@nethermind.io>
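A sketch of the behavior being fixed — this is a hypothetical `urlJoin`, not the repo's implementation: a join that trims slashes per segment must still keep one-character segments like `v` intact rather than dropping them.

```typescript
// Hypothetical urlJoin sketch: trim leading/trailing slashes from each
// segment, drop only genuinely empty segments, and keep single characters.
export function urlJoin(base: string, ...parts: string[]): string {
  const trimmed = parts
    .map(p => p.replace(/^\/+|\/+$/g, '')) // trim slashes, preserve 1-char segments
    .filter(p => p.length > 0);
  return [base.replace(/\/+$/g, ''), ...trimmed].join('/');
}
```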
## Summary

Extracted from #22329 — adds support for running a different docker image on HA validator nodes.

- Adds `VALIDATOR_HA_DOCKER_IMAGE` terraform variable that, when set, overrides the image for HA validator releases (idx > 0)
- Passes the variable through `deploy_network.sh` tfvars generation
- Adds optional `ha_docker_image` input to the `deploy-network` workflow (both `workflow_call` and `workflow_dispatch`). When unset, HA nodes use the regular aztec docker image.

**Note:** The workflow change is in `.github-new/workflows/deploy-network.yml` — please move it to `.github/workflows/` before merging (ci-allow was requested but not detected by the sidecar).

## How it works

- If `VALIDATOR_HA_DOCKER_IMAGE` is empty (default), all validator releases use `AZTEC_DOCKER_IMAGE`
- If set, HA releases (idx > 0) get their `global.aztecImage.repository` and `global.aztecImage.tag` overridden
- The primary validator release (idx=0) always uses `AZTEC_DOCKER_IMAGE`

## Test plan

- [ ] Verify terraform plan with `VALIDATOR_HA_DOCKER_IMAGE` unset — no change to existing behavior
- [ ] Verify terraform plan with `VALIDATOR_HA_DOCKER_IMAGE` set — HA releases use the override image
- [ ] Move `.github-new/workflows/deploy-network.yml` to `.github/workflows/deploy-network.yml` and test workflow dispatch with `ha_docker_image` parameter

ClaudeBox log: https://claudebox.work/s/c73d93309f2bbc88?run=1
## Summary

Decommissions the v4-devnet-2 network by removing its references across the repo. The generic devnet infrastructure (workflows, scripts, env templates) is preserved for future devnet iterations.

ClaudeBox log: https://claudebox.work/s/e8707a3b2ea53bf3?run=1
## Summary

The `#team-alpha` Slack channel was renamed to `#e-team-alpha`. This updates all references across CI scripts and docs:

- `ci3/merge_train_failure_slack_notify` — merge train failure notifications
- `ci3/network_healthcheck` — network healthcheck dispatches
- `scripts/socket-fix-ci.sh` — Socket vulnerability notifications
- `spartan/testnet-runbook.md` — testnet runbook docs
- `.claude/skills/merge-trains/SKILL.md` — merge train skill reference

## Test plan

- Verify merge train failures for `merge-train/spartan` post to `#e-team-alpha`
- Verify network healthcheck posts to `#e-team-alpha`

ClaudeBox log: https://claudebox.work/s/0cbf19c4f8f32780?run=1
## Overview

Revert the pending chain whenever a pipeline does not get checkpointed onto L1.
## Summary

Makes validator and HA validator pod counts independently configurable. Adds two new optional variables:

- **`VALIDATOR_PRIMARY_REPLICA_COUNT`**: Override pod count for the primary validator release (defaults to `VALIDATOR_REPLICAS`)
- **`VALIDATOR_HA_REPLICA_COUNT`**: Override pod count for HA validator releases (defaults to `VALIDATOR_REPLICAS`)

`VALIDATOR_REPLICAS` remains the canonical "node slot count" used for key derivation and publisher key stride. The new variables only affect how many pods each release runs.

## staging-public configuration

Sets staging-public to run **2 primary validators + 4 HA validators**:

```
VALIDATOR_REPLICAS=4               # 4 node slots (256 attesters)
VALIDATOR_PRIMARY_REPLICA_COUNT=2  # 2 primary pods
VALIDATOR_HA_REPLICAS=1            # 1 HA release
VALIDATOR_HA_REPLICA_COUNT=4       # 4 HA pods
```

This means attester slots 0-1 are served by both primary and HA, while slots 2-3 are served only by HA nodes. When HA runs a different image (via `VALIDATOR_HA_DOCKER_IMAGE`), this forces mixed-version consensus.

## Changes

- **`variables.tf`**: Added `VALIDATOR_PRIMARY_REPLICA_COUNT` and `VALIDATOR_HA_REPLICA_COUNT` (both `number`, default `null`)
- **`main.tf`**: Moved `validator.replicaCount` from shared settings to per-release, using `coalesce()` to fall back to `VALIDATOR_REPLICAS`
- **`deploy_network.sh`**: Passes both new variables through env → tfvars
- **`staging-public.env`**: Set to 2 primary + 4 HA
## Summary

Fixes the timeout in the "prunes uncheckpointed blocks when proposer fails to deliver" test that was blocking merge-train/spartan from merging into next.

The prune detection timeout of `L2_SLOT_DURATION_IN_S * 3` (108s) is exactly at the worst-case boundary. After `executeTimeout` starts, three things must happen sequentially:

1. Wait for the proposer's slot to arrive (up to 1 L2 slot = 36s)
2. Proposer builds blocks and skips publishing (during the slot)
3. L1 advances past the slot boundary so the archiver detects the prune (up to 1 L2 slot = 36s)

In CI, the total was 119s — exceeding the 108s timeout by 11s. The prune event did fire (on validator-4 at 08:00:37), but 11 seconds after the timeout had already triggered teardown and stopped all archivers.

Increased to `L2_SLOT_DURATION_IN_S * 5` (180s) for comfortable margin.

ClaudeBox log: https://claudebox.work/s/94ade5ccbe68dc27?run=1
…d correct lag (#22204)

## Motivation

PR #22153 introduced a hard "finalized block guard" that refuses to compute committees if L1 data isn't finalized. While the safety goal is valid (preventing L1 reorgs from invalidating cached committees), it breaks many tests that don't properly set L1 finalized time and would cause the chain to stall if L1 stops finalizing. This PR takes a different approach that preserves safety while maintaining liveness.

Also fixes the lag parameter: the old code used `lagInEpochsForValidatorSet` (the looser constraint) instead of `lagInEpochsForRandao` (the binding one), and computed the sampling timestamp from the slot rather than the epoch start.

Fixes A-680

## Approach

Instead of refusing to serve committee data that isn't finalized, use a TTL-based cache: finalized entries are cached permanently, non-finalized entries expire after one Ethereum slot (12s) and get re-fetched from L1. The cache map stores both resolved entries and in-flight promises directly, so concurrent callers for the same epoch coalesce on a single L1 query. On fetch failure, the previous stale entry is restored so the next caller retries cleanly.

## Changes

- **epoch-cache**: Replaced the simple `Map<EpochNumber, EpochCommitteeInfo>` cache with `Map<EpochNumber, CachedEpochEntry | Promise<CachedEpochEntry>>`. Each resolved entry carries L1 block provenance metadata (number, hash, timestamp) and a `finalized` flag. Switched from `lagInEpochsForValidatorSet` to `lagInEpochsForRandao` and compute the sampling timestamp from the epoch start via `getStartTimestampForEpoch`. Simplified `isEscapeHatchOpen` to delegate cache management to `getCommittee`.
- **epoch-cache (tests)**: Updated unit tests for the new cache structure. Added 4 new TTL tests: re-query after TTL, no re-query for finalized, concurrent coalescing, eventual finalization promotion.
- **epoch-cache (integration tests)**: New integration test suite against real Anvil with deployed L1 contracts and 4 validators. Tests finalized committee retrieval, non-finalized TTL refresh, and cache re-fetch after L1 reorg.
- **epoch-cache (README)**: Added comprehensive documentation covering committee computation, LAG values, RANDAO seed, proposer selection, escape hatch, TTL caching with finalization tracking, and configuration.

--------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
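The caching pattern described in the Approach section can be sketched generically. This is a hedged illustration of the technique (finalized = cached forever, non-finalized = 12s TTL, promise coalescing, stale-entry restore on failure) — the names and shapes are illustrative, not the actual epoch-cache types:

```typescript
// Generic sketch of the TTL cache with finalization tracking described above.
const TTL_MS = 12_000; // one Ethereum slot

interface Entry<V> {
  value: V;
  finalized: boolean;
  fetchedAt: number;
}

export class TtlCache<K, V> {
  // Stores both resolved entries and in-flight promises, so concurrent
  // callers for the same key coalesce on a single fetch.
  private readonly map = new Map<K, Entry<V> | Promise<Entry<V>>>();

  constructor(
    private readonly fetch: (key: K) => Promise<Entry<V>>,
    private readonly now = () => Date.now(),
  ) {}

  async get(key: K): Promise<V> {
    const cached = this.map.get(key);
    if (cached instanceof Promise) return (await cached).value; // coalesce
    // Finalized entries never expire; non-finalized ones live for one slot.
    if (cached && (cached.finalized || this.now() - cached.fetchedAt < TTL_MS)) {
      return cached.value;
    }
    const promise = this.fetch(key);
    this.map.set(key, promise);
    try {
      const entry = await promise;
      this.map.set(key, entry);
      return entry.value;
    } catch (err) {
      // Restore the previous stale entry so the next caller retries cleanly.
      if (cached) this.map.set(key, cached);
      else this.map.delete(key);
      throw err;
    }
  }
}
```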
Test was failing with:

```
15:21:49 Error warping: InvalidParamsRpcError: Invalid parameters were provided to the RPC method.
15:21:49 Double check you have provided the correct parameters.
15:21:49
15:21:49 URL: http://127.0.0.1:8545/
15:21:49 Request body: {"method":"evm_setNextBlockTimestamp","params":[1775575740]}
15:21:49
15:21:49 Details: Timestamp error: 1775575740 is lower than previous block's timestamp
15:21:49 Version: viem@2.38.2
15:21:49
15:21:49   250 |     }
15:21:49   251 |   } catch (err) {
15:21:49 > 252 |     throw new Error(`Error warping: ${err}`);
15:21:49       |     ^
15:21:49   253 |   } finally{
15:21:49   254 |     // Restore interval mining so the next block is mined in `blockInterval` seconds from this one
15:21:49   255 |     if (opts.resetBlockInterval && blockInterval !== null && blockInterval > 0) {
15:21:49
15:21:49   at EthCheatCodes.warp (../../ethereum/dest/test/eth_cheat_codes.js:252:19)
15:21:49   at Object.<anonymous> (e2e_epochs/epochs_ha_sync.test.ts:157:5)
```
…22401)

## Summary

The "prunes uncheckpointed blocks when proposer fails to deliver" test in `epochs_mbps.pipeline.parallel.test.ts` consistently times out in CI, even after the timeout bump in #22392. Adding `skip: true` in `.test_patterns.yml` to unblock the merge train while the underlying timing issue is investigated.

ClaudeBox log: https://claudebox.work/s/94ade5ccbe68dc27?run=3
## Motivation

The empire slashing model was developed during an earlier iteration but never tested in a real network. The tally model is the only one in production use. The empire code adds ~5000 lines of unnecessary complexity across L1 contracts, TypeScript, deployment scripts, and configuration with no benefit.

Fixes A-670

## Approach

Removed all empire-slasher-specific code while preserving the shared governance infrastructure (`EmpireBase`, `IEmpire`) that the `GovernanceProposer` depends on. Also removed the `SlashFactory` periphery contract (empire-only), the `slashMinPenaltyPercentage`/`slashMaxPenaltyPercentage` config fields (empire-only), replaced the `SlasherFlavor` enum with a simple `slasherEnabled` boolean, and renamed all `TallySlashingProposer`/`TallySlasherClient` types to `SlashingProposer`/`SlasherClient` since "tally" is now redundant.

## Breaking changes

- **Env var**: `AZTEC_SLASHER_FLAVOR` (string `"tally"|"none"`) replaced with `AZTEC_SLASHER_ENABLED` (boolean)
- **Env vars removed**: `SLASH_MIN_PENALTY_PERCENTAGE`, `SLASH_MAX_PENALTY_PERCENTAGE`
- **Node admin API**: `getSlashPayloads()` method removed
- **Deploy outputs**: `slashFactoryAddress` removed
- **L1 contracts**: `SlasherFlavor` enum removed; `RollupConfigInput.slasherFlavor` replaced with `slasherEnabled: bool`; `TallySlashingProposer` renamed to `SlashingProposer`; `SlashFactory` contract removed
- **TS config**: `slasherFlavor: 'tally' | 'none'` replaced with `slasherEnabled: boolean`; `slashMinPenaltyPercentage`/`slashMaxPenaltyPercentage` removed from `SlasherConfig`

## Changes

- **l1-contracts**: Deleted `EmpireSlashingProposer.sol`, `EmpireSlasherDeploymentExtLib.sol`, `SlashFactory.sol`, `ISlashFactory.sol`, and empire slashing tests. Replaced `SlasherFlavor` enum with `bool slasherEnabled`. Renamed `TallySlashingProposer` to `SlashingProposer` and `TallySlasherDeploymentExtLib` to `SlasherDeploymentExtLib`.
- **slasher**: Deleted `EmpireSlasherClient`, `SlasherPayloadsStore`, and all empire helpers. Removed `getSlashPayloads()` from the interface. Renamed `TallySlasherClient` to `SlasherClient`. Updated README.
- **stdlib**: Deleted empire slashing helpers, `SlashFactoryContract` wrapper, `SlashPayload`/`SlashPayloadRound` types. Removed empire action variants from `ProposerSlashAction`. Removed `slashMinPenaltyPercentage`/`slashMaxPenaltyPercentage` from `SlasherConfig`.
- **ethereum**: Deleted `EmpireSlashingProposerContract` and `SlashFactory` artifacts. Removed `slashFactoryAddress` from `L1ContractAddresses`. Replaced `slasherFlavor` with `slasherEnabled: boolean`. Renamed `TallySlashingProposerContract` to `SlashingProposerContract`.
- **sequencer-client**: Removed empire action handling from `SequencerPublisher`. Removed `SlashFactoryContract`. Updated types to use renamed slasher classes.
- **end-to-end**: Removed empire branches from slash tests and `slash_veto_demo`. Removed `SlashFactory` from test setup.
- **spartan/terraform**: Replaced `AZTEC_SLASHER_FLAVOR` with `AZTEC_SLASHER_ENABLED`. Removed `SLASH_MIN_PENALTY_PERCENTAGE`/`SLASH_MAX_PENALTY_PERCENTAGE` and `slashFactoryAddress` from helm, terraform, and deploy scripts.
- **cli, archiver, aztec-node, pxe**: Removed `slashFactoryAddress` and `getSlashPayloads` references.
- **docs**: Added migration notes for breaking changes.

--------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

- Replace loose equality (`==`/`!=`) with strict equality (`===`/`!==`) in `world_state_ops_queue.ts`
- 5 instances in concurrency-critical queue dispatch logic

Fixes A-733

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
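For context on why this matters in dispatch logic, here is a small illustration (not from the codebase) of how loose equality coerces values that strict equality keeps distinct:

```typescript
// Illustration of the `==` pitfalls that motivate strict equality in
// dispatch code: loose equality coerces, so sentinel checks can match
// unintended values.
export function demonstrateLooseEquality(): boolean[] {
  const zero: any = 0;
  const empty: any = '';
  const nil: any = null;
  const undef: any = undefined;
  return [
    zero == empty, // true  — '' coerces to 0 under ==
    nil == undef,  // true  — null and undefined are loosely equal
    nil === undef, // false — strict equality distinguishes them
  ];
}
```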
## Summary

- Remove the dead BLOCK req/resp sub-protocol from the P2P layer
- The protocol was fully wired up (handler, validator, rate limits, metrics) but never called as a client — no production code ever requested blocks from peers via this protocol
- The validator was also broken: it rejected responses when the local node didn't have the block, which is exactly the scenario where you'd want to request it
- Removes the enum member, protocol constant, handler, validator, rate limits, metrics entry, mock helpers, and all associated tests
- 9 files changed, ~215 lines deleted

Fixes A-860

--------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Motivation
The last block in a checkpoint was being synced to the archiver _before_
passing through the HA signing gate. If HA signing failed afterwards,
the archiver would be polluted with a block that never made it on-chain.
## Approach
Reorder the block-building loop so that block proposal signing and
archiver sync happen in the same order for all blocks (sign first, sync
second), not just for non-last blocks. Pass the already-signed
`BlockProposal` through to `CheckpointProposal.createProposalFromSigner`
instead of raw block data.
Unrelated to the above, we replaced `undefined` returns from
`buildSingleBlock` with typed `{ failure: 'insufficient-txs' |
'insufficient-valid-txs' }` objects, so the failure reason is made
explicit.
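The typed-failure pattern above can be sketched as a discriminated union. The names mirror the PR description, but the `Block`/`BuildResult` shapes here are simplified illustrations, not the actual sequencer types:

```typescript
// Sketch of replacing `undefined` returns with typed failure objects.
type Block = { txCount: number };
type BuildFailure = { failure: 'insufficient-txs' | 'insufficient-valid-txs' };
type BuildResult = Block | BuildFailure;

export function buildSingleBlock(validTxs: number, minTxs: number): BuildResult {
  if (validTxs === 0) return { failure: 'insufficient-txs' };
  if (validTxs < minTxs) return { failure: 'insufficient-valid-txs' };
  return { txCount: validTxs };
}

// Callers discriminate with an `'failure' in result` check instead of
// testing for undefined, so the skip reason is explicit.
export function describeResult(result: BuildResult): string {
  return 'failure' in result
    ? `skipped: ${result.failure}`
    : `built block with ${result.txCount} txs`;
}
```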
## Changes
- **sequencer-client**: Reorder block-building loop so signing happens
before archiver sync for all blocks (including last). Simplify
`blockPendingBroadcast` to just `BlockProposal | undefined`. Return
typed failure objects from `buildSingleBlock` instead of `undefined`.
Update caller to check `'failure' in buildResult`.
- **stdlib**: `CheckpointProposal.createProposalFromSigner` accepts
`BlockProposal | undefined` instead of `CheckpointLastBlockData`. Remove
`CreateCheckpointProposalLastBlockData` type. Update
`makeCheckpointProposal` test helper to create a real `BlockProposal`
via `makeBlockProposal`.
- **validator-client**: Update `Validator` interface, `ValidatorClient`,
and `ValidationService` to pass `BlockProposal` instead of raw block
data for checkpoint proposal creation.
- **tests**: Update assertions from `toBeUndefined()` to `toEqual({
failure: ... })`, update signing-context test to reflect that only
checkpoint signing happens in `createCheckpointProposal`.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#22359)

Retry for #22201

## Motivation

The genesis block has `timestamp=0`, which forces a block 1 special case for transaction expiration validation. Transactions anchored to genesis get an expiration clamped to `0 + MAX_TX_LIFETIME` (~86400 = Jan 2 1970), making them impossible to include after block 1. This complicates e2e test setup by requiring empty block 1 mining and `minTxsPerBlock` manipulation.

## Approach

Adds a `genesisTimestamp` parameter that flows through the full world state stack (C++ → NAPI → TypeScript), allowing the genesis block header to have a non-zero timestamp. Introduces a `GenesisData` type that bundles `prefilledPublicData` and `genesisTimestamp`, replacing the two separate parameters that were threaded everywhere. The e2e setup automatically passes the current time as the genesis timestamp.

## Changes

- **New `GenesisData` type** (`stdlib/src/world-state/genesis_data.ts`) — bundles `prefilledPublicData` and `genesisTimestamp` into a single type
- **C++ world state** — accepts a `genesis_timestamp` parameter, uses it in the genesis block header hash
- **TS world state stack** — `NativeWorldState`, `NativeWorldStateService`, factory, and synchronizer all take `GenesisData`
- **Node/CLI** — `AztecNodeService.createAndSync`, `createAztecNode`, `start_node.ts`, `standby.ts` all use `GenesisData`
- **E2e setup** — passes `genesisTimestamp: Date.now()` to genesis values; block 1 wait logic preserved
- **~30 e2e/p2p test files** — `prefilledPublicData` references replaced with `genesis`
- **New e2e test** — `e2e_genesis_timestamp.test.ts` verifies genesis-anchored txs work after block 1

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
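The expiration arithmetic in the motivation can be made concrete. This is a sketch of the clamp described above (the `MAX_TX_LIFETIME` value is taken from the PR text; the helper name is illustrative):

```typescript
// Sketch of the genesis-anchored expiration problem: a tx anchored to a
// block expires at anchorTimestamp + MAX_TX_LIFETIME (seconds).
const MAX_TX_LIFETIME = 86_400; // ~1 day, per the PR description

export function txExpiration(anchorTimestamp: number): number {
  return anchorTimestamp + MAX_TX_LIFETIME;
}

// With genesisTimestamp = 0, genesis-anchored txs expire at 86400 seconds
// after the Unix epoch — i.e. Jan 2 1970 — so they can never be included
// once real wall-clock blocks exist. A real genesis timestamp fixes this.
```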
## Summary

- Replace `Uint32Value()` with `Int64Value()` when reading LMDB map sizes from JS
- Affects `lmdb_store_wrapper.cpp` and `world_state.cpp` — 3 call sites
- `Uint32Value()` silently truncates values above 4 GiB; `Int64Value()` is the widest integer accessor available in N-API

Fixes A-849

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
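The truncation can be reproduced in plain JS, since `>>> 0` applies the same modulo-2^32 wrap that a 32-bit accessor does. This is an illustration of the failure mode, not the N-API code itself:

```typescript
// Why Uint32Value truncates: a map size above 4 GiB wraps modulo 2^32 when
// read through a 32-bit accessor. `>>> 0` reproduces the same truncation.
export function truncateToUint32(value: number): number {
  return value >>> 0; // keeps only the low 32 bits
}

// A 5 GiB map size silently becomes 1 GiB: 5 * 2^30 mod 2^32 = 2^30.
```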
Demotes a noisy log line to debug level.
## Motivation

When a transaction fails validation (e.g. insufficient balance, gas limits out of bounds), the error returned to the client is a static string like "Insufficient fee payer balance" with no context. Users have to guess what went wrong. The validators already have the relevant values (balance, fee limit, gas limits) and log them server-side, but don't include them in the error reason returned to the client.

## Approach

Append contextual values to each error reason string in `GasTxValidator` and `GasLimitsValidator`. The base error constant is preserved as a prefix so existing substring-based assertions (e2e tests using `toThrow`) continue to work. Refactored `#shouldSkip` to `#getSkipReason` so the fee-per-gas skip path can also include values.

## Changes

- **p2p (gas_validator.ts)**: All five validation error paths now include values in the reason string: fee payer balance (`required`/`available`), gas limits (`required`/`got` or `limit`/`max`), and fee per gas (`maxFee`/`required`)
- **p2p (gas_validator.test.ts)**: Updated test helpers and assertions to use substring matching (`toContain`/`stringContaining`) since error messages now have dynamic suffixes

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
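The prefix-preserving pattern can be sketched as follows. The constant text matches the example in the motivation, but the function and exact message format here are illustrative, not the actual validator code:

```typescript
// Sketch: keep the static error constant as a prefix so substring-based
// assertions keep passing, and append the dynamic context after it.
export const INSUFFICIENT_FEE_PAYER_BALANCE = 'Insufficient fee payer balance';

export function feePayerBalanceError(required: bigint, available: bigint): string {
  return `${INSUFFICIENT_FEE_PAYER_BALANCE} (required ${required}, available ${available})`;
}
```

An e2e assertion like `expect(...).toThrow(INSUFFICIENT_FEE_PAYER_BALANCE)` still matches, because `toThrow` with a string does substring matching.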
BEGIN_COMMIT_OVERRIDE
chore: fix mempool limit test (#22332)
fix(bot): bot fee juice funding (#21949)
fix(foundation): flush current batch on BatchQueue.stop() (#22341)
chore: (A-750) read JSON body then parse to avoid double stream consumption on error message (#22247)
chore: bump log level in stg-public (#22354)
chore: fix main.tf syntax (#22356)
chore: wire up spartan checks to make (#22358)
fix(p2p): reduce flakiness in proposal tx collector benchmark (#22240)
fix: disable sponsored fpc and test accounts for devnet (#22331)
chore: add v4-devnet-3 to tf network ingress (#22327)
chore: remove unused env var (#22365)
chore: add pdb (#22364)
chore: dispatch CB on failed deployments (#22367)
chore: (A-749) single character url join (#22269)
feat: support different docker image for HA validator nodes (#22371)
chore: fix the daily healthchecks (#22373)
chore: remove v4-devnet-2 references (#22372)
fix: rename #team-alpha → #e-team-alpha slack channel (#22374)
chore(pipeline): timetable adjustments under pipelining (#21076)
feat(pipeline): handle pipeline prunes (#21250)
fix: handle error types serialization errors (#22379)
feat(spartan): configurable HA validator replica count (#22384)
fix(e2e): increase prune timeout in epochs_mbps_pipeline test (#22392)
fix(epoch-cache): use TTL-based caching with finalization tracking and correct lag (#22204)
chore: deflake e2e ha sync test (#22403)
chore(ci): skip prunes-uncheckpointed test in epochs_mbps_pipeline (#22401)
refactor(slasher): remove empire slasher model (#21830)
fix: use strict equality in world-state ops queue (#22398)
fix: remove unused BLOCK reqresp sub-protocol (#22407)
refactor(sequencer): sign last block before archiver sync (#22117)
feat(world-state): add genesis timestamp support and GenesisData type (#22359)
fix: use Int64Value instead of Uint32Value for 64-bit map sizes (#22400)
chore: Reduce logging verbosity (#22423)
fix(p2p): include values in tx validation error messages (#22422)
END_COMMIT_OVERRIDE