[codex] Optimize Terra stepping and reset accounting; expose masks by Idate96 · Pull Request #34 · leggedrobotics/terra

Idate96 · 2026-05-15T08:00:44Z

Summary

This PR contains the Terra environment side of the fast-reset, timeout-accounting, and diagnostic action-mask work for the multi-agent branch. It:

- factors env stepping into step_no_reset, so batched training can avoid composing full resets unless an env is actually done
- preserves info["final_observation"] through auto-reset, so PPO can bootstrap max-step truncations from the pre-reset observation
- adds info["timeout_done"] and observation/info["episode_progress"]
- gates the terminal success reward on true task_done, not on time-limit timeouts
- keeps reachability recomputation gated to effective terrain-changing DO actions
- exposes a coarse action_mask and edge-progress features through observations/info for diagnostics and optional probes
- keeps the reset info pytree structure aligned with step outputs
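The following minimal sketch shows how the summary items fit together in one batched step. It uses NumPy as a stand-in for the jnp code Terra uses. The toy dynamics, the constants, and the names `auto_reset_step` and `reset_all` are hypothetical. Only `step_no_reset`, `final_observation`, `task_done`, `timeout_done`, and `episode_progress` come from the PR, and their real signatures and definitions may differ. For example, `episode_progress` is assumed here to be steps / max steps.

```python
import numpy as np

# Hypothetical toy environment: obs is a scalar per env, e.g. "dug volume".
MAX_STEPS = 4          # hypothetical episode horizon
TASK_TARGET = 10       # hypothetical "task complete" threshold
SUCCESS_BONUS = 5.0    # hypothetical terminal success reward


def step_no_reset(obs, steps, action):
    """Advance every env one step without ever resetting (toy dynamics)."""
    next_obs = obs + action
    next_steps = steps + 1
    task_done = next_obs >= TASK_TARGET
    # A timeout is horizon expiry that is not also a task success.
    timeout_done = (next_steps >= MAX_STEPS) & ~task_done
    # Success reward is gated on task_done only, never on timeout_done.
    reward = np.where(task_done, SUCCESS_BONUS, 0.0)
    return next_obs, next_steps, reward, task_done, timeout_done


def reset_all(num_envs):
    """Toy reset: zero observation and zero step counter for every env."""
    return np.zeros(num_envs, dtype=np.int64), np.zeros(num_envs, dtype=np.int64)


def auto_reset_step(obs, steps, action):
    next_obs, next_steps, reward, task_done, timeout_done = step_no_reset(obs, steps, action)
    done = task_done | timeout_done
    info = {
        "final_observation": next_obs,  # pre-reset obs, kept for bootstrapping
        "task_done": task_done,
        "timeout_done": timeout_done,
        "episode_progress": next_steps / MAX_STEPS,
    }
    # Only compose a reset when at least one env actually finished.
    if done.any():
        reset_obs, reset_steps = reset_all(obs.shape[0])
        next_obs = np.where(done, reset_obs, next_obs)
        next_steps = np.where(done, reset_steps, next_steps)
    return next_obs, next_steps, reward, done, info
```

Under `jax.jit`, the Python `if done.any()` would not trace on a traced array. A JAX version would branch with `jax.lax.cond` instead. `final_observation` keeps the pre-reset value even for envs that reset in this step. The timed-out env receives no success reward.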

Important framing: the paired terra-baselines PR now keeps PPO action masking disabled by default. The env-side action_mask remains useful in three ways: as an observation/debug signal, for masked-vs-unmasked diagnostic scripts, and for future robot-side safety checks. The normal train/eval actor path, however, is unmasked.
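The env-side mask can feed two kinds of diagnostics:

- masking the actor's logits
- counting how many unmasked samples the mask would have rejected (the "invalid sampled actions" figure in the rollout table)

The sketch below assumes a discrete action head and a boolean mask of the same shape. `masked_logits`, `sample_actions`, and `count_invalid` are hypothetical helpers showing the common logit-masking technique, not the terra-baselines code.

```python
import numpy as np

NEG_INF = -1e9  # large negative logit used to suppress invalid actions


def masked_logits(logits, action_mask):
    """Logit masking: invalid actions get (effectively) zero probability."""
    return np.where(action_mask, logits, NEG_INF)


def sample_actions(logits, rng):
    """Sample one categorical action per row via the Gumbel-max trick."""
    gumbel = rng.gumbel(size=logits.shape)
    return np.argmax(logits + gumbel, axis=-1)


def count_invalid(actions, action_mask):
    """Diagnostic: number of chosen actions that the env-side mask marks invalid."""
    chosen_is_valid = np.take_along_axis(action_mask, actions[:, None], axis=1)[:, 0]
    return int(np.sum(~chosen_is_valid))
```

With masking disabled, the actor samples from the raw logits, and `count_invalid` only measures how often it picks actions the env would flag. With masking enabled, that count is zero by construction.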

Why

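The old path conflated true task terminals with max-step truncations in PPO accounting. This causes two problems:

- Synchronized timeouts can cause spikes in value loss and explained variance.
- Bootstrapping from the reset observation is the wrong target for a time-limit truncation. A truncated episode would have continued, so its return should be bootstrapped from the value of the pre-reset state, not from the first observation of the next episode.

This branch gives the PPO side explicit reset/timeout information while keeping task success semantics separate from horizon expiry.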
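The following sketch shows the bootstrap-target distinction for one-step TD targets. It assumes the critic has already been evaluated on two observations:

- `value_next`: the returned next observation, which is a reset observation for finished envs
- `value_final`: `info["final_observation"]`

`td_target` is a hypothetical name.

```python
import numpy as np


def td_target(reward, value_next, value_final, task_done, timeout_done, gamma=0.99):
    """One-step TD target that separates true terminals from time-limit truncations."""
    # Default: bootstrap from the next observation the env returned.
    bootstrap = value_next
    # On a timeout the returned obs is already a reset obs, so use the
    # value of the pre-reset final_observation instead.
    bootstrap = np.where(timeout_done, value_final, bootstrap)
    # On a true task terminal there is no future return to bootstrap.
    bootstrap = np.where(task_done, 0.0, bootstrap)
    return reward + gamma * bootstrap
```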

The paired terra-baselines branch uses this to:

- bootstrap timeouts from final_observation
- stop GAE at reset boundaries
- randomize initial episode ages so timeout phases are not synchronized
- log bounded first-episode eval success separately from legacy successes-per-env counters
- train/evaluate with an unmasked actor by construction while retaining masks for diagnostics
- feed edge/progress affordances to the critic only in the ResMap setup
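The first two items combine into a GAE pass like the one below:

- Timeouts bootstrap from the value of `final_observation`.
- True terminals bootstrap from zero.
- The advantage recursion is cut at every reset, so the next episode does not leak into the previous one.

This is a NumPy sketch over a `[T, N]` rollout with hypothetical argument names, not the terra-baselines implementation.

```python
import numpy as np


def gae_with_timeouts(rewards, values, next_values, final_values,
                      task_done, timeout_done, gamma=0.99, lam=0.95):
    """GAE over a [T, N] rollout that bootstraps timeouts and stops at resets.

    values[t]       : V(obs_t)
    next_values[t]  : V(next obs returned by the env), post-reset if done[t]
    final_values[t] : V(info["final_observation"]) at step t
    """
    done = task_done | timeout_done
    # One-step bootstrap: true terminals get 0, timeouts use the pre-reset value.
    bootstrap = np.where(timeout_done, final_values, next_values)
    bootstrap = np.where(task_done, 0.0, bootstrap)
    deltas = rewards + gamma * bootstrap - values

    advantages = np.zeros_like(deltas)
    carry = np.zeros(deltas.shape[1])
    for t in reversed(range(deltas.shape[0])):
        # Cut the recursion at every reset so the next episode cannot leak back.
        carry = deltas[t] + gamma * lam * np.where(done[t], 0.0, carry)
        advantages[t] = carry
    returns = advantages + values
    return advantages, returns
```

Consider a single env with `gamma=0.5`, `lam=1.0`, unit rewards, and zero values. Suppose the middle step times out, its `final_values` entry is 4, and the last step's `next_values` entry is 2. The advantages are then 2.5, 3.0, and 2.0. The post-reset `next_values` entry at the timeout step has no effect.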

Current Baseline And Learning Evidence

The clean reference run is:

terra-clean-multiagent-4x4090-autotune0-euler-pr-2026-05-13-19-49-55

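Local rollout probe configuration for the comparison below:

- map: solo_excavator
- 32 envs
- 550 max steps
- seeds 0, 1, 2, 3
- first episode only
- stochastic actions

This gives 4 × 32 = 128 episodes per checkpoint, which is the denominator in the success column.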

checkpoint/run	mode	first-episode success	avg return	policy entropy	notes
clean baseline `terra-clean-multiagent-4x4090-autotune0-euler-pr-2026-05-13-19-49-55.pkl`	unmasked	48/128 = 37.5%	3.956	0.035	per-seed successes: 14, 13, 8, 13; 278 invalid sampled actions
masked run `terra-mask-multiagent-4gpu-online-euler-pr-2026-05-15-00-50-06` / W&B `ti3k3tdp`	masked	37/128 = 28.9%	2.983	0.116	0 invalid sampled actions; W&B throughput about 95k FPS
ResMap terminal-fix run W&B `04e8dada`	unmasked	76/128 = 59.4%	5.484	0.143	per-seed successes: 19, 21, 17, 19; W&B throughput about 19k FPS

Caveats:

- These are local rollout probes, not a full eval sweep.
- The masked result above comes from the latest synced checkpoint, not the best checkpoint of that run.
  - An earlier local sync of the same masked run reached 50/128 = 39.1%, roughly tied with the clean baseline.
  - That earlier sync had lower entropy (0.035 vs 0.116) and less remaining dig (0.087 vs 0.131).
  - W&B also shows the old count-style masked eval metric peaking around step 73,200 before declining toward the latest checkpoint.
- Existing online W&B jobs still use the old in-memory eval counter, in which eval/success_rate can mean successes per env and can therefore exceed 1. The paired baselines branch fixes future logs:
  - eval/success_rate is the bounded first-episode success rate.
  - The old count-style quantity is logged separately as eval/successes_per_env.
- The ResMap run changes architecture and capacity, so it is not a pure reset-only ablation. The clean run above is the baseline reference for policy quality.
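The two success metrics differ in their denominator and in which episodes count. The sketch below uses hypothetical function names and a hypothetical `[num_envs, episodes_in_window]` layout for the count-style input. Plugging in the per-seed counts from the table reproduces 48/128 = 37.5% and 76/128 = 59.4%. The count-style metric can exceed 1 whenever an env finishes several successful episodes in the eval window.

```python
import numpy as np


def first_episode_success_rate(successes_per_seed, envs_per_seed):
    """Bounded metric in [0, 1]: successful first episodes / total first episodes."""
    total_episodes = len(successes_per_seed) * envs_per_seed
    return sum(successes_per_seed) / total_episodes


def successes_per_env(success_flags, num_envs):
    """Count-style metric: all successes in the eval window divided by env count.

    success_flags has shape [num_envs, episodes_in_window]. An env that finishes
    several episodes can contribute several successes, so the result can exceed 1.
    """
    return float(np.sum(success_flags)) / num_envs
```

For example, `first_episode_success_rate([14, 13, 8, 13], 32)` returns 0.375 for the clean baseline.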

Validation

Validated from the paired local trees:

PYTHONPATH=/home/lorenzo/moleworks/terra_mask_wip:/home/lorenzo/moleworks/terra-baselines_mask_wip

python -m py_compile terra/env.py terra/state.py
git diff --check
scripts/validation/validate_edge_mask_changes.py --case all --jax-platforms cpu --dataset-path /home/lorenzo/moleworks/terra_data/train --dataset-size 1

The full validation sweep passed:

- PPO mask logits
- multi-device training accounting and reset_prepared wiring
- model policy input compatibility
- compact reward logging
- edge/no-mask model shape
- critic affordance shapes
- checkpoint config restore
- timeout value bootstrap from final_observation
- initial episode progress randomization
- GAE timeout bootstrap without reset-episode leakage
- state action mask and step dispatch
- synthetic and dataset-backed env action masks
- batched fast reset parity
- step_no_reset parity with TerraEnv.step
- episode progress plus final pre-reset observation
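The "batched fast reset parity" check compares an optimized path against a reference. One way to phrase such a check rests on two assumptions that the PR does not state:

- Resets are deterministic given a per-env seed.
- The reference path resets every env and then selects with `where`.

All function names here are hypothetical.

```python
import numpy as np


def full_reset(keys):
    """Toy reset: a deterministic per-env initial state derived from a per-env seed."""
    return np.array([np.random.default_rng(int(k)).integers(0, 100) for k in keys])


def fast_reset(state, done, keys):
    """Optimized path: recompute the reset state only for envs that are done."""
    out = state.copy()
    idx = np.flatnonzero(done)
    if idx.size:
        out[idx] = full_reset(keys[idx])
    return out


def reference_reset(state, done, keys):
    """Reference path: reset every env, then keep the reset state only where done."""
    return np.where(done, full_reset(keys), state)
```

The boolean gather in `fast_reset` produces data-dependent shapes, which `jax.jit` does not allow. A JAX version would therefore need a fixed-shape formulation, but the parity assertion itself carries over unchanged.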

SawneyX · 2026-05-21T09:24:33Z

I think the last commit is faulty because the line metadata and the agent position use different conventions.

Metadata: x = column and y = row.

Agent:
current_pos[0] = row / y
current_pos[1] = col / x

So line_dist = jnp.abs(abc[0] * current_pos[1] + abc[1] * current_pos[0] + abc[2]) / denom was correct.

This reverts commit 918c29f.

Idate96 · 2026-05-21T11:12:17Z

Thanks @SawneyX, you were right. I had trusted the x/y names too much: pos_base is used as [row, col], while the metadata line is A*x + B*y + C with x = col and y = row. I reverted the faulty commit and pushed da3a54d4, which:

- keeps the original formula
- makes the row/col <-> x/y conversion explicit
- adds regression tests for the exact counterexample and for the cabin-angle rejection
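The following small sketch illustrates the convention discussed above:

- Positions are stored as `[row, col]`.
- The metadata line is A*x + B*y + C = 0, with x = col and y = row.
- The conversion therefore happens before the line is evaluated.

NumPy stands in for `jnp`. `rowcol_to_xy` and `line_distance` are hypothetical names, and `denom` is assumed to be sqrt(A**2 + B**2), the usual point-to-line normalization.

```python
import numpy as np


def rowcol_to_xy(pos_rowcol):
    """Agent positions are [row, col]; the line metadata uses x = col, y = row."""
    row, col = pos_rowcol[..., 0], pos_rowcol[..., 1]
    return col, row


def line_distance(abc, pos_rowcol):
    """Distance from [row, col] positions to the line A*x + B*y + C = 0."""
    x, y = rowcol_to_xy(pos_rowcol)
    a, b, c = abc
    # Same as |abc[0] * pos[1] + abc[1] * pos[0] + abc[2]| / denom.
    return np.abs(a * x + b * y + c) / np.sqrt(a**2 + b**2)
```

Take the vertical line x = 2 (A = 1, B = 0, C = -2) and an agent at row 2, column 5. The distance is 3. Reading `current_pos[0]` as x instead would give |2 - 2| = 0, an illustrative counterexample of the kind such a regression guards against.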

Idate96 marked this pull request as ready for review May 15, 2026 10:35

Idate96 changed the title ~~[codex] Optimize Terra stepping and expose coarse masks~~ [codex] Optimize Terra stepping, masks, and reset accounting May 18, 2026

Idate96 added 2 commits May 18, 2026 10:46

Optimize Terra env stepping and expose coarse masks (41a2310)

Add reset timeout accounting to Terra env (3398456)

Idate96 force-pushed the codex/mask-speedup-wip branch from 175a629 to 3398456 May 18, 2026 08:58

Idate96 changed the title ~~[codex] Optimize Terra stepping, masks, and reset accounting~~ [codex] Optimize Terra stepping and reset accounting; expose masks May 18, 2026

Fix foundation border proximity coordinates (918c29f)

Idate96 added 2 commits May 21, 2026 12:31

Revert "Fix foundation border proximity coordinates" (0055b27)

This reverts commit 918c29f.

Clarify foundation border coordinate conventions (da3a54d)

Idate96 wants to merge 5 commits into multi-agent from codex/mask-speedup-wip

2 participants

Conversation

Idate96 commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Current Baseline And Learning Evidence

Validation

Uh oh!

SawneyX commented May 21, 2026

Uh oh!

Idate96 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Idate96 commented May 15, 2026 •

edited

Loading

Idate96 commented May 21, 2026 •

edited

Loading