Conversation
The chordal allocator's `try_reuse_stack_slot` used block-level liveness (`pseudos_interfere`) to decide whether a freed slot's previous owner could share with a new candidate. That check walks `live_in` / `live_out` looking for any block where both pseudos appear — but a register pseudo whose lifetime sits entirely inside one block has empty `live_in` / `live_out` projections, so the check returns false even when the intervals genuinely overlap inside that block. Concrete symptom: CPython's `PyThread_acquire_lock_timed` (in `Python/thread.o`) miscompiled. Slot -64(%rbp) was assigned to the `thelock` Sym pseudo; then to a spilled value loaded from that slot whose lifetime extended past the next call; then later overwritten by the `_PyTime_Add` return value at a point where the spilled value was still live. The subsequent `sem_clockwait` call read -64(%rbp) expecting the lock pointer and got a timestamp instead. Bisected per-TU on a `_freeze_module` build. Fix: replace the block-level interference check with direct interval overlap. Two intervals `[a.start, a.end]` and `[b.start, b.end]` are non-overlapping iff `a.end < b.start || b.end < a.start`. `FreeSlot` carries the previous owner's `LiveInterval`; `try_reuse_stack_slot` takes the candidate's interval. The check now matches what linear scan's monotonic expire-by-position guaranteed, but works for the chordal allocator's non-monotonic spill commit order. `pseudos_interfere` is removed — it was the only consumer of the block-level check, and it would only ever miss conflicts that interval overlap catches correctly. Per-arch wrappers updated to thread `&LiveInterval` (candidate) instead of `PseudoId + live_in/live_out`. CPython progress: pre-fix, 9 Python/*.o TUs miscompile on their own. Post-fix, only 2 (Python/_warnings.o, Python/flowgraph.o) — those are separate bugs unrelated to slot reuse. Memory note `bug_cpython_baseline_regression.md` tracks remaining work. Verified: - cc test suite 914+948+204+ pass - cargo build / clippy / fmt clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The IR-level live interval of a Sym pseudo only captures its direct Store/Load/SymAddr uses — NOT the lifetime of register pseudos that derive their values from the slot. Linear scan allocated slot offsets monotonically and so never reused a Sym slot in conflicting ways by accident; chordal coloring exposes the gap. CPython `_warnings.o::init_filters` and `flowgraph.o::_PyCfgBuilder_Addop` both miscompiled on the Sym-slot reuse pattern even after the preceding interval-overlap fix (commit a687c1a). The interval-overlap check returns "no conflict" because the Sym's own IR interval ends well before the derived register pseudo's lifetime does; the slot is then handed to an unrelated pseudo while the derived value still reads from it. Conservative fix: never reuse Sym pseudo slots. `addr_taken_syms` remains computed and stays the correct gating predicate when a future fix adds proper derived-value-lifetime tracking to the Sym's interval, allowing reuse to be re-enabled. CPython progress with this fix: - Pre-fix: `_freeze_module` segfaulted before producing any output (predates M6+M7; rustc 1.96 changed pcc binary enough to expose it). - Post-fix: CPython builds cleanly; `python -c "import math; print(math.pi)"` returns the correct value; 24 of 487 test files run before hitting an unrelated calling-convention miscompile in `double_round` →`PyOS_snprintf`. Verified: - cc test suite 914+948+204 pass - cargo build / clippy / fmt clean Known remaining (separate investigation): - `assertAlmostEqual` triggers `double_round` passing a `double` bit-pattern where `PyOS_snprintf`'s `char *str` is expected. Likely variadic argument handling, not slot reuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chordal allocator's `try_reuse_stack_slot` checked the
candidate's interval against the freed slot's *most recent*
owner only. That's unsafe when a slot has been reused
multiple times: suppose A owns the slot during [11, 17] and
frees it, B reuses during [81, 130] and frees, then a
candidate C with interval [6, 42] looks up the slot. C is
checked against B (start 81 > C.end 42, no conflict) and
granted reuse — but C's interval overlaps A's old [11, 17]
window, so C's writes silently corrupt the slot in the span
where A's spilled value is still being read by derived
register pseudos.
CPython's `assertAlmostEqual(0.1+0.2, 0.3)` segfault traces to
exactly this pattern in `Objects/floatobject.c::double_round`.
Slot at offset 160 (asm -208(%rbp)) held %20 [11,17] (the
spilled `x` argument), was reused by %75 [81,130] (a later
intermediate), and was then granted to %13 [6,42] (the
`shortbuf` address) — %13's interval overlaps %20's old
window, so the codegen for `%13 = symaddr` corrupted -208
during the span where %20's value was still needed for the
call to `_Py_dg_dtoa(x, ...)`. The call received a clamped
deadline timestamp where the lock pointer / FP argument was
expected; the downstream `PyOS_snprintf` got a double
bit-pattern as `char *str` and dereferenced garbage.
Fix: `FreeSlot` now carries `Vec<LiveInterval>` — the full
ownership history. New `ActiveSlot { current, past, ... }`
type carries the inherited history forward when a slot is
reused, so a subsequent expire re-merges the complete history
back into the free pool. `try_reuse_stack_slot` checks the
candidate against EVERY past owner, not just the most-recent
one.
Verification:
- cc test suite 914+948+205 pass, including the new
`regalloc_slot_multi_owner_history` regression test.
- cargo build / clippy / fmt clean.
- CPython 3.12.9 -O2 449/449 (40,817 individual tests,
28m55s) — full baseline restored.
This is the third and final fix in the slot-reuse regression
chain (after a687c1a interval-overlap and d9b8c46 Sym-no-reuse).
The CPython baseline now matches the 2026-05-28 verified state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a stack-slot reuse correctness bug in the chordal register allocator where the interference check only considered the slot's most recent owner, missing collisions with older owners whose intervals could still overlap a new candidate. Reuse safety is now expressed as interval disjointness against the slot's full ownership history, tracked via a new ActiveSlot type and an extended FreeSlot.owner_intervals list. As a separate, conservative fix, Sym (named local) slot reuse is disabled in both backends because the IR-level interval of a Sym does not cover the lifetimes of register pseudos derived from it.
Changes:
- Replace single-owner
FreeSlot.ownerwithFreeSlot.owner_intervals: Vec<LiveInterval>and addActiveSlot { current, past, offset, size }; updateexpire_stack_intervals/try_reuse_stack_slotto use interval-disjointness against all historical owners and merge histories on free. - Update x86_64 and aarch64 register allocators to use
ActiveSlot, pass the candidateLiveIntervalinto reuse, and disableSymslot reuse entirely (keepingaddr_taken_symscomputation as a placeholder). - Add
regalloc_slot_multi_owner_historyregression test exercising FP arithmetic with calls to trigger the multi-owner spill pattern.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cc/arch/regalloc.rs | Introduces ActiveSlot, rewrites FreeSlot and reuse helpers to track and check full owner history. |
| cc/arch/x86_64/regalloc.rs | Uses ActiveSlot, threads candidate interval through reuse, disables Sym slot reuse with rationale. |
| cc/arch/aarch64/regalloc.rs | Mirrors the x86_64 changes for ARM64. |
| cc/tests/codegen/regalloc.rs | Adds regression test for multi-owner slot-reuse bug. |
No description provided.