Skip to content

[cc] regalloc fixes#580

Merged
jgarzik merged 3 commits into
mainfrom
updates
May 31, 2026
Merged

[cc] regalloc fixes#580
jgarzik merged 3 commits into
mainfrom
updates

Conversation

@jgarzik
Copy link
Copy Markdown
Contributor

@jgarzik jgarzik commented May 31, 2026

No description provided.

jgarzik and others added 3 commits May 31, 2026 14:39
The chordal allocator's `try_reuse_stack_slot` used block-level
liveness (`pseudos_interfere`) to decide whether a freed slot's
previous owner could share with a new candidate. That check walks
`live_in` / `live_out` looking for any block where both pseudos
appear — but a register pseudo whose lifetime sits entirely
inside one block has empty `live_in` / `live_out` projections, so
the check returns false even when the intervals genuinely
overlap inside that block.

Concrete symptom: CPython's `PyThread_acquire_lock_timed` (in
`Python/thread.o`) miscompiled. Slot -64(%rbp) was assigned to
the `thelock` Sym pseudo; then to a spilled value loaded from
that slot whose lifetime extended past the next call; then later
overwritten by the `_PyTime_Add` return value at a point where
the spilled value was still live. The subsequent `sem_clockwait`
call read -64(%rbp) expecting the lock pointer and got a
timestamp instead. Bisected per-TU on a `_freeze_module` build.

Fix: replace the block-level interference check with direct
interval overlap. Two intervals `[a.start, a.end]` and
`[b.start, b.end]` are non-overlapping iff `a.end < b.start ||
b.end < a.start`. `FreeSlot` carries the previous owner's
`LiveInterval`; `try_reuse_stack_slot` takes the candidate's
interval. The check now matches what linear scan's monotonic
expire-by-position guaranteed, but works for the chordal
allocator's non-monotonic spill commit order.

`pseudos_interfere` is removed — it was the only consumer of
the block-level check, and it would only ever miss conflicts
that interval overlap catches correctly.

Per-arch wrappers updated to thread `&LiveInterval` (candidate)
instead of `PseudoId + live_in/live_out`.

CPython progress: pre-fix, 9 Python/*.o TUs miscompile on their
own. Post-fix, only 2 (Python/_warnings.o, Python/flowgraph.o) —
those are separate bugs unrelated to slot reuse. Memory note
`bug_cpython_baseline_regression.md` tracks remaining work.

Verified:
- cc test suite 914+948+204+ pass
- cargo build / clippy / fmt clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The IR-level live interval of a Sym pseudo only captures its
direct Store/Load/SymAddr uses — NOT the lifetime of register
pseudos that derive their values from the slot. Linear scan
allocated slot offsets monotonically and so never reused a Sym
slot in conflicting ways by accident; chordal coloring exposes
the gap.

CPython `_warnings.o::init_filters` and
`flowgraph.o::_PyCfgBuilder_Addop` both miscompiled on the
Sym-slot reuse pattern even after the preceding
interval-overlap fix (commit a687c1a). The interval-overlap
check returns "no conflict" because the Sym's own IR interval
ends well before the derived register pseudo's lifetime does;
the slot is then handed to an unrelated pseudo while the
derived value still reads from it.

Conservative fix: never reuse Sym pseudo slots. `addr_taken_syms`
remains computed and stays the correct gating predicate when a
future fix adds proper derived-value-lifetime tracking to the
Sym's interval, allowing reuse to be re-enabled.

CPython progress with this fix:
- Pre-fix: `_freeze_module` segfaulted before producing any
  output (predates M6+M7; rustc 1.96 changed pcc binary enough
  to expose it).
- Post-fix: CPython builds cleanly; `python -c "import math;
  print(math.pi)"` returns the correct value; 24 of 487 test
  files run before hitting an unrelated calling-convention
  miscompile in `double_round` →`PyOS_snprintf`.

Verified:
- cc test suite 914+948+204 pass
- cargo build / clippy / fmt clean

Known remaining (separate investigation):
- `assertAlmostEqual` triggers `double_round` passing a `double`
  bit-pattern where `PyOS_snprintf`'s `char *str` is expected.
  Likely variadic argument handling, not slot reuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chordal allocator's `try_reuse_stack_slot` checked the
candidate's interval against the freed slot's *most recent*
owner only. That's unsafe when a slot has been reused
multiple times: suppose A owns the slot during [11, 17] and
frees it, B reuses during [81, 130] and frees, then a
candidate C with interval [6, 42] looks up the slot. C is
checked against B (start 81 > C.end 42, no conflict) and
granted reuse — but C's interval overlaps A's old [11, 17]
window, so C's writes silently corrupt the slot in the span
where A's spilled value is still being read by derived
register pseudos.

CPython's `assertAlmostEqual(0.1+0.2, 0.3)` segfault traces to
exactly this pattern in `Objects/floatobject.c::double_round`.
Slot at offset 160 (asm -208(%rbp)) held %20 [11,17] (the
spilled `x` argument), was reused by %75 [81,130] (a later
intermediate), and was then granted to %13 [6,42] (the
`shortbuf` address) — %13's interval overlaps %20's old
window, so the codegen for `%13 = symaddr` corrupted -208
during the span where %20's value was still needed for the
call to `_Py_dg_dtoa(x, ...)`. The call received a clamped
deadline timestamp where the lock pointer / FP argument was
expected; the downstream `PyOS_snprintf` got a double
bit-pattern as `char *str` and dereferenced garbage.

Fix: `FreeSlot` now carries `Vec<LiveInterval>` — the full
ownership history. New `ActiveSlot { current, past, ... }`
type carries the inherited history forward when a slot is
reused, so a subsequent expire re-merges the complete history
back into the free pool. `try_reuse_stack_slot` checks the
candidate against EVERY past owner, not just the most-recent
one.

Verification:
- cc test suite 914+948+205 pass, including the new
  `regalloc_slot_multi_owner_history` regression test.
- cargo build / clippy / fmt clean.
- CPython 3.12.9 -O2 449/449 (40,817 individual tests,
  28m55s) — full baseline restored.

This is the third and final fix in the slot-reuse regression
chain (after a687c1a interval-overlap and d9b8c46 Sym-no-reuse).
The CPython baseline now matches the 2026-05-28 verified state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jgarzik jgarzik requested a review from Copilot May 31, 2026 16:26
@jgarzik jgarzik self-assigned this May 31, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a stack-slot reuse correctness bug in the chordal register allocator where the interference check only considered the slot's most recent owner, missing collisions with older owners whose intervals could still overlap a new candidate. Reuse safety is now expressed as interval disjointness against the slot's full ownership history, tracked via a new ActiveSlot type and an extended FreeSlot.owner_intervals list. As a separate, conservative fix, Sym (named local) slot reuse is disabled in both backends because the IR-level interval of a Sym does not cover the lifetimes of register pseudos derived from it.

Changes:

  • Replace single-owner FreeSlot.owner with FreeSlot.owner_intervals: Vec<LiveInterval> and add ActiveSlot { current, past, offset, size }; update expire_stack_intervals / try_reuse_stack_slot to use interval-disjointness against all historical owners and merge histories on free.
  • Update x86_64 and aarch64 register allocators to use ActiveSlot, pass the candidate LiveInterval into reuse, and disable Sym slot reuse entirely (keeping addr_taken_syms computation as a placeholder).
  • Add regalloc_slot_multi_owner_history regression test exercising FP arithmetic with calls to trigger the multi-owner spill pattern.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
cc/arch/regalloc.rs Introduces ActiveSlot, rewrites FreeSlot and reuse helpers to track and check full owner history.
cc/arch/x86_64/regalloc.rs Uses ActiveSlot, threads candidate interval through reuse, disables Sym slot reuse with rationale.
cc/arch/aarch64/regalloc.rs Mirrors the x86_64 changes for ARM64.
cc/tests/codegen/regalloc.rs Adds regression test for multi-owner slot-reuse bug.

@jgarzik jgarzik merged commit 2413317 into main May 31, 2026
10 checks passed
@jgarzik jgarzik deleted the updates branch May 31, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants