narrow tls sharing for dav1d #679

Draft
oinoom wants to merge 10 commits into main from slice/20260417-rtld-initial-tls-sharing-ia2

oinoom commented Apr 20, 2026

No description provided.

David Anekstein added 9 commits April 17, 2026 12:26

Strict dav1d single-thread decode on x86_64 was still failing in IA2 libc
compartment mode because a small amount of writable loader/libc state is
process-global ABI state, not compartment-private state.

Keep only the state that must remain shared:
- writable ld.so heap mappings tagged with IA2_LDSO_HEAP_MARKER
- the ld.so TLS-generation page read during __tls_get_addr resolution
- the libc tuning pages holding the x86_64 memmove/memset threshold globals

The intent is to replace broader carveouts with narrow runtime policy rooted in
what the loader and libc actually read and write during the dav1d reproduction.
ia2_unprotect_loader_heap_maps() handles the writable ld.so heap mappings at
startup, and protect_pages() adds the page-level carveouts before retagging the
rest of each segment.
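A minimal sketch of the carve-out-then-retag step follows. It assumes the carveouts arrive as sorted, non-overlapping, page-aligned ranges. `struct range` and `retag_ranges()` are hypothetical names, not IA2's actual protect_pages() code. Each range the helper returns would then be retagged to the compartment pkey, while the carveout pages stay shared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
    uintptr_t start, end;
};

/*
 * Hypothetical helper: split `seg` into the pieces that should be retagged to
 * the compartment pkey, skipping every carveout in `keep`. `keep` must be
 * sorted by start address and non-overlapping. `out` needs room for
 * nkeep + 1 entries, the most pieces a segment can split into.
 */
static size_t retag_ranges(struct range seg, const struct range *keep,
                           size_t nkeep, struct range *out, size_t max_out)
{
    assert(max_out >= nkeep + 1);
    uintptr_t cur = seg.start; /* first byte not yet emitted or skipped */
    size_t n = 0;

    for (size_t i = 0; i < nkeep; i++) {
        /* Clip the carveout to the segment. */
        uintptr_t ks = keep[i].start > seg.start ? keep[i].start : seg.start;
        uintptr_t ke = keep[i].end < seg.end ? keep[i].end : seg.end;
        if (ks >= ke)
            continue; /* carveout lies outside this segment */
        if (ks > cur)
            out[n++] = (struct range){cur, ks}; /* private gap before it */
        if (ke > cur)
            cur = ke; /* skip over the shared carveout */
    }
    if (cur < seg.end)
        out[n++] = (struct range){cur, seg.end}; /* tail after last carveout */
    return n;
}
```

Take a segment [0x1000, 0x6000) with carveouts [0x2000, 0x3000) and [0x4000, 0x5000). For that input the helper returns three ranges to retag: [0x1000, 0x2000), [0x3000, 0x4000) and [0x5000, 0x6000).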

This is the runtime-only part of the minimal main-based IA2 delta. With these
carveouts in place, the paired strict dav1d reproduction still runs
`dav1d --version` and decodes test.ivf successfully.

The minimal dav1d stack needs calls to shared_malloc(), shared_free(), and the
other shared allocator helpers to route to the shared allocator compartment.
Encoding that rule only in the rewriter makes the policy implicit and
application-specific.

Annotate the allocator declarations themselves with ia2_extern_pkey:1 and add a
libia2 wrapper header that exposes those declarations from the runtime include
tree. This moves the ownership policy into the API contract: code that includes
these declarations can say directly that the entrypoints live in the shared
pkey.

This commit intentionally changes interface metadata only. The rewriter change
that consumes these annotations follows next.

The rewriter previously learned external pkeys only from system-header handling
and hardcoded exceptions. That was enough to get a working dav1d branch, but it
kept shared allocator routing as tool-specific knowledge instead of making it a
declaration-driven rule.

Teach SourceRewriter to parse ia2_extern_pkey:<n> annotations on declarations
and record the annotated pkey even when the declaration would otherwise be
ignored for rewriting. This lets external APIs such as the shared allocator
state their required compartment directly in headers.
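The commit does not show the parsing code itself. The sketch below covers only the string format. It assumes the annotation reaches the tool as a plain `ia2_extern_pkey:<n>` string and that valid keys are 0 through 15 (the x86 MPK range). `parse_extern_pkey()` is a hypothetical name. The real logic lives in the C++ SourceRewriter and presumably operates on Clang declarations rather than raw strings.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 and stores the key in *pkey_out if
 * `annotation` is exactly "ia2_extern_pkey:<n>" with n in 0..15
 * (assumed x86 MPK range); returns 0 otherwise.
 */
static int parse_extern_pkey(const char *annotation, int *pkey_out)
{
    static const char prefix[] = "ia2_extern_pkey:";
    const size_t len = sizeof prefix - 1;

    assert(annotation != NULL && pkey_out != NULL);
    if (strncmp(annotation, prefix, len) != 0)
        return 0;

    const char *digits = annotation + len;
    if (*digits < '0' || *digits > '9')
        return 0; /* reject empty, signed, or non-numeric suffixes */

    char *end;
    errno = 0;
    long v = strtol(digits, &end, 10);
    if (errno != 0 || *end != '\0' || v > 15)
        return 0; /* trailing junk or out-of-range key */

    *pkey_out = (int)v;
    return 1;
}
```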

Together with the previous commit, this preserves the generated callgate routing
needed by the minimal strict dav1d decode stack while removing the need for a
dav1d-specific `shared_*` exception in the rewriter.

Single-thread dav1d decode still passes after removing the
ia2_extern_pkey annotation machinery from ia2_allocator.h.

This was worth testing because the original minimal stack carried two
extra IA2 commits whose stated purpose was to annotate shared allocator
entrypoints and teach the rewriter to honor those annotations. In the
current dav1d single-thread path, those annotations are not what makes
shared_malloc/shared_free work:

- the generated dav1d binaries still reference shared_malloc and
  shared_free as direct dynamic symbols rather than wrapped callgates
- disabling annotated-extern-pkey handling in SourceRewriter.cpp did not
  change single-thread decode behavior
- after no-oping the annotation macro and restoring the normal rewriter,
  strict single-thread decode still passed

So for this reduced branch pair, the annotation syntax in the header is
currently dead scaffolding. Remove only the macro expansion logic and
leave the allocator API surface unchanged.

Validation:
- rebuilt dav1d through rewrite.py against this IA2 tree
- dav1d --version returned 0
- single-thread decode of test.ivf returned 0

Problem
- The reduced principle-check IA2 branch currently makes strict single-thread
  dav1d decode pass by retagging writable loader-heap TLS state to shared pkey
  0 during startup.
- We wanted to know whether the older e664-style x86_64 TLS policy could still
  support the reduced dav1d branch without falling back to the broader
  loader-heap carveout.
- Restoring only the TCB/static-TLS neighborhood sharing was not enough.
  `dav1d --version` succeeded, but `dav1d --threads 1` still crashed in
  `__tls_get_addr+16` while the IVF handoff path entered
  `dav1d_data_wrap -> malloc -> PartitionAlloc::ScopedDisallowAllocations`.
- GDB showed the exact mismatch at the failing read:
  - `%fs` base was on `[anon: ia2-loader-heap]` with `pkey 0`
  - `*(%fs:8)` (the DTV pointer) was on the adjacent
    `[anon: ia2-loader-heap]` mapping with `pkey 1`
  - the faulting `cmp %rax,(%rdx)` used that DTV pointer as `%rdx`
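On x86_64 glibc, `tcbhead_t` stores the TCB's own address at `%fs:0` and the DTV pointer at `%fs:8`. The sketch below reads both values the way the faulting code does and checks whether they fall on the same page. The GDB session found that they did not: the two pointers sat on adjacent `ia2-loader-heap` mappings with different pkeys. The helper names are hypothetical, and the readers simply return 0 on targets other than x86_64 Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Thread pointer: glibc stores the TCB's own address at %fs:0. */
static uintptr_t read_thread_pointer(void)
{
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t tp;
    __asm__ volatile("mov %%fs:0, %0" : "=r"(tp));
    return tp;
#else
    return 0; /* not modeled on other targets */
#endif
}

/* DTV pointer: the value THREAD_DTV() and __tls_get_addr load from %fs:8. */
static uintptr_t read_dtv_pointer(void)
{
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t dtv;
    __asm__ volatile("mov %%fs:8, %0" : "=r"(dtv));
    return dtv;
#else
    return 0;
#endif
}

static uintptr_t page_of(uintptr_t addr)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}

/* 1 if sharing only the TCB page would also cover the DTV header. */
static int tcb_and_dtv_share_page(void)
{
    return page_of(read_thread_pointer()) == page_of(read_dtv_pointer());
}
```

A result of 0 for the startup thread would be the situation described above. In that case, sharing only the TCB page still leaves the `cmp %rax,(%rdx)` read pointing at a compartment-private page.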

What this changes
- Keep the x86_64 PT_TLS prefix below the TCB page shared in
  `protect_tls_pages()` so `%fs`-relative ABI/TLS state that lives below the
  thread pointer remains accessible across compartment transitions.
- Reintroduce `ia2_unprotect_thread_pointer_mapping()` and call it from
  `ia2_start()` so the startup thread's TCB neighborhood is retagged shared
  after IA2 has finished compartment setup.
- Add `ia2_unprotect_thread_dtv_page()` and call it from `ia2_start()` so the
  startup thread's DTV header page is explicitly retagged shared as well.
- Leave the newer targeted runtime follow-ups from the reduced branch in place:
  the ld.so TLS-generation page carveout and the libc memmove tuning-page
  carveouts remain unchanged.
- Stop depending on `ia2_unprotect_loader_heap_maps()` for this reproduction.

Why this fixes it
- The targeted TCB/static-TLS carveout fixes `%fs`-relative accesses in the TCB
  neighborhood, but `__tls_get_addr` also dereferences `THREAD_DTV()` via
  `%fs:8`.
- On this reduced single-thread dav1d path, PartitionAlloc reaches that
  `__tls_get_addr` fast path during the IVF packet handoff allocation path.
  Leaving the DTV header page compartment-private therefore still faults even
  though the TCB page itself is shared.
- Sharing the DTV header page closes that remaining gap without going back to a
  blanket retag of every writable IA2 loader-heap mapping.
- The current single-thread reproduction does not need the larger thread-start
  changes from the earlier e664 line. In particular, the extra
  `pthread_create()` PKRU wrapper, the per-thread post-TLS retag in
  `ia2_thread_begin()`, and the standalone one-page
  `ia2_unprotect_thread_pointer_page()` startup call were all removed while the
  strict dav1d reproduction continued to pass.
- The resulting policy is narrower and better motivated: share the startup
  thread's TCB/static-TLS neighborhood and DTV page, rather than all marked
  loader-heap VMAs.
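The shape of glibc's `__tls_get_addr` fast path shows why both the ld.so TLS-generation page and the DTV header page must stay readable. The following is a simplified C model, not glibc source. The `dtv_t` and `tls_index` layouts mirror glibc's on LP64, but `tls_get_addr_fast()` and its return-NULL-for-slow-path convention are illustrative assumptions. The first comparison corresponds to the `cmp %rax,(%rdx)` from the GDB session: the loader's generation counter is compared against `dtv[0].counter` through the DTV pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Layouts mirror glibc's dtv_t and tls_index on LP64 targets. */
struct dtv_pointer {
    void *val;
    void *to_free;
};

typedef union {
    size_t counter;
    struct dtv_pointer pointer;
} dtv_t;

typedef struct {
    unsigned long ti_module;
    unsigned long ti_offset;
} tls_index;

#define TLS_DTV_UNALLOCATED ((void *)-1L)

/*
 * Simplified model of the fast path. `dtv` is what %fs:8 holds and
 * `*generation` is the loader's global TLS generation counter. Returns NULL
 * where the real code would branch to its slow path.
 */
static void *tls_get_addr_fast(const dtv_t *dtv, const size_t *generation,
                               const tls_index *ti)
{
    assert(dtv != NULL && generation != NULL && ti != NULL);
    /* This one comparison reads the ld.so generation page AND the DTV header. */
    if (dtv[0].counter != *generation)
        return NULL;
    void *block = dtv[ti->ti_module].pointer.val;
    if (block == TLS_DTV_UNALLOCATED)
        return NULL;
    return (char *)block + ti->ti_offset;
}
```

If either read lands on a page tagged for another compartment, the fault happens before any module-specific TLS is touched. That is consistent with a crash at `__tls_get_addr+16`.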

Validation
- rebuilt `principle_check/dav1d-ia2` through `rewrite.py` against this IA2
  tree and dav1d `66a21b9`
- `tools/dav1d --version` returned 0
- `tools/dav1d -i /home/davidanekstein/immunant/test.ivf -o /dev/null --muxer null --threads 1`
  returned 0
- decode completed: `Decoded 2/2 frames (100.0%)`

Problem
- `loader_minimal_malloc` was written against the older assumption that every
  VMA tagged `ia2-loader-heap` would remain compartment-private with `pkey 1`.
- The reduced single-thread dav1d branch no longer has that policy. It
  intentionally retags part of the loader heap to shared `pkey 0` so the startup
  thread can access the TCB/TLS/DTV state required by the strict decode path.
- The old test only looked at the first `ia2-loader-heap` entry in
  `/proc/self/smaps` and asserted that one mapping had `ProtectionKey: 1`.
  Once a shared loader-heap mapping happened to come first, the test failed even
  though other loader-heap mappings still remained private with `pkey 1`.

What this changes
- Parse `/proc/self/smaps` entry-by-entry and collect every mapping tagged with
  `IA2_LDSO_HEAP_MARKER` instead of stopping at the first match.
- Record the address range and `ProtectionKey` for each loader-heap mapping.
- Print the full loader-heap pkey breakdown so the test output shows the actual
  mixed policy when debugging regressions.
- Change the assertion from "the first loader-heap mapping has pkey 1" to
  "at least one loader-heap mapping still has pkey 1".
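The entry-by-entry scan can be sketched as follows. The sketch assumes the usual smaps layout: one `start-end perms ...` header line per VMA, followed by `Key: value` fields, including `ProtectionKey:` on kernels with protection keys enabled. `struct heap_map`, `collect_marked_maps()` and `any_private()` are hypothetical names. The marker string is passed in rather than taken from IA2_LDSO_HEAP_MARKER.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct heap_map {
    unsigned long start, end;
    int pkey; /* -1 if no ProtectionKey: field was seen */
};

/*
 * Scan an smaps stream and record every VMA whose header line contains
 * `marker`. Returns the number of entries written to `out` (at most `max`).
 */
static int collect_marked_maps(FILE *f, const char *marker,
                               struct heap_map *out, int max)
{
    char line[4096];
    int n = 0;
    int in_marked = 0; /* is the current VMA one we recorded? */

    assert(f != NULL && marker != NULL && out != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        int pkey;

        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            /* Header line: a new VMA entry begins here. */
            in_marked = n < max && strstr(line, marker) != NULL;
            if (in_marked)
                out[n++] = (struct heap_map){start, end, -1};
        } else if (in_marked &&
                   sscanf(line, "ProtectionKey: %d", &pkey) == 1) {
            out[n - 1].pkey = pkey;
        }
    }
    return n;
}

/* The updated invariant: some loader-heap VMA still has the private pkey. */
static int any_private(const struct heap_map *maps, int n, int private_pkey)
{
    for (int i = 0; i < n; i++)
        if (maps[i].pkey == private_pkey)
            return 1;
    return 0;
}
```

In the real test the stream would come from `fopen("/proc/self/smaps", "r")`, and the pass condition would be `any_private(maps, n, 1)`. Printing every recorded `start-end pkey=N` entry gives the kind of breakdown shown below.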

Why this fixes it
- The old test was overfitting to a single mapping order rather than checking
  the actual security property.
- On the current branch, the useful invariant is that the loader heap must not
  collapse entirely to shared `pkey 0`. Some pages may be shared for startup TLS
  reasons, but other loader-heap mappings should still remain compartment-
  private.
- Enumerating all loader-heap mappings lets the test distinguish
  "mixed-policy carveout working as intended" from "everything got retagged
  shared".

Observed breakdown on this branch
- direct run of the updated test reported:
  - `0x74e9292fe000-0x74e929307000 pkey=0`
  - `0x74e929307000-0x74e92930a000 pkey=1`
  - `0x74e929338000-0x74e92933a000 pkey=1`
  - `0x74e929685000-0x74e929687000 pkey=1`
- That is exactly why the old test was stale: the loader heap now has a mixed
  pkey distribution, not a uniform one.

Validation
- rebuilt `tests/loader_minimal_malloc/loader_minimal_malloc`
- direct run printed the mixed pkey breakdown above and exited 0
- `ctest --test-dir build/x86_64 -R loader_minimal_malloc --output-on-failure`
  passed
- strict single-thread dav1d decode still passed after the runtime simplification
  and test update

The previous dav1d single-thread fix retagged the startup thread's TCB
window and DTV page from ia2_start(). That made the decode path work, but
it also regressed standard tracer builds: with tracer on, debug on, and
IA2_LIBC_COMPARTMENT off, tests started failing during process startup
because the tracer observed IA2 retagging loader-owned TLS state after the
loader had already initialized it.

This change removes that runtime-side startup retag path entirely:
- delete the x86_64 thread-pointer / DTV helper declarations and
  implementations
- stop calling them from ia2_start()
- update the glibc submodule to move the initial-thread TCB/DTV retag into
  rtld's init_tls() path instead

The paired glibc change now computes the exact initial-thread static TLS
page range from dl_tls_static_size and TLS_TCB_SIZE instead of retagging a
conservative fixed eight-page window below the TCB. That keeps the loader-
side policy narrow while preserving the dav1d decode path and the tracer
startup behavior.
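A sketch of the page arithmetic follows. It assumes the x86_64 (TLS variant II) layout, in which `dl_tls_static_size` already includes `TLS_TCB_SIZE`. Under that layout the static TLS blocks sit directly below the thread pointer and the TCB occupies `[tp, tp + TLS_TCB_SIZE)`. `static_tls_pages()` is a hypothetical helper, not the glibc patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page_range {
    uintptr_t start, end; /* page-aligned, half-open */
};

/*
 * Hypothetical helper. Assumes the static TLS area, TCB included, spans
 * [tp - (static_size - tcb_size), tp + tcb_size), and widens that span
 * to whole pages.
 */
static struct page_range static_tls_pages(uintptr_t tp, size_t static_size,
                                          size_t tcb_size, uintptr_t page_size)
{
    assert(static_size >= tcb_size);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t lo = tp - (static_size - tcb_size); /* lowest static TLS byte */
    uintptr_t hi = tp + tcb_size;                 /* one past the TCB */
    struct page_range r = {
        lo & ~(page_size - 1),                   /* round down */
        (hi + page_size - 1) & ~(page_size - 1), /* round up */
    };
    return r;
}
```

Consider tp = 0x10740 with a 0x1200-byte static size, a 0x100-byte TCB and 4 KiB pages. The helper yields [0xF000, 0x11000), which is two pages. A fixed eight-page window would instead cover eight pages below the TCB no matter how much static TLS the process actually has.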

Moving the startup retag into the loader is not sufficient by itself. Once
rtld keeps the initial-thread TCB neighborhood shared, protect_tls_pages()
can still retag that page back to a compartment pkey later when IA2 walks
PT_TLS segments. In practice that showed up as a dav1d --version crash on
exit: libstdc++ / exception-handling state accessed at fs:-0x20 lived on the
single page immediately below the TCB, and that page had been retagged to
pkey 2, causing SEGV_PKUERR during _dl_fini / __cxa_finalize.

Fix that by keeping the x86_64 PT_TLS carveout narrow:
- preserve the historical ia2_stackptr_0 shared page for compartment 1
- for non-default compartments, preserve only the single page immediately
  below the main-thread TCB
- do not restore the older broad "share everything up to the TCB" rule,
  which weakened TLS isolation and broke threads, tls_protected, and
  three_keys_minimal
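The sketch below models the narrowed rule with hypothetical names. It assumes a page-aligned thread pointer. It also reads "compartment 1" as the default compartment, which keeps only the `ia2_stackptr_0` page; that reading is an assumption, not something the commit states outright. The sketch shows why the `fs:-0x20` access from the crash falls on the one page the rule keeps shared for other compartments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Assumption: the main-thread TCB starts on a page boundary. */
static uintptr_t page_below_tcb(uintptr_t tp, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(page_floor(tp, page_size) == tp);
    return tp - page_size;
}

/*
 * Hypothetical model of the narrowed PT_TLS rule. Assumption: compartment 1
 * is the "default" compartment and keeps only the page holding
 * ia2_stackptr_0; every other compartment keeps only the page just below
 * the main-thread TCB. All other PT_TLS pages get the compartment pkey.
 */
static bool keep_page_shared(uintptr_t page, int compartment, uintptr_t tp,
                             uintptr_t stackptr0_page, uintptr_t page_size)
{
    if (compartment == 1)
        return page == stackptr0_page;
    return page == page_below_tcb(tp, page_size);
}
```

With tp = 0x20000 and 4 KiB pages, fs:-0x20 is 0x1FFE0. That address lies in [0x1F000, 0x20000), the only page this rule leaves shared when walking compartment 2's PT_TLS segment.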

Validation:
- fresh ./rewrite.py --llvm-config /usr/bin/llvm-config-18 build
- dav1d --version => RC=0
- dav1d test.ivf --threads 1 => RC=0, Decoded 2/2 frames
- tracer/debug/no-libc sweep:
  ctest --test-dir build/tracer_debug_standard_computed_20260417 \
        --output-on-failure -j1 -E terminating_threads
  => 35/35 passed
- debug/no-tracer/no-libc sweep:
  ctest --test-dir build/standard_debug_notracer_computed_20260417 \
        --output-on-failure -j1 -E terminating_threads
  => 35/35 passed
- release/libc-compartment sweep:
  ctest --test-dir build/libc_release_computed_20260417 \
        --output-on-failure -j1
  => 13/13 passed

The shared allocator declarations no longer emit
ia2_extern_pkey annotations. Remove the empty macro and its
no-op uses.

This branch no longer emits ia2_extern_pkey annotations, so the
matching declaration-parsing path in SourceRewriter is unused.

oinoom force-pushed the slice/20260417-rtld-initial-tls-sharing-ia2 branch from d79a697 to 1228990 on April 20, 2026 17:42

The dav1d single-thread decode work uses ia2_ldso_heap.h only to interpret
x86_64 loader-heap markers emitted by the custom glibc loader.

AArch64 does not build or use that loader-heap marker path, but ia2.c was
including the header unconditionally. In the Arch/AArch64 CI failure this
showed up as a fatal include error when compiling runtime/libia2/ia2.c.

Guard the include with __x86_64__ so non-x86 builds do not see the x86-only
marker header. This keeps the dav1d-specific loader/TLS path out of ARM
builds without changing any x86 behavior.

oinoom changed the title from "Slice/20260417 rtld initial tls sharing ia2" to "narrow tls sharing for dav1d" on Apr 20, 2026