Skip to content

Lower device globals to kernel arguments (no program-scope globals)#1289

Draft
pvelesko wants to merge 7 commits into
mainfrom
rusticl-device-globals-as-args
Draft

Lower device globals to kernel arguments (no program-scope globals)#1289
pvelesko wants to merge 7 commits into
mainfrom
rusticl-device-globals-as-args

Conversation

@pvelesko

@pvelesko pvelesko commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

@pvelesko pvelesko force-pushed the rusticl-device-globals-as-args branch from 1be6e29 to 2cfa4be Compare June 9, 2026 18:31
@pvelesko

Copy link
Copy Markdown
Collaborator Author

/run-aurora-ci

pvelesko added 7 commits June 13, 2026 18:37
…lobals OFF)

Drivers like rusticl/radeonsi cannot consume program-scope CrossWorkgroup
globals, so chipStar's default `__chip_var_<name>` address-holder lowering
fails there. When CHIP_ENABLE_DEVICE_PROGRAM_SCOPE_GLOBALS is OFF, a new
post-transform in HipGlobalVariables replaces each such global with an implicit
trailing kernel pointer argument carrying the global's device address:

  * kernels that load `__chip_var_<G>` get a trailing pointer param per global;
    the load becomes ptrtoint(param).
  * a `__chip_gvararg_<kernel>` annotation (NUL-separated global names) tells the
    runtime which global feeds each trailing arg; spirv.cc parses it, marks the
    args SPVTypeKind::DeviceGlobal, and strips the annotation before the driver.
  * the `__chip_var_init_<G>` kernel takes the storage address as an arg instead
    of reading the global; the `__chip_var_bind_<G>` kernel is removed.
  * setupAllArgs binds each DeviceGlobal arg to the global's DeviceVar address
    via clSetKernelArgSVMPointer.

All gated; the default (flag ON) path is unchanged. Passes on rusticl for
TestConstantMemory, TestGlobalVarInit, and a minimal __device__ read.
…rg list

copyKernelArgs / visitClientArgs must not surface the implicit device-global
address arguments (they are runtime-provided, not HIP-client args), otherwise
the graph kernel-arg capture path asserts on the unexpected arg kind.
…-node prepare

- Only convert globals whose every use is a load in a user kernel or their own
  init kernel (or the own-bind store). Globals that are address-taken, accessed
  in non-kernel functions, or cross-referenced from another global's init
  (e.g. mutually-referencing pointers Foo=&Bar; Bar=&Foo) are left as
  program-scope globals — a safe fallback on drivers that support them.
- Add the DeviceGlobal argument case to the Level Zero backend's setupAllArgs.
- Allocate a kernel's module device variables in the graph kernel-node
  constructor before setupAllArgs binds the implicit device-global args (graph
  construction happens before any launch, which is where this normally happens).

With these, the flag-OFF (globals-as-kernel-args) build passes the Level Zero
unit suite except pre-existing flaky/unrelated tests.
Add a --no-psg flag to unit_tests.sh that configures the build with
-DCHIP_ENABLE_DEVICE_PROGRAM_SCOPE_GLOBALS=OFF, then run the latest-LLVM
native unit-test job twice: once with program-scope globals ON (default)
and once OFF. The OFF build lowers user __device__/__constant__ globals
to kernel arguments (issue #1279), so a single native run exercises both
code paths on a PSG-capable backend (Level Zero / Arc).
…w fixes on rusticl

The chipstar-rusticl section masked 22 device-symbol/global/constant tests
as 'rusticl cannot consume program-scope CrossWorkgroup globals'. With the
kernel-argument lowering (issue #1279, this PR), 18 of them now pass on the
W6400 (verified against the rusticl/radeonsi W6400 with a PSG-OFF build and
the CI's exact ICD/env). Removing those 18 makes the rusticl CI actually run
and validate the fix instead of skipping it.

The 4 that still fail are kept with corrected, accurate justifications:
  - hipTestSymbolInit / hipTestResetStaticVar: cross-referenced and
    function-local-static globals are not convertible to kernel arguments
    and fall back to program-scope globals (rusticl/Mesa panics on them).
  - TestAtomics: clLinkProgram -17 (ACO int64/system-scope atomics gap),
    not a device-global issue.
  - TestLargeGlobalVar: times out (>180s) on the W6400.
…rrency flaky

After unmasking the device-global tests, the rusticl CI surfaced that the
three Unit_hipGraphAddMemcpyNodeFromSymbol_Global* tests intermittently fail
under the full -j4 suite on the W6400: the graph memcpy-to-symbol ->
memcpy-from-symbol round-trip reads stale data under concurrent load.

This is not a kernel-argument-lowering defect: the same PSG-OFF build passes
these tests reliably in isolation on rusticl (60+ runs), and consistently on
the Intel iGPU (OpenCL) and Level Zero. It is a rusticl/radeonsi driver
ordering gap that only manifests under full-suite concurrency, so they are
re-masked on rusticl only, with an accurate reason, and tracked as a #1279
follow-up. The other 15 previously-masked device-global tests remain unmasked
and now actually gate the fix on rusticl.
…not flaky)

Root-caused the Unit_hipHostMalloc_CoherentAccess failure in the native
PSG-OFF sweep: it is deterministic, not flaky. Its kernel busy-waits on
clock64():

    uint64_t start = clock64()/clkRate, cur;
    do { cur = clock64()/clkRate - start; } while (cur < wait_sec);

With program-scope device globals OFF, clock64() returns 0 (the
__chip_clk_counter program-scope global is omitted -- the same reason the
clock sample is gated), so the loop never terminates: the kernel hangs and
the GPU watchdog aborts the queue with CL_OUT_OF_RESOURCES. That is exactly
why it passed the PSG-ON sweep and aborted the PSG-OFF sweep of the same run.

Fix: unit_tests.sh excludes this test only in the --no-psg run (it uses a
feature intentionally unavailable there); it still runs with PSG ON. Also
reverts the incorrect known_failures 'flaky' mask added on meatloaf.
@pvelesko pvelesko force-pushed the rusticl-device-globals-as-args branch from 46897e2 to 6f212ac Compare June 13, 2026 15:45
@pvelesko

Copy link
Copy Markdown
Collaborator Author

/run-aurora-ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant