
[Metal] Add ray tracing pipeline, SBT, and DispatchRays bring-up#1281

Draft
MarijnS95 wants to merge 6 commits into
llvm:mainfrom
Traverse-Research:rt-pso-metal

MarijnS95 commented Jun 3, 2026

Collaborator

Depends on #1275

Summary

Last backend in the PSO RT bring-up stack. DXR-style ray tracing reaches Metal through metal_irconverter: each RT entry point is lowered from DXIL to a Metal IR function, raygen is emitted as a kernel (IRRayGenerationCompilationKernel) so it can be dispatched directly, and miss / closest-hit / any-hit / intersection / callable functions are emitted as visible functions and pulled into a MTLVisibleFunctionTable.

Fills in the three virtuals the foundation PR left stubbed on Metal:

  • MTLDevice::createPipelineRT compiles every Shaders[] entry against a single IRRayTracingPipelineConfiguration (max attribute size / recursion depth from the YAML RTConfig), builds one MTL::Library per entry, hands the raygen function to the compute pipeline as the kernel, and registers the rest as LinkedFunctions. The freshly built pipeline then creates a MTLVisibleFunctionTable and resolves each callable function's handle into a slot index that the SBT builder reuses. setMaxCallStackDepth(MaxTraceRecursionDepth) is also called so that nested TraceRay calls work (with the default depth of 1 the second trace is silently dropped).
  • MTLDevice::createShaderBindingTable lays the four SBT regions out via the shared computeSBTLayout helper sized for IRShaderIdentifier records, looks up each region entry's ShaderName in the pipeline's name → IRShaderIdentifier map, and memcpys the records into a shared-storage MTL::Buffer the runtime dereferences at dispatch.
  • MTLComputeEncoder::dispatchRays binds the raygen pipeline and runs dispatchThreads(Width, Height, Depth) on the encoder. The caller (createRayTracingCommands in MTLDevice) builds the per-dispatch IRDispatchRaysArgument struct (SBT region addresses + sizes, GRS / ResDescHeap GPU pointers, visible / intersection function table resourceIDs), parks it in a shared MTL::Buffer kept alive on the command buffer's KeepAlive list, and binds it at kIRRayDispatchArgumentsBindPoint so callees reached via TraceRay() inherit the same dispatch state through that pointer.

Plumbs the existing executeProgram RT branch on Metal the same way the VK / DX backends already do (validate Shaders / SBT / RTConfig, build RayTracingPipelineCreateDesc from the YAML pipeline, create PSO, build SBT, record commands), and adds the raytracing-pipeline lit feature on Metal so test/Feature/RT/raygen-roundtrip.test drops Metal from its XFAIL list and passes natively on Apple Silicon.

This bring-up only handles Triangle hit groups whose only member is a ClosestHit shader — any-hit / intersection / procedural / local root signatures land in follow-ups; createPipelineRT returns a clear unsupported error for those shapes instead of silently producing wrong output.
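
The shape check described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `HitGroupDesc`, its fields, and `checkHitGroupSupported` are hypothetical stand-ins for the YAML schema types, and the error strings are illustrative only.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the YAML HitGroup entry; the field names are
// assumptions made for this sketch.
enum class HitGroupType { Triangles, Procedural };

struct HitGroupDesc {
  std::string Name;
  HitGroupType Type = HitGroupType::Triangles;
  std::string ClosestHit;   // empty means "not present"
  std::string AnyHit;       // empty means "not present"
  std::string Intersection; // empty means "not present"
};

// Returns an error message for shapes the bring-up does not handle, or
// std::nullopt for a Triangles hit group whose only member is a ClosestHit.
std::optional<std::string> checkHitGroupSupported(const HitGroupDesc &HG) {
  const std::string Prefix = "hit group '" + HG.Name + "': ";
  if (HG.Type != HitGroupType::Triangles)
    return Prefix + "procedural hit groups are not supported yet";
  if (!HG.AnyHit.empty())
    return Prefix + "any-hit shaders are not supported yet";
  if (!HG.Intersection.empty())
    return Prefix + "intersection shaders are not supported yet";
  if (HG.ClosestHit.empty())
    return Prefix + "a ClosestHit shader is required";
  return std::nullopt;
}
```

Returning the message instead of asserting keeps the failure visible to the test runner as an ordinary error, which matches the "clear unsupported error" wording above.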

Test plan

Tested locally on an NVIDIA RTX 3060:

  • Linux Vulkan (native offloader)
  • Linux D3D12 (Wine + vkd3d-proton + cross-compiled offloader.exe)
  • Windows Vulkan (native offloader.exe)
  • Windows D3D12 (native offloader.exe)

CI (RT-capable runners):

  • windows-nvidia D3D12 (RaytracingTier 1.2)
  • windows-intel VK (VK_KHR_ray_tracing_pipeline)
  • macOS Metal (supportsRaytracing)

MarijnS95 and others added 6 commits June 11, 2026 13:46
Wire up acceleration-structure descriptor binding end-to-end across all
three backends so shaders can actually consume the TLAS that
buildPipelineAccelerationStructures() produced — completing the stack
and promoting the three InlineRT tests from XFAIL to passing.

Per-resource AS handling lands in a new per-backend createAS() (paired
with createSRV() / createUAV() / createCBV()): a pure single-create
that queries TLAS sizes via Dev.getTLASBuildSizes() and allocates the
handle via Dev.createTLAS(), returning the unique_ptr to the caller. No
InvocationState or Pipeline access — the multi-create
(createBuffers() / createResources()) records the handle in
InvocationState::TLASes (a StringMap keyed by TLASDesc::Name) and
wires a non-owning AS pointer into the per-resource bundle the binding
loop reads. The shared AS-build helper picks up that map and walks
P.AccelStructs.TLAS to pair each YAML descriptor with its pre-allocated
handle by name (TLASes without a map entry are skipped, i.e. declared
but unbound). BLAS handles are still allocated by the helper itself
since BLASes aren't user-bindable.
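
The name-based pairing described above can be sketched as follows. This is an illustration under assumptions: `std::unordered_map` stands in for the `StringMap`, the `TLASDesc` and `AccelerationStructure` structs are reduced to placeholders, and `pairTLASes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Placeholder types; the real descriptors and handles carry far more state.
struct TLASDesc {
  std::string Name;
};
struct AccelerationStructure {
  int Id;
};

// Stand-in for InvocationState::TLASes (name -> pre-allocated handle).
using TLASMap = std::unordered_map<std::string, AccelerationStructure *>;

// Walks the YAML descriptors and pairs each one with its pre-allocated
// handle by name. Descriptors without a map entry are skipped, which is
// the "declared but unbound" case.
std::vector<std::pair<const TLASDesc *, AccelerationStructure *>>
pairTLASes(const std::vector<TLASDesc> &Descs, const TLASMap &Allocated) {
  std::vector<std::pair<const TLASDesc *, AccelerationStructure *>> Pairs;
  for (const TLASDesc &Desc : Descs) {
    auto It = Allocated.find(Desc.Name);
    if (It == Allocated.end())
      continue; // declared in YAML but never bound to a resource
    Pairs.emplace_back(&Desc, It->second);
  }
  return Pairs;
}
```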

executeProgram() in each backend now runs as:

  createBuffers / createResources (createAS() allocates TLAS handles)
  open encoder → buildPipelineAccelerationStructures() → end

Vulkan: createDescriptorPool() counts AS descriptors in a separate
scalar (the KHR enum value 1000150000 doesn't fit in the indexed
array used for the core types) and emits one VkDescriptorPoolSize
for them. createDescriptorSets() reads the resolved
VulkanAccelerationStructure handle from ResourceRef.AS (populated by
createResources()) and writes it through a
VkWriteDescriptorSetAccelerationStructureKHR chained on the descriptor
write's pNext. The dispatch's pre-barrier dst access now includes
VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR so the prior AS-build's
writes are made visible to the shader's RayQuery reads. Device creation
also enables VK_KHR_ray_query when supported so the RayQuery shader
instructions actually function. copyResourceDataToDevice() short-
circuits AS bundles (no host buffer to barrier) via a new
ResourceBundle::isAccelerationStructure() predicate.
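
To illustrate why the AS descriptor count needs its own scalar, here is a
minimal sketch that avoids the Vulkan headers. The numeric values match the
Vulkan enum (core descriptor types are 0 through 10, and
VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR is 1000150000), but `PoolSize`
and `countPoolSizes` are hypothetical names, not the functions in this
commit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Core VkDescriptorType values are contiguous: SAMPLER (0) through
// INPUT_ATTACHMENT (10), so they can index a small array directly.
constexpr uint32_t kCoreTypeCount = 11;
// VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR; far too large to index.
constexpr uint32_t kAccelerationStructureKHR = 1000150000;

// Stand-in for VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Counts descriptors per type. Core types go into the indexed array; the
// AS type is tallied in a separate scalar and emitted as one extra entry.
std::vector<PoolSize> countPoolSizes(const std::vector<uint32_t> &Types) {
  std::array<uint32_t, kCoreTypeCount> Core{};
  uint32_t ASCount = 0;
  for (uint32_t Type : Types) {
    if (Type == kAccelerationStructureKHR) {
      ++ASCount;
    } else {
      assert(Type < kCoreTypeCount && "unexpected descriptor type");
      ++Core[Type];
    }
  }
  std::vector<PoolSize> Sizes;
  for (uint32_t Type = 0; Type < kCoreTypeCount; ++Type)
    if (Core[Type] != 0)
      Sizes.push_back({Type, Core[Type]});
  if (ASCount != 0)
    Sizes.push_back({kAccelerationStructureKHR, ASCount});
  return Sizes;
}
```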

DX12: writes a D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE
SRV with the AS GPU virtual address as Location into the heap slot
that createBuffers() reserved (CreateShaderResourceView() with a null
resource — the AS data lives in the buffer pointed to by Location).

Metal: the Metal shader converter doesn't bind the AS directly; the
shader reads a buffer containing an IRRaytracingAccelerationStructure-
GPUHeader that holds the AS's gpuResourceID plus a pointer to an
instance-contributions array. createBuffers() allocates and fills both
buffers per AS-descriptor entry, then points the descriptor at the
header buffer's GPU address. The TLAS itself is built with the UserID
instance-descriptor variant so HLSL CommittedInstanceID() returns the
YAML-specified per-instance ID instead of the array index.

The three InlineRT tests now actually exercise the AS end-to-end:
TraceRayInline() issues a RayQuery against `Scene` and writes a
hit-dependent value into `Output` (the instance ID for multi-instance,
1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang`
remains. The test shaders gain explicit `[[vk::binding]]` annotations
since their `t0`/`u0` registers would otherwise collide under the
default dxc HLSL→SPIR-V mapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundational bring-up for PSO-based raytracing tracked in
llvm#1268. Lays out the
framework-side surface (stage enums, pipeline kind, YAML schema, lit
infrastructure) so subsequent per-backend bring-up PRs (VK → DX12 →
Metal) only have to fill in pipeline-state-object creation, SBT
construction, and DispatchRays. No backend can run an RT pipeline yet —
each one's executeProgram gains a terminal `else if (P.isRayTracing())`
that returns a "not yet supported" error.

Pipeline.h gets six new Stages (RayGeneration, Miss, ClosestHit, AnyHit,
Intersection, Callable), `ShaderPipelineKind::RayTracing`, an
`isRayTracingStage` predicate, and `Pipeline::isRayTracing()`. The
declarative YAML schema for an RT pipeline lives alongside the existing
AccelerationStructureDescs: a `HitGroup` (Triangles | Procedural, with
ClosestHit + optional AnyHit / Intersection entries), a
`RayTracingPipelineConfig` block (MaxTraceRecursionDepth,
MaxPayloadSizeInBytes, MaxAttributeSizeInBytes, optional PipelineFlags),
and a `ShaderBindingTable` block with raygen / miss / hit-group /
callable record arrays. SBTEntry carries an optional `LocalRootData`
byte array reserved for the upcoming local-root-signature work.

validatePipelineKind grows an RT branch: it allows multiple shaders of
the same RT stage (a pipeline can have several misses or hit groups —
the existing duplicate check would have rejected them), requires at
least one RayGeneration, and rejects mixing RT with Compute/Vertex/Mesh.
The reverse check rejects HitGroups / RTConfig / SBT on any non-RT
pipeline. validateDispatchParameters reinterprets DispatchGroupCount as
{Width, Height, Depth} for the eventual DispatchRays and forbids
VertexCount on RT.
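
The RT branch of the stage validation can be sketched like this. The enum
below is a simplified stand-in for the Stages enum, and
`validateRayTracingStages` is a hypothetical name; the real
validatePipelineKind also covers the non-RT rules and the reverse
HitGroups / RTConfig / SBT check, which are omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the framework's Stages enum.
enum class Stage {
  Compute, Vertex, Pixel, Mesh,
  RayGeneration, Miss, ClosestHit, AnyHit, Intersection, Callable
};

bool isRayTracingStage(Stage S) {
  switch (S) {
  case Stage::RayGeneration:
  case Stage::Miss:
  case Stage::ClosestHit:
  case Stage::AnyHit:
  case Stage::Intersection:
  case Stage::Callable:
    return true;
  default:
    return false;
  }
}

// Returns an empty string when the stage list is acceptable for the RT
// branch, otherwise a description of the problem. Duplicate RT stages are
// deliberately allowed (several misses or hit shaders are normal).
std::string validateRayTracingStages(const std::vector<Stage> &Stages) {
  bool HasRT = false, HasNonRT = false, HasRayGen = false;
  for (Stage S : Stages) {
    if (isRayTracingStage(S))
      HasRT = true;
    else
      HasNonRT = true;
    if (S == Stage::RayGeneration)
      HasRayGen = true;
  }
  if (!HasRT)
    return ""; // not an RT pipeline; the existing rules apply instead
  if (HasNonRT)
    return "ray tracing stages cannot be mixed with other stages";
  if (!HasRayGen)
    return "a ray tracing pipeline needs at least one RayGeneration shader";
  return "";
}
```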

Existing Stages switches grow the six new cases:
  * VK: getShaderStageFlag maps each RT stage to its
    VK_SHADER_STAGE_*_KHR bit so PR 2 can build
    VkPipelineShaderStageCreateInfos for the RT pipeline.
  * Metal: getShaderStage treats the RT stages as unreachable (the
    metal-irconverter RT path takes a different route from the
    IRShaderStage one).
  * TraditionalRasterPipelineCreateDesc::setShader adds the RT stages to
    its existing "not a raster stage" unreachable group.

test/lit.cfg.py adds a `%dxc_target_lib` substitution (same compiler,
distinct name to signal `-T lib_6_x` library targets at a glance) and a
`raytracing-pipeline` available-feature. On DX it tracks
RaytracingTier >= 1.0; on Vulkan it aliases off the
VK_KHR_ray_tracing_pipeline extension already reported by the device.
The extension isn't enabled on the VkDevice yet — that lands in PR 2 —
but the lit-level capability detection is independent of what the
backend currently consumes, so a developer on a VK box can already see
the foundational test routed through the RT path.

The foundational test `Feature/RT/raygen-roundtrip.test` exercises the
full RT YAML schema in one shape: raygen + miss + closest-hit shaders,
a BLAS/TLAS pair, a HitGroups list, RayTracingPipelineConfig, and a
ShaderBindingTable. `# REQUIRES: raytracing-pipeline` and `# XFAIL: *`
keep it expectedly failing until the per-backend PRs drop entries from
the XFAIL list as each one starts dispatching real rays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First per-backend bring-up in the PSO raytracing series (llvm#1268). Adds
the API surface (ComputeEncoder::dispatchRays, Device::createPipelineRT,
Device::createShaderBindingTable, RayTracingPipelineCreateDesc) plus the
Vulkan implementation behind it. D3D12 and Metal stub the new methods
with not-yet-supported errors; their bring-up lands in follow-up PRs.

The pre-existing YAML schema struct from PR llvm#1270 is renamed
ShaderBindingTable -> ShaderBindingTableDesc so the bare name is free
for the runtime resource class (parallel to BLASDesc / TLASDesc vs
AccelerationStructure). A new include/API/ShaderBindingTable.h holds
the abstract runtime base; concrete backend SBT classes derive from it
with LLVM-style classof / cast<>.

The VulkanDevice's prior `RaytracingFunctions RT` lumped AS and RT
pipeline entry points together. They split into two structs —
`ASFunctions AS` and `RTPipelineFunctions RT` — matching the actual
feature-gate split (AS+ray-query is a complete configuration on its
own, RT pipeline is layered on top). `HasRayTracingSupport` renames
to `HasASSupport`, and a separate `HasRTPipelineSupport` tracks the
new VK_KHR_ray_tracing_pipeline extension.

Vulkan bring-up:
  - Extension: VK_KHR_ray_tracing_pipeline is requested when reported,
    with VkPhysicalDeviceRayTracingPipelineFeaturesKHR chained into the
    pre-create feature query. After the query the gating
    rayTracingPipeline bool is checked; capture-replay / trace-rays-
    indirect / traversal-primitive-culling sub-features are cleared
    since the tests don't exercise them.
  - Function pointers: vkCreateRayTracingPipelinesKHR,
    vkGetRayTracingShaderGroupHandlesKHR, vkCmdTraceRaysKHR.
  - Properties: VkPhysicalDeviceRayTracingPipelinePropertiesKHR is
    cached at device-create time for SBT handle size / alignment /
    base-alignment.
  - VKRayTracingPipelineState derives from VulkanPipelineState; an
    IsRayTracing flag on the base lets the existing Vulkan cast<>
    path stay polymorphic without adding a new GPUAPI value.
    classof tests both the API and the flag. The derived class also
    carries a StringMap<uint32_t> resolving each shader EntryPoint or
    HitGroup Name to its index in the pipeline's group array, plus
    per-bucket counts so the SBT builder can slice the contiguous
    handle blob into raygen / miss / hit / callable regions.
  - createPipelineRT builds a single VkShaderModule (the DXIL library
    compiles to one SPIR-V module with multiple OpEntryPoints), then
    one VkPipelineShaderStageCreateInfo per Shader entry and one
    VkRayTracingShaderGroupCreateInfoKHR per general shader / hit
    group. Pipeline layout is shared with the compute path via
    createPipelineLayout, gated on all six RT stage flags so any
    binding can be consumed from any RT shader.
  - createShaderBindingTable allocates a host-visible coherent buffer
    big enough for four regions and lays out each entry as
    [handle bytes][localRootData bytes][padding-to-stride]. Per-region
    stride = align(handleSize + max-local-root-data-in-region,
    handleAlignment); per-region size = align(count * stride,
    baseAlignment). LocalRootData support comes free from the PR1 SBT
    schema; the test doesn't exercise it yet. Each region's
    VkStridedDeviceAddressRegionKHR derives from the buffer's
    vkGetBufferDeviceAddress.
  - dispatchRays binds the pipeline at
    VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, emits a pre-barrier with
    AS_READ + SHADER_READ/WRITE dst access into
    RAY_TRACING_SHADER_BIT_KHR, then calls vkCmdTraceRaysKHR with the
    SBT's four region structs.
  - createCommands picks the new bind point for RT pipelines so
    vkCmdBindDescriptorSets binds to the right point. executeProgram's
    isRayTracing branch builds a RayTracingPipelineCreateDesc from the
    YAML, calls createPipelineRT then createShaderBindingTable, and
    keeps both on InvocationState for the dispatch.
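
The stride and size formulas in the createShaderBindingTable bullet can be
worked through with a small sketch. `RegionInput`, `Region`, and
`layoutRegions` are hypothetical names, and the handle size and alignments
are passed in as plain numbers rather than read from
VkPhysicalDeviceRayTracingPipelinePropertiesKHR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds Value up to a multiple of Alignment (a power of two).
inline uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

struct RegionInput {
  uint32_t Count;                // number of records in the region
  uint32_t MaxLocalRootDataSize; // largest LocalRootData in the region
};

struct Region {
  uint64_t Offset; // byte offset inside the SBT buffer
  uint64_t Stride; // bytes per record
  uint64_t Size;   // bytes reserved for the whole region
};

// Lays out raygen / miss / hit / callable back to back:
//   stride = align(handleSize + maxLocalRootData, handleAlignment)
//   size   = align(count * stride, baseAlignment)
std::array<Region, 4> layoutRegions(const std::array<RegionInput, 4> &In,
                                    uint64_t HandleSize,
                                    uint64_t HandleAlignment,
                                    uint64_t BaseAlignment) {
  std::array<Region, 4> Out{};
  uint64_t Offset = 0;
  for (std::size_t I = 0; I < In.size(); ++I) {
    uint64_t Stride =
        alignUp(HandleSize + In[I].MaxLocalRootDataSize, HandleAlignment);
    uint64_t Size = alignUp(In[I].Count * Stride, BaseAlignment);
    Out[I] = Region{Offset, Stride, Size};
    Offset += Size; // stays a multiple of BaseAlignment
  }
  return Out;
}
```

With a 32-byte handle, 32-byte handle alignment, and 64-byte base alignment,
a hit region holding one record with 8 bytes of local root data gets a
64-byte stride. Note that this only models the buffer layout; Vulkan
additionally requires the raygen region passed to vkCmdTraceRaysKHR to have
size equal to stride, which the sketch does not cover.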

raygen-roundtrip.test now expects DirectX/Metal/Clang to XFAIL; on a
DXC + Vulkan combo with VK_KHR_ray_tracing_pipeline supported the test
should PASS via this implementation. On a local Linux + clang-dxc
setup the test still XFAILs because clang-dxc doesn't yet lower
[shader("raygeneration")] entry points to SPIR-V, so the Clang XFAIL
token catches the compile failure. CI on a working DXC install will
exercise the runtime path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second per-backend bring-up in the PSO raytracing series (llvm#1268).
Mirrors PR llvm#1273 for D3D12: builds an ID3D12StateObject from the YAML
schema, hands out shader identifiers via ID3D12StateObjectProperties,
lays out the SBT in an upload heap, and routes DispatchRays through
ID3D12GraphicsCommandList4 (same query path the AS build already uses).

DXRayTracingPipelineState derives from DXPipelineState with an
IsRayTracing flag on the base for classof — matching the
VulkanPipelineState pattern. It carries the ID3D12StateObject + a
cached ID3D12StateObjectProperties + a StringMap<const void *> that
resolves each shader EntryPoint or hit-group Name to its 32-byte shader
identifier blob. The identifiers are driver-owned and stay alive for
the Properties COM lifetime, so the PSO keeps Properties alive.
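
The flag-based classof pattern can be sketched without the LLVM headers.
Everything below is a simplified stand-in: the `PipelineState` base, the
`GPUAPI` enum values, and the `dynCast` helper are assumptions that only
imitate the shape of LLVM-style RTTI, not the project's actual classes.

```cpp
#include <cassert>

enum class GPUAPI { DirectX, Vulkan, Metal };

// Simplified stand-in for the API-neutral pipeline state base.
class PipelineState {
  GPUAPI API;

public:
  explicit PipelineState(GPUAPI A) : API(A) {}
  virtual ~PipelineState() = default;
  GPUAPI getAPI() const { return API; }
};

class DXPipelineState : public PipelineState {
  bool IsRayTracing;

protected:
  explicit DXPipelineState(bool IsRT)
      : PipelineState(GPUAPI::DirectX), IsRayTracing(IsRT) {}

public:
  DXPipelineState() : DXPipelineState(false) {}
  bool isRayTracing() const { return IsRayTracing; }
  static bool classof(const PipelineState *P) {
    return P->getAPI() == GPUAPI::DirectX;
  }
};

class DXRayTracingPipelineState : public DXPipelineState {
public:
  DXRayTracingPipelineState() : DXPipelineState(true) {}
  // Tests both the API and the flag, so no new GPUAPI value is needed.
  static bool classof(const PipelineState *P) {
    return DXPipelineState::classof(P) &&
           static_cast<const DXPipelineState *>(P)->isRayTracing();
  }
};

// Minimal imitation of llvm::dyn_cast for this sketch.
template <typename To, typename From> const To *dynCast(const From *P) {
  return To::classof(P) ? static_cast<const To *>(P) : nullptr;
}
```

The design trade-off is that the base class pays for one extra bool, while
existing casts to DXPipelineState keep working for RT pipelines unchanged.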

DXShaderBindingTable holds a single upload-heap buffer plus the four
pre-built address ranges that fill D3D12_DISPATCH_RAYS_DESC (raygen,
miss, hit-group, callable): D3D12_GPU_VIRTUAL_ADDRESS_RANGE for raygen
since it's always one record, and
D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE for the others.

createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects
for the DXIL library (one export per Shader entry), per-hit-group
subobjects with closest-hit / any-hit / intersection imports, the
pipeline shader config (max payload + max attribute bytes), pipeline
config (max recursion depth), and a global root signature subobject.
The root signature comes from the library's embedded RTS0 part when
present, falling back to the BindingsDesc path (matching the existing
compute / raster pipeline behaviour). Wide strings for the subobject
exports live in a SmallVector that outlives the SODesc, since the
helper classes store pointers into the strings rather than copying.

createShaderBindingTable lays out each entry as
[identifier][LocalRootData][padding-to-stride] with per-region
stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRoot-
Data-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and
per-region size = align(count * stride,
D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an
upload heap with D3D12_RESOURCE_STATE_GENERIC_READ — PR3 simplification;
a staging copy into a default heap is a follow-up.
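
Writing one record in the [identifier][LocalRootData][padding-to-stride]
shape could look like the sketch below. It works on a plain byte vector
instead of a mapped upload-heap buffer; `writeShaderRecord` is a
hypothetical helper, and the constant mirrors
D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES (32) without including the D3D12
headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same value as D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES.
constexpr std::size_t kShaderIdentifierSize = 32;

// Writes record number Index of a region that starts at RegionOffset:
//   [identifier bytes][LocalRootData bytes][zero padding up to Stride]
void writeShaderRecord(std::vector<uint8_t> &Table, std::size_t RegionOffset,
                       std::size_t Stride, std::size_t Index,
                       const void *Identifier,
                       const std::vector<uint8_t> &LocalRootData) {
  assert(kShaderIdentifierSize + LocalRootData.size() <= Stride);
  const std::size_t Base = RegionOffset + Index * Stride;
  assert(Base + Stride <= Table.size());
  uint8_t *Dst = Table.data() + Base;
  std::memset(Dst, 0, Stride); // padding is zeroed up front
  std::memcpy(Dst, Identifier, kShaderIdentifierSize);
  if (!LocalRootData.empty())
    std::memcpy(Dst + kShaderIdentifierSize, LocalRootData.data(),
                LocalRootData.size());
}
```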

dispatchRays queries the underlying CommandListX for
ID3D12GraphicsCommandList4 (matching the AS-build path), binds the
global root signature via SetComputeRootSignature, calls
SetPipelineState1 with the state object, and issues DispatchRays with
a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus
the dispatch dimensions. The descriptor heap + descriptor-table bindings
are set up by the existing createComputeCommands helper before the
encoder is created.

createComputeCommands grows an isRayTracing branch at the dispatch
point so it calls dispatchRays instead of dispatch, reusing all of the
descriptor-heap and root-signature wiring. InvocationState carries a
ShaderBindingTable unique_ptr that's only populated for RT pipelines.

executeProgram's isRayTracing branch builds a RayTracingPipelineCreate-
Desc from Pipeline.Shaders / HitGroups / RTConfig, calls
createPipelineRT then createShaderBindingTable, then re-enters
createComputeCommands which dispatches via the new RT path.

raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should
PASS via this implementation on Windows CI (and via Wine + vkd3d-proton
locally on Linux). The Clang token still catches the compile failure
on clang-dxc since [shader("raygeneration")] doesn't yet lower to
either DXIL libraries or SPIR-V on that path.

Locally verified by cross-compiling lib/API/DX/Device.cpp via
`clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK
headers and the project's bundled DirectX-Headers. Runtime verification
is left to Windows CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RayTracing pipelines compile every entry point — raygen, miss,
closest-hit, any-hit, intersection, callable — into a single DXIL
library via `dxc -T lib_6_x` / `clang-dxc -T lib_6_x`. That's the
shape every real DXR app ships: D3D12's CreateStateObject requires a
DXIL-library subobject anyway, and the driver fuses entry points
across the whole library at link time, so writing one .hlsl file and
compiling it once is both idiomatic and the path the framework's
`%dxc_target_lib` substitution emits.

Compute and raster pipelines stay one-to-one (the existing position-
based mapping handles VS+PS, AS+MS+PS, etc.). RT pipelines today need
N positional args even though one library blob holds every entry —
which the foundational `raygen-roundtrip.test` runs straight into:
3 Shaders[] entries vs 1 input file fails the count check before any
GPU work happens.

Detect the RT-pipeline-with-one-input shape and copy the library blob
into every `Shaders[].Shader` slot via `MemoryBuffer::getMemBufferCopy`.
Each entry owns its own buffer copy (DXIL libraries are KBs, no real
memory pressure) keeping the existing `unique_ptr<MemoryBuffer>`
ownership model intact. Non-RT pipelines still go through the
positional path and still enforce the count check.
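
A rough sketch of the fan-out, using a `std::vector<char>` as a stand-in for
`llvm::MemoryBuffer`. `Blob`, `ShaderEntry`, and `assignShaderInputs` are
hypothetical names chosen for illustration; only the overall behavior
(copy one library into every slot for RT, positional mapping plus count
check otherwise) comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<char>; // stand-in for llvm::MemoryBuffer

struct ShaderEntry {
  std::string EntryPoint;
  std::unique_ptr<Blob> Shader; // each entry owns its buffer
};

// Returns false when the positional count check fails.
bool assignShaderInputs(std::vector<ShaderEntry> &Shaders,
                        std::vector<std::unique_ptr<Blob>> Inputs,
                        bool IsRayTracing) {
  // RT pipeline with a single library input: every entry gets its own copy
  // of the library blob, so ownership stays one buffer per entry.
  if (IsRayTracing && Inputs.size() == 1) {
    for (ShaderEntry &Entry : Shaders)
      Entry.Shader = std::make_unique<Blob>(*Inputs.front());
    return true;
  }
  // Everything else keeps the one-to-one positional mapping.
  if (Inputs.size() != Shaders.size())
    return false;
  for (std::size_t I = 0; I < Shaders.size(); ++I)
    Shaders[I].Shader = std::move(Inputs[I]);
  return true;
}
```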

Verified by re-running `raygen-roundtrip.test`'s pipeline.yaml + the
DXIL library via Wine + vkd3d-proton with a single .o argument — same
0xBEEF result the prior three-arg invocation produced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DXR-style ray tracing reaches Metal through metal_irconverter: each RT
entry point is lowered from DXIL to a Metal IR function, raygen is
emitted as a kernel (IRRayGenerationCompilationKernel) so it can be
dispatched directly, and miss / closest-hit / any-hit / intersection /
callable functions are emitted as visible functions and pulled into a
MTLVisibleFunctionTable.

Implements the three virtuals the foundation PR left stubbed on Metal:

  • MTLDevice::createPipelineRT compiles every Shaders[] entry against a
    single IRRayTracingPipelineConfiguration (max attribute/recursion
    from the YAML RTConfig), builds one MTL::Library per entry, hands
    the raygen function to the compute pipeline as the kernel, and
    registers the rest as LinkedFunctions. The freshly-built pipeline
    then mints a MTLVisibleFunctionTable and resolves each callable
    function's handle into a slot index that the SBT builder reuses.

  • MTLDevice::createShaderBindingTable lays the four SBT regions out
    via the shared computeSBTLayout helper sized for IRShaderIdentifier
    records, looks up each region entry's ShaderName in the pipeline's
    name → IRShaderIdentifier map, and memcpys the records into a
    shared-storage MTL::Buffer the runtime will dereference at dispatch.

  • MTLComputeEncoder::dispatchRays binds the raygen pipeline and runs
    dispatchThreads(Width, Height, Depth) on the encoder. The caller
    (createRayTracingCommands) is responsible for binding the global
    descriptor heap, top-level argument buffer, IRDispatchRaysArgument
    (slot 3), and marking the SBT buffer + function tables resident.

The IRDispatchRaysArgument struct is built per-dispatch in
createRayTracingCommands: SBT region addresses + sizes (read off the
MTLShaderBindingTable), GRS / ResDescHeap GPU pointers, and the
visible / intersection function table resourceIDs. It's parked in a
shared MTL::Buffer kept alive on the command buffer's KeepAlive list
and bound at kIRRayDispatchArgumentsBindPoint so callees reached via
TraceRay() inherit the same dispatch state through that pointer.

Plumbs the existing executeProgram RT branch on Metal the same way the
VK / DX backends already do (validate Shaders / SBT / RTConfig, build
RayTracingPipelineCreateDesc from the YAML pipeline, create PSO, build
SBT, record commands), and adds the raytracing-pipeline lit feature
on Metal so test/Feature/RT/raygen-roundtrip.test drops Metal from its
XFAIL list and passes natively on Apple Silicon (the 0xBEEF payload
roundtrip matches the DX / VK references, verified locally on
macOS 15 / metal-irconverter 3.1.1).

This PR1 bring-up only handles Triangle hit groups whose only member
is a ClosestHit shader — any-hit / intersection / procedural / local
root signatures land in follow-ups; createPipelineRT now returns a
clear unsupported error for those shapes instead of silently producing
wrong output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants