Skip to content

[Perf] | fix LDS bank conflicts and reduce LDS traffic in func_broad_phase_lds#47

Draft
rtmadduri wants to merge 1 commit intoamd-integrationfrom
perf/rtm/fix-lds-bank-conflicts
Draft

[Perf] | fix LDS bank conflicts and reduce LDS traffic in func_broad_phase_lds#47
rtmadduri wants to merge 1 commit intoamd-integrationfrom
perf/rtm/fix-lds-bank-conflicts

Conversation

@rtmadduri
Copy link
Copy Markdown
Collaborator

@rtmadduri rtmadduri commented Apr 28, 2026

Summary

This PR optimizes the LDS-resident Sweep-and-Prune broadphase kernel (func_broad_phase_lds) in genesis/engine/solvers/rigid/collider/broadphase.py. The kernel runs with ENVS_PER_BLOCK = 16 envs per workgroup, each owning a row of several LDS arrays of width MAX_SORT_ELEM_NUM = 2 * MAX_GEOMS_IN_LDS = 92. The kernel was bottlenecked on LDS bandwidth in the inner insertion-sort and sweep loops, both because the row stride collided in the LDS bank pattern and because each iteration touched more LDS arrays than necessary.

What changed

    1. LDS row-stride padding to eliminate bank conflicts. The LDS arrays were declared as (ENVS_PER_BLOCK, MAX_SORT_ELEM_NUM) with stride 92. With 32 LDS banks, gcd(92, 32) = 4, so column-aligned accesses across the 16 envs of a workgroup land on only 8 distinct banks → systematic 2-way bank conflicts on every LDS load/store in the hot loops. Padding the row to MAX_SORT_ELEM_NUM + 1 = 93 (coprime with 32) maps the 16 envs onto 16 distinct banks and removes the conflicts.
  1. Pack (i_g, is_max) into a single int (lds_sort_i_g_packed). The previous code kept three parallel LDS arrays per env: lds_sort_value (float), lds_sort_i_g (int), lds_sort_is_max (bool). Geom IDs fit easily in 31 bits, so we encode packed = (i_g << 1) | is_max and drop lds_sort_is_max entirely. The inner insertion-sort while loop now does 2 LDS shifts per iteration instead of 3, and the sweep loop does 1 packed LDS load instead of 2 separate loads. Decoding is a shift / mask, which is essentially free compared to an LDS access.
  2. O(1) removal from the active set (lds_pos_in_active). On a max event, the non-hibernation sweep previously did a linear scan to find the geom in lds_active and then an O(n_active) shift to remove it. We now maintain a reverse map lds_pos_in_active[i_b_lds, i_g] and remove via swap-with-last in O(1). Membership is all the sweep cares about, so order in lds_active is irrelevant. This turns the worst-case sweep from O(n²) toward O(n) shifts.
  3. Remove dead HBM ↔ LDS copies.
  • The pre-loop for i in range(env_n_geoms): lds_active[i_b_lds, i] = collider_state.active_buffer[i, i_b] was dead — the non-hibernation sweep starts with n_active = 0 and rebuilds the active set from scratch.
  • The matching collider_state.active_buffer[i, i_b] = lds_active[i_b_lds, i] writeback at the end was also dead for the same reason.
  • Both are removed, saving env_n_geoms HBM reads and writes per env per call.
  • The remaining writeback loop is tightened to a single for i in range(2 * env_n_geoms) over the packed array (previously a manually unrolled 2i / 2i+1 pair plus a third lds_active write per i).
  1. Fold the is_max branch in the warm-start initialization so that the value load (aabb_min vs aabb_max) and the packed-id store happen on the same code path, instead of writing is_max outside the branch and reading the value inside it. Same outputs, fewer LDS stores.

@rtmadduri rtmadduri self-assigned this Apr 28, 2026
Copy link
Copy Markdown
Collaborator

@yaoliu13 yaoliu13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not based on the latest amd-integration: perf/rtm/fix-lds-bank-conflicts...ROCm:Genesis:amd-integration

Please read Confluence "Genesis PR Review Process"

@rtmadduri rtmadduri changed the title [DO NOT MERGE] | [PERF] | Fixes bank conflicts [Perf] | fix LDS bank conflicts and reduce LDS traffic in func_broad_phase_lds Apr 29, 2026
@rtmadduri rtmadduri closed this Apr 29, 2026
@rtmadduri rtmadduri reopened this Apr 29, 2026
@rtmadduri
Copy link
Copy Markdown
Collaborator Author

@yaoliu13 there are several merge conflicts. I am marking this as draft. I will reopen after fixing those conflicts

@rtmadduri rtmadduri marked this pull request as draft April 29, 2026 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants