Thanks, @micmelesse for the fix! I actually prefer this Triton-based barrier to a working host-side one. Seems like CI tests are failing though. Could you please take a look?
I will get the PR green and ping you. Thank you.
b9f8264 to 2080f13
Pull request overview
This PR adds a GPU-side, CUDA-graph-capturable device_barrier() to Iris that avoids RCCL/NCCL host barriers by synchronizing ranks via a Triton kernel using atomics on the symmetric heap.
Changes:
- Added `Iris.device_barrier()` with per-process-group flag storage and a new `distributed_device_barrier()` implementation.
- Introduced a Triton-based device barrier kernel and centralized `extract_group_info` for group rank/stride handling.
- Added unit tests covering basic usage, cross-rank visibility (eager + graph), state reuse, and timeout behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `iris/iris.py` | Adds `device_barrier()` API and caches per-group flags tensor state. |
| `iris/_distributed_helpers.py` | Implements `extract_group_info()`, Triton device barrier kernel, and `distributed_device_barrier()`. |
| `iris/ccl/utils.py` | Refactors group info extraction to delegate to the centralized helper. |
| `tests/unittests/test_barriers.py` | Adds unit tests for host/device barriers including CUDA graph capture scenarios. |
Motivation
This PR adds `device_barrier()` to Iris. `device_barrier` is a GPU-side barrier using atomic operations on the symmetric heap. This was needed to avoid crashes during graph capture in vLLM workloads. See ROCm/hip#3876. The current `barrier()` uses `dist.barrier()`, which goes through RCCL. The RCCL watchdog thread polls existing work items and fails with `hipErrorStreamCaptureUnsupported` when another stream is in capture mode. `device_barrier()` avoids RCCL entirely by synchronizing on the GPU via an Iris kernel.
Technical Details
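For readers unfamiliar with flag-based barriers, here is a rough CPU-only analogy of the signal-then-wait pattern described above. It is not the Iris kernel: `FlagBarrier`, `run_demo`, the epoch counter, and the `timeout_s` parameter are all hypothetical names chosen for illustration, Python threads stand in for ranks, and a lock stands in for GPU atomics on the symmetric heap.

```python
import threading
import time


class FlagBarrier:
    """Hypothetical CPU analogue of a flag-based barrier (not the Iris API)."""

    def __init__(self, world_size):
        self.world_size = world_size
        # flags[dst][src] holds the latest epoch that rank `src` signalled to rank `dst`.
        self.flags = [[0] * world_size for _ in range(world_size)]
        # Per-rank barrier counter, so the same flags can be reused across calls.
        self.epochs = [0] * world_size
        # The lock stands in for atomic operations on shared memory.
        self.lock = threading.Lock()

    def wait(self, rank, timeout_s=5.0):
        self.epochs[rank] += 1
        epoch = self.epochs[rank]
        # Signal phase: publish this rank's epoch into every rank's flag row.
        with self.lock:
            for dst in range(self.world_size):
                self.flags[dst][rank] = epoch
        # Wait phase: spin until every rank has signalled at least this epoch.
        deadline = time.monotonic() + timeout_s
        while True:
            with self.lock:
                if all(f >= epoch for f in self.flags[rank]):
                    return
            if time.monotonic() > deadline:
                raise TimeoutError(f"rank {rank} timed out at epoch {epoch}")
            time.sleep(0)  # yield to other threads


def run_demo(world_size=4, rounds=3):
    """Each rank writes a value, hits the barrier, then checks its peers' writes."""
    barrier = FlagBarrier(world_size)
    data = [0] * world_size
    errors = []

    def worker(rank):
        try:
            for r in range(1, rounds + 1):
                data[rank] = r          # write before the barrier
                barrier.wait(rank)
                if any(v < r for v in data):  # peers' writes must be visible
                    errors.append((rank, r, list(data)))
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    return errors


if __name__ == "__main__":
    print("errors:", run_demo())
```

The monotonically increasing epoch is one way to get state reuse without resetting flags between calls, and the deadline in the wait loop mirrors the kind of timeout behavior the unit tests mention. Whether the actual kernel uses epochs or a different scheme is not stated here.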
Test Plan
Test Result
Submission Checklist