Skip to content

Modifying isEmptyQueue to check LZ to confirm if the queue is empty so we can catch when it has finished early#1300

Merged
pvelesko merged 7 commits into
mainfrom
update_atomic_isemptyqueue
Jun 16, 2026
Merged

Modifying isEmptyQueue to check LZ to confirm if the queue is empty so we can catch when it has finished early#1300
pvelesko merged 7 commits into
mainfrom
update_atomic_isemptyqueue

Conversation

@colleeneb

@colleeneb colleeneb commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

TLDR: ZeroRK has had a ~30% regression after the switch to in-order queues due to the marker events and isEmptyQueue which aren't in the out-of-order version. This PR adds extra checking to isEmptyQueue to avoid markers if we know the queue is empty due to LZ API call. From testing this seems to result in ZeroRK performance being similar to the out-of-order-queue performance.

Longer description:
As discussed a little in #1162, codes like ZeroRK which have a lot of streams had a slowdown due to the extra time adding in markers after the switch to in-order queues. That PR helped a lot, but ZeroRK had about a 30% slowdown still. This PR adjusts isEmptyQueue so that we now match pre in-order queue times. It does this by using the IsEmptyQueue_ atomic to avoid adding markers if IsEmptyQueue_=true, but if IsEmptyQueue_=false, instead of adding the marker, it first checks with LZ if the queue is actually busy. If it's not busy, it will avoid adding the marker.

Doing this without locks leads to a possible race due to the gap between calling zeCommandListHostSynchronize(ZeCmdListImm_, 0) and setting IsEmptyQueue_ to the new value. That is, a concurrent submitter could IsEmptyQueue_.store(false) between the L0 check and a IsEmptyQueue_.store(true) (if the LZ check says the queue is empty (true) since it missed the store(false)), resulting in setting IsEmptyQueue_ true while there actually is work in the queue. Thus a lock (UpdateEmptyQueueViaLZMtx_) was added to gate whether or not someone is submitting to the queue. If there's a cleaner way to do this, let me know. I liked the new mutex instead of using CommandListMtx for easier reasoning about the correctness. Thus CommandListMtx is used to guard it. We could alternatively have a new mutex to just lock for IsEmptyQueue_ and submitting to the queue, but using CommandListMtx causes less line changes and is conceptually simpler -- we lock whenever someone adds to the command list, which should be the same as when people submit to the queue.

For the ZeroRK timings (150 iterations):

Pre-in-order-queue baseline: ~14.4s  
Current main: ~19.4s
This PR: ~14.4s

This PR also updates the Aurora tests to remove one older LLVM build test to help speed up testing and to allow the ZeroRK performance test to hard fail if the performance isn't met, and thus any future regression should ideally be caught.

I took a look at #1293 and it looks like a complementary fix (targeting different issues). I'm happy to rebase this on top of that PR if desired.

…sy so we can catch when has finished work early
@colleeneb

Copy link
Copy Markdown
Contributor Author

/run-aurora-ci

(Trying to get a run in before aurora going on maintenance!)

@colleeneb colleeneb marked this pull request as ready for review June 15, 2026 13:34
@pvelesko pvelesko merged commit 85e5efe into main Jun 16, 2026
28 checks passed
@pvelesko pvelesko deleted the update_atomic_isemptyqueue branch June 16, 2026 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants