Modifying isEmptyQueue to check LZ to confirm if the queue is empty so we can catch when it has finished early#1300
Merged
Merged
Conversation
…sy so we can catch when has finished work early
Contributor
Author
|
/run-aurora-ci (Trying to get a run in before aurora going on maintenance!) |
pvelesko
approved these changes
Jun 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TLDR: ZeroRK has had a ~30% regression after the switch to in-order queues due to the marker events and isEmptyQueue which aren't in the out-of-order version. This PR adds extra checking to isEmptyQueue to avoid markers if we know the queue is empty due to LZ API call. From testing this seems to result in ZeroRK performance being similar to the out-of-order-queue performance.
Longer description:
As discussed a little in #1162, codes like ZeroRK which have a lot of streams had a slowdown due to the extra time adding in markers after the switch to in-order queues. That PR helped a lot, but ZeroRK had about a 30% slowdown still. This PR adjusts isEmptyQueue so that we now match pre in-order queue times. It does this by using the IsEmptyQueue_ atomic to avoid adding markers if IsEmptyQueue_=true, but if IsEmptyQueue_=false, instead of adding the marker, it first checks with LZ if the queue is actually busy. If it's not busy, it will avoid adding the marker.
Doing this without locks leads to a possible race due to the gap between calling
zeCommandListHostSynchronize(ZeCmdListImm_, 0)and setting IsEmptyQueue_ to the new value. That is, a concurrent submitter couldIsEmptyQueue_.store(false)between the L0 check and aIsEmptyQueue_.store(true)(if the LZ check says the queue is empty (true) since it missed thestore(false)), resulting in setting IsEmptyQueue_ true while there actually is work in the queue.Thus a lock (UpdateEmptyQueueViaLZMtx_) was added to gate whether or not someone is submitting to the queue. If there's a cleaner way to do this, let me know. I liked the new mutex instead of using CommandListMtx for easier reasoning about the correctness.Thus CommandListMtx is used to guard it. We could alternatively have a new mutex to just lock for IsEmptyQueue_ and submitting to the queue, but using CommandListMtx causes less line changes and is conceptually simpler -- we lock whenever someone adds to the command list, which should be the same as when people submit to the queue.For the ZeroRK timings (150 iterations):
This PR also updates the Aurora tests to remove one older LLVM build test to help speed up testing and to allow the ZeroRK performance test to hard fail if the performance isn't met, and thus any future regression should ideally be caught.
I took a look at #1293 and it looks like a complementary fix (targeting different issues). I'm happy to rebase this on top of that PR if desired.