
Fixes #28

Open
kpouget wants to merge 23 commits into crc-org:main from kpouget:fixes

Conversation

@kpouget (Collaborator) commented Feb 18, 2026

Make sure to read the contributing guidelines before submitting a PR

Summary by CodeRabbit

  • New Features

    • Added readable backend command names and a new stream synchronization failure status.
  • Bug Fixes

    • Improved buffer-operation validation and handle verification to prevent invalid access.
    • Strengthened graph integrity and backend initialization checks with clearer error reporting.
    • Improved device initialization diagnostics and safer initialization paths.
  • Chores

    • Device and buffer type names are now prefixed with "[virtgpu]".
  • Documentation

    • Added a "Shared-memory size" limitation note for the VirtGPU backend.

@openshift-ci bot commented Feb 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds runtime validation and safety checks across the virtgpu remoting backend: graph and buffer validators, decoder/handle sanity checks, backend init/sync guards, new APIR return/mapping helpers, command-name generation changes, and small logging and name-prefixing updates.

Changes

Cohort / File(s) Summary
Backend graph & dispatch
ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp, ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h
Adds pre-/post-deserialization graph size and integrity checks, backend-init guard, conditional sync call, improved logging, and removes old in-header dispatch-name helper.
Buffer APIs & types
ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp, ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp, ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h
Introduces offset+size overflow validator, decoder/fatal and null-handle checks for buffers, fallback alloc-size logic, and runtime validation against tracked backend_buffers (fatal on unknown).
Backend init & enums
ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp, ggml/src/ggml-virtgpu/backend/shared/apir_backend.h, ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
Adds explicit device_count check, new initialize error code APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED, and new forward return code APIR_FORWARD_FAILED_TO_SYNC_STREAMS with adjusted base index.
Generated apir helpers & codegen
ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h, ggml/src/ggml-virtgpu/regenerate_remoting.py
Adds apir_dispatch_command_name() mapping function generation and moves command-name mapping out of prior generated header; updates codegen to emit the new helper.
Remoting/forwarding changes
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h, ggml/src/ggml-virtgpu/virtgpu-forward.gen.h, ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp, ggml/src/ggml-virtgpu/virtgpu.cpp
Captures command-name during REMOTE_CALL_PREPARE, adds abort path using command name, minor error-message tweaks, and small comment/deprecation additions.
Name prefixing & registry
ggml/src/ggml-virtgpu/ggml-backend-reg.cpp, ggml/src/ggml-virtgpu/ggml-backend-device.cpp, ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
Prefixes stored device and buffer-type names with "[virtgpu] " and adds clarifying comments in name getters; adds allocation/error handling for prefixed names.
Docs
docs/backend/VirtGPU.md
Adds "Shared-memory size" limitation note about libkrun 64 GB RAM+VRAM addressability.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • gbraad
  • praveenkumar

Poem

🐰 I hopped through graphs and buffers with care,
I checked each handle, made logs more aware.
I stitched names with "[virtgpu]" so neat,
Synchronized streams and kept errors discreet,
A rabbit's small patch—stability to share.

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (3 warnings)

  • Title check (⚠️ Warning): The pull request title 'Fixes' is vague and does not clearly summarize the primary changes. Resolution: update the title to describe the main objective, such as 'Add validation and error handling to VirtGPU backend dispatched operations'.
  • Description check (⚠️ Warning): The description only contains a template reference and provides no meaningful context about the changes, objectives, or implementation details. Resolution: replace the template link with a clear description of the changes, including the purpose of the validation and error-handling enhancements across the backend modules.
  • Docstring coverage (⚠️ Warning): Docstring coverage is 5.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.


@kpouget (Collaborator, Author) commented Feb 18, 2026

/test topsail remoting_cuda
/cluster ibm_vm

@topsail-bot bot commented Feb 18, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 05 minutes 54 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_cuda

@kpouget (Collaborator, Author) commented Feb 18, 2026

/test topsail remoting_mac

@topsail-bot bot commented Feb 18, 2026

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 00 minutes 06 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

Failure indicator:

/tmp/topsail_202602181771449495/000__mac_ai__remote_capture_system_state/FAILURE | [000__mac_ai__remote_capture_system_state] ./run_toolbox.py mac_ai remote_capture_system_state --> 4
/tmp/topsail_202602181771449495/001__remote__retrieve/FAILURE | [001__remote__retrieve] ./run_toolbox.py remote retrieve --path=/tmp/topsail_202602181771449495 --dest=/tmp/topsail_202602181771449495 --> 4


@kpouget (Collaborator, Author) commented Feb 19, 2026

/test topsail remoting_mac

@topsail-bot bot commented Feb 19, 2026

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 00 minutes 05 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

Failure indicator:

/tmp/topsail_202602191771487846/000__mac_ai__remote_capture_system_state/FAILURE | [000__mac_ai__remote_capture_system_state] ./run_toolbox.py mac_ai remote_capture_system_state --> 4
/tmp/topsail_202602191771487846/001__remote__retrieve/FAILURE | [001__remote__retrieve] ./run_toolbox.py remote retrieve --path=/tmp/topsail_202602191771487846 --dest=/tmp/topsail_202602191771487846 --> 4


@kpouget (Collaborator, Author) commented Feb 19, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot bot commented Feb 19, 2026

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 36 minutes 44 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

Failure indicator:

/tmp/topsail_202602191771512353/016__build_llama_cpp_remoting/FAILURE | RuntimeError: Failed to build llama-cpp/remoting. See /tmp/topsail_202602191771512353/016__build_llama_cpp_remoting/build.log
Traceback (most recent call last):
  File "/opt/topsail/src/projects/mac_ai/testing/prepare_llama_cpp.py", line 348, in prepare_from_source
    raise RuntimeError(f"Failed to build llama-cpp/{inference_server_flavor}. See {env.ARTIFACT_DIR}/build.log")
RuntimeError: Failed to build llama-cpp/remoting. See /tmp/topsail_202602191771512353/016__build_llama_cpp_remoting/build.log



@kpouget (Collaborator, Author) commented Feb 20, 2026

/test topsail remoting_cuda
/cluster ibm_vm
/var prepare.podman.machine.remoting_env.env_extra: {GGML_CUDA_DISABLE_GRAPHS: "1"}

@coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)

83-88: ⚠️ Potential issue | 🟡 Minor

Missing APIR_FORWARD_FAILED_TO_SYNC_STREAMS in apir_forward_error mapping.

The new enum value APIR_FORWARD_FAILED_TO_SYNC_STREAMS = 3 (Line 38) has no corresponding entry in apir_forward_error(). If this code is returned, the function falls through to "Unknown APIR_COMMAND_TYPE_FORWARD error" instead of producing the correct name. The REMOTE_CALL macro in virtgpu-forward-impl.h calls apir_forward_error(ret_name) when ret_name < APIR_FORWARD_BASE_INDEX, so this will produce a misleading error message.

Proposed fix
     APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
     APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
     APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+    APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
     APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 83 - 88,
The apir_forward_error() function is missing a case for the new enum value
APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3), so calls from REMOTE_CALL (in
virtgpu-forward-impl.h) can fall through to the generic "Unknown
APIR_COMMAND_TYPE_FORWARD error"; add an
APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS) entry in
apir_forward_error() alongside the existing
APIR_FORWARD_SUCCESS/NO_DISPATCH_FCT/TIMEOUT/BASE_INDEX entries so the enum maps
to its correct name. Ensure the symbol APIR_FORWARD_FAILED_TO_SYNC_STREAMS is
used exactly as defined in the enum.
ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp (1)

135-144: ⚠️ Potential issue | 🟡 Minor

Add validation for decoded tensor pointers in set_tensor, get_tensor, and cpy_tensor functions.

Tensor pointers decoded at lines 53, 96, and 137-138 are passed directly to buffer interface methods without null or fatal state checks, unlike the buffer handle validation at lines 27, 46, and 89. If a malicious guest sends invalid tensor handles, this could cause crashes in the backend.

Add tensor validation after decoding:

Example fix for cpy_tensor (applies to set_tensor and get_tensor similarly)
    const ggml_tensor * src;
    // safe to remove the const qualifier here
    src               = apir_decode_ggml_tensor(dec);
    ggml_tensor * dst = (ggml_tensor *) (uintptr_t) apir_decode_ggml_tensor(dec);

+    if (!src || !dst || apir_decoder_get_fatal(dec)) {
+        GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Invalid tensor handle from guest\n", __func__);
+        return 1;
+    }
+
    bool ret = buffer->iface.cpy_tensor(buffer, src, (ggml_tensor *) dst);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp` around lines 135
- 144, Decoded tensor pointers returned by apir_decode_ggml_tensor are used
without validation in the set_tensor/get_tensor/cpy_tensor flow; add checks
after each apir_decode_ggml_tensor call (the ones producing src and dst in the
shown snippet and the analogous decodes in set_tensor and get_tensor) to ensure
they are non-null and not in a fatal/invalid state before calling
buffer->iface.cpy_tensor / buffer->iface.set_tensor / buffer->iface.get_tensor;
if validation fails, return an error (e.g., non-zero) and avoid calling the
buffer interface. Use the same validation pattern you already use for the buffer
handle (check the decoded pointer, log or handle the error, and early return)
and apply it to the functions referencing apir_decode_ggml_tensor, src, dst, and
any casts to (ggml_tensor *).
ggml/src/ggml-virtgpu/virtgpu.cpp (4)

533-542: ⚠️ Potential issue | 🟠 Major

static apir_decoder response_dec is a data race — should be thread_local

remote_call_prepare uses thread_local for both encoder_buffer and enc, but response_dec at line 535 is static — shared across all threads. Concurrent calls from different threads will race on response_dec.cur/response_dec.end, and the decoder pointer written to *decoder can be overwritten before the caller reads it.

Additionally, response_dec.fatal is never reset between calls (unlike enc which is re-initialized with enc = {...} on every call), so a stale fatal flag from a prior error can cause spurious failures reported by remote_call_finish.

🐛 Proposed fix
-    static apir_decoder response_dec;
+    thread_local apir_decoder response_dec;
+    response_dec.fatal = false;
     response_dec.cur = (char *) gpu->reply_shmem.mmap_ptr + sizeof(*atomic_reply_notif);
     response_dec.end = (char *) gpu->reply_shmem.mmap_ptr + gpu->reply_shmem.mmap_size;
     *decoder         = &response_dec;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 533 - 542, The decoder
instance response_dec used in remote_call_prepare is declared static and causes
data races; change it to thread_local and ensure its state is reinitialized per
call: set response_dec.cur and response_dec.end as done already, and explicitly
reset response_dec.fatal (and any other relevant fields) before assigning
*decoder; mirror how encoder_buffer/enc are reinitialized in remote_call_prepare
so concurrent threads do not share or retain stale fatal flags (affects
response_dec, remote_call_prepare, and remote_call_finish).

167-173: ⚠️ Potential issue | 🟡 Minor

Stray ) embedded inside format string literals

The format strings at line 167 and line 172 end with %d | %s)" and %d)" respectively — the ) is inside the string literal and will appear literally in the log output (e.g., "apir code=3)"). The string fix itself is a good improvement.

✏️ Proposed fix (line 172 shown; apply the same pattern to line 167)
-                   "%s: the API Remoting backend library failed to initialize its backend library: apir code=%d)", __func__,
+                   "%s: the API Remoting backend library failed to initialize its backend library: apir code=%d", __func__,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 167 - 173, The format strings
passed to GGML_ABORT (using GGML_VIRTGPU and __func__) incorrectly include a
closing parenthesis inside the string literal ("%d | %s)" and "%d)"), causing an
extra ')' to appear in logs; update both calls (the GGML_ABORT invocation
handling apir_ret and the branch computing lib_ret) so the format strings end
without the trailing ')' (e.g., "%d | %s" and "%d") and place the final closing
parenthesis as part of the GGML_ABORT call itself, keeping references to
__func__, apir_ret, apir_load_library_error(apir_ret), and lib_ret unchanged.

264-290: ⚠️ Potential issue | 🟠 Major

GGML_ABORT on open() failure prevents enumeration fallback; missing available_nodes guard

Two issues:

  1. Abort in loop: GGML_ABORT at line 269 is fatal/noreturn. virtgpu_open calls this function in a loop over all DRM devices and continues on non-success. If open() fails for any device (e.g., an integrated GPU node with restricted render access), the process aborts before trying subsequent devices. Should be GGML_LOG_ERROR + return APIR_ERROR_INITIALIZATION_FAILED.

  2. Missing available_nodes guard: dev->nodes[DRM_NODE_RENDER] is only valid when bit DRM_NODE_RENDER is set in dev->available_nodes. Accessing nodes[DRM_NODE_RENDER] unconditionally can dereference a NULL or garbage pointer on devices that lack a render node.

🐛 Proposed fix
 static virt_gpu_result_t virtgpu_open_device(virtgpu * gpu, const drmDevicePtr dev) {
+    if (!(dev->available_nodes & (1 << DRM_NODE_RENDER))) {
+        return APIR_ERROR_INITIALIZATION_FAILED;
+    }
+
     const char * node_path = dev->nodes[DRM_NODE_RENDER];

     int fd = open(node_path, O_RDWR | O_CLOEXEC);
     if (fd < 0) {
-        GGML_ABORT(GGML_VIRTGPU
-                   "%s: failed to open %s", __func__, node_path);
-        return APIR_ERROR_INITIALIZATION_FAILED;
+        GGML_LOG_ERROR(GGML_VIRTGPU
+                       "%s: failed to open %s: %s\n", __func__, node_path, strerror(errno));
+        return APIR_ERROR_INITIALIZATION_FAILED;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 264 - 290, In
virtgpu_open_device, avoid the fatal GGML_ABORT on open() failure and instead
log the error (GGML_LOG_ERROR) and return APIR_ERROR_INITIALIZATION_FAILED so
virtgpu_open can continue iterating devices; additionally, before reading
dev->nodes[DRM_NODE_RENDER] check that dev->available_nodes has the
DRM_NODE_RENDER bit set (e.g., if (!(dev->available_nodes & (1 <<
DRM_NODE_RENDER))) log and return APIR_ERROR_INITIALIZATION_FAILED) to prevent
dereferencing a missing render node; update both the initial node_path access
and any subsequent uses of dev->nodes[DRM_NODE_RENDER] accordingly, leaving
virtgpu_open and virtgpu_open_device behavior otherwise unchanged.

126-146: ⚠️ Potential issue | 🔴 Critical

Dead else if branch — duplicate condition prevents symbol-missing error handling

Line 137 repeats the exact condition from line 126 (ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING), making the branch unreachable. Its message references "symbols are missing", which indicates the intended check was APIR_LOAD_LIBRARY_SYMBOL_MISSING. As-is, symbol-missing errors fall through to the generic else, producing a misleading message.

🐛 Proposed fix
-    } else if (ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING) {
+    } else if (ret == APIR_LOAD_LIBRARY_SYMBOL_MISSING) {
         GGML_ABORT(GGML_VIRTGPU
                    "%s: could not load the backend library, some symbols are missing. "
                    "Make sure virglrenderer is correctly configured by the hypervisor. (%s) ",
                    __func__, apir_load_library_error(ret));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 126 - 146, The duplicated
branch checks APIR_LOAD_LIBRARY_ENV_VAR_MISSING twice, making the "symbols are
missing" case unreachable; update the third condition to check
APIR_LOAD_LIBRARY_SYMBOL_MISSING instead of APIR_LOAD_LIBRARY_ENV_VAR_MISSING so
that the branch invoking GGML_ABORT with the "some symbols are missing" message
is executed for symbol-missing errors (leave the other branches using
APIR_LOAD_LIBRARY_ENV_VAR_MISSING and APIR_LOAD_LIBRARY_CANNOT_OPEN, and keep
apir_load_library_error(ret) and GGML_ABORT usage unchanged).
🧹 Nitpick comments (8)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)

9-17: Macro-scoped variables leak into the enclosing scope by design — document the coupling.

REMOTE_CALL_PREPARE_forward_flag and REMOTE_CALL_PREPARE_command_name are intentionally declared outside the do { ... } while(0) block so that REMOTE_CALL can reference REMOTE_CALL_PREPARE_command_name. This implicit coupling between the two macros is functional but fragile — using REMOTE_CALL without a preceding REMOTE_CALL_PREPARE in the same scope will cause a compile error (which is acceptable), but the intent would be clearer with a brief comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 9 - 17, The macros
REMOTE_CALL_PREPARE and REMOTE_CALL are coupled because REMOTE_CALL_PREPARE
intentionally declares REMOTE_CALL_PREPARE_forward_flag and
REMOTE_CALL_PREPARE_command_name outside the do{...}while(0) block so
REMOTE_CALL can reference REMOTE_CALL_PREPARE_command_name; add a concise
comment above the REMOTE_CALL_PREPARE macro explaining this intentional scope
leak, naming the variables (REMOTE_CALL_PREPARE_forward_flag,
REMOTE_CALL_PREPARE_command_name) and warning that REMOTE_CALL must be used in
the same scope after REMOTE_CALL_PREPARE to avoid compile errors, to make the
coupling explicit and reduce fragility.
ggml/src/ggml-virtgpu/ggml-backend.cpp (1)

63-75: Initialization failure path is sound, but consider zeroing ctx fields on failure.

If apir_backend_initialize partially writes to ctx->device_handle or ctx->backend_id before failing internally, those fields will hold stale/partial values after the nullptr return. A subsequent retry of ggml_backend_remoting_device_init on the same device would call apir_backend_initialize again, which would overwrite them — so this is likely fine in practice. Still, zeroing on failure would be a defensive improvement.

Proposed defensive zeroing on failure
     if (init_result != APIR_BACKEND_INITIALIZE_SUCCESS) {
         // Backend initialization failed
         GGML_LOG_ERROR(GGML_VIRTGPU "%s: Backend initialization failed with result=%d\n", __func__, init_result);
+        ctx->device_handle = 0;
+        ctx->backend_id = 0;
         return nullptr;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/ggml-backend.cpp` around lines 63 - 75, The failure
path in ggml_backend_remoting_device_init may leave ctx->device_handle and
ctx->backend_id partially written by apir_backend_initialize; update the failure
branch (after init_result != APIR_BACKEND_INITIALIZE_SUCCESS) to defensively
clear ctx->device_handle and ctx->backend_id (e.g., set ctx->device_handle =
nullptr and ctx->backend_id = 0 or use memset on those fields) before logging
via GGML_LOG_ERROR and returning nullptr so the context is left in a consistent,
zeroed state for any future retries.
ggml/src/ggml-virtgpu/regenerate_remoting.py (1)

148-172: New apir_dispatch_command_name generator produces clean enum-to-name mapping.

The generated switch function looks correct. Note that the generated codebase will now have two similar name-mapping functions: apir_dispatch_command_name (in apir_backend.gen.h, returning names like "device_get_name") and backend_dispatch_command_name (in backend-dispatched.gen.h, returning names like "backend_device_get_name"). If the intent is for these to serve different layers (shared APIR header vs. backend dispatch), this is fine — just be aware of the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/regenerate_remoting.py` around lines 148 - 172, The new
generator creates apir_dispatch_command_name that returns names like
"device_get_name" which duplicates semantics of backend_dispatch_command_name
(returns "backend_device_get_name"); decide and implement one of two fixes:
either make apir_dispatch_command_name reuse or call through to
backend_dispatch_command_name (or vice-versa) to avoid duplicate mapping logic,
or explicitly document/distinguish the two by adding a clear comment and
different return prefixes; update the generator that builds
apir_dispatch_command_name (the loop creating clean_name and case for
{func['enum_name']}) to either prepend the "backend_" prefix to match
backend_dispatch_command_name or add a generated comment above the function
explaining the intended difference so callers are not confused.
ggml/src/ggml-virtgpu/backend/backend.cpp (1)

88-94: Dead code: library_reg can never be NULL here.

Line 68 assigns library_reg to either virgl_library_reg (if non-NULL) or the compile-time constant GGML_DEFAULT_BACKEND_REG ("ggml_backend_init"). Since the fallback is always non-NULL, the condition !library_reg on line 88 is unreachable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend.cpp` around lines 88 - 94, The check
for `!library_reg` is dead because `library_reg` is assigned from
`virgl_library_reg` or the compile-time `GGML_DEFAULT_BACKEND_REG` (non-NULL);
remove the unreachable if-block that logs and returns
`APIR_LOAD_LIBRARY_ENV_VAR_MISSING`. Instead, if you intended to detect
registration failure, perform that check against the actual backend
registration/return value from the registration call (refer to `library_reg`,
`virgl_library_reg`, and `GGML_DEFAULT_BACKEND_REG`) rather than testing
`library_reg` for NULL.
ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp (1)

108-122: ret from REMOTE_CALL is unused after cleanup — consider logging on failure.

apir_backend_cleanup ignores the return code from the remote call. A failed cleanup would silently leak the backend instance on the host, which is the exact GPU OOM scenario this PR aims to fix. At minimum, log a warning.

Proposed fix
     REMOTE_CALL(gpu, encoder, decoder, ret);
 
+    if (ret != 0) {
+        GGML_LOG_ERROR(GGML_VIRTGPU "%s: Backend cleanup failed (ret=%d) for device_handle=%p backend_id=%u\n",
+                       __func__, ret, (void*)device_handle, backend_id);
+    }
+
     remote_call_finish(gpu, encoder, decoder);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp` around lines 108 - 122,
apir_backend_cleanup is ignoring the ApirForwardReturnCode ret returned by
REMOTE_CALL; update the function to check ret after REMOTE_CALL and handle
failures (at minimum log a warning/error and avoid silently proceeding to
remote_call_finish when cleanup failed) so host-side backend leaks are visible;
reference the local variables ret, REMOTE_CALL invocation, and
remote_call_finish within apir_backend_cleanup to add a conditional that logs
the failure (including ret) and performs appropriate error-handling before
returning.
ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp (2)

129-155: cleanup_device_extension: ggml_backend_free is called under both locks — verify no re-entrant access.

ggml_backend_free at Line 142 runs while holding device_extensions_mutex and backends_mutex. If the backend's destructor interacts with the device extension system (e.g., logging callbacks that access device_extensions), this would deadlock. Likely safe given the current codebase, but worth keeping in mind.

An alternative pattern is to collect instances into a local vector under the lock, release the lock, then free backends outside the lock:

Sketch
     // Clean up all backend instances
+    std::vector<apir_backend_instance*> to_free;
     {
         std::lock_guard<std::mutex> backends_lock(ext->backends_mutex);
         for (auto& [backend_id, instance] : ext->backend_instances) {
-            if (instance->bck) {
-                ggml_backend_free(instance->bck);
-            }
-
-            instance->magic = 0;
-            delete instance;
+            to_free.push_back(instance);
         }
         ext->backend_instances.clear();
     }
+    // Free backends outside the lock
+    for (auto* instance : to_free) {
+        if (instance->bck) {
+            ggml_backend_free(instance->bck);
+        }
+        instance->magic = 0;
+        delete instance;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp` around lines 129 - 155,
cleanup_device_extension currently calls ggml_backend_free while holding
device_extensions_mutex and ext->backends_mutex which can deadlock if backend
teardown re-enters device extension code; fix by under device_extensions_mutex
and ext->backends_mutex move pointers of ext->backend_instances (instances) into
a local vector, clear ext->backend_instances and set ext->magic = 0 and remove
ext from device_extensions while still under the lock, then release both locks
and iterate the local vector to call ggml_backend_free, reset instance->magic
and delete each instance, and finally delete ext after locks are released — this
uses the symbols cleanup_device_extension, device_extensions_mutex,
backends_mutex, apir_device_extension::backend_instances, ggml_backend_free,
instance->magic and ext->magic to locate where to change.

157-181: backend_id type narrowing: uintptr_t → uint32_t truncation.

create_backend_instance returns uintptr_t, but Line 178 writes it into a uint32_t*. If next_backend_id ever exceeds UINT32_MAX, the ID will silently wrap and collide with existing entries. Consider using uint32_t consistently for next_backend_id and the return type, or adding a bounds check here.

Option A: Use uint32_t throughout

In backend-dispatched.h, change next_backend_id and the return type of create_backend_instance:

-    uintptr_t next_backend_id;
+    uint32_t next_backend_id;
-uintptr_t create_backend_instance(ggml_backend_dev_t dev);
+uint32_t create_backend_instance(ggml_backend_dev_t dev);

And update the map key type accordingly:

-    std::unordered_map<uintptr_t, apir_backend_instance*> backend_instances;
+    std::unordered_map<uint32_t, apir_backend_instance*> backend_instances;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp` around lines 157 - 181,
The code casts a uintptr_t backend_id from create_backend_instance into a
uint32_t out_backend_id, risking truncation/collision if next_backend_id grows
beyond UINT32_MAX; update the implementation so types are consistent or guard
the cast: either change next_backend_id and create_backend_instance return type
to uint32_t (and update any map keys/usages) so backend_dispatch_initialize can
assign directly, or add an explicit bounds check in backend_dispatch_initialize
(check backend_id <= UINT32_MAX) and return
APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED on overflow before casting to
uint32_t; reference backend_dispatch_initialize, create_backend_instance,
next_backend_id, out_backend_id and out_handle when making the change.
ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp (1)

71-85: Security: Guest-supplied device_handle is cast to a host pointer — currently safe but fragile.

device_handle arrives from the untrusted guest and is cast to ggml_backend_dev_t. It is only used as a map key (no dereference), so it's safe today. However, any future code that dereferences device before the map lookup would be an exploitable arbitrary-read/write. Consider validating device_handle against a known-good set (e.g., compare it to the global dev pointer) before proceeding.

The same pattern appears in backend_backend_cleanup at lines 152-154.

Proposed validation
     // Decode device handle first
     uintptr_t device_handle;
     apir_decode_uintptr_t(dec, &device_handle);
-    ggml_backend_dev_t device = (ggml_backend_dev_t)device_handle;
+    ggml_backend_dev_t device = (ggml_backend_dev_t)device_handle;
+
+    // Validate that the device handle corresponds to a known device
+    if (get_device_extension(device) == nullptr) {
+        GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Unknown device handle %p\n", __func__, (void*)device_handle);
+        apir_decoder_set_fatal(dec);
+        return 1;
+    }

This already happens implicitly later, but an explicit early check at the boundary documents the security intent and avoids accidentally using device before validation in future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 71
- 85, The guest-provided device_handle is being cast to ggml_backend_dev_t and
used as a key—make this explicit and safe by validating the raw uintptr_t before
casting: add an early check in the dispatch path (before casting to
ggml_backend_dev_t and before calling get_backend_instance) that confirms
device_handle matches a known-good value (e.g., compare against the global dev
pointer cast to uintptr_t or call a new is_valid_device_handle(uintptr_t)
helper), and return an error (apir_decoder_set_fatal and non-zero) if it fails;
apply the same pattern in backend_backend_cleanup where device_handle is decoded
and cast so future code cannot accidentally dereference an untrusted pointer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 55-57: The GGML_UNUSED(enc) is incorrect because enc is actually
used later in backend_backend_graph_compute when calling
apir_encode_ggml_status; remove the GGML_UNUSED(enc) statement and keep only
GGML_UNUSED(ctx) (or otherwise mark only unused symbols) so that enc is not
erroneously annotated as unused; update backend_backend_graph_compute to stop
casting away enc and ensure apir_encode_ggml_status calls continue to use enc
as-is.
- Around line 80-85: The code currently releases backends_mutex inside
get_backend_instance(), creating a race where cleanup_backend_instance() can
free apir_backend_instance->bck before graph_compute() uses it; to fix, either
hold backends_mutex for the entire duration of graph_compute()'s use of the
instance (acquire before get_backend_instance() and release after the
compute/dereference) or change apir_backend_instance to be reference-counted
(e.g., wrap in shared_ptr) and bump the refcount in get_backend_instance() and
drop it after graph_compute() finishes; update graph_compute(),
get_backend_instance(), and cleanup_backend_instance() accordingly and ensure
you validate instance->magic after acquiring the reference or lock before
dereferencing instance->bck.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp`:
- Around line 86-93: apir_decode_ggml_tensor_inplace can return nullptr, so
guard the returned pointer `op` before using it: after calling
apir_decode_ggml_tensor_inplace(dec) check if op == nullptr and handle the
decoder error path (e.g., log/report the decode error and early-return an error
code or set value to 0) so you never call buft->iface.get_alloc_size(buft, op)
or ggml_nbytes(op) with a null pointer; update the block around the symbols
apir_decode_ggml_tensor_inplace, op, buft->iface.get_alloc_size and ggml_nbytes
accordingly.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.h`:
- Around line 28-31: The comment in struct apir_backend_instance incorrectly
documents the magic value as 0xAB1234CD; update the inline comment next to the
magic field in apir_backend_instance to match the actual constant
APIR_BACKEND_INSTANCE_MAGIC (0xCD4321BA) so the doc and code are consistent
(locate struct apir_backend_instance and replace the wrong hex in the comment
with APIR_BACKEND_INSTANCE_MAGIC / 0xCD4321BA).

In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h`:
- Around line 108-118: The access to the global unordered_set backend_buffers in
the validation block is not thread-safe; modify the global to pair it with a
synchronization primitive (e.g., declare a std::shared_mutex
backend_buffers_mutex alongside backend_buffers at file scope in a shared
header), move the extern declarations out of the inline function into that
header, then update readers (this validation code in apir_cs_ggml.h) to take a
shared/read lock (std::shared_lock) before calling backend_buffers.find(...) and
update apir_track_backend_buffer() and apir_untrack_backend_buffer() to take an
exclusive/unique lock (std::unique_lock) when mutating the set so all accesses
are synchronized while preserving concurrent reads.

In `@ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp`:
- Around line 38-41: The function currently returns a pointer to a single static
buffer (prefixed_name) which is shared across GPUs and races; instead, build the
"[virtgpu] %s" string once when gpu->cached_buffer_type is initialized (device
init) and store it on the per-instance struct (either add a field on virtgpu or
on cached_buffer_type, e.g., cached_buffer_type.prefixed_name), then remove the
static prefixed_name from the accessor and make get_name simply return the
stored per-instance pointer (ensure allocation lifetime covers the device and
access is thread-safe).

In `@ggml/src/ggml-virtgpu/ggml-backend-device.cpp`:
- Around line 3-11: The function ggml_backend_remoting_device_get_name currently
uses a static local buffer prefixed_name which causes a race/dangling-pointer;
instead, populate each device's context name once at device registration (in the
code that sets ctx->name in ggml-backend-reg.cpp) to include the "[virtgpu] "
prefix plus the existing device-specific identifier (preserving names like
"ggml-virtgpu0"/"ggml-virtgpu1"), and then change
ggml_backend_remoting_device_get_name to simply return ctx->name.c_str() (remove
the static buffer usage and any snprintf there).

In `@ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp`:
- Line 176: The format specifier in the GGML_ABORT call using
GGML_VIRTGPU/virtgpu-forward-device.cpp is wrong for the variable size (a
size_t) — change the printf-style format from "%ld" to "%zu" (or alternatively
cast size to unsigned long and use "%lu") in the GGML_ABORT invocation that
includes __func__ and size so the output is defined and correct across
platforms.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 30-33: The abort message in the macro should include the
GGML_VIRTGPU prefix and the macro must not unconditionally abort on backend
errors if callers (e.g., apir_backend_initialize in virtgpu-forward-backend.cpp)
expect to handle non-zero return codes; update the macro that calls GGML_ABORT
so the string begins with "GGML_VIRTGPU: " (referring to
REMOTE_CALL_PREPARE_command_name) and provide a non-aborting variant (or a
flagged version) that returns the backend return code instead of calling
GGML_ABORT so callers like apir_backend_initialize can inspect ret_name and
handle APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED rather than being forced into
abort; adjust usages or remove dead code in apir_backend_initialize accordingly.

---

Outside diff comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp`:
- Around line 135-144: Decoded tensor pointers returned by
apir_decode_ggml_tensor are used without validation in the
set_tensor/get_tensor/cpy_tensor flow; add checks after each
apir_decode_ggml_tensor call (the ones producing src and dst in the shown
snippet and the analogous decodes in set_tensor and get_tensor) to ensure they
are non-null and not in a fatal/invalid state before calling
buffer->iface.cpy_tensor / buffer->iface.set_tensor / buffer->iface.get_tensor;
if validation fails, return an error (e.g., non-zero) and avoid calling the
buffer interface. Use the same validation pattern you already use for the buffer
handle (check the decoded pointer, log or handle the error, and early return)
and apply it to the functions referencing apir_decode_ggml_tensor, src, dst, and
any casts to (ggml_tensor *).

In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 83-88: The apir_forward_error() function is missing a case for the
new enum value APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3), so calls from
REMOTE_CALL (in virtgpu-forward-impl.h) can fall through to the generic "Unknown
APIR_COMMAND_TYPE_FORWARD error"; add an
APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS) entry in
apir_forward_error() alongside the existing
APIR_FORWARD_SUCCESS/NO_DISPATCH_FCT/TIMEOUT/BASE_INDEX entries so the enum maps
to its correct name. Ensure the symbol APIR_FORWARD_FAILED_TO_SYNC_STREAMS is
used exactly as defined in the enum.

In `@ggml/src/ggml-virtgpu/virtgpu.cpp`:
- Around line 533-542: The decoder instance response_dec used in
remote_call_prepare is declared static and causes data races; change it to
thread_local and ensure its state is reinitialized per call: set
response_dec.cur and response_dec.end as done already, and explicitly reset
response_dec.fatal (and any other relevant fields) before assigning *decoder;
mirror how encoder_buffer/enc are reinitialized in remote_call_prepare so
concurrent threads do not share or retain stale fatal flags (affects
response_dec, remote_call_prepare, and remote_call_finish).
- Around line 167-173: The format strings passed to GGML_ABORT (using
GGML_VIRTGPU and __func__) incorrectly include a closing parenthesis inside the
string literal ("%d | %s)" and "%d)"), causing an extra ')' to appear in logs;
update both calls (the GGML_ABORT invocation handling apir_ret and the branch
computing lib_ret) so the format strings end without the trailing ')' (e.g., "%d
| %s" and "%d") and place the final closing parenthesis as part of the
GGML_ABORT call itself, keeping references to __func__, apir_ret,
apir_load_library_error(apir_ret), and lib_ret unchanged.
- Around line 264-290: In virtgpu_open_device, avoid the fatal GGML_ABORT on
open() failure and instead log the error (GGML_LOG_ERROR) and return
APIR_ERROR_INITIALIZATION_FAILED so virtgpu_open can continue iterating devices;
additionally, before reading dev->nodes[DRM_NODE_RENDER] check that
dev->available_nodes has the DRM_NODE_RENDER bit set (e.g., if
(!(dev->available_nodes & (1 << DRM_NODE_RENDER))) log and return
APIR_ERROR_INITIALIZATION_FAILED) to prevent dereferencing a missing render
node; update both the initial node_path access and any subsequent uses of
dev->nodes[DRM_NODE_RENDER] accordingly, leaving virtgpu_open and
virtgpu_open_device behavior otherwise unchanged.
- Around line 126-146: The duplicated branch checks
APIR_LOAD_LIBRARY_ENV_VAR_MISSING twice, making the "symbols are missing" case
unreachable; update the third condition to check
APIR_LOAD_LIBRARY_SYMBOL_MISSING instead of APIR_LOAD_LIBRARY_ENV_VAR_MISSING so
that the branch invoking GGML_ABORT with the "some symbols are missing" message
is executed for symbol-missing errors (leave the other branches using
APIR_LOAD_LIBRARY_ENV_VAR_MISSING and APIR_LOAD_LIBRARY_CANNOT_OPEN, and keep
apir_load_library_error(ret) and GGML_ABORT usage unchanged).

---

Nitpick comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 71-85: The guest-provided device_handle is being cast to
ggml_backend_dev_t and used as a key—make this explicit and safe by validating
the raw uintptr_t before casting: add an early check in the dispatch path
(before casting to ggml_backend_dev_t and before calling get_backend_instance)
that confirms device_handle matches a known-good value (e.g., compare against
the global dev pointer cast to uintptr_t or call a new
is_valid_device_handle(uintptr_t) helper), and return an error
(apir_decoder_set_fatal and non-zero) if it fails; apply the same pattern in
backend_backend_cleanup where device_handle is decoded and cast so future code
cannot accidentally dereference an untrusted pointer.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp`:
- Around line 129-155: cleanup_device_extension currently calls
ggml_backend_free while holding device_extensions_mutex and ext->backends_mutex
which can deadlock if backend teardown re-enters device extension code; fix by
under device_extensions_mutex and ext->backends_mutex move pointers of
ext->backend_instances (instances) into a local vector, clear
ext->backend_instances and set ext->magic = 0 and remove ext from
device_extensions while still under the lock, then release both locks and
iterate the local vector to call ggml_backend_free, reset instance->magic and
delete each instance, and finally delete ext after locks are released — this
uses the symbols cleanup_device_extension, device_extensions_mutex,
backends_mutex, apir_device_extension::backend_instances, ggml_backend_free,
instance->magic and ext->magic to locate where to change.
- Around line 157-181: The code casts a uintptr_t backend_id from
create_backend_instance into a uint32_t out_backend_id, risking
truncation/collision if next_backend_id grows beyond UINT32_MAX; update the
implementation so types are consistent or guard the cast: either change
next_backend_id and create_backend_instance return type to uint32_t (and update
any map keys/usages) so backend_dispatch_initialize can assign directly, or add
an explicit bounds check in backend_dispatch_initialize (check backend_id <=
UINT32_MAX) and return APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED on overflow
before casting to uint32_t; reference backend_dispatch_initialize,
create_backend_instance, next_backend_id, out_backend_id and out_handle when
making the change.

In `@ggml/src/ggml-virtgpu/backend/backend.cpp`:
- Around line 88-94: The check for `!library_reg` is dead because `library_reg`
is assigned from `virgl_library_reg` or the compile-time
`GGML_DEFAULT_BACKEND_REG` (non-NULL); remove the unreachable if-block that logs
and returns `APIR_LOAD_LIBRARY_ENV_VAR_MISSING`. Instead, if you intended to
detect registration failure, perform that check against the actual backend
registration/return value from the registration call (refer to `library_reg`,
`virgl_library_reg`, and `GGML_DEFAULT_BACKEND_REG`) rather than testing
`library_reg` for NULL.

In `@ggml/src/ggml-virtgpu/ggml-backend.cpp`:
- Around line 63-75: The failure path in ggml_backend_remoting_device_init may
leave ctx->device_handle and ctx->backend_id partially written by
apir_backend_initialize; update the failure branch (after init_result !=
APIR_BACKEND_INITIALIZE_SUCCESS) to defensively clear ctx->device_handle and
ctx->backend_id (e.g., set ctx->device_handle = nullptr and ctx->backend_id = 0
or use memset on those fields) before logging via GGML_LOG_ERROR and returning
nullptr so the context is left in a consistent, zeroed state for any future
retries.

In `@ggml/src/ggml-virtgpu/regenerate_remoting.py`:
- Around line 148-172: The new generator creates apir_dispatch_command_name that
returns names like "device_get_name" which duplicates semantics of
backend_dispatch_command_name (returns "backend_device_get_name"); decide and
implement one of two fixes: either make apir_dispatch_command_name reuse or call
through to backend_dispatch_command_name (or vice-versa) to avoid duplicate
mapping logic, or explicitly document/distinguish the two by adding a clear
comment and different return prefixes; update the generator that builds
apir_dispatch_command_name (the loop creating clean_name and case for
{func['enum_name']}) to either prepend the "backend_" prefix to match
backend_dispatch_command_name or add a generated comment above the function
explaining the intended difference so callers are not confused.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp`:
- Around line 108-122: apir_backend_cleanup is ignoring the
ApirForwardReturnCode ret returned by REMOTE_CALL; update the function to check
ret after REMOTE_CALL and handle failures (at minimum log a warning/error and
avoid silently proceeding to remote_call_finish when cleanup failed) so
host-side backend leaks are visible; reference the local variables ret,
REMOTE_CALL invocation, and remote_call_finish within apir_backend_cleanup to
add a conditional that logs the failure (including ret) and performs appropriate
error-handling before returning.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 9-17: The macros REMOTE_CALL_PREPARE and REMOTE_CALL are coupled
because REMOTE_CALL_PREPARE intentionally declares
REMOTE_CALL_PREPARE_forward_flag and REMOTE_CALL_PREPARE_command_name outside
the do{...}while(0) block so REMOTE_CALL can reference
REMOTE_CALL_PREPARE_command_name; add a concise comment above the
REMOTE_CALL_PREPARE macro explaining this intentional scope leak, naming the
variables (REMOTE_CALL_PREPARE_forward_flag, REMOTE_CALL_PREPARE_command_name)
and warning that REMOTE_CALL must be used in the same scope after
REMOTE_CALL_PREPARE to avoid compile errors, to make the coupling explicit and
reduce fragility.

Comment on lines 80 to 85
// Get backend instance
apir_backend_instance* instance = get_backend_instance(device, backend_id);
if (instance == nullptr || instance->bck == nullptr) {
    apir_decoder_set_fatal(dec);
    return 1;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the file and examine its structure
find . -name "*backend-dispatched*" -type f

Repository: crc-org/llama.cpp

Length of output: 479


🏁 Script executed:

# Check the full implementation of get_backend_instance, cleanup_backend_instance, and related code
cd ggml/src/ggml-virtgpu/backend 2>/dev/null || find . -path "*ggml-virtgpu/backend*" -name "*.cpp" | head -5



🏁 Script executed:

# Search for the function implementations across the backend files
rg "get_backend_instance|cleanup_backend_instance|backend_backend_graph_compute|backend_backend_initialize|backend_backend_cleanup" --type=cpp -B2 -A10



🏁 Script executed:

wc -l ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp



🏁 Script executed:

# Read the implementation of get_backend_instance in full
sed -n '84,102p' ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp



🏁 Script executed:

# Read cleanup_backend_instance
sed -n '105,126p' ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp



🏁 Script executed:

# Read backend_backend_graph_compute around lines 130-145
sed -n '70,160p' ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp | cat -n



🏁 Script executed:

# Check how dispatch commands are invoked and whether they're serialized per context/backend_id
rg "apir_backend_dispatch|virgl_apir_context|backend_dispatch_t" --type=cpp -B3 -A5 | head -100



🏁 Script executed:

# Check if there's any comment or documentation about threading model
rg -i "thread|serial|concurrent|mutex|sync|atomic" ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp



🏁 Script executed:

# Check struct definitions for apir_device_extension and related synchronization
rg "struct apir_device_extension|struct.*context" --type=h -A15 | head -80



🏁 Script executed:

# Find the struct definition for apir_device_extension
rg "struct apir_device_extension" --type=h -A20



🏁 Script executed:

# Check how dispatch is called - is it single-threaded per context?
cat ggml/src/ggml-virtgpu/backend/backend.cpp | head -200



🏁 Script executed:

# Check if virgl_apir_context implies per-context serialization
rg "virgl_apir_context|ctx_id" ggml/src/ggml-virtgpu/backend/backend.cpp -B2 -A2



🏁 Script executed:

# Check if there's any async cleanup or background task that could race
rg -i "async|thread|queue|background" ggml/src/ggml-virtgpu/backend/*.cpp | head -30



🏁 Script executed:

# Check the magic validation - is it checked before dereferencing instance->bck in graph_compute?
sed -n '70,90p' ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp | cat -n



Hold the lock for the entire duration of backend instance use, or switch to reference counting (shared_ptr) to prevent deletion during compute.

get_backend_instance() releases backends_mutex immediately after lookup, creating a window where cleanup_backend_instance() can delete the instance. Between line 81 and lines 136–139, instance->bck may be freed. While get_backend_instance() validates instance->magic, this check is not repeated in graph_compute() before dereferencing. If the protocol relies on strict initialize→compute→cleanup ordering per backend_id with no concurrent access across contexts, document this invariant explicitly; otherwise, either extend the mutex hold to cover the compute operation or use a reference-counted wrapper to prevent premature deletion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 80
- 85, The code currently releases backends_mutex inside get_backend_instance(),
creating a race where cleanup_backend_instance() can free
apir_backend_instance->bck before graph_compute() uses it; to fix, either hold
backends_mutex for the entire duration of graph_compute()'s use of the instance
(acquire before get_backend_instance() and release after the
compute/dereference) or change apir_backend_instance to be reference-counted
(e.g., wrap in shared_ptr) and bump the refcount in get_backend_instance() and
drop it after graph_compute() finishes; update graph_compute(),
get_backend_instance(), and cleanup_backend_instance() accordingly and ensure
you validate instance->magic after acquiring the reference or lock before
dereferencing instance->bck.

Comment on lines 28 to 31
struct apir_backend_instance {
    ggml_backend_t bck;  // The actual backend
    uint32_t magic;      // For validation: 0xAB1234CD
};

⚠️ Potential issue | 🟡 Minor

Incorrect magic value in comment.

The comment on Line 30 says 0xAB1234CD but apir_backend_instance uses APIR_BACKEND_INSTANCE_MAGIC which is 0xCD4321BA (Line 45).

Proposed fix
 struct apir_backend_instance {
     ggml_backend_t bck;  // The actual backend
-    uint32_t magic;      // For validation: 0xAB1234CD
+    uint32_t magic;      // For validation: APIR_BACKEND_INSTANCE_MAGIC (0xCD4321BA)
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
struct apir_backend_instance {
    ggml_backend_t bck;  // The actual backend
    uint32_t magic;      // For validation: 0xAB1234CD
};
struct apir_backend_instance {
    ggml_backend_t bck;  // The actual backend
    uint32_t magic;      // For validation: APIR_BACKEND_INSTANCE_MAGIC (0xCD4321BA)
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.h` around lines 28 - 31, The
comment in struct apir_backend_instance incorrectly documents the magic value as
0xAB1234CD; update the inline comment next to the magic field in
apir_backend_instance to match the actual constant APIR_BACKEND_INSTANCE_MAGIC
(0xCD4321BA) so the doc and code are consistent (locate struct
apir_backend_instance and replace the wrong hex in the comment with
APIR_BACKEND_INSTANCE_MAGIC / 0xCD4321BA).

Comment on lines 108 to 118
// SECURITY: Validate buffer handle against tracked buffers to prevent
// guest VM from providing arbitrary host memory addresses
if (buffer) {
extern std::unordered_set<ggml_backend_buffer_t> backend_buffers;
if (backend_buffers.find(buffer) == backend_buffers.end()) {
GGML_LOG_WARN("ggml-virtgpu-backend: %s: Invalid buffer handle from guest: %p\n", __func__, (void*)buffer);
// Set fatal flag to prevent further processing with invalid handle
apir_decoder_set_fatal(dec);
return NULL;
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the definition and all accesses to backend_buffers
echo "=== All occurrences of backend_buffers ==="
rg -n 'backend_buffers' --type=cpp --type=h

echo ""
echo "=== Context around definition (likely std::unordered_set) ==="
rg -n 'std::unordered_set.*backend_buffers|backend_buffers.*=' --type=cpp --type=h -B2 -A2

echo ""
echo "=== Check for mutex/synchronization primitives ==="
rg -n 'mutex|lock|atomic' --type=cpp --type=h -C 1 | head -100

echo ""
echo "=== Check inline function definition ==="
rg -n 'inline.*apir_decode_ggml_buffer|apir_decode_ggml_buffer' --type=cpp --type=h -B3 -A15



Address thread-safety of backend_buffers — unprotected concurrent access.

The security validation logic is sound, but std::unordered_set<ggml_backend_buffer_t> backend_buffers is accessed without synchronization. The set is modified in apir_track_backend_buffer() and apir_untrack_backend_buffer() (in apir_cs_ggml-rpc-back.cpp), while simultaneously read in this validation check. If buffer allocation/deallocation occurs on a different thread than the decode path, there is a data race. Protect all accesses to backend_buffers with a mutex, or use a thread-safe container (e.g., std::shared_mutex with read-write locks, or a lock-free data structure).

Additionally, the extern declaration inside the inline function is valid but unconventional. Move it to file scope in a dedicated header for better discoverability and maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h` around lines 108 - 118,
The access to the global unordered_set backend_buffers in the validation block
is not thread-safe; modify the global to pair it with a synchronization
primitive (e.g., declare a std::shared_mutex backend_buffers_mutex alongside
backend_buffers at file scope in a shared header), move the extern declarations
out of the inline function into that header, then update readers (this
validation code in apir_cs_ggml.h) to take a shared/read lock (std::shared_lock)
before calling backend_buffers.find(...) and update apir_track_backend_buffer()
and apir_untrack_backend_buffer() to take an exclusive/unique lock
(std::unique_lock) when mutating the set so all accesses are synchronized while
preserving concurrent reads.

@topsail-bot
Copy link

topsail-bot bot commented Feb 20, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 05 minutes 50 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_cuda
prepare.podman.machine.remoting_env.env_extra:
    GGML_CUDA_DISABLE_GRAPHS: '1'

@kpouget
Copy link
Collaborator Author

kpouget commented Feb 20, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot
Copy link

topsail-bot bot commented Feb 20, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 15 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp (1)

10-58: ⚠️ Potential issue | 🟠 Major

Extend the virgl API to provide shmem size on the backend side, or pass the actual shmem size through the RPC protocol.

The validation gap is real: cgraph_size can be validated against sizeof(ggml_cgraph) but not against the actual mapped shmem size. The backend's virgl_apir_callbacks interface only provides get_shmem_ptr(res_id) which returns a pointer with no size metadata. Without access to the shmem allocation size, apir_new_decoder() could attempt to read past the mapped region if an adversary or bug causes size mismatch.

To fix this properly, either:

  1. Extend virgl_apir_callbacks to include a get_shmem_size(virgl_ctx_id, res_id) callback, or
  2. Include the actual shmem size in the RPC message alongside cgraph_size, then validate cgraph_size <= shmem_size before constructing the decoder.

The current architecture assumes client-side correctness (shmem allocated to exact cgraph_size on line 30 of virtgpu-forward-backend.cpp), but the backend cannot verify this assumption.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 10
- 58, The backend currently trusts cgraph_size from the RPC and only checks it
against sizeof(ggml_cgraph) in validate_graph_operation; to fix, ensure the
backend verifies cgraph_size does not exceed the actual shared-memory mapping
before creating the decoder: either add a get_shmem_size(virgl_ctx_id, res_id)
callback to virgl_apir_callbacks and call
ctx->iface->get_shmem_size(ctx->ctx_id, shmem_res_id) (use this size to check
cgraph_size <= shmem_size and call apir_decoder_set_fatal(dec)/return 1 on
failure), or modify the RPC to include the shmem size alongside cgraph_size
(decoded by apir_decode_size_t) and validate cgraph_size <= received_shmem_size
in backend_backend_graph_compute prior to calling apir_new_decoder / using
get_shmem_ptr; keep validate_graph_operation as a secondary check for minimum
sizes.
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)

76-87: ⚠️ Potential issue | 🟡 Minor

Map the new forward error code to a string.

apir_forward_error() doesn’t include APIR_FORWARD_FAILED_TO_SYNC_STREAMS, so Line 83–86 will fall through to the generic "Unknown ..." message for that code. Add it to the mapping.

💡 Suggested fix
     APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
     APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
     APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+    APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
     APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 76 - 87,
The apir_forward_error function is missing a mapping for the new error code
APIR_FORWARD_FAILED_TO_SYNC_STREAMS; update the mapping in apir_forward_error
(the block that calls APIR_FORWARD_ERROR(...)) to include
APIR_FORWARD_FAILED_TO_SYNC_STREAMS so that apir_forward_error returns the
proper string for that enum value instead of falling through to the generic
"Unknown ..." message.
♻️ Duplicate comments (2)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (2)

31-32: ⚠️ Potential issue | 🟡 Minor

Add GGML_VIRTGPU prefix to abort message.

Line 31’s abort omits the prefix used elsewhere; align for consistent logging.

🛠️ Suggested fix
-            GGML_ABORT("backend function '%s' failed (return code: %d)",
+            GGML_ABORT(GGML_VIRTGPU "backend function '%s' failed (return code: %d)",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 31 - 32, The
GGML_ABORT invocation inside the remote call path omits the "GGML_VIRTGPU"
prefix used elsewhere; update the GGML_ABORT call for REMOTE_CALL_PREPARE to
include a "GGML_VIRTGPU: " (or similar) prefix in its format string so the abort
message is consistent with other virtgpu logs—locate the GGML_ABORT(...) that
references REMOTE_CALL_PREPARE_command_name and ret_name and prepend the prefix
to the message format.

19-33: ⚠️ Potential issue | 🟠 Major

Avoid unconditional abort on non‑zero backend return codes.

Line 30–32 aborts on any backend error, which prevents callers from handling non‑zero return codes. If any call site expects to interpret these codes, this becomes a hard crash. Consider a non‑aborting variant or a flag that lets callers decide.

#!/bin/bash
# Inspect REMOTE_CALL usage and how callers handle ret_name
rg -nP --type=cpp --type=h -C3 '\bREMOTE_CALL\s*\('
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 19 - 33, The
REMOTE_CALL macro unconditionally aborts when the backend returns a non-zero
ApirForwardReturnCode (checked with REMOTE_CALL_PREPARE_command_name), which
prevents callers from handling errors; update the macro (or add a new variant
like REMOTE_CALL_NOABORT) so that after computing ret_name =
(ApirForwardReturnCode)(ret_name - APIR_FORWARD_BASE_INDEX) it does NOT call
GGML_ABORT on non-zero return codes but instead leaves ret_name set for the
caller to inspect, or accepts an extra flag/handler parameter that controls
whether to abort or return the code; change the implementation of REMOTE_CALL
(and update any call sites using REMOTE_CALL_PREPARE_command_name /
ApirForwardReturnCode) to use the non‑aborting variant where callers need to
interpret errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 10-58: The backend currently trusts cgraph_size from the RPC and
only checks it against sizeof(ggml_cgraph) in validate_graph_operation; to fix,
ensure the backend verifies cgraph_size does not exceed the actual shared-memory
mapping before creating the decoder: either add a get_shmem_size(virgl_ctx_id,
res_id) callback to virgl_apir_callbacks and call
ctx->iface->get_shmem_size(ctx->ctx_id, shmem_res_id) (use this size to check
cgraph_size <= shmem_size and call apir_decoder_set_fatal(dec)/return 1 on
failure), or modify the RPC to include the shmem size alongside cgraph_size
(decoded by apir_decode_size_t) and validate cgraph_size <= received_shmem_size
in backend_backend_graph_compute prior to calling apir_new_decoder / using
get_shmem_ptr; keep validate_graph_operation as a secondary check for minimum
sizes.

In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 76-87: The apir_forward_error function is missing a mapping for
the new error code APIR_FORWARD_FAILED_TO_SYNC_STREAMS; update the mapping in
apir_forward_error (the block that calls APIR_FORWARD_ERROR(...)) to include
APIR_FORWARD_FAILED_TO_SYNC_STREAMS so that apir_forward_error returns the
proper string for that enum value instead of falling through to the generic
"Unknown ..." message.

---

Duplicate comments:
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 31-32: The GGML_ABORT invocation inside the remote call path omits
the "GGML_VIRTGPU" prefix used elsewhere; update the GGML_ABORT call for
REMOTE_CALL_PREPARE to include a "GGML_VIRTGPU: " (or similar) prefix in its
format string so the abort message is consistent with other virtgpu logs—locate
the GGML_ABORT(...) that references REMOTE_CALL_PREPARE_command_name and
ret_name and prepend the prefix to the message format.
- Around line 19-33: The REMOTE_CALL macro unconditionally aborts when the
backend returns a non-zero ApirForwardReturnCode (checked with
REMOTE_CALL_PREPARE_command_name), which prevents callers from handling errors;
update the macro (or add a new variant like REMOTE_CALL_NOABORT) so that after
computing ret_name = (ApirForwardReturnCode)(ret_name - APIR_FORWARD_BASE_INDEX)
it does NOT call GGML_ABORT on non-zero return codes but instead leaves ret_name
set for the caller to inspect, or accepts an extra flag/handler parameter that
controls whether to abort or return the code; change the implementation of
REMOTE_CALL (and update any call sites using REMOTE_CALL_PREPARE_command_name /
ApirForwardReturnCode) to use the non‑aborting variant where callers need to
interpret errors.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfa7bd8 and c719fe9.

📒 Files selected for processing (12)
  • ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp
  • ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
  • ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
  • ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
  • ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
  • ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
  • ggml/src/ggml-virtgpu/ggml-backend-device.cpp
  • ggml/src/ggml-virtgpu/regenerate_remoting.py
  • ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
  • ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
  • ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
  • ggml/src/ggml-virtgpu/virtgpu.cpp
🚧 Files skipped from review as they are similar to previous changes (8)
  • ggml/src/ggml-virtgpu/virtgpu.cpp
  • ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
  • ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
  • ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
  • ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
  • ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
  • ggml/src/ggml-virtgpu/ggml-backend-device.cpp
  • ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)

30-33: Unconditional abort on non-zero backend return code — verify no caller relies on error-code propagation.

The GGML_VIRTGPU prefix is now present (addressing the first part of the prior review). The unconditional GGML_ABORT on any non-zero ret_name however is still in place. The prior review noted that apir_backend_initialize() in virtgpu-forward-backend.cpp contained explicit conditional logic to handle both success and APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED, making that logic dead code once this macro is in effect. Please confirm whether that conditional handling has been removed or if there are any callers that still expect a non-abort error-return path.

#!/bin/bash
# Description: Check apir_backend_initialize for remaining dead error-handling code
# and all REMOTE_CALL usage to confirm no caller inspects ret_name post-macro.

echo "=== apir_backend_initialize implementation ==="
rg -n 'apir_backend_initialize' -A 40 -B 2 --type=cpp --type=h

echo ""
echo "=== All REMOTE_CALL macro invocations with surrounding context ==="
rg -n 'REMOTE_CALL\b' --type=cpp --type=h -A 6 -B 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 30 - 33, The
current macro unconditionally calls GGML_ABORT when ret_name != 0 (using
GGML_VIRTGPU, REMOTE_CALL_PREPARE_command_name and ret_name), which can deaden
conditional logic in callers such as apir_backend_initialize in
virtgpu-forward-backend.cpp; inspect all REMOTE_CALL invocations and
apir_backend_initialize to determine whether any callers expect non-abort error
propagation, and then either (a) change the macro to return/propagate the error
code instead of calling GGML_ABORT, or (b) update the callers to handle only
aborting failures and remove dead conditional branches—ensure
REMOTE_CALL_PREPARE_command_name, ret_name, and GGML_ABORT usage are updated
consistently to match the chosen semantics.
🧹 Nitpick comments (2)
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)

86-87: APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3) listed after APIR_FORWARD_BASE_INDEX (value 4) — out of numeric order.

The macro-based if-chain makes this functionally correct (each check is independent), but placing a lower-valued code after a higher sentinel is confusing. Move APIR_FORWARD_FAILED_TO_SYNC_STREAMS before APIR_FORWARD_BASE_INDEX to maintain sequential ordering consistent with the enum definition above.

♻️ Proposed fix
     APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
     APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
     APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+    APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
     APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
-    APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 86 - 87,
The two APIR_FORWARD_ERROR entries are out of numeric order: move
APIR_FORWARD_FAILED_TO_SYNC_STREAMS so it appears before APIR_FORWARD_BASE_INDEX
in the list to match the enum ordering; update the sequence where
APIR_FORWARD_FAILED_TO_SYNC_STREAMS and APIR_FORWARD_BASE_INDEX are declared
(look for the APIR_FORWARD_ERROR macro invocations in api_remoting.h) so the
lower-valued APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3) precedes
APIR_FORWARD_BASE_INDEX (value 4).
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)

9-11: REMOTE_CALL_PREPARE_forward_flag leaks into outer scope unnecessarily; both declarations create a hidden compile-time coupling.

Two issues:

  1. REMOTE_CALL_PREPARE_forward_flag is only consumed inside the do-while block yet is declared at function scope. It can be moved inside the block to reduce noise.
  2. Both REMOTE_CALL_PREPARE_forward_flag and REMOTE_CALL_PREPARE_command_name live at function scope, so calling REMOTE_CALL_PREPARE twice in the same scope produces a redeclaration compile error. More importantly, REMOTE_CALL silently depends on REMOTE_CALL_PREPARE_command_name being in scope — calling REMOTE_CALL without a preceding REMOTE_CALL_PREPARE in the same scope will fail to compile with a non-obvious error.

Consider scoping REMOTE_CALL_PREPARE_forward_flag inside the do-while, and documenting (or enforcing via a wrapper) the REMOTE_CALL_PREPARE → REMOTE_CALL sequencing contract.

♻️ Proposed fix (move forward_flag into the do-while)
 #define REMOTE_CALL_PREPARE(gpu_dev_name, encoder_name, apir_command_type__) \
-    int32_t REMOTE_CALL_PREPARE_forward_flag = (int32_t) apir_command_type__; \
     const char* REMOTE_CALL_PREPARE_command_name = apir_dispatch_command_name(apir_command_type__); \
     do {                                                                \
-        encoder_name         = remote_call_prepare(gpu_dev_name, APIR_COMMAND_TYPE_FORWARD, REMOTE_CALL_PREPARE_forward_flag); \
+        encoder_name         = remote_call_prepare(gpu_dev_name, APIR_COMMAND_TYPE_FORWARD, (int32_t)(apir_command_type__)); \
         if (!encoder_name) {                                            \
             GGML_ABORT(GGML_VIRTGPU "%s: failed to prepare the remote call encoder", __func__); \
         }                                                               \
     } while (0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 9 - 11, Move the
temporary variables out of function scope: modify the REMOTE_CALL_PREPARE macro
so REMOTE_CALL_PREPARE_forward_flag is declared inside the do-while block (not
at function scope) and likewise ensure REMOTE_CALL_PREPARE_command_name is
scoped locally inside that block (e.g., declare const char* within the
do-while). This eliminates redeclaration errors when REMOTE_CALL_PREPARE is
invoked multiple times and avoids hidden coupling with REMOTE_CALL; also add a
short comment in the macro noting the required REMOTE_CALL_PREPARE → REMOTE_CALL
sequencing contract (or wrap both in a helper macro/function) so callers know
REMOTE_CALL depends on the prepare macro having run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/backend/VirtGPU.md`:
- Line 155: Typo in the documentation: change the phrase "Share-memory size" to
"Shared-memory size" in the VirtGPU.md content (look for the string
"Share-memory size" in docs/backend/VirtGPU.md) so the heading reads
"**Shared-memory size**: with the `libkrun` hypervisor, the RAM + VRAM".

In `@ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h`:
- Around line 88-93: The switch uses undefined case labels
APIR_COMMAND_TYPE_BACKEND_INITIALIZE and APIR_COMMAND_TYPE_BACKEND_CLEANUP
causing a compile error; add these two identifiers to the ApirBackendCommandType
enum (using explicit numeric values consistent with the backend block, e.g.,
place them before/after APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE or assign next
free values) and then update APIR_BACKEND_DISPATCH_TABLE_COUNT to reflect the
new total so the dispatch table size matches the enum.

---

Duplicate comments:
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 30-33: The current macro unconditionally calls GGML_ABORT when
ret_name != 0 (using GGML_VIRTGPU, REMOTE_CALL_PREPARE_command_name and
ret_name), which can deaden conditional logic in callers such as
apir_backend_initialize in virtgpu-forward-backend.cpp; inspect all REMOTE_CALL
invocations and apir_backend_initialize to determine whether any callers expect
non-abort error propagation, and then either (a) change the macro to
return/propagate the error code instead of calling GGML_ABORT, or (b) update the
callers to handle only aborting failures and remove dead conditional
branches—ensure REMOTE_CALL_PREPARE_command_name, ret_name, and GGML_ABORT usage
are updated consistently to match the chosen semantics.

---

Nitpick comments:
In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 86-87: The two APIR_FORWARD_ERROR entries are out of numeric
order: move APIR_FORWARD_FAILED_TO_SYNC_STREAMS so it appears before
APIR_FORWARD_BASE_INDEX in the list to match the enum ordering; update the
sequence where APIR_FORWARD_FAILED_TO_SYNC_STREAMS and APIR_FORWARD_BASE_INDEX
are declared (look for the APIR_FORWARD_ERROR macro invocations in
api_remoting.h) so the lower-valued APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value
3) precedes APIR_FORWARD_BASE_INDEX (value 4).

In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 9-11: Move the temporary variables out of function scope: modify
the REMOTE_CALL_PREPARE macro so REMOTE_CALL_PREPARE_forward_flag is declared
inside the do-while block (not at function scope) and likewise ensure
REMOTE_CALL_PREPARE_command_name is scoped locally inside that block (e.g.,
declare const char* within the do-while). This eliminates redeclaration errors
when REMOTE_CALL_PREPARE is invoked multiple times and avoids hidden coupling
with REMOTE_CALL; also add a short comment in the macro noting the required
REMOTE_CALL_PREPARE → REMOTE_CALL sequencing contract (or wrap both in a helper
macro/function) so callers know REMOTE_CALL depends on the prepare macro having
run.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c719fe9 and 28d3675.

📒 Files selected for processing (5)
  • docs/backend/VirtGPU.md
  • ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
  • ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
  • ggml/src/ggml-virtgpu/regenerate_remoting.py
  • ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • ggml/src/ggml-virtgpu/regenerate_remoting.py

@kpouget kpouget force-pushed the fixes branch 2 times, most recently from 96021ce to 2240508 on February 23, 2026 20:11
@kpouget
Collaborator Author

kpouget commented Feb 23, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot

topsail-bot bot commented Feb 23, 2026

🔴 Test of 'mac_ai test pre_cleanup_ci' failed after 00 hours 00 minutes 04 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

Failure indicator: Empty. (See run.log)

@kpouget
Collaborator Author

kpouget commented Feb 23, 2026

/test topsail remoting_mac
/cluster mac

1 similar comment
@kpouget
Collaborator Author

kpouget commented Feb 23, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot

topsail-bot bot commented Feb 23, 2026

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 31 minutes 47 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

Failure indicator: Empty. (See run.log)

@topsail-bot

topsail-bot bot commented Feb 23, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 22 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

@kpouget
Collaborator Author

kpouget commented Feb 24, 2026

/test topsail remoting_mac
/cluster mac

@kpouget
Collaborator Author

kpouget commented Feb 24, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot

topsail-bot bot commented Feb 24, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 33 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac

tdakhran and others added 20 commits February 24, 2026 14:27
* model : Update label for LFM2-24B-A2B

```
❯ build/bin/llama-bench -m /data/playground/checkpoints/LFM2-24B-A2B-Preview-Q4_0.gguf,/data/playground/checkpoints/LFM2-8B-A1B-Q4_0.gguf -p 1 -n 0
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2moe 24B.A2B Q4_0           |  12.54 GiB |    23.84 B | CPU        |      10 |             pp1 |         30.35 ± 2.49 |
| lfm2moe 8B.A1B Q4_0            |   4.41 GiB |     8.34 B | CPU        |      10 |             pp1 |         49.24 ± 1.93 |
```

* Remove extra line
* gguf : prevent integer overflow for ggml_context mem size

* ggml : fix int overflows in ggml_new_object()

* gguf : prevent string exhaustion

* gguf : prevent array elements exhaustion

* ggml : fix negative tensor type oob

* py : assert that alignment is non-zero power of 2

* ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object

* gguf-py : error on duplicate keys when reading

* py : restore tensor_fields

* enforce proper alignment in add_custom_alignment

* gguf : better name

* gguf : fix ctx size for no_alloc == true

* gguf : minor print fix

* ggml : print values when overflow

* ggml : remove deprecated ggml_type_sizef()

* ggml : relax ggml_type asserts to debug-only

* gguf : add mem_size overflow test

* gguf : add file size check for arrays

* ggml : relax asserts for ggml_get_type_traits()

* flake8 fix

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
…outer mode (#19854)

* server: fix query params lost when proxying requests in multi-model router mode

* server: re-encode query params using httplib::encode_query_component in proxy
This patch adds consistency checks in the
ggml-virtgpu-backend (running on the host side) to ensure that the
data received from the guest is consistent (valid pointers, valid
sizes and offsets).
…thods

```
  1. bck->iface.synchronize(bck)
  2. buft->iface.get_alloc_size(buft, op)
  3. buft->iface.get_max_size(buft)
```

These three methods are optional in the GGML interface. `get_max_size`
was already properly defaulted, but `backend synchronize` and `buft
get_alloc_size` would have segfaulted the backend if not implemented.
…iles

Compilation breaks when ordered alphabetically.
@kpouget kpouget force-pushed the fixes branch 2 times, most recently from 44e01f0 to 0f065dc on February 25, 2026 09:04
@kpouget
Collaborator Author

kpouget commented Feb 25, 2026

/test topsail remoting_mac
/cluster mac

@topsail-bot

topsail-bot bot commented Feb 25, 2026

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 26 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail
PR_POSITIONAL_ARG_1: remoting_mac
