Conversation
Walkthrough

Adds runtime validation and safety checks across the virtgpu remoting backend: graph and buffer validators, decoder/handle sanity checks, backend init/sync guards, new APIR return/mapping helpers, command-name generation changes, and small logging and name-prefixing updates.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ❌ 3 failed checks (3 warnings)
/test topsail remoting_cuda

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 05 minutes 54 seconds. • Link to the test results. • Link to the reports index.

/test topsail remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 00 minutes 06 seconds. • Link to the test results. • No reports index generated.

/test topsail remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 00 minutes 05 seconds. • Link to the test results. • No reports index generated.

/test topsail remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 36 minutes 44 seconds. • Link to the test results. • No reports index generated.

/test topsail remoting_cuda
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)
83-88: ⚠️ Potential issue | 🟡 Minor

Missing `APIR_FORWARD_FAILED_TO_SYNC_STREAMS` in the `apir_forward_error` mapping.

The new enum value `APIR_FORWARD_FAILED_TO_SYNC_STREAMS = 3` (Line 38) has no corresponding entry in `apir_forward_error()`. If this code is returned, the function falls through to "Unknown APIR_COMMAND_TYPE_FORWARD error" instead of producing the correct name. The `REMOTE_CALL` macro in `virtgpu-forward-impl.h` calls `apir_forward_error(ret_name)` when `ret_name < APIR_FORWARD_BASE_INDEX`, so this will produce a misleading error message.

Proposed fix:

```diff
 APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
 APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
 APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
 APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 83 - 88, The apir_forward_error() function is missing a case for the new enum value APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3), so calls from REMOTE_CALL (in virtgpu-forward-impl.h) can fall through to the generic "Unknown APIR_COMMAND_TYPE_FORWARD error"; add an APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS) entry in apir_forward_error() alongside the existing APIR_FORWARD_SUCCESS/NO_DISPATCH_FCT/TIMEOUT/BASE_INDEX entries so the enum maps to its correct name. Ensure the symbol APIR_FORWARD_FAILED_TO_SYNC_STREAMS is used exactly as defined in the enum.

ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp (1)
135-144: ⚠️ Potential issue | 🟡 Minor

Add validation for decoded tensor pointers in `set_tensor`, `get_tensor`, and `cpy_tensor` functions.

Tensor pointers decoded at lines 53, 96, and 137-138 are passed directly to buffer interface methods without null or fatal state checks, unlike the buffer handle validation at lines 27, 46, and 89. If a malicious guest sends invalid tensor handles, this could cause crashes in the backend.

Add tensor validation after decoding. Example fix for `cpy_tensor` (applies to `set_tensor` and `get_tensor` similarly):

```diff
 const ggml_tensor * src; // safe to remove the const qualifier here
 src = apir_decode_ggml_tensor(dec);
 ggml_tensor * dst = (ggml_tensor *) (uintptr_t) apir_decode_ggml_tensor(dec);
+if (!src || !dst || apir_decoder_get_fatal(dec)) {
+    GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Invalid tensor handle from guest\n", __func__);
+    return 1;
+}
+
 bool ret = buffer->iface.cpy_tensor(buffer, src, (ggml_tensor *) dst);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp` around lines 135 - 144, Decoded tensor pointers returned by apir_decode_ggml_tensor are used without validation in the set_tensor/get_tensor/cpy_tensor flow; add checks after each apir_decode_ggml_tensor call (the ones producing src and dst in the shown snippet and the analogous decodes in set_tensor and get_tensor) to ensure they are non-null and not in a fatal/invalid state before calling buffer->iface.cpy_tensor / buffer->iface.set_tensor / buffer->iface.get_tensor; if validation fails, return an error (e.g., non-zero) and avoid calling the buffer interface. Use the same validation pattern you already use for the buffer handle (check the decoded pointer, log or handle the error, and early return) and apply it to the functions referencing apir_decode_ggml_tensor, src, dst, and any casts to (ggml_tensor *).

ggml/src/ggml-virtgpu/virtgpu.cpp (4)
533-542: ⚠️ Potential issue | 🟠 Major

`static apir_decoder response_dec` is a data race — should be `thread_local`.

`remote_call_prepare` uses `thread_local` for both `encoder_buffer` and `enc`, but `response_dec` at line 535 is `static` — shared across all threads. Concurrent calls from different threads will race on `response_dec.cur`/`response_dec.end`, and the decoder pointer written to `*decoder` can be overwritten before the caller reads it.

Additionally, `response_dec.fatal` is never reset between calls (unlike `enc`, which is re-initialized with `enc = {...}` on every call), so a stale `fatal` flag from a prior error can cause spurious failures reported by `remote_call_finish`.

🐛 Proposed fix:

```diff
-    static apir_decoder response_dec;
+    thread_local apir_decoder response_dec;
+    response_dec.fatal = false;
     response_dec.cur = (char *) gpu->reply_shmem.mmap_ptr + sizeof(*atomic_reply_notif);
     response_dec.end = (char *) gpu->reply_shmem.mmap_ptr + gpu->reply_shmem.mmap_size;
     *decoder = &response_dec;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 533 - 542, The decoder instance response_dec used in remote_call_prepare is declared static and causes data races; change it to thread_local and ensure its state is reinitialized per call: set response_dec.cur and response_dec.end as done already, and explicitly reset response_dec.fatal (and any other relevant fields) before assigning *decoder; mirror how encoder_buffer/enc are reinitialized in remote_call_prepare so concurrent threads do not share or retain stale fatal flags (affects response_dec, remote_call_prepare, and remote_call_finish).
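The `static` vs `thread_local` distinction can be demonstrated in isolation. A minimal sketch (the `demo_decoder` struct and `prepare_decoder` function are illustrative stand-ins, not the actual virtgpu types): each thread calling `prepare_decoder` gets its own decoder instance, and the explicit per-call `fatal` reset avoids carrying a stale error flag into the next remote call.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for the decoder state discussed above.
struct demo_decoder {
    const char * cur = nullptr;
    const char * end = nullptr;
    bool fatal = false;
};

// With `static`, every thread would share one instance; `thread_local`
// gives each thread its own copy. Resetting `fatal` on every call keeps
// a prior error from leaking into the next call on the same thread.
demo_decoder * prepare_decoder(const char * buf, size_t len) {
    thread_local demo_decoder dec;
    dec.fatal = false; // explicit per-call reset
    dec.cur = buf;
    dec.end = buf + len;
    return &dec;
}
```

Within one thread the same instance is reused across calls, so the pointer handed back to the caller stays valid until that thread makes its next call, mirroring the intended `remote_call_prepare`/`remote_call_finish` lifetime.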
167-173: ⚠️ Potential issue | 🟡 Minor

Stray `)` embedded inside format string literals.

The format strings at line 167 and line 172 end with `%d | %s)"` and `%d)"` respectively — the `)` is inside the string literal and will appear literally in the log output (e.g., `"apir code=3)"`). The string fix itself is a good improvement.

✏️ Proposed fix (line 172 shown; apply the same pattern to line 167):

```diff
-    "%s: the API Remoting backend library failed to initialize its backend library: apir code=%d)", __func__,
+    "%s: the API Remoting backend library failed to initialize its backend library: apir code=%d", __func__,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 167 - 173, The format strings passed to GGML_ABORT (using GGML_VIRTGPU and __func__) incorrectly include a closing parenthesis inside the string literal ("%d | %s)" and "%d)"), causing an extra ')' to appear in logs; update both calls (the GGML_ABORT invocation handling apir_ret and the branch computing lib_ret) so the format strings end without the trailing ')' (e.g., "%d | %s" and "%d") and place the final closing parenthesis as part of the GGML_ABORT call itself, keeping references to __func__, apir_ret, apir_load_library_error(apir_ret), and lib_ret unchanged.
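The effect of the stray parenthesis is easy to see with plain `snprintf` (a toy reproduction; `format_apir_code` is an illustrative helper, unrelated to the actual `GGML_ABORT` call):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// A ')' inside the format string literal is emitted verbatim, so the
// formatted message ends in "code=3)" with no matching '('.
int format_apir_code(char * out, size_t n, int code, bool keep_stray_paren) {
    return keep_stray_paren
        ? snprintf(out, n, "apir code=%d)", code)  // buggy: stray ')'
        : snprintf(out, n, "apir code=%d", code);  // fixed
}
```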
264-290: ⚠️ Potential issue | 🟠 Major

`GGML_ABORT` on `open()` failure prevents enumeration fallback; missing `available_nodes` guard.

Two issues:

1. Abort in loop: `GGML_ABORT` at line 269 is fatal/noreturn. `virtgpu_open` calls this function in a loop over all DRM devices and continues on non-success. If `open()` fails for any device (e.g., an integrated GPU node with restricted render access), the process aborts before trying subsequent devices. Should be `GGML_LOG_ERROR` + `return APIR_ERROR_INITIALIZATION_FAILED`.

2. Missing `available_nodes` guard: `dev->nodes[DRM_NODE_RENDER]` is only valid when bit `DRM_NODE_RENDER` is set in `dev->available_nodes`. Accessing `nodes[DRM_NODE_RENDER]` unconditionally can dereference a NULL or garbage pointer on devices that lack a render node.

🐛 Proposed fix:

```diff
 static virt_gpu_result_t virtgpu_open_device(virtgpu * gpu, const drmDevicePtr dev) {
+    if (!(dev->available_nodes & (1 << DRM_NODE_RENDER))) {
+        return APIR_ERROR_INITIALIZATION_FAILED;
+    }
+
     const char * node_path = dev->nodes[DRM_NODE_RENDER];

     int fd = open(node_path, O_RDWR | O_CLOEXEC);
     if (fd < 0) {
-        GGML_ABORT(GGML_VIRTGPU
-                   "%s: failed to open %s", __func__, node_path);
-        return APIR_ERROR_INITIALIZATION_FAILED;
+        GGML_LOG_ERROR(GGML_VIRTGPU
+                       "%s: failed to open %s: %s\n", __func__, node_path, strerror(errno));
+        return APIR_ERROR_INITIALIZATION_FAILED;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 264 - 290, In virtgpu_open_device, avoid the fatal GGML_ABORT on open() failure and instead log the error (GGML_LOG_ERROR) and return APIR_ERROR_INITIALIZATION_FAILED so virtgpu_open can continue iterating devices; additionally, before reading dev->nodes[DRM_NODE_RENDER] check that dev->available_nodes has the DRM_NODE_RENDER bit set (e.g., if (!(dev->available_nodes & (1 << DRM_NODE_RENDER))) log and return APIR_ERROR_INITIALIZATION_FAILED) to prevent dereferencing a missing render node; update both the initial node_path access and any subsequent uses of dev->nodes[DRM_NODE_RENDER] accordingly, leaving virtgpu_open and virtgpu_open_device behavior otherwise unchanged.
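The `available_nodes` guard can be sketched against a simplified stand-in for libdrm's device struct (the `demo_drm_device` type and node constants below are illustrative; the real fields come from libdrm's `xf86drm.h`):

```cpp
#include <cassert>
#include <cstring>

// Illustrative model of the drmDevice fields used above.
enum { DEMO_NODE_PRIMARY = 0, DEMO_NODE_RENDER = 2, DEMO_NODE_MAX = 3 };

struct demo_drm_device {
    int available_nodes;                // bitmask of valid entries in nodes[]
    const char * nodes[DEMO_NODE_MAX];  // only valid where the bit is set
};

// Guard pattern from the review: test the bit before indexing nodes[],
// so devices without a render node are skipped instead of dereferenced.
const char * render_node_path(const demo_drm_device * dev) {
    if (!(dev->available_nodes & (1 << DEMO_NODE_RENDER))) {
        return nullptr; // no render node: caller should try the next device
    }
    return dev->nodes[DEMO_NODE_RENDER];
}
```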
126-146: ⚠️ Potential issue | 🔴 Critical

Dead `else if` branch — duplicate condition prevents symbol-missing error handling.

Line 137 repeats the exact condition from line 126 (`ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING`), making the branch unreachable. Its message references "symbols are missing", which indicates the intended check was `APIR_LOAD_LIBRARY_SYMBOL_MISSING`. As-is, symbol-missing errors fall through to the generic `else`, producing a misleading message.

🐛 Proposed fix:

```diff
-    } else if (ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING) {
+    } else if (ret == APIR_LOAD_LIBRARY_SYMBOL_MISSING) {
         GGML_ABORT(GGML_VIRTGPU
                    "%s: could not load the backend library, some symbols are missing. "
                    "Make sure virglrenderer is correctly configured by the hypervisor. (%s) ",
                    __func__, apir_load_library_error(ret));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu.cpp` around lines 126 - 146, The duplicated branch checks APIR_LOAD_LIBRARY_ENV_VAR_MISSING twice, making the "symbols are missing" case unreachable; update the third condition to check APIR_LOAD_LIBRARY_SYMBOL_MISSING instead of APIR_LOAD_LIBRARY_ENV_VAR_MISSING so that the branch invoking GGML_ABORT with the "some symbols are missing" message is executed for symbol-missing errors (leave the other branches using APIR_LOAD_LIBRARY_ENV_VAR_MISSING and APIR_LOAD_LIBRARY_CANNOT_OPEN, and keep apir_load_library_error(ret) and GGML_ABORT usage unchanged).
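The unreachable-branch effect is easy to reproduce in miniature (the enum names below are shortened stand-ins for the `APIR_LOAD_LIBRARY_*` constants, and the classifier functions are illustrative):

```cpp
#include <cassert>

enum demo_load_err { ENV_VAR_MISSING = 1, SYMBOL_MISSING = 2, CANNOT_OPEN = 3 };

// Mirrors the bug: the second branch repeats the first condition, so
// SYMBOL_MISSING can never reach the "symbols missing" classification
// and silently falls through to the generic fallback instead.
int classify_buggy(int ret) {
    if (ret == ENV_VAR_MISSING) return 1;
    else if (ret == ENV_VAR_MISSING) return 2; // unreachable: duplicate condition
    else return 0;                             // generic fallback
}

int classify_fixed(int ret) {
    if (ret == ENV_VAR_MISSING) return 1;
    else if (ret == SYMBOL_MISSING) return 2;  // corrected condition
    else return 0;
}
```

Compilers can flag this class of bug: GCC's `-Wduplicated-cond` warns when an `else if` repeats an earlier condition.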
🧹 Nitpick comments (8)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)
9-17: Macro-scoped variables leak into the enclosing scope by design — document the coupling.

`REMOTE_CALL_PREPARE_forward_flag` and `REMOTE_CALL_PREPARE_command_name` are intentionally declared outside the `do { ... } while(0)` block so that `REMOTE_CALL` can reference `REMOTE_CALL_PREPARE_command_name`. This implicit coupling between the two macros is functional but fragile — using `REMOTE_CALL` without a preceding `REMOTE_CALL_PREPARE` in the same scope will cause a compile error (which is acceptable), but the intent would be clearer with a brief comment.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 9 - 17, The macros REMOTE_CALL_PREPARE and REMOTE_CALL are coupled because REMOTE_CALL_PREPARE intentionally declares REMOTE_CALL_PREPARE_forward_flag and REMOTE_CALL_PREPARE_command_name outside the do{...}while(0) block so REMOTE_CALL can reference REMOTE_CALL_PREPARE_command_name; add a concise comment above the REMOTE_CALL_PREPARE macro explaining this intentional scope leak, naming the variables (REMOTE_CALL_PREPARE_forward_flag, REMOTE_CALL_PREPARE_command_name) and warning that REMOTE_CALL must be used in the same scope after REMOTE_CALL_PREPARE to avoid compile errors, to make the coupling explicit and reduce fragility.

ggml/src/ggml-virtgpu/ggml-backend.cpp (1)
63-75: Initialization failure path is sound, but consider zeroing `ctx` fields on failure.

If `apir_backend_initialize` partially writes to `ctx->device_handle` or `ctx->backend_id` before failing internally, those fields will hold stale/partial values after the `nullptr` return. A subsequent retry of `ggml_backend_remoting_device_init` on the same device would call `apir_backend_initialize` again, which would overwrite them — so this is likely fine in practice. Still, zeroing on failure would be a defensive improvement.

Proposed defensive zeroing on failure:

```diff
 if (init_result != APIR_BACKEND_INITIALIZE_SUCCESS) {
     // Backend initialization failed
     GGML_LOG_ERROR(GGML_VIRTGPU "%s: Backend initialization failed with result=%d\n", __func__, init_result);
+    ctx->device_handle = 0;
+    ctx->backend_id = 0;
     return nullptr;
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/ggml-backend.cpp` around lines 63 - 75, The failure path in ggml_backend_remoting_device_init may leave ctx->device_handle and ctx->backend_id partially written by apir_backend_initialize; update the failure branch (after init_result != APIR_BACKEND_INITIALIZE_SUCCESS) to defensively clear ctx->device_handle and ctx->backend_id (e.g., set ctx->device_handle = nullptr and ctx->backend_id = 0 or use memset on those fields) before logging via GGML_LOG_ERROR and returning nullptr so the context is left in a consistent, zeroed state for any future retries.

ggml/src/ggml-virtgpu/regenerate_remoting.py (1)
148-172: New `apir_dispatch_command_name` generator produces clean enum-to-name mapping.

The generated switch function looks correct. Note that the generated codebase will now have two similar name-mapping functions: `apir_dispatch_command_name` (in `apir_backend.gen.h`, returning names like `"device_get_name"`) and `backend_dispatch_command_name` (in `backend-dispatched.gen.h`, returning names like `"backend_device_get_name"`). If the intent is for these to serve different layers (shared APIR header vs. backend dispatch), this is fine — just be aware of the duplication.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/regenerate_remoting.py` around lines 148 - 172, The new generator creates apir_dispatch_command_name that returns names like "device_get_name" which duplicates semantics of backend_dispatch_command_name (returns "backend_device_get_name"); decide and implement one of two fixes: either make apir_dispatch_command_name reuse or call through to backend_dispatch_command_name (or vice-versa) to avoid duplicate mapping logic, or explicitly document/distinguish the two by adding a clear comment and different return prefixes; update the generator that builds apir_dispatch_command_name (the loop creating clean_name and case for {func['enum_name']}) to either prepend the "backend_" prefix to match backend_dispatch_command_name or add a generated comment above the function explaining the intended difference so callers are not confused.

ggml/src/ggml-virtgpu/backend/backend.cpp (1)
88-94: Dead code: `library_reg` can never be NULL here.

Line 68 assigns `library_reg` to either `virgl_library_reg` (if non-NULL) or the compile-time constant `GGML_DEFAULT_BACKEND_REG` (`"ggml_backend_init"`). Since the fallback is always non-NULL, the condition `!library_reg` on line 88 is unreachable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend.cpp` around lines 88 - 94, The check for `!library_reg` is dead because `library_reg` is assigned from `virgl_library_reg` or the compile-time `GGML_DEFAULT_BACKEND_REG` (non-NULL); remove the unreachable if-block that logs and returns `APIR_LOAD_LIBRARY_ENV_VAR_MISSING`. Instead, if you intended to detect registration failure, perform that check against the actual backend registration/return value from the registration call (refer to `library_reg`, `virgl_library_reg`, and `GGML_DEFAULT_BACKEND_REG`) rather than testing `library_reg` for NULL.ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp (1)
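A minimal reproduction of why the check is dead (the names below are illustrative stand-ins for `virgl_library_reg` and `GGML_DEFAULT_BACKEND_REG`):

```cpp
#include <cassert>

// Compile-time fallback, analogous to GGML_DEFAULT_BACKEND_REG.
static const char * DEMO_DEFAULT_REG = "ggml_backend_init";

// Because the fallback is a non-null constant, the result of the ternary
// can never be null — so a later `if (!reg)` branch is unreachable.
const char * pick_registration(const char * virgl_reg) {
    const char * reg = virgl_reg ? virgl_reg : DEMO_DEFAULT_REG;
    return reg; // never nullptr
}
```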
108-122:retfromREMOTE_CALLis unused after cleanup — consider logging on failure.
apir_backend_cleanupignores the return code from the remote call. A failed cleanup would silently leak the backend instance on the host, which is the exact GPU OOM scenario this PR aims to fix. At minimum, log a warning.Proposed fix
REMOTE_CALL(gpu, encoder, decoder, ret); + if (ret != 0) { + GGML_LOG_ERROR(GGML_VIRTGPU "%s: Backend cleanup failed (ret=%d) for device_handle=%p backend_id=%u\n", + __func__, ret, (void*)device_handle, backend_id); + } + remote_call_finish(gpu, encoder, decoder); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp` around lines 108 - 122, apir_backend_cleanup is ignoring the ApirForwardReturnCode ret returned by REMOTE_CALL; update the function to check ret after REMOTE_CALL and handle failures (at minimum log a warning/error and avoid silently proceeding to remote_call_finish when cleanup failed) so host-side backend leaks are visible; reference the local variables ret, REMOTE_CALL invocation, and remote_call_finish within apir_backend_cleanup to add a conditional that logs the failure (including ret) and performs appropriate error-handling before returning.ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp (2)
129-155: `cleanup_device_extension`: `ggml_backend_free` is called under both locks — verify no re-entrant access.

`ggml_backend_free` at Line 142 runs while holding `device_extensions_mutex` and `backends_mutex`. If the backend's destructor interacts with the device extension system (e.g., logging callbacks that access `device_extensions`), this would deadlock. Likely safe given the current codebase, but worth keeping in mind.

An alternative pattern is to collect instances into a local vector under the lock, release the lock, then free backends outside the lock:

Sketch:

```diff
     // Clean up all backend instances
+    std::vector<apir_backend_instance*> to_free;
     {
         std::lock_guard<std::mutex> backends_lock(ext->backends_mutex);
         for (auto& [backend_id, instance] : ext->backend_instances) {
-            if (instance->bck) {
-                ggml_backend_free(instance->bck);
-            }
-
-            instance->magic = 0;
-            delete instance;
+            to_free.push_back(instance);
         }
         ext->backend_instances.clear();
     }
+
+    // Free backends outside the lock
+    for (auto* instance : to_free) {
+        if (instance->bck) {
+            ggml_backend_free(instance->bck);
+        }
+        instance->magic = 0;
+        delete instance;
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp` around lines 129 - 155, cleanup_device_extension currently calls ggml_backend_free while holding device_extensions_mutex and ext->backends_mutex which can deadlock if backend teardown re-enters device extension code; fix by under device_extensions_mutex and ext->backends_mutex move pointers of ext->backend_instances (instances) into a local vector, clear ext->backend_instances and set ext->magic = 0 and remove ext from device_extensions while still under the lock, then release both locks and iterate the local vector to call ggml_backend_free, reset instance->magic and delete each instance, and finally delete ext after locks are released — this uses the symbols cleanup_device_extension, device_extensions_mutex, backends_mutex, apir_device_extension::backend_instances, ggml_backend_free, instance->magic and ext->magic to locate where to change.
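The collect-then-free pattern can be sketched standalone (the `demo_instance` type, map, and `free_instance` function below are illustrative, not the real apir types; `free_instance` plays the role of a destructor that may re-enter other subsystems):

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <vector>

struct demo_instance { unsigned magic; };

static std::mutex instances_mutex;
static std::unordered_map<int, demo_instance*> instances;
static int freed_count = 0;

// Stand-in for ggml_backend_free: may call back into other subsystems,
// so it must not run while instances_mutex is held.
void free_instance(demo_instance * p) {
    p->magic = 0;
    delete p;
    ++freed_count;
}

void cleanup_all() {
    std::vector<demo_instance*> to_free;
    {
        // Only pointer collection and map mutation happen under the lock.
        std::lock_guard<std::mutex> lock(instances_mutex);
        for (auto & [id, inst] : instances) {
            to_free.push_back(inst);
        }
        instances.clear();
    } // lock released here
    // Potentially re-entrant destruction runs with no lock held.
    for (auto * inst : to_free) {
        free_instance(inst);
    }
}
```

The critical section shrinks to a pointer copy, so even a `free_instance` that tries to take `instances_mutex` again cannot deadlock.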
157-181: `backend_id` type narrowing: `uintptr_t` → `uint32_t` truncation.

`create_backend_instance` returns `uintptr_t`, but Line 178 writes it into a `uint32_t*`. If `next_backend_id` ever exceeds `UINT32_MAX`, the ID will silently wrap and collide with existing entries. Consider using `uint32_t` consistently for `next_backend_id` and the return type, or adding a bounds check here.

Option A: Use `uint32_t` throughout. In `backend-dispatched.h`, change `next_backend_id` and the return type of `create_backend_instance`:

```diff
-    uintptr_t next_backend_id;
+    uint32_t next_backend_id;
```

```diff
-uintptr_t create_backend_instance(ggml_backend_dev_t dev);
+uint32_t create_backend_instance(ggml_backend_dev_t dev);
```

And update the map key type accordingly:

```diff
-    std::unordered_map<uintptr_t, apir_backend_instance*> backend_instances;
+    std::unordered_map<uint32_t, apir_backend_instance*> backend_instances;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp` around lines 157 - 181, The code casts a uintptr_t backend_id from create_backend_instance into a uint32_t out_backend_id, risking truncation/collision if next_backend_id grows beyond UINT32_MAX; update the implementation so types are consistent or guard the cast: either change next_backend_id and create_backend_instance return type to uint32_t (and update any map keys/usages) so backend_dispatch_initialize can assign directly, or add an explicit bounds check in backend_dispatch_initialize (check backend_id <= UINT32_MAX) and return APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED on overflow before casting to uint32_t; reference backend_dispatch_initialize, create_backend_instance, next_backend_id, out_backend_id and out_handle when making the change.

ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp (1)
71-85: Security: Guest-supplied `device_handle` is cast to a host pointer — currently safe but fragile.

`device_handle` arrives from the untrusted guest and is cast to `ggml_backend_dev_t`. It is only used as a map key (no dereference), so it's safe today. However, any future code that dereferences `device` before the map lookup would be an exploitable arbitrary-read/write. Consider validating `device_handle` against a known-good set (e.g., compare it to the global `dev` pointer) before proceeding.

The same pattern appears in `backend_backend_cleanup` at lines 152-154.

Proposed validation:

```diff
 // Decode device handle first
 uintptr_t device_handle;
 apir_decode_uintptr_t(dec, &device_handle);
 ggml_backend_dev_t device = (ggml_backend_dev_t)device_handle;
+
+// Validate that the device handle corresponds to a known device
+if (get_device_extension(device) == nullptr) {
+    GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Unknown device handle %p\n", __func__, (void*)device_handle);
+    apir_decoder_set_fatal(dec);
+    return 1;
+}
```

This already happens implicitly later, but an explicit early check at the boundary documents the security intent and avoids accidentally using `device` before validation in future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 71 - 85, The guest-provided device_handle is being cast to ggml_backend_dev_t and used as a key—make this explicit and safe by validating the raw uintptr_t before casting: add an early check in the dispatch path (before casting to ggml_backend_dev_t and before calling get_backend_instance) that confirms device_handle matches a known-good value (e.g., compare against the global dev pointer cast to uintptr_t or call a new is_valid_device_handle(uintptr_t) helper), and return an error (apir_decoder_set_fatal and non-zero) if it fails; apply the same pattern in backend_backend_cleanup where device_handle is decoded and cast so future code cannot accidentally dereference an untrusted pointer.
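The known-good-set validation can be sketched generically (the helper names `register_device` and `resolve_device_handle` below are illustrative, not the actual virtgpu API): the untrusted raw integer is checked for membership before it is ever reinterpreted as a pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Handles the host has actually handed out to the guest.
static std::unordered_set<uintptr_t> known_devices;

void register_device(void * dev) {
    known_devices.insert(reinterpret_cast<uintptr_t>(dev));
}

// Returns the pointer only if the raw handle was previously registered;
// any other value from the guest is rejected before any dereference,
// turning a potential arbitrary-read/write into a clean decode error.
void * resolve_device_handle(uintptr_t handle) {
    if (known_devices.find(handle) == known_devices.end()) {
        return nullptr; // unknown handle: treat as a fatal decode error
    }
    return reinterpret_cast<void *>(handle);
}
```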
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 55-57: The GGML_UNUSED(enc) is incorrect because enc is actually
used later in backend_backend_graph_compute when calling
apir_encode_ggml_status; remove the GGML_UNUSED(enc) statement and keep only
GGML_UNUSED(ctx) (or otherwise mark only unused symbols) so that enc is not
erroneously annotated as unused; update backend_backend_graph_compute to stop
casting away enc and ensure apir_encode_ggml_status calls continue to use enc
as-is.
- Around line 80-85: The code currently releases backends_mutex inside
get_backend_instance(), creating a race where cleanup_backend_instance() can
free apir_backend_instance->bck before graph_compute() uses it; to fix, either
hold backends_mutex for the entire duration of graph_compute()'s use of the
instance (acquire before get_backend_instance() and release after the
compute/dereference) or change apir_backend_instance to be reference-counted
(e.g., wrap in shared_ptr) and bump the refcount in get_backend_instance() and
drop it after graph_compute() finishes; update graph_compute(),
get_backend_instance(), and cleanup_backend_instance() accordingly and ensure
you validate instance->magic after acquiring the reference or lock before
dereferencing instance->bck.
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp`:
- Around line 86-93: apir_decode_ggml_tensor_inplace can return nullptr, so
guard the returned pointer `op` before using it: after calling
apir_decode_ggml_tensor_inplace(dec) check if op == nullptr and handle the
decoder error path (e.g., log/report the decode error and early-return an error
code or set value to 0) so you never call buft->iface.get_alloc_size(buft, op)
or ggml_nbytes(op) with a null pointer; update the block around the symbols
apir_decode_ggml_tensor_inplace, op, buft->iface.get_alloc_size and ggml_nbytes
accordingly.
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.h`:
- Around line 28-31: The comment in struct apir_backend_instance incorrectly
documents the magic value as 0xAB1234CD; update the inline comment next to the
magic field in apir_backend_instance to match the actual constant
APIR_BACKEND_INSTANCE_MAGIC (0xCD4321BA) so the doc and code are consistent
(locate struct apir_backend_instance and replace the wrong hex in the comment
with APIR_BACKEND_INSTANCE_MAGIC / 0xCD4321BA).
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h`:
- Around line 108-118: The access to the global unordered_set backend_buffers in
the validation block is not thread-safe; modify the global to pair it with a
synchronization primitive (e.g., declare a std::shared_mutex
backend_buffers_mutex alongside backend_buffers at file scope in a shared
header), move the extern declarations out of the inline function into that
header, then update readers (this validation code in apir_cs_ggml.h) to take a
shared/read lock (std::shared_lock) before calling backend_buffers.find(...) and
update apir_track_backend_buffer() and apir_untrack_backend_buffer() to take an
exclusive/unique lock (std::unique_lock) when mutating the set so all accesses
are synchronized while preserving concurrent reads.
In `@ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp`:
- Around line 38-41: The function currently returns a pointer to a single static
buffer (prefixed_name) which is shared across GPUs and races; instead, build the
"[virtgpu] %s" string once when gpu->cached_buffer_type is initialized (device
init) and store it on the per-instance struct (either add a field on virtgpu or
on cached_buffer_type, e.g., cached_buffer_type.prefixed_name), then remove the
static prefixed_name from the accessor and make get_name simply return the
stored per-instance pointer (ensure allocation lifetime covers the device and
access is thread-safe).
In `@ggml/src/ggml-virtgpu/ggml-backend-device.cpp`:
- Around line 3-11: The function ggml_backend_remoting_device_get_name currently
uses a static local buffer prefixed_name which causes a race/dangling-pointer;
instead, populate each device's context name once at device registration (in the
code that sets ctx->name in ggml-backend-reg.cpp) to include the "[virtgpu] "
prefix plus the existing device-specific identifier (preserving names like
"ggml-virtgpu0"/"ggml-virtgpu1"), and then change
ggml_backend_remoting_device_get_name to simply return ctx->name.c_str() (remove
the static buffer usage and any snprintf there).
In `@ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp`:
- Line 176: The format specifier in the GGML_ABORT call using
GGML_VIRTGPU/virtgpu-forward-device.cpp is wrong for the variable size (a
size_t) — change the printf-style format from "%ld" to "%zu" (or alternatively
cast size to unsigned long and use "%lu") in the GGML_ABORT invocation that
includes __func__ and size so the output is defined and correct across
platforms.
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 30-33: The abort message in the macro should include the
GGML_VIRTGPU prefix and the macro must not unconditionally abort on backend
errors if callers (e.g., apir_backend_initialize in virtgpu-forward-backend.cpp)
expect to handle non-zero return codes; update the macro that calls GGML_ABORT
so the string begins with "GGML_VIRTGPU: " (referring to
REMOTE_CALL_PREPARE_command_name) and provide a non-aborting variant (or a
flagged version) that returns the backend return code instead of calling
GGML_ABORT so callers like apir_backend_initialize can inspect ret_name and
handle APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED rather than being forced into
abort; adjust usages or remove dead code in apir_backend_initialize accordingly.
---
Outside diff comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp`:
- Around line 135-144: Decoded tensor pointers returned by
apir_decode_ggml_tensor are used without validation in the
set_tensor/get_tensor/cpy_tensor flow; add checks after each
apir_decode_ggml_tensor call (the ones producing src and dst in the shown
snippet and the analogous decodes in set_tensor and get_tensor) to ensure they
are non-null and not in a fatal/invalid state before calling
buffer->iface.cpy_tensor / buffer->iface.set_tensor / buffer->iface.get_tensor;
if validation fails, return an error (e.g., non-zero) and avoid calling the
buffer interface. Use the same validation pattern you already use for the buffer
handle (check the decoded pointer, log or handle the error, and early return)
and apply it to the functions referencing apir_decode_ggml_tensor, src, dst, and
any casts to (ggml_tensor *).
In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 83-88: The apir_forward_error() function is missing a case for the
new enum value APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3), so calls from
REMOTE_CALL (in virtgpu-forward-impl.h) can fall through to the generic "Unknown
APIR_COMMAND_TYPE_FORWARD error"; add an
APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS) entry in
apir_forward_error() alongside the existing
APIR_FORWARD_SUCCESS/NO_DISPATCH_FCT/TIMEOUT/BASE_INDEX entries so the enum maps
to its correct name. Ensure the symbol APIR_FORWARD_FAILED_TO_SYNC_STREAMS is
used exactly as defined in the enum.
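A sketch of the mapping pattern with the missing entry added; the enum values and the `APIR_FORWARD_ERROR` macro shape are reconstructed from the review text, so treat them as assumptions:

```cpp
#include <cassert>
#include <cstring>

// Assumed enum values, mirroring the review's description.
enum ApirForwardReturnCode {
    APIR_FORWARD_SUCCESS = 0,
    APIR_FORWARD_NO_DISPATCH_FCT = 1,
    APIR_FORWARD_TIMEOUT = 2,
    APIR_FORWARD_FAILED_TO_SYNC_STREAMS = 3,
    APIR_FORWARD_BASE_INDEX = 4,
};

// Stringify-and-return pattern typical of such error-name mappers.
#define APIR_FORWARD_ERROR(code) if (ret == code) return #code

static const char *forward_error_name(ApirForwardReturnCode ret) {
    APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
    APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
    APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
    APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS); // the missing entry
    APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
    return "Unknown APIR_COMMAND_TYPE_FORWARD error";
}
```

Without the fourth entry, value 3 would fall through to the generic "Unknown" message.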
In `@ggml/src/ggml-virtgpu/virtgpu.cpp`:
- Around line 533-542: The decoder instance response_dec used in
remote_call_prepare is declared static and causes data races; change it to
thread_local and ensure its state is reinitialized per call: set
response_dec.cur and response_dec.end as done already, and explicitly reset
response_dec.fatal (and any other relevant fields) before assigning *decoder;
mirror how encoder_buffer/enc are reinitialized in remote_call_prepare so
concurrent threads do not share or retain stale fatal flags (affects
response_dec, remote_call_prepare, and remote_call_finish).
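The `thread_local`-plus-reset idea can be sketched like this (the `decoder_state` fields mirror the names in the review; the helper function is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoder state with the fields named in the review.
struct decoder_state {
    const uint8_t *cur = nullptr;
    const uint8_t *end = nullptr;
    bool fatal = false;
};

// thread_local instead of static: each thread gets its own instance, and
// every field is explicitly reinitialized per call so a stale fatal flag
// from a previous call cannot leak into the next one.
static decoder_state *prepare_response_decoder(const uint8_t *buf, size_t len) {
    thread_local decoder_state response_dec;
    response_dec.cur   = buf;
    response_dec.end   = buf + len;
    response_dec.fatal = false; // explicit reset, as the review asks
    return &response_dec;
}
```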
- Around line 167-173: The format strings passed to GGML_ABORT (using
GGML_VIRTGPU and __func__) incorrectly include a closing parenthesis inside the
string literal ("%d | %s)" and "%d)"), causing an extra ')' to appear in logs;
update both calls (the GGML_ABORT invocation handling apir_ret and the branch
computing lib_ret) so the format strings end without the trailing ')' (e.g., "%d
| %s" and "%d") and place the final closing parenthesis as part of the
GGML_ABORT call itself, keeping references to __func__, apir_ret,
apir_load_library_error(apir_ret), and lib_ret unchanged.
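A small reproduction of the stray-parenthesis symptom (the message text is invented for illustration):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Buggy variant: the ')' baked into the format literal doubles up once the
// surrounding log text closes its own parenthesis.
static std::string buggy_msg(int apir_ret) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "load failed (ret: %d))", apir_ret);
    return buf;
}

// Fixed variant: the literal ends after "%d" and the single ')' closes cleanly.
static std::string fixed_msg(int apir_ret) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "load failed (ret: %d)", apir_ret);
    return buf;
}
```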
- Around line 264-290: In virtgpu_open_device, avoid the fatal GGML_ABORT on
open() failure and instead log the error (GGML_LOG_ERROR) and return
APIR_ERROR_INITIALIZATION_FAILED so virtgpu_open can continue iterating devices;
additionally, before reading dev->nodes[DRM_NODE_RENDER] check that
dev->available_nodes has the DRM_NODE_RENDER bit set (e.g., if
(!(dev->available_nodes & (1 << DRM_NODE_RENDER))) log and return
APIR_ERROR_INITIALIZATION_FAILED) to prevent dereferencing a missing render
node; update both the initial node_path access and any subsequent uses of
dev->nodes[DRM_NODE_RENDER] accordingly, leaving virtgpu_open and
virtgpu_open_device behavior otherwise unchanged.
- Around line 126-146: The duplicated branch checks
APIR_LOAD_LIBRARY_ENV_VAR_MISSING twice, making the "symbols are missing" case
unreachable; update the third condition to check
APIR_LOAD_LIBRARY_SYMBOL_MISSING instead of APIR_LOAD_LIBRARY_ENV_VAR_MISSING so
that the branch invoking GGML_ABORT with the "some symbols are missing" message
is executed for symbol-missing errors (leave the other branches using
APIR_LOAD_LIBRARY_ENV_VAR_MISSING and APIR_LOAD_LIBRARY_CANNOT_OPEN, and keep
apir_load_library_error(ret) and GGML_ABORT usage unchanged).
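The corrected branch ordering, sketched with assumed enum values (the real enum lives in the codebase; only the duplicated-condition fix is the point here):

```cpp
#include <cassert>
#include <cstring>

// Assumed values; only the branch structure matters.
enum apir_load_library_ret {
    APIR_LOAD_LIBRARY_SUCCESS = 0,
    APIR_LOAD_LIBRARY_ENV_VAR_MISSING,
    APIR_LOAD_LIBRARY_CANNOT_OPEN,
    APIR_LOAD_LIBRARY_SYMBOL_MISSING,
};

static const char *classify(apir_load_library_ret ret) {
    if (ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING) return "env var missing";
    if (ret == APIR_LOAD_LIBRARY_CANNOT_OPEN)     return "cannot open library";
    // Before the fix this branch also tested ENV_VAR_MISSING, making it
    // unreachable; testing SYMBOL_MISSING makes the message reachable.
    if (ret == APIR_LOAD_LIBRARY_SYMBOL_MISSING)  return "some symbols are missing";
    return "ok";
}
```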
---
Nitpick comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 71-85: The guest-provided device_handle is being cast to
ggml_backend_dev_t and used as a key—make this explicit and safe by validating
the raw uintptr_t before casting: add an early check in the dispatch path
(before casting to ggml_backend_dev_t and before calling get_backend_instance)
that confirms device_handle matches a known-good value (e.g., compare against
the global dev pointer cast to uintptr_t or call a new
is_valid_device_handle(uintptr_t) helper), and return an error
(apir_decoder_set_fatal and non-zero) if it fails; apply the same pattern in
backend_backend_cleanup where device_handle is decoded and cast so future code
cannot accidentally dereference an untrusted pointer.
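A minimal sketch of the handle-validation helper suggested above; `g_known_device` stands in for the global dev pointer, and the helper name follows the review's suggestion:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the single known-good device pointer held by the backend.
static const void *g_known_device = nullptr;

// Validate the raw guest-provided value *before* casting it to a device
// pointer type; anything that doesn't match the known device is rejected.
static bool is_valid_device_handle(uintptr_t handle) {
    return handle != 0 && handle == (uintptr_t) g_known_device;
}
```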
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp`:
- Around line 129-155: cleanup_device_extension currently calls
ggml_backend_free while holding device_extensions_mutex and ext->backends_mutex
which can deadlock if backend teardown re-enters device extension code; fix by
under device_extensions_mutex and ext->backends_mutex move pointers of
ext->backend_instances (instances) into a local vector, clear
ext->backend_instances and set ext->magic = 0 and remove ext from
device_extensions while still under the lock, then release both locks and
iterate the local vector to call ggml_backend_free, reset instance->magic and
delete each instance, and finally delete ext after locks are released — this
uses the symbols cleanup_device_extension, device_extensions_mutex,
backends_mutex, apir_device_extension::backend_instances, ggml_backend_free,
instance->magic and ext->magic to locate where to change.
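The "detach under the lock, destroy outside it" pattern can be sketched generically (the `instance` type and globals are stand-ins for the apir_* structures, and `delete` stands in for `ggml_backend_free`):

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct instance { int magic = 0x1234; };

static std::mutex g_mutex;
static std::vector<instance *> g_instances;

static size_t cleanup_instances() {
    std::vector<instance *> victims;
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        victims.swap(g_instances); // detach under the lock...
    }
    // ...then free outside it, so teardown callbacks that re-enter the
    // registry cannot deadlock on g_mutex.
    size_t n = victims.size();
    for (instance *p : victims) {
        p->magic = 0;
        delete p;
    }
    return n;
}
```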
- Around line 157-181: The code casts a uintptr_t backend_id from
create_backend_instance into a uint32_t out_backend_id, risking
truncation/collision if next_backend_id grows beyond UINT32_MAX; update the
implementation so types are consistent or guard the cast: either change
next_backend_id and create_backend_instance return type to uint32_t (and update
any map keys/usages) so backend_dispatch_initialize can assign directly, or add
an explicit bounds check in backend_dispatch_initialize (check backend_id <=
UINT32_MAX) and return APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED on overflow
before casting to uint32_t; reference backend_dispatch_initialize,
create_backend_instance, next_backend_id, out_backend_id and out_handle when
making the change.
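A sketch of the overflow guard; the id is modeled as `uint64_t` here, standing in for the `uintptr_t` used by `next_backend_id` on 64-bit hosts:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Refuse to narrow a wide id into uint32_t instead of silently truncating;
// the caller would map a false return to the BACKEND_INIT_FAILED code.
static bool narrow_backend_id(uint64_t backend_id, uint32_t *out) {
    if (backend_id > std::numeric_limits<uint32_t>::max()) {
        return false;
    }
    *out = (uint32_t) backend_id;
    return true;
}
```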
In `@ggml/src/ggml-virtgpu/backend/backend.cpp`:
- Around line 88-94: The check for `!library_reg` is dead because `library_reg`
is assigned from `virgl_library_reg` or the compile-time
`GGML_DEFAULT_BACKEND_REG` (non-NULL); remove the unreachable if-block that logs
and returns `APIR_LOAD_LIBRARY_ENV_VAR_MISSING`. Instead, if you intended to
detect registration failure, perform that check against the actual backend
registration/return value from the registration call (refer to `library_reg`,
`virgl_library_reg`, and `GGML_DEFAULT_BACKEND_REG`) rather than testing
`library_reg` for NULL.
In `@ggml/src/ggml-virtgpu/ggml-backend.cpp`:
- Around line 63-75: The failure path in ggml_backend_remoting_device_init may
leave ctx->device_handle and ctx->backend_id partially written by
apir_backend_initialize; update the failure branch (after init_result !=
APIR_BACKEND_INITIALIZE_SUCCESS) to defensively clear ctx->device_handle and
ctx->backend_id (e.g., set ctx->device_handle = nullptr and ctx->backend_id = 0
or use memset on those fields) before logging via GGML_LOG_ERROR and returning
nullptr so the context is left in a consistent, zeroed state for any future
retries.
In `@ggml/src/ggml-virtgpu/regenerate_remoting.py`:
- Around line 148-172: The new generator creates apir_dispatch_command_name that
returns names like "device_get_name" which duplicates semantics of
backend_dispatch_command_name (returns "backend_device_get_name"); decide and
implement one of two fixes: either make apir_dispatch_command_name reuse or call
through to backend_dispatch_command_name (or vice-versa) to avoid duplicate
mapping logic, or explicitly document/distinguish the two by adding a clear
comment and different return prefixes; update the generator that builds
apir_dispatch_command_name (the loop creating clean_name and case for
{func['enum_name']}) to either prepend the "backend_" prefix to match
backend_dispatch_command_name or add a generated comment above the function
explaining the intended difference so callers are not confused.
In `@ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp`:
- Around line 108-122: apir_backend_cleanup is ignoring the
ApirForwardReturnCode ret returned by REMOTE_CALL; update the function to check
ret after REMOTE_CALL and handle failures (at minimum log a warning/error and
avoid silently proceeding to remote_call_finish when cleanup failed) so
host-side backend leaks are visible; reference the local variables ret,
REMOTE_CALL invocation, and remote_call_finish within apir_backend_cleanup to
add a conditional that logs the failure (including ret) and performs appropriate
error-handling before returning.
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 9-17: The macros REMOTE_CALL_PREPARE and REMOTE_CALL are coupled
because REMOTE_CALL_PREPARE intentionally declares
REMOTE_CALL_PREPARE_forward_flag and REMOTE_CALL_PREPARE_command_name outside
the do{...}while(0) block so REMOTE_CALL can reference
REMOTE_CALL_PREPARE_command_name; add a concise comment above the
REMOTE_CALL_PREPARE macro explaining this intentional scope leak, naming the
variables (REMOTE_CALL_PREPARE_forward_flag, REMOTE_CALL_PREPARE_command_name)
and warning that REMOTE_CALL must be used in the same scope after
REMOTE_CALL_PREPARE to avoid compile errors, to make the coupling explicit and
reduce fragility.
// Get backend instance
apir_backend_instance* instance = get_backend_instance(device, backend_id);
if (instance == nullptr || instance->bck == nullptr) {
    apir_decoder_set_fatal(dec);
    return 1;
}
🧩 Analysis chain
🏁 Script executed:
# Find the file and examine its structure
find . -name "*backend-dispatched*" -type f
Repository: crc-org/llama.cpp
Length of output: 479
🏁 Script executed:
# Check the full implementation of get_backend_instance, cleanup_backend_instance, and related code
cd ggml/src/ggml-virtgpu/backend 2>/dev/null || find . -path "*ggml-virtgpu/backend*" -name "*.cpp" | head -5
Repository: crc-org/llama.cpp
Length of output: 43
🏁 Script executed:
# Search for the function implementations across the backend files
rg "get_backend_instance|cleanup_backend_instance|backend_backend_graph_compute|backend_backend_initialize|backend_backend_cleanup" --type=cpp -B2 -A10
Repository: crc-org/llama.cpp
Length of output: 12055
🏁 Script executed:
wc -l ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
Repository: crc-org/llama.cpp
Length of output: 117
🏁 Script executed:
# Read the implementation of get_backend_instance in full
sed -n '84,102p' ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
Repository: crc-org/llama.cpp
Length of output: 650
🏁 Script executed:
# Read cleanup_backend_instance
sed -n '105,126p' ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
Repository: crc-org/llama.cpp
Length of output: 768
🏁 Script executed:
# Read backend_backend_graph_compute around lines 130-145
sed -n '70,160p' ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp | cat -n
Repository: crc-org/llama.cpp
Length of output: 3625
🏁 Script executed:
# Check how dispatch commands are invoked and whether they're serialized per context/backend_id
rg "apir_backend_dispatch|virgl_apir_context|backend_dispatch_t" --type=cpp -B3 -A5 | head -100
Repository: crc-org/llama.cpp
Length of output: 8124
🏁 Script executed:
# Check if there's any comment or documentation about threading model
rg -i "thread|serial|concurrent|mutex|sync|atomic" ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
Repository: crc-org/llama.cpp
Length of output: 665
🏁 Script executed:
# Check struct definitions for apir_device_extension and related synchronization
rg "struct apir_device_extension|struct.*context" --type=h -A15 | head -80
Repository: crc-org/llama.cpp
Length of output: 4456
🏁 Script executed:
# Find the struct definition for apir_device_extension
rg "struct apir_device_extension" --type=h -A20
Repository: crc-org/llama.cpp
Length of output: 1918
🏁 Script executed:
# Check how dispatch is called - is it single-threaded per context?
cat ggml/src/ggml-virtgpu/backend/backend.cpp | head -200
Repository: crc-org/llama.cpp
Length of output: 5740
🏁 Script executed:
# Check if virgl_apir_context implies per-context serialization
rg "virgl_apir_context|ctx_id" ggml/src/ggml-virtgpu/backend/backend.cpp -B2 -A2
Repository: crc-org/llama.cpp
Length of output: 1036
🏁 Script executed:
# Check if there's any async cleanup or background task that could race
rg -i "async|thread|queue|background" ggml/src/ggml-virtgpu/backend/*.cpp | head -30
Repository: crc-org/llama.cpp
Length of output: 612
🏁 Script executed:
# Check the magic validation - is it checked before dereferencing instance->bck in graph_compute?
sed -n '70,90p' ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp | cat -n
Repository: crc-org/llama.cpp
Length of output: 891
Hold the lock for the entire duration of backend instance use, or switch to reference counting (shared_ptr) to prevent deletion during compute.
get_backend_instance() releases backends_mutex immediately after lookup, creating a window where cleanup_backend_instance() can delete the instance. Between line 81 and lines 136–139, instance->bck may be freed. While get_backend_instance() validates instance->magic, this check is not repeated in graph_compute() before dereferencing. If the protocol relies on strict initialize→compute→cleanup ordering per backend_id with no concurrent access across contexts, document this invariant explicitly; otherwise, either extend the mutex hold to cover the compute operation or use a reference-counted wrapper to prevent premature deletion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 80
- 85, The code currently releases backends_mutex inside get_backend_instance(),
creating a race where cleanup_backend_instance() can free
apir_backend_instance->bck before graph_compute() uses it; to fix, either hold
backends_mutex for the entire duration of graph_compute()'s use of the instance
(acquire before get_backend_instance() and release after the
compute/dereference) or change apir_backend_instance to be reference-counted
(e.g., wrap in shared_ptr) and bump the refcount in get_backend_instance() and
drop it after graph_compute() finishes; update graph_compute(),
get_backend_instance(), and cleanup_backend_instance() accordingly and ensure
you validate instance->magic after acquiring the reference or lock before
dereferencing instance->bck.
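The reference-counting alternative can be sketched with `shared_ptr` (the types and registry below are stand-ins, not the actual apir API): the registry hands out a `shared_ptr`, so an instance looked up for graph_compute stays alive even if cleanup erases it from the map concurrently.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

struct backend_instance { uint32_t magic = 0xCD4321BA; };

static std::mutex g_reg_mutex;
static std::unordered_map<uint32_t, std::shared_ptr<backend_instance>> g_registry;

static std::shared_ptr<backend_instance> get_instance(uint32_t id) {
    std::lock_guard<std::mutex> lock(g_reg_mutex);
    auto it = g_registry.find(id);
    return it == g_registry.end() ? nullptr : it->second; // ref held past unlock
}

static void cleanup_instance(uint32_t id) {
    std::lock_guard<std::mutex> lock(g_reg_mutex);
    g_registry.erase(id); // the last shared_ptr owner frees the instance
}
```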
struct apir_backend_instance {
    ggml_backend_t bck;  // The actual backend
    uint32_t magic;      // For validation: 0xAB1234CD
};
Incorrect magic value in comment.
The comment on Line 30 says 0xAB1234CD but apir_backend_instance uses APIR_BACKEND_INSTANCE_MAGIC which is 0xCD4321BA (Line 45).
Proposed fix
struct apir_backend_instance {
ggml_backend_t bck; // The actual backend
- uint32_t magic; // For validation: 0xAB1234CD
+ uint32_t magic; // For validation: APIR_BACKEND_INSTANCE_MAGIC (0xCD4321BA)
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched.h` around lines 28 - 31, The
comment in struct apir_backend_instance incorrectly documents the magic value as
0xAB1234CD; update the inline comment next to the magic field in
apir_backend_instance to match the actual constant APIR_BACKEND_INSTANCE_MAGIC
(0xCD4321BA) so the doc and code are consistent (locate struct
apir_backend_instance and replace the wrong hex in the comment with
APIR_BACKEND_INSTANCE_MAGIC / 0xCD4321BA).
// SECURITY: Validate buffer handle against tracked buffers to prevent
// guest VM from providing arbitrary host memory addresses
if (buffer) {
    extern std::unordered_set<ggml_backend_buffer_t> backend_buffers;
    if (backend_buffers.find(buffer) == backend_buffers.end()) {
        GGML_LOG_WARN("ggml-virtgpu-backend: %s: Invalid buffer handle from guest: %p\n", __func__, (void*)buffer);
        // Set fatal flag to prevent further processing with invalid handle
        apir_decoder_set_fatal(dec);
        return NULL;
    }
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the definition and all accesses to backend_buffers
echo "=== All occurrences of backend_buffers ==="
rg -n 'backend_buffers' --type=cpp --type=h
echo ""
echo "=== Context around definition (likely std::unordered_set) ==="
rg -n 'std::unordered_set.*backend_buffers|backend_buffers.*=' --type=cpp --type=h -B2 -A2
echo ""
echo "=== Check for mutex/synchronization primitives ==="
rg -n 'mutex|lock|atomic' --type=cpp --type=h -C 1 | head -100
echo ""
echo "=== Check inline function definition ==="
rg -n 'inline.*apir_decode_ggml_buffer|apir_decode_ggml_buffer' --type=cpp --type=h -B3 -A15
Repository: crc-org/llama.cpp
Length of output: 37214
Address thread-safety of backend_buffers — unprotected concurrent access.
The security validation logic is sound, but std::unordered_set<ggml_backend_buffer_t> backend_buffers is accessed without synchronization. The set is modified in apir_track_backend_buffer() and apir_untrack_backend_buffer() (in apir_cs_ggml-rpc-back.cpp), while simultaneously read in this validation check. If buffer allocation/deallocation occurs on a different thread than the decode path, there is a data race. Protect all accesses to backend_buffers with a mutex, or use a thread-safe container (e.g., std::shared_mutex with read-write locks, or a lock-free data structure).
Additionally, the extern declaration inside the inline function is valid but unconventional. Move it to file scope in a dedicated header for better discoverability and maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h` around lines 108 - 118,
The access to the global unordered_set backend_buffers in the validation block
is not thread-safe; modify the global to pair it with a synchronization
primitive (e.g., declare a std::shared_mutex backend_buffers_mutex alongside
backend_buffers at file scope in a shared header), move the extern declarations
out of the inline function into that header, then update readers (this
validation code in apir_cs_ggml.h) to take a shared/read lock (std::shared_lock)
before calling backend_buffers.find(...) and update apir_track_backend_buffer()
and apir_untrack_backend_buffer() to take an exclusive/unique lock
(std::unique_lock) when mutating the set so all accesses are synchronized while
preserving concurrent reads.
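A sketch of the `std::shared_mutex` scheme the prompt describes, with `void *` standing in for `ggml_backend_buffer_t` and the function names loosely following the apir_track/untrack naming:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

using buffer_t = void *;

// Declared together at file scope so every translation unit sees the pair.
static std::shared_mutex backend_buffers_mutex;
static std::unordered_set<buffer_t> backend_buffers;

static void track_buffer(buffer_t b) {
    std::unique_lock<std::shared_mutex> lock(backend_buffers_mutex); // exclusive: mutation
    backend_buffers.insert(b);
}

static void untrack_buffer(buffer_t b) {
    std::unique_lock<std::shared_mutex> lock(backend_buffers_mutex); // exclusive: mutation
    backend_buffers.erase(b);
}

static bool is_tracked(buffer_t b) {
    std::shared_lock<std::shared_mutex> lock(backend_buffers_mutex); // shared: read-only
    return backend_buffers.find(b) != backend_buffers.end();
}
```

Readers on the decode hot path take only the shared lock, so concurrent validation checks do not serialize against each other.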
|
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 05 minutes 50 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
|
/test topsail remoting_mac |
|
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 15 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp (1)
10-58: ⚠️ Potential issue | 🟠 Major — Extend the virgl API to provide shmem size on the backend side, or pass the actual shmem size through the RPC protocol.
The validation gap is real: `cgraph_size` can be validated against `sizeof(ggml_cgraph)` but not against the actual mapped shmem size. The backend's `virgl_apir_callbacks` interface only provides `get_shmem_ptr(res_id)`, which returns a pointer with no size metadata. Without access to the shmem allocation size, `apir_new_decoder()` could attempt to read past the mapped region if an adversary or bug causes a size mismatch. To fix this properly, either:
- Extend `virgl_apir_callbacks` to include a `get_shmem_size(virgl_ctx_id, res_id)` callback, or
- Include the actual shmem size in the RPC message alongside `cgraph_size`, then validate `cgraph_size <= shmem_size` before constructing the decoder.
The current architecture assumes client-side correctness (shmem allocated to exactly `cgraph_size` on line 30 of `virtgpu-forward-backend.cpp`), but the backend cannot verify this assumption.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp` around lines 10 - 58, The backend currently trusts cgraph_size from the RPC and only checks it against sizeof(ggml_cgraph) in validate_graph_operation; to fix, ensure the backend verifies cgraph_size does not exceed the actual shared-memory mapping before creating the decoder: either add a get_shmem_size(virgl_ctx_id, res_id) callback to virgl_apir_callbacks and call ctx->iface->get_shmem_size(ctx->ctx_id, shmem_res_id) (use this size to check cgraph_size <= shmem_size and call apir_decoder_set_fatal(dec)/return 1 on failure), or modify the RPC to include the shmem size alongside cgraph_size (decoded by apir_decode_size_t) and validate cgraph_size <= received_shmem_size in backend_backend_graph_compute prior to calling apir_new_decoder / using get_shmem_ptr; keep validate_graph_operation as a secondary check for minimum sizes.
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)
76-87: ⚠️ Potential issue | 🟡 Minor — Map the new forward error code to a string.
`apir_forward_error()` doesn't include `APIR_FORWARD_FAILED_TO_SYNC_STREAMS`, so Lines 83–86 will fall through to the generic "Unknown ..." message for that code. Add it to the mapping.
💡 Suggested fix
  APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
  APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
  APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+ APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
  APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 76 - 87, The apir_forward_error function is missing a mapping for the new error code APIR_FORWARD_FAILED_TO_SYNC_STREAMS; update the mapping in apir_forward_error (the block that calls APIR_FORWARD_ERROR(...)) to include APIR_FORWARD_FAILED_TO_SYNC_STREAMS so that apir_forward_error returns the proper string for that enum value instead of falling through to the generic "Unknown ..." message.
♻️ Duplicate comments (2)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (2)
31-32: ⚠️ Potential issue | 🟡 Minor — Add GGML_VIRTGPU prefix to abort message.
Line 31’s abort omits the prefix used elsewhere; align for consistent logging.
🛠️ Suggested fix
- GGML_ABORT("backend function '%s' failed (return code: %d)",
+ GGML_ABORT(GGML_VIRTGPU "backend function '%s' failed (return code: %d)",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 31 - 32, The GGML_ABORT invocation inside the remote call path omits the "GGML_VIRTGPU" prefix used elsewhere; update the GGML_ABORT call for REMOTE_CALL_PREPARE to include a "GGML_VIRTGPU: " (or similar) prefix in its format string so the abort message is consistent with other virtgpu logs—locate the GGML_ABORT(...) that references REMOTE_CALL_PREPARE_command_name and ret_name and prepend the prefix to the message format.
19-33: ⚠️ Potential issue | 🟠 Major — Avoid unconditional abort on non‑zero backend return codes.
Line 30–32 aborts on any backend error, which prevents callers from handling non‑zero return codes. If any call site expects to interpret these codes, this becomes a hard crash. Consider a non‑aborting variant or a flag that lets callers decide.
#!/bin/bash
# Inspect REMOTE_CALL usage and how callers handle ret_name
rg -nP --type=cpp --type=h -C3 '\bREMOTE_CALL\s*\('
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 19 - 33, The REMOTE_CALL macro unconditionally aborts when the backend returns a non-zero ApirForwardReturnCode (checked with REMOTE_CALL_PREPARE_command_name), which prevents callers from handling errors; update the macro (or add a new variant like REMOTE_CALL_NOABORT) so that after computing ret_name = (ApirForwardReturnCode)(ret_name - APIR_FORWARD_BASE_INDEX) it does NOT call GGML_ABORT on non-zero return codes but instead leaves ret_name set for the caller to inspect, or accepts an extra flag/handler parameter that controls whether to abort or return the code; change the implementation of REMOTE_CALL (and update any call sites using REMOTE_CALL_PREPARE_command_name / ApirForwardReturnCode) to use the non‑aborting variant where callers need to interpret errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp`:
- Around line 10-58: The backend currently trusts cgraph_size from the RPC and
only checks it against sizeof(ggml_cgraph) in validate_graph_operation; to fix,
ensure the backend verifies cgraph_size does not exceed the actual shared-memory
mapping before creating the decoder: either add a get_shmem_size(virgl_ctx_id,
res_id) callback to virgl_apir_callbacks and call
ctx->iface->get_shmem_size(ctx->ctx_id, shmem_res_id) (use this size to check
cgraph_size <= shmem_size and call apir_decoder_set_fatal(dec)/return 1 on
failure), or modify the RPC to include the shmem size alongside cgraph_size
(decoded by apir_decode_size_t) and validate cgraph_size <= received_shmem_size
in backend_backend_graph_compute prior to calling apir_new_decoder / using
get_shmem_ptr; keep validate_graph_operation as a secondary check for minimum
sizes.
In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 76-87: The apir_forward_error function is missing a mapping for
the new error code APIR_FORWARD_FAILED_TO_SYNC_STREAMS; update the mapping in
apir_forward_error (the block that calls APIR_FORWARD_ERROR(...)) to include
APIR_FORWARD_FAILED_TO_SYNC_STREAMS so that apir_forward_error returns the
proper string for that enum value instead of falling through to the generic
"Unknown ..." message.
---
Duplicate comments:
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 31-32: The GGML_ABORT invocation inside the remote call path omits
the "GGML_VIRTGPU" prefix used elsewhere; update the GGML_ABORT call for
REMOTE_CALL_PREPARE to include a "GGML_VIRTGPU: " (or similar) prefix in its
format string so the abort message is consistent with other virtgpu logs—locate
the GGML_ABORT(...) that references REMOTE_CALL_PREPARE_command_name and
ret_name and prepend the prefix to the message format.
- Around line 19-33: The REMOTE_CALL macro unconditionally aborts when the
backend returns a non-zero ApirForwardReturnCode (checked with
REMOTE_CALL_PREPARE_command_name), which prevents callers from handling errors;
update the macro (or add a new variant like REMOTE_CALL_NOABORT) so that after
computing ret_name = (ApirForwardReturnCode)(ret_name - APIR_FORWARD_BASE_INDEX)
it does NOT call GGML_ABORT on non-zero return codes but instead leaves ret_name
set for the caller to inspect, or accepts an extra flag/handler parameter that
controls whether to abort or return the code; change the implementation of
REMOTE_CALL (and update any call sites using REMOTE_CALL_PREPARE_command_name /
ApirForwardReturnCode) to use the non‑aborting variant where callers need to
interpret errors.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp
- ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
- ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
- ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
- ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
- ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
- ggml/src/ggml-virtgpu/ggml-backend-device.cpp
- ggml/src/ggml-virtgpu/regenerate_remoting.py
- ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
- ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
- ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
- ggml/src/ggml-virtgpu/virtgpu.cpp
🚧 Files skipped from review as they are similar to previous changes (8)
- ggml/src/ggml-virtgpu/virtgpu.cpp
- ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
- ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
- ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
- ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
- ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
- ggml/src/ggml-virtgpu/ggml-backend-device.cpp
- ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
Actionable comments posted: 2
♻️ Duplicate comments (1)
ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)
30-33: Unconditional abort on non-zero backend return code — verify no caller relies on error-code propagation.
The `GGML_VIRTGPU` prefix is now present (addressing the first part of the prior review). The unconditional `GGML_ABORT` on any non-zero `ret_name` however is still in place. The prior review noted that `apir_backend_initialize()` in `virtgpu-forward-backend.cpp` contained explicit conditional logic to handle both success and `APIR_BACKEND_INITIALIZE_BACKEND_INIT_FAILED`, making that logic dead code once this macro is in effect. Please confirm whether that conditional handling has been removed or if there are any callers that still expect a non-abort error-return path.
#!/bin/bash
# Description: Check apir_backend_initialize for remaining dead error-handling code
# and all REMOTE_CALL usage to confirm no caller inspects ret_name post-macro.
echo "=== apir_backend_initialize implementation ==="
rg -n 'apir_backend_initialize' -A 40 -B 2 --type=cpp --type=h
echo ""
echo "=== All REMOTE_CALL macro invocations with surrounding context ==="
rg -n 'REMOTE_CALL\b' --type=cpp --type=h -A 6 -B 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 30 - 33, The current macro unconditionally calls GGML_ABORT when ret_name != 0 (using GGML_VIRTGPU, REMOTE_CALL_PREPARE_command_name and ret_name), which can deaden conditional logic in callers such as apir_backend_initialize in virtgpu-forward-backend.cpp; inspect all REMOTE_CALL invocations and apir_backend_initialize to determine whether any callers expect non-abort error propagation, and then either (a) change the macro to return/propagate the error code instead of calling GGML_ABORT, or (b) update the callers to handle only aborting failures and remove dead conditional branches—ensure REMOTE_CALL_PREPARE_command_name, ret_name, and GGML_ABORT usage are updated consistently to match the chosen semantics.
🧹 Nitpick comments (2)
ggml/src/ggml-virtgpu/backend/shared/api_remoting.h (1)
86-87: `APIR_FORWARD_FAILED_TO_SYNC_STREAMS` (value 3) listed after `APIR_FORWARD_BASE_INDEX` (value 4) — out of numeric order.
The macro-based if-chain makes this functionally correct (each check is independent), but placing a lower-valued code after a higher sentinel is confusing. Move `APIR_FORWARD_FAILED_TO_SYNC_STREAMS` before `APIR_FORWARD_BASE_INDEX` to maintain sequential ordering consistent with the enum definition above.
♻️ Proposed fix
  APIR_FORWARD_ERROR(APIR_FORWARD_SUCCESS);
  APIR_FORWARD_ERROR(APIR_FORWARD_NO_DISPATCH_FCT);
  APIR_FORWARD_ERROR(APIR_FORWARD_TIMEOUT);
+ APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
  APIR_FORWARD_ERROR(APIR_FORWARD_BASE_INDEX);
- APIR_FORWARD_ERROR(APIR_FORWARD_FAILED_TO_SYNC_STREAMS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h` around lines 86 - 87, The two APIR_FORWARD_ERROR entries are out of numeric order: move APIR_FORWARD_FAILED_TO_SYNC_STREAMS so it appears before APIR_FORWARD_BASE_INDEX in the list to match the enum ordering; update the sequence where APIR_FORWARD_FAILED_TO_SYNC_STREAMS and APIR_FORWARD_BASE_INDEX are declared (look for the APIR_FORWARD_ERROR macro invocations in api_remoting.h) so the lower-valued APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value 3) precedes APIR_FORWARD_BASE_INDEX (value 4).

ggml/src/ggml-virtgpu/virtgpu-forward-impl.h (1)
9-11: `REMOTE_CALL_PREPARE_forward_flag` leaks into outer scope unnecessarily; both declarations create a hidden compile-time coupling.

Two issues:

- `REMOTE_CALL_PREPARE_forward_flag` is only consumed inside the `do-while` block yet is declared at function scope. It can be moved inside the block to reduce noise.
- Both `REMOTE_CALL_PREPARE_forward_flag` and `REMOTE_CALL_PREPARE_command_name` live at function scope, so calling `REMOTE_CALL_PREPARE` twice in the same scope produces a redeclaration compile error. More importantly, `REMOTE_CALL` silently depends on `REMOTE_CALL_PREPARE_command_name` being in scope — calling `REMOTE_CALL` without a preceding `REMOTE_CALL_PREPARE` in the same scope will fail to compile with a non-obvious error.

Consider scoping `REMOTE_CALL_PREPARE_forward_flag` inside the `do-while`, and documenting (or enforcing via a wrapper) the `REMOTE_CALL_PREPARE` → `REMOTE_CALL` sequencing contract.

♻️ Proposed fix (move forward_flag into the do-while)
```diff
 #define REMOTE_CALL_PREPARE(gpu_dev_name, encoder_name, apir_command_type__) \
-    int32_t REMOTE_CALL_PREPARE_forward_flag = (int32_t) apir_command_type__; \
     const char* REMOTE_CALL_PREPARE_command_name = apir_dispatch_command_name(apir_command_type__); \
     do { \
-        encoder_name = remote_call_prepare(gpu_dev_name, APIR_COMMAND_TYPE_FORWARD, REMOTE_CALL_PREPARE_forward_flag); \
+        encoder_name = remote_call_prepare(gpu_dev_name, APIR_COMMAND_TYPE_FORWARD, (int32_t)(apir_command_type__)); \
         if (!encoder_name) { \
             GGML_ABORT(GGML_VIRTGPU "%s: failed to prepare the remote call encoder", __func__); \
         } \
     } while (0)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h` around lines 9 - 11, Move the temporary variables out of function scope: modify the REMOTE_CALL_PREPARE macro so REMOTE_CALL_PREPARE_forward_flag is declared inside the do-while block (not at function scope) and likewise ensure REMOTE_CALL_PREPARE_command_name is scoped locally inside that block (e.g., declare const char* within the do-while). This eliminates redeclaration errors when REMOTE_CALL_PREPARE is invoked multiple times and avoids hidden coupling with REMOTE_CALL; also add a short comment in the macro noting the required REMOTE_CALL_PREPARE → REMOTE_CALL sequencing contract (or wrap both in a helper macro/function) so callers know REMOTE_CALL depends on the prepare macro having run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/backend/VirtGPU.md`:
- Line 155: Typo in the documentation: change the phrase "Share-memory size" to
"Shared-memory size" in the VirtGPU.md content (look for the string
"Share-memory size" in docs/backend/VirtGPU.md) so the heading reads
"**Shared-memory size**: with the `libkrun` hypervisor, the RAM + VRAM".
In `@ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h`:
- Around line 88-93: The switch uses undefined case labels
APIR_COMMAND_TYPE_BACKEND_INITIALIZE and APIR_COMMAND_TYPE_BACKEND_CLEANUP
causing a compile error; add these two identifiers to the ApirBackendCommandType
enum (using explicit numeric values consistent with the backend block, e.g.,
place them before/after APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE or assign next
free values) and then update APIR_BACKEND_DISPATCH_TABLE_COUNT to reflect the
new total so the dispatch table size matches the enum.
---
Duplicate comments:
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 30-33: The current macro unconditionally calls GGML_ABORT when
ret_name != 0 (using GGML_VIRTGPU, REMOTE_CALL_PREPARE_command_name and
ret_name), which can deaden conditional logic in callers such as
apir_backend_initialize in virtgpu-forward-backend.cpp; inspect all REMOTE_CALL
invocations and apir_backend_initialize to determine whether any callers expect
non-abort error propagation, and then either (a) change the macro to
return/propagate the error code instead of calling GGML_ABORT, or (b) update the
callers to handle only aborting failures and remove dead conditional
branches—ensure REMOTE_CALL_PREPARE_command_name, ret_name, and GGML_ABORT usage
are updated consistently to match the chosen semantics.
---
Nitpick comments:
In `@ggml/src/ggml-virtgpu/backend/shared/api_remoting.h`:
- Around line 86-87: The two APIR_FORWARD_ERROR entries are out of numeric
order: move APIR_FORWARD_FAILED_TO_SYNC_STREAMS so it appears before
APIR_FORWARD_BASE_INDEX in the list to match the enum ordering; update the
sequence where APIR_FORWARD_FAILED_TO_SYNC_STREAMS and APIR_FORWARD_BASE_INDEX
are declared (look for the APIR_FORWARD_ERROR macro invocations in
api_remoting.h) so the lower-valued APIR_FORWARD_FAILED_TO_SYNC_STREAMS (value
3) precedes APIR_FORWARD_BASE_INDEX (value 4).
In `@ggml/src/ggml-virtgpu/virtgpu-forward-impl.h`:
- Around line 9-11: Move the temporary variables out of function scope: modify
the REMOTE_CALL_PREPARE macro so REMOTE_CALL_PREPARE_forward_flag is declared
inside the do-while block (not at function scope) and likewise ensure
REMOTE_CALL_PREPARE_command_name is scoped locally inside that block (e.g.,
declare const char* within the do-while). This eliminates redeclaration errors
when REMOTE_CALL_PREPARE is invoked multiple times and avoids hidden coupling
with REMOTE_CALL; also add a short comment in the macro noting the required
REMOTE_CALL_PREPARE → REMOTE_CALL sequencing contract (or wrap both in a helper
macro/function) so callers know REMOTE_CALL depends on the prepare macro having
run.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docs/backend/VirtGPU.md
- ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
- ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
- ggml/src/ggml-virtgpu/regenerate_remoting.py
- ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
🚧 Files skipped from review as they are similar to previous changes (1)
- ggml/src/ggml-virtgpu/regenerate_remoting.py
96021ce to 2240508
|
/test topsail remoting_mac |
|
🔴 Test of 'mac_ai test pre_cleanup_ci' failed after 00 hours 00 minutes 04 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log) |
|
/test topsail remoting_mac |
1 similar comment
|
/test topsail remoting_mac |
|
🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 31 minutes 47 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log) |
|
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 22 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
|
/test topsail remoting_mac |
|
/test topsail remoting_mac |
|
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 33 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
* model : Update label for LFM2-24B-A2B

```
❯ build/bin/llama-bench -m /data/playground/checkpoints/LFM2-24B-A2B-Preview-Q4_0.gguf,/data/playground/checkpoints/LFM2-8B-A1B-Q4_0.gguf -p 1 -n 0
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2moe 24B.A2B Q4_0           |  12.54 GiB |    23.84 B | CPU        |      10 |             pp1 |         30.35 ± 2.49 |
| lfm2moe 8B.A1B Q4_0            |   4.41 GiB |     8.34 B | CPU        |      10 |             pp1 |         49.24 ± 1.93 |
```

* Remove extra line
* gguf : prevent integer overflow for ggml_context mem size
* ggml : fix int overflows in ggml_new_object()
* gguf : prevent string exhaustion
* gguf : prevent array elements exhaustion
* ggml : fix negative tensor type oob
* py : assert that alignment is non-zero power of 2
* ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object
* gguf-py : error on duplicate keys when reading
* py : restore tensor_fields
* enforce proper alignment in add_custom_alignment
* gguf : better name
* gguf : fix ctx size for no_alloc == true
* gguf : minor print fix
* ggml : print values when overflow
* ggml : remove deprecated ggml_type_sizef()
* ggml : relax ggml_type asserts to debug-only
* gguf : add mem_size overflow test
* gguf : add file size check for arrays
* ggml : relax asserts for ggml_get_type_traits()
* flake8 fix

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
…outer mode (#19854)

* server: fix query params lost when proxying requests in multi-model router mode
* server: re-encode query params using httplib::encode_query_component in proxy
This patch adds consistency checks in the ggml-virtgpu-backend (running on the host side) to ensure that the data received from the guest is consistent (valid pointers, valid sizes and offsets).
…thods

```
1. bck->iface.synchronize(bck)
2. buft->iface.get_alloc_size(buft, op)
3. buft->iface.get_max_size(buft)
```

these three methods are optional in the GGML interface. `get_max_size` was already properly defaulted, but `backend synchronize` and `buft get_alloc_size` would have segfaulted the backend if not implemented.
…lrenderer APIR (still unmerged)
… in a host/guest shared file
…64 GB with libkrun
…iles

Compilation breaks when ordered alphabetically.
44e01f0 to 0f065dc
|
/test topsail remoting_mac |
|
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 26 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
Make sure to read the contributing guidelines before submitting a PR
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Documentation