
Fix attention size computation error in OpenVINO backend for LLM#131

Merged
wine99 merged 1 commit into ravi9:dev_backend_openvino from zhaixuejun1993:xuejun/fix-error-sttention-size
Apr 13, 2026

Conversation

@zhaixuejun1993
Collaborator

This pull request introduces several updates to the OpenVINO GGML decoder logic, primarily improving the detection and handling of attention-related operations and key-value cache identification. The changes enhance robustness when parsing computational graphs and ensure that certain preprocessing steps are only applied in appropriate contexts.

Improvements to attention and KV cache handling:

  • Improved the logic in compute_llm_params to validate sources deeper in the graph before dereferencing them when handling GGML_OP_SOFT_MAX, preventing potential null-pointer dereferences.
  • Added new logic to extract attention_size from specific GGML_OP_MUL_MAT graph patterns involving permute and view operations, increasing accuracy in parameter computation for attention mechanisms.
  • Refactored the is_kvcache static method to prioritize buffer usage checks, making the order of conditions more robust and consistent.

Preprocessing logic improvements:

  • Updated the preprocess function to only add sliced masks when the decoder is stateful, preventing unnecessary operations for stateless models.


@wine99 wine99 merged commit 36daf24 into ravi9:dev_backend_openvino Apr 13, 2026
1 check passed


2 participants