
[QAIRT] Implement QAIRT ORT->Genie workflow#2358

Open
qti-kromero wants to merge 15 commits into microsoft:main from CodeLinaro:dev/qti-kromero/ort-genie-workflow

Conversation

@qti-kromero
Contributor

@qti-kromero qti-kromero commented Mar 14, 2026

Describe your changes

Implements a complete QAIRT workflow for converting ONNX Runtime models to Genie-compatible format through three new passes:

QairtPreparation: Executes external preparation scripts to quantize and prepare HuggingFace models for QAIRT, with configurable caching and script parameters.

QairtGenAIBuilder: Converts prepared models using QAIRT GenAIBuilder API with support for:

  • CPU and HTP backend targets
  • Device-specific optimizations (VTCM size, HVX threads, extended UDMA)
  • Model configurations (sequence lengths, multi-graph, model splits)

QairtEncapsulation: Wraps QAIRT DLC models in ONNX protobuf format with EPContext nodes, generating genai_config.json for onnxruntime-genai compatibility.

This enables end-to-end optimization of generative AI models for Qualcomm hardware accelerators.
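For illustration, the three passes might be chained in an Olive workflow config along these lines. The option names (script_path, backend, soc_details, run_checker) follow the PR description, but the exact schema, the model path, and the SoC identifier are assumptions, not taken from the PR:

```python
# Illustrative sketch of chaining the three new QAIRT passes in an Olive
# workflow config. All values below are placeholders for illustration.
workflow = {
    "input_model": {"type": "HfModel", "model_path": "my-org/my-llm"},
    "passes": {
        "prep": {"type": "QairtPreparation", "script_path": "prepare_model.py"},
        "build": {"type": "QairtGenAIBuilder", "backend": "HTP", "soc_details": "SM8650"},
        "wrap": {"type": "QairtEncapsulation", "run_checker": True},
    },
}
```

The passes run in dictionary order, mirroring the prepare → build → encapsulate flow described above.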

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link


@github-advanced-security github-advanced-security bot left a comment


lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@qti-kromero qti-kromero force-pushed the dev/qti-kromero/ort-genie-workflow branch from 5badb8e to fdaea52 Compare March 18, 2026 15:41
@qti-kromero qti-kromero marked this pull request as ready for review March 18, 2026 16:57
@qti-kromero
Contributor Author

Dev testing complete and unit tests added; pending reviewers.

@qti-kromero
Contributor Author

@jambayk @xiaoyu-work would it be possible to get a reviewer added to this?

return {
    "backend": PassConfigParam(
        type_=str,
        default_value="CPU",
Contributor

the PR mostly looks good to me but could you use an enum with the options for backend and log_level like in

class RotateMode(StrEnumBase):

this gives it automatic validation of the allowed values. thanks!
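The suggestion can be sketched with the stdlib enum module; Olive's StrEnumBase is assumed here to behave like a str-backed Enum, and the member lists mirror the values named in the PR:

```python
from enum import Enum

# Sketch of the reviewer's suggestion using a stdlib str-backed Enum.
# Member values are the strings the pass config accepts, so constructing
# an enum from user input doubles as validation of the allowed values.
class QairtBackend(str, Enum):
    CPU = "CPU"
    HTP = "HTP"

class QairtLogLevel(str, Enum):
    DEBUG = "DEBUG"
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"

backend = QairtBackend("CPU")  # valid value, returns QairtBackend.CPU
# QairtBackend("GPU") would raise ValueError: 'GPU' is not a valid QairtBackend
```

Using the enum as the PassConfigParam type means invalid strings fail at config-parsing time rather than deep inside the QAIRT API.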


Copilot AI left a comment


Pull request overview

Adds a new QAIRT pipeline to Olive to support an end-to-end ORT → QAIRT (Genie) workflow, including model preparation, QAIRT GenAIBuilder compilation, and ONNX encapsulation for onnxruntime-genai compatibility.

Changes:

  • Introduces three new QAIRT passes: preparation (external script runner), GenAIBuilder (CPU/HTP backends), and encapsulation (EPContext ONNX wrapper + genai_config.json).
  • Adds QAIRT model handlers and new Framework/ModelFileFormat enums for QAIRT artifacts.
  • Adds a new pytest suite for the QAIRT passes and updates Olive’s pass registry configuration (olive_config.json) with QAIRT entries and extra dependencies.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.

File Description
olive/passes/qairt/preparation.py New pass to run external preparation scripts and emit QairtPreparedModelHandler.
olive/passes/qairt/gen_ai_builder.py New pass to compile/build QAIRT artifacts via QAIRT GenAIBuilder for CPU/HTP targets.
olive/passes/qairt/encapsulation.py New pass to export DLC + wrap it into an ONNX EPContext model and generate genai_config.json.
olive/passes/qairt/__init__.py Adds QAIRT passes package.
olive/model/handler/qairt.py Adds QAIRT model handler types (QairtPreparedModelHandler, QairtModelHandler).
olive/model/handler/__init__.py Exposes the new QAIRT model handlers from the handler package.
olive/constants.py Adds QAIRT framework and QAIRT model file formats.
olive/olive_config.json Registers new QAIRT passes and adds QAIRT extra dependency mapping.
test/passes/qairt/conftest.py Adds shared fixtures for mocking QAIRT modules and model handlers.
test/passes/qairt/test_preparation.py Unit tests for QairtPreparation behavior and subprocess streaming.
test/passes/qairt/test_gen_ai_builder.py Unit tests for GenAIBuilder CPU/HTP behavior and validation paths.
test/passes/qairt/test_encapsulation.py Unit tests for encapsulation, DLC discovery, ONNX generation, and genai_config creation.

Comment on lines +54 to +61
def run_session(
    self,
    session: Any = None,
    inputs: Union[dict[str, Any], list[Any], tuple[Any, ...]] = None,
    **kwargs: dict[str, Any],
) -> Any:
    raise NotImplementedError("QairtPreparedModelHandler does not support prepare_session")


Copilot AI Mar 26, 2026


run_session raises NotImplementedError but the message says it doesn’t support "prepare_session". This is misleading when debugging (and appears to be copy/paste). Update the error message to refer to run_session.

Comment on lines +82 to +85
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0

Copilot AI Mar 26, 2026


size_on_disk always returns 0 for QairtModelHandler as well, which will break SIZE_ON_DISK reporting for compiled QAIRT models. Consider implementing directory size calculation (or explicitly raising NotImplementedError) to avoid silently incorrect metrics.

# Can only set target and transformation configurations if the BE is HTP
if config.backend == qairt.BackendType.HTP.value:
    # Device configs
    gen_ai_builder.set_targets([config.soc_details])

Copilot AI Mar 26, 2026


For HTP, set_targets([config.soc_details]) is called unconditionally, but soc_details defaults to None and validate_config doesn’t enforce it. Passing [None] into the QAIRT API is likely to fail at runtime. Add a validation/guard so HTP requires a non-empty soc_details (or skip set_targets when it’s unset and rely on QAIRT defaults).

Suggested change
gen_ai_builder.set_targets([config.soc_details])
if config.soc_details:
    gen_ai_builder.set_targets([config.soc_details])

Comment on lines +35 to +45
"log_level": PassConfigParam(
    type_=str,
    default_value=None,
    description="Log level to be used within underlying QAIRT components. "
    "Valid values: DEBUG, INFO, WARN, ERROR.",
),
"run_checker": PassConfigParam(
    type_=bool,
    default_value=False,
    description="Runs the onnx checker on the model before it is encapsulated.",
),

Copilot AI Mar 26, 2026


The log_level config option is defined but never used in this pass. Either wire it up (e.g., set the same QAIRT_LOG_LEVEL env var used in QairtGenAIBuilder) or remove it to avoid confusing users with a no-op parameter.

Comment on lines +257 to +260
genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get(
    "num_attention_heads", -1
)
genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1)

Copilot AI Mar 26, 2026


head_size is computed as hidden_size // num_attention_heads using default values of -1 when keys are missing. This can silently produce incorrect values (e.g., -1 // -1 == 1) or raise if num_attention_heads is 0/non-int. Mirror the safer logic used in olive/passes/openvino/ov_utils.py:create_genai_config by validating both values are positive ints before dividing; otherwise set head_size to -1.

Suggested change
genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get(
    "num_attention_heads", -1
)
genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1)
hidden_size = src_config.get("hidden_size", -1)
num_attention_heads = src_config.get("num_attention_heads", -1)
head_size = -1
if isinstance(hidden_size, int) and isinstance(num_attention_heads, int) and hidden_size > 0 and num_attention_heads > 0:
    head_size = hidden_size // num_attention_heads
genai_config["model"]["decoder"]["head_size"] = head_size
genai_config["model"]["decoder"]["hidden_size"] = hidden_size

Comment on lines +37 to +40
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0

Copilot AI Mar 26, 2026


size_on_disk always returns 0, which will make the built-in SIZE_ON_DISK metric report incorrect results for QAIRT models. Implement actual size computation for local paths (sum file sizes under model_path, similar to Diffusers/HF handlers) or raise NotImplementedError if it’s intentionally unsupported.

Comment on lines +78 to +92
# Input/Ouptut metadata
container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]

input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}

output_info = {out[0]: (out[1], out[2]) for out in container.outputs}

# Input/Output tensor helpers
inputs = []
for name, datatype, shape in container.inputs:
    inputs.append(helper.make_tensor_value_info(name, datatype, shape))

outputs = []
for name, datatype, shape in container.outputs:

Copilot AI Mar 26, 2026


The pass overwrites container.inputs/container.outputs with hard-coded metadata (input_ids/logits, fixed dtypes/shapes). If the loaded QAIRT container already exposes the correct I/O (or if models include additional inputs like attention_mask / KV cache), this will generate an ONNX wrapper with the wrong interface. Prefer reading the inputs/outputs from the container (and only normalizing/adjusting when necessary), rather than replacing them unconditionally.

Suggested change
# Input/Ouptut metadata
container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]
input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}
output_info = {out[0]: (out[1], out[2]) for out in container.outputs}
# Input/Output tensor helpers
inputs = []
for name, datatype, shape in container.inputs:
    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
outputs = []
for name, datatype, shape in container.outputs:
# Input/Output metadata
# Prefer using the inputs/outputs exposed by the container, and only fall back to
# default metadata when they are not available.
input_metas = getattr(container, "inputs", None) or [
    ("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"]),
]
output_metas = getattr(container, "outputs", None) or [
    ("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"]),
]
input_info = {inp[0]: (inp[1], inp[2]) for inp in input_metas}
output_info = {out[0]: (out[1], out[2]) for out in output_metas}
# Input/Output tensor helpers
inputs = []
for name, datatype, shape in input_metas:
    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
outputs = []
for name, datatype, shape in output_metas:

Comment on lines +90 to +101
# Prepare configuration for the script
cache_dir_path = Path(config.cache_dir).resolve()
script_config = {
    "ADASCALE_DIR": str(cache_dir_path / "adascale"),
    "CACHE_DIR": str(cache_dir_path),
    "OUTPUT_DIR": str(output_model_path),
}

# Merge user-provided config
if config.script_config:
    script_config.update(config.script_config)


Copilot AI Mar 26, 2026


The JSON passed to the preparation script doesn’t include the input model path. The docstring says the user config is merged with input/output paths, but only OUTPUT_DIR/CACHE_DIR/ADASCALE_DIR are set, so an external script has no reliable way to locate the HuggingFace model to prepare. Add an explicit key (e.g., INPUT_DIR or MODEL_DIR) pointing to model.model_path (and keep naming consistent with the preparation script contract).
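A sketch of the suggested contract, where build_script_config and the INPUT_DIR key are hypothetical names chosen for illustration; the real key naming would be defined by the preparation-script contract:

```python
from pathlib import Path

def build_script_config(model_path, cache_dir, output_dir, user_config=None):
    """Assemble the JSON payload for the external preparation script.

    Hypothetical sketch: adds an INPUT_DIR key so the script can locate
    the HuggingFace model, alongside the cache/output keys the pass
    already sets. Not the PR's actual implementation.
    """
    cache_dir_path = Path(cache_dir).resolve()
    script_config = {
        "INPUT_DIR": str(Path(model_path).resolve()),
        "ADASCALE_DIR": str(cache_dir_path / "adascale"),
        "CACHE_DIR": str(cache_dir_path),
        "OUTPUT_DIR": str(output_dir),
    }
    if user_config:
        script_config.update(user_config)  # user-provided keys win on conflict
    return script_config
```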

Comment on lines +121 to +123
with subprocess.Popen(
    ["python", str(script_path), "--config", config_file_path],
    cwd=str(script_path.parent),

Copilot AI Mar 26, 2026


The subprocess is invoked via the literal executable name "python", which can pick up a different interpreter than the one running Olive (e.g., in venv/conda). Use sys.executable (or equivalent) so the preparation script runs under the same Python environment and has access to the same installed dependencies.
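The fix the comment describes can be sketched as follows; the inline -c payload here is just a stand-in for invoking the real preparation script:

```python
import subprocess
import sys

# Launch the child with sys.executable so it runs under the same
# interpreter (and virtualenv/conda environment) as the parent process,
# instead of whatever "python" happens to resolve to on PATH.
result = subprocess.run(
    [sys.executable, "-c", "print('prep ok')"],
    capture_output=True,
    text=True,
    check=True,
)
```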

Comment on lines +280 to +329
def test_preparation_temp_config_cleanup(tmp_path, mock_hf_model):
    """Test that temporary config file is cleaned up."""
    script_path = tmp_path / "prep_script.py"
    script_path.write_text("# Mock script")
    output_path = tmp_path / "output"

    mock_process = MagicMock()
    mock_process.poll.side_effect = [None, 0]
    mock_process.wait.return_value = 0
    # Ensure context manager returns the mock_process itself
    mock_process.__enter__ = Mock(return_value=mock_process)
    mock_process.__exit__ = Mock(return_value=False)
    mock_process.stdout = MagicMock()

    def stdout_generator():
        yield "Done\n"
        while True:
            yield ""

    mock_process.stdout.readline = Mock(side_effect=stdout_generator())
    mock_process.stderr = MagicMock()

    def stderr_generator():
        while True:
            yield ""

    mock_process.stderr.readline = Mock(side_effect=stderr_generator())

    with (
        patch("subprocess.Popen", return_value=mock_process),
        patch("tempfile.NamedTemporaryFile") as mock_temp,
    ):
        temp_file_path = tmp_path / "olive_qairt_prep_test.json"

        mock_file = MagicMock()
        mock_file.name = str(temp_file_path)
        mock_file.__enter__ = Mock(return_value=mock_file)
        mock_file.__exit__ = Mock(return_value=False)
        mock_temp.return_value = mock_file

        prep_pass = create_pass_from_dict(
            QairtPreparation,
            {"script_path": str(script_path)},
            disable_search=True,
        )

        prep_pass.run(mock_hf_model, str(output_path))

    # Verify temp file would be cleaned up (unlink called)
    # Note: In actual implementation, cleanup happens in finally block

Copilot AI Mar 26, 2026


This test is incomplete: it patches NamedTemporaryFile but doesn’t assert any observable cleanup behavior (e.g., that Path(config_file_path).unlink() was invoked, or that the temp file no longer exists). As written it will pass even if the production code leaks temp config files. Add an assertion around the cleanup side effect to make the test meaningful.
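One way to make the cleanup observable, sketched with stand-in names rather than the test's real fixtures: create the temp config the way the pass does, delete it in a finally block, and assert the file is actually gone.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of an observable cleanup assertion. The placeholder
# in the try block stands in for running the preparation subprocess.
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
    f.write("{}")
    config_file_path = Path(f.name)

try:
    pass  # stand-in for the subprocess call that consumes --config
finally:
    config_file_path.unlink(missing_ok=True)  # the cleanup under test

# The test now fails if the production code stops deleting the file.
assert not config_file_path.exists()
```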

@@ -0,0 +1,391 @@
# -------------------------------------------------------------------------

Check warning — Code scanning / lintrunner: RUFF/format (test). Run lintrunner -a to apply this patch.

def test_preparation_uses_sys_executable_and_env(tmp_path, mock_hf_model, mock_qairt_modules):
    """Test that subprocess uses sys.executable and passes environment."""
    import os

Check warning — Code scanning / lintrunner: PYLINT/W0611 (test): Unused import os (unused-import). See unused-import.

Check warning — Code scanning / lintrunner: RUFF/F401 (test), reported on the same unused import.

