[QAIRT] Implement QAIRT ORT->Genie workflow #2358
qti-kromero wants to merge 15 commits into microsoft:main
Conversation
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Force-pushed from 5badb8e to fdaea52
|
dev testing complete and unit testing added - pending reviewers |
|
@jambayk @xiaoyu-work would it be possible to get a reviewer added to this |
```python
return {
    "backend": PassConfigParam(
        type_=str,
        default_value="CPU",
```
The PR mostly looks good to me, but could you use an enum with the options for backend and log_level, like in Olive/olive/passes/pytorch/rotate.py (line 40 in 85a754a)? This gives automatic validation of the allowed values. Thanks!
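A minimal sketch of the reviewer's suggestion. The enum name and members below are hypothetical (the real pattern lives in rotate.py), but they show the automatic validation a string-backed enum provides:

```python
from enum import Enum


class QairtBackend(str, Enum):
    # hypothetical enum mirroring the suggestion; actual member names may differ
    CPU = "CPU"
    HTP = "HTP"


# constructing a member from a value validates it automatically
backend = QairtBackend("CPU")
print(backend.value)  # CPU

try:
    QairtBackend("GPU")  # not an allowed value
except ValueError as err:
    print(f"rejected: {err}")
```

Because the enum subclasses `str`, existing comparisons against plain strings such as `"CPU"` keep working.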
Pull request overview
Adds a new QAIRT pipeline to Olive to support an end-to-end ORT → QAIRT (Genie) workflow, including model preparation, QAIRT GenAIBuilder compilation, and ONNX encapsulation for onnxruntime-genai compatibility.
Changes:
- Introduces three new QAIRT passes: preparation (external script runner), GenAIBuilder (CPU/HTP backends), and encapsulation (EPContext ONNX wrapper + genai_config.json).
- Adds QAIRT model handlers and new Framework/ModelFileFormat enums for QAIRT artifacts.
- Adds a new pytest suite for the QAIRT passes and updates Olive’s pass registry configuration (olive_config.json) with QAIRT entries and extra dependencies.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| olive/passes/qairt/preparation.py | New pass to run external preparation scripts and emit QairtPreparedModelHandler. |
| olive/passes/qairt/gen_ai_builder.py | New pass to compile/build QAIRT artifacts via QAIRT GenAIBuilder for CPU/HTP targets. |
| olive/passes/qairt/encapsulation.py | New pass to export DLC + wrap it into an ONNX EPContext model and generate genai_config.json. |
| olive/passes/qairt/__init__.py | Adds QAIRT passes package. |
| olive/model/handler/qairt.py | Adds QAIRT model handler types (QairtPreparedModelHandler, QairtModelHandler). |
| olive/model/handler/__init__.py | Exposes the new QAIRT model handlers from the handler package. |
| olive/constants.py | Adds QAIRT framework and QAIRT model file formats. |
| olive/olive_config.json | Registers new QAIRT passes and adds QAIRT extra dependency mapping. |
| test/passes/qairt/conftest.py | Adds shared fixtures for mocking QAIRT modules and model handlers. |
| test/passes/qairt/test_preparation.py | Unit tests for QairtPreparation behavior and subprocess streaming. |
| test/passes/qairt/test_gen_ai_builder.py | Unit tests for GenAIBuilder CPU/HTP behavior and validation paths. |
| test/passes/qairt/test_encapsulation.py | Unit tests for encapsulation, DLC discovery, ONNX generation, and genai_config creation. |
```python
def run_session(
    self,
    session: Any = None,
    inputs: Union[dict[str, Any], list[Any], tuple[Any, ...]] = None,
    **kwargs: dict[str, Any],
) -> Any:
    raise NotImplementedError("QairtPreparedModelHandler does not support prepare_session")
```
run_session raises NotImplementedError but the message says it doesn’t support "prepare_session". This is misleading when debugging (and appears to be copy/paste). Update the error message to refer to run_session.
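A sketch of the corrected method, with the class body trimmed to the relevant part (only the error message changes):

```python
from typing import Any


class QairtPreparedModelHandler:
    # trimmed to the method under discussion
    def run_session(self, session: Any = None, inputs: Any = None, **kwargs: Any) -> Any:
        # error message now names the method that raised it
        raise NotImplementedError("QairtPreparedModelHandler does not support run_session")
```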
```python
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0
```
size_on_disk always returns 0 for QairtModelHandler as well, which will break SIZE_ON_DISK reporting for compiled QAIRT models. Consider implementing directory size calculation (or explicitly raising NotImplementedError) to avoid silently incorrect metrics.
```python
# Can only set target and transformation configurations if the BE is HTP
if config.backend == qairt.BackendType.HTP.value:
    # Device configs
    gen_ai_builder.set_targets([config.soc_details])
```
For HTP, set_targets([config.soc_details]) is called unconditionally, but soc_details defaults to None and validate_config doesn’t enforce it. Passing [None] into the QAIRT API is likely to fail at runtime. Add a validation/guard so HTP requires a non-empty soc_details (or skip set_targets when it’s unset and rely on QAIRT defaults).
Suggested change:

```diff
-gen_ai_builder.set_targets([config.soc_details])
+if config.soc_details:
+    gen_ai_builder.set_targets([config.soc_details])
```
| "log_level": PassConfigParam( | ||
| type_=str, | ||
| default_value=None, | ||
| description="Log level to be used within underlying QAIRT components." | ||
| "Valid values: DEBUG, INFO, WARN, ERROR.", | ||
| ), | ||
| "run_checker": PassConfigParam( | ||
| type_=bool, | ||
| default_value=False, | ||
| description="Runs the onnx checker on the model before it is encapsulated.", | ||
| ), |
The log_level config option is defined but never used in this pass. Either wire it up (e.g., set the same QAIRT_LOG_LEVEL env var used in QairtGenAIBuilder) or remove it to avoid confusing users with a no-op parameter.
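Wiring it up could be as simple as mirroring the env-var approach; QAIRT_LOG_LEVEL is the variable this review thread associates with QairtGenAIBuilder, and the helper name below is hypothetical:

```python
import os
from typing import Optional


def apply_qairt_log_level(log_level: Optional[str]) -> None:
    # hypothetical helper: set the same env var QairtGenAIBuilder uses,
    # so the option is no longer a no-op in the encapsulation pass
    if log_level:
        os.environ["QAIRT_LOG_LEVEL"] = log_level
```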
| genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get( | ||
| "num_attention_heads", -1 | ||
| ) | ||
| genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1) |
head_size is computed as hidden_size // num_attention_heads using default values of -1 when keys are missing. This can silently produce incorrect values (e.g., -1 // -1 == 1) or raise if num_attention_heads is 0/non-int. Mirror the safer logic used in olive/passes/openvino/ov_utils.py:create_genai_config by validating both values are positive ints before dividing; otherwise set head_size to -1.
| genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get( | |
| "num_attention_heads", -1 | |
| ) | |
| genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1) | |
| hidden_size = src_config.get("hidden_size", -1) | |
| num_attention_heads = src_config.get("num_attention_heads", -1) | |
| head_size = -1 | |
| if isinstance(hidden_size, int) and isinstance(num_attention_heads, int) and hidden_size > 0 and num_attention_heads > 0: | |
| head_size = hidden_size // num_attention_heads | |
| genai_config["model"]["decoder"]["head_size"] = head_size | |
| genai_config["model"]["decoder"]["hidden_size"] = hidden_size |
```python
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0
```
size_on_disk always returns 0, which will make the built-in SIZE_ON_DISK metric report incorrect results for QAIRT models. Implement actual size computation for local paths (sum file sizes under model_path, similar to Diffusers/HF handlers) or raise NotImplementedError if it’s intentionally unsupported.
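One possible implementation, summing file sizes under the model path. This is a sketch of the suggested direction, not the handlers' actual code:

```python
from pathlib import Path


def dir_size_on_disk(model_path: str) -> int:
    """Sum the sizes of all files under model_path (recursively)."""
    root = Path(model_path)
    if root.is_file():
        return root.stat().st_size
    # rglob walks subdirectories, so nested artifacts are counted too
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
```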
```python
# Input/Ouptut metadata
container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]

input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}

output_info = {out[0]: (out[1], out[2]) for out in container.outputs}

# Input/Output tensor helpers
inputs = []
for name, datatype, shape in container.inputs:
    inputs.append(helper.make_tensor_value_info(name, datatype, shape))

outputs = []
for name, datatype, shape in container.outputs:
```
The pass overwrites container.inputs/container.outputs with hard-coded metadata (input_ids/logits, fixed dtypes/shapes). If the loaded QAIRT container already exposes the correct I/O (or if models include additional inputs like attention_mask / KV cache), this will generate an ONNX wrapper with the wrong interface. Prefer reading the inputs/outputs from the container (and only normalizing/adjusting when necessary), rather than replacing them unconditionally.
Suggested change:

```diff
-# Input/Ouptut metadata
-container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
-container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]
-input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}
-output_info = {out[0]: (out[1], out[2]) for out in container.outputs}
-# Input/Output tensor helpers
-inputs = []
-for name, datatype, shape in container.inputs:
-    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
-outputs = []
-for name, datatype, shape in container.outputs:
+# Input/Output metadata
+# Prefer using the inputs/outputs exposed by the container, and only fall back to
+# default metadata when they are not available.
+input_metas = getattr(container, "inputs", None) or [
+    ("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"]),
+]
+output_metas = getattr(container, "outputs", None) or [
+    ("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"]),
+]
+input_info = {inp[0]: (inp[1], inp[2]) for inp in input_metas}
+output_info = {out[0]: (out[1], out[2]) for out in output_metas}
+# Input/Output tensor helpers
+inputs = []
+for name, datatype, shape in input_metas:
+    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
+outputs = []
+for name, datatype, shape in output_metas:
```
```python
# Prepare configuration for the script
cache_dir_path = Path(config.cache_dir).resolve()
script_config = {
    "ADASCALE_DIR": str(cache_dir_path / "adascale"),
    "CACHE_DIR": str(cache_dir_path),
    "OUTPUT_DIR": str(output_model_path),
}

# Merge user-provided config
if config.script_config:
    script_config.update(config.script_config)
```
The JSON passed to the preparation script doesn’t include the input model path. The docstring says the user config is merged with input/output paths, but only OUTPUT_DIR/CACHE_DIR/ADASCALE_DIR are set, so an external script has no reliable way to locate the HuggingFace model to prepare. Add an explicit key (e.g., INPUT_DIR or MODEL_DIR) pointing to model.model_path (and keep naming consistent with the preparation script contract).
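A sketch of the amended config builder; the INPUT_DIR key name is a proposal here, not an existing contract, and the helper signature is hypothetical:

```python
from pathlib import Path
from typing import Optional


def build_script_config(
    model_path: str,
    cache_dir: str,
    output_model_path: str,
    user_config: Optional[dict] = None,
) -> dict:
    cache_dir_path = Path(cache_dir).resolve()
    script_config = {
        "ADASCALE_DIR": str(cache_dir_path / "adascale"),
        "CACHE_DIR": str(cache_dir_path),
        "INPUT_DIR": str(Path(model_path).resolve()),  # proposed key for the source model
        "OUTPUT_DIR": str(output_model_path),
    }
    # user-provided values still win, matching the existing merge order
    if user_config:
        script_config.update(user_config)
    return script_config
```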
```python
with subprocess.Popen(
    ["python", str(script_path), "--config", config_file_path],
    cwd=str(script_path.parent),
```
The subprocess is invoked via the literal executable name "python", which can pick up a different interpreter than the one running Olive (e.g., in venv/conda). Use sys.executable (or equivalent) so the preparation script runs under the same Python environment and has access to the same installed dependencies.
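A sketch of the fix (the helper name is hypothetical; the pass itself uses `Popen` with streamed output, simplified here to `subprocess.run`):

```python
import subprocess
import sys
from pathlib import Path


def run_preparation_script(script_path: Path, config_file_path: str) -> int:
    # sys.executable is the interpreter running Olive, so the script sees
    # the same venv/conda environment and installed dependencies
    result = subprocess.run(
        [sys.executable, str(script_path), "--config", config_file_path],
        cwd=str(script_path.parent),
        check=True,
    )
    return result.returncode
```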
```python
def test_preparation_temp_config_cleanup(tmp_path, mock_hf_model):
    """Test that temporary config file is cleaned up."""
    script_path = tmp_path / "prep_script.py"
    script_path.write_text("# Mock script")
    output_path = tmp_path / "output"

    mock_process = MagicMock()
    mock_process.poll.side_effect = [None, 0]
    mock_process.wait.return_value = 0
    # Ensure context manager returns the mock_process itself
    mock_process.__enter__ = Mock(return_value=mock_process)
    mock_process.__exit__ = Mock(return_value=False)
    mock_process.stdout = MagicMock()

    def stdout_generator():
        yield "Done\n"
        while True:
            yield ""

    mock_process.stdout.readline = Mock(side_effect=stdout_generator())
    mock_process.stderr = MagicMock()

    def stderr_generator():
        while True:
            yield ""

    mock_process.stderr.readline = Mock(side_effect=stderr_generator())

    with (
        patch("subprocess.Popen", return_value=mock_process),
        patch("tempfile.NamedTemporaryFile") as mock_temp,
    ):
        temp_file_path = tmp_path / "olive_qairt_prep_test.json"

        mock_file = MagicMock()
        mock_file.name = str(temp_file_path)
        mock_file.__enter__ = Mock(return_value=mock_file)
        mock_file.__exit__ = Mock(return_value=False)
        mock_temp.return_value = mock_file

        prep_pass = create_pass_from_dict(
            QairtPreparation,
            {"script_path": str(script_path)},
            disable_search=True,
        )

        prep_pass.run(mock_hf_model, str(output_path))

        # Verify temp file would be cleaned up (unlink called)
        # Note: In actual implementation, cleanup happens in finally block
```
This test is incomplete: it patches NamedTemporaryFile but doesn’t assert any observable cleanup behavior (e.g., that Path(config_file_path).unlink() was invoked, or that the temp file no longer exists). As written it will pass even if the production code leaks temp config files. Add an assertion around the cleanup side effect to make the test meaningful.
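One way to make the cleanup observable is to drive a real temp path and assert it is gone afterwards. This is a sketch of the pattern with a hypothetical stand-in for the pass, not the actual Olive code:

```python
import tempfile
from pathlib import Path


def run_with_temp_config() -> Path:
    """Hypothetical stand-in for the pass: write a temp config, clean it up in finally."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        config_file_path = Path(f.name)
        f.write("{}")
    try:
        pass  # the preparation subprocess would run here
    finally:
        # the cleanup the test should observe
        config_file_path.unlink(missing_ok=True)
    return config_file_path


# the assertion the current test is missing
temp_path = run_with_temp_config()
assert not temp_path.exists()
```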
Code scanning / lintrunner: RUFF/format warning on the new test file:

```diff
@@ -0,0 +1,391 @@
+# -------------------------------------------------------------------------
```
```python
def test_preparation_uses_sys_executable_and_env(tmp_path, mock_hf_model, mock_qairt_modules):
    """Test that subprocess uses sys.executable and passes environment."""
    import os
```

Code scanning / lintrunner: PYLINT/W0611 warning (`os` imported but unused).
Code scanning / lintrunner: RUFF/F401 warning (`os` imported but unused).
Describe your changes
Implements a complete QAIRT workflow for converting ONNX Runtime models to Genie-compatible format through three new passes:

- QairtPreparation: Executes external preparation scripts to quantize and prepare HuggingFace models for QAIRT, with configurable caching and script parameters.
- QairtGenAIBuilder: Converts prepared models using the QAIRT GenAIBuilder API, with support for CPU and HTP backends.
- QairtEncapsulation: Wraps QAIRT DLC models in ONNX protobuf format with EPContext nodes, generating genai_config.json for onnxruntime-genai compatibility.

This enables end-to-end optimization of generative AI models for Qualcomm hardware accelerators.
Checklist before requesting a review
- Format and lint the code with `lintrunner -a`

(Optional) Issue link