[QAIRT] Implement QAIRT ORT->Genie workflow #2358
qti-kromero wants to merge 15 commits into microsoft:main
Conversation
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Force-pushed from 5badb8e to fdaea52
|
dev testing complete and unit testing added - pending reviewers |
|
@jambayk @xiaoyu-work would it be possible to get a reviewer added to this |
```python
return {
    "backend": PassConfigParam(
        type_=str,
        default_value="CPU",
```
The PR mostly looks good to me, but could you use an enum with the options for backend and log_level, like in Olive/olive/passes/pytorch/rotate.py (line 40 in 85a754a)? This gives automatic validation of the allowed values. Thanks!
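A minimal sketch of the reviewer's suggestion. The enum name and members below are hypothetical (the real pattern lives in rotate.py), but they show the automatic validation a string-backed enum provides:

```python
from enum import Enum


class QairtBackend(str, Enum):
    # hypothetical enum mirroring the suggestion; actual member names may differ
    CPU = "CPU"
    HTP = "HTP"


# constructing a member from a value validates it automatically
backend = QairtBackend("CPU")
print(backend.value)  # CPU

try:
    QairtBackend("GPU")  # not an allowed value
except ValueError as err:
    print(f"rejected: {err}")
```

Because the enum subclasses `str`, existing comparisons against plain strings such as `"CPU"` keep working.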
Pull request overview
Adds a new QAIRT pipeline to Olive to support an end-to-end ORT → QAIRT (Genie) workflow, including model preparation, QAIRT GenAIBuilder compilation, and ONNX encapsulation for onnxruntime-genai compatibility.
Changes:
- Introduces three new QAIRT passes: preparation (external script runner), GenAIBuilder (CPU/HTP backends), and encapsulation (EPContext ONNX wrapper + genai_config.json).
- Adds QAIRT model handlers and new Framework/ModelFileFormat enums for QAIRT artifacts.
- Adds a new pytest suite for the QAIRT passes and updates Olive’s pass registry configuration (olive_config.json) with QAIRT entries and extra dependencies.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| olive/passes/qairt/preparation.py | New pass to run external preparation scripts and emit QairtPreparedModelHandler. |
| olive/passes/qairt/gen_ai_builder.py | New pass to compile/build QAIRT artifacts via QAIRT GenAIBuilder for CPU/HTP targets. |
| olive/passes/qairt/encapsulation.py | New pass to export DLC + wrap it into an ONNX EPContext model and generate genai_config.json. |
| olive/passes/qairt/__init__.py | Adds QAIRT passes package. |
| olive/model/handler/qairt.py | Adds QAIRT model handler types (QairtPreparedModelHandler, QairtModelHandler). |
| olive/model/handler/__init__.py | Exposes the new QAIRT model handlers from the handler package. |
| olive/constants.py | Adds QAIRT framework and QAIRT model file formats. |
| olive/olive_config.json | Registers new QAIRT passes and adds QAIRT extra dependency mapping. |
| test/passes/qairt/conftest.py | Adds shared fixtures for mocking QAIRT modules and model handlers. |
| test/passes/qairt/test_preparation.py | Unit tests for QairtPreparation behavior and subprocess streaming. |
| test/passes/qairt/test_gen_ai_builder.py | Unit tests for GenAIBuilder CPU/HTP behavior and validation paths. |
| test/passes/qairt/test_encapsulation.py | Unit tests for encapsulation, DLC discovery, ONNX generation, and genai_config creation. |
```python
def run_session(
    self,
    session: Any = None,
    inputs: Union[dict[str, Any], list[Any], tuple[Any, ...]] = None,
    **kwargs: dict[str, Any],
) -> Any:
    raise NotImplementedError("QairtPreparedModelHandler does not support prepare_session")
```
run_session raises NotImplementedError but the message says it doesn’t support "prepare_session". This is misleading when debugging (and appears to be copy/paste). Update the error message to refer to run_session.
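A sketch of the corrected method, with the class body trimmed to the relevant part (only the error message changes):

```python
from typing import Any


class QairtPreparedModelHandler:
    # trimmed to the method under discussion
    def run_session(self, session: Any = None, inputs: Any = None, **kwargs: Any) -> Any:
        # error message now names the method that raised it
        raise NotImplementedError("QairtPreparedModelHandler does not support run_session")
```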
```python
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0
```
size_on_disk always returns 0 for QairtModelHandler as well, which will break SIZE_ON_DISK reporting for compiled QAIRT models. Consider implementing directory size calculation (or explicitly raising NotImplementedError) to avoid silently incorrect metrics.
```python
# Can only set target and transformation configurations if the BE is HTP
if config.backend == qairt.BackendType.HTP.value:
    # Device configs
    gen_ai_builder.set_targets([config.soc_details])
```
For HTP, set_targets([config.soc_details]) is called unconditionally, but soc_details defaults to None and validate_config doesn’t enforce it. Passing [None] into the QAIRT API is likely to fail at runtime. Add a validation/guard so HTP requires a non-empty soc_details (or skip set_targets when it’s unset and rely on QAIRT defaults).
Suggested change:

```diff
-gen_ai_builder.set_targets([config.soc_details])
+if config.soc_details:
+    gen_ai_builder.set_targets([config.soc_details])
```
| "log_level": PassConfigParam( | ||
| type_=str, | ||
| default_value=None, | ||
| description="Log level to be used within underlying QAIRT components." | ||
| "Valid values: DEBUG, INFO, WARN, ERROR.", | ||
| ), | ||
| "run_checker": PassConfigParam( | ||
| type_=bool, | ||
| default_value=False, | ||
| description="Runs the onnx checker on the model before it is encapsulated.", | ||
| ), |
The log_level config option is defined but never used in this pass. Either wire it up (e.g., set the same QAIRT_LOG_LEVEL env var used in QairtGenAIBuilder) or remove it to avoid confusing users with a no-op parameter.
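Wiring it up could be as simple as mirroring the env-var approach; QAIRT_LOG_LEVEL is the variable this review thread associates with QairtGenAIBuilder, and the helper name below is hypothetical:

```python
import os
from typing import Optional


def apply_qairt_log_level(log_level: Optional[str]) -> None:
    # hypothetical helper: set the same env var QairtGenAIBuilder uses,
    # so the option is no longer a no-op in the encapsulation pass
    if log_level:
        os.environ["QAIRT_LOG_LEVEL"] = log_level
```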
| genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get( | ||
| "num_attention_heads", -1 | ||
| ) | ||
| genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1) |
head_size is computed as hidden_size // num_attention_heads using default values of -1 when keys are missing. This can silently produce incorrect values (e.g., -1 // -1 == 1) or raise if num_attention_heads is 0/non-int. Mirror the safer logic used in olive/passes/openvino/ov_utils.py:create_genai_config by validating both values are positive ints before dividing; otherwise set head_size to -1.
| genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get( | |
| "num_attention_heads", -1 | |
| ) | |
| genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1) | |
| hidden_size = src_config.get("hidden_size", -1) | |
| num_attention_heads = src_config.get("num_attention_heads", -1) | |
| head_size = -1 | |
| if isinstance(hidden_size, int) and isinstance(num_attention_heads, int) and hidden_size > 0 and num_attention_heads > 0: | |
| head_size = hidden_size // num_attention_heads | |
| genai_config["model"]["decoder"]["head_size"] = head_size | |
| genai_config["model"]["decoder"]["hidden_size"] = hidden_size |
```python
@property
def size_on_disk(self) -> int:
    """Compute size of the model on disk."""
    return 0
```
size_on_disk always returns 0, which will make the built-in SIZE_ON_DISK metric report incorrect results for QAIRT models. Implement actual size computation for local paths (sum file sizes under model_path, similar to Diffusers/HF handlers) or raise NotImplementedError if it’s intentionally unsupported.
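One possible implementation, summing file sizes under the model path. This is a sketch of the suggested direction, not the handlers' actual code:

```python
from pathlib import Path


def dir_size_on_disk(model_path: str) -> int:
    """Sum the sizes of all files under model_path (recursively)."""
    root = Path(model_path)
    if root.is_file():
        return root.stat().st_size
    # rglob walks subdirectories, so nested artifacts are counted too
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
```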
```python
# Input/Ouptut metadata
container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]

input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}

output_info = {out[0]: (out[1], out[2]) for out in container.outputs}

# Input/Output tensor helpers
inputs = []
for name, datatype, shape in container.inputs:
    inputs.append(helper.make_tensor_value_info(name, datatype, shape))

outputs = []
for name, datatype, shape in container.outputs:
```
The pass overwrites container.inputs/container.outputs with hard-coded metadata (input_ids/logits, fixed dtypes/shapes). If the loaded QAIRT container already exposes the correct I/O (or if models include additional inputs like attention_mask / KV cache), this will generate an ONNX wrapper with the wrong interface. Prefer reading the inputs/outputs from the container (and only normalizing/adjusting when necessary), rather than replacing them unconditionally.
Suggested change:

```diff
-# Input/Ouptut metadata
-container.inputs = [("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"])]
-container.outputs = [("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"])]
-input_info = {inp[0]: (inp[1], inp[2]) for inp in container.inputs}
-output_info = {out[0]: (out[1], out[2]) for out in container.outputs}
-# Input/Output tensor helpers
-inputs = []
-for name, datatype, shape in container.inputs:
-    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
-outputs = []
-for name, datatype, shape in container.outputs:
+# Input/Output metadata
+# Prefer using the inputs/outputs exposed by the container, and only fall back to
+# default metadata when they are not available.
+input_metas = getattr(container, "inputs", None) or [
+    ("input_ids", TensorProto.INT32, ["batch_size", "sequence_length"]),
+]
+output_metas = getattr(container, "outputs", None) or [
+    ("logits", TensorProto.FLOAT, ["batch_size", 1, "vocab_size"]),
+]
+input_info = {inp[0]: (inp[1], inp[2]) for inp in input_metas}
+output_info = {out[0]: (out[1], out[2]) for out in output_metas}
+# Input/Output tensor helpers
+inputs = []
+for name, datatype, shape in input_metas:
+    inputs.append(helper.make_tensor_value_info(name, datatype, shape))
+outputs = []
+for name, datatype, shape in output_metas:
```
```python
# Prepare configuration for the script
cache_dir_path = Path(config.cache_dir).resolve()
script_config = {
    "ADASCALE_DIR": str(cache_dir_path / "adascale"),
    "CACHE_DIR": str(cache_dir_path),
    "OUTPUT_DIR": str(output_model_path),
}

# Merge user-provided config
if config.script_config:
    script_config.update(config.script_config)
```
The JSON passed to the preparation script doesn’t include the input model path. The docstring says the user config is merged with input/output paths, but only OUTPUT_DIR/CACHE_DIR/ADASCALE_DIR are set, so an external script has no reliable way to locate the HuggingFace model to prepare. Add an explicit key (e.g., INPUT_DIR or MODEL_DIR) pointing to model.model_path (and keep naming consistent with the preparation script contract).
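A sketch of the amended config builder; the INPUT_DIR key name is a proposal here, not an existing contract, and the helper signature is hypothetical:

```python
from pathlib import Path
from typing import Optional


def build_script_config(
    model_path: str,
    cache_dir: str,
    output_model_path: str,
    user_config: Optional[dict] = None,
) -> dict:
    cache_dir_path = Path(cache_dir).resolve()
    script_config = {
        "ADASCALE_DIR": str(cache_dir_path / "adascale"),
        "CACHE_DIR": str(cache_dir_path),
        "INPUT_DIR": str(Path(model_path).resolve()),  # proposed key for the source model
        "OUTPUT_DIR": str(output_model_path),
    }
    # user-provided values still win, matching the existing merge order
    if user_config:
        script_config.update(user_config)
    return script_config
```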
```python
with subprocess.Popen(
    ["python", str(script_path), "--config", config_file_path],
    cwd=str(script_path.parent),
```
The subprocess is invoked via the literal executable name "python", which can pick up a different interpreter than the one running Olive (e.g., in venv/conda). Use sys.executable (or equivalent) so the preparation script runs under the same Python environment and has access to the same installed dependencies.
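A sketch of the fix (the helper name is hypothetical; the pass itself uses `Popen` with streamed output, simplified here to `subprocess.run`):

```python
import subprocess
import sys
from pathlib import Path


def run_preparation_script(script_path: Path, config_file_path: str) -> int:
    # sys.executable is the interpreter running Olive, so the script sees
    # the same venv/conda environment and installed dependencies
    result = subprocess.run(
        [sys.executable, str(script_path), "--config", config_file_path],
        cwd=str(script_path.parent),
        check=True,
    )
    return result.returncode
```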
```python
def test_preparation_temp_config_cleanup(tmp_path, mock_hf_model):
    """Test that temporary config file is cleaned up."""
    script_path = tmp_path / "prep_script.py"
    script_path.write_text("# Mock script")
    output_path = tmp_path / "output"

    mock_process = MagicMock()
    mock_process.poll.side_effect = [None, 0]
    mock_process.wait.return_value = 0
    # Ensure context manager returns the mock_process itself
    mock_process.__enter__ = Mock(return_value=mock_process)
    mock_process.__exit__ = Mock(return_value=False)
    mock_process.stdout = MagicMock()

    def stdout_generator():
        yield "Done\n"
        while True:
            yield ""

    mock_process.stdout.readline = Mock(side_effect=stdout_generator())
    mock_process.stderr = MagicMock()

    def stderr_generator():
        while True:
            yield ""

    mock_process.stderr.readline = Mock(side_effect=stderr_generator())

    with (
        patch("subprocess.Popen", return_value=mock_process),
        patch("tempfile.NamedTemporaryFile") as mock_temp,
    ):
        temp_file_path = tmp_path / "olive_qairt_prep_test.json"

        mock_file = MagicMock()
        mock_file.name = str(temp_file_path)
        mock_file.__enter__ = Mock(return_value=mock_file)
        mock_file.__exit__ = Mock(return_value=False)
        mock_temp.return_value = mock_file

        prep_pass = create_pass_from_dict(
            QairtPreparation,
            {"script_path": str(script_path)},
            disable_search=True,
        )

        prep_pass.run(mock_hf_model, str(output_path))

        # Verify temp file would be cleaned up (unlink called)
        # Note: In actual implementation, cleanup happens in finally block
```
This test is incomplete: it patches NamedTemporaryFile but doesn’t assert any observable cleanup behavior (e.g., that Path(config_file_path).unlink() was invoked, or that the temp file no longer exists). As written it will pass even if the production code leaks temp config files. Add an assertion around the cleanup side effect to make the test meaningful.
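One way to make the cleanup observable is to drive a real temp path and assert it is gone afterwards. This is a sketch of the pattern with a hypothetical stand-in for the pass, not the actual Olive code:

```python
import tempfile
from pathlib import Path


def run_with_temp_config() -> Path:
    """Hypothetical stand-in for the pass: write a temp config, clean it up in finally."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        config_file_path = Path(f.name)
        f.write("{}")
    try:
        pass  # the preparation subprocess would run here
    finally:
        # the cleanup the test should observe
        config_file_path.unlink(missing_ok=True)
    return config_file_path


# the assertion the current test is missing
temp_path = run_with_temp_config()
assert not temp_path.exists()
```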
Code scanning / lintrunner: RUFF/format warning on the new test file:

```diff
@@ -0,0 +1,391 @@
+# -------------------------------------------------------------------------
```
```python
def test_preparation_uses_sys_executable_and_env(tmp_path, mock_hf_model, mock_qairt_modules):
    """Test that subprocess uses sys.executable and passes environment."""
    import os
```

Code scanning / lintrunner: PYLINT/W0611 warning (`os` imported but unused).
Code scanning / lintrunner: RUFF/F401 warning (`os` imported but unused).
Describe your changes
Implements a complete QAIRT workflow for converting ONNX Runtime models to Genie-compatible format through three new passes:

- QairtPreparation: Executes external preparation scripts to quantize and prepare HuggingFace models for QAIRT, with configurable caching and script parameters.
- QairtGenAIBuilder: Converts prepared models using the QAIRT GenAIBuilder API, with support for CPU and HTP backends.
- QairtEncapsulation: Wraps QAIRT DLC models in ONNX protobuf format with EPContext nodes, generating genai_config.json for onnxruntime-genai compatibility.

This enables end-to-end optimization of generative AI models for Qualcomm hardware accelerators.
Checklist before requesting a review
- Format and lint the code with `lintrunner -a`

(Optional) Issue link