Skip to content

fix(jobs): align executor kind/provider with AIRCORE-397 execution model#335

Open
matthewgrossman wants to merge 20 commits into
mainfrom
AIRCORE-513-align-executorkind-with-original-plan/mgrossman
Open

fix(jobs): align executor kind/provider with AIRCORE-397 execution model#335
matthewgrossman wants to merge 20 commits into
mainfrom
AIRCORE-513-align-executorkind-with-original-plan/mgrossman

Conversation

@matthewgrossman

@matthewgrossman matthewgrossman commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Separates two concerns that were conflated in the executor model: what kind of compute a job needs (provider) vs how the job payload is shaped (kind). Then threads the resolved kind through the compilation pipeline so plugin compilers can emit the right executor shape without hardcoding profile names.

Why this change

Before this PR, the executor used provider as a discriminator with values cpu, gpu, gpu_distributed, and subprocess. This conflated two ideas:

  • Compute intent — CPU vs GPU vs multi-GPU
  • Payload shape — container (needs image + command) vs subprocess (needs just command)

subprocess is a payload shape, not a compute type. Having provider="subprocess" made it impossible to express "run this as a subprocess on GPU hardware." It also meant that CPU container jobs submitted against a subprocess backend required a workaround in the Jobs API: translate_cpu_container_steps_to_subprocess would rewrite CPUExecutionProvider steps into SubprocessExecutionProvider steps at request time, coupling the API layer to backend selection. Plugin compilers had to provide a container image even for subprocess execution, and the image was silently discarded.

What we decided

Following the design in AIRCORE-397 and AIRCORE-513:

  1. kind is the executor payload shape discriminator: "container" or "subprocess"
  2. provider is purely compute intent: "cpu", "gpu", "gpu_distributed"
  3. profile maps (provider, profile) to a backend: ("cpu", "default") → Docker/K8s, ("cpu", "subprocess") → SubprocessJobBackend
  4. The Jobs API no longer rewrites executor specs — plugins emit the correct executor shape directly, based on the resolved kind
  5. The framework resolves kind from the profile and passes it to compilers, so no plugin ever needs to query execution profiles itself

How it works

User submits job with --profile subprocess
         │
         ▼
Plugin route receives profile="subprocess"
         │
         ▼
_compile_platform_spec resolves kind:
  GET /v2/execution-profiles → ("cpu", "subprocess") → kind="subprocess"
         │
         ▼
compiler(workspace, ..., sdk, kind="subprocess", profile="subprocess")
         │
         ▼
Plugin compile() branches on kind:
  if kind == "subprocess": emit SubprocessExecutionProviderSpec
  else: emit ContainerExecutionProviderSpec
         │
         ▼
stamp_profile() sets profile on each step
         │
         ▼
Jobs service dispatches via BackendRegistry.get_backend(provider="cpu", profile="subprocess")
         │
         ▼
SubprocessJobBackend.schedule()

Why this is incremental (not the final state)

This PR is a stepping stone toward the compile_default() design from AIRCORE-397. The end-state moves compilation to the Jobs service backend itself — the backend knows its own kind and constructs the executor server-side. Plugins would post a PluginJobSpec (just domain payload + metadata) and never construct executors at all.

What this PR does is the prerequisite plumbing:

  • Core model: kind as discriminator, provider as compute intent (required for any future work)
  • Execution profiles: expose kind so the platform can resolve it (required for compile_default)
  • Compile pipeline: thread kind to compilers so they can emit the right shape today (bridge until compile_default exists)

The resolve_profile_kind() helper and the kind parameter on compile() are explicitly marked as bridges — when compile_default() lands, plugins stop constructing executors and this resolution moves server-side.

What changed

Core model (providers.py)

  • Deleted CPUExecutionProvider, GPUExecutionProvider, DistributedGPUExecutionProvider
  • Created ContainerExecutionProvider with kind="container", provider as Literal["cpu", "gpu", "gpu_distributed"]
  • Updated SubprocessExecutionProvider: providerLiteral["cpu", "gpu"], added kind="subprocess"
  • Changed discriminator from provider to kind
  • Added ExecutorKind = Literal["container", "subprocess"]

Execution profiles

  • Added kind: ExecutorKind to BaseExecutionProfile (defaults to "container")
  • SubprocessJobExecutionProfile overrides with kind="subprocess"
  • GET /v2/execution-profiles returns kind for each profile

Backend registration

  • Subprocess: ("subprocess", "default")("cpu", "subprocess")
  • Added BackendKey("gpu", "subprocess") for GPU subprocess
  • Removed translate_cpu_container_steps_to_subprocess and its call in create_job()

Compilation pipeline

  • _compile_platform_spec resolves kind from profile via resolve_profile_kind() before calling the compiler
  • Compiler signature: (workspace, ..., sdk, kind, profile) — two new positional args
  • NemoJob.compile() receives kind and profile from the framework
  • stamp_profile() moved from _adapt_compile into _compile_platform_spec
  • BaseJobRequest: added profile field for submit-envelope metadata

Plugin compilers

  • Data designer: uses kind to emit container or subprocess executor
  • All other compilers: signature updated, behavior unchanged (they always emit one shape)

SDK + OpenAPI

  • OpenAPI spec regenerated with new schemas
  • SDK regenerated — Executor union is Union[ContainerExecutionProviderParam, SubprocessExecutionProviderParam]
  • All SDK types carry kind

Verified end-to-end

Mode Profile Executor Result
Local subprocess subprocess kind=subprocess, provider=cpu DD sampler job completed
K8s minikube default kind=container, provider=cpu Raw job completed

Design lineage

Test plan

  • All jobs service unit tests pass
  • All plugin unit tests pass
  • All wheel builds + tests pass (py3.11-3.14)
  • Lint + type check clean
  • OpenAPI + SDK regenerated
  • Local subprocess: DD sampler job completed
  • K8s minikube: container job completed

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for job execution profiles, allowing users to specify execution backends when submitting jobs.
    • Introduced executor kind selection (container vs. subprocess) for flexible job execution configuration.
  • Refactor

    • Restructured job executor provider schemas to use kind-based discrimination instead of provider-based selection, consolidating CPU/GPU/distributed configurations into a unified container executor model.
    • Updated job configuration and backend registry mappings to align with the new executor schema.
  • Tests

    • Updated end-to-end and unit tests to reflect new executor configuration schema.

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
@github-actions github-actions Bot added the fix label Jun 15, 2026
@matthewgrossman matthewgrossman changed the title fix(jobs): Separate compute and executor payload shapes fix(jobs): align executor kind/provider with AIRCORE-397 execution model Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor
Suite Lines Covered Line Rate Branch Rate
Unit Tests 19479/25863 75.3% 60.9%
Integration Tests 11370/24608 46.2% 20.2%

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…-plan/mgrossman

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Comment on lines +101 to +102
profile: str | None = None
options: dict | None = None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need options; how is that relevant to this PR?

# execution profiles.

_CACHE_TTL_SECONDS = 300 # 5 minutes
_cache: dict[str, Any] = {"profiles": None, "fetched_at": 0.0}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use a typeddict here?

Comment thread packages/nmp_common/tests/api_factory/test_api_factory.py
matthewgrossman and others added 13 commits June 15, 2026 15:26
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
- Fix local.yaml configs: provider=subprocess → provider=cpu, profile=subprocess
- Remove unused `options` field from BaseJobRequest
- Type the execution profiles cache properly (no more dict[str, Any])
- Rename stale FactoryCPUExecutionProviderSpec alias in conftest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…iles

- Fix import sorting (I001) in test files
- Remove unused SubprocessExecutionProvider import (F401) in test_jobs_api
- Re-export FilesetMetadata from types.files (moved to types.shared by SDK regen)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
SDK type regeneration changed type signatures, making some existing
`# type: ignore` comments unnecessary. Add the correct ty rule name
to the CI ignore list alongside the existing unused-ignore-comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…n agent tests

- Add kind="container" to all 15 executor dicts in e2e/test_jobs.py
- Add type: ignore[invalid-key] for subprocess command assertions in agent tests
  (ty can't narrow the Executor union from dict key access)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…tegration/search tests

- e2e/test_jobs.py: convert all executor blocks from container to subprocess
  format (these run host commands, not containers)
- integration tests: add kind="container" to executor dicts
- test_job_search.py: add kind="container" to all 25+ executor dicts
- test_jobs_api.py: add kind="container" to secrets test executor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…-plan/mgrossman

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
The framework now resolves the executor kind ("container" or "subprocess")
from the submitter's profile before calling compile(). Compilers receive
both `kind` and `profile` as parameters, so they can emit the correct
executor shape without querying execution profiles themselves.

- _compile_platform_spec resolves kind via resolve_profile_kind() once
- Compiler signature: added kind and profile as last two positional args
- NemoJob.compile() receives kind and profile from the framework
- stamp_profile() moved from _adapt_compile into _compile_platform_spec
- Data designer compiler uses kind to support both container/subprocess
- All plugin compilers and test compilers updated for new signature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Merge main's try/finally secret cleanup pattern with our subprocess
executor format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
kind should never be None — container is the natural default. Updated
all compile() signatures and compiler functions to use
kind: str = "container" instead of kind: str | None = None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
@matthewgrossman matthewgrossman marked this pull request as ready for review June 17, 2026 02:16
@matthewgrossman matthewgrossman requested review from a team as code owners June 17, 2026 02:16
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Replaces the CPU/GPU/DistributedGPU execution provider schema trio with a single ContainerExecutionProvider (kind=container, compute intent via provider) and updates SubprocessExecutionProvider to use kind=subprocess/provider=cpu. Switches the PlatformJobStepSpec.executor discriminator from provider to kind, removes server-side CPU→subprocess translation, adds client-side profile kind resolution, and propagates kind/profile through all plugin compile paths, backends, tests, and e2e tests.

Changes

Executor kind discriminator refactor

Layer / File(s) Summary
Core provider models and OpenAPI contracts
services/core/jobs/src/nmp/core/jobs/app/providers.py, services/core/jobs/src/nmp/core/jobs/app/schemas.py, openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml
Removes CPUExecutionProvider, GPUExecutionProvider, DistributedGPUExecutionProvider and introduces ContainerExecutionProvider (kind=container, provider enum). Updates SubprocessExecutionProvider to kind=subprocess. Rekeys discriminated union on kind. Adds kind to BaseExecutionProfile and to Docker/E2E/Kubernetes/Volcano profile schemas.
Subprocess backend profile and registry wiring
services/core/jobs/src/nmp/core/jobs/controllers/backends/subprocess.py, .../registry.py, .../config.py, packages/nmp_testing/src/nmp/testing/jobs.py, packages/nmp_platform/config/local.yaml, packages/nmp_platform_runner/src/nmp/platform_runner/config/local.yaml
Changes SubprocessJobExecutionProfile.provider to "cpu", adds kind="subprocess". Fixes registry keys from (subprocess,subprocess) to (cpu,subprocess) and (gpu,subprocess). Updates default profile generation and test executor detection logic.
Docker, Kubernetes, and Volcano backend type updates
services/core/jobs/src/nmp/core/jobs/controllers/backends/docker.py, .../kubernetes/kubernetes_job.py, .../kubernetes/volcano_job.py, .../test.py
Rebinds ProviderT to ContainerExecutionProvider. Updates schedule/configure_container signatures on CPU/GPU Docker, Kubernetes, and Volcano backends. Updates E2E and mock backend specialization generics.
Plugin factory: profile kind resolution and compile pipeline
packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/profiles.py, .../api_factory.py, .../routes.py, .../job.py, .../profile.py
Adds profiles.py with 5-minute cached resolve_profile_kind(). Adds profile to BaseJobRequest. Extends compiler callable signatures with (kind, profile). Updates _compile_platform_spec to resolve kind before invoking compiler. Moves stamp_profile responsibility from adapter to factory. Adds kind to NemoJob.compile.
Remove server-side CPU→subprocess translation
services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py
Deletes translate_cpu_container_steps_to_subprocess and configured_subprocess_translation_profiles. create_job now passes platform_spec directly without rewriting executors.
Plugin job compiler updates
plugins/nemo-anonymizer/..., plugins/nemo-evaluator/..., plugins/nemo-data-designer/..., plugins/nemo-safe-synthesizer/..., plugins/nemo-agents/..., services/automodel/..., services/unsloth/..., services/hello-world/..., services/core/models/...
All plugin compile methods accept kind parameter. Container-based jobs use ContainerExecutionProviderSpec(kind="container"). Subprocess-based agent jobs use SubprocessExecutionProviderSpec(kind="subprocess", provider="cpu"). nemo-data-designer CreateJob.compile conditionally selects executor by kind. Adds profile field to plugin OpenAPI request schemas.
Unit, integration, and e2e test updates
services/core/jobs/tests/*, packages/nemo_platform_plugin/tests/*, packages/nmp_common/tests/*, plugins/*/tests/*, e2e/test_jobs.py, tools/lint/lint-python-types.sh
Updates all fixtures and stubs to ContainerExecutionProvider/SubprocessExecutionProvider with new field values. Replaces translated-executor assertions with persisted-executor assertions. Adds kind="container" to integration test payloads. Switches e2e tests to subprocess executor form. Expands compiler stubs with kind/profile parameters.

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#58: Touches the same CPU→subprocess translation helpers in services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py that this PR removes entirely.
  • NVIDIA-NeMo/nemo-platform#66: Introduced the nemo-agents subprocess executor fields (provider="subprocess") that this PR renames to kind="subprocess", provider="cpu".
  • NVIDIA-NeMo/nemo-platform#179: Introduced e2e/test_jobs.py with subprocess-backed job tests that this PR migrates to the new executor payload shape.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch AIRCORE-513-align-executorkind-with-original-plan/mgrossman

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py (1)

73-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate executor kind against the selected execution profile.

Current validation only checks provider/profile. After this refactor, that allows shape/profile mismatches (e.g., subprocess payload validated against container profile), which can reach the wrong backend and fail at runtime.

Proposed fix
 def validate_job_spec(
     job_spec: PlatformJobSpec,
     execution_profiles: list[ExecutionProfileT],
 ) -> None:
@@
         if not execution_profile:
             raise HTTPException(
                 status_code=status.HTTP_422_UNPROCESSABLE_CONTENT,
                 detail=f"The execution profile '{step.executor.provider}/{profile_name}' specified in step '{step.name}' does not exist.",
             )
+        if step.executor.kind != execution_profile.kind:
+            raise HTTPException(
+                status_code=status.HTTP_422_UNPROCESSABLE_CONTENT,
+                detail=(
+                    f"Executor kind '{step.executor.kind}' in step '{step.name}' "
+                    f"does not match execution profile kind '{execution_profile.kind}' "
+                    f"for '{step.executor.provider}/{profile_name}'."
+                ),
+            )

Also applies to: 123-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py` around lines
73 - 94, The validate_job_spec function currently only validates that the
execution profile exists by checking provider and profile name, but does not
validate that the executor kind matches the execution profile. Add a validation
check after confirming the execution_profile exists to ensure that
step.executor.kind is compatible with the selected execution_profile. This check
should prevent mismatches (such as a subprocess executor validated against a
container profile) from passing validation and causing runtime failures on the
wrong backend.
packages/nmp_platform/config/local.yaml (1)

51-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale jobs.executors comment block.

This comment still describes removed CPU→subprocess translation and profile: default. It now conflicts with current behavior (cpu/subprocess profile and no endpoint rewrite).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nmp_platform/config/local.yaml` around lines 51 - 63, The multi-line
comment block describing the subprocess executor registration is outdated and
references removed functionality. Update this comment to accurately reflect the
current behavior: remove references to the old "cpu/default" plugin steps
translation and the removed `translate_cpu_container_steps_to_subprocess`
endpoint, and instead describe the current "cpu/subprocess" profile usage and
how it works without the endpoint rewrite. Ensure the updated comment clearly
explains why this configuration exists and how deployers can extend it with
additional profiles for Docker or Kubernetes setups.
services/core/jobs/tests/conftest.py (1)

483-501: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stop ignoring kind and profile in hello_world_job_config.

The function accepts kind/profile but always emits kind="container" and profile="default", so non-default inputs are silently dropped and compile-path propagation is not actually exercised.

Suggested fix
 def hello_world_job_config(
@@
     sdk,
     kind: str = "container",
     profile: str | None = None,
 ) -> FactoryPlatformJobSpec:
+    if kind != "container":
+        raise ValueError(f"hello_world_job_config only supports kind='container', got '{kind}'")
+
     return FactoryPlatformJobSpec(
         steps=[
             FactoryPlatformJobStep(
                 name="hello-world-step-1",
                 executor=FactoryContainerExecutionProviderSpec(
-                    kind="container",
+                    kind=kind,
                     provider="cpu",
-                    profile="default",
+                    profile=profile or "default",
                     container=FactoryContainerSpec(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/jobs/tests/conftest.py` around lines 483 - 501, The
hello_world_job_config function accepts kind and profile parameters but ignores
them by using hardcoded values (kind="container" and profile="default") when
constructing the FactoryContainerExecutionProviderSpec. Replace the hardcoded
kind="container" with the kind parameter passed to the function, and replace the
hardcoded profile="default" with the profile parameter (using a sensible default
like "default" if profile is None). This ensures the function actually uses the
provided parameters instead of silently dropping them.
plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/jobs/run.py (1)

102-112: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

RunJob.compile is missing the new kind kwarg and will crash.

The route adapter now calls NemoJob.compile(..., kind=..., profile=...). This method does not accept kind, so requests can fail with TypeError.

Suggested fix
     async def compile(
         cls,
         *,
         workspace: str,
         spec: BaseModel,  # AnonymizerStepConfig
         entity_client: object,
         job_name: str | None,
         async_sdk: AsyncNeMoPlatform,
+        kind: str = "container",
         profile: str | None = None,
         options: dict | None = None,
     ) -> PlatformJobSpec:
+        if kind != "container":
+            raise PlatformJobCompilationError(f"Unsupported executor kind for RunJob: {kind}")
         return PlatformJobSpec(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/jobs/run.py` around lines
102 - 112, The `RunJob.compile` classmethod is missing the `kind` parameter that
the route adapter is now passing to it. Add `kind` as a new kwarg parameter to
the `compile` method signature (you can place it alongside the existing
`profile` parameter in the method definition), and ensure it accepts a string
value. This will prevent the TypeError that occurs when the route adapter
attempts to pass the `kind` argument to this method.
plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py (1)

172-183: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

kind is accepted but ignored, so non-container profiles compile incorrectly.

This method always calls compile_evaluate_job(..., profile=...), which emits a container executor. If the resolved kind is subprocess, the generated spec is inconsistent with the selected execution profile.

Suggested fix
 from nemo_platform_plugin.jobs.api_factory import PlatformJobSpec
+from nemo_platform_plugin.jobs.exceptions import PlatformJobCompilationError
@@
     ) -> PlatformJobSpec:
         """Compile canonical spec to a plugin-native evaluator job."""
-        del workspace, entity_client, job_name, async_sdk, options
+        del workspace, entity_client, job_name, async_sdk, options
+        if kind != "container":
+            raise PlatformJobCompilationError(f"Unsupported executor kind for EvaluateJob: {kind}")
         from nemo_evaluator.jobs.compiler import compile_evaluate_job
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py` around lines 172
- 183, The `kind` parameter is accepted in the method signature but never used
when calling `compile_evaluate_job`, causing non-container execution profiles to
be compiled incorrectly with a container executor regardless of the specified
kind. Pass the `kind` parameter to the `compile_evaluate_job` function call
alongside the `profile` parameter so that the function can generate the
appropriate executor type (subprocess or container) based on the requested kind
value.
packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py (1)

696-707: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Forward the route’s compute provider into kind resolution.

create_job leaves default_provider as "cpu", so GPU jobs resolve request.profile against CPU profiles. Expose the provider on the factory and have the NemoJob route adapter pass job_cls.execution_provider.

Proposed fix
 def job_route_factory(
     service_name: str,
     job_type: str,
     job_input: JobSchemaLike,
     platform_job_config_compiler: PlatformJobSpecCompiler[JobInputT, JobOutputT]
@@
     job_output: JobSchemaLike | None = None,
     input_to_output: InputToOutputTransformer | InputToOutputTransformerAsync | None = None,
     generate_job_name: JobNameGenerator | None = None,
+    default_provider: str = "cpu",
 ) -> APIRouter:
@@
         generate_job_name: Called when user doesn't provide a job name. Returns the
             auto-generated name to use.
+        default_provider: Compute provider used to resolve request.profile to executor kind.
@@
                 service_name,
                 sdk,
                 profile=request.profile,
+                default_provider=default_provider,
             )

Also applies to: 841-851

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py`
around lines 696 - 707, The job_route_factory function needs to accept the
compute provider as a parameter so that GPU jobs can properly resolve profiles
against GPU profiles instead of CPU profiles. Add a new parameter to the
job_route_factory function signature (such as execution_provider or
compute_provider) to accept the provider information, then update all callers of
job_route_factory (particularly the NemoJob route adapter mentioned in the
comment range 841-851) to pass job_cls.execution_provider when invoking the
factory. Finally, use this provider parameter in the kind resolution logic
instead of leaving default_provider as "cpu".
plugins/nemo-safe-synthesizer/src/nemo_safe_synthesizer_plugin/api/v2/jobs/endpoints.py (1)

55-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

kind/profile are accepted but not applied to emitted executors.

Line 111-Line 112 adds compiler inputs, but Line 67/Line 92 still force config-driven profile and Line 185 never forwards kind/profile into step construction. That makes request-level executor selection non-functional.

Suggested patch
-def _create_job_step(job_config: SafeSynthesizerJobConfig, environment: list[EnvironmentVariable]) -> PlatformJobStep:
+def _create_job_step(
+    job_config: SafeSynthesizerJobConfig,
+    environment: list[EnvironmentVariable],
+    *,
+    kind: str,
+    profile: str | None,
+) -> PlatformJobStep:
+    expected_kind = "subprocess" if config.job_mode == "subprocess-local" else "container"
+    if kind != expected_kind:
+        raise PlatformJobCompilationError(
+            f"Incompatible executor kind {kind!r} for job_mode {config.job_mode!r}"
+        )
+    resolved_profile = profile or config.job_executor_profile
+
     if config.job_mode == "subprocess-local":
         ...
         return PlatformJobStep(
             name="safe-synthesizer",
             executor=SubprocessExecutionProviderSpec(
                 kind="subprocess",
                 provider="cpu",
-                profile=config.job_executor_profile,
+                profile=resolved_profile,
                 command=command,
             ),
             ...
         )
     ...
     return PlatformJobStep(
         name="safe-synthesizer",
         executor=ContainerExecutionProviderSpec(
             kind="container",
             provider="gpu",
-            profile=config.job_executor_profile,
+            profile=resolved_profile,
             container=ContainerSpec(
                 image=get_qualified_image(config.container_image),
                 entrypoint=config.entrypoint,
             ),
             resources=resources,
         ),
         ...
     )

-        steps.append(_create_job_step(job_config=transformed_spec, environment=environment))
+        steps.append(
+            _create_job_step(
+                job_config=transformed_spec,
+                environment=environment,
+                kind=kind,
+                profile=profile,
+            )
+        )

Also applies to: 89-93, 111-113, 185-185

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugins/nemo-safe-synthesizer/src/nemo_safe_synthesizer_plugin/api/v2/jobs/endpoints.py`
around lines 55 - 67, The _create_job_step function is not using the
request-level executor configuration parameters (kind and profile) when
constructing the SubprocessExecutionProviderSpec. Currently it hardcodes kind as
"subprocess" and derives profile from the config object instead of using values
provided in the request. You need to modify _create_job_step to accept kind and
profile as parameters (or extract them from the job_config), and then use these
request-level values to populate the kind and profile fields when instantiating
the SubprocessExecutionProviderSpec in the return statement, ensuring that
caller code also passes these parameters when invoking _create_job_step.
services/core/jobs/src/nmp/core/jobs/app/providers.py (1)

134-138: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Self instead of string-based forward reference.

schemas.py already uses Self for the same pattern. Import it and replace the string annotation.

-from typing import Annotated, Literal, Union
+from typing import Annotated, Literal, Self, Union
     `@model_validator`(mode="after")
-    def validate_command(self) -> "SubprocessExecutionProvider":
+    def validate_command(self) -> Self:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/jobs/src/nmp/core/jobs/app/providers.py` around lines 134 -
138, The validate_command method in the SubprocessExecutionProvider class uses a
string-based forward reference ("SubprocessExecutionProvider") in its return
type annotation instead of the Self type that is already used elsewhere in
schemas.py. Import Self from typing_extensions or typing at the top of the file,
then replace the string annotation "SubprocessExecutionProvider" with Self in
the return type of the validate_command method to align with the codebase
pattern.

Source: Coding guidelines

🧹 Nitpick comments (3)
plugins/nemo-agents/tests/unit/test_improvement_jobs.py (1)

89-90: ⚡ Quick win

Also assert executor kind in these subprocess compile tests.

These cases now validate provider, but they should also validate the discriminator (kind="subprocess") to protect the new kind/provider split.

Proposed assertion additions
-    assert step["executor"]["provider"] == "cpu"
+    assert step["executor"]["kind"] == "subprocess"
+    assert step["executor"]["provider"] == "cpu"
     executor = step["executor"]
+    assert executor.get("kind") == "subprocess"
     assert executor.get("provider") == "cpu"

Also applies to: 202-203, 282-283, 331-332

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/tests/unit/test_improvement_jobs.py` around lines 89 -
90, The subprocess compile tests in test_improvement_jobs.py are missing
assertions to validate the executor kind field. Add assertions that verify
step["executor"]["kind"] == "subprocess" in all subprocess-related test cases
(at lines 89-90, 202-203, 282-283, and 331-332) to ensure the discriminator is
properly set after the kind/provider split, protecting against regressions in
the executor configuration structure.
packages/nemo_platform_plugin/tests/test_jobs_routes.py (1)

279-285: ⚡ Quick win

Assert adapter forwards kind and profile.

This test only checks return type, so a regression in _adapt_compile argument forwarding can pass.

Proposed test hardening
 `@pytest.mark.asyncio`
 async def test_compile_adapter_invokes_nemo_compile() -> None:
-    adapter = _adapt_compile(_WidgetJob, default_profile="research")
+    seen: dict[str, object] = {}
+
+    class CaptureJob(_WidgetJob):
+        `@classmethod`
+        async def compile(cls, **kwargs):
+            seen.update(kwargs)
+            return _FakePlatformSpec(steps=[_FakeStep(profile=None)])
+
+    adapter = _adapt_compile(CaptureJob, default_profile="research")
     spec = _WidgetSpec(name="w")
     # Adapter receives kind and profile from _compile_platform_spec.
     # Profile stamping is now done by _compile_platform_spec, not the adapter.
     platform_spec = await adapter("ws", spec, spec, "entity_client", "job-1", "sdk", "container", "research")
 
     assert isinstance(platform_spec, _FakePlatformSpec)
+    assert seen["kind"] == "container"
+    assert seen["profile"] == "research"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/tests/test_jobs_routes.py` around lines 279 -
285, The test_compile_adapter_invokes_nemo_compile test only verifies the return
type of the adapter but does not validate that the _adapt_compile adapter
correctly forwards the kind and profile arguments to the underlying compile
function. Add assertions that verify the adapter properly forwards both the kind
argument ("ws") and the profile argument ("research") to the compile function
being wrapped, either by mocking the underlying compile function and checking
the calls it receives, or by other verification means that ensure a regression
in argument forwarding would be caught by this test.
packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/routes.py (1)

50-75: ⚡ Quick win

Replace TYPE_CHECKING/string annotations with concrete runtime types.

This adapter path was touched and still uses deferred string hints plus TYPE_CHECKING imports for runtime-available types; also kind is typed as broad str instead of a concrete executor-kind type.

Suggested fix
-from typing import TYPE_CHECKING, Any
+from collections.abc import Callable
+from typing import Any
@@
-from nemo_platform_plugin.job import job_collection_path_for
+from nemo_platform_plugin.job import NemoJob, job_collection_path_for
@@
-if TYPE_CHECKING:
-    from collections.abc import Callable
-
-    from nemo_platform_plugin.job import NemoJob
@@
-def add_job_routes(
-    job_cls: type["NemoJob"],
+def add_job_routes(
+    job_cls: type[NemoJob],
@@
-    generate_job_name: "Callable[..., str] | None" = None,
+    generate_job_name: Callable[..., str] | None = None,
@@
-def _adapt_compile(
-    job_cls: type["NemoJob"],
+def _adapt_compile(
+    job_cls: type[NemoJob],
@@
-) -> "Callable[..., Any]":
+) -> Callable[..., Any]:
@@
-        kind: str = "container",
+        kind: ExecutorKind = "container",

As per coding guidelines, **/*.py requires concrete type hints (not string-based) and avoiding TYPE_CHECKING type imports when possible, and packages/nemo_platform_plugin/**/*.py repeats the same rule for runtime-available same-package types.

Also applies to: 253-257, 279-281

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/routes.py` around
lines 50 - 75, The code uses TYPE_CHECKING with string annotations for
runtime-available types, which violates the coding guidelines requiring concrete
type hints. Remove the TYPE_CHECKING import block entirely and import Callable
directly from collections.abc at the top of the file alongside other imports.
Import NemoJob directly from nemo_platform_plugin.job instead of using it as a
string annotation. Replace all string-quoted type hints in the add_job_routes
function signature (such as "NemoJob", "Callable[..., str] | None", and "str")
with concrete runtime type references by removing the quotes. Additionally,
replace any broad str type hints with a concrete executor-kind type instead.
Apply these same changes to the other affected locations mentioned at lines
253-257 and 279-281.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/test_jobs.py`:
- Around line 3-4: The tests in test_jobs.py that use
SubprocessExecutionProviderSpec depend on the cpu/subprocess execution profile
which does not exist in the CI environment, causing create-job operations to
fail with a 422 error. Fix this by adding a preflight check before the
subprocess-dependent tests (including those at lines 57-60) that verifies the
cpu/subprocess profile is available in the execution environment, and skip those
tests with an appropriate skip marker if the profile is absent. Alternatively,
provision the cpu/subprocess profile in the e2e runtime configuration so that it
is available when tests execute.

In `@openapi/ga/individual/platform.openapi.yaml`:
- Around line 9557-9565: The kind enum field for container backend profiles
currently allows both container and subprocess options, but since Docker,
Kubernetes, and Volcano backends are container-only in this refactor, the enum
should only permit the container value. Remove subprocess from the enum list in
the kind field definition to prevent subprocess executors from being routed to
container backends. Apply this same change at all locations mentioned in the
comment (lines 12106-12114 and 18062-18070).
- Around line 16974-16989: The schema currently constrains the provider field to
only "cpu" using a const constraint, but the registry supports both cpu and gpu
providers with subprocess kind. Remove the const constraint on the provider
field and replace it with an enum that includes both "cpu" and "gpu" values to
allow the schema to properly represent all supported provider/subprocess
combinations. Additionally, consider whether the profile default value of
"default" is appropriate for all provider configurations or if it should be made
more flexible.

In `@openapi/ga/openapi.yaml`:
- Around line 9557-9565: The `kind` field in `DockerJobExecutionProfile`,
`KubernetesJobExecutionProfile`, and `VolcanoJobExecutionProfile` currently
allows both `container` and `subprocess` as enum options, but these backends
only support `ContainerExecutionProvider` which requires `container`. Replace
the `enum: [container, subprocess]` and `default: container` with `const:
container` in all three profile definitions (at lines 9557-9565, 12106-12114,
and 18062-18070) to constrain these backends to only accept the container
executor shape they can handle.
- Around line 16974-16978: The SubprocessExecutionProvider provider field
currently uses const: cpu which restricts the value to only CPU, but
SubprocessExecutionProvider should allow both cpu and gpu values to support GPU
subprocess execution profiles. Replace the const: cpu constraint in the
SubprocessExecutionProvider schema with an enum that includes both cpu and gpu
values to allow both provider types.

In `@openapi/openapi.yaml`:
- Around line 9557-9565: The kind field in DockerJobExecutionProfile,
KubernetesJobExecutionProfile, and VolcanoJobExecutionProfile currently allows
both "container" and "subprocess" options, but these container-based backends
should only support the "container" executor type. Remove "subprocess" from the
enum list in the kind field definition for all three profiles (found at the
locations specified in the comment), leaving only "container" as a valid option
while keeping it as the default value.
- Around line 16946-16952: The SubprocessJobExecutionProfile schema's provider
field enum is missing the gpu option, which means the /execution-profiles
endpoint cannot represent GPU subprocess routes even though
SubprocessExecutionProvider supports it. Add gpu to the enum list for the
provider field in the SubprocessJobExecutionProfile schema (around the provider
type definition that currently only lists cpu), and apply the same fix to the
other provider enum definition mentioned at lines 16974-16978 to ensure both
schemas are consistent and support gpu as a valid provider option.
- Around line 8689-8717: The `kind` field is used as a discriminator for the
executor variants but is not included in the required fields for the schema
definitions. This allows ambiguous payloads that could match multiple oneOf
branches without an explicit kind value. Add `kind` to the `required` array in
the ContainerExecutionProviderInput, ContainerExecutionProviderOutput, and
SubprocessExecutionProvider schema definitions in the OpenAPI specification to
ensure the discriminator field is always present and prevents ambiguous matches.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py`:
- Around line 655-662: The exception handling block that catches
PlatformJobCompilationError when resolve_profile_kind fails is silently falling
back to "container" mode, which causes incorrect payload shapes to be sent
downstream when a user explicitly selects a profile that cannot be resolved.
Instead of catching and swallowing this error, allow the
PlatformJobCompilationError to propagate through the exception handler so it
properly returns a 422 error to the client. Alternatively, restrict the fallback
to "container" to only apply when using legacy default-only behavior (when no
explicit profile was selected), and let errors from explicitly selected profiles
raise the compilation error.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/profiles.py`:
- Around line 66-69: The call to sdk.jobs.list_execution_profiles() can raise
exceptions that are not caught by the fallback error handler in
api_factory._compile_platform_spec, causing submission to fail instead of
falling back safely. Wrap the sdk.jobs.list_execution_profiles() call in a
try-except block and catch any exceptions that occur, then re-raise them as
PlatformJobCompilationError so they are properly handled by the fallback
mechanism in the caller.

In `@packages/nemo_platform_plugin/tests/test_jobs_filter.py`:
- Around line 45-46: The _fake_compiler function signature is too long and
violates code formatting rules. Reformat the function definition by breaking the
parameters across multiple lines, grouping related parameters together or
placing each parameter on its own line to comply with the style guidelines and
allow the lint CI to pass.

In `@packages/nmp_common/tests/api_factory/test_api_factory.py`:
- Line 1700: The function signatures for the compiler function and related stub
functions (at lines 1712, 1723, and 1738-1740) are exceeding the maximum line
length allowed by the style formatter, causing lint failures. Reformat these
function definitions by breaking the long parameter lists across multiple lines,
with each parameter or logical group of parameters on its own line, ensuring
proper indentation alignment. This applies to the compiler function definition
and all other similar stub function signatures mentioned in the comment.

In `@packages/nmp_testing/src/nmp/testing/jobs.py`:
- Around line 33-37: The predicate in the any() check for detecting an existing
subprocess profile is incomplete and may match GPU subprocess executors when
looking for CPU ones. Add an additional condition to verify that
getattr(executor, "provider", None) == "cpu" alongside the existing backend and
profile checks within the any() function. This ensures that
SubprocessJobExecutionProfile insertion is only skipped when a CPU subprocess
executor with the matching profile already exists, not when a GPU subprocess
executor exists.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/analyze_batch.py`:
- Around line 67-69: The compile path accepts kind and profile parameters at
lines 67-69 but does not use them when creating the executor at lines 110-114,
causing request-level profile selection to be lost and invalid kind values to be
silently ignored. Update the executor instantiation to pass the profile
parameter to maintain the request-level profile selection, and add validation
logic to reject invalid kind values before the executor is created. Ensure both
parameters are properly propagated from the function signature to the executor
creation code.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_agent.py`:
- Around line 161-162: The function parameter kind is accepted at line 161 but
ignored when creating the executor in lines 185-188, where subprocess executor
is always used regardless of the kind value. Add validation early in the
function to check if the kind parameter matches what is actually supported
(subprocess), and raise an appropriate error if a different kind like
"container" is provided, preventing a mismatch between the requested execution
kind and the compiled output that could cause failures in backend routing and
validation.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_suite.py`:
- Around line 140-141: The function accepts a kind parameter but always compiles
the step as subprocess regardless of the kind value provided, causing runtime
failures for invalid kind values. Add an explicit validation guard during the
compilation phase (around lines 205-207) that ensures the kind parameter is
either not provided or explicitly matches "subprocess". If an invalid kind value
is passed that doesn't align with the subprocess compilation behavior, raise an
error immediately during compilation rather than at execution time.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_agent.py`:
- Around line 141-142: The `kind` parameter in the function signature can be set
to "container" but the code at lines 168-170 always executes as subprocess,
creating a mismatch between the parameter value and actual behavior. Add a guard
clause at the beginning of the function that validates `kind == "subprocess"`,
raising an appropriate error or exception if this condition is not met. This
compile-time check prevents invalid executor and profile combinations where the
caller expects container execution but subprocess is always used instead.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_skills.py`:
- Around line 90-91: The function accepts a kind parameter with a default value
of "container" on line 90, but the implementation at lines 151-153 hardcodes
"subprocess" usage, creating a mismatch. Add early validation in the compile
function to check that kind equals "subprocess" and raise a compile error (4xx)
if it does not, ensuring profile configuration mistakes are caught immediately
rather than causing runtime issues downstream.

In `@services/hello-world/src/nmp/hello_world/api/v2/jobs/endpoints.py`:
- Around line 27-29: The function accepts kind and profile parameters but
ignores them, instead hardcoding kind="container" and profile="default" when
creating the PlatformJobSpec (around lines 46-50). Replace the hardcoded values
with the actual kind and profile parameters that were passed into the function
to ensure proper profile routing is respected.

---

Outside diff comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py`:
- Around line 696-707: The job_route_factory function needs to accept the
compute provider as a parameter so that GPU jobs can properly resolve profiles
against GPU profiles instead of CPU profiles. Add a new parameter to the
job_route_factory function signature (such as execution_provider or
compute_provider) to accept the provider information, then update all callers of
job_route_factory (particularly the NemoJob route adapter mentioned in the
comment range 841-851) to pass job_cls.execution_provider when invoking the
factory. Finally, use this provider parameter in the kind resolution logic
instead of leaving default_provider as "cpu".

In `@packages/nmp_platform/config/local.yaml`:
- Around line 51-63: The multi-line comment block describing the subprocess
executor registration is outdated and references removed functionality. Update
this comment to accurately reflect the current behavior: remove references to
the old "cpu/default" plugin steps translation and the removed
`translate_cpu_container_steps_to_subprocess` endpoint, and instead describe the
current "cpu/subprocess" profile usage and how it works without the endpoint
rewrite. Ensure the updated comment clearly explains why this configuration
exists and how deployers can extend it with additional profiles for Docker or
Kubernetes setups.

In `@plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/jobs/run.py`:
- Around line 102-112: The `RunJob.compile` classmethod is missing the `kind`
parameter that the route adapter is now passing to it. Add `kind` as a new kwarg
parameter to the `compile` method signature (you can place it alongside the
existing `profile` parameter in the method definition), and ensure it accepts a
string value. This will prevent the TypeError that occurs when the route adapter
attempts to pass the `kind` argument to this method.

In `@plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py`:
- Around line 172-183: The `kind` parameter is accepted in the method signature
but never used when calling `compile_evaluate_job`, causing non-container
execution profiles to be compiled incorrectly with a container executor
regardless of the specified kind. Pass the `kind` parameter to the
`compile_evaluate_job` function call alongside the `profile` parameter so that
the function can generate the appropriate executor type (subprocess or
container) based on the requested kind value.

In
`@plugins/nemo-safe-synthesizer/src/nemo_safe_synthesizer_plugin/api/v2/jobs/endpoints.py`:
- Around line 55-67: The _create_job_step function is not using the
request-level executor configuration parameters (kind and profile) when
constructing the SubprocessExecutionProviderSpec. Currently it hardcodes kind as
"subprocess" and derives profile from the config object instead of using values
provided in the request. You need to modify _create_job_step to accept kind and
profile as parameters (or extract them from the job_config), and then use these
request-level values to populate the kind and profile fields when instantiating
the SubprocessExecutionProviderSpec in the return statement, ensuring that
caller code also passes these parameters when invoking _create_job_step.

In `@services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py`:
- Around line 73-94: The validate_job_spec function currently only validates
that the execution profile exists by checking provider and profile name, but
does not validate that the executor kind matches the execution profile. Add a
validation check after confirming the execution_profile exists to ensure that
step.executor.kind is compatible with the selected execution_profile. This check
should prevent mismatches (such as a subprocess executor validated against a
container profile) from passing validation and causing runtime failures on the
wrong backend.

In `@services/core/jobs/src/nmp/core/jobs/app/providers.py`:
- Around line 134-138: The validate_command method in the
SubprocessExecutionProvider class uses a string-based forward reference
("SubprocessExecutionProvider") in its return type annotation instead of the
Self type that is already used elsewhere in schemas.py. Import Self from
typing_extensions or typing at the top of the file, then replace the string
annotation "SubprocessExecutionProvider" with Self in the return type of the
validate_command method to align with the codebase pattern.

In `@services/core/jobs/tests/conftest.py`:
- Around line 483-501: The hello_world_job_config function accepts kind and
profile parameters but ignores them by using hardcoded values (kind="container"
and profile="default") when constructing the
FactoryContainerExecutionProviderSpec. Replace the hardcoded kind="container"
with the kind parameter passed to the function, and replace the hardcoded
profile="default" with the profile parameter (using a sensible default like
"default" if profile is None). This ensures the function actually uses the
provided parameters instead of silently dropping them.

---

Nitpick comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/routes.py`:
- Around line 50-75: The code uses TYPE_CHECKING with string annotations for
runtime-available types, which violates the coding guidelines requiring concrete
type hints. Remove the TYPE_CHECKING import block entirely and import Callable
directly from collections.abc at the top of the file alongside other imports.
Import NemoJob directly from nemo_platform_plugin.job instead of using it as a
string annotation. Replace all string-quoted type hints in the add_job_routes
function signature (such as "NemoJob", "Callable[..., str] | None", and "str")
with concrete runtime type references by removing the quotes. Additionally,
replace any broad str type hints with a concrete executor-kind type instead.
Apply these same changes to the other affected locations mentioned at lines
253-257 and 279-281.

In `@packages/nemo_platform_plugin/tests/test_jobs_routes.py`:
- Around line 279-285: The test_compile_adapter_invokes_nemo_compile test only
verifies the return type of the adapter but does not validate that the
_adapt_compile adapter correctly forwards the kind and profile arguments to the
underlying compile function. Add assertions that verify the adapter properly
forwards both the kind argument ("ws") and the profile argument ("research") to
the compile function being wrapped, either by mocking the underlying compile
function and checking the calls it receives, or by other verification means that
ensure a regression in argument forwarding would be caught by this test.

In `@plugins/nemo-agents/tests/unit/test_improvement_jobs.py`:
- Around line 89-90: The subprocess compile tests in test_improvement_jobs.py
are missing assertions to validate the executor kind field. Add assertions that
verify step["executor"]["kind"] == "subprocess" in all subprocess-related test
cases (at lines 89-90, 202-203, 282-283, and 331-332) to ensure the
discriminator is properly set after the kind/provider split, protecting against
regressions in the executor configuration structure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 151b02c2-83c7-4b5c-bf8f-c3e7989ea468

📥 Commits

Reviewing files that changed from the base of the PR and between c17fd7b and 85c29de.

⛔ Files ignored due to path filters (31)
  • sdk/python/nemo-platform/.nmpcontext/openapi.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/.nmpcontext/stainless.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/files/api.md is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/files/filesets.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/jobs/api.md is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/files/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/files/fileset.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/files/fileset_create_params.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/files/fileset_metadata_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/container_execution_provider.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/container_execution_provider_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/cpu_execution_provider.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/cpu_execution_provider_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/distributed_gpu_execution_provider.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/docker_job_execution_profile.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/e2e_job_execution_profile.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/gpu_execution_provider_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/kubernetes_job_execution_profile.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/platform_job_step_spec.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/platform_job_step_spec_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/subprocess_execution_provider.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/subprocess_execution_provider_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/subprocess_job_execution_profile.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/jobs/volcano_job_execution_profile.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/shared/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/shared/fileset_metadata.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/shared_params/fileset_metadata_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/tests/api_resources/test_jobs.py is excluded by !sdk/**
  • sdk/stainless.yaml is excluded by !sdk/**
📒 Files selected for processing (63)
  • e2e/test_jobs.py
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/job.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/profile.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/profiles.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/routes.py
  • packages/nemo_platform_plugin/tests/test_jobs_filter.py
  • packages/nemo_platform_plugin/tests/test_jobs_routes.py
  • packages/nmp_common/tests/api_factory/test_api_factory.py
  • packages/nmp_common/tests/jobs/test_docker.py
  • packages/nmp_platform/config/local.yaml
  • packages/nmp_platform_runner/src/nmp/platform_runner/config/local.yaml
  • packages/nmp_testing/src/nmp/testing/jobs.py
  • plugins/nemo-agents/openapi/openapi.yaml
  • plugins/nemo-agents/src/nemo_agents_plugin/jobs/analyze_batch.py
  • plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_agent.py
  • plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_suite.py
  • plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_agent.py
  • plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_skills.py
  • plugins/nemo-agents/tests/unit/test_evaluate_agent_job.py
  • plugins/nemo-agents/tests/unit/test_improvement_jobs.py
  • plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/jobs/run.py
  • plugins/nemo-data-designer/openapi/openapi.yaml
  • plugins/nemo-data-designer/src/nemo_data_designer_plugin/jobs/create.py
  • plugins/nemo-evaluator/openapi/openapi.yaml
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/compiler.py
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
  • plugins/nemo-safe-synthesizer/openapi/openapi.yaml
  • plugins/nemo-safe-synthesizer/src/nemo_safe_synthesizer_plugin/api/v2/jobs/endpoints.py
  • plugins/nemo-safe-synthesizer/tests/unit/test_jobs.py
  • services/automodel/src/nmp/automodel/app/jobs/compiler.py
  • services/automodel/src/nmp/automodel/app/jobs/training/compiler.py
  • services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py
  • services/core/jobs/src/nmp/core/jobs/app/providers.py
  • services/core/jobs/src/nmp/core/jobs/app/schemas.py
  • services/core/jobs/src/nmp/core/jobs/app/test_helpers.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/config.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/docker.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/kubernetes/kubernetes_job.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/kubernetes/volcano_job.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/registry.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/subprocess.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/test.py
  • services/core/jobs/tests/conftest.py
  • services/core/jobs/tests/controllers/test_base.py
  • services/core/jobs/tests/controllers/test_docker_backend.py
  • services/core/jobs/tests/controllers/test_kubernetes_backend.py
  • services/core/jobs/tests/controllers/test_subprocess_backend.py
  • services/core/jobs/tests/controllers/test_volcano_backend.py
  • services/core/jobs/tests/integration/test_jobs_auth_propagation.py
  • services/core/jobs/tests/integration/test_jobs_secrets_access.py
  • services/core/jobs/tests/test_config.py
  • services/core/jobs/tests/test_job_search.py
  • services/core/jobs/tests/test_jobs_api.py
  • services/core/jobs/tests/test_jobs_endpoint_translation.py
  • services/core/models/src/nmp/core/models/api/v2/models.py
  • services/hello-world/src/nmp/hello_world/api/v2/jobs/endpoints.py
  • services/unsloth/src/nmp/unsloth/app/jobs/compiler.py
  • services/unsloth/src/nmp/unsloth/app/jobs/training/compiler.py
  • tools/lint/lint-python-types.sh

Comment thread e2e/test_jobs.py
Comment on lines +3 to +4
These tests submit jobs with SubprocessExecutionProviderSpec (host command).
The e2e test environment runs against the subprocess backend.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Subprocess-profile precondition is unmet in CI, so these tests fail at create-job time.

This suite now hard-depends on cpu/subprocess, but CI currently returns 422 (The execution profile 'cpu/subprocess' ... does not exist). Provision that profile in the e2e runtime config, or preflight-skip subprocess-only tests when it is absent.

Also applies to: 57-60

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/test_jobs.py` around lines 3 - 4, The tests in test_jobs.py that use
SubprocessExecutionProviderSpec depend on the cpu/subprocess execution profile
which does not exist in the CI environment, causing create-job operations to
fail with a 422 error. Fix this by adding a preflight check before the
subprocess-dependent tests (including those at lines 57-60) that verifies the
cpu/subprocess profile is available in the execution environment, and skip those
tests with an appropriate skip marker if the profile is absent. Alternatively,
provision the cpu/subprocess profile in the e2e runtime configuration so that it
is available when tests execute.

Source: Pipeline failures

Comment on lines +9557 to +9565
kind:
type: string
enum:
- container
- subprocess
title: Kind
description: 'The executor payload shape this profile expects: ''container''
or ''subprocess''.'
default: container

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain container backend profiles to kind: container.

Docker, Kubernetes, and Volcano backends are container-only in this refactor; allowing subprocess here can route a subprocess executor to a container backend.

Proposed OpenAPI shape
 kind:
   type: string
-  enum:
-  - container
-  - subprocess
+  const: container
   title: Kind
-  description: 'The executor payload shape this profile expects: ''container''
-    or ''subprocess''.'
+  description: The executor payload shape this profile expects.
   default: container

Also applies to: 12106-12114, 18062-18070

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openapi/ga/individual/platform.openapi.yaml` around lines 9557 - 9565, The
kind enum field for container backend profiles currently allows both container
and subprocess options, but since Docker, Kubernetes, and Volcano backends are
container-only in this refactor, the enum should only permit the container
value. Remove subprocess from the enum list in the kind field definition to
prevent subprocess executors from being routed to container backends. Apply this
same change at all locations mentioned in the comment (lines 12106-12114 and
18062-18070).

Comment on lines 16974 to +16989
provider:
type: string
const: subprocess
const: cpu
title: Provider
default: subprocess
default: cpu
profile:
type: string
title: Profile
description: The profile name for the executor, e.g., high_priority_a100,
low_priority, etc.
default: default
kind:
type: string
const: subprocess
title: Kind
default: subprocess

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep subprocess profile keys representable.

The registry supports (cpu, subprocess) and (gpu, subprocess), but this schema fixes provider to cpu and defaults profile to default.

Proposed fix
 provider:
   type: string
-  const: cpu
+  enum:
+  - cpu
+  - gpu
   title: Provider
   default: cpu
 profile:
   type: string
   title: Profile
   description: The profile name for the executor, e.g., high_priority_a100,
     low_priority, etc.
-  default: default
+  default: subprocess
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openapi/ga/individual/platform.openapi.yaml` around lines 16974 - 16989, The
schema currently constrains the provider field to only "cpu" using a const
constraint, but the registry supports both cpu and gpu providers with subprocess
kind. Remove the const constraint on the provider field and replace it with an
enum that includes both "cpu" and "gpu" values to allow the schema to properly
represent all supported provider/subprocess combinations. Additionally, consider
whether the profile default value of "default" is appropriate for all provider
configurations or if it should be made more flexible.

Comment thread openapi/ga/openapi.yaml
Comment on lines +9557 to +9565
kind:
type: string
enum:
- container
- subprocess
title: Kind
description: 'The executor payload shape this profile expects: ''container''
or ''subprocess''.'
default: container

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain container backend profiles to kind: container.

Docker, Kubernetes, and Volcano backends accept ContainerExecutionProvider; allowing subprocess here lets /execution-profiles drive compilers to emit an executor shape these backends cannot handle.

Proposed fix
         kind:
           type: string
-          enum:
-          - container
-          - subprocess
+          const: container
           title: Kind
           description: 'The executor payload shape this profile expects: ''container''
             or ''subprocess''.'
           default: container

Apply the same const: container change to DockerJobExecutionProfile, KubernetesJobExecutionProfile, and VolcanoJobExecutionProfile.

Also applies to: 12106-12114, 18062-18070

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openapi/ga/openapi.yaml` around lines 9557 - 9565, The `kind` field in
`DockerJobExecutionProfile`, `KubernetesJobExecutionProfile`, and
`VolcanoJobExecutionProfile` currently allows both `container` and `subprocess`
as enum options, but these backends only support `ContainerExecutionProvider`
which requires `container`. Replace the `enum: [container, subprocess]` and
`default: container` with `const: container` in all three profile definitions
(at lines 9557-9565, 12106-12114, and 18062-18070) to constrain these backends
to only accept the container executor shape they can handle.

Comment thread openapi/ga/openapi.yaml
Comment on lines 16974 to +16978
provider:
type: string
const: subprocess
const: cpu
title: Provider
default: subprocess
default: cpu

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allow GPU subprocess execution profiles.

SubprocessExecutionProvider allows provider: gpu, and routing includes (gpu, subprocess); const: cpu makes GPU subprocess profiles invalid in the /execution-profiles response schema.

Proposed fix
         provider:
           type: string
-          const: cpu
+          enum:
+          - cpu
+          - gpu
           title: Provider
           default: cpu
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
provider:
type: string
const: subprocess
const: cpu
title: Provider
default: subprocess
default: cpu
provider:
type: string
enum:
- cpu
- gpu
title: Provider
default: cpu
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openapi/ga/openapi.yaml` around lines 16974 - 16978, The
SubprocessExecutionProvider provider field currently uses const: cpu which
restricts the value to only CPU, but SubprocessExecutionProvider should allow
both cpu and gpu values to support GPU subprocess execution profiles. Replace
the const: cpu constraint in the SubprocessExecutionProvider schema with an enum
that includes both cpu and gpu values to allow both provider types.

Comment on lines +161 to 162
kind: str = "container",
profile: str | None = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the selected execution kind is not subprocess.

Line 161 accepts kind, but Lines 185-188 ignore it and always compile a subprocess executor. That can desync compile output from the resolved profile kind and fail later in backend routing/validation.

Proposed fix
@@
-    async def compile(  # type: ignore[override]
+    async def compile(  # type: ignore[override]
@@
-        from nemo_platform_plugin.jobs.api_factory import (
+        from nemo_platform_plugin.jobs.api_factory import (
             EnvironmentVariable,
             PlatformJobStep,
             SubprocessExecutionProviderSpec,
         )
+        from nemo_platform_plugin.jobs.exceptions import PlatformJobCompilationError
@@
+        if kind != "subprocess":
+            raise PlatformJobCompilationError(
+                f"agents.evaluate requires subprocess execution, got kind={kind!r}, profile={profile!r}"
+            )
+
         return PlatformJobSpec(

Also applies to: 185-188

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_agent.py` around
lines 161 - 162, The function parameter kind is accepted at line 161 but ignored
when creating the executor in lines 185-188, where subprocess executor is always
used regardless of the kind value. Add validation early in the function to check
if the kind parameter matches what is actually supported (subprocess), and raise
an appropriate error if a different kind like "container" is provided,
preventing a mismatch between the requested execution kind and the compiled
output that could cause failures in backend routing and validation.

Comment on lines +140 to 141
kind: str = "container",
profile: str | None = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject non-subprocess kind in this compile path.

Line 140 takes kind, but the step is always compiled as subprocess at Lines 205-207. Add an explicit guard so invalid profile/kind combinations fail during compilation, not at execution time.

Proposed fix
@@
     async def compile(  # type: ignore[override]
@@
     ) -> PlatformJobSpec:
@@
+        if kind != "subprocess":
+            raise PlatformJobCompilationError(
+                f"agents.evaluate-suite requires subprocess execution, got kind={kind!r}, profile={profile!r}"
+            )
+
         from nemo_platform_plugin.jobs.api_factory import (

Also applies to: 205-207

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/evaluate_suite.py` around
lines 140 - 141, The function accepts a kind parameter but always compiles the
step as subprocess regardless of the kind value provided, causing runtime
failures for invalid kind values. Add an explicit validation guard during the
compilation phase (around lines 205-207) that ensures the kind parameter is
either not provided or explicitly matches "subprocess". If an invalid kind value
is passed that doesn't align with the subprocess compilation behavior, raise an
error immediately during compilation rather than at execution time.

Comment on lines +141 to 142
kind: str = "container",
profile: str | None = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard kind to prevent invalid executor/profile combinations.

Line 141 receives kind, but Lines 168-170 always emit subprocess. Add a compile-time check for kind == "subprocess" to avoid mismatched routing behavior.

Proposed fix
@@
-        from nemo_platform_plugin.jobs.api_factory import (
+        from nemo_platform_plugin.jobs.api_factory import (
             EnvironmentVariable,
             PlatformJobStep,
             SubprocessExecutionProviderSpec,
         )
+        from nemo_platform_plugin.jobs.exceptions import PlatformJobCompilationError
@@
+        if kind != "subprocess":
+            raise PlatformJobCompilationError(
+                f"agents.optimize requires subprocess execution, got kind={kind!r}, profile={profile!r}"
+            )
+
         _require_absolute(spec.optimize_config, "optimize_config")

Also applies to: 168-170

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_agent.py` around
lines 141 - 142, The `kind` parameter in the function signature can be set to
"container" but the code at lines 168-170 always executes as subprocess,
creating a mismatch between the parameter value and actual behavior. Add a guard
clause at the beginning of the function that validates `kind == "subprocess"`,
raising an appropriate error or exception if this condition is not met. This
compile-time check prevents invalid executor and profile combinations where the
caller expects container execution but subprocess is always used instead.

Comment on lines +90 to 91
kind: str = "container",
profile: str | None = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce subprocess kind in compile.

Line 90 takes kind, but Lines 151-153 hardcode subprocess. Fail fast when kind != "subprocess" so profile mistakes are surfaced as 4xx compile errors.

Proposed fix
@@
     async def compile(  # type: ignore[override]
@@
     ) -> PlatformJobSpec:
@@
+        if kind != "subprocess":
+            raise PlatformJobCompilationError(
+                f"agents.optimize-skills requires subprocess execution, got kind={kind!r}, profile={profile!r}"
+            )
+
         from nemo_platform_plugin.jobs.api_factory import (

Also applies to: 151-153

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/jobs/optimize_skills.py` around
lines 90 - 91, The function accepts a kind parameter with a default value of
"container" on line 90, but the implementation at lines 151-153 hardcodes
"subprocess" usage, creating a mismatch. Add early validation in the compile
function to check that kind equals "subprocess" and raise a compile error (4xx)
if it does not, ensuring profile configuration mistakes are caught immediately
rather than causing runtime issues downstream.

Comment on lines +27 to 29
kind: str = "container",
profile: str | None = None,
) -> PlatformJobSpec:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop hardcoding executor profile; use compile inputs.

Lines 27-28 accept kind/profile, but Lines 46-50 ignore them and force kind="container", profile="default". That can break non-default profile routing.

Proposed fix
@@
 from nemo_platform_plugin.jobs.api_factory import (
@@
 )
+from nemo_platform_plugin.jobs.exceptions import PlatformJobCompilationError
@@
 def compile_hello_world_job(
@@
 ) -> PlatformJobSpec:
@@
+    if kind != "container":
+        raise PlatformJobCompilationError(
+            f"hello-world requires container execution, got kind={kind!r}, profile={profile!r}"
+        )
+
     return PlatformJobSpec(
@@
                 executor=ContainerExecutionProviderSpec(
                     kind="container",
                     provider="cpu",
-                    profile="default",
+                    profile=profile or "default",
                     container=ContainerSpec(

Also applies to: 46-50

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/hello-world/src/nmp/hello_world/api/v2/jobs/endpoints.py` around
lines 27 - 29, The function accepts kind and profile parameters but ignores
them, instead hardcoding kind="container" and profile="default" when creating
the PlatformJobSpec (around lines 46-50). Replace the hardcoded values with the
actual kind and profile parameters that were passed into the function to ensure
proper profile routing is respected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant