Skip to content

ci: add docker build in the pipeline to use with a basic k8s install test#345

Merged
crookedstorm merged 13 commits into
mainfrom
check-containers-k8s/crookedstorm
Jun 16, 2026
Merged

ci: add docker build in the pipeline to use with a basic k8s install test#345
crookedstorm merged 13 commits into
mainfrom
check-containers-k8s/crookedstorm

Conversation

@crookedstorm

@crookedstorm crookedstorm commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

This is a work in progress, but in order to set up k8s testing in pipeline that should prevent breakage we also need to be able to do a basic docker build of smaller containers in the pipeline.

This should combine both into a tidy little smoke test.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Kubernetes smoke-test change detection (k8s-smoke) and used it to gate CPU smoke CI.
    • Introduced manual CI controls for CPU smoke image tag selection and optional publishing, plus new CPU smoke image build and kind-based E2E smoke jobs.
  • Bug Fixes

    • Improved robustness of Kubernetes E2E secret cleanup with guaranteed best-effort deletion and post-delete verification even when assertions fail.
  • Refactor

    • Hardened and centralized the Helm-based E2E installation flow with stronger validation and improved diagnostics on failure.

@crookedstorm crookedstorm requested a review from a team as a code owner June 15, 2026 23:43
@github-actions github-actions Bot added the ci label Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7b91be4f-128a-4998-bd2c-5174d100d344

📥 Commits

Reviewing files that changed from the base of the PR and between d0cca11 and aca0a98.

📒 Files selected for processing (2)
  • .github/actions/changes/action.yaml
  • .github/workflows/ci.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/actions/changes/action.yaml
  • .github/workflows/ci.yaml

📝 Walkthrough

Walkthrough

Adds CPU smoke image build and kind-based Kubernetes smoke test CI jobs gated by k8s-smoke path detection. Consolidates Helm install scripts into refactored install_helm_e2e.sh with structured validation, diagnostics, and helper functions, removing two standalone scripts. E2E test hardens secret cleanup via try/finally.

Changes

CPU Smoke Test CI Pipeline

Layer / File(s) Summary
Path-change detection and workflow triggers
.github/actions/changes/action.yaml, .github/workflows/ci.yaml
Adds k8s-smoke output to changes action covering e2e/k8s scripts, values, test_jobs.py, and free-disk-space action. CI workflow gains workflow_dispatch inputs for image tag and publish flag, wiring k8s-smoke into downstream job conditionals.
build-cpu-smoke-images and kind-cpu-smoke jobs
.github/workflows/ci.yaml
build-cpu-smoke-images conditionally runs on path changes or manual trigger, building CPU images via Buildx with dynamic registry/tag/publish logic. kind-cpu-smoke depends on image publishing, provisions kind cluster, pre-pulls GHCR images, installs NeMo Platform via Helm, runs E2E smoke pytest, uploads logs/artifacts. Both added to ci-status required checks.
install_helm_e2e.sh refactoring: config, validation, and helpers
e2e/k8s/scripts/install_helm_e2e.sh
Strict error handling and centralized env-var defaults for namespace, release, chart/values paths, registry/tag toggles, RustFS settings. Adds require_non_empty, validate_file_inputs, validate_image_inputs helpers. Updates install_rustfs to install via Helm with pod readiness and S3 bucket creation. Adds add_nvidia_helm_repo, collect_install_diagnostics, maybe_export_minikube_cluster_url helpers.
install_helm_e2e.sh main flow and script consolidation
e2e/k8s/scripts/install_helm_e2e.sh, e2e/k8s/scripts/install_nmp_auth_e2e.sh, e2e/k8s/scripts/install_nmp_e2e.sh, e2e/k8s/scripts/install_rustfs.sh
Reworked main flow with structured validation, conditional RustFS, Helm dependency build, and readiness delegation. Removes install_nmp_e2e.sh and install_rustfs.sh entirely. Updates install_nmp_auth_e2e.sh to set REQUIRE_NMP_E2E_IMAGES=true and delegate to install_helm_e2e.sh.

E2E Secret Cleanup Robustness

Layer / File(s) Summary
test_job_using_secret_environment_variable: try/finally cleanup
e2e/test_jobs.py
Adds NotFoundError import. Wraps job creation and assertions in try block with secret_deleted flag; deletes secret and verifies removal via sdk.secrets.retrieve raising NotFoundError; finally block performs best-effort cleanup.

Suggested Reviewers

  • svvarom
  • mckornfield
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title describes adding Docker build capability to the CI pipeline for Kubernetes testing, which aligns with the core changes: a new smoke test job and Docker build infrastructure added to ci.yaml.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch check-containers-k8s/crookedstorm

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/docker-cpu-smoketest.yaml:
- Around line 59-60: The actions/checkout action in the workflow is missing the
persist-credentials configuration, which causes git credentials to remain in the
runner environment unnecessarily. Add the input parameter persist-credentials:
false to the checkout action to disable credential persistence since no
subsequent git operations require these credentials.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cb4b005e-4867-4565-9147-9bd912e15022

📥 Commits

Reviewing files that changed from the base of the PR and between dbff951 and aec4185.

📒 Files selected for processing (1)
  • .github/workflows/docker-cpu-smoketest.yaml

Comment thread .github/workflows/docker-cpu-smoketest.yaml Outdated
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor
Suite Lines Covered Line Rate Branch Rate
Unit Tests 19462/25829 75.3% 60.9%
Integration Tests 11371/24601 46.2% 20.2%

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/docker-cpu-smoketest.yaml (1)

49-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope token permissions per job (least privilege).

At Line 49-52, packages: write is granted workflow-wide, so kind-smoke inherits write access it does not need. Keep packages: write only on build-cpu-images; use read-only (or no packages permission) for kind-smoke.

Suggested patch
 permissions:
   contents: read
-  packages: write
+  packages: read

 jobs:
   build-cpu-images:
+    permissions:
+      contents: read
+      packages: write
@@
   kind-smoke:
+    permissions:
+      contents: read
+      packages: read
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/docker-cpu-smoketest.yaml around lines 49 - 52, The
workflow currently grants `packages: write` permission at the workflow level,
which means all jobs including `kind-smoke` inherit unnecessary write access to
packages. Move the `permissions` block from the workflow root level to be
job-specific: add `permissions` with `contents: read` and `packages: write` only
to the `build-cpu-images` job, and add a separate `permissions` block to the
`kind-smoke` job with only `contents: read` (removing or not including
`packages` permission). This follows the principle of least privilege by
ensuring each job only has the permissions it actually needs.
♻️ Duplicate comments (1)
.github/workflows/docker-cpu-smoketest.yaml (1)

67-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Disable credential persistence in both checkout steps.

actions/checkout should set persist-credentials: false at Line 67 and Line 172 to avoid retaining repo credentials in runner git config.

Suggested patch
       - name: Checkout code
         uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+        with:
+          persist-credentials: false
@@
       - name: Checkout code
         uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+        with:
+          persist-credentials: false

Also applies to: 172-173

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/docker-cpu-smoketest.yaml around lines 67 - 68, The
actions/checkout steps are not disabling credential persistence, which can leave
repository credentials in the runner's git config. Add persist-credentials:
false to both the checkout step at lines 67-68 and the checkout step at lines
172-173 to ensure credentials are not retained after the workflow completes.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/docker-cpu-smoketest.yaml:
- Around line 49-52: The workflow currently grants `packages: write` permission
at the workflow level, which means all jobs including `kind-smoke` inherit
unnecessary write access to packages. Move the `permissions` block from the
workflow root level to be job-specific: add `permissions` with `contents: read`
and `packages: write` only to the `build-cpu-images` job, and add a separate
`permissions` block to the `kind-smoke` job with only `contents: read` (removing
or not including `packages` permission). This follows the principle of least
privilege by ensuring each job only has the permissions it actually needs.

---

Duplicate comments:
In @.github/workflows/docker-cpu-smoketest.yaml:
- Around line 67-68: The actions/checkout steps are not disabling credential
persistence, which can leave repository credentials in the runner's git config.
Add persist-credentials: false to both the checkout step at lines 67-68 and the
checkout step at lines 172-173 to ensure credentials are not retained after the
workflow completes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a80def08-e567-415c-9f98-2317406b171d

📥 Commits

Reviewing files that changed from the base of the PR and between aec4185 and 343522b.

📒 Files selected for processing (1)
  • .github/workflows/docker-cpu-smoketest.yaml

@crookedstorm crookedstorm requested review from a team as code owners June 16, 2026 01:42
…test

Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
@crookedstorm crookedstorm force-pushed the check-containers-k8s/crookedstorm branch from 7ba1266 to db71c2a Compare June 16, 2026 15:51
@svvarom svvarom self-requested a review June 16, 2026 16:17
@crookedstorm

Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
✅ Action performed

Reviews resumed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yaml:
- Around line 23-27: Remove the `packages: read` line from the top-level
permissions block in the workflow file unless another job that does not override
permissions requires GHCR access. Since the image jobs already declare their own
`packages: write/read` permissions at the job level, they do not need the
top-level grant, and keeping it violates the principle of least privilege by
granting package read access to all jobs unnecessarily.
- Around line 119-120: The checkout actions in the GitHub Actions workflow are
not disabling persisted credentials, which is a security risk. Add
`persist-credentials: false` to the `actions/checkout` step at lines 119-120 in
.github/workflows/ci.yaml. Also add the same `persist-credentials: false`
parameter to the checkout step at lines 235-236 in .github/workflows/ci.yaml.
This prevents the job token from being stored in .git/config and accessible to
subsequent build/test steps.

In `@e2e/test_jobs.py`:
- Around line 258-259: The deletion verification in the test_jobs.py file (lines
258-259) only checks if the secret is absent from the first page of list
results, but the list endpoint has a default page_size of 10, so the secret
could still exist on other pages. Replace this pagination-fragile check with a
retrieve-after-delete verification instead. After deleting the secret, attempt
to retrieve it directly using sdk.secrets.retrieve() and verify that it raises
an appropriate error or exception (expecting failure), which definitively
confirms the secret no longer exists rather than relying on page-based list
absence.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aa9ce9ca-a11c-4d02-a0b7-4f4f6088e5d1

📥 Commits

Reviewing files that changed from the base of the PR and between 202511f and db71c2a.

📒 Files selected for processing (7)
  • .github/actions/changes/action.yaml
  • .github/workflows/ci.yaml
  • e2e/k8s/scripts/install_helm_e2e.sh
  • e2e/k8s/scripts/install_nmp_auth_e2e.sh
  • e2e/k8s/scripts/install_nmp_e2e.sh
  • e2e/k8s/scripts/install_rustfs.sh
  • e2e/test_jobs.py
💤 Files with no reviewable changes (2)
  • e2e/k8s/scripts/install_rustfs.sh
  • e2e/k8s/scripts/install_nmp_e2e.sh

Comment thread .github/workflows/ci.yaml
Comment thread .github/workflows/ci.yaml
Comment thread e2e/test_jobs.py Outdated
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Signed-off-by: Brooke Storm <brookes@nvidia.com>
Comment thread .github/workflows/ci.yaml Outdated
Comment thread .github/workflows/ci.yaml

@svvarom svvarom left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Brooke Storm <brookes@nvidia.com>
@crookedstorm crookedstorm enabled auto-merge June 16, 2026 17:30
@crookedstorm crookedstorm added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit bb6403e Jun 16, 2026
53 checks passed
@crookedstorm crookedstorm deleted the check-containers-k8s/crookedstorm branch June 16, 2026 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants