Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions helm-prereqs/setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -630,12 +630,12 @@ _TEMPORAL_TLS="--tls-cert-path /var/secrets/temporal/certs/server-interservice/t
--tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
--tls-server-name interservice.server.temporal.local"
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n cloud --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n site --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
# flow Temporal namespace — required by NICo Flow workers; pod panics on startup if absent.
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n flow --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
Comment on lines 632 to +638

@coderabbitai coderabbitai Bot Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not suppress all namespace-creation failures.

--retention 72h is a good fix, but Line 633, Line 635, Line 638, and Line 766 still unconditionally hide failures (2>/dev/null || true). This can mask real setup faults (TLS/auth/connectivity/CLI errors) and breaks fail-fast behavior.

Suggested hardening (idempotent + fail-fast)
+create_temporal_namespace() {
+    local ns="$1"
+    if kubectl exec -n temporal deploy/temporal-admintools -- \
+        sh -c "temporal operator namespace describe -n '${ns}' --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" \
+        >/dev/null 2>&1; then
+        echo "Temporal namespace '${ns}' already exists"
+        return 0
+    fi
+
+    kubectl exec -n temporal deploy/temporal-admintools -- \
+        sh -c "temporal operator namespace create -n '${ns}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}"
+}
+
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "cloud"
+create_temporal_namespace "site"
+create_temporal_namespace "flow"
...
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n '${NICO_SITE_UUID}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "${NICO_SITE_UUID}"

As per coding guidelines, "**/*.sh: Review shell scripts for ... error propagation ..." and "helm-prereqs/**: ... idempotency, and clear failure messages."

Also applies to: 765-766

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@helm-prereqs/setup.sh` around lines 632 - 638, The kubectl exec commands for
temporal namespace creation (lines 633, 635, 638, and line 766) are
unconditionally suppressing all stderr output and errors with `2>/dev/null ||
true`, which masks real setup failures like TLS, authentication, or connectivity
issues. Instead of blanket error suppression, implement idempotent namespace
creation by first checking if each namespace already exists before attempting to
create it, or by using a conditional approach that only ignores "already exists"
errors while allowing actual failures to be logged and propagated. This
maintains idempotency while preserving fail-fast behavior and enabling clear
error diagnostics as required by the helm-prereqs guidelines.

Source: Coding guidelines

@ajf ajf Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai can you create issues for those things, thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

gh issue create \
  --repo "NVIDIA/infra-controller" \
  --title "helm-prereqs/setup.sh: Temporal namespace creation silently suppresses all errors" \
  --assignee "ajf" \
  --body "## Summary

In \`helm-prereqs/setup.sh\`, the four \`temporal operator namespace create\` invocations for the \`cloud\`, \`site\`, \`flow\`, and per-site UUID namespaces unconditionally suppress all stderr output and exit codes via \`2>/dev/null || true\`:

\`\`\`sh
kubectl exec -n temporal deploy/temporal-admintools -- \\
    sh -c \"temporal operator namespace create -n cloud --retention 72h ...\" 2>/dev/null || true
\`\`\`

This pattern masks real setup faults such as TLS misconfiguration, authentication failures, and connectivity issues, breaking fail-fast behaviour and producing no actionable diagnostics. The script may proceed as if namespaces were created when they were not, causing downstream worker pods to enter a CrashLoop with \`Namespace '<ns>' is not found\`.

## Affected lines

- \`helm-prereqs/setup.sh\` lines 633, 635, 638 (cloud / site / flow namespaces)
- \`helm-prereqs/setup.sh\` line 766 (per-site \`NICO_SITE_UUID\` namespace)

## Expected behaviour

Namespace creation commands should:
1. Be idempotent — skip creation if the namespace already exists (via \`temporal operator namespace describe\`).
2. Propagate genuine errors (TLS, auth, connectivity) rather than suppressing them.
3. Emit a clear diagnostic message on failure, consistent with the \`helm-prereqs/**\` coding guidelines requiring *idempotency* and *clear failure messages*.

## References

- PR: https://github.com/NVIDIA/infra-controller/pull/2675
- Review comment: https://github.com/NVIDIA/infra-controller/pull/2675#discussion_r3428992931

Raised by `@ajf`."

Length of output: 1876



@ajf, the issue has been opened and assigned to you:

helm-prereqs/setup.sh: Temporal namespace creation silently suppresses all errors#2768

It covers all four affected lines (the cloud, site, flow, and per-site UUID namespace creation commands) and references this PR and review comment for full traceability.

echo "Temporal namespaces ready"

_SETUP_PHASE="[7g/7] NICo REST helm chart"
Expand Down Expand Up @@ -763,7 +763,7 @@ _TEMPORAL_TLS="--tls-cert-path /var/secrets/temporal/certs/server-interservice/t
--tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
--tls-server-name interservice.server.temporal.local"
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n '${NICO_SITE_UUID}' --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n '${NICO_SITE_UUID}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
echo "Temporal namespace ready"

# FLOW_GRPC_ENABLED toggles the site-agent's Flow gRPC client (see
Expand Down
Loading