[Bug] Plan executor produces delete+update on same Deployment UID, then retries stale plan indefinitely #353

@tristal

Summary

The plan executor in execplan.go can produce a reconciliation plan that contains both a "delete Deployment" and an "update Deployment" step for the same k8s Deployment resource. The delete succeeds, then the update fails on a stale UID precondition. The controller retries the same plan indefinitely without recomputing it from current state.

Environment

  • Controller image: v1.5.x (exact TBD)
  • Kubernetes: EKS 1.30
  • Temporal: self-hosted (Cassandra-backed)

What happened

A Helm release changed the desired image tag while an existing version was being sunset. The controller's reconcile plan included:

  1. deleting deployment for account-temporal-default-worker-17-235-1-9755
  2. deleted worker resource on version sunset (HPA) ✅
  3. updating deployment for account-temporal-default-worker-17-232-4-559b
  4. updating deployment for account-temporal-default-worker-17-235-1-9755

Step 4 failed:

StorageError: invalid object, Code: 4, Key: /registry/deployments/.../account-temporal-default-worker-17-235-1-9755,
ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: be213a24-..., UID in object meta: ""

The controller retried this exact plan for ~3 hours. Restarting the controller produced zero reconcile activity for this TWD: it neither re-enqueued the resource nor recomputed the plan.

Impact

The k8s Deployment for the Worker Deployment's current build (17.235.1-9755) was deleted and never recreated. Zero pods existed for the current version, and all new workflows routed to it were stranded.

Proposed fix

  1. Plan generator should not emit both delete and update for the same Deployment UID in a single plan.
  2. On plan execution failure, recompute the plan from current state rather than retrying the stale plan.
