@xiaoweim xiaoweim commented Jan 6, 2026

This PR introduces a script that uses kops to deploy the CCM (cloud-controller-manager), serving as a replacement for kube-up within the "Cloud Provider Last Known Good Testing" framework.

The "Cloud Provider LKG Testing" design addresses compatibility challenges by running continuous background tests to identify "Last Known Good" (LKG) pairs of Kubernetes and Cloud Provider code. This script uses kops to reliably deploy clusters for these verification tests, replacing the legacy and error-prone kube-up workflow.

  • Multiple Modes:
    • lkg-k8s-local-gcp: Deploys LKG/Stable K8s with a locally built/Latest CCM (primary dev workflow).
    • latest-k8s-lkg-gcp: Tests the latest K8s with the Last Known Good (LKG) version of the CCM.
    • stock: Standard kops behavior.
  • Automated Validation: Includes kops validate cluster with a 15-minute wait to ensure the cluster is fully healthy before passing control.
  • Smart Lifecycle Management:
    • Delete on Failure (Default): Automatically cleans up resources if deployment fails (via trap), ensuring CI hygiene.
    • Keep on Success: Preserves the cluster if deployment succeeds, allowing for subsequent E2E tests or local debugging.
    • Override: set DELETE_CLUSTER=false to keep the cluster even if deployment fails.
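The lifecycle rules above (validate with a 15-minute wait, delete on failure via a trap, keep on success, DELETE_CLUSTER=false as an override) could be sketched in bash roughly as follows. This is a minimal, hypothetical sketch rather than the script in this PR: the function names validate_mode, should_delete_cluster, cleanup_on_exit and deploy_and_validate are illustrative, and it assumes the mode is passed as the first argument and that CLUSTER_NAME and KOPS_STATE_STORE are already exported for kops.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the lifecycle described above; not the script from this PR.
set -euo pipefail

# Accept only the three modes listed in the PR description.
# Assumption: the mode is passed as the script's first argument.
validate_mode() {
  case "${1:-}" in
    lkg-k8s-local-gcp | latest-k8s-lkg-gcp | stock) return 0 ;;
    *)
      echo "unknown mode: '${1:-}'" >&2
      return 1
      ;;
  esac
}

# Succeeds (returns 0) when the cluster should be deleted:
# only after a failure, and never when DELETE_CLUSTER=false.
should_delete_cluster() {
  local exit_code="$1"
  if [ "${DELETE_CLUSTER:-true}" = "false" ]; then
    return 1
  fi
  [ "${exit_code}" -ne 0 ]
}

# EXIT trap handler: delete the cluster on failure, keep it on success.
cleanup_on_exit() {
  local exit_code=$?
  if should_delete_cluster "${exit_code}"; then
    echo "Deployment failed (exit ${exit_code}); deleting ${CLUSTER_NAME}" >&2
    kops delete cluster --name "${CLUSTER_NAME}" --yes || true
  else
    echo "Keeping cluster ${CLUSTER_NAME}" >&2
  fi
  exit "${exit_code}"
}

# Install the trap, bring the cluster up, then wait up to 15 minutes for it to be healthy.
deploy_and_validate() {
  validate_mode "$1"
  trap cleanup_on_exit EXIT
  # Mode-specific "kops create cluster" / "kops update cluster --yes" steps would go here.
  kops validate cluster --name "${CLUSTER_NAME}" --wait 15m
}
```

Calling deploy_and_validate as the last step means a failed or timed-out validation exits non-zero and triggers the cleanup, while a successful run leaves the cluster up for E2E tests. Keeping the decision in one EXIT trap also covers commands that abort the script through set -e.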

Running the script locally: https://gist.github.com/xiaoweim/f1f436e90111a25f99851fa8c809e436

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 6, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Welcome @xiaoweim!

It looks like this is your first PR to kubernetes/cloud-provider-gcp 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-gcp has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 6, 2026
@k8s-ci-robot
Contributor

Hi @xiaoweim. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xiaoweim
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 6, 2026

@zhang-xuebin zhang-xuebin left a comment

thanks for doing this, appreciated!

```shell
echo "Environment variables:"
echo " GCP_PROJECT (Required) GCP Project ID"
echo " CLUSTER_NAME (Required) Cluster name (e.g. my-cluster.k8s.local)"
echo " DELETE_CLUSTER (Optional) Set to 'false' to keep the cluster running (default: true)"
```

Is this script intended for local testing? If so, would it be better to default DELETE_CLUSTER to false?

Oh, actually I see you have some discussion around line 212. Perhaps add a usage comment section at the top and document the expected behaviour?
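A usage section like the one suggested could document the expected behaviour, including the DELETE_CLUSTER default, and fail fast when required variables are missing. This is a minimal, hypothetical bash sketch: the variable names come from the usage text quoted above, while the usage and require_env function names are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical usage header; not taken from the PR.
#
# Deploys a kops cluster running the GCP cloud-controller-manager (CCM)
# for Cloud Provider Last Known Good (LKG) testing.
#
# Lifecycle:
#   - If deployment or validation fails, the cluster is deleted (default).
#   - If it succeeds, the cluster is kept for E2E tests or local debugging.
#   - Set DELETE_CLUSTER=false to keep the cluster even on failure.
set -euo pipefail

usage() {
  echo "Environment variables:"
  echo "  GCP_PROJECT     (Required) GCP Project ID"
  echo "  CLUSTER_NAME    (Required) Cluster name (e.g. my-cluster.k8s.local)"
  echo "  DELETE_CLUSTER  (Optional) Set to 'false' to keep the cluster running (default: true)"
}

# Report every required variable that is unset or empty; return non-zero if any are missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "error: ${name} is required" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A script would typically run `require_env GCP_PROJECT CLUSTER_NAME || { usage; exit 1; }` before touching any cloud resources, so that a missing variable never leaves a half-created cluster behind.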

Author

This script is primarily for the "Cloud Provider Last Known Good Testing" framework to verify compatibility (replacing the legacy kube-up workflow). It can also be used for local testing to help reproduce these scenarios.

@aojea
Member

aojea commented Jan 6, 2026

What is LKG?

/assign @justinsb

@elmiko
Contributor

elmiko commented Jan 6, 2026

very cool update, thank you!

@xiaoweim
Author

xiaoweim commented Jan 6, 2026

@aojea
Member

aojea commented Jan 7, 2026

That doc is obsolete; I've already implemented the skew testing automation. There is an internal doc and a recording, but feel free to ping me and I can explain it again. In this case, we should also add a small doc explaining these things so the repo is self-contained.

@xiaoweim xiaoweim marked this pull request as ready for review January 8, 2026 02:23
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2026
@cheftako
Member

cheftako commented Jan 9, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2026
@cheftako
Member

cheftako commented Jan 9, 2026

> That doc is obsolete; I've already implemented the skew testing automation. There is an internal doc and a recording, but feel free to ping me and I can explain it again. In this case, we should also add a small doc explaining these things so the repo is self-contained.

Antonio, can we have a chat/meeting to go over what you have?

```shell
# Ensure bucket exists
if ! gsutil ls -p "${GCP_PROJECT}" "${KOPS_STATE_STORE}" >/dev/null 2>&1; then
  gsutil mb -p "${GCP_PROJECT}" -l "${GCP_LOCATION}" "${KOPS_STATE_STORE}"
  gsutil ubla set off "${KOPS_STATE_STORE}"
fi
```
Contributor

Why is this needed? I don't see it documented in kops.

Contributor

Ah, I see it explained here. On that note, I see a lot of shared logic between this script and ./e2e/scenarios/kops-simple that can be refactored.
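The bucket check excerpted above could be wrapped in an idempotent helper, which is the kind of logic that might move into a file shared with ./e2e/scenarios/kops-simple. This is a minimal, hypothetical bash sketch: ensure_state_bucket is an illustrative name, it reuses the same gsutil commands as the excerpt (including turning uniform bucket-level access off, for the reason referenced in the review), and it assumes the caller sets GCP_PROJECT, GCP_LOCATION and KOPS_STATE_STORE.

```shell
#!/usr/bin/env bash
# Hypothetical shared helper; could live in a file sourced by both scripts.
set -euo pipefail

# Create the kops state-store bucket only if it does not already exist.
# Requires GCP_PROJECT, GCP_LOCATION and KOPS_STATE_STORE (a gs:// URL).
ensure_state_bucket() {
  : "${GCP_PROJECT:?GCP_PROJECT must be set}"
  : "${GCP_LOCATION:?GCP_LOCATION must be set}"
  : "${KOPS_STATE_STORE:?KOPS_STATE_STORE must be set}"

  if gsutil ls -p "${GCP_PROJECT}" "${KOPS_STATE_STORE}" >/dev/null 2>&1; then
    echo "State store ${KOPS_STATE_STORE} already exists"
    return 0
  fi

  gsutil mb -p "${GCP_PROJECT}" -l "${GCP_LOCATION}" "${KOPS_STATE_STORE}"
  # Same setting as the script under review: disable uniform bucket-level access.
  gsutil ubla set off "${KOPS_STATE_STORE}"
}
```

Because the helper is idempotent, both scripts can call it unconditionally on every run, and a second run against an existing bucket only performs the `gsutil ls` check.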

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Projects

None yet

8 participants