Skip to content

fix: respect user-provided container capabilities under PodSecurity restricted#176

Open
1fanwang wants to merge 2 commits intoClickHouse:mainfrom
1fanwang:fix/security-context-capabilities-merge
Open

fix: respect user-provided container capabilities under PodSecurity restricted#176
1fanwang wants to merge 2 commits intoClickHouse:mainfrom
1fanwang:fix/security-context-capabilities-merge

Conversation

@1fanwang
Copy link
Copy Markdown

Why

#174 reports that ClickHouseCluster / KeeperCluster pods fail to start in namespaces that enforce the restricted PodSecurity Standard. Setting containerTemplate.securityContext.capabilities.drop: [ALL] is expected to leave the merged container with no add capabilities, but the operator-defaulted add: [IPC_LOCK, PERFMON, SYS_PTRACE] survives the merge — and restricted forbids any add outside NET_BIND_SERVICE.

What

ApplyContainerTemplateOverrides in internal/controller/overrides.go merges the user-supplied ContainerTemplateSpec onto the operator-built container via Strategic Merge Patch (SMP) with corev1.Container semantics. SMP deep-merges nested structs field-by-field unless the schema carries a patchStrategy=replace directive — and corev1.Capabilities carries none. So a patch of {Capabilities: {Drop: [ALL]}} over a base of {Capabilities: {Add: [IPC_LOCK, PERFMON, SYS_PTRACE]}} yields a merged {Capabilities: {Add: [IPC_LOCK, PERFMON, SYS_PTRACE], Drop: [ALL]}}. PodSecurity then rejects the pod.

(The issue points at ApplyDefault, but defaultSpec in keepercluster_types.go and clickhousecluster_types.go carries no SecurityContext — the operator-defaulted Add is set in internal/controller/{clickhouse,keeper}/templates.go and survives via SMP in overrides.go. Same observable symptom; the fix lives at the merge boundary.)

This change pre-clears the base container's Capabilities when the spec author supplies their own, making user-provided Capabilities authoritative for the whole sub-struct. Other SecurityContext fields (runAsUser, runAsNonRoot, …) still deep-merge as before.

Tests

Added Describe("Capabilities") in internal/controller/overrides_test.go:

  1. Bug repro — user Drop:[ALL] against operator-defaulted Add:[IPC_LOCK,…] → merged has empty Add and Drop:[ALL]. Fails on main.
  2. Invariant — user Add:[NET_BIND_SERVICE] replaces operator Add:[IPC_LOCK,…] rather than merging.
  3. Invariant — user SecurityContext without Capabilities preserves operator defaults.
  4. Invariant — caller's *Container and *SecurityContext pointers are not mutated.

Local checks: make lint-fix (0 issues), go test ./internal/controller/ -count=1 (all pass), and go test ./internal/controller/clickhouse/ -count=1 --ginkgo.label-filter='!integration' (all pass). Skipped [integration]-labelled tests because they require a local Docker daemon; nothing in this change touches that path.

Happy to iterate on the choice of merge-boundary fix vs. an alternative like adding a patchStrategy=replace directive — went with the boundary fix because it's contained to the operator and doesn't depend on upstream schema changes.

Related Issues

Fixes #174

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 27, 2026

CLA assistant check
All committers have signed the CLA.

…estricted

Strategic merge patch on corev1.Capabilities deep-merges Add and Drop independently,
so a user setting only `drop: [ALL]` leaves the operator-defaulted
`add: [IPC_LOCK, PERFMON, SYS_PTRACE]` in place — which the `restricted` PodSecurity
profile rejects. Treat user-provided Capabilities as authoritative for the whole
sub-struct by clearing it on the base before the merge, while keeping deep-merge
semantics for the rest of SecurityContext.

Fixes ClickHouse#174
@1fanwang 1fanwang force-pushed the fix/security-context-capabilities-merge branch from 6043e38 to fab9810 Compare April 27, 2026 09:01
@GrigoryPervakov GrigoryPervakov requested a review from Copilot April 28, 2026 08:53
@GrigoryPervakov GrigoryPervakov self-assigned this Apr 28, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes how container Linux capabilities are merged when applying containerTemplate overrides, ensuring user-provided securityContext.capabilities fully overrides operator-defaulted capabilities so pods can start under the Kubernetes PodSecurity restricted standard.

Changes:

  • Update ApplyContainerTemplateOverrides to clear the base container’s SecurityContext.Capabilities whenever the user provides their own Capabilities, preventing Strategic Merge Patch from deep-merging Add lists.
  • Add a focused test suite covering the original bug repro, replacement semantics, preservation semantics, and non-mutation of caller-owned pointers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
internal/controller/overrides.go Makes user-provided Capabilities authoritative by pre-clearing base capabilities before SMP merge.
internal/controller/overrides_test.go Adds tests validating capabilities override behavior and ensuring inputs aren’t mutated.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security Context Capabilities Merge Bug - Cannot Override Default Capabilities

4 participants