fix: data races, resource leaks, and validation gaps #125
Merged
Concurrency:
- guard TaskNode Phase/Message/Metrics/cancel with an RWMutex and route all access through accessors; add locking MarshalJSON (fixes -race failures)
- lock logWriter.buffer (shared stdout/stderr writer)
- guard k8s.pods with a mutex

Leaks:
- close probe HTTP response body
- return SSE/log handlers on client disconnect; non-blocking broadcast
- close docker client on ctx-done, not when Run returns (was use-after-close)
- kill process if Getpgid fails after start
- close envfiles per-iteration

Correctness:
- env precedence: spec env now overrides inherited environment
- EnvVar.Unstring uses SplitN so values may contain '='
- check ReadBuildInfo ok bool (no nil panic on -v)
- validate dependencies exist + detect dependency cycles at load time
- guard nil TCPSocket/HTTPGet in Probe.URL
- drain event channels during shutdown to avoid deadlock

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR hardens the task runner by addressing concurrency races, resource leaks, and validation/correctness gaps that could cause hangs, panics, or inconsistent UI/task state reporting.
Changes:
- Adds synchronization around mutable task status/logging and improves shutdown behavior to avoid deadlocks.
- Fixes multiple resource-lifecycle issues (HTTP bodies, SSE/log streaming disconnects, docker client lifetime, envfile handling).
- Adds workflow dependency validation (unknown deps + cycle detection) and tightens parsing/edge-case handling.
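As an illustration of the dependency validation mentioned in the last item, here is a minimal sketch of an unknown-dependency check followed by DFS cycle detection. It is not the code from this PR: the `map[string][]string` graph shape, the `validateDeps` helper, and the `findCycle` signature are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// findCycle returns one dependency cycle in deps (task -> its dependencies),
// or nil if the graph is acyclic. The signature is hypothetical.
func findCycle(deps map[string][]string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int, len(deps))
	var stack []string

	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		stack = append(stack, n)
		for _, d := range deps[n] {
			switch state[d] {
			case inProgress:
				// Back edge: the cycle is the stack from d onward, closed with d.
				for i, s := range stack {
					if s == d {
						cycle := append([]string{}, stack[i:]...)
						return append(cycle, d)
					}
				}
			case unvisited:
				if c := visit(d); c != nil {
					return c
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = done
		return nil
	}

	// Visit roots in sorted order so the reported cycle is deterministic.
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if state[n] == unvisited {
			if c := visit(n); c != nil {
				return c
			}
		}
	}
	return nil
}

// validateDeps (hypothetical helper) rejects unknown dependencies, then cycles.
func validateDeps(deps map[string][]string) error {
	for task, ds := range deps {
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return fmt.Errorf("task %q depends on unknown task %q", task, d)
			}
		}
	}
	if c := findCycle(deps); c != nil {
		return fmt.Errorf("dependency cycle: %v", c)
	}
	return nil
}

func main() {
	deps := map[string][]string{"a": {"b"}, "b": {"c"}, "c": {"a"}}
	fmt.Println(validateDeps(deps))
}
```

Running the check at load time means a bad workflow fails fast with a readable error instead of hanging while tasks wait on each other.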
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| main.go | Avoids nil panic when build info is unavailable (kit -v prints unknown). |
| internal/types/probe.go | Makes Probe.URL() safe when no action is configured. |
| internal/types/envfile.go | Prevents deferred-close accumulation by reading envfiles per-iteration via helper. |
| internal/types/env_var.go | Allows EnvVar values containing = by using SplitN. |
| internal/terminal.go | Routes task phase reads through synchronized accessor. |
| internal/task_node.go | Introduces status locking + accessors and a locking MarshalJSON to eliminate task status races. |
| internal/server.go | Makes SSE broadcast non-blocking and exits /events + /logs on client disconnect. |
| internal/run.go | Validates unknown deps + cycles; routes task status/cancel/metrics through accessors; drains event channels during shutdown wait. |
| internal/proc/kubernetes.go | Guards concurrent access to the pod list and snapshots for metrics. |
| internal/proc/host.go | Fixes env precedence and kills started process when pgid capture fails. |
| internal/proc/container.go | Defers docker client close until ctx cancellation to avoid use-after-close by metrics goroutine. |
| internal/probe.go | Closes probe HTTP response bodies on the success path. |
| internal/log_writer.go | Adds a mutex to avoid data races when the same writer is used for stdout+stderr. |
| internal/dag.go | Adds DFS-based cycle detection helper. |
| internal/dag_test.go | Adds a test covering cycle detection. |
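The status locking described for internal/task_node.go follows a common Go pattern, sketched below under assumptions. The `status sync.RWMutex` field name comes from the PR description; the simplified `TaskNode`, its unexported fields, the accessor names, and the JSON shape are hypothetical and stand in for the real type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// TaskNode is a simplified stand-in for the real type; field types and
// accessor names are assumptions for this sketch.
type TaskNode struct {
	Name string

	status  sync.RWMutex // guards phase and message
	phase   string
	message string
}

// SetStatus updates phase and message together under the write lock.
func (n *TaskNode) SetStatus(phase, message string) {
	n.status.Lock()
	defer n.status.Unlock()
	n.phase, n.message = phase, message
}

// Phase reads the phase under the read lock.
func (n *TaskNode) Phase() string {
	n.status.RLock()
	defer n.status.RUnlock()
	return n.phase
}

// MarshalJSON copies the guarded fields under the read lock and encodes the
// copy, so the encoder never reads fields while a writer is updating them.
func (n *TaskNode) MarshalJSON() ([]byte, error) {
	n.status.RLock()
	snap := struct {
		Name    string `json:"name"`
		Phase   string `json:"phase"`
		Message string `json:"message,omitempty"`
	}{Name: n.Name, Phase: n.phase, Message: n.message}
	n.status.RUnlock()
	return json.Marshal(snap)
}

func main() {
	n := &TaskNode{Name: "build"}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n.SetStatus("running", fmt.Sprintf("step %d", i))
			_ = n.Phase()
		}(i)
	}
	wg.Wait()
	b, _ := json.Marshal(n)
	fmt.Println(string(b))
}
```

Without the custom `MarshalJSON`, an HTTP handler that encodes the node would read the fields directly and bypass the accessors, which is the kind of access `go test -race` reports.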
Resolved main.go conflict by taking the upstream rewrite (#124), which already includes the debug.ReadBuildInfo ok-check fix.
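For reference, the ok-check mentioned here generally looks like the sketch below. It is not the upstream main.go: the `version` helper is hypothetical, and the "unknown" fallback string is taken from the file summary above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// version returns the main module version, or "unknown" when the binary
// carries no build info. Hypothetical helper for illustration.
func version() string {
	info, ok := debug.ReadBuildInfo()
	if !ok || info == nil {
		// Without this check, info.Main below would dereference a nil pointer.
		return "unknown"
	}
	if v := info.Main.Version; v != "" {
		return v
	}
	return "unknown"
}

func main() {
	fmt.Println(version())
}
```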
Summary
Fixes a batch of correctness bugs found in a review of the task runner, grouped into three areas. All findings verified; tests pass with `go test -race ./...`.

Concurrency (data races)
- `TaskNode` `Phase`/`Message`/`Metrics`/`cancel` were written by task/timer/metrics goroutines while read by the main loop and the HTTP server. Added a `status sync.RWMutex` with accessors and a locking `MarshalJSON`; routed every access through them. (`-race` was failing before this.)
- `logWriter.buffer` was mutated without a lock: the same writer is passed as both stdout and stderr, which `os/exec` copies on separate goroutines. Added a mutex.
- `k8s.pods` was appended in the informer goroutine while ranged in the metrics goroutine. Guarded with a mutex; `GetMetrics` snapshots under lock.

Resource leaks
- `/events` and `/logs` handlers never returned on client disconnect (goroutine + memory leak); now select on `r.Context().Done()`. Broadcast made non-blocking so one slow client can't stall others.
- Docker client was closed by a `defer` when `Run` returned, but the metrics goroutine kept using it → use-after-close. Now closed on ctx-done.
- Process was left running if `Getpgid` failed after `Start`; now killed.

Correctness
- `EnvVar.Unstring` used `Split(s,"=")`, rejecting values containing `=` (URLs, base64). Now `SplitN`.
- `kit -v` could nil-panic when build info is absent; check the `ok` bool.
- `Probe.URL()` nil-derefed when neither `TCPSocket` nor `HTTPGet` was set.

Test plan
- `go build ./...`
- `go vet ./...`
- `go test -race ./...` (passes; added `TestDAG_findCycle`)

🤖 Generated with Claude Code
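To make the `/events` and `/logs` fix concrete, here is a minimal sketch of a non-blocking broadcast combined with a consumer loop that exits on context cancellation. It is an illustration under assumptions, not the server code from this PR: the `broadcaster` type, its methods, the buffer size of 16, and the `stream` function are hypothetical. In a real handler the context would be `r.Context()`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// broadcaster fans events out to subscribers. Hypothetical type.
type broadcaster struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newBroadcaster() *broadcaster {
	return &broadcaster{subs: make(map[chan string]struct{})}
}

func (b *broadcaster) subscribe() chan string {
	ch := make(chan string, 16) // small buffer absorbs short bursts
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

func (b *broadcaster) unsubscribe(ch chan string) {
	b.mu.Lock()
	delete(b.subs, ch)
	b.mu.Unlock()
}

// broadcast never blocks: a subscriber whose buffer is full misses the event.
// It returns how many subscribers accepted the event.
func (b *broadcaster) broadcast(ev string) (delivered int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- ev:
			delivered++
		default: // slow client: drop rather than stall the others
		}
	}
	return delivered
}

// stream forwards events to send until ctx is cancelled, then unsubscribes.
func stream(ctx context.Context, b *broadcaster, send func(string)) {
	ch := b.subscribe()
	defer b.unsubscribe(ch)
	for {
		select {
		case <-ctx.Done():
			return // client went away: release the goroutine and the channel
		case ev := <-ch:
			send(ev)
		}
	}
}

func main() {
	b := newBroadcaster()

	// A subscriber that never reads fills its buffer; later events are dropped.
	slow := b.subscribe()
	for i := 0; i < 100; i++ {
		b.broadcast("tick")
	}
	fmt.Println("buffered for slow client:", len(slow))

	// A streaming consumer returns as soon as its context is cancelled.
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		stream(ctx, b, func(string) {})
	}()
	cancel()
	<-done
	fmt.Println("stream returned after cancel")
}
```

The trade-off in this sketch is that a slow client silently loses events once its buffer fills. That suits a live status feed; a consumer that needs every event would need a different design.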