nightshift: idea-generator — 15 improvement ideas for tailstick #36

@nightshift-micr

nightshift: idea-generator — tailstick improvement ideas

Repo: Microck/tailstick
Task: idea-generator
Category: options
Generated: 2026-04-21

Analysis of tailstick (Go, ~3156 LOC, 8 internal packages) — a USB-delivered Tailscale enrollment tool for Windows and Linux with lease lifecycle management.


🟢 High Value / Low Effort

1. Add tailstick status command

File: internal/app/cli.go, internal/app/workflow.go
Why: Users have no way to view current lease state from the CLI. They must read /var/lib/tailstick/state.json manually.
Approach: Add a status command that loads state, filters active leases, and prints a table (lease ID, mode, device name, status, expiry). Reuse state.Load().
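The filter-and-print step could be sketched as follows. This is only an illustration: the LeaseRecord type here is a hypothetical, trimmed-down stand-in for the real record type, the "active" status value is an assumption, and the real command would get its records from state.Load() rather than a literal slice.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"text/tabwriter"
	"time"
)

// LeaseRecord is a hypothetical stand-in for the real record type;
// field names are assumptions made for this sketch.
type LeaseRecord struct {
	ID         string
	Mode       string
	DeviceName string
	Status     string
	ExpiresAt  time.Time
}

// activeLeases keeps only records whose status is "active"
// (the exact status string is an assumption).
func activeLeases(records []LeaseRecord) []LeaseRecord {
	var out []LeaseRecord
	for _, r := range records {
		if r.Status == "active" {
			out = append(out, r)
		}
	}
	return out
}

// printStatus writes an aligned table of the given leases to w.
func printStatus(w io.Writer, records []LeaseRecord) error {
	tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "LEASE ID\tMODE\tDEVICE\tSTATUS\tEXPIRES")
	for _, r := range records {
		fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\n",
			r.ID, r.Mode, r.DeviceName, r.Status, r.ExpiresAt.Format(time.RFC3339))
	}
	return tw.Flush()
}

func main() {
	// Sample data only; the real command would load records from the state file.
	records := []LeaseRecord{
		{ID: "lease-1", Mode: "session", DeviceName: "kiosk-01", Status: "active", ExpiresAt: time.Now().Add(time.Hour)},
		{ID: "lease-2", Mode: "timed", DeviceName: "lab-07", Status: "cleaned", ExpiresAt: time.Now().Add(-time.Hour)},
	}
	if err := printStatus(os.Stdout, activeLeases(records)); err != nil {
		fmt.Fprintln(os.Stderr, "status:", err)
		os.Exit(1)
	}
}
```

The same printStatus helper could serve idea 4 by skipping the filter when --all is given.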

2. Structured JSON logging option

File: internal/logging/logger.go
Why: The current logger writes plain-text lines (RFC3339 timestamp, [LEVEL], message). The audit trail already uses NDJSON (AppendAudit), but the main log has no structured output, which makes log aggregation harder.
Approach: Add a --log-format=json flag. When enabled, emit {"ts":"...","level":"INFO","msg":"..."} lines using json.Encoder. The current text format remains default.
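A minimal sketch of the JSON line format, assuming a small logEntry struct (a hypothetical name) with the three keys from the approach above. How the flag is wired into the existing logger is not shown and would depend on the real logger type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"time"
)

// logEntry is a hypothetical type; its JSON keys follow the shape in the approach.
type logEntry struct {
	TS    string `json:"ts"`
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// writeEntry emits one JSON object followed by a newline.
// json.Encoder.Encode appends the newline itself, which gives NDJSON output.
func writeEntry(w io.Writer, e logEntry) error {
	return json.NewEncoder(w).Encode(e)
}

// renderEntry is a convenience wrapper that returns the encoded line as a string.
func renderEntry(e logEntry) (string, error) {
	var sb strings.Builder
	err := writeEntry(&sb, e)
	return sb.String(), err
}

func main() {
	e := logEntry{
		TS:    time.Now().UTC().Format(time.RFC3339),
		Level: "INFO",
		Msg:   "lease created",
	}
	line, err := renderEntry(e)
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
}
```

Using a struct rather than a map keeps the key order stable, which makes the output easier to diff and grep.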

3. Config schema validation with JSON Schema

File: internal/config/config.go
Why: Validate() does ad-hoc checks. A published JSON Schema would let IDEs validate configs before runtime and improve the preset maker web tool.
Approach: Add a schema/tailstick.config.schema.json file generated from the Go types. Reference it from the README config section.

4. Lease history and tailstick list command

File: internal/app/cli.go, internal/app/workflow.go
Why: The state file accumulates records, but there is no CLI command to review past leases (cleaned, failed).
Approach: Add tailstick list [--all] showing lease history. Default shows active only, --all includes cleaned/failed records.


🟡 Medium Value / Medium Effort

5. Configurable agent check interval per preset

File: internal/model/types.go, internal/app/workflow.go
Why: The agent timer is hardcoded to 60s (OnUnitActiveSec=60s). Some use cases (short-lived session leases) may benefit from faster cleanup detection (e.g., 10s), while others prefer less frequent checks.
Approach: Add agentIntervalSeconds to Preset model. Pass to installLinuxAgent()/installWindowsAgent() for systemd timer / scheduled task frequency.
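One way the new field could feed the systemd timer setting is sketched below. The Preset struct is a hypothetical, trimmed-down stand-in, and falling back to 60 seconds when the field is unset is an assumption chosen to match the current hardcoded value.

```go
package main

import "fmt"

// Preset is a hypothetical stand-in for the real preset model;
// only the proposed field is relevant here.
type Preset struct {
	Name                 string `json:"name"`
	AgentIntervalSeconds int    `json:"agentIntervalSeconds"`
}

// defaultAgentIntervalSeconds mirrors the currently hardcoded 60s timer.
const defaultAgentIntervalSeconds = 60

// agentIntervalSeconds returns the preset's interval, or the default
// when the field is unset or not positive.
func agentIntervalSeconds(p Preset) int {
	if p.AgentIntervalSeconds <= 0 {
		return defaultAgentIntervalSeconds
	}
	return p.AgentIntervalSeconds
}

// onUnitActiveSec renders the systemd timer directive for the preset.
func onUnitActiveSec(p Preset) string {
	return fmt.Sprintf("OnUnitActiveSec=%ds", agentIntervalSeconds(p))
}

func main() {
	fmt.Println(onUnitActiveSec(Preset{Name: "session", AgentIntervalSeconds: 10}))
	fmt.Println(onUnitActiveSec(Preset{Name: "default"}))
}
```

Treating zero as "use the default" keeps existing config files valid without a migration.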

6. Retry with exponential backoff for cleanup failures

File: internal/app/workflow.go (cleanupRecord)
Why: Currently, cleanup failures retry every agent tick (60s) indefinitely. A single transient DNS issue could cause hundreds of retries.
Approach: Add CleanupAttempts int and LastCleanupAttempt *time.Time to LeaseRecord. After N failures, start backing off (60s → 120s → 300s → 900s). After 24h, emit a final audit event and mark as cleanup_abandoned.
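The backoff schedule above can be sketched as a small lookup. The LeaseRecord struct shows only the two proposed fields, and freeAttempts (the "N" in the approach) is set to 3 purely as an assumption; the 24h abandonment rule is left out of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// LeaseRecord is a hypothetical stand-in showing only the two proposed fields.
type LeaseRecord struct {
	CleanupAttempts    int
	LastCleanupAttempt *time.Time
}

// backoffSteps follows the schedule named in the approach: 60s, 120s, 300s, 900s.
var backoffSteps = []time.Duration{
	60 * time.Second,
	120 * time.Second,
	300 * time.Second,
	900 * time.Second,
}

// freeAttempts is the assumed "N": failures retried at the base interval
// before backoff starts.
const freeAttempts = 3

// nextCleanupDelay returns how long to wait after the given number of failures.
// The delay is capped at the last step.
func nextCleanupDelay(attempts int) time.Duration {
	step := attempts - freeAttempts
	if step < 0 {
		step = 0
	}
	if step >= len(backoffSteps) {
		step = len(backoffSteps) - 1
	}
	return backoffSteps[step]
}

// dueForRetry reports whether enough time has passed since the last attempt.
// A record that has never been attempted is always due.
func dueForRetry(rec LeaseRecord, now time.Time) bool {
	if rec.LastCleanupAttempt == nil {
		return true
	}
	return now.Sub(*rec.LastCleanupAttempt) >= nextCleanupDelay(rec.CleanupAttempts)
}

func main() {
	for _, n := range []int{0, 3, 4, 5, 6, 50} {
		fmt.Printf("after %d failures: wait %s\n", n, nextCleanupDelay(n))
	}
	last := time.Now().Add(-90 * time.Second)
	rec := LeaseRecord{CleanupAttempts: 4, LastCleanupAttempt: &last}
	fmt.Println("due now:", dueForRetry(rec, time.Now()))
}
```

Because the agent still ticks every 60s, the tick handler would call dueForRetry and skip records that are not yet due, rather than changing the timer itself.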

7. Graceful shutdown with timeout for GUI server

File: internal/gui/server.go
Why: httpSrv.Shutdown() is called without a deadline. Shutdown waits for active connections to become idle, so a browser connection that stays busy can make the server hang on exit.
Approach: Use Shutdown(ctx) with a 5-second context timeout. Also add IdleTimeout: 30 * time.Second to the http.Server config.
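A minimal sketch of the bounded shutdown, using only net/http and context from the standard library. The helper names newServer and shutdownWithTimeout are hypothetical, and forcing Close after the deadline is one possible policy, not necessarily the one the project would choose.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"time"
)

// newServer builds an http.Server with the IdleTimeout suggested in the approach.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:        addr,
		Handler:     h,
		IdleTimeout: 30 * time.Second,
	}
}

// shutdownWithTimeout drains the server for at most d. If the deadline
// passes, remaining connections are closed forcibly and the deadline
// error is returned to the caller.
func shutdownWithTimeout(srv *http.Server, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	err := srv.Shutdown(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		_ = srv.Close()
	}
	return err
}

func main() {
	srv := newServer("127.0.0.1:0", http.NotFoundHandler())
	done := make(chan error, 1)
	go func() { done <- srv.ListenAndServe() }()

	if err := shutdownWithTimeout(srv, 5*time.Second); err != nil {
		log.Printf("shutdown: %v", err)
	}
	// ListenAndServe returns http.ErrServerClosed after a clean shutdown.
	if err := <-done; !errors.Is(err, http.ErrServerClosed) {
		log.Printf("serve: %v", err)
	}
}
```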

8. Prometheus-compatible metrics endpoint

File: New internal/metrics/ package
Why: Fleet deployments need observability — how many leases active, cleanup success rate, enrollment duration.
Approach: Optional --metrics-addr :9090 flag. Expose /metrics with: tailstick_leases_active (gauge by mode), tailstick_enrollments_total (counter), tailstick_cleanup_duration_seconds (histogram). Either write the Prometheus text format by hand using only the stdlib (note that expvar itself emits JSON, not the Prometheus format) or use a lightweight Prometheus client.
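As a rough illustration of the stdlib-only route, the gauge could be rendered in the Prometheus text exposition format by hand. The function name renderLeasesActive and the mode names are assumptions; the counter and histogram are not covered here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderLeasesActive renders the tailstick_leases_active gauge, one sample
// per mode, in Prometheus text exposition format. Modes are sorted so the
// output is deterministic. %q is used for the label value, which is fine for
// simple ASCII mode names.
func renderLeasesActive(byMode map[string]int) string {
	modes := make([]string, 0, len(byMode))
	for m := range byMode {
		modes = append(modes, m)
	}
	sort.Strings(modes)

	var sb strings.Builder
	sb.WriteString("# TYPE tailstick_leases_active gauge\n")
	for _, m := range modes {
		fmt.Fprintf(&sb, "tailstick_leases_active{mode=%q} %d\n", m, byMode[m])
	}
	return sb.String()
}

func main() {
	// Sample counts only; mode names here are illustrative.
	fmt.Print(renderLeasesActive(map[string]int{"session": 2, "timed": 1}))
}
```

A /metrics handler would write this string to the response; histograms are more involved, which is where a client library starts to pay off.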

9. Dry-run diff output for config changes

File: internal/config/config.go
Why: --dry-run exists for enrollment but not for config validation. Operators changing presets would benefit from seeing what would change before running.
Approach: Add tailstick config validate [--config FILE] subcommand that loads, validates, and prints a summary of presets, their auth methods, and exit node configs without enrolling.

10. macOS support

File: internal/platform/platform.go, internal/tailscale/client.go
Why: macOS has Tailscale support and is common for ops laptops. Currently only Linux (Debian/Ubuntu) and Windows are supported.
Approach: Add darwin case in Detect(), installCommand(), uninstallCommand(), StatePath(), LogPath(). Use brew install tailscale or the macOS install script. Use launchd plist instead of systemd/schtasks for agent scheduling.


🔵 Lower Priority / Larger Scope

11. Webhook notifications on lease lifecycle events

File: internal/app/workflow.go
Why: Fleet operators may want Slack/email alerts when leases are created or cleaned up, or when cleanup fails repeatedly.
Approach: Add optional webhooks array to config: [{url, events: ["enrolled","cleanup_failed","cleaned"], secret}]. POST NDJSON audit entries to the URL.
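The idea does not say how the secret would be used. One common choice, assumed here, is to send an HMAC-SHA256 signature of the request body so the receiver can verify the sender. The Webhook struct mirrors the config shape above; wants, sign, and verify are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Webhook mirrors the proposed config entry: {url, events, secret}.
type Webhook struct {
	URL    string   `json:"url"`
	Events []string `json:"events"`
	Secret string   `json:"secret"`
}

// wants reports whether the webhook subscribed to the given event name.
func (w Webhook) wants(event string) bool {
	for _, e := range w.Events {
		if e == event {
			return true
		}
	}
	return false
}

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Using the secret this way is an assumption, not part of the original idea.
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify checks a hex signature in constant time, as a receiver would.
func verify(secret string, body []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	hook := Webhook{
		URL:    "https://example.invalid/hook", // placeholder URL
		Events: []string{"enrolled", "cleanup_failed"},
		Secret: "s3cret",
	}
	body := []byte("{\"event\":\"cleanup_failed\"}\n")
	if hook.wants("cleanup_failed") {
		sig := sign(hook.Secret, body)
		fmt.Println("signature:", sig)
		fmt.Println("verified:", verify(hook.Secret, body, sig))
	}
}
```

The signature would travel in a request header of the project's choosing alongside the NDJSON body.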

12. Multi-tenant config support

File: internal/config/config.go
Why: Currently there is one config file per binary invocation, so USB sticks for different tailnets need separate config files. A single config could support multiple tailnets via preset-level tailnet isolation.
Approach: Allow presets to specify their own tailnet field for API operations. The cleanup API call already takes an API key per preset — extend to support distinct tailnet hostnames.

13. Encrypted state file at rest

File: internal/state/store.go
Why: state.json contains device IDs, hostnames, and encrypted secrets (base64 blobs). While secrets are encrypted, metadata is in plaintext. On multi-user machines, this leaks device enrollment info.
Approach: Use the same crypto.Encrypt() mechanism to encrypt the entire state file at rest. Decrypt on load using machine context.

14. Plugin system for custom install/uninstall commands

File: internal/tailscale/client.go
Why: Custom OS images or air-gapped environments may need proprietary install methods (e.g., pulling from an internal artifact registry). Currently, install commands are hardcoded per platform.
Approach: Support install.linuxCustom: ["my-installer", "--target", "tailscale"] in preset config, allowing arbitrary command arrays for non-standard environments.

15. Audit log retention and rotation

File: internal/state/store.go
Why: AppendAudit() appends to a single NDJSON file indefinitely. Over months of operation, this file grows without bound.
Approach: Rotate audit files when they exceed a configurable size (default 10MB). Keep N rotated files. Add tailstick audit show [--last N] command for review.
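Size-based rotation could look roughly like this. It is a sketch under assumptions: rotated files are named by appending .1, .2, and so on, the audit.ndjson file name is illustrative, and the check would run before AppendAudit writes. None of these details come from the existing code.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// rotatedName returns the name of the n-th rotated file, e.g. audit.ndjson.2.
func rotatedName(path string, n int) string {
	return fmt.Sprintf("%s.%d", path, n)
}

// needsRotation reports whether a file of the given size has reached the limit.
func needsRotation(size, maxBytes int64) bool {
	return maxBytes > 0 && size >= maxBytes
}

// rotate shifts path.1 -> path.2 and so on, overwriting the oldest file,
// then moves the live file to path.1. At least one rotated file is kept.
func rotate(path string, keep int) error {
	if keep < 1 {
		keep = 1
	}
	for i := keep - 1; i >= 1; i-- {
		err := os.Rename(rotatedName(path, i), rotatedName(path, i+1))
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return err
		}
	}
	return os.Rename(path, rotatedName(path, 1))
}

// rotateIfNeeded rotates path when it has reached maxBytes.
// A missing file is not an error: there is simply nothing to rotate.
func rotateIfNeeded(path string, maxBytes int64, keep int) (bool, error) {
	info, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	if !needsRotation(info.Size(), maxBytes) {
		return false, nil
	}
	return true, rotate(path, keep)
}

func main() {
	dir, err := os.MkdirTemp("", "audit")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "audit.ndjson")
	if err := os.WriteFile(path, []byte("{\"event\":\"enrolled\"}\n"), 0o600); err != nil {
		panic(err)
	}
	// A tiny limit is used here so the demo file triggers rotation.
	rotated, err := rotateIfNeeded(path, 8, 3)
	fmt.Println("rotated:", rotated, "err:", err)
}
```

In real use the limit would come from config, with 10MB as the default suggested above.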


Generated by nightshift — autonomous code quality bot.
