feat: background workers = non-HTTP workers with shared state#2287

Open
nicolas-grekas wants to merge 1 commit intophp:mainfrom
nicolas-grekas:sidekicks

@nicolas-grekas
Contributor

@nicolas-grekas nicolas-grekas commented Mar 16, 2026

Summary

Background workers are long-running PHP workers that run outside the HTTP cycle. They observe their environment (Redis, DB, filesystem, etc.) and publish configuration that HTTP workers read per-request - enabling real-time reconfiguration without restarts or polling.

PHP API

  • frankenphp_worker_set_vars(array $vars) - publishes config from a background worker (persistent memory, cross-thread)
  • frankenphp_worker_get_vars(string|array $name, float $timeout = 30.0) - reads config from HTTP workers (blocks until first publish, generational cache)
  • frankenphp_worker_get_signaling_stream() - returns a pipe-based stream for stream_select() integration (cooperative shutdown)

Caddyfile configuration

php_server {
    # HTTP worker (unchanged)
    worker public/index.php { num 4 }

    # Named background worker
    worker bin/worker.php { background; name config-watcher }

    # Catch-all for lazy-started names
    worker bin/worker.php { background }
}
  • background marks a worker as non-HTTP
  • name specifies an exact worker name; workers without name are catch-all for lazy-started names
  • Not declaring a catch-all forbids lazy-started ones
  • max_threads on catch-all sets a safety cap for lazy-started instances (defaults to 16)
  • num and max_threads capped at 1 (pooling is a future feature)
  • max_consecutive_failures defaults to 6 (same as HTTP workers)
  • max_execution_time automatically disabled for background workers

Shutdown

Background workers are stopped cooperatively via the signaling stream: FrankenPHP writes "stop\n" which is picked up by stream_select(). Workers have a 5-second grace period.

After the grace period, a best-effort force-kill is attempted:

  • Linux ZTS: arms PHP's own max_execution_time timer cross-thread via timer_settime(EG(max_execution_timer_timer))
  • Windows: CancelSynchronousIo + QueueUserAPC interrupts blocking I/O and alertable waits
  • macOS: no per-thread mechanism available; stuck threads are abandoned

Architecture

  • BackgroundWorkerRegistry per php_server for isolation and at-most-once semantics
  • Persistent memory (pemalloc) with RWMutex for safe cross-thread sharing
  • Generational cache: per-thread version check skips lock + copy when data hasn't changed; repeated get_vars calls return the same array instance (=== is O(1))
  • Opcache immutable array zero-copy fast path (IS_ARRAY_IMMUTABLE)
  • Interned string optimizations (ZSTR_IS_INTERNED) - skip copy/free for shared memory strings
  • Rich type support: null, scalars, arrays (nested), enums
  • Signaling stream: pipe-based fd for stream_select() - compatible with amphp/ReactPHP event loops
  • Crash recovery with exponential backoff and automatic restart
  • Thread reservation: background workers get dedicated threads outside the HTTP scaling budget
  • $_SERVER['FRANKENPHP_WORKER_NAME'] set for all workers (HTTP and background)
  • $_SERVER['FRANKENPHP_WORKER_BACKGROUND'] set for all workers (true/false)
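To illustrate the generational cache idea from the list above, here is a minimal Go sketch. It is not the PR's code: `store`, `threadCache` and the `copies` counter are hypothetical names, plain string maps stand in for PHP arrays in persistent memory, and the opcache/interned-string fast paths are left out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// store holds the latest published snapshot and a generation counter.
type store struct {
	mu      sync.RWMutex
	version atomic.Uint64
	vars    map[string]string
}

// publish replaces the whole snapshot and bumps the generation.
func (s *store) publish(vars map[string]string) {
	s.mu.Lock()
	s.vars = vars
	s.version.Add(1)
	s.mu.Unlock()
}

// threadCache is owned by exactly one reader thread, so it needs no lock.
type threadCache struct {
	version uint64
	vars    map[string]string
	copies  int // how many times a copy was actually made (illustration only)
}

// get returns the cached copy while the generation is unchanged,
// and only takes the read lock and copies when a new snapshot exists.
func (c *threadCache) get(s *store) map[string]string {
	if c.vars != nil && c.version == s.version.Load() {
		return c.vars // fast path: no lock, no copy, same map instance
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	c.version = s.version.Load()
	c.vars = make(map[string]string, len(s.vars))
	for k, v := range s.vars {
		c.vars[k] = v
	}
	c.copies++
	return c.vars
}

func main() {
	s := &store{}
	s.publish(map[string]string{"maintenance": "off"})

	c := &threadCache{}
	c.get(s)
	c.get(s) // served from the per-thread cache
	s.publish(map[string]string{"maintenance": "on"})
	fmt.Println(c.get(s)["maintenance"], "copies:", c.copies)
}
```

Because the version is only bumped while the write lock is held, a reader that sees an unchanged version can safely keep using its previous copy.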

Example

// bin/config-watcher.php
$redis = new Redis();
$redis->connect('127.0.0.1');

do {
    frankenphp_worker_set_vars([
        'maintenance' => (bool) $redis->get('maintenance_mode'),
        'feature_flags' => json_decode($redis->get('features'), true),
    ]);
} while (!should_stop(5.0)); // check every 5s via signaling stream

// HTTP worker
$config = frankenphp_worker_get_vars('config-watcher');
if ($config['maintenance']) {
    return new Response('Down for maintenance', 503);
}

Test coverage

16 tests covering: basic vars, at-most-once start, validation, type support (enums, binary-safe strings), multiple background workers, multiple entrypoints, crash restart, signaling stream, worker restart lifecycle, non-background-worker error handling, identity detection, generational cache.

All tests pass on PHP 8.2, 8.3, 8.4, and 8.5 with -race.

Documentation

Full docs at docs/background-workers.md.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from e1655ab to 867e9b3 Compare March 16, 2026 20:26
@AlliBalliBaba
Contributor

AlliBalliBaba commented Mar 16, 2026

Interesting approach to parallelism, what would be a concrete use case for only letting information flow one way from the sidekick to the http workers?

Usually the flow would be inverted, where an HTTP worker offloads work to a pool of 'sidekick' workers and can optionally wait for a task to complete.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 2 times, most recently from da54ab8 to a06ba36 Compare March 16, 2026 21:45
@henderkes
Contributor

Thank you for the contribution. Interesting idea, but I'm thinking we should merge the approach with #1883. The kind of worker is the same, how they are started is but a detail.

@nicolas-grekas the Caddyfile setting should likely be per php_server, not a global setting.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 7 times, most recently from ad71bfe to 05e9702 Compare March 17, 2026 08:03
@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 17, 2026

@AlliBalliBaba The use case isn't task offloading (HTTP->worker), but out-of-band reconfigurability (environment->worker->HTTP). Sidekicks observe external systems (Redis Sentinel failover, secret rotation, feature flag changes, etc.) and publish updated configuration that HTTP workers pick up on their next request, with per-request consistency guaranteed via $_SERVER injection. No polling, no TTLs, no redeployment.

Task offloading (what you describe) is a valid and complementary pattern, but it solves a different problem. The non-HTTP worker foundation here could support both.

@henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

  • Minimal FrankenPHP config: a single sidekick_entrypoint in php_server (thanks for the idea). No need to declare individual workers in the Caddyfile. The PHP app controls which sidekicks to start via frankenphp_sidekick_start(), keeping the infrastructure config simple.

  • Graceful degradability: apps should work correctly with or without FrankenPHP. The same codebase should work on FrankenPHP (with real-time reconfiguration) and on traditional setups (with static or always-refreshed config).

  • Nice framework integration: the sidekick_entrypoint pointing to e.g. bin/console means sidekicks are regular framework commands, making them easy to develop.

Happy to follow up with your proposals now that this is hopefully clarified.
I'm going to continue on my own a bit also :)

@dunglas
Member

dunglas commented Mar 17, 2026

Great PR!

Couldn't we create a single API that covers both use cases?

We try to keep the number of public symbols and config options as small as possible!

@henderkes
Contributor

> @henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

Yes, that's why I'd like to unify the two APIs and background implementations into one. Unfortunately the first task worker attempt didn't make it into main, but perhaps @AlliBalliBaba can use his experience with the previous PR to influence this one. I'd be more in favour of a general API than a specific sidecar one.

@nicolas-grekas
Contributor Author

The PHP-side API has been significantly reworked since the initial iteration: I replaced $_SERVER injection with an explicit get_vars/set_vars protocol.

The old design used frankenphp_set_server_var() to inject values into $_SERVER implicitly. The new design uses an explicit request/response model:

  • frankenphp_sidekick_set_vars(array $vars): called from the sidekick to publish a complete snapshot atomically
  • frankenphp_sidekick_get_vars(string|array $name, float $timeout = 30.0): array: called from HTTP workers to read the latest vars

Key improvements:

  • No race condition on startup: get_vars blocks until the sidekick has called set_vars. The old design had a race where HTTP requests could arrive before the sidekick had published its values.
  • Strict context enforcement: set_vars and should_stop throw RuntimeException if called from a non-sidekick context.
  • Atomic snapshots: set_vars replaces all vars at once. No partial state possible.
  • Parallel start: get_vars(['redis-watcher', 'feature-flags']) starts all sidekicks concurrently, waits for all, returns vars keyed by name.
  • Works in both worker and non-worker mode: get_vars works from any PHP script served by php_server, not just from frankenphp_handle_request() workers.
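The "parallel start" behaviour described in the list above (start every requested sidekick concurrently, wait for all of them, return vars keyed by name) can be sketched in Go like this. `getAll` and the `fetch` callback are hypothetical names used for illustration only; `fetch` stands in for "start the named sidekick and block until its first set_vars".

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetch is a hypothetical stand-in for "start the named sidekick if
// needed and block until it has published its first snapshot".
type fetch func(name string) map[string]string

// getAll runs fetch for every name concurrently, waits for all of
// them, and returns the snapshots keyed by sidekick name.
func getAll(names []string, f fetch) map[string]map[string]string {
	results := make(map[string]map[string]string, len(names))
	var mu sync.Mutex // protects results
	var wg sync.WaitGroup

	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			vars := f(name)
			mu.Lock()
			results[name] = vars
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	slowStart := func(name string) map[string]string {
		time.Sleep(50 * time.Millisecond) // simulated startup latency
		return map[string]string{"source": name}
	}

	begin := time.Now()
	res := getAll([]string{"redis-watcher", "feature-flags"}, slowStart)
	// Total time is roughly one startup latency, not the sum of both.
	fmt.Println(len(res), "snapshots in", time.Since(begin).Round(10*time.Millisecond))
}
```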

Other changes:

  • sidekick_entrypoint moved from global frankenphp block to per-php_server (as @henderkes suggested)
  • Removed the $argv parameter: the sidekick name is the command, passed as $_SERVER['argv'][1]
  • set_vars is restricted to sidekick context only (throws if called from HTTP workers)
  • get_vars accepts string|array: when given an array, all sidekicks start in parallel
  • Atomic snapshots: set_vars replaces all vars at once, no partial state
  • Binary-safe values (null bytes, UTF-8)

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 3 times, most recently from cb65f46 to 4dda455 Compare March 17, 2026 10:46
@nicolas-grekas
Contributor Author

Thanks @dunglas and @henderkes for the feedback. I share the goal of keeping the API surface minimal.

Thinking about it more, the current API is actually quite small and already general:

  • 1 Caddyfile setting: sidekick_entrypoint (per php_server)
  • 3 PHP functions: get_vars, set_vars, should_stop

The name "sidekick" works as a generic concept: a helper running alongside. The current set_vars/get_vars protocol covers the config-publishing use case. For task offloading (HTTP->worker) later, the same sidekick infrastructure could support:

  • frankenphp_sidekick_send_task(string $name, mixed $payload): mixed
  • frankenphp_sidekick_receive_task(): mixed

Same worker type, same sidekick_entrypoint, same should_stop(). Just a different communication pattern added on top. No new config, no new worker type.

So the path would be:

  1. This PR: sidekicks with set_vars/get_vars (config publishing)
  2. Future PR: add send_task/receive_task (task offloading), reusing the same non-HTTP worker foundation

The foundation (non-HTTP threads, cooperative shutdown, crash recovery, per-php_server scoping) is shared. Only the communication primitives differ.

WDYT?

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from b3734f5 to ed79f46 Compare March 17, 2026 11:48
@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 17, 2026

I think the failures are unrelated - a cache reset would be needed. Any help on this topic?

@alexandre-daubois
Member

alexandre-daubois commented Mar 17, 2026

Hmm, it seems they are related on some versions, for example here: https://github.com/php/frankenphp/actions/runs/23192689128/job/67392820942?pr=2287#step:10:3614

For the cache, I'm not aware of a GitHub feature that allows clearing everything, unfortunately 🙁

@AlliBalliBaba
Contributor

Didn't get through everything, might continue at a later point.

@nicolas-grekas
Contributor Author

Thanks for the review @AlliBalliBaba. Here is what I did, see second commit:

  • name instead of match for background workers

No special name validation - same rules as HTTP workers.

php_server {
    worker public/index.php { num 4 }
    worker bin/worker.php { background; name config-watcher }
    worker bin/worker.php { background }  # catch-all for dynamic names
}
  • Combined start + wait + lock + copy + unlock into a single Go call

go_frankenphp_start_background_worker, go_frankenphp_worker_wait_and_get, and go_frankenphp_worker_release_vars are replaced by a single go_frankenphp_worker_get_vars that does everything: starts the worker if needed, waits for ready, takes RLock, calls back into C to copy vars, releases RLock, returns. One CGo crossing from C's perspective.

This eliminates lockedVarsStacks, InitLockedVarsStacks, the two-phase lock/release protocol, and the varsVersion cache (dead code after the consolidation - can be re-added later if needed). No more risk of lock leak across the CGo boundary.

  • FRANKENPHP_WORKER_NAME set for all workers

Both HTTP and background workers now get $_SERVER['FRANKENPHP_WORKER_NAME']. The C-side worker_bg_name was split into worker_name (set for all) and is_background_worker (flag for background-specific behavior). Renamed httpEnabled to isBackgroundWorker on the Go side for clarity.

@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 21, 2026

Oh, and num = 0 now works to lazy-start a bg worker (num now defaults to 0).

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 5 times, most recently from 4bb625c to f779e05 Compare March 21, 2026 17:13
@henderkes
Contributor

Okay, conceptually I think we're getting somewhere. I'm not yet 100% sure about the background directive, but I cannot think of a better way either.

I do have another idea though how we could possibly unify the background workers with task workers a bit more: use the same frankenphp_handle_task() function to mark either kind of worker as ready, but for background workers the handler would be a function that runs until a stop signal of some kind is received. I'm not sure how this would pan out on the code side and whether it would simplify things, so please push back here if you think it's a bad idea.

For the code, review incoming...

@nicolas-grekas
Contributor Author

Thanks for the review @henderkes, all addressed (the match-related bits were cleanups missed in previous iterations).

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 6 times, most recently from 813a1ea to 04ef4fe Compare March 23, 2026 08:18
@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 23, 2026

Here is a new iteration with a new API based on signaling threads + an exception (split in last commit):

  • ShutdownException replaces the signaling stream: see below

  • max_consecutive_failures aligned to HTTP workers

Now defaults to 6 (same as HTTP workers) instead of -1 (never panic). Users can still set -1 explicitly if they want infinite retries.

  • $_SERVER['FRANKENPHP_WORKER_BACKGROUND']

Set for all workers: true for background workers, false for HTTP workers. Allows PHP code to detect the worker type at runtime.

  • Unified restart path for all workers

Background workers now use the same drainWorkerThreads restart mechanism as HTTP workers instead of shutdown() + lazy re-creation. ShutdownException unblocks them so they reach Yielding state naturally. This removes the separate background worker cleanup block (stoppedBackgroundWorkers, registry removal, workers slice filtering) and the lazy restart from get_vars. Simpler code, same behavior.

  • Fixed get_vars cache leak for non-worker scripts

The per-thread vars cache wasn't reset for regular (non-worker) PHP scripts served by php_server. After the request ended, the cache held dangling pointers to freed request memory. Fixed by calling bg_worker_vars_cache_reset() in frankenphp_execute_script before php_request_shutdown.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 6 times, most recently from ef92ba2 to bdbaf1c Compare March 23, 2026 12:53
@nicolas-grekas
Contributor Author

Following up on the ShutdownException exploration:

I explored replacing the signaling stream with a FrankenPHP\ShutdownException that would interrupt any blocking PHP call via OS signals. After thorough investigation, I found this approach is not viable:

  • Go's signal trampoline installs handlers for all signals with SA_RESTART. pthread_kill(SIGUSR1) is swallowed by Go's runtime: sigfwdgo refuses to forward user-originated signals (SI_USER/SI_TKILL), and SA_RESTART causes the kernel to restart the interrupted syscall silently. Our C handler never runs.
  • timer_create + SIGEV_THREAD_ID (Linux ZTS) bypasses Go's filter (SI_TIMER), but crashes Go's scheduler when the race detector is enabled. The signal delivery corrupts Go's internal state on locked threads.
  • macOS has no per-thread timer API (timer_create/SIGEV_THREAD_ID don't exist). setitimer is per-process and can't target a specific thread.

The conclusion: you can't reliably deliver signals to interrupt C-level blocking calls in a Go+CGo process, cross-platform.

So I kept the signaling stream as the primary graceful shutdown mechanism, and added a grace period with best-effort force-kill:

  • Signaling stream remains the API for cooperative shutdown. Background workers check it via stream_select() and stop promptly.

  • Grace period: on restart/shutdown, all background workers receive "stop\n" on the signaling stream and have 5 seconds to exit. This uses a unified drainWorkerThreads path (same as HTTP workers).

  • Best-effort force-kill after grace period:

    • Linux ZTS: arms PHP's own max_execution_time timer cross-thread via timer_settime(EG(max_execution_timer_timer)). This is safe because after 5 seconds, stuck threads are guaranteed to be in C code (not Go). Triggers "Maximum execution time exceeded" fatal, worker restarts.
    • Windows: CancelSynchronousIo + QueueUserAPC interrupts blocking I/O and alertable waits (SleepEx).
    • macOS: no-op. Stuck threads are abandoned and exit when the blocking call returns.
  • max_execution_time disabled for background workers via zend_set_timeout(0, 0) during script init.

Other changes from previous update still in place: max_consecutive_failures defaults to 6, FRANKENPHP_WORKER_BACKGROUND server var, generational cache, cache leak fix.

Docs updated. All 16 tests pass on PHP 8.5 with -race.

Note: the force-kill mechanism could also benefit HTTP workers stuck in blocking calls during shutdown. Keeping it scoped to background workers for now; let me know.

refactor: address review feedback on background workers

- Use `name` instead of `match` for background worker identification
- Combine start + wait + lock + copy + unlock into single CGo call
  (go_frankenphp_worker_get_vars replaces three separate exports)
- Remove lockedVarsStacks, InitLockedVarsStacks, and varsVersion
- Set FRANKENPHP_WORKER_NAME for all workers (HTTP and background)
- Split worker_bg_name into worker_name + is_background_worker flag
- Rename httpEnabled to isBackgroundWorker on Go side
- Remove name validation regex (same rules as HTTP workers)
- Keep $_SERVER['argv'] for background workers (bin/console compat)

Add generational cache back

Review by henderkes

5 participants