
Feature [client]: add supervised connection pool with legacy mode opt-out #523

Open
VladoPlavsic wants to merge 3 commits into elixir-grpc:master from VladoPlavsic:master

Conversation


@VladoPlavsic VladoPlavsic commented Apr 17, 2026

Introduces a built-in HTTP/2 connection pool for every GRPC.Stub.connect/2
call. Connections are checked out before each RPC and returned afterwards,
with stream counts tracked per connection. Supports configurable pool size,
overflow capacity, per-connection stream limits, and optional health-check
pings — all via a :pool option on connect/2.

Adds legacy mode (config :grpc, pool_enabled: false) to restore the
pre-pool single-connection behaviour for gradual migration. Adds nil
max_overflow support for unbounded overflow connections. Includes full
unit and integration test coverage for pool, overflow, and legacy paths.

Closes: #522

Full patch notes

TL;DR: for legacy mode, see the Legacy Mode section at the bottom.

Connection Pool — Patch Notes

Summary

This PR introduces a built-in connection pool for the Elixir gRPC client. Every call to
GRPC.Stub.connect/2 (or GRPC.Client.Connection.connect/2) now starts a supervised pool of
long-lived HTTP/2 connections instead of opening a single raw connection. Callers share those
connections transparently: the client checks out a connection before each RPC and returns it
afterwards, keeping stream counts accurate. Pool size, overflow capacity, and per-connection
stream limits are all configurable via a single :pool option.


What Changed

New modules

| Module | Role |
| --- | --- |
| GRPC.Client.Pool | Public API: start_for_address/4, stop_for_address/1, checkout/1, checkin/2 |
| GRPC.Client.Pool.Config | Struct that carries pool configuration (size, overflow, max streams) |
| GRPC.Client.Pool.Supervisor | OTP supervisor that owns one pool; restarted transiently |
| GRPC.Client.Pool.Server | GenServer that tracks channels, open-stream counts, and leases |
| GRPC.Client.Pool.Implementation | Pure functional core — all state transitions, no side effects |
| GRPC.Client.Pool.HealthCheck.DynamicSupervisor | Manages health-check workers for pool connections |
| GRPC.Client.Pool.HealthCheck.Server | Sends periodic gRPC health-check pings; exits cleanly when its connection drops |

Modified modules

GRPC.Channel — added pool: reference() | nil field. The field is nil on raw channels and
holds a pool reference on virtual channels returned by connect/2.

GRPC.Client.Application — registers GRPC.Client.Pool.Registry at startup so pool
supervisors and servers can be found by pool_ref without additional setup.

GRPC.Client.Connection — replaced direct adapter.connect calls with
GRPC.Client.Pool.start_for_address/4. The do_disconnect private function now calls
GRPC.Client.Pool.stop_for_address/1 instead of adapter.disconnect/1. Added :pool to the
Keyword.validate! options list with sensible defaults.

GRPC.Stub — the call/5 implementation now goes through acquire_channel/2 (pool
checkout) and release_channel/2 (pool checkin) bracketing every RPC, replacing the previous
ad-hoc pick_channel + liveness check.
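The checkout/checkin bracketing described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual implementation: `do_rpc/2` is a hypothetical placeholder, and the exact return shapes of `checkout/1` and `checkin/2` are assumptions based on the module table above.

```elixir
# Sketch: bracket every RPC with a pool checkout/checkin so stream
# counts stay accurate even when the call raises.
def call_with_pool(%GRPC.Channel{pool: pool_ref}, request) do
  case GRPC.Client.Pool.checkout(pool_ref) do
    {:ok, conn_channel} ->
      try do
        # do_rpc/2 is a hypothetical placeholder for the real call logic.
        do_rpc(conn_channel, request)
      after
        # Always return the connection, even on error or raise.
        GRPC.Client.Pool.checkin(pool_ref, conn_channel)
      end

    {:error, reason} ->
      {:error, reason}
  end
end
```

The try/after form mirrors the "checked out before each RPC and returned afterwards" guarantee: the checkin runs on every exit path.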


Benefits

  • Reduced connection overhead — HTTP/2 connections are long-lived and multiplexed; no
    per-call handshake cost.
  • Controlled concurrency — max_streams caps how many concurrent requests share a single
    connection. When a connection is saturated, the pool opens an overflow connection instead of
    stacking unlimited streams.
  • Automatic recovery — when a connection process exits unexpectedly, the pool removes it,
    clears its leases, and lets Gun re-establish the session in the background. Callers do not need
    to handle reconnection logic.
  • Back-pressure — when the pool (including overflow) is fully exhausted,
    GRPC.Status.resource_exhausted() is returned immediately rather than stacking requests
    indefinitely.
  • Observability — lease tracking (streams per channel, leases by PID and monitor ref) makes
    pool state inspectable at any time via :sys.get_state/1.
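The lease bookkeeping above can be poked at from an IEx session. A minimal sketch, assuming the pool server is registered under the channel's pool_ref in GRPC.Client.Pool.Registry (the registry key and the state layout are internal details and may differ):

```elixir
# Debugging sketch: look up the pool's GenServer and dump its state.
{:ok, channel} = GRPC.Stub.connect("localhost:50051", pool: %{size: 2})

# Assumption: the registry is keyed by the pool reference on the channel.
[{server_pid, _value}] = Registry.lookup(GRPC.Client.Pool.Registry, channel.pool)

# :sys.get_state/1 is a standard OTP debugging call; do not rely on the
# state's shape in production code.
:sys.get_state(server_pid) |> IO.inspect(label: "pool state")
```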

Breaking Changes

1. GRPC.Stub.connect/2 returns a virtual channel, not a raw connection

Previously the returned %GRPC.Channel{} had adapter_payload populated with the underlying
connection PID (e.g. %{conn_pid: pid}). That field is now nil on the virtual channel —
internal connection details belong to the pool, not the caller.

Before:

{:ok, %GRPC.Channel{adapter_payload: %{conn_pid: pid}}} = GRPC.Stub.connect("localhost:50051")

After:

{:ok, %GRPC.Channel{pool: pool_ref}} = GRPC.Stub.connect("localhost:50051")
# pool_ref is a reference(); adapter_payload is nil on the virtual channel

2. GRPC.Stub.disconnect/2 return value changed

Previously the disconnected channel carried adapter_payload: %{conn_pid: nil}. Now it carries
pool: nil.

Before:

{:ok, %GRPC.Channel{adapter_payload: %{conn_pid: nil}}} = GRPC.Stub.disconnect(channel)

After:

{:ok, %GRPC.Channel{pool: nil}} = GRPC.Stub.disconnect(channel)

3. Connection-process crash messages are no longer forwarded to the caller

Previously, if the underlying Gun connection process crashed, an {:EXIT, pid, reason} message
could propagate to the process that opened the connection (when it was trap-exiting). The pool
now owns all connections and handles those exits internally. Callers will no longer receive
connection-crash messages.
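For reference, the kind of caller-side handling that becomes obsolete might have looked like the following. Names here are illustrative, not taken from the library:

```elixir
# OLD pattern — no longer needed: the pool now owns connection exits.
Process.flag(:trap_exit, true)

receive do
  {:EXIT, conn_pid, reason} ->
    Logger.warning("gRPC connection #{inspect(conn_pid)} dropped: #{inspect(reason)}")
    # caller-side reconnect logic went here; the pool handles this now
end
```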


New API: :pool option

GRPC.Stub.connect/2 and GRPC.Client.Connection.connect/2 accept a new :pool keyword
option:

GRPC.Stub.connect("localhost:50051",
  pool: %{
    size: 2,           # number of persistent connections (default: 1)
    max_overflow: 5,   # max extra connections when pool is saturated (default: 0)
    max_streams: 100   # max concurrent streams per connection (default: nil = unlimited)
  }
)

All three keys are optional; omitting :pool entirely uses %{size: 1, max_overflow: 0, max_streams: nil}.

Setting max_overflow: nil removes the overflow cap entirely — the pool opens new connections on
demand whenever all existing connections are saturated, with no upper bound beyond what the server
and OS allow.
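For example, a bursty workload might combine a small persistent pool with unbounded overflow (the address and numbers below are illustrative):

```elixir
# nil max_overflow: the pool may open extra connections on demand,
# with no upper bound beyond server and OS limits.
{:ok, channel} =
  GRPC.Stub.connect("localhost:50051",
    pool: %{size: 4, max_overflow: nil, max_streams: 100}
  )
```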


Migration Guide

  1. Remove adapter_payload pattern matches on connect results. If your code inspects
    channel.adapter_payload.conn_pid after connecting, remove that assertion. The pool now
    manages connection PIDs internally.

  2. Update disconnect pattern matches. Replace checks for adapter_payload: %{conn_pid: nil}
    with pool: nil.

  3. Remove manual connection-crash handling. If your process was trapping exits and handling
    {:EXIT, conn_pid, _} to detect drops, that logic is no longer needed. The pool handles
    reconnection automatically.

  4. Tune pool size for your workload (optional). The default of one connection is conservative.
    For services with moderate to high RPC concurrency, consider increasing :size or setting
    :max_overflow to absorb traffic bursts without opening unbounded connections.

Legacy Mode (opt-out)

If you need to temporarily disable the pool and restore the pre-pool behaviour, set the following
in your config:

config :grpc, pool_enabled: false

With the pool disabled:

  • GRPC.Stub.connect/2 calls adapter.connect directly and returns a channel with
    adapter_payload: %{conn_pid: pid} and pool: nil — exactly as before.
  • RPC calls use the old pick_channel + Process.alive? liveness check instead of pool
    checkout/checkin.
  • GRPC.Stub.disconnect/1 calls adapter.disconnect directly and returns a channel with
    adapter_payload: %{conn_pid: nil}.
  • No pool supervisor, server, or registry entries are created.

The default is pool_enabled: true. This option is intended as a temporary escape hatch while
migrating — not as a permanent configuration.


@sleipnir
Collaborator

Hello @VladoPlavsic, thank you for the PR.

We have some open PRs for the gRPC client and we need to review them before taking a look here.

Comment thread on grpc/lib/grpc/client/connection.ex (outdated)

@spec start_for_address(Channel.t(), term(), non_neg_integer(), keyword()) ::
        {:ok, Channel.t()} | {:error, any()}
def start_for_address(%Channel{} = vc, host, port, norm_opts) do
Contributor

Previously GRPC.Client.Connection was doing this:

defp connect_real_channel(%Channel{scheme: "unix"} = vc, path, port, opts, adapter) do
  %Channel{vc | host: path, port: port}
  |> adapter.connect(opts[:adapter_opts])
end

defp connect_real_channel(%Channel{} = vc, host, port, opts, adapter) do
  %Channel{vc | host: host, port: port}
  |> adapter.connect(opts[:adapter_opts])
end

Should GRPC.Client.Pool.start_for_address/4 also take care of the path vs host thing?

I worry doing this would break for anyone who tries to do something like:

GRPC.Stub.connect("unix:///tmp/grpc.sock")

Author

defp connect_real_channel(%Channel{scheme: "unix"} = vc, path, port, opts, adapter) do
  %Channel{vc | host: path, port: port}
  |> adapter.connect(opts[:adapter_opts])
end

defp connect_real_channel(%Channel{} = vc, host, port, opts, adapter) do
  %Channel{vc | host: host, port: port}
  |> adapter.connect(opts[:adapter_opts])
end

As far as I can tell, these two functions are identical except for the name of the second parameter. That's why I decided to remove it (the resolution happens before this function call).

Comment on lines +163 to +167
old_leases = Map.get(leases, channel.id, [])
old_lease = Enum.find(old_leases, fn %State.Lease{caller_pid: pid} -> pid == caller_pid end)

new_leases =
  Enum.reject(old_leases, fn %State.Lease{caller_pid: pid} -> pid == caller_pid end)
Contributor

Can there be more than one matching "old" lease? If so, it returns just one and then removes all of them, instead of just the one it finds. Would that be an issue? 🤔

Suggested change

Before:

old_leases = Map.get(leases, channel.id, [])
old_lease = Enum.find(old_leases, fn %State.Lease{caller_pid: pid} -> pid == caller_pid end)

new_leases =
  Enum.reject(old_leases, fn %State.Lease{caller_pid: pid} -> pid == caller_pid end)

After:

old_leases = Map.get(leases, channel.id, [])
old_lease = Enum.find(old_leases, fn %State.Lease{caller_pid: pid} -> pid == caller_pid end)
new_leases = List.delete(old_leases, old_lease)

Author

A gRPC call is blocking, and we lease to the PID that requires a channel. So even if you do something like the following from a single process:

Task.async(fn -> execute_your_grpc end)

we would lease a channel to the task process. To answer your question: no, there shouldn't (can't?) be a case where we lease a channel to the same PID twice.

Co-authored-by: Noah Betzen <noah@nezteb.net>

Development

Successfully merging this pull request may close these issues.

[Continuation of #345] Supervised gRPC Connection Pool — actual implementation
