Skip to content

Performance redesign: type-stable, zero-allocation SLH core #3

Description

@oameye

We need to redesign the internals of QuantumInputOutput.jl. The current implementation works but leaves a lot of performance on the table due to type instabilities, unnecessary allocations, and duplicated code paths. Since systems are small (1-4 ports) and the ODE inner loop evaluates our closures thousands of times, every allocation and type-instability compounds.

Problem 1: Two parallel SLH types with duplicated logic

We currently have SLH (symbolic) and SLHqo (numeric/time-dependent). Both represent the same mathematical object — an (S, L, H) triple — and the composition rules (cascade, concatenate, feedback) are mathematically identical. Yet we maintain two complete implementations of , , and feedback, each ~100 lines, because the symbolic path needs SQA._adjoint and simplify while the numeric path needs adjoint and handles Function types.

This duplication is a maintenance burden and a source of subtle divergence bugs.

Solution: A single parametric SLH{N, ST, LT, HT} type. The symbolic-vs-numeric difference reduces to four lines of dispatch:

_adj(x::SQA.QNumber) = SQA._adjoint(x)
_adj(x) = adjoint(x)

_post(x::BasicSymbolic) = simplify(x)
_post(x) = x

One implementation of cascade, concatenate, and feedback uses _adj and _post instead of hardcoding the symbolic or numeric variant. Julia's dispatch handles the rest.

Problem 2: Heap-allocated scattering matrices

The scattering matrix is currently a Matrix (heap-allocated). In concatenation it's even Matrix{Any}:

S_t = Matrix{Any}(undef, lS1+lS2, lS1+lS2)  # SLH.jl line 262

Matrix{Any} means every element access boxes/unboxes, killing type inference for everything downstream. Even the non-Any paths use heap-allocated Matrix and Vector for objects that are 1x1 to 4x4 in practice.

Why this matters: Scattering matrices are accessed during cascade and feedback — operations that happen at model construction time. More importantly, the Lindblad vector is also a plain Vector, and its elements get composed into closures that run at every ODE timestep. Heap allocation for a 2-element vector is pure overhead.

Solution: Use StaticArrays. The port count N becomes a type parameter:

struct SLH{N, ST, LT, HT}
    scattering::SMatrix{N, N, ST}
    lindblad::SVector{N, LT}
    hamiltonian::HT
end

All scattering and Lindblad storage lives on the stack. N is 1-4 in all real use cases (confirmed by scanning every test and example) — well within StaticArrays' sweet spot. Concatenation produces SLH{N1+N2} at compile time. Cascade enforces matching N at compile time (no more runtime @assert).

Convenience constructors preserve the current API:

SLH(1, γ * a, H)                           # scalar → SLH{1}
SLH([-r η; η r], [0, 0], 0)                 # matrix/vector → SLH{2}

Problem 3: Time-dependent closures are type-unstable

The current SLHqo cascade creates closures like:

return t -> begin
    L1_vec = [f(t) for f in L1f]   # allocates a new Vector every call
    L2_vec = [f(t) for f in L2f]   # allocates another Vector every call
    (L2_vec + S2 * L1_vec)[i]      # allocates S2*L1_vec, then sum, then indexes
end

These closures run at every ODE timestep. Each call allocates 3+ vectors that immediately become garbage. Worse, because each closure is a unique anonymous type, storing multiple of them in a Vector{Function} is type-erased — the compiler can't specialize on the element type.

Solution: Two changes:

  1. Composition uses _add/_mul helpers that build composed closures without intermediate vector allocation:
_add(f::Function, g::Function) = t -> f(t) + g(t)
_add(f::Function, x) = t -> f(t) + x
_add(x, f::Function) = t -> x + f(t)
_add(x, y) = x + y

_mul(s, f::Function) = t -> s * f(t)
_mul(f::Function, g::Function) = t -> f(t) * g(t)
_mul(x, y) = x * y
  1. FunctionWrappers.jl gives all time-dependent elements a single concrete type:
const TimeDep{T} = FunctionWrapper{T, Tuple{Float64}}

When any Lindblad element is a Function, all elements are wrapped uniformly as TimeDep{Operator}. This makes SVector{N, TimeDep{Operator}} fully concrete — the compiler sees one type, not N different anonymous closure types. For fully static systems (symbolic or numeric without time-dependence), no wrapping happens.

Problem 4: Pulse functions return opaque closures

Every coupling_input, coupling_output, etc. (currently u_to_gu, v_to_gv, ...) computes a LinearInterpolation then wraps it:

gu_int = LinearInterpolation(gu_t, T; extrapolation = _extrapolate)
return t -> gu_int(t)  # why?

LinearInterpolation is already callable. The wrapper creates an opaque closure type that prevents the compiler from specializing downstream code (e.g., coupling_matrix receiving a concrete LinearInterpolation vs. a generic Function).

Solution: Return the LinearInterpolation directly. All downstream code calls g(t) — works unchanged, but now Julia knows the concrete type.

Problem 5: coupling_matrix (currently interaction_picture_A) allocates every ODE step

function A(t)
    g = [gf(t) for gf in gfs]           # allocates
    M = zeros(ComplexF64, n, n)          # allocates
    # ... fill M ...
    return 0.5 * M                       # allocates again
end

This function is the RHS of the mode evolution ODE. It's called at every timestep. Three allocations per call, thousands of calls.

Solution: Pre-allocate buffers in the closure:

function coupling_matrix(gs)
    gfs = _as_time_function.(gs)
    n = length(gfs)
    M = zeros(ComplexF64, n, n)
    g = Vector{ComplexF64}(undef, n)
    function A(t)
        for k in 1:n; g[k] = gfs[k](t); end
        fill!(M, zero(ComplexF64))
        for i in 1:n, j in (i+1):n
            val = 0.5 * g[i] * conj(g[j])
            M[i, j] = val
            M[j, i] = -conj(val)
        end
        return M
    end
    return A
end

Zero allocations per call.

Problem 6: solve_mode_evolution (currently interaction_picture_M) uses out-of-place ODE

f_M(u, p, t) = A(t) * u   # allocates a new matrix every step

Solution: In-place form, and return the ODE solution directly (it's already callable as sol(t)):

function solve_mode_evolution(A::Function, T; alg = Tsit5(), kwargs...)
    T0, Tend = T[1], T[end]
    n = size(A(T0), 1)
    M0 = Matrix{ComplexF64}(I, n, n)
    function f_M!(du, u, p, t)
        mul!(du, A(t), u)
    end
    prob = ODEProblem(f_M!, M0, (T0, Tend))
    sol = solve(prob, alg; saveat = T, kwargs...)
    return sol
end

Problem 7: Non-const module globals

_tol_div = 1e-10
_extrapolate = ExtrapolationType.Extension
_ϵu = 1e-10
_ϵv = 1e-10

Non-const globals in Julia are type-unstable. The compiler inserts type checks at every access because the value could change type at any time.

Solution: Add const.

Problem 8: translate (currently translate_qo) redundantly normalizes time parameters

_normalize_time_parameter(time_parameter) is called in every translate method, including recursive calls from QAdd → its arguments. For an expression with 20 terms, it runs 20 times doing the same work.

Solution: Normalize once at the public entry point, then pass the result through internal _translate methods:

function translate(op, b::Basis; parameter=Dict(), time_parameter=Dict(), kwargs...)
    tp = _normalize_time_parameter(time_parameter)
    return _translate(op, b, parameter, tp, kwargs...)
end

Also type the internal dict (Dict{KT, Function} instead of Dict{Any, Any}).

Problem 9: Correlation matrix allocates intermediate vectors

g1 = [expect(Ls_ls_dag[it+i-1], ρ_bar_τ[i]) for i = 1:length(τ_)]  # new vector per outer iteration
g1_m[it, it:end] = g1

Solution: Write directly into the output matrix:

for i in eachindex(ρ_bar)
    val = expect(Ls_dag, ρ_bar[i])
    g1_m[it, it+i-1] = val
    g1_m[it+i-1, it] = conj(val)
end

Problem 10: Cryptic API names

The current function names (u_to_gu, v_to_gv, u_eff, interaction_picture_A, etc.) are shorthand from specific papers. They mean nothing without reading Kiilerich & Mølmer. A package API should be self-documenting.

Solution: Rename everything to describe what it computes, not what variable letter it uses.

Full rename table

Current New Why
Types
SLH SLH Well-established in quantum optics
SLHqo removed Unified into SLH
(new) Gaussian Pulse shape descriptor — enables dispatch instead of _Gauss suffix functions
Accessors
get_scattering(G) scattering(G) Julia convention: no get_ prefix (length, size, basis, not get_length)
get_lindblad(G) lindblad(G) same
get_hamiltonian(G) hamiltonian(G) same
Composition
/ cascade / series "cascade" is jargon; series is universally understood
/ concatenate / parallel "concatenate" suggests appending strings; parallel matches the physics
feedback feedback already clear
Pulse coupling
u_to_gu(u, T) coupling_input(mode, T) describes what you compute, not variable letters
v_to_gv(v, T) coupling_output(mode, T) same
u_to_gu_Gauss(τ, σ; δ) coupling_input(g::Gaussian) dispatch on pulse type instead of name suffix
v_to_gv_Gauss(τ, σ; δ) coupling_output(g::Gaussian) same
uv_to_gin(u, v, T) coupling_delay_in(u, v, T) explicit about delay cavity context
uv_to_gout(u, v, T) coupling_delay_out(u, v, T) same
Effective modes
u_eff(modes, T, i) effective_input_mode(modes, T, i) says what it returns
v_eff(modes, T, i) effective_output_mode(modes, T, i) same
Interaction picture
interaction_picture_A(gs) coupling_matrix(gs) describes what it builds; the interaction picture context is implicit
interaction_picture_M(A, T) solve_mode_evolution(A, T) describes what it does, not the textbook symbol
interaction_picture_M_2modes_equal(u, T) solve_mode_evolution_symmetric(u, T) explicit about the u=v symmetry assumption
Translation
translate_qo(op, b) translate(op, b) shorter; the _qo suffix is redundant since the basis argument already implies QuantumOptics
substitute_operators(op, dict) substitute_operators(op, dict) already clear
Correlations
two_time_corr_matrix(T, ρt, ...) correlation_matrix(T, ρt, ...) shorter; "two-time" is implied by returning a matrix over a time grid

Gaussian pulse type

Instead of separate _Gauss functions, introduce a lightweight pulse descriptor:

struct Gaussian{T}
    τ::T    # center time
    σ::T    # width
    δ::T    # detuning
end
Gaussian(τ, σ; δ=0) = Gaussian(τ, σ, δ)

coupling_input(g::Gaussian)   # analytical formula
coupling_output(g::Gaussian)  # analytical formula
coupling_input(mode, T)       # numerical from samples/function
coupling_output(mode, T)      # numerical from samples/function

This is extensible — adding Exponential or SechSquared pulse shapes later just means adding new types and methods, not new function names.

File organization

Split the current files by concern:

src/
├── QuantumInputOutput.jl    # module, exports, imports
├── types.jl                 # SLH, Gaussian, constructors, _adj, _post, _add, _mul
├── compose.jl               # ▷, ⊞, feedback (one implementation each)
├── translate.jl             # translate, _translate internal dispatch
├── pulses.jl                # coupling_input, coupling_output, coupling_delay_in, coupling_delay_out
├── modes.jl                 # effective_input_mode, effective_output_mode
├── interaction_picture.jl   # coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric
├── correlations.jl          # correlation_matrix
└── utils.jl                 # substitute_operators, numeric_average, expect, constants

The current pulses.jl mixes coupling computation with ODE-based effective modes — these are separate concerns. The current utils.jl mixes virtual cavity code with operator substitution.

New dependencies

  • StaticArraysSMatrix/SVector for scattering and Lindblad storage
  • FunctionWrappersFunctionWrapper for concrete time-dependent element types

Both lightweight, widely used, zero transitive dependencies.

Breaking changes

What changes Why Migration
SLHqo removed Unified into SLH SLHqo(...)SLH(...), same arguments
SLH.scattering is SMatrix Stack allocation, compile-time dimension safety Matrix(G.scattering) if plain matrix needed
SLH.lindblad is SVector Stack allocation, concrete element types Indexing/iteration unchanged
All function renames Self-documenting API See rename table above
coupling_input etc. return LinearInterpolation Removes opaque closure wrapper Already callable as f(t), transparent
solve_mode_evolution returns ODE solution Removes redundant wrapper Already callable as sol(t), transparent
cascadeseries Clearer terminology Find-and-replace
concatenateparallel Clearer terminology Find-and-replace
get_* → bare name Julia convention Drop get_ prefix

Exports

export SLH, Gaussian,
    scattering, lindblad, hamiltonian,
    , series, , parallel, feedback,
    translate,
    coupling_input, coupling_output,
    coupling_delay_in, coupling_delay_out,
    effective_input_mode, effective_output_mode,
    coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric,
    correlation_matrix,
    substitute_operators

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions