Performance redesign: type-stable, zero-allocation SLH core

We need to redesign the internals of QuantumInputOutput.jl. The current implementation works but leaves a lot of performance on the table due to type instabilities, unnecessary allocations, and duplicated code paths. Since systems are small (1-4 ports) and the ODE inner loop evaluates our closures thousands of times, every allocation and type-instability compounds.

## Problem 1: Two parallel SLH types with duplicated logic

We currently have `SLH` (symbolic) and `SLHqo` (numeric/time-dependent). Both represent the same mathematical object — an (S, L, H) triple — and the composition rules (cascade, concatenate, feedback) are mathematically identical. Yet we maintain two complete implementations of `▷`, `⊞`, and `feedback`, each ~100 lines, because the symbolic path needs `SQA._adjoint` and `simplify` while the numeric path needs `adjoint` and handles `Function` types.

This duplication is a maintenance burden and a source of subtle divergence bugs.

**Solution:** A single parametric `SLH{N, ST, LT, HT}` type. The symbolic-vs-numeric difference reduces to four lines of dispatch:

```julia
_adj(x::SQA.QNumber) = SQA._adjoint(x)
_adj(x) = adjoint(x)

_post(x::BasicSymbolic) = simplify(x)
_post(x) = x
```

One implementation of cascade, concatenate, and feedback uses `_adj` and `_post` instead of hardcoding the symbolic or numeric variant. Julia's dispatch handles the rest.

## Problem 2: Heap-allocated scattering matrices

The scattering matrix is currently a `Matrix` (heap-allocated). In concatenation it's even `Matrix{Any}`:

```julia
S_t = Matrix{Any}(undef, lS1+lS2, lS1+lS2)  # SLH.jl line 262
```

`Matrix{Any}` means every element access boxes/unboxes, killing type inference for everything downstream. Even the non-`Any` paths use heap-allocated `Matrix` and `Vector` for objects that are 1x1 to 4x4 in practice.

**Why this matters:** Scattering matrices are accessed during cascade and feedback — operations that happen at model construction time. More importantly, the Lindblad vector is also a plain `Vector`, and its elements get composed into closures that run at every ODE timestep. Heap allocation for a 2-element vector is pure overhead.

**Solution:** Use `StaticArrays`. The port count `N` becomes a type parameter:

```julia
struct SLH{N, ST, LT, HT}
    scattering::SMatrix{N, N, ST}
    lindblad::SVector{N, LT}
    hamiltonian::HT
end
```

All scattering and Lindblad storage lives on the stack. `N` is 1-4 in all real use cases (confirmed by scanning every test and example) — well within StaticArrays' sweet spot. Concatenation produces `SLH{N1+N2}` at compile time. Cascade enforces matching `N` at compile time (no more runtime `@assert`).

Convenience constructors preserve the current API:

```julia
SLH(1, √γ * a, H)                           # scalar → SLH{1}
SLH([-r η; η r], [0, 0], 0)                 # matrix/vector → SLH{2}
```

## Problem 3: Time-dependent closures are type-unstable

The current `SLHqo` cascade creates closures like:

```julia
return t -> begin
    L1_vec = [f(t) for f in L1f]   # allocates a new Vector every call
    L2_vec = [f(t) for f in L2f]   # allocates another Vector every call
    (L2_vec + S2 * L1_vec)[i]      # allocates S2*L1_vec, then sum, then indexes
end
```

These closures run at every ODE timestep. Each call allocates 3+ vectors that immediately become garbage. Worse, because each closure is a unique anonymous type, storing multiple of them in a `Vector{Function}` is type-erased — the compiler can't specialize on the element type.

**Solution:** Two changes:

1. Composition uses `_add`/`_mul` helpers that build composed closures without intermediate vector allocation:

```julia
_add(f::Function, g::Function) = t -> f(t) + g(t)
_add(f::Function, x) = t -> f(t) + x
_add(x, f::Function) = t -> x + f(t)
_add(x, y) = x + y

_mul(s, f::Function) = t -> s * f(t)
_mul(f::Function, g::Function) = t -> f(t) * g(t)
_mul(x, y) = x * y
```

2. `FunctionWrappers.jl` gives all time-dependent elements a single concrete type:

```julia
const TimeDep{T} = FunctionWrapper{T, Tuple{Float64}}
```

When any Lindblad element is a `Function`, all elements are wrapped uniformly as `TimeDep{Operator}`. This makes `SVector{N, TimeDep{Operator}}` fully concrete — the compiler sees one type, not N different anonymous closure types. For fully static systems (symbolic or numeric without time-dependence), no wrapping happens.

## Problem 4: Pulse functions return opaque closures

Every `coupling_input`, `coupling_output`, etc. (currently `u_to_gu`, `v_to_gv`, ...) computes a `LinearInterpolation` then wraps it:

```julia
gu_int = LinearInterpolation(gu_t, T; extrapolation = _extrapolate)
return t -> gu_int(t)  # why?
```

`LinearInterpolation` is already callable. The wrapper creates an opaque closure type that prevents the compiler from specializing downstream code (e.g., `coupling_matrix` receiving a concrete `LinearInterpolation` vs. a generic `Function`).

**Solution:** Return the `LinearInterpolation` directly. All downstream code calls `g(t)` — works unchanged, but now Julia knows the concrete type.

## Problem 5: `coupling_matrix` (currently `interaction_picture_A`) allocates every ODE step

```julia
function A(t)
    g = [gf(t) for gf in gfs]           # allocates
    M = zeros(ComplexF64, n, n)          # allocates
    # ... fill M ...
    return 0.5 * M                       # allocates again
end
```

This function is the RHS of the mode evolution ODE. It's called at every timestep. Three allocations per call, thousands of calls.

**Solution:** Pre-allocate buffers in the closure:

```julia
function coupling_matrix(gs)
    gfs = _as_time_function.(gs)
    n = length(gfs)
    M = zeros(ComplexF64, n, n)
    g = Vector{ComplexF64}(undef, n)
    function A(t)
        for k in 1:n; g[k] = gfs[k](t); end
        fill!(M, zero(ComplexF64))
        for i in 1:n, j in (i+1):n
            val = 0.5 * g[i] * conj(g[j])
            M[i, j] = val
            M[j, i] = -conj(val)
        end
        return M
    end
    return A
end
```

Zero allocations per call.

## Problem 6: `solve_mode_evolution` (currently `interaction_picture_M`) uses out-of-place ODE

```julia
f_M(u, p, t) = A(t) * u   # allocates a new matrix every step
```

**Solution:** In-place form, and return the ODE solution directly (it's already callable as `sol(t)`):

```julia
function solve_mode_evolution(A::Function, T; alg = Tsit5(), kwargs...)
    T0, Tend = T[1], T[end]
    n = size(A(T0), 1)
    M0 = Matrix{ComplexF64}(I, n, n)
    function f_M!(du, u, p, t)
        mul!(du, A(t), u)
    end
    prob = ODEProblem(f_M!, M0, (T0, Tend))
    sol = solve(prob, alg; saveat = T, kwargs...)
    return sol
end
```

## Problem 7: Non-const module globals

```julia
_tol_div = 1e-10
_extrapolate = ExtrapolationType.Extension
_ϵu = 1e-10
_ϵv = 1e-10
```

Non-const globals in Julia are type-unstable. The compiler inserts type checks at every access because the value could change type at any time.

**Solution:** Add `const`.

## Problem 8: `translate` (currently `translate_qo`) redundantly normalizes time parameters

`_normalize_time_parameter(time_parameter)` is called in every `translate` method, including recursive calls from `QAdd` → its arguments. For an expression with 20 terms, it runs 20 times doing the same work.

**Solution:** Normalize once at the public entry point, then pass the result through internal `_translate` methods:

```julia
function translate(op, b::Basis; parameter=Dict(), time_parameter=Dict(), kwargs...)
    tp = _normalize_time_parameter(time_parameter)
    return _translate(op, b, parameter, tp, kwargs...)
end
```

Also type the internal dict (`Dict{KT, Function}` instead of `Dict{Any, Any}`).

## Problem 9: Correlation matrix allocates intermediate vectors

```julia
g1 = [expect(Ls_ls_dag[it+i-1], ρ_bar_τ[i]) for i = 1:length(τ_)]  # new vector per outer iteration
g1_m[it, it:end] = g1
```

**Solution:** Write directly into the output matrix:

```julia
for i in eachindex(ρ_bar)
    val = expect(Ls_dag, ρ_bar[i])
    g1_m[it, it+i-1] = val
    g1_m[it+i-1, it] = conj(val)
end
```

## Problem 10: Cryptic API names

The current function names (`u_to_gu`, `v_to_gv`, `u_eff`, `interaction_picture_A`, etc.) are shorthand from specific papers. They mean nothing without reading Kiilerich & Mølmer. A package API should be self-documenting.

**Solution:** Rename everything to describe what it computes, not what variable letter it uses.

### Full rename table

| Current | New | Why |
|---|---|---|
| **Types** |
| `SLH` | `SLH` | Well-established in quantum optics |
| `SLHqo` | removed | Unified into `SLH` |
| (new) | `Gaussian` | Pulse shape descriptor — enables dispatch instead of `_Gauss` suffix functions |
| **Accessors** |
| `get_scattering(G)` | `scattering(G)` | Julia convention: no `get_` prefix (`length`, `size`, `basis`, not `get_length`) |
| `get_lindblad(G)` | `lindblad(G)` | same |
| `get_hamiltonian(G)` | `hamiltonian(G)` | same |
| **Composition** |
| `▷` / `cascade` | `▷` / `series` | "cascade" is jargon; `series` is universally understood |
| `⊞` / `concatenate` | `⊞` / `parallel` | "concatenate" suggests appending strings; `parallel` matches the physics |
| `feedback` | `feedback` | already clear |
| **Pulse coupling** |
| `u_to_gu(u, T)` | `coupling_input(mode, T)` | describes what you compute, not variable letters |
| `v_to_gv(v, T)` | `coupling_output(mode, T)` | same |
| `u_to_gu_Gauss(τ, σ; δ)` | `coupling_input(g::Gaussian)` | dispatch on pulse type instead of name suffix |
| `v_to_gv_Gauss(τ, σ; δ)` | `coupling_output(g::Gaussian)` | same |
| `uv_to_gin(u, v, T)` | `coupling_delay_in(u, v, T)` | explicit about delay cavity context |
| `uv_to_gout(u, v, T)` | `coupling_delay_out(u, v, T)` | same |
| **Effective modes** |
| `u_eff(modes, T, i)` | `effective_input_mode(modes, T, i)` | says what it returns |
| `v_eff(modes, T, i)` | `effective_output_mode(modes, T, i)` | same |
| **Interaction picture** |
| `interaction_picture_A(gs)` | `coupling_matrix(gs)` | describes what it builds; the interaction picture context is implicit |
| `interaction_picture_M(A, T)` | `solve_mode_evolution(A, T)` | describes what it does, not the textbook symbol |
| `interaction_picture_M_2modes_equal(u, T)` | `solve_mode_evolution_symmetric(u, T)` | explicit about the u=v symmetry assumption |
| **Translation** |
| `translate_qo(op, b)` | `translate(op, b)` | shorter; the `_qo` suffix is redundant since the basis argument already implies QuantumOptics |
| `substitute_operators(op, dict)` | `substitute_operators(op, dict)` | already clear |
| **Correlations** |
| `two_time_corr_matrix(T, ρt, ...)` | `correlation_matrix(T, ρt, ...)` | shorter; "two-time" is implied by returning a matrix over a time grid |

### `Gaussian` pulse type

Instead of separate `_Gauss` functions, introduce a lightweight pulse descriptor:

```julia
struct Gaussian{T}
    τ::T    # center time
    σ::T    # width
    δ::T    # detuning
end
Gaussian(τ, σ; δ=0) = Gaussian(τ, σ, δ)

coupling_input(g::Gaussian)   # analytical formula
coupling_output(g::Gaussian)  # analytical formula
coupling_input(mode, T)       # numerical from samples/function
coupling_output(mode, T)      # numerical from samples/function
```

This is extensible — adding `Exponential` or `SechSquared` pulse shapes later just means adding new types and methods, not new function names.

## File organization

Split the current files by concern:

```
src/
├── QuantumInputOutput.jl    # module, exports, imports
├── types.jl                 # SLH, Gaussian, constructors, _adj, _post, _add, _mul
├── compose.jl               # ▷, ⊞, feedback (one implementation each)
├── translate.jl             # translate, _translate internal dispatch
├── pulses.jl                # coupling_input, coupling_output, coupling_delay_in, coupling_delay_out
├── modes.jl                 # effective_input_mode, effective_output_mode
├── interaction_picture.jl   # coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric
├── correlations.jl          # correlation_matrix
└── utils.jl                 # substitute_operators, numeric_average, expect, constants
```

The current `pulses.jl` mixes coupling computation with ODE-based effective modes — these are separate concerns. The current `utils.jl` mixes virtual cavity code with operator substitution.

## New dependencies

- **StaticArrays** — `SMatrix`/`SVector` for scattering and Lindblad storage
- **FunctionWrappers** — `FunctionWrapper` for concrete time-dependent element types

Both lightweight, widely used, zero transitive dependencies.

## Breaking changes

| What changes | Why | Migration |
|---|---|---|
| `SLHqo` removed | Unified into `SLH` | `SLHqo(...)` → `SLH(...)`, same arguments |
| `SLH.scattering` is `SMatrix` | Stack allocation, compile-time dimension safety | `Matrix(G.scattering)` if plain matrix needed |
| `SLH.lindblad` is `SVector` | Stack allocation, concrete element types | Indexing/iteration unchanged |
| All function renames | Self-documenting API | See rename table above |
| `coupling_input` etc. return `LinearInterpolation` | Removes opaque closure wrapper | Already callable as `f(t)`, transparent |
| `solve_mode_evolution` returns ODE solution | Removes redundant wrapper | Already callable as `sol(t)`, transparent |
| `cascade` → `series` | Clearer terminology | Find-and-replace |
| `concatenate` → `parallel` | Clearer terminology | Find-and-replace |
| `get_*` → bare name | Julia convention | Drop `get_` prefix |

## Exports

```julia
export SLH, Gaussian,
    scattering, lindblad, hamiltonian,
    ▷, series, ⊞, parallel, feedback,
    translate,
    coupling_input, coupling_output,
    coupling_delay_in, coupling_delay_out,
    effective_input_mode, effective_output_mode,
    coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric,
    correlation_matrix,
    substitute_operators
```


Current	New	Why
Types
`SLH`	`SLH`	Well-established in quantum optics
`SLHqo`	removed	Unified into `SLH`
(new)	`Gaussian`	Pulse shape descriptor — enables dispatch instead of `_Gauss` suffix functions
Accessors
`get_scattering(G)`	`scattering(G)`	Julia convention: no `get_` prefix (`length`, `size`, `basis`, not `get_length`)
`get_lindblad(G)`	`lindblad(G)`	same
`get_hamiltonian(G)`	`hamiltonian(G)`	same
Composition
`▷` / `cascade`	`▷` / `series`	"cascade" is jargon; `series` is universally understood
`⊞` / `concatenate`	`⊞` / `parallel`	"concatenate" suggests appending strings; `parallel` matches the physics
`feedback`	`feedback`	already clear
Pulse coupling
`u_to_gu(u, T)`	`coupling_input(mode, T)`	describes what you compute, not variable letters
`v_to_gv(v, T)`	`coupling_output(mode, T)`	same
`u_to_gu_Gauss(τ, σ; δ)`	`coupling_input(g::Gaussian)`	dispatch on pulse type instead of name suffix
`v_to_gv_Gauss(τ, σ; δ)`	`coupling_output(g::Gaussian)`	same
`uv_to_gin(u, v, T)`	`coupling_delay_in(u, v, T)`	explicit about delay cavity context
`uv_to_gout(u, v, T)`	`coupling_delay_out(u, v, T)`	same
Effective modes
`u_eff(modes, T, i)`	`effective_input_mode(modes, T, i)`	says what it returns
`v_eff(modes, T, i)`	`effective_output_mode(modes, T, i)`	same
Interaction picture
`interaction_picture_A(gs)`	`coupling_matrix(gs)`	describes what it builds; the interaction picture context is implicit
`interaction_picture_M(A, T)`	`solve_mode_evolution(A, T)`	describes what it does, not the textbook symbol
`interaction_picture_M_2modes_equal(u, T)`	`solve_mode_evolution_symmetric(u, T)`	explicit about the u=v symmetry assumption
Translation
`translate_qo(op, b)`	`translate(op, b)`	shorter; the `_qo` suffix is redundant since the basis argument already implies QuantumOptics
`substitute_operators(op, dict)`	`substitute_operators(op, dict)`	already clear
Correlations
`two_time_corr_matrix(T, ρt, ...)`	`correlation_matrix(T, ρt, ...)`	shorter; "two-time" is implied by returning a matrix over a time grid

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance redesign: type-stable, zero-allocation SLH core #3

Problem 1: Two parallel SLH types with duplicated logic

Problem 2: Heap-allocated scattering matrices

Problem 3: Time-dependent closures are type-unstable

Problem 4: Pulse functions return opaque closures

Problem 5: `coupling_matrix` (currently `interaction_picture_A`) allocates every ODE step

Problem 6: `solve_mode_evolution` (currently `interaction_picture_M`) uses out-of-place ODE

Problem 7: Non-const module globals

Problem 8: `translate` (currently `translate_qo`) redundantly normalizes time parameters

Problem 9: Correlation matrix allocates intermediate vectors

Problem 10: Cryptic API names

Full rename table

`Gaussian` pulse type

File organization

New dependencies

Breaking changes

Exports

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

What changes	Why	Migration
`SLHqo` removed	Unified into `SLH`	`SLHqo(...)` → `SLH(...)`, same arguments
`SLH.scattering` is `SMatrix`	Stack allocation, compile-time dimension safety	`Matrix(G.scattering)` if plain matrix needed
`SLH.lindblad` is `SVector`	Stack allocation, concrete element types	Indexing/iteration unchanged
All function renames	Self-documenting API	See rename table above
`coupling_input` etc. return `LinearInterpolation`	Removes opaque closure wrapper	Already callable as `f(t)`, transparent
`solve_mode_evolution` returns ODE solution	Removes redundant wrapper	Already callable as `sol(t)`, transparent
`cascade` → `series`	Clearer terminology	Find-and-replace
`concatenate` → `parallel`	Clearer terminology	Find-and-replace
`get_*` → bare name	Julia convention	Drop `get_` prefix

Uh oh!

Performance redesign: type-stable, zero-allocation SLH core #3

Description

Problem 1: Two parallel SLH types with duplicated logic

Problem 2: Heap-allocated scattering matrices

Problem 3: Time-dependent closures are type-unstable

Problem 4: Pulse functions return opaque closures

Problem 5: coupling_matrix (currently interaction_picture_A) allocates every ODE step

Problem 6: solve_mode_evolution (currently interaction_picture_M) uses out-of-place ODE

Problem 7: Non-const module globals

Problem 8: translate (currently translate_qo) redundantly normalizes time parameters

Problem 9: Correlation matrix allocates intermediate vectors

Problem 10: Cryptic API names

Full rename table

Gaussian pulse type

File organization

New dependencies

Breaking changes

Exports

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Problem 5: `coupling_matrix` (currently `interaction_picture_A`) allocates every ODE step

Problem 6: `solve_mode_evolution` (currently `interaction_picture_M`) uses out-of-place ODE

Problem 8: `translate` (currently `translate_qo`) redundantly normalizes time parameters

`Gaussian` pulse type