CP/CPP full orbit: source field from the existing libneo neo_spline_field (splined A, B=curl A, direct read, GPU-ready)

# CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)

A concrete realization of the #422 lever 3 ("precompute a direct Cartesian field").
Add an ALTERNATIVE field representation for the Cartesian full-orbit pusher
(`orbit_cpp_boris`) that reads B from a **real-space (R, phi, z) spline** instead of
inverting Cartesian->logical every step. Keep the chartmap as the exact build-time
source. The current `B = curl A`-from-chartmap path stays the default and the accuracy
reference; this is opt-in and benchmarked against it.

## Why

- **Speed (#422):** the full orbit is ~5x slower than ASCOT5 almost entirely because
  every field eval inverts Cartesian->logical (a damped Newton, several spline evals),
  plus the GC reconstruction does 2 more inversions/step. ASCOT reads three direct
  tricubic splines on an (R,phi,z) grid -- no inversion.
- **GPU:** confirmed asymmetry in the code. libneo's batch-spline EVALUATE is already
  GPU-callable (`!$acc routine seq` at ~11 sites in `batch_interpolate_3d.f90`, a
  resident device kernel, managed-memory bug fixed). The chartmap inversion is
  GPU-HOSTILE: data-dependent Newton trip counts + line search (`orbit_cpp_boris.f90`
  newton_from `maxit=30`), `select type` on the hot path, procedure pointers and
  class-dispatched metric/christoffel -- `orbit_cpp_chartmap_metric.f90:16` says
  outright "NOT GPU-portable". So a direct spline read is the thing that makes a GPU
  full-orbit push possible at all.

## What to build (reuses most of the substrate)

1. **Setup-time sampler.** Loop a regular (R, phi, z) grid over one field period; at each
   node invert once to logical and call the SAME `field_at_logical` the hot loop uses
   today, storing the Cartesian field. The inversion is paid once per node at setup,
   never per step -- that is the whole win. The sampler is essentially today's
   `cart_field` run over a grid.
2. **Real-space field type.** Clone `src/field/field_splined.f90` (which today splines in
   FLUX coords, so it still needs the inversion) onto an (R, phi, z) grid, built with
   `construct_batch_splines_3d` (libneo), `periodic = [.false., .true., .false.]` (phi
   periodic over 2pi/nfp, R/z natural -- inherits the seam-clean periodic-phi support
   directly; use cylindrical, NOT Cartesian, so phi is an axis).
3. **Pusher branch.** `cart_field` and the loss test in `orbit_cpp_boris` get a fast path
   that reads the real-space spline instead of `to_wedge -> invert_cart_warm ->
   field_at_logical`. Gate as a new orbit_model or a field-representation flag; the
   chartmap stays the source.

What to store, two options (benchmark both):
- **(A) spline B_cart directly** (ASCOT-style): batch of (B_x, B_y, B_z, s) and read
  grad|B| via `evaluate_batch_splines_3d_der`. Simplest; but an independently splined
  B is NOT divergence-free (ASCOT lives with this), an energy-drift risk over long traces.
- **(B) spline the vector potential A_cart, take B = curl of the spline** (via the
  analytic spline derivatives `_der`). This keeps **div B = 0 exactly** -- better than
  ASCOT -- while staying a direct, inversion-free, GPU-friendly read. grad|B| then needs
  A's 2nd derivatives (`_der2`, available). Recommended primary; A is smooth so it
  splines cleanly.

Spline the radial label **s(x)** too, so the loss test (`u_gc>=1` today) becomes a spline
read (GPU-friendly), exactly as ASCOT splines psi.

## Orders and the LCFS kink

libneo batch splines already support **cubic and quintic** (`order` in [3,5],
`batch_interpolate_3d.f90:96`); a quintic real-space field is a one-line change. The open
problem is the **separatrix kink**: a rectangular (R,phi,z) box has corners with rho>1
where the chartmap is undefined and `field_at_logical` returns the edge-clamped field, so
a grid line crossing the LCFS sees a C1 kink. libneo has ONLY natural/periodic BCs --
**no monotone/Steffen/limited/one-sided variant** -- and a quintic rings worse at the kink
than a cubic. Mitigations, by effort:
- gate on the splined s(x) so no used cell touches rho>1 (ASCOT approach, cleanest);
- keep the box just inside the LCFS;
- a **kink-robust spline variant** in libneo (one-sided/limited near the boundary) -- a
  separate libneo task (see the linked libneo issue). The user is fine allowing
  cubic + quintic now and adding a kink-robust variant later.

## Risks (the depth)

- **Memory is the first constraint.** libneo's batch layout stores a dense (order+1)^3
  coefficient block per cell: a 4-quantity QUINTIC at 200x64x200 is ~190 GB
  (prohibitive); cubic ~36 GB; ASCOT's compact form (one set per node, reconstruct on
  the fly) is far lighter. So: cubic + a modest grid + store only (B_x,B_y,B_z,s) (grad
  via _der) + per-field-period storage; and/or add a compact-storage option to libneo
  (linked issue). Size the grid in the design before coding.
- **div B != 0** for option (A); option (B) (curl of splined A) avoids it -- prefer (B).
- **LCFS-edge accuracy** (the kink) lands exactly where losses are decided; benchmark it.
- **phi-periodicity:** use (R,phi,z), not Cartesian, to keep periodic-phi.

## Benchmark plan ("how bad does it get")

On the reactor W7-X high-mirror case (benchmark-orbit-proxima#2), compare the real-space
spline CP against the exact chartmap-inversion CP and ASCOT5, at 1 ms / 100 ms:
- speed (target: within ~1.5-2x of ASCOT, vs ~5x today);
- confined fraction / loss times (target: within statistics of the chartmap-inversion CP);
- energy/mu conservation over the trace (the div-B / kink test);
- cubic vs quintic, option (A) vs (B), grid resolution sweep -> the accuracy-vs-cost curve.

## Reuse vs new

Reuse: the whole batch-spline engine (cubic+quintic, periodic-phi, value/der/der2, GPU
`acc routine seq` + resident kernel, managed-memory fix); `field_splined.f90` as the
template; `field_at_logical` as the per-node sampler; the loss-as-level-set pattern.
New: the setup sampler, the real-space field type, the pusher branch, and (optional,
libneo) a kink-robust BC + compact storage.

Effort: medium. No physics change -- the exact curl-A field is sampled. Related:
itpplasma/SIMPLE#422 (perf umbrella), #419/#420/#421; libneo spline-infra issue (linked);
benchmark itpplasma/benchmark-orbit-proxima#2.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CP/CPP full orbit: source field from the existing libneo neo_spline_field (splined A, B=curl A, direct read, GPU-ready) #423

CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)

Why

What to build (reuses most of the substrate)

Orders and the LCFS kink

Risks (the depth)

Benchmark plan ("how bad does it get")

Reuse vs new

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

CP/CPP full orbit: source field from the existing libneo neo_spline_field (splined A, B=curl A, direct read, GPU-ready) #423

Description

CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)

Why

What to build (reuses most of the substrate)

Orders and the LCFS kink

Risks (the depth)

Benchmark plan ("how bad does it get")

Reuse vs new

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions