Skip to content

CP/CPP full orbit: source field from the existing libneo neo_spline_field (splined A, B=curl A, direct read, GPU-ready) #423

Description

@krystophny

CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)

A concrete realization of the #422 lever 3 ("precompute a direct Cartesian field").
Add an ALTERNATIVE field representation for the Cartesian full-orbit pusher
(orbit_cpp_boris) that reads B from a real-space (R, phi, z) spline instead of
inverting Cartesian->logical every step. Keep the chartmap as the exact build-time
source. The current B = curl A-from-chartmap path stays the default and the accuracy
reference; this is opt-in and benchmarked against it.

Why

  • Speed (CP full orbit performance: path to ASCOT5 speed (phase 'make it fast') #422): the full orbit is ~5x slower than ASCOT5 almost entirely because
    every field eval inverts Cartesian->logical (a damped Newton, several spline evals),
    plus the GC reconstruction does 2 more inversions/step. ASCOT reads three direct
    tricubic splines on an (R,phi,z) grid -- no inversion.
  • GPU: confirmed asymmetry in the code. libneo's batch-spline EVALUATE is already
    GPU-callable (!$acc routine seq at ~11 sites in batch_interpolate_3d.f90, a
    resident device kernel, managed-memory bug fixed). The chartmap inversion is
    GPU-HOSTILE: data-dependent Newton trip counts + line search (orbit_cpp_boris.f90
    newton_from maxit=30), select type on the hot path, procedure pointers and
    class-dispatched metric/christoffel -- orbit_cpp_chartmap_metric.f90:16 says
    outright "NOT GPU-portable". So a direct spline read is the thing that makes a GPU
    full-orbit push possible at all.

What to build (reuses most of the substrate)

  1. Setup-time sampler. Loop a regular (R, phi, z) grid over one field period; at each
    node invert once to logical and call the SAME field_at_logical the hot loop uses
    today, storing the Cartesian field. The inversion is paid once per node at setup,
    never per step -- that is the whole win. The sampler is essentially today's
    cart_field run over a grid.
  2. Real-space field type. Clone src/field/field_splined.f90 (which today splines in
    FLUX coords, so it still needs the inversion) onto an (R, phi, z) grid, built with
    construct_batch_splines_3d (libneo), periodic = [.false., .true., .false.] (phi
    periodic over 2pi/nfp, R/z natural -- inherits the seam-clean periodic-phi support
    directly; use cylindrical, NOT Cartesian, so phi is an axis).
  3. Pusher branch. cart_field and the loss test in orbit_cpp_boris get a fast path
    that reads the real-space spline instead of to_wedge -> invert_cart_warm -> field_at_logical. Gate as a new orbit_model or a field-representation flag; the
    chartmap stays the source.

What to store, two options (benchmark both):

  • (A) spline B_cart directly (ASCOT-style): batch of (B_x, B_y, B_z, s) and read
    grad|B| via evaluate_batch_splines_3d_der. Simplest; but an independently splined
    B is NOT divergence-free (ASCOT lives with this), an energy-drift risk over long traces.
  • (B) spline the vector potential A_cart, take B = curl of the spline (via the
    analytic spline derivatives _der). This keeps div B = 0 exactly -- better than
    ASCOT -- while staying a direct, inversion-free, GPU-friendly read. grad|B| then needs
    A's 2nd derivatives (_der2, available). Recommended primary; A is smooth so it
    splines cleanly.

Spline the radial label s(x) too, so the loss test (u_gc>=1 today) becomes a spline
read (GPU-friendly), exactly as ASCOT splines psi.

Orders and the LCFS kink

libneo batch splines already support cubic and quintic (order in [3,5],
batch_interpolate_3d.f90:96); a quintic real-space field is a one-line change. The open
problem is the separatrix kink: a rectangular (R,phi,z) box has corners with rho>1
where the chartmap is undefined and field_at_logical returns the edge-clamped field, so
a grid line crossing the LCFS sees a C1 kink. libneo has ONLY natural/periodic BCs --
no monotone/Steffen/limited/one-sided variant -- and a quintic rings worse at the kink
than a cubic. Mitigations, by effort:

  • gate on the splined s(x) so no used cell touches rho>1 (ASCOT approach, cleanest);
  • keep the box just inside the LCFS;
  • a kink-robust spline variant in libneo (one-sided/limited near the boundary) -- a
    separate libneo task (see the linked libneo issue). The user is fine allowing
    cubic + quintic now and adding a kink-robust variant later.

Risks (the depth)

  • Memory is the first constraint. libneo's batch layout stores a dense (order+1)^3
    coefficient block per cell: a 4-quantity QUINTIC at 200x64x200 is ~190 GB
    (prohibitive); cubic ~36 GB; ASCOT's compact form (one set per node, reconstruct on
    the fly) is far lighter. So: cubic + a modest grid + store only (B_x,B_y,B_z,s) (grad
    via _der) + per-field-period storage; and/or add a compact-storage option to libneo
    (linked issue). Size the grid in the design before coding.
  • div B != 0 for option (A); option (B) (curl of splined A) avoids it -- prefer (B).
  • LCFS-edge accuracy (the kink) lands exactly where losses are decided; benchmark it.
  • phi-periodicity: use (R,phi,z), not Cartesian, to keep periodic-phi.

Benchmark plan ("how bad does it get")

On the reactor W7-X high-mirror case (benchmark-orbit-proxima#2), compare the real-space
spline CP against the exact chartmap-inversion CP and ASCOT5, at 1 ms / 100 ms:

  • speed (target: within ~1.5-2x of ASCOT, vs ~5x today);
  • confined fraction / loss times (target: within statistics of the chartmap-inversion CP);
  • energy/mu conservation over the trace (the div-B / kink test);
  • cubic vs quintic, option (A) vs (B), grid resolution sweep -> the accuracy-vs-cost curve.

Reuse vs new

Reuse: the whole batch-spline engine (cubic+quintic, periodic-phi, value/der/der2, GPU
acc routine seq + resident kernel, managed-memory fix); field_splined.f90 as the
template; field_at_logical as the per-node sampler; the loss-as-level-set pattern.
New: the setup sampler, the real-space field type, the pusher branch, and (optional,
libneo) a kink-robust BC + compact storage.

Effort: medium. No physics change -- the exact curl-A field is sampled. Related:
#422 (perf umbrella), #419/#420/#421; libneo spline-infra issue (linked);
benchmark itpplasma/benchmark-orbit-proxima#2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions