You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)
A concrete realization of the #422 lever 3 ("precompute a direct Cartesian field").
Add an ALTERNATIVE field representation for the Cartesian full-orbit pusher
(orbit_cpp_boris) that reads B from a real-space (R, phi, z) spline instead of
inverting Cartesian->logical every step. Keep the chartmap as the exact build-time
source. The current B = curl A-from-chartmap path stays the default and the accuracy
reference; this is opt-in and benchmarked against it.
Why
Speed (CP full orbit performance: path to ASCOT5 speed (phase 'make it fast') #422): the full orbit is ~5x slower than ASCOT5 almost entirely because
every field eval inverts Cartesian->logical (a damped Newton, several spline evals),
plus the GC reconstruction does 2 more inversions/step. ASCOT reads three direct
tricubic splines on an (R,phi,z) grid -- no inversion.
GPU: confirmed asymmetry in the code. libneo's batch-spline EVALUATE is already
GPU-callable (!$acc routine seq at ~11 sites in batch_interpolate_3d.f90, a
resident device kernel, managed-memory bug fixed). The chartmap inversion is
GPU-HOSTILE: data-dependent Newton trip counts + line search (orbit_cpp_boris.f90
newton_from maxit=30), select type on the hot path, procedure pointers and
class-dispatched metric/christoffel -- orbit_cpp_chartmap_metric.f90:16 says
outright "NOT GPU-portable". So a direct spline read is the thing that makes a GPU
full-orbit push possible at all.
What to build (reuses most of the substrate)
Setup-time sampler. Loop a regular (R, phi, z) grid over one field period; at each
node invert once to logical and call the SAME field_at_logical the hot loop uses
today, storing the Cartesian field. The inversion is paid once per node at setup,
never per step -- that is the whole win. The sampler is essentially today's cart_field run over a grid.
Real-space field type. Clone src/field/field_splined.f90 (which today splines in
FLUX coords, so it still needs the inversion) onto an (R, phi, z) grid, built with construct_batch_splines_3d (libneo), periodic = [.false., .true., .false.] (phi
periodic over 2pi/nfp, R/z natural -- inherits the seam-clean periodic-phi support
directly; use cylindrical, NOT Cartesian, so phi is an axis).
Pusher branch.cart_field and the loss test in orbit_cpp_boris get a fast path
that reads the real-space spline instead of to_wedge -> invert_cart_warm -> field_at_logical. Gate as a new orbit_model or a field-representation flag; the
chartmap stays the source.
What to store, two options (benchmark both):
(A) spline B_cart directly (ASCOT-style): batch of (B_x, B_y, B_z, s) and read
grad|B| via evaluate_batch_splines_3d_der. Simplest; but an independently splined
B is NOT divergence-free (ASCOT lives with this), an energy-drift risk over long traces.
(B) spline the vector potential A_cart, take B = curl of the spline (via the
analytic spline derivatives _der). This keeps div B = 0 exactly -- better than
ASCOT -- while staying a direct, inversion-free, GPU-friendly read. grad|B| then needs
A's 2nd derivatives (_der2, available). Recommended primary; A is smooth so it
splines cleanly.
Spline the radial label s(x) too, so the loss test (u_gc>=1 today) becomes a spline
read (GPU-friendly), exactly as ASCOT splines psi.
Orders and the LCFS kink
libneo batch splines already support cubic and quintic (order in [3,5], batch_interpolate_3d.f90:96); a quintic real-space field is a one-line change. The open
problem is the separatrix kink: a rectangular (R,phi,z) box has corners with rho>1
where the chartmap is undefined and field_at_logical returns the edge-clamped field, so
a grid line crossing the LCFS sees a C1 kink. libneo has ONLY natural/periodic BCs -- no monotone/Steffen/limited/one-sided variant -- and a quintic rings worse at the kink
than a cubic. Mitigations, by effort:
gate on the splined s(x) so no used cell touches rho>1 (ASCOT approach, cleanest);
keep the box just inside the LCFS;
a kink-robust spline variant in libneo (one-sided/limited near the boundary) -- a
separate libneo task (see the linked libneo issue). The user is fine allowing
cubic + quintic now and adding a kink-robust variant later.
Risks (the depth)
Memory is the first constraint. libneo's batch layout stores a dense (order+1)^3
coefficient block per cell: a 4-quantity QUINTIC at 200x64x200 is ~190 GB
(prohibitive); cubic ~36 GB; ASCOT's compact form (one set per node, reconstruct on
the fly) is far lighter. So: cubic + a modest grid + store only (B_x,B_y,B_z,s) (grad
via _der) + per-field-period storage; and/or add a compact-storage option to libneo
(linked issue). Size the grid in the design before coding.
div B != 0 for option (A); option (B) (curl of splined A) avoids it -- prefer (B).
LCFS-edge accuracy (the kink) lands exactly where losses are decided; benchmark it.
phi-periodicity: use (R,phi,z), not Cartesian, to keep periodic-phi.
Benchmark plan ("how bad does it get")
On the reactor W7-X high-mirror case (benchmark-orbit-proxima#2), compare the real-space
spline CP against the exact chartmap-inversion CP and ASCOT5, at 1 ms / 100 ms:
speed (target: within ~1.5-2x of ASCOT, vs ~5x today);
confined fraction / loss times (target: within statistics of the chartmap-inversion CP);
energy/mu conservation over the trace (the div-B / kink test);
cubic vs quintic, option (A) vs (B), grid resolution sweep -> the accuracy-vs-cost curve.
Reuse vs new
Reuse: the whole batch-spline engine (cubic+quintic, periodic-phi, value/der/der2, GPU acc routine seq + resident kernel, managed-memory fix); field_splined.f90 as the
template; field_at_logical as the per-node sampler; the loss-as-level-set pattern.
New: the setup sampler, the real-space field type, the pusher branch, and (optional,
libneo) a kink-robust BC + compact storage.
Effort: medium. No physics change -- the exact curl-A field is sampled. Related: #422 (perf umbrella), #419/#420/#421; libneo spline-infra issue (linked);
benchmark itpplasma/benchmark-orbit-proxima#2.
CP/CPP full orbit: optional direct real-space field spline (no per-step inversion, GPU-ready)
A concrete realization of the #422 lever 3 ("precompute a direct Cartesian field").
Add an ALTERNATIVE field representation for the Cartesian full-orbit pusher
(
orbit_cpp_boris) that reads B from a real-space (R, phi, z) spline instead ofinverting Cartesian->logical every step. Keep the chartmap as the exact build-time
source. The current
B = curl A-from-chartmap path stays the default and the accuracyreference; this is opt-in and benchmarked against it.
Why
every field eval inverts Cartesian->logical (a damped Newton, several spline evals),
plus the GC reconstruction does 2 more inversions/step. ASCOT reads three direct
tricubic splines on an (R,phi,z) grid -- no inversion.
GPU-callable (
!$acc routine seqat ~11 sites inbatch_interpolate_3d.f90, aresident device kernel, managed-memory bug fixed). The chartmap inversion is
GPU-HOSTILE: data-dependent Newton trip counts + line search (
orbit_cpp_boris.f90newton_from
maxit=30),select typeon the hot path, procedure pointers andclass-dispatched metric/christoffel --
orbit_cpp_chartmap_metric.f90:16saysoutright "NOT GPU-portable". So a direct spline read is the thing that makes a GPU
full-orbit push possible at all.
What to build (reuses most of the substrate)
node invert once to logical and call the SAME
field_at_logicalthe hot loop usestoday, storing the Cartesian field. The inversion is paid once per node at setup,
never per step -- that is the whole win. The sampler is essentially today's
cart_fieldrun over a grid.src/field/field_splined.f90(which today splines inFLUX coords, so it still needs the inversion) onto an (R, phi, z) grid, built with
construct_batch_splines_3d(libneo),periodic = [.false., .true., .false.](phiperiodic over 2pi/nfp, R/z natural -- inherits the seam-clean periodic-phi support
directly; use cylindrical, NOT Cartesian, so phi is an axis).
cart_fieldand the loss test inorbit_cpp_borisget a fast paththat reads the real-space spline instead of
to_wedge -> invert_cart_warm -> field_at_logical. Gate as a new orbit_model or a field-representation flag; thechartmap stays the source.
What to store, two options (benchmark both):
grad|B| via
evaluate_batch_splines_3d_der. Simplest; but an independently splinedB is NOT divergence-free (ASCOT lives with this), an energy-drift risk over long traces.
analytic spline derivatives
_der). This keeps div B = 0 exactly -- better thanASCOT -- while staying a direct, inversion-free, GPU-friendly read. grad|B| then needs
A's 2nd derivatives (
_der2, available). Recommended primary; A is smooth so itsplines cleanly.
Spline the radial label s(x) too, so the loss test (
u_gc>=1today) becomes a splineread (GPU-friendly), exactly as ASCOT splines psi.
Orders and the LCFS kink
libneo batch splines already support cubic and quintic (
orderin [3,5],batch_interpolate_3d.f90:96); a quintic real-space field is a one-line change. The openproblem is the separatrix kink: a rectangular (R,phi,z) box has corners with rho>1
where the chartmap is undefined and
field_at_logicalreturns the edge-clamped field, soa grid line crossing the LCFS sees a C1 kink. libneo has ONLY natural/periodic BCs --
no monotone/Steffen/limited/one-sided variant -- and a quintic rings worse at the kink
than a cubic. Mitigations, by effort:
separate libneo task (see the linked libneo issue). The user is fine allowing
cubic + quintic now and adding a kink-robust variant later.
Risks (the depth)
coefficient block per cell: a 4-quantity QUINTIC at 200x64x200 is ~190 GB
(prohibitive); cubic ~36 GB; ASCOT's compact form (one set per node, reconstruct on
the fly) is far lighter. So: cubic + a modest grid + store only (B_x,B_y,B_z,s) (grad
via _der) + per-field-period storage; and/or add a compact-storage option to libneo
(linked issue). Size the grid in the design before coding.
Benchmark plan ("how bad does it get")
On the reactor W7-X high-mirror case (benchmark-orbit-proxima#2), compare the real-space
spline CP against the exact chartmap-inversion CP and ASCOT5, at 1 ms / 100 ms:
Reuse vs new
Reuse: the whole batch-spline engine (cubic+quintic, periodic-phi, value/der/der2, GPU
acc routine seq+ resident kernel, managed-memory fix);field_splined.f90as thetemplate;
field_at_logicalas the per-node sampler; the loss-as-level-set pattern.New: the setup sampler, the real-space field type, the pusher branch, and (optional,
libneo) a kink-robust BC + compact storage.
Effort: medium. No physics change -- the exact curl-A field is sampled. Related:
#422 (perf umbrella), #419/#420/#421; libneo spline-infra issue (linked);
benchmark itpplasma/benchmark-orbit-proxima#2.