Skip to content

P3 quadpack: gsl_integration_routines_mod -> fortnum, drop unused gsl_roots#97

Open
krystophny wants to merge 15 commits into
mainfrom
reorder/p3-quadpack
Open

P3 quadpack: gsl_integration_routines_mod -> fortnum, drop unused gsl_roots#97
krystophny wants to merge 15 commits into
mainfrom
reorder/p3-quadpack

Conversation

@krystophny

@krystophny krystophny commented Jun 14, 2026

Copy link
Copy Markdown
Member

Merge order

#95 -> #96 -> #97 -> #98 -> #99

Each PR now targets main on its own, so its diff is cumulative (p1 + .. + pN) against main and the PRs share code. Merge them in the sequence above. Once a predecessor merges into main, the next PR's diff shrinks to its own increment.

Third PR of the reordered fortnum migration stack. Base: P2 (#96).

Re-expresses the QUADPACK-pattern 1D integrators on fortnum: fint1d_qag/qags/qagp/qagiu map to integrate_qag/qags/qagp/qagiu, and fint1d_cquad runs on the adaptive QAGS path. The user integrand reaches fortnum through a host-associated panel(x, ctx) wrapper, and fortnum_status_t feeds the existing disp_gsl_integration_error accumulator. The public interface (module, generic names, result(2) value/abserr) is preserved so collop_compute.f90 and both collop_definitions.f90 compile unchanged; the internal kind alias is renamed dp -> wp to avoid clashing with the callers' nrtype dp. gsl_roots_routines_mod has no call sites and is removed with its build-list entry.

CI status: expected backend-swap red, resolves on merge

The run-golden-record check is red, and that is the expected backend-swap signal.

The collision-operator integrals now run on fortnum instead of GSL/FGSL. The lorentz golden record compares this build against a reference NEO-2 still linked to GSL: the latest stable release (rough check) and a fresh main build (exact check). Both references differ from the fortnum build at the numerical-library level:

Difference in ailmm: 8.889334739576369e-08
Difference in denmm: 4.860273234422973e-10

These exceed the lorentz tolerance (TOL=1e-14), so compare_h5.py reports the two as not the same and the job fails. The difference is the fortnum-vs-GSL result of the same integrals, not a regression. The build itself is clean: fortnum is fetched and linked, both neo_2_par.x and neo_2_ql.x compile, and the unit tests pass.

The golden record regenerates from main once the migration lands. Merge in the stated order; when this PR's predecessor reaches main, the exact-check reference becomes the fortnum build and the comparison closes. Do not loosen the tolerance.

Note: fortnum pin updated to current main (92de6e9) after a fortnum history rewrite; old shas no longer resolve.

Declare fortnum once (guarded by NOT TARGET fortnum): prefer a local
$CODE/fortnum checkout, otherwise FetchContent the pinned revision
4c560b0 over https so CI can clone the public repo. Link the common
library against fortnum alongside the existing GSL/FGSL backends.

This adds the new dependency before any call site migrates to it and
removes nothing yet, so the build and physics are unchanged: GSL still
backs gsl_specialfunctions_mod, gsl_integration_routines_mod, and
gsl_bspline_routines_mod.
Replace the FGSL backend of besselk(n,z) with fortnum bessel_kn. The
module and public function name stay identical, so collop_compute.f90
(orders 0, 1, 2) compiles unchanged. The kind alias moves from
fgsl_double to iso_fortran_env real64 (wp).
…sl_roots

Re-express the QUADPACK-pattern 1D integrators on fortnum:
fint1d_qag/qags/qagp/qagiu map to integrate_qag/qags/qagp/qagiu, and
fint1d_cquad runs on the adaptive QAGS path (Wynn-epsilon extrapolation
covers the endpoint singularities CQUAD targets). The user integrand
reaches fortnum through a host-associated panel(x, ctx) wrapper, and
fortnum_status_t feeds the existing disp_gsl_integration_error
accumulator. The public interface (module, generic names, result(2)
value/abserr) is preserved so collop_compute.f90 and both
collop_definitions.f90 compile unchanged; the internal kind alias is
renamed dp -> wp to avoid clashing with the callers' nrtype dp on
use-association. gsl_roots_routines_mod has no call sites in the tree
and is removed together with its build-list entry.
Re-pin the fortnum FetchContent GIT_TAG to lazy-fortran/fortnum main commit
33bd676, which makes the QAGS/QAGP/QAGIU adaptive driver reproduce QUADPACK
dqagse/dqagpe extrapolation. The CQUAD/QAGS/QAGIU collision-operator
integrals (ailmm/denmm, the Chandrasekhar Maxwellian moments) routed onto
fortnum_integrate now match GSL/QUADPACK accuracy instead of drifting at
~1e-8 with a reported ~1e-13 error.
The fint1d_cquad wrappers were re-expressed on the QAGS path, a different
algorithm from GSL's CQUAD that stopped at a slightly different value and
drifted ~1e-8 in the collision integrals (ailmm) against the CQUAD-generated
golden. fortnum now ships a genuine doubly-adaptive Clenshaw-Curtis
integrate_cquad; call it directly and re-pin fortnum to the revision with it.
libneo also fetches fortnum and FetchContent's first declaration wins,
so with libneo declared first this repo's pin was dead: CI built
whatever revision libneo pinned, compiled under libneo's directory
scope with -O3 -ffast-math. Finite-math-only removed fortnum's
non-finite-sample handling, so collision-operator integrands that hit
0*inf at isolated nodes filled anumm/denmm with NaN, zhseqr failed on
every Arnoldi iteration, and the QL performance golden job spun until
the 6 h CI limit.

Declare fortnum first so the pin and its IEEE-flag defense apply, and
pin lazy-fortran/fortnum#57: coefficient-difference cquad error
estimate, zeroed non-finite samples, -fno-fast-math on the target.
@krystophny

Copy link
Copy Markdown
Member Author

Merged main; declared fortnum before libneo so this repo's pin wins, and pinned lazy-fortran/fortnum#57. Root cause of the 6 h golden timeout: fortnum was fetched at libneo's pin and compiled under libneo's -ffast-math scope, which removed its NaN handling; collision-operator integrands hitting 0*inf at isolated nodes filled the matrices with NaN and zhseqr failed on every Arnoldi iteration. The pinned fortnum adds a coefficient-difference cquad error estimate, zeroes non-finite samples, and forces -fno-fast-math on its target; the QL performance case now completes locally with zero NaN integrals and healthy convergence (4 bad modes, 17 stabilized iterations). Unit tests pass locally (5/5). The lorentz exact-vs-main compare still reports the expected backend-swap residual (denmm 4.9e-10, ailmm 2.5e-10), which is the GSL reference's own stopping error at epsabs=1e-11.

@krystophny

Copy link
Copy Markdown
Member Author

Status of the golden-record investigation:

No tolerance changes; the golden record stays as is.

@krystophny

Copy link
Copy Markdown
Member Author

Root cause identified, per-call and per-evaluation tracing complete.

Instrumented the fortnum-backed shim to run every live integrator call through both backends (fortnum and FGSL/GSL as oracle) in the same process during the ql golden-record case: 3.07M calls traced.

  • fint1d_qagiu and the 2.76M inner-kernel fint1d_cquad calls agree to <= 1.2e-15 relative. fortnum accuracy is not the problem.
  • All divergent calls (1756 of 3.07M) sit on collision-operator segments whose endpoints are B-spline knots. With collop_bspline_order=2 the matrix-element integrands jump exactly at the knots (piecewise-constant basis derivatives), which is why the integrate() wrapper splits at t_vec in the first place.
  • Per-evaluation dumps show the actual mechanism: the Clenshaw-Curtis node arithmetic (mid + half*cos) of BOTH backends can land one ulp outside the requested segment, e.g. x = 2.66666666666666607 for a segment starting at a = 8/3 = 2.66666666666666652. That sample sits on the far side of the knot where the integrand is 5.4e5 instead of 0. GSL's CQUAD then subdivides against the phantom kink and keeps a spurious ~2e-10 contribution (its own abserr ~7e-9 admits it); fortnum's subdivision happens to end on nodes that round back onto the knot and returns the exact 0. Every ailmm/denmm difference in the golden record traces back to these one-ulp boundary slivers, amplified downstream into the 1e-6-level D-coefficient diffs.

So the diff is a boundary-sampling artifact of adaptive quadrature on knot-discontinuous integrands, not a backend accuracy regression, and it cannot be closed by making fortnum more accurate (fortnum already returns the exact integral on the offending segments).

Fix under verification: clamp the integrand abscissa to the open segment (one ulp inside each end) in the fint1d_cquad wrappers, so every segment integrand is continuous on its closed interval and both backends resolve it to machine precision. This must land on main as well (the exact check builds its reference from main), which also removes the sliver artifact from the GSL-based results. Local two-sided ql run with the clamp on both builds is in progress; numbers follow.

Same fix as on main (#117), applied to the fortnum-backed wrappers: the
Clenshaw-Curtis node arithmetic can land one ulp outside the segment,
where the collision-operator integrands jump at the bounding B-spline
knot. Clamping one ulp inside keeps the integrand continuous on the
segment; the cross-backend ql golden diff drops from ailmm_aa 3.6e-8 to
7.8e-11.
@krystophny

Copy link
Copy Markdown
Member Author

Clamp fix verified two-sided and pushed; here are the numbers.

The abscissa clamp (evaluate one ulp inside each segment) is now in three places: on main as PR #117 (GSL/FGSL wrappers), and on this stack as 097ee5e (fortnum shim, merged through to the P4 and P5 branches). fortnum pin stays at f849f59 - the tracing shows fortnum needs no accuracy fix: after the clamp, all 3.07M traced integrator calls in the ql case agree with GSL to a few ulp, and fortnum was the side returning the exact value on the previously divergent segments.

Local two-sided golden runs, current (this stack, fortnum) vs reference (clamped main, GSL):

lorentz (compare at 1e-14 / magnetics 1e-12):

  • before clamp: ailmm 2.46e-10, denmm 4.86e-10 (and _a/_aa variants)
  • after clamp: denmm clean below the gate; only ailmm/ailmm_aa remain, at 8.53e-12

ql (compare at 1e-15):

  • before clamp: ailmm_aa 3.57e-08, gamma_out 1.44e-07, D12_AX 2.77e-06
  • after clamp: ailmm_aa 7.76e-11, gamma_out 1.47e-09, D12_AX 1.53e-09

What remains is not a quadrature defect on either side: the ailmm assembly sums segment integrals of magnitude ~4e5 that cancel down to elements a few thousand times smaller. Each segment integral agrees across backends to 1-3 ulp (2-7e-16 relative; the irreducible difference between two correct implementations with different node sets), and the cancellation amplifies that to the observed 1e-11-level residual. Closing that would require bit-identical arithmetic with GSL, i.e. porting GSL's CQUAD decision-for-decision, which fortnum's clean-room policy rules out.

Practical consequence: the exact check can only go green for this stack once #117 is on main (so the freshly built reference is also clamped), and even then lorentz/ql will show the 1e-11-class ailmm residual against the 1e-14/1e-15 gates. That residual is the true, fully characterized backend-swap delta - three to four orders smaller than what these PRs showed before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant