
Stabilize synthetic stats fixtures used by training tests#1232

Merged
mcgibbon merged 3 commits into main from fix/coupled-train-test-stats-stability on Jun 8, 2026

mcgibbon commented Jun 5, 2026

Fixes flakiness in `fme/coupled/test_train.py::test_train_and_inference[3-True]` (and the GPU-skipped `test_train_and_inference_with_derived_forcings`) by removing a hidden source of pathological inputs in the synthetic stats fixtures.

The integration tests built per-variable normalization stats by drawing `np.random.randn()` for both means and stds. Stds occasionally landed near zero or negative, which made normalized inputs blow up. The CRPS + `NoiseConditionedSFNO` training case amplified this through multi-step rollouts, producing inference outputs in the `1e+3` to `1e+17` range and occasional NaN values.

The variation between runs came from building the variable list with `list(set(...))`: the iteration order of a set of strings depends on `PYTHONHASHSEED`, so the same test would get a pathological std assignment on some CI invocations but not others.

Verified locally on CPU: five runs of the CRPS case with the old behavior produced `PRESsfc` magnitudes ranging from `1e+0` to `3e+3` (one run had a std drawn at `-1.2`); after the fix, all five runs stayed at `~1e+0` with no NaN/Inf.
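To make the failure mode concrete, here is a minimal NumPy sketch (not code from `fme`; both function names are illustrative) of per-variable `(x - mean) / std` normalization. The normalized magnitude scales like `1/std`, so a std drawn close to zero inflates inputs by orders of magnitude. The second part estimates how often a standard-normal draw lands within 0.1 of zero, i.e. how "occasionally" a `randn()` std becomes dangerous.

```python
import numpy as np


def max_normalized_magnitude(x: np.ndarray, mean: float, std: float) -> float:
    """Largest |(x - mean) / std|, i.e. the scale of the normalized input."""
    return float(np.max(np.abs((x - mean) / std)))


def fraction_near_zero(draws: np.ndarray, tol: float = 0.1) -> float:
    """Fraction of draws with |value| < tol (would-be stds near zero)."""
    return float(np.mean(np.abs(draws) < tol))


x = np.array([0.5, 1.0, 1.5])  # synthetic field values around a mean of 1.0
for std in [1.0, 0.1, 1e-3]:
    # The normalized magnitude grows like 1/std as the std shrinks.
    print(std, max_normalized_magnitude(x, mean=1.0, std=std))

# For standard-normal draws the expected fraction is 2*Phi(0.1) - 1, roughly 0.08.
draws = np.random.default_rng(0).standard_normal(100_000)
print(fraction_near_zero(draws))
```

With many variables per test, even a per-variable rate of a few percent makes it likely that some variable ends up with a near-zero std.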

Changes:

  • `fme.ace.testing.fv3gfs_data.get_scalar_dataset` / `save_scalar_netcdf`: replace the `randn()` default with `center + uniform(-0.05, 0.05)` so per-variable noise stays bounded.

  • `fme.ace.testing.save_stats_netcdfs`: new helper that writes a paired mean/std file with the std centered at 1.0, so callers cannot forget to keep stds bounded away from zero.

  • Route `StatsData`, `fme.coupled.data_loading.test_data_loader.create_coupled_data_on_disk`, and `fme.ace.test_train._setup` through the new helper. Drop the now-redundant `stats_std_fill_value` parameter from `_setup`.

  • Tests added (existing integration tests continue to pass; the fix is to the test fixtures themselves)

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated
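A rough sketch of the kind of values the new bounded defaults produce, using plain NumPy instead of the real netCDF-writing code. `bounded_scalar_values` and `make_stats_pair` are hypothetical stand-ins for `get_scalar_dataset` and `save_stats_netcdfs`; the std center of `1.0` and the `uniform(-0.05, 0.05)` noise come from the description, while the mean center of `0.0` is an assumption.

```python
import numpy as np


def bounded_scalar_values(names, center, rng, half_width=0.05):
    """Return {name: center + uniform(-half_width, half_width)} per variable.

    Hypothetical sketch of the bounded default described for
    get_scalar_dataset / save_scalar_netcdf (which replaced randn()).
    """
    noise = rng.uniform(-half_width, half_width, size=len(names))
    return {name: float(center + n) for name, n in zip(names, noise)}


def make_stats_pair(names, rng, mean_center=0.0, std_center=1.0):
    """Hypothetical stand-in for save_stats_netcdfs: returns paired mean/std
    values instead of writing netCDF files. std_center=1.0 keeps every std
    bounded away from zero; mean_center=0.0 is an assumption."""
    means = bounded_scalar_values(names, mean_center, rng)
    stds = bounded_scalar_values(names, std_center, rng)
    return means, stds


names = ["PRESsfc", "TMP2m", "UGRD10m"]  # illustrative variable names
means, stds = make_stats_pair(names, np.random.default_rng(0))
print(stds)
```

Because every std lies in `[0.95, 1.05)`, normalization scales an input by at most about `1/0.95`, no matter which variable a given draw ends up assigned to. That is why the hash-seed-dependent ordering no longer matters.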

Resolves #1143

mcgibbon added 2 commits June 5, 2026 22:48
The synthetic stats files used by the coupled/ace training integration
tests were populated with per-variable `np.random.randn()` draws for both
means and stds. Stds drawn near zero (or negative) caused normalized
inputs to blow up, which the CRPS + NoiseConditionedSFNO training case
amplified through multi-step rollouts to produce outputs in the
1e+3 to 1e+17 range and occasional NaN values during inference.

Iteration over `set(...)` of variable names made the per-variable
std assignment depend on `PYTHONHASHSEED`, so the same test would land
on a pathological std on some CI invocations but not others.

Changes:
- `fme.ace.testing.fv3gfs_data.get_scalar_dataset` /
  `save_scalar_netcdf`: replace `randn()` default with
  `center + uniform(-0.05, 0.05)` so per-variable noise stays bounded.
- `fme.ace.testing.save_stats_netcdfs`: new helper that writes a paired
  mean/std file with the std centered at 1.0, so callers cannot forget
  to keep stds bounded away from zero.
- Route `StatsData`,
  `fme.coupled.data_loading.test_data_loader.create_coupled_data_on_disk`,
  and `fme.ace.test_train._setup` through the new helper. Drop the
  now-redundant `stats_std_fill_value` parameter from `_setup`.

Resolves #1143
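The `PYTHONHASHSEED` dependence described in the commit message can be reproduced in a few lines. This sketch is illustrative only (variable names other than `PRESsfc` are made up): it runs a fresh interpreter per hash seed, draws stds with a fixed NumPy seed, and assigns them in set-iteration order, so `PRESsfc` receives a different std depending on the hash seed alone. Sorting the names is shown only for contrast as a general way to make the assignment deterministic; this PR instead bounds the stds so that any assignment is safe.

```python
import os
import subprocess
import sys

# Child program: the NumPy seed is fixed; only the name ordering can vary.
# Variable names other than PRESsfc are hypothetical.
CHILD = """
import numpy as np
names = ["PRESsfc", "surface_temperature", "TMP2m", "UGRD10m", "VGRD10m"]
ordered = {ordering}(set(names))
rng = np.random.RandomState(0)
stds = dict(zip(ordered, rng.randn(len(ordered))))
print(round(float(stds["PRESsfc"]), 3))
"""


def presssfc_std(hash_seed: int, ordering: str = "list") -> float:
    """Run CHILD under a given PYTHONHASHSEED; return the std given to PRESsfc."""
    env = {**os.environ, "PYTHONHASHSEED": str(hash_seed)}
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(ordering=ordering)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


# Same NumPy seed everywhere; only PYTHONHASHSEED changes between runs.
by_set_order = {seed: presssfc_std(seed, "list") for seed in range(8)}
by_sorted = {seed: presssfc_std(seed, "sorted") for seed in range(8)}
print("set order:   ", by_set_order)
print("sorted order:", by_sorted)
```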

AnnaKwa left a comment

Thanks for tracking this down!

mcgibbon enabled auto-merge (squash) June 8, 2026 16:51
mcgibbon merged commit 6ef9d77 into main Jun 8, 2026
7 checks passed
mcgibbon deleted the fix/coupled-train-test-stats-stability branch June 8, 2026 17:16

Development

Successfully merging this pull request may close these issues.

fme/coupled/test_train.py::test_train_and_inference[3-True] flakiness
