[WIP] Feature/var masking train stepper #1196

Open
yyexela wants to merge 1 commit into main from feature/var_masking_train_stepper
Conversation

@yyexela yyexela commented May 26, 2026

Contributor

Work in progress: adding variable masking to the train stepper.


Short description of why the PR is needed and how it satisfies those requirements, in sentence form.

Changes:

  • symbol (e.g. fme.core.my_function) or script and concise description of changes or added feature

  • Can group multiple related symbols on a single bullet

  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Resolves # (delete if none)

@yyexela force-pushed the feature/var_masking_train_stepper branch from 5f40449 to 0524dc2 on May 26, 2026 22:00
Comment on lines +525 to +530
# Build a NaN-free view of input for the corrector and ocean model.
# When variable masking augmentation fills IC channels with NaN, those NaNs
# survive normalization's fill (which only applies to input_norm) and would
# propagate through area-weighted means in physics corrections, producing NaN
# outputs and zero gradients. Replacing with the normalizer's denormalized
# estimate (climatological mean for masked variables) keeps corrections valid.
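
The failure mode described in this comment can be reproduced in a few lines. The following is a dependency-free sketch using plain Python lists rather than the project's tensors; `area_weighted_mean`, `fill_nan`, and the value `280.0` are hypothetical stand-ins, not code from this PR. It only shows that a NaN-filled channel makes a weighted mean NaN, and that filling with a constant makes the mean finite again.

```python
import math


def area_weighted_mean(values, weights):
    """Weighted mean; a single NaN in `values` makes the result NaN."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def fill_nan(values, fill_value):
    """Replace NaN entries with `fill_value` (stand-in for a climatological mean)."""
    return [fill_value if math.isnan(v) else v for v in values]


weights = [1.0, 2.0, 1.0]                 # hypothetical area weights
masked = [math.nan, math.nan, math.nan]   # a channel filled with NaN by masking

nan_mean = area_weighted_mean(masked, weights)                       # NaN propagates
filled_mean = area_weighted_mean(fill_nan(masked, 280.0), weights)   # finite again
```

Note that the filled mean is simply the fill value, which is the point the review below takes issue with: the result is finite but carries no information about the actual state.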

@mcgibbon mcgibbon May 27, 2026

Contributor


I disagree with this take: the corrector is not valid if we give it climatological means instead of actual values. We can't apply a corrector if the values it needs for the correction are missing. This avoids a NaN or a crash, but I think a crash is the right call in this scenario.

If input data is missing for the correction, I think the code as it stands should raise an exception and halt. Perhaps we could add configuration to only apply certain corrections if their values are available, and make the corrector mask-aware.
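
The two options in this paragraph (fail loudly, or skip corrections whose inputs are unavailable) might look roughly like the sketch below. Everything here is an assumption for illustration: `MissingCorrectorInputError`, `require_corrector_inputs`, `available_corrections`, and the variable and correction names are hypothetical and do not come from the codebase.

```python
import math


class MissingCorrectorInputError(ValueError):
    """Hypothetical error: a correction was requested without the data it needs."""


def require_corrector_inputs(inputs, required_names):
    """Raise if a variable the correction depends on is absent or contains NaN."""
    for name in required_names:
        if name not in inputs:
            raise MissingCorrectorInputError(f"{name!r} is missing from the corrector inputs")
        if any(math.isnan(v) for v in inputs[name]):
            raise MissingCorrectorInputError(f"{name!r} contains NaN; correction cannot be applied")


def available_corrections(inputs, corrections):
    """Mask-aware variant: keep only corrections whose required inputs are NaN-free."""
    usable = []
    for correction_name, required_names in corrections.items():
        try:
            require_corrector_inputs(inputs, required_names)
        except MissingCorrectorInputError:
            continue  # skip this correction instead of halting
        usable.append(correction_name)
    return usable


# Hypothetical data: one variable has been masked with NaN.
inputs = {
    "surface_pressure": [1000.0, 1010.0],
    "total_water_path": [math.nan, 25.0],
}
corrections = {
    "dry_air": ["surface_pressure"],
    "moisture_budget": ["surface_pressure", "total_water_path"],
}
usable = available_corrections(inputs, corrections)
```

Calling `require_corrector_inputs` directly gives the halt-on-missing-data behavior; `available_corrections` is the configurable alternative that applies only what the data supports.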

However, for just masking inputs randomly for the network, we could consider still making use of the input timestep data even though that input is masked from the network itself. This could work as a dropout-style approach to masking.
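
One way to read the dropout-style suggestion is to mask a copy of the inputs for the network while leaving the original data untouched for the corrector. The sketch below assumes that reading; `mask_variables_for_network`, `mask_prob`, `fill_value`, and the variable names are hypothetical, and plain lists stand in for tensors.

```python
import random


def mask_variables_for_network(inputs, mask_prob, fill_value, rng):
    """Return a masked copy of `inputs` for the network, plus the masked names.

    Each variable is independently replaced by `fill_value` with probability
    `mask_prob`. The `inputs` dict itself is not modified, so it can still be
    passed unmasked to the corrector.
    """
    network_inputs = {}
    masked_names = []
    for name, values in inputs.items():
        if rng.random() < mask_prob:
            network_inputs[name] = [fill_value] * len(values)
            masked_names.append(name)
        else:
            network_inputs[name] = list(values)
    return network_inputs, masked_names


rng = random.Random(0)
inputs = {
    "air_temperature": [280.0, 281.0],
    "specific_humidity": [0.01, 0.02],
}

# mask_prob=1.0 masks every variable, which makes the example deterministic.
network_inputs, masked_names = mask_variables_for_network(
    inputs, mask_prob=1.0, fill_value=0.0, rng=rng
)
corrector_inputs = inputs  # the corrector still sees the real input timestep data
```

With this split, the corrector never receives NaN or filled values, so no climatological substitution is needed on that path.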



2 participants