[DRAFT] DO NOT MERGE Esm2 minifold by jomitchellnv · Pull Request #1541 · NVIDIA/bionemo-framework

jomitchellnv · 2026-04-01T00:56:32Z

Description

Usage

TODO: Add code snippet

Type of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Refactor
Documentation update
Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

ciflow:skip - Skip all CI tests for this PR
ciflow:notebooks - Run Jupyter notebooks execution tests
ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow
ciflow:all - Run all tests (unit tests, slow tests, and notebooks). This label can be used to enforce running all framework tests.
ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.
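In practice, the label rules above amount to choosing a pytest marker expression. The following is a minimal sketch that assumes a hypothetical helper, marker_expression, which is not part of the repository's CI. It turns the applied ciflow:* labels into a -m filter. Only the skip, slow, and all labels are modeled; notebook and recipe labels are left out.

```python
from __future__ import annotations

# Hypothetical sketch: derive a pytest "-m" expression from ciflow:* labels.
# Label names come from the PR template; this mapping is an assumption, not
# the repository's actual CI logic. Notebook and recipe labels are ignored.

# Markers the PR pipeline never runs, regardless of labels.
ALWAYS_EXCLUDED = ("multi_gpu", "distributed")


def marker_expression(labels: set[str]) -> str | None:
    """Return a pytest -m expression, or None if CI should be skipped."""
    if "ciflow:skip" in labels:
        return None
    excluded = list(ALWAYS_EXCLUDED)
    # Slow single-GPU tests are opt-in via ciflow:slow or ciflow:all.
    if not labels & {"ciflow:slow", "ciflow:all"}:
        excluded.append("slow")
    return " and ".join(f"not {marker}" for marker in excluded)
```

With no labels applied, the sketch yields pytest -m "not multi_gpu and not distributed and not slow". This matches the default of running only basic unit tests.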

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.

If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123).
If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an /ok to test comment on the pull request to trigger CI. This must be repeated for each new commit.

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of these commands:

@coderabbitai review - Triggers a standard review
@coderabbitai full review - Triggers a comprehensive review

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

I have tested these changes locally
I have updated the documentation accordingly
I have added/updated tests as needed
All existing tests pass successfully

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

coderabbitai · 2026-04-01T00:59:24Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: faf92919-1b0c-407d-a8f4-2afc40fae5ed

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.


2. Implement FP8/MXFP8/NVFP4 via _scaled_mm → 7x slower (no batched FP8 GEMM in PyTorch)
3. Try CUDA graphs → still slow (512 kernels vs 1)
4. Try quantize-dequantize → works but adds overhead for no real benefit
5. Realize BF16 bmm is 0.03 ms and nothing beats it
6. Delete the .float() upcast

Signed-off-by: Jonathan Mitchell <jomitchell@s1019-0204.ipp1a1.colossus.nvidia.com>
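The steps above end with a plain BF16 batched matmul. The following is a minimal sketch of that idea. It assumes an AlphaFold2-style "outgoing" triangular multiplication, z[b, i, j, c] = sum_k left[b, i, k, c] * right[b, j, k, c], on [B, N, N, C] pair tensors. The PR does not show MiniFold's exact formulation, and all function names here are hypothetical. The sketch contrasts a single torch.bmm over all B*C slices with a loop that issues one GEMM per slice. That loop is plausibly the kind of many-small-kernels pattern the CUDA-graph note refers to. The batched version also keeps the input dtype instead of upcasting with .float().

```python
import torch


def trimul_einsum(left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
    """Reference: z[b, i, j, c] = sum_k left[b, i, k, c] * right[b, j, k, c]."""
    return torch.einsum("bikc,bjkc->bijc", left, right)


def trimul_bmm(left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
    """Same contraction as one batched GEMM in the input dtype (no .float() upcast)."""
    B, N, _, C = left.shape
    # [B, N, N, C] -> [B, C, N, N] -> [B*C, N, N]: one matrix per (example, channel).
    l2 = left.permute(0, 3, 1, 2).reshape(B * C, N, N)   # [B*C, i, k]
    r2 = right.permute(0, 3, 1, 2).reshape(B * C, N, N)  # [B*C, j, k]
    z = torch.bmm(l2, r2.transpose(1, 2))                # [B*C, i, j]
    return z.reshape(B, C, N, N).permute(0, 2, 3, 1)     # back to [B, N, N, C]


def trimul_looped(left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
    """One GEMM per (example, channel) slice: B*C separate matmul calls."""
    B, N, _, C = left.shape
    out = torch.empty(B, N, N, C, dtype=left.dtype, device=left.device)
    for b in range(B):
        for c in range(C):
            out[b, :, :, c] = left[b, :, :, c] @ right[b, :, :, c].T
    return out
```

All three functions compute the same values. On a GPU, timing trimul_bmm on BF16 inputs against trimul_looped shows the launch-count gap directly. The sketch makes no claim about reproducing the 0.03 ms figure.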

copy-pr-bot · 2026-04-08T19:27:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


jomitchellnv added 5 commits March 31, 2026 15:02

adds minifold model

ceb27af

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds equivalence tests

f5967e6

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds dataset support for minifold

44252c4

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds dataset stuff

c4e88bc

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds wandb

a0e3f55

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

jomitchellnv requested review from cspades, dorotat-nv, jstjohn, jwilber, pstjohn, savitha-eng and trvachov as code owners April 1, 2026 00:56

jomitchellnv and others added 16 commits March 31, 2026 18:12

adds eval script

6d47468

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds data

bba6ad1

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

dockerignore

199c342

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

X

cc8cfb4

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

fp32 master weights

7c6b43d

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

fixed devcontainer reqs

ec76ecc

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds fp32 params dtype to miniformer

d596ab8

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

no weight decay

f6e077f

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

x

e29dc1a

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

enables layer-wise and component-wise low precision targeting

0500772

sets layer components low precision to false except ffn

50bc59f

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

removes torch.autocast and uses fsdp2 for autocasting to bf16

213ef6e

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

Adds quant stats logging support

57a50b3

Adds unpadded_tps to wandb charts

enables quant stats logging grad underflow heatmaps into wandb

6ef9dcd

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

adds NVFP4 support

cc5424d

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
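Several commits above (fp32 master weights, adds fp32 params dtype to miniformer, removes torch.autocast and uses fsdp2 for autocasting to bf16) describe keeping FP32 parameters while computing in BF16. With FSDP2, this is normally configured by passing MixedPrecisionPolicy(param_dtype=torch.bfloat16, reduce_dtype=torch.float32) to fully_shard through its mp_policy argument. The PR text does not show the exact settings used. The single-device sketch below shows the same idea by hand, using a hypothetical BF16ComputeLinear layer.

```python
import torch
import torch.nn as nn


class BF16ComputeLinear(nn.Module):
    """Hypothetical layer: FP32 master weight, BF16 matmul in the forward pass."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        # Master copy stays in float32; the optimizer updates this tensor.
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * d_in**-0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute with a BF16 working copy; autograd casts the weight
        # gradient back to float32 when it flows through .to().
        return x.to(torch.bfloat16) @ self.weight.to(torch.bfloat16).T
```

The optimizer only ever sees the FP32 parameter, so small updates are not lost to BF16 rounding. An optimizer such as torch.optim.AdamW(layer.parameters(), weight_decay=0.0) would correspond to the "no weight decay" commit.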

jomitchellnv and others added 6 commits April 9, 2026 14:24

better wandb quant stats logs and heatmaps

9a615a2

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

tri-mul cutlass kernel 43% faster

095f154

Signed-off-by: Jonathan Mitchell <jomitchell@ipp1-1334.ipp1a1.colossus.nvidia.com>

adds FP8 Tri Mul attempts

fb7f647

Signed-off-by: Jonathan Mitchell <jomitchell@ipp1-1429.ipp1a1.colossus.nvidia.com>

variant B Code

5d9997c

Signed-off-by: Jonathan Mitchell <jomitchell@ipp1-1334.ipp1a1.colossus.nvidia.com>

status as of april 15 2026

78f5940

Signed-off-by: Jonathan Mitchell <jomitchell@r6515-0097.ipp1a1.colossus.nvidia.com>

cleanup April 15 2026

c086c2f

Signed-off-by: Jonathan Mitchell <jomitchell@r6515-0097.ipp1a1.colossus.nvidia.com>

jomitchellnv wants to merge 27 commits into main from esm2-minifold
