Split vacancies benchmark by ThomasWarford · Pull Request #437 · ddmms/ml-peg

ThomasWarford · 2026-03-23T22:25:23Z

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.

Summary

Based on this paper: Identifying split vacancy defects with machine-learned foundation models and electrostatics

The metrics:

Formation energy of split-vacancy defects from fully ionised defects
Spearman's rank correlation coefficient for the energy ranking of the initial defect structures.
RMSD of MLIP relaxed structures vs DFT relaxed structures. (Note the initial structure for the MLIP relaxation is currently the DFT relaxed structure.)
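To illustrate the ranking metric, here is a minimal, dependency-free sketch of Spearman's rank correlation. It computes the Pearson correlation of average ranks, so tied energies share a rank. This is not the benchmark's own code; `scipy.stats.spearmanr` computes the same quantity in practice. The energy lists in the usage line are made-up placeholders.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # all values tied: correlation undefined
    return cov / (vx * vy) ** 0.5


# Placeholder relative energies (eV) of initial structures: DFT vs an MLIP.
rho = spearman([0.0, 0.12, 0.35, 0.8], [0.0, 0.2, 0.3, 0.9])
```

A value of 1 means the MLIP orders the initial structures exactly as DFT does, even if the absolute energies differ.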

Linked issue

Closes #335

Progress

Calculations
Analysis
Application
Documentation

Testing

New decorators/callbacks

ThomasWarford · 2026-03-23T22:36:14Z

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

ThomasWarford · 2026-03-23T22:41:37Z

Here's how the table looks at the moment

joehart2001 · 2026-03-24T16:08:02Z

Yes, I totally agree. We have some other PRs that could fit in a defects category; for example, #337 creates an interstitial category, so we could combine the two. But it's something we can easily move around anyway.

joehart2001 · 2026-03-24T16:09:32Z

Looking good! Potentially we don't need RMSD as well as MAE? Hopefully we'll soon have an option to switch between different types of error automatically.

ThomasWarford · 2026-03-24T17:59:09Z

The RMSD metric is still subject to change, since it is sensitive to the accuracy of the bulk structure as well as the defect structure (which is what we are interested in).

kavanase · 2026-04-06T17:14:26Z

This looks very nice @ThomasWarford!


kavanase · 2026-04-10T17:10:35Z

I had a look through this code @ThomasWarford. It's very nice!

One comment:
For the max_dist description:

Maximum atomic displacement between the MLIP-relaxed and DFT-relaxed matched
structures, normalised by :math:`(N/V)^{1/3}` (where :math:`N` is the number of
atoms and :math:`V` the cell volume) to give a unitless quantity comparable across
different supercell sizes. Only computed for structure pairs that pass the
StructureMatcher test. The match criterion itself is a normalised max dist below 0.3.

I would suggest changing "across different supercell sizes" to "across different crystal structures". "Different supercell sizes" sounds like this number will differ for a defect in, e.g., a 2x2x2 MgO supercell and a 4x4x4 supercell, which is not the case. Also, just to check: should it not be (V/N)^{1/3}, i.e. the average free length per atom (https://github.com/materialsproject/pymatgen/blob/v2025.6.14/src/pymatgen/analysis/structure_matcher.py#L534)?
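As a worked illustration of the normalisation under discussion: dividing a displacement by the free length per atom, (V/N)^{1/3}, is the same as multiplying it by (N/V)^{1/3}. The two phrasings therefore differ only in whether "normalised by" is read as "divided by" or "multiplied by". The sketch below assumes Cartesian displacement vectors (Å) for already-matched sites are available. The function names are hypothetical, and the 0.3 threshold is the criterion quoted above.

```python
import math


def normalised_max_displacement(displacements, n_atoms, volume):
    """Largest displacement divided by (V/N)^(1/3), the free length per atom.

    displacements: iterable of (dx, dy, dz) Cartesian vectors in Å.
    volume: cell volume in Å^3.
    """
    if n_atoms <= 0 or volume <= 0:
        raise ValueError("n_atoms and volume must be positive")
    length_per_atom = (volume / n_atoms) ** (1 / 3)
    max_disp = max(math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in displacements)
    return max_disp / length_per_atom


def is_match(displacements, n_atoms, volume, threshold=0.3):
    """Hypothetical criterion: normalised max displacement below the threshold."""
    return normalised_max_displacement(displacements, n_atoms, volume) < threshold


# 8 atoms in a 4 Å cube: V/N = 8 Å^3, so the free length per atom is 2 Å.
d = normalised_max_displacement([(0.5, 0.0, 0.0), (0.1, 0.1, 0.0)], 8, 64.0)
```

Because the displacement is scaled by a per-atom length rather than by the cell size, enlarging the supercell around the same local distortion leaves the value unchanged, which is the reviewer's point about "supercell sizes".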

FYI, if the StructureMatcher scans are very slow to run, the StructureMatcher_scan_stol function in doped provides the same functionality and is orders of magnitude faster.


ThomasWarford · 2026-04-12T10:20:27Z

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped, so the code could be copied with proper attribution (MIT license).

kavanase · 2026-04-12T15:19:14Z

OK, cool! The structure-matching suggestion was only in case it's annoyingly slow.

ThomasWarford · 2026-04-12T15:41:22Z

It seems that about 1/4 of the time is spent on structure matching when running on GPU.

joehart2001 · 2026-04-12T17:08:53Z

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

ThomasWarford · 2026-04-13T08:43:15Z

Here's the app:

Clicking on a violin plot point shows the MLIP relaxed and DFT relaxed structures.

ThomasWarford · 2026-04-13T08:46:17Z

@joehart2001 perhaps it's not worth it for a 25% speedup. If a future PR uses StructureMatcher heavily it might be worth looking at again.
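The size of the possible gain follows from Amdahl's law. With roughly a quarter of the GPU runtime spent on structure matching (the ~1/4 figure reported earlier in this thread), even an arbitrarily fast matcher removes at most that quarter. That caps the whole-run speedup at 1/0.75 ≈ 1.33×, i.e. at most 25% less wall time. The sketch below is generic arithmetic, not a measurement; the stage speedups looped over are illustrative.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: whole-run speedup when only `fraction` of runtime is accelerated."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Assume ~25% of runtime is structure matching; try a few matcher speedups.
for s in (10, 100, 1000):
    print(f"matching {s}x faster -> whole run {overall_speedup(0.25, s):.3f}x faster")
```

Beyond a ~10× faster matcher, the remaining 75% of the run (mostly relaxations) dominates, so further matcher gains barely change the total.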

kavanase · 2026-04-13T13:57:06Z

Looks very nice! One thing that initially surprised me when looking at that table (before remembering the origins): the errors are larger for PBE than for PBEsol, despite the models mostly being PBE models (apart from the R2SCAN one). But this is more due to their compositions (PBE -> nitrides, PBEsol -> oxides here), right?

I see this is shown in the lower violin plot ("Oxides, PBEsol"), but I think it would be best to include it in the table headers too, to minimise confusion. I know this adds more text and could make things messy, but if needed I think using "Oxides/Nitrides" in the headers would be better than "PBEsol/PBE", as composition is the bigger difference here, whereas the slightly different functional choices are not such big factors?

ElliottKasoar · 2026-04-14T16:11:58Z

Generally, I prefer adding a dependency over copying code, since otherwise we miss out on any bug fixes, performance improvements, etc. But if it were, say, a single function we needed and the licensing was fine, copying it over could be considered (I'd still reference it, even if that were not required).

Adding dependencies of course comes with potential conflicts etc. to worry about, but we can also make things optional and declare conflicting dependencies if we need to, so I wouldn't overthink it.
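Here is one way to keep a package like doped optional. This is sketched as an assumption, not as ml-peg's actual mechanism: declare doped as an extra, check for it at runtime, and fall back to pymatgen's StructureMatcher when it is absent. The helper names and the returned backend labels are hypothetical. `importlib.util.find_spec` is standard-library and checks availability without importing the package.

```python
import importlib.util


def has_optional_dependency(module_name):
    """True if top-level `module_name` is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_matching_backend():
    """Hypothetical dispatch: prefer doped if installed, else fall back to pymatgen."""
    # "doped" and "pymatgen" are the real package names discussed in this PR;
    # the string labels returned here are illustrative only.
    if has_optional_dependency("doped"):
        return "doped"
    return "pymatgen"


backend = choose_matching_backend()
```

Keeping the check at call time means users who never install the extra pay nothing for it, while the fast path is used automatically when available.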

(More generally, we need to formalise and potentially revisit some of our decisions with respect to test-specific dependencies, but that's not a job for this!)

ThomasWarford · 2026-04-14T16:19:20Z

@kavanase thanks for pointing this out, I changed it in the latest commit. Materials Project is pretty oxide-heavy, which probably explains it.

ThomasWarford added 21 commits February 3, 2026 13:43

WIP calculations

3b0f613

bugfixes

3c890aa

float32 to float64

4e3d27c

very WIP

ac29957

passed, still many TODOs

358b32b

rename

819918b

remove sv_from_nv_xyz_path for simplicity

b1f9098

Working app

ef2ddbc

update metric names

c1684d2

copy reference structures

8b0f0f4

add rmsd

8dd25b8

tidy

a784828

split by functional

f489af9

slow

fb72af9

calculate rmsd (expensive) in calculate stage

172c814

fix some bugs

2a84003

clean

0eb4a5e

structure matching

32d2575

update description

1d0ddd9

remove structure match metric (always 1)

76c55d4

merge models.yml

a65e1e8

ThomasWarford force-pushed the split_vacancies_benchmark branch from 5a053c7 to a65e1e8 March 23, 2026 22:34

ElliottKasoar added the new benchmark Proposals and suggestions for new benchmarks label Apr 2, 2026

Merge remote-tracking branch 'upstream/main' into split_vacancies_benchmark

1281163

ThomasWarford added 4 commits April 8, 2026 21:57

move code to split vacancy app

07420da

polish + docs

d2ac308

remove todos

15ef457

doc clarifications

919f477

ThomasWarford force-pushed the split_vacancies_benchmark branch from 9c5df6e to 919f477 April 8, 2026 21:49

ThomasWarford added 2 commits April 9, 2026 11:39

Create defects category

afb4d1c

Create defects category

ad52779

ThomasWarford force-pushed the split_vacancies_benchmark branch from afb4d1c to ad52779 April 9, 2026 10:44

ThomasWarford added 3 commits April 9, 2026 12:12

Move matching criteria (max_dist<STOL) to analysis

ff3dc29

improve split vacancy description

a405f5f

not sure why fast forward didnt work

29696f7

ThomasWarford added 5 commits April 10, 2026 19:44

NaN handling

ef03c40

more robust structure matching

0b7f942

matching description

7795251

show info for selected structure

900005d

Merge remote-tracking branch 'upstream/main' into split_vacancies_benchmark

231a16f

warning for unmatched structures

32ac38b

track convergence of relaxations

980a01c

ThomasWarford marked this pull request as ready for review April 13, 2026 08:43

pbesol/pbe -> oxide/nitride

9aff18e

4 participants
