Skip to content

Split vacancies benchmark#437

Open
ThomasWarford wants to merge 45 commits intoddmms:mainfrom
ThomasWarford:split_vacancies_benchmark
Open

Split vacancies benchmark#437
ThomasWarford wants to merge 45 commits intoddmms:mainfrom
ThomasWarford:split_vacancies_benchmark

Conversation

@ThomasWarford
Copy link
Copy Markdown
Contributor

@ThomasWarford ThomasWarford commented Mar 23, 2026

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

Summary

Based on this paper: Identifying split vacancy defects with machine-learned foundation models and electrostatics

The metrics:

  • Formation energy of split-vacancy defects from fully ionised defects
  • Spearman's coefficient for ranking the energies of initial defect structures.
  • RMSD of MLIP relaxed structures vs DFT relaxed structures. (Note the initial structure for the MLIP relaxation is currently the DFT relaxed structure.)

Linked issue

Closes #335

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Testing

New decorators/callbacks

@ThomasWarford ThomasWarford force-pushed the split_vacancies_benchmark branch from 5a053c7 to a65e1e8 Compare March 23, 2026 22:34
@ThomasWarford
Copy link
Copy Markdown
Contributor Author

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

Here's how the table looks at the moment

image

@joehart2001
Copy link
Copy Markdown
Collaborator

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

Yes i totally agree. We have some other PRs that could maybe fit in a defects category, like #337 creates an interstitial category, so we could merge you two. but its something we can move around easily anyway

@joehart2001
Copy link
Copy Markdown
Collaborator

Here's how the table looks at the moment

image

Looking good, potentially don't need rmsd as well as MAE? hopefully we'll have an option soon to switch between different types of errors automatically.

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

The RMSD metric is still subject to change, since it is sensitive to the accuracy of the bulk structure as well as the defect structure (which is what we are interested in).

@ElliottKasoar ElliottKasoar added the new benchmark Proposals and suggestions for new benchmarks label Apr 2, 2026
@kavanase
Copy link
Copy Markdown

kavanase commented Apr 6, 2026

This looks very nice @ThomasWarford!

@ThomasWarford ThomasWarford force-pushed the split_vacancies_benchmark branch from 9c5df6e to 919f477 Compare April 8, 2026 21:49
@ThomasWarford ThomasWarford force-pushed the split_vacancies_benchmark branch from afb4d1c to ad52779 Compare April 9, 2026 10:44
@kavanase
Copy link
Copy Markdown

I had a look through this code @ThomasWarford. It's very nice!

One comment:
For the max_dist description:

Maximum atomic displacement between the MLIP-relaxed and DFT-relaxed matched
structures, normalised by :math:(N/V)^{1/3} (where :math:N is the number of
atoms and :math:V the cell volume) to give a unitless quantity comparable across
different supercell sizes. Only computed for structure pairs that pass the
StructureMatcher test. The match criterion itself is a normalised max dist below 0.3.

I would suggest changing "across different supercell sizes" to "across different crystal structures". 'Different supercell sizes' sounds like this number will differ for a defect in e.g. a 2x2x2 MgO supercell and a 4x4x4 supercell, which is not the case. Also just to check, is it not (V/N)^{1/3} ? i.e. average free length per atom (https://github.com/materialsproject/pymatgen/blob/v2025.6.14/src/pymatgen/analysis/structure_matcher.py#L534)

Fyi, if the StructureMatcher scans are very slow to run, the StructureMatcher_scan_stol function in doped gives the same functionalities and orders of magnitude faster.

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

@kavanase
Copy link
Copy Markdown

Ok cool! The structure matching stuff was just if it's annoyingly slow

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

It seems about ~1/4 of the time is spent structure matching, when running on GPU

@joehart2001
Copy link
Copy Markdown
Collaborator

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

ThomasWarford commented Apr 13, 2026

Here's the app:
image

Clicking on a violin plot point shows the MLIP relaxed and DFT relaxed structures.

@ThomasWarford ThomasWarford marked this pull request as ready for review April 13, 2026 08:43
@ThomasWarford
Copy link
Copy Markdown
Contributor Author

@joehart2001 perhaps it's not worth it for a 25% speedup. If a future PR uses StructureMatcher heavily it might be worth looking at again.

@kavanase
Copy link
Copy Markdown

Looks very nice! One thing that was initially surprising to me looking at that table (before remembering the origins) -- the errors are larger for PBE than PBEsol, despite the models mostly all being PBE models (outsite the R2SCAN one). But this is more due to their compositions (PBE -> nitrides, PBEsol -> oxides here) right?

I see this is shown in the lower violin plot ("Oxides, PBEsol"), but I think it would be best if that could be included in the table headers, just to minimise confusion about this. I know this is more text to add so could make it messy, but if needed I think using "Oxides/Nitrides" in the headers would be better than "PBEsol/PBE" as it's the former that is the bigger difference here, whereas the slightly different functional choices are not so big factors?

@ElliottKasoar
Copy link
Copy Markdown
Collaborator

ElliottKasoar commented Apr 14, 2026

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.
I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?
It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

Generally I prefer adding a dependency over copying code, otherwise we miss out on any bug fixes, performance improvements, etc., but if it were say a single function we needed and licensing was fine, copying things over could be considered (I'd still reference it, even if it were not required).

Adding dependencies of course comes with potential conflicts etc. to worry about, but we can also make things optional and declare conflicting dependencies if we need to as well, so I also wouldn't overthink it.

(More generally, we need to formalise and potentially revisit some of our decisions with respect to test-specific dependencies, but that's not a job for this!)

@ThomasWarford
Copy link
Copy Markdown
Contributor Author

@kavanase thanks for pointing this out, I changed it in the latest commit. Materials project is pretty oxide-heavy, which probably explains it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new benchmark Proposals and suggestions for new benchmarks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Split Vacancy Defect Tests

4 participants