Methodology

Reference data

The DFT ground truth is the ASSYST campaign: ~7M structures stored in the dft_surrogate_mlip_assyst Postgres database. Convention details are captured in [our wiki](https://… link TBD).

Per-element offset fit

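Each potential's raw MLIP energy can be off the DFT scale by an element-dependent constant. We fit per-element shifts Δμᵢ to minimise the sum over all structures of [(E_MLIP − E_DFT) − Σᵢ Nᵢ · Δμᵢ]², where Nᵢ is the number of atoms of element i in the structure. The fit is solved via the normal equations (AᵀA)μ = Aᵀb, where A holds the per-structure element counts Nᵢ and b the energy differences E_MLIP − E_DFT. Two flavours are computed: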

The site's primary view uses the global correction.
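The least-squares fit described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function name `fit_element_offsets`, the toy composition matrix, and the energies are all hypothetical, and energies are assumed to be total (not per-atom) values in eV.

```python
import numpy as np

def fit_element_offsets(counts, e_mlip, e_dft):
    """Solve (A^T A) mu = A^T b for the per-element shifts.

    counts : (n_structures, n_elements) array, A[s, i] = N_i in structure s
    e_mlip, e_dft : total energies per structure (assumed eV)
    """
    A = np.asarray(counts, dtype=float)
    b = np.asarray(e_mlip, dtype=float) - np.asarray(e_dft, dtype=float)
    # Normal equations; requires A^T A to be non-singular.
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical toy data: two elements, four structures.
counts = np.array([[2, 0], [0, 2], [1, 1], [3, 1]])
true_mu = np.array([0.5, -1.2])            # shifts used to build the toy data
e_dft = np.array([-16.0, -21.0, -18.5, -35.0])
e_mlip = e_dft + counts @ true_mu          # MLIP energies off by exact shifts

mu = fit_element_offsets(counts, e_mlip, e_dft)
residual = (e_mlip - e_dft) - counts @ mu  # what remains after correction
```

If an element never occurs, or two elements always occur in a fixed ratio, AᵀA is singular and `np.linalg.solve` raises an error; `np.linalg.lstsq` is a common alternative that returns a minimum-norm solution in that case.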

Outlier filter

Before fitting and before reporting per-element RMSE/MAE, structures with |(E_MLIP − E_DFT) / n_atoms| ≥ 50 eV/atom are dropped. These are treated as MLIP numerical failures rather than real reference offsets; without the filter, a single runaway structure can inflate a per-element RMSE by 10–100×.
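A minimal sketch of this filter, assuming total energies in eV held in NumPy arrays. The function name `keep_mask` and the sample values are hypothetical; only the 50 eV/atom threshold and the ≥ comparison come from the text above.

```python
import numpy as np

THRESHOLD_EV_PER_ATOM = 50.0  # from the text: drop if |dE/atom| >= 50

def keep_mask(e_mlip, e_dft, n_atoms, threshold=THRESHOLD_EV_PER_ATOM):
    """Return True for structures that survive the outlier filter."""
    de_per_atom = (np.asarray(e_mlip, dtype=float)
                   - np.asarray(e_dft, dtype=float)) / np.asarray(n_atoms, dtype=float)
    # Strict '<' keeps a structure, so a value of exactly 50 is dropped.
    return np.abs(de_per_atom) < threshold

# Hypothetical toy data: the second structure is a runaway prediction.
e_dft = np.array([-10.0, -20.0, -30.0])
e_mlip = np.array([-10.2, 480.0, -29.7])
n_atoms = np.array([2, 4, 6])

mask = keep_mask(e_mlip, e_dft, n_atoms)
kept_e_mlip = e_mlip[mask]
```

The same mask would be applied to every per-structure array (counts, energies, labels) before both the offset fit and the metric computation, so the two steps see the same set of structures.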

Metric definitions

Metric             Definition
E_rmse_corrected   √mean((dE/atom − offset)²) over structures, grouped by structure.element
E_mae_corrected    mean(|dE/atom − offset|), same grouping as E_rmse_corrected
F_rmse_comp        √mean(|F_MLIP − F_DFT|²) over atoms, grouped by atomic species
F_mae_comp         mean(|F_MLIP − F_DFT|), same grouping as F_rmse_comp
F_radial_rmse      √mean((|F_MLIP| − |F_DFT|)²) over atoms; magnitude-only error
F_radial_mae       mean(||F_MLIP| − |F_DFT||), same as F_radial_rmse
F_ang_med_deg      median angle(F_MLIP, F_DFT) per atomic species
S_rmse_voigt       √mean(δs²) over 6 Voigt components of stress, grouped by structure.element
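The force metrics can be illustrated with a short sketch. It assumes that |·| in the definitions denotes the Euclidean norm of a per-atom force vector (the definitions do not say whether a per-component reading is intended), that forces are stored as (n_atoms, 3) arrays, and that atoms with a zero-magnitude force are skipped for the angle. The function name `force_metrics` and the sample forces are hypothetical, and grouping by species is omitted here.

```python
import numpy as np

def force_metrics(f_mlip, f_dft):
    """Compute the force metrics over a set of atoms (no grouping)."""
    f_mlip = np.asarray(f_mlip, dtype=float)
    f_dft = np.asarray(f_dft, dtype=float)
    diff = np.linalg.norm(f_mlip - f_dft, axis=1)   # |F_MLIP - F_DFT| per atom
    mag_mlip = np.linalg.norm(f_mlip, axis=1)
    mag_dft = np.linalg.norm(f_dft, axis=1)
    radial = mag_mlip - mag_dft                     # magnitude-only error
    # The angle is undefined when either force vanishes; skip those atoms.
    valid = (mag_mlip > 0) & (mag_dft > 0)
    cos = np.sum(f_mlip * f_dft, axis=1)[valid] / (mag_mlip * mag_dft)[valid]
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return {
        "F_rmse_comp": float(np.sqrt(np.mean(diff ** 2))),
        "F_mae_comp": float(np.mean(diff)),
        "F_radial_rmse": float(np.sqrt(np.mean(radial ** 2))),
        "F_radial_mae": float(np.mean(np.abs(radial))),
        "F_ang_med_deg": float(np.median(angles)) if angles.size else float("nan"),
    }

# Hypothetical toy data: one atom rotated by 90 degrees, one too strong by 1 eV/Å.
f_dft = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
f_mlip = np.array([[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]])
metrics = force_metrics(f_mlip, f_dft)
```

The first atom shows why the radial and angular metrics are reported separately: its magnitude error is zero while its direction is off by 90°.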

Element grouping caveat

The DB column structure.element is the marker assigned at ingest. It is not consistent across binary datasets — Binary_Fe_Mo rows are labelled Fe, while Binary_Fe_B rows are labelled B. We follow this convention to match the upstream offset-fit CSVs. Force metrics use per-atom species (atomic-number-derived) grouping, which is consistent.
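The per-atom grouping used for force metrics can be sketched as below. This is an illustration under assumptions: the function name `force_rmse_by_species`, the array layout, and the sample values are hypothetical, and the force error is taken as the Euclidean norm of the per-atom difference.

```python
import numpy as np

def force_rmse_by_species(atomic_numbers, f_mlip, f_dft):
    """Group per-atom force errors by atomic number and return RMSE per species."""
    z = np.asarray(atomic_numbers)
    err2 = np.sum((np.asarray(f_mlip, dtype=float)
                   - np.asarray(f_dft, dtype=float)) ** 2, axis=1)
    # Each atom contributes to exactly one species bucket, regardless of
    # which label its parent structure was given at ingest.
    return {int(s): float(np.sqrt(err2[z == s].mean())) for s in np.unique(z)}

# Hypothetical toy data: two Fe atoms (Z=26) and one Mo atom (Z=42).
atomic_numbers = np.array([26, 26, 42])
f_dft = np.zeros((3, 3))
f_mlip = np.array([[0.3, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 1.0]])

by_species = force_rmse_by_species(atomic_numbers, f_mlip, f_dft)
```

Because the key is derived from each atom's own atomic number, a Mo atom in a Binary_Fe_Mo structure lands in the Mo bucket even though the structure row is labelled Fe.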