The adoption of machine-learned interatomic potentials (MLIPs) for high-throughput materials screening is hampered by the absence of formal reliability guarantees. The cost is concrete: on a large benchmark, a single MLIP achieved a recall of only 0.07, meaning it missed 93% of the materials that density functional theory (DFT) identifies as stable.
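To make the arithmetic behind those two figures explicit, here is a minimal sketch of how recall is computed for a stability screen. The function name, the material identifiers, and the toy counts are illustrative assumptions chosen to reproduce the reported ratio; they are not the benchmark data.

```python
def recall(dft_stable, mlip_stable):
    """Fraction of DFT-stable materials that the MLIP also flags as stable."""
    dft_stable = set(dft_stable)
    if not dft_stable:
        raise ValueError("no DFT-stable materials to recall")
    hits = dft_stable & set(mlip_stable)
    return len(hits) / len(dft_stable)


# Toy data (hypothetical identifiers, not from the benchmark):
# 100 DFT-stable materials, of which the MLIP recovers 7.
dft = [f"mat-{i}" for i in range(100)]
# The MLIP also flags 3 materials that DFT does not; these are
# false positives and do not affect recall.
mlip = dft[:7] + ["mat-x", "mat-y", "mat-z"]

r = recall(dft, mlip)
missed = 1 - r  # share of DFT-stable materials the screen never surfaces
print(f"recall = {r:.2f}, missed = {missed:.0%}")
```

Recall ignores false positives entirely, so a low value says specifically that stable candidates are being discarded, not that the screen is noisy.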
Bridging the Reliability Gap with Proof-Carrying Materials
The researchers introduce Proof-Carrying Materials (PCM), a framework designed to close this gap. PCM operates in three stages: (1) adversarial falsification across compositional space, which probes for MLIP weaknesses; (2) bootstrap envelope refinement, which uses 95% confidence intervals to quantify uncertainty; and (3) formal certification in Lean 4, which provides machine-checked assurance of the resulting claims.
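The second stage relies on bootstrap confidence intervals. The summary does not specify how PCM constructs its envelope, so the following is only a generic percentile-bootstrap sketch under stated assumptions: the function `bootstrap_ci`, its parameters, and the synthetic error values are hypothetical and are not taken from the PCM implementation.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000,
                 level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1 - level) / 2
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha) * n_resamples))
    return stats[lo_idx], stats[hi_idx]


# Synthetic absolute errors (assumed units and scale, for illustration only).
data_rng = random.Random(1)
errors = [abs(data_rng.gauss(0.05, 0.02)) for _ in range(200)]

lo, hi = bootstrap_ci(errors)
print(f"mean error = {statistics.mean(errors):.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

A fixed seed keeps the interval reproducible, which matters if the bounds are later handed to a proof assistant as fixed numerical claims.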
Uncovering Architecture-Specific Blind Spots
Auditing prominent MLIPs such as CHGNet, TensorNet, and MACE revealed architecture-specific blind spots. The models' errors are nearly uncorrelated with one another (pairwise r ≤ 0.13), so a material on which one model fails says little about where another will fail. Independent Quantum ESPRESSO simulations further validated this finding. The median DFT/CHGNet force ratio was 12x, underscoring the need to audit each model before deploying it in a materials discovery pipeline.
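The pairwise error correlation reported above can be sketched as a Pearson coefficient computed between the per-material error vectors of each pair of models. This is a minimal illustration: the helper names, the placeholder model labels, and the toy error values are assumptions, and the audit's actual error definition is not given in the summary.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_error_correlations(errors_by_model):
    """Map each unordered model pair to the correlation of their errors."""
    return {
        (a, b): pearson(errors_by_model[a], errors_by_model[b])
        for a, b in combinations(sorted(errors_by_model), 2)
    }


# Toy per-material errors for three placeholder models (hypothetical values).
toy_errors = {
    "model_a": [1.0, 0.0, -1.0, 0.0],
    "model_b": [0.0, 1.0, 0.0, -1.0],
    "model_c": [1.0, 1.0, -1.0, -1.0],
}

correlations = pairwise_error_correlations(toy_errors)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In the toy data, `model_a` and `model_b` have uncorrelated errors, which is the situation the audit describes: each model's failures are its own, so agreement between models cannot stand in for a check against DFT.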