When machine learning shows its reasoning, electrocatalyst discovery accelerates


Feb 13, 2026

Machine learning algorithms that output human-readable equations and design rules are transforming how electrocatalysts for clean-energy reactions are screened, identified, and validated across millions of candidates.

(Nanowerk Spotlight) The electrochemical reactions that drive clean-energy technology, from water splitting and fuel-cell oxygen reduction to carbon dioxide conversion and nitrogen fixation, all require catalysts to proceed at useful rates. The best-performing materials still rely on scarce, expensive metals like platinum and iridium, and replacing them demands searching a candidate space that spans dozens of elements, varied crystal structures, multiple dopant atoms, and a growing library of support materials. The combinations number in the millions. No experimental program can cover that territory systematically.

Computational quantum chemistry narrowed the odds. Density functional theory, a method for calculating the electronic structure of atoms and molecules from first principles, lets researchers estimate how strongly a surface grips reaction intermediates, a useful stand-in for catalytic performance, without synthesizing anything. Combined with the computational hydrogen electrode model, these calculations enabled volcano plots: graphs that rank catalysts by a single binding-energy parameter, with the best performers near the peak.

Volcano plots became a standard tool, but they lean on simplifying assumptions. They compress a multi-variable problem into one or two dimensions and treat the binding energies of different intermediates as though they scale linearly with one another. Many real catalysts violate those assumptions. Single isolated metal atoms on two-dimensional supports, for instance, do not behave like extended metal surfaces. The d-band center, the field’s most widely used electronic descriptor, measuring the average energy of a metal’s d-electron states, can even mispredict trends for such systems because it averages away the discrete, directional orbital interactions that actually govern how molecules bind to isolated atomic sites.
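To make the single-descriptor idea concrete, here is a minimal sketch of how a computational-hydrogen-electrode volcano is typically built for the four-electron oxygen reduction reaction. It is a textbook-style illustration under stated assumptions, not a result from the research covered below: the fallback scaling relations it uses (ΔG_OOH ≈ ΔG_OH + 3.2 eV and ΔG_O ≈ 2ΔG_OH) are the commonly cited generic ones, exactly the kind of built-in linearity that breaks down for single-atom sites.

import numpy as np

EQ_POTENTIAL = 1.23  # V, equilibrium potential of O2 + 4H+ + 4e- -> 2H2O

def orr_overpotential(dG_OH, dG_O=None, dG_OOH=None):
    """Thermodynamic ORR overpotential (V) from adsorption free energies (eV).

    If dG_O or dG_OOH are not supplied, fall back on widely used linear scaling
    relations (an assumption: dG_OOH ~ dG_OH + 3.2 eV, dG_O ~ 2 * dG_OH).
    """
    if dG_O is None:
        dG_O = 2.0 * dG_OH
    if dG_OOH is None:
        dG_OOH = dG_OH + 3.2

    # Free-energy change of each proton-electron transfer step at 0 V
    steps = np.array([
        dG_OOH - 4 * EQ_POTENTIAL,   # O2 + H+ + e- + *  -> *OOH
        dG_O - dG_OOH,               # *OOH + H+ + e-    -> *O + H2O
        dG_OH - dG_O,                # *O + H+ + e-      -> *OH
        -dG_OH,                      # *OH + H+ + e-     -> H2O + *
    ])
    limiting_potential = (-steps).min()       # largest potential at which every step stays downhill
    return EQ_POTENTIAL - limiting_potential  # extra voltage beyond the thermodynamic minimum

# Sweeping the single binding-energy descriptor traces out the volcano:
for dG_OH in np.linspace(0.4, 1.6, 7):
    print(f"dG_OH = {dG_OH:.1f} eV -> overpotential = {orr_overpotential(dG_OH):.2f} V")

Where a material stops obeying those assumed scaling lines, the one-dimensional ranking loses its meaning, which is the opening the interpretable, multi-descriptor approaches below aim to fill.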
A new article in Accounts of Materials Research (“Data-Driven Electrocatalyst Discovery: Recent Trends in Machine Learning Approaches and Descriptor-Based Design Principles”), from researchers at the University of Puerto Rico and the Technical University of Munich, surveys how interpretable machine learning and carefully engineered physical descriptors now push past these limitations.

Data-driven electrocatalyst discovery integrates machine learning, data mining, experience-driven descriptors such as the Sabatier volcano, and feature correlation analysis to screen candidates across two-dimensional and three-dimensional material platforms. (Image courtesy of the authors)

The review makes a deliberate case for transparency over raw predictive power. While black-box deep-learning models can achieve strong accuracy on large datasets, their opacity hinders physical insight and slows the translation of predictions into experimentally controllable design levers. Interpretable models, by contrast, reveal which descriptors matter, how they connect to reaction mechanisms, and where conventional global trends fail to capture local exceptions.

“Catalyst discovery needs more than faster prediction,” Prof. Zhongfang Chen, corresponding author of the study, tells Nanowerk. “It needs models that translate data into understanding. By grounding machine learning in physically meaningful descriptors and interpretable rules, we can build workflows that are scalable, mechanistically transparent, and more reliable in guiding experiments.”

The paper builds on the idea that global activity trends and local pockets of exceptional performance demand different analytical tools, and that the greatest gains come from pairing them.

For mapping global trends, the review highlights two methods. LASSO (least absolute shrinkage and selection operator) is a regression technique that penalizes complexity, automatically zeroing out unimportant variables to yield a sparse, readable model. Applied to the nitrogen reduction reaction on single metal atoms at boron vacancies in two-dimensional metal diborides, LASSO compressed geometric and electronic information into one formula combining bond length, bond angle, and the atomic numbers of the metal and substrate. That single number correlated linearly with the free-energy barrier of the rate-limiting step, the slowest stage that caps overall speed, and flagged titanium on vanadium diboride as a top candidate.

SISSO (sure independence screening and sparsifying operator) goes further by constructing entirely new descriptors through algebraic combination of elementary physical quantities, capturing nonlinear relationships that LASSO cannot. For the oxygen reduction reaction on graphene-supported single-atom catalysts, SISSO distilled 29 candidate features into a two-variable global descriptor built from the metal’s d-band center and the formation energy of the nonmetal support.

For uncovering local exceptions, the review turns to subgroup discovery (SGD). Rather than fitting one curve to all data, SGD searches for small subsets with unusually high performance and returns human-readable threshold rules that define them. In nickel-based metal-organic frameworks, porous crystalline materials built from metal nodes and organic linkers, SGD identified four conditions tied to orbital occupancy and ionization energy that reliably flagged low-overpotential oxygen evolution catalysts. Overpotential here means the extra voltage wasted beyond the thermodynamic minimum needed to drive the reaction.

A separate descriptor prominent in the same study was the occupancy of e_g orbitals at the active metal site, which directly controls how strongly intermediates bind: too few e_g electrons and the site binds intermediates too strongly, too many and it binds them too weakly. The paper showed a clear volcano-shaped relationship between e_g filling and overpotential, capturing electronic detail the d-band center alone misses.

These methods prove most effective in sequence. In a study of more than 10,000 graphene-based single-atom catalysts for oxygen reduction, SISSO first mapped the broad activity landscape. SGD then applied threshold rules to isolate a high-performance subgroup. The target single-atom catalyst, a cobalt center coordinated by two sulfur and two nitrogen atoms on a graphene support, was synthesized and delivered strong performance with long-term stability, illustrating how abstract shortlists translate into real, testable materials.

Database-scale screening amplifies these methods. More than 6,300 two-dimensional materials from the 2DMatPedia repository were filtered by synthesizability, conductivity, and the oxygen adsorption free energy. That single thermodynamic descriptor sorted candidates by reaction pathway and shortlisted 24 oxygen reduction and two oxygen evolution catalysts with predicted activities rivaling platinum and iridium oxide benchmarks.
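In code, such a descriptor-driven screening funnel can be sketched in a few lines. The snippet below is a hypothetical illustration only: the column names, thresholds, and target window for the oxygen adsorption free energy are placeholders chosen for readability, not values from the 2DMatPedia study or the Account.

import pandas as pd

# Hypothetical candidate table; in practice these columns would be pulled from
# a materials database such as 2DMatPedia (all names and numbers are placeholders).
candidates = pd.DataFrame({
    "material":        ["A", "B", "C", "D"],
    "e_above_hull_eV": [0.02, 0.15, 0.01, 0.00],  # thermodynamic stability as a synthesizability proxy
    "band_gap_eV":     [0.0, 1.2, 0.0, 0.0],      # near-zero gap taken as a conductivity filter
    "dG_O_eV":         [2.4, 2.6, 3.9, 1.1],      # oxygen adsorption free energy from DFT
})

# Stage 1: keep only candidates that are plausibly synthesizable and conductive.
screened = candidates[
    (candidates["e_above_hull_eV"] < 0.05) & (candidates["band_gap_eV"] < 0.1)
]

# Stage 2: rank survivors by how close the single thermodynamic descriptor sits
# to an assumed near-optimal binding value (the Sabatier principle in one number).
OPTIMAL_DG_O = 2.5  # eV, illustrative target only
shortlist = screened.assign(score=(screened["dG_O_eV"] - OPTIMAL_DG_O).abs()).sort_values("score")

print(shortlist[["material", "dG_O_eV", "score"]])

The same pattern, hard filters first and then ranking by a physically meaningful descriptor, is what turns a database of thousands of entries into a shortlist small enough to hand to experimentalists.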
A “progressive learning” framework tackled a different challenge: finding rare, high-performance catalysts in sparse data. A first-stage model predicted adsorption energies of three key intermediates; those predictions then fed into a second-stage model for overpotential. Compared with conventional single-pass approaches, this two-step architecture improved sensitivity to rare top performers that standard workflows overlook. The top candidate identified, manganese on ruthenium dioxide, was synthesized and confirmed to deliver high current density with low overpotential in acidic conditions.

For carbon dioxide reduction, the team explored vertical metal dimers on defective graphene in an “inverse sandwich” arrangement. A gradient boosting model identified the difference in first ionization energies between the two metals as the dominant performance driver, a proxy for the dimer’s electron-donation capacity. Screening 784 such dimers flagged 154 stable, highly active candidates.

The review also addresses the gap between computational prediction and structural confirmation. In a hydrogen evolution study, a supervised-learning model trained on simulated X-ray absorption spectra was applied to synchrotron data from real cobalt catalysts. It confirmed that the most active samples contained a large fraction of edge-site cobalt atoms coordinated by two nitrogen atoms, directly linking atomic structure to measured performance.

Faithful modeling of the full electrochemical interface, including solvent, pH, electric double-layer effects, and applied potential, stands as the next major challenge. The authors advocate a tiered strategy: apply high-fidelity, constant-potential calculations to a small, carefully selected subset of promising candidates, then use those calibrated baselines for broader surveys. Multi-fidelity data fusion, combining different levels of computational accuracy with targeted experimental measurements, will be essential for producing descriptors that remain consistent from atomistic active sites to realistic operating conditions.

Dr. Liangliang Xu, first author of the Account, sees artificial intelligence evolving from a screening calculator into an active research partner. “Data-driven methods are changing the research paradigm in catalysis and materials science,” he says. “The next stage is deeper integration, where generative AI and large language models can act as literature-aware agents to help extract synthesis protocols, standardize parameters, and generate executable experimental checklists. Coupled with active learning and automated laboratories, these systems can propose the most informative experiments, execute them with full provenance, and update models in real time under uncertainty, shortening the cycle from ideas to validated materials.”

The case studies in this review already demonstrate pieces of that vision in action: algorithms that generate testable hypotheses, database screens that produce synthesis-ready shortlists, and machine learning models that interpret experimental spectra to confirm predicted structures. What ties them together is a shift in how the field treats artificial intelligence, multiscale modeling, and closed-loop experimental feedback, not as separate tools but as a unified discovery engine. By grounding that engine in algorithms whose reasoning scientists can read, challenge, and build on, the field is compressing the distance between atomic-level understanding and working catalysts for sustainable energy.


By Michael Berger – Michael is author of four books by the Royal Society of Chemistry:
Nano-Society: Pushing the Boundaries of Technology (2009),
Nanotechnology: The Future is Tiny (2016),
Nanoengineering: The Skills and Tools Making Technology Invisible (2019), and
Waste not! How Nanotechnologies Can Increase Efficiencies Throughout Society (2025)
Copyright © Nanowerk LLC
