Atomic representation of molecular electronic structure

In quantum chemistry, the wavefunction associated with the electronic Hamiltonian \(\hat H\) is typically expressed by anti-symmetrised products of single-electron functions or molecular orbitals. These are represented in a local atomic orbital basis of spherical atomic functions \(\left| {\psi _m} \right\rangle = \mathop {\sum}

olimits_i {c_m^i\left| {\phi _i} \right\rangle }\) with varying angular momentum. As a consequence, one can write the electronic Schrödinger equation in matrix form

$${\mathbf{Hc}}_m = \epsilon _m{\mathbf{Sc}}_m,$$ (1)

where the Hamiltonian matrix H may correspond to the Fock or Kohn–Sham matrix, depending on the chosen level of theory30. In both cases, the Hamiltonian and overlap matrices are defined as:

$$H_{ij} = \left\langle {\phi _i} \right|\hat H\left| {\phi _j} \right\rangle$$ (2)

and

$$S_{ij} = \left\langle {\phi _i|\phi _j} \right\rangle .$$ (3)

The eigenvalues \(\epsilon _m\) and electronic wavefunction coefficients \(c_m^i\) contain the same information as H and S where the electronic eigenvalues are naturally invariant to rigid molecular rotations, translations or permutation of equivalent atoms. Unfortunately, as a function of atomic coordinates and changing molecular configurations, eigenvalues and wavefunction coefficients are not well-behaved or smooth. State degeneracies and electronic level crossings provide a challenge to the direct prediction of eigenvalues and wavefunctions with ML techniques. We address this problem with a deep learning architecture that directly describes the Hamiltonian matrix in local atomic orbital representation.

SchNOrb deep learning framework

SchNOrb (SchNet for Orbitals) presents a framework that captures the electronic structure in a local representation of atomic orbitals that is common in quantum chemistry. Figure 2a gives an overview of the proposed architecture. SchNOrb extends the deep tensor neural network SchNet31 to represent electronic wavefunctions. The core idea is to construct symmetry-adapted pairwise features \({\mathbf{\Omega }}_{ij}^l\) to represent the block of the Hamiltonian matrix corresponding to atoms i, j. They are written as a product of rotationally invariant (λ = 0) and covariant (λ > 0) components \({\boldsymbol{\omega }}_{ij}^\lambda\) which ensures that – given a sufficiently large feature space – all rotational symmetries up to angular momentum l can be represented:

$${\mathbf{\Omega }}_{ij}^l = \mathop {\prod}\limits_{\lambda = 0}^l {{\boldsymbol{\omega }}_{ij}^\lambda } \quad {\mathrm{with}}\,0 \le l \le 2L$$ (4)

$${\boldsymbol{\omega }}_{ij}^\lambda = \left\{ {\begin{array}{*{20}{l}} {{\kern 1pt} {\kern 1pt} {\mathbf{p}}_{ij}^\lambda \otimes {\mathbb{1}}_D} \hfill & {{\mathrm{for}}\,\lambda = 0} \hfill \\ {\left[ {{\mathbf{p}}_{ij}^\lambda \otimes \frac{{{\mathbf{r}}_{ij}}}{{\left\| {{\mathbf{r}}_{ij}} \right\|}}} \right]{\mathbf{W}}^\lambda } \hfill & {{\mathrm{for}}\,\lambda \, > \, 0} \hfill \end{array}} \right.,$$ (5)

here, r ij is the vector pointing from atom i to atom j, \({\mathbf{p}}_{ij}^\lambda \in {\Bbb R}^B\) are rotationally invariant coefficients and Wλ ∈ R3×D are learnable parameters projecting the features along D randomly chosen directions. This allows to rotate the different factors of \({\mathbf{\Omega }}_{ij}^l \in {\Bbb R}^{B \cdot D}\) relative to each other and further increases the flexibility of the model for D > 3. In case of λ = 0, the coefficients are independent of the directions due to rotational invariance.

Fig. 2 Prediction of electronic properties with SchNOrb. a Illustration of the network architecture. The neural network architecture consists of three steps (grey boxes) starting from initial representations of atom types and positions (top), continuing with the construction of representations of chemical environments of atoms and atom pairs (middle) before using these to predict energy and Hamiltonian matrix respectively (bottom). The left path through the network to the energy prediction E is rotationally invariant by design, while the right pass to the Hamiltonian matrix H allows for a maximum angular momentum L of predicted orbitals by employing a multiplicative construction of the basis ω ij using sequential interaction passes l = 0…2L. The onsite and offsite blocks of the Hamiltonian matrix are treated separately. The prediction of overlap matrix S is performed analogously. b Illustration of the SchNet interaction block32. c Illustration of SchNorb interaction block. The pairwise representation \({\mathbf{h}}_{ij}^l\) of atoms i, j is constructed by a factorised tensor layer f tensor from atomic representations as well as the interatomic distance. Using this, rotationally invariant interaction refinements \({\mathbf{v}}_i^m\) and basis coefficients \({\mathbf{p}}_{ij}^l\) are computed. d Loewdin population analysis for uracil based on the density matrix calculated from the predicted Hamiltonian and overlap matrices. e Mean abs. errors of lowest 20 orbitals (13 occupied + 7 virtual) of ethanol for Hartree–Fock and DFT@PBE. f The predicted (solid black) and reference (dashed grey) orbital energies of an ethanol molecule for DFT. Shown are the last four occupied and first four unoccupied orbitals, including HOMO and LUMO. The associated predicted and reference molecular orbitals are compared for four selected energy levels Full size image

We obtain the coefficients \({\mathbf{p}}_{ij}^\lambda\) from an atomistic neural network, as shown in Fig. 2a. Starting from atom type embeddings \({\mathbf{x}}_i^0\), rotationally invariant representations of atomistic environments \({\mathbf{x}}_i^T\) are computed by applying T consecutive interaction refinements. These are by construction invariant with respect to rotation, translation and permutations of atoms. This part of the architecture is equivalent to the SchNet model for atomistic predictions (see refs. 32,33). In addition, we construct representations of atom pairs i,j that will enable the prediction of the coefficients \({\mathbf{p}}_{ij}^\lambda\). This is achieved by 2L + 1 SchNOrb interaction blocks, which compute the coefficients \({\mathbf{p}}_{ij}^\lambda\) with a given angular momentum λ with respect to the atomic environment of the respective atom pair ij. This corresponds to adapting the atomic orbital interaction based on the presence and position of atomic orbitals in the vicinity of the atom pair. As shown in Fig. 2b, the coefficient matrix depends on pair interactions mlp pair of atoms i, j as well as environment interactions mlp env of atom pairs (i, m) and (n, j) for neighbouring atoms m, n. These are crucial to enable the model to capture the orientation of the atom pair within the molecule for which pair-wise interactions of atomic environments are not sufficient.

The Hamiltonian matrix is obtained by treating on-site and off-site blocks separately. Given a basis of atomic orbitals up to angular momentum L, we require pair-wise environments with angular momenta up to 2L to describe all Hamiltonian blocks

$${\tilde{\mathbf{H}}}_{ij} = \left\{\begin{array}{*{20}{l}} {\mathbf{H}}_{{\mathrm{off}}}\left( \left[ {{\mathbf{\Omega}}_{ij}^l} \right]_{0 \le l \le 2L + 1} \right) \hfill & {{\mathrm{for}}\,i \,

e\, j} \hfill \\ {\mathbf{H}}_{{\mathrm{on}}}\left( \left[ {{\mathbf{\Omega }}_{im}^l} \right]_{\mathop{m

e i}\limits_{0 \le l \le 2L + 1}} \right) \hfill & {{\mathrm{for}}\,i = j }\hfill\end{array}\right.,$$ (6)

The predicted Hamiltonian is obtained through symmetrisation \({\mathbf{H}} = \frac{1}{2}({\tilde{\mathbf{H}}} + {\tilde{\mathbf{H}}}^{{\intercal}} )\). H off and H on are modelled by neural networks that are described in detail in the methods section. The overlap matrix S can be obtained in the same manner. Based on this, the orbital energies and coefficients can be calculated according to Eq. (1). The computational cost of the diagonalisation is negligible for the molecules and basis sets we study here (<1 ms). For large basis sets and molecules, when the diagonalisation starts to dominate the computational cost, our method requires only a single diagonalization instead of one per SCF step. In addition to the Hamiltonian and overlap matrices, we predict the total energy separately as a sum over atom-wise energy contributions, in analogy with the conventional SchNet treatment32 to drive the molecular dynamics simulations.

Learning electronic structure and derived properties

The proposed SchNOrb architecture allows us to perform predictions of total energies, Hamiltonian and overlap matrices in end-to-end fashion using a combined regression loss. We train separate neural networks for several data sets of water as well as ethanol, malondialdehyde, and uracil from the MD17 dataset7. The reference calculations were performed with Hartree-Fock (HF) and density functional theory (DFT) with the PBE exchange correlation functional34. The employed Gaussian atomic orbital bases include angular momenta up to l = 2 (d-orbitals). We augment the training data by adding rotated geometries and correspondingly rotated Hamiltonian and overlap matrices to learn the correct rotational symmetries (see Methods section). Detailed model and training settings for each data set are listed in Supplementary Table 1.

As Supplementary Table 2 shows, the total energies could be predicted up to a mean absolute error below 2 meV for the molecules. The predictions show mean absolute errors below 8 meV for the Hamiltonian and below 1 × 10−4 for the overlap matrices. We examine how these errors propagate to orbital energy and coefficients. Figure 2e shows mean absolute errors for energies of the lowest 20 molecular orbitals for ethanol reference calculations using DFT as well as HF. The errors for the DFT reference data are consistently lower. Beyond that, the occupied orbitals (1–13) are predicted with higher accuracy (<20 meV) than the virtual orbitals (~100 meV). We conjecture that the larger error for virtual orbitals arises from the fact that these are not strictly defined by the underlying data from the HF and Kohn–Sham DFT calculations. Virtual orbitals are only defined up to an arbitrary unitary transformation. Their physical interpretation is limited and, in HF and DFT theory, they do not enter in the description of ground-state properties. For the remaining data sets, the average errors of the occupied orbitals are <10 meV for water and malondialdehyde, as well as 48 meV for uracil. This is shown in detail in Supplementary Fig. 1. The orbital coefficients are predicted with cosine similarities ≥90% (see Supplementary Fig. 2). Figure 2f depicts the predicted and reference orbital energies for the frontier MOs of ethanol (solid and dotted lines, respectively), as well as the orbital shapes derived from the coefficients. Both occupied and unoccupied energy levels are reproduced with high accuracy, including the highest occupied (HOMO) and lowest unoccupied orbitals (LUMO). This trend is also reflected in the overall shape of the orbitals. Even the slightly higher deviations in the orbital energies observed for the third and fourth unoccupied orbital only result in minor deformations. The learned covariance of molecular orbitals for rotations of a water molecule is shown in Supplementary Fig. 3.

The ML model uses about 93 million parameters to predict a large Hamiltonian matrix with >100 atomic orbitals. This size is comparable to state-of-the-art neural networks for the generation of similarly sized images35. Supplementary Table 6 shows the computational costs of calculating the reference data, training the network and predicting Hamiltonians. While training of SchNOrb took about 80 h, performing the required DFT reference calculations remains the bottleneck for obtaining a trained network, in particular for larger molecules. Our approach to predicting Hamiltonian matrices leads to accelerations of 2–3 orders of magnitude.

As SchNOrb learns the electronic structure of molecular systems, all chemical properties that are defined as quantum mechanical operators on the wavefunctions can be computed from the ML prediction without the need to train a separate model. We investigate this feature by directly calculating electronic dipole and quadrupole moments from the orbital coefficients predicted by SchNOrb, as well as the HF total energies for the ethanol molecule. The corresponding mean absolute errors are reported in Supplementary Tables 4 and 5. The calculation of energies and forces from the coefficients requires the evaluation of the core Hamiltonian (HF) or exchange correlation terms (DFT), for which we currently resort to the ORCA code. To avoid this computational overhead and obtain highly accurate predictions for molecular dynamics simulations with mean absolute error below 1 meV, we predict energies and forces directly as a sum of atomic contributions31. Regarding the electrostatic moments, excellent agreement with the electronic structure reference is observed for the majority of molecules (<0.054 D for dipoles and <0.058 D Å for quadrupoles). The only deviation from this trend is observed for uracil, where a loss function minimising only the errors of Hamiltonian and overlap matrices is too limited. The dipole moment depends strongly on the molecular electron density derived from the orbital coefficients, which are never learned directly.

Beyond that, we have studied the prediction accuracy for ethanol when using the larger def2-tzvp basis set which includes f-orbitals (l = 3). While the predictions of the Hamiltonian and overlap matrices remain remarkably accurate with 8.3 meV and 10−6, respectively, the derived properties exhibit large errors, e.g., an MAE of 0.4775 eV for the orbital energies. For large numbers of orbitals, errors in the Hamiltonian can accumulate due to the diagonalisation. This problem could be solved by improving the neural network architecture to further reduce the prediction error or introducing a density dependent term into the loss function, which will be explored in future investigations.

In this case, a similar accuracy as the other methods could in principle be reached upon the addition of more reference data points. The above results demonstrate the utility of combining a learned Hamiltonian with quantum operators. This makes it possible to access a wide range of chemical properties without the need for explicitly developing specialised neural network architectures.

Chemical insights from electronic deep learning

Recently, a lot of research has focused on explaining predictions of ML models36,37,38 aiming both at the validation of the model39,40 as well as the extraction of scientific insight17,31,41. However, these methods explain ML predictions either in terms of the input space, atom types and positions in this case, or latent features such as local chemical potentials31,42. In quantum chemistry however, it is more common to analyse electronic properties in terms of the MOs and properties derived from the electronic wavefunction, which are direct output quantities of the SchNOrb architecture.

Molecular orbitals encode the distribution of electrons in a molecule, thus offering direct insights into its underlying electronic structure. They form the basis for a wealth of chemical bonding analysis schemes, bridging the gap between quantum mechanics and abstract chemical concepts, such as bond orders and atomic partial charges30. These quantities are invaluable tools in understanding and interpreting chemical processes based on molecular reactivity and chemical bonding strength. As SchNOrb yields the MOs, we are able to apply population analysis to our ML predictions. Figure 2d shows Loewdin partial atomic charges and bond orders for the uracil molecule. Loewdin charges provide a chemically intuitive measure for the electron distribution and can e.g., aid in identifying potential nucleophilic or electrophilic reaction sites in a molecule. The negatively charged carbonyl oxygens in uracil, for example, are involved in forming RNA base pairs. The corresponding bond orders provide information on the connectivity and types of bonds between atoms. In the case of uracil, the two double bonds of the carbonyl groups are easily recognisable (bond order 2.12 and 2.14, respectively). However, it is also possible to identify electron delocalisation effects in the pyrimidine ring, where the carbon double bond donates electron density to its neighbours. A population analysis for malondialdehyde, as well as population prediction errors for all molecules can be found in Supplementary Fig. 4 and Supplementary Table 3.

The SchNOrb architecture enables an accurate prediction of the electronic structure across molecular configuration space, which provides for rich chemical interpretation during molecular reaction dynamics. Figure 3a shows an excerpt of a molecular dynamics simulation of malondialdehyde that was driven by atomic forces predicted using SchNOrb. It depicts the proton transfer together with the relevant MOs and the electronic density. Supplementary Video 1 shows a side-by-side comparison between the predicted and reference HOMO-2 orbital during this excerpt of the trajectory. The density paints an intuitive picture of the reaction as it migrates along with the hydrogen. This exchange of electron density during proton transfer is also reflected in the orbitals. Their dynamical rearrangement indicates an alternation between single and double bonds. The latter effect is hard to recognise based on the density alone and demonstrates the wealth of information encoded in the molecular wavefunctions.

Fig. 3 Proton transfer in malondialdehyde. a Excerpt of the MD trajectory showing the proton transfer, the electron density as well as the relevant MOs HOMO-2 and HOMO-3 for three configurations (I, II, III). b Forces exerted by the MOs on the transferred proton for configurations I and II. c Density of states broadened across the proton transfer trajectory. MO energies of the equilibrium structure are indicated by grey dashed lines. The inset shows a zoom of HOMO-2 and HOMO-3 Full size image

Figure 3b depicts the forces the different MOs exert onto the hydrogen atom exchanged during the proton transfer. All forces are projected onto the reaction coordinate, where positive values correspond to a force driving the proton towards the product state. In the initial configuration I, most forces lead to attraction of the hydrogen atom to the right oxygen. In the intermediate configuration II, orbital rearrangement results in a situation where the majority of orbital force contributions on the hydrogen atom become minimal, representing mostly non-bonding character between oxygens and hydrogen. One exception is MO 13, depicted in the inset of Fig. 3b. Due to a minor deviation from a symmetric O–H–O arrangement, the orbital represents a one-sided O–H bond, exerting forces that promote the reaction. The intrinsic fluctuations during the proton transfer molecular dynamics are captured by the MOs as can be seen in Fig. 3c. This shows the distribution of orbital energies encountered during the reaction. As would be expected, both HOMO-2 and HOMO-3 (inset, orange and blue respectively), which strongly participate in the proton transfer, show significantly broadened peaks due to strong energy variations in the dynamics. This example nicely shows the chemically intuitive interpretation that can be obtained by the electronic structure prediction of SchNOrb.

Deep learning-enhanced quantum chemistry

An essential paradigm of chemistry is that the molecular structure defines chemical properties. Inverse chemical design turns this paradigm on its head by enabling property-driven chemical structure exploration. The SchNOrb framework constitutes a suitable tool to enable inverse chemical design due to its analytic representation of electronic structure in terms of the atomic positions. We can therefore obtain analytic derivatives with respect to the atomic positions, which provide the ability to optimise electronic properties. Figure 4a shows the minimisation and maximisation of the HOMO-LUMO gap \(\epsilon _{{\mathrm{gap}}}\) of malondialdehyde as an example. We perform gradient descent and ascent from a randomly selected configuration r ref until convergence at r min and r max , respectively. We are able to identify structures which minimise and maximise the gap from its initial 3.15 to 2.68 eV at r min and 3.59 eV at r max . While in this proof of concept these changes were predominantly caused by local deformations in the carbon–carbon bonds indicated in Fig. 4a, they present an encouraging prospect how electronic surrogate models such as SchNOrb can contribute to computational chemical design using more sophisticated optimisation methods, such as alchemical derivatives43 or reinforcement learning44.

Fig. 4 Applications of SchNOrb. a Optimisation of the HOMO-LUMO gap. HOMO and LUMO with energy levels are shown for a randomly drawn configuration of the malonaldehyde dataset (centre) as well as for configurations that were obtained from minimising or maximising the HOMO-LUMO gap prediction using SchNOrb (left and right, respectively). For the optimised configurations, the difference of the orbitals are shown in green (increase) and violet (decrease). The dominant geometrical change is indicated by the black arrows. b The predicted MO coefficients for the uracil configurations from the test set are used as a wavefunction guess to obtain accurate solutions from DFT at a reduced number of self-consistent-field (SCF) iterations. This reduces the required SCF iterations by an average of 77% using a Newton solver. In terms of runtime, it is more efficient to use SOSCF, even though this saves only 15% of iterations for uracil Full size image

ML applications for electronic structure methods have usually been one-directional, i.e., ML models are trained to predict the outputs of calculations. On the other hand, models in the spirit of Fig. 1b, such as SchNOrb, offer the prospect of providing a deeper integration with quantum chemistry methods by substituting parts of the electronic structure calculation. SchNOrb directly predicts wavefunctions based on quantum chemistry data, which in turn, can serve as input for further quantum chemical calculations. For example, in the context of HF or DFT calculations, the relevant equations are solved via a self-consistent field approach (SCF) that determines a set of MOs. The convergence with respect to SCF iteration steps largely determines the computational speed of an electronic structure calculation and strongly depends on the quality of the initialisation for the wavefunction. The coefficients predicted by SchNOrb can serve as such an initialisation of SCF calculations. To this end, we generated wavefunction files for the ORCA quantum chemistry package45 from the predicted SchNOrb coefficients, which were then used to initialise SCF calculations. Figure 4b depicts the SCF convergence for three sets of computations on the uracil molecule: using the standard initialisation techniques of quantum chemistry codes, and the SchNorb coefficients with or without a second order solver. Nominally, only small improvements are observed using SchNorb coefficients in combination with a conventional SCF solver. This is due to the various strategies employed in electronic structure codes in order to provide a numerically robust SCF procedure. By performing SCF calculations with a second order solver, which would not converge using a less accurate starting point than our SchNorb MO coefficients, the efficiency of our combined ML and second order SCF approach becomes apparent. Convergence is obtained in only a fraction of the original iterations, reducing the number of cycles by ~77%. Similarly, Supplementary Fig. 5 shows the reduction of SCF iteration by ~73% for malondialdehyde. However, since second-order optimisation steps are more costly, it is more time-efficient to perform conventional SOSCF which reduces the convergence time by 13% and 16% for uracil and malondialdehyde, respectively.

It should be noted, that this combined approach does not introduce any approximations into the electronic structure method itself and yields exactly the same results as the full computation. Another example of integration of the SchNOrb deep learning framework with quantum chemistry, as shown in Fig. 1b, is the use of predicted wavefunctions and MO energies based on Hartree–Fock as starting point for post-Hartree–Fock correlation methods such as Møller–Plesset perturbation theory (MP2). Supplementary Table 5 presents the mean absolute error of an MP2 calculation for ethanol based on wavefunctions predicted from SchNOrb. The associated prediction error for the test set is 83 meV. Compared to the overall HF and MP2 energy, the relative error of SchNOrb amounts to 0.01% and 0.06%, respectively. For the MP2 correlation energy, we observe a deviation of 17%, the reason of which is inclusion of virtual orbitals in the calculation of the MP2 integrals. However, even in this case, the total error only amounts to a deviation of 93 meV.