Variational imaginary time evolution

We focus on many-body systems that are described by Hamiltonians \(H = \mathop {\sum}

olimits_i {\lambda _i} h_i\), with real coefficients, λ i , and observables, h i , that are tensor products of Pauli matrices. We assume that the number of terms in this Hamiltonian scales polynomially with the system size, which is true for many physical systems, such as molecules or the Fermi-Hubbard model. Given an initial state |ψ〉, the normalised imaginary time evolution is defined by

$$\left| {\psi (\tau )} \right\rangle = A(\tau )e^{ - H\tau }\left| {\psi (0)} \right\rangle ,$$ (1)

where \(A(\tau ) = 1/\sqrt {\left\langle {\psi (0)} \right|e^{ - 2H\tau }\left| {\psi (0)} \right\rangle }\) is a normalisation factor. In the instance that the initial state is a maximally mixed state, the state at time τ is a thermal or Gibbs state ρ T=1/τ = e−Hτ/Tr[e−Hτ], with temperature T = 1/τ. When the initial state has a non-zero overlap with the ground state, the state at τ → ∞ is the ground state of H. Equivalently, the Wick rotated Schrödinger equation is,

$$\frac{{\partial \left| {\psi (\tau )} \right\rangle }}{{\partial \tau }} = - (H - E_\tau )\left| {\psi (\tau )} \right\rangle ,$$ (2)

where the term E τ = 〈ψ(τ)|H|ψ(τ)〉 results from enforcing normalisation. Even if |ψ(τ)〉 can be represented by a quantum computer, the non-unitary imaginary time evolution cannot be naively mapped to a quantum circuit.

In our variational method, instead of directly encoding the quantum state |ψ(τ)〉 at time τ, we approximate it using a parametrised trial state \(| {\phi (\vec \theta (\tau ))} \rangle\), with \(\vec \theta (\tau ) = (\theta _1(\tau ),\theta _2(\tau ), \ldots ,\theta _N(\tau ))\). This stems from the intuition that the physically relevant states are contained in a small subspace of the full Hilbert space.38 The trial state is referred to as the ansatz. In condensed matter physics and computational chemistry, a wide variety of ansätze have been proposed for both classical and quantum variational methods.11,20,39,40

Using a quantum circuit, we prepare the trial state, \(| {\phi (\vec \theta )} \rangle\), by applying a sequence of parametrised unitary gates, \(V(\vec \theta ) = U_N(\theta _N) \ldots U_k(\theta _k) \ldots U_1(\theta _1)\) to our initial state, \(\left| {\bar 0} \right\rangle\). We express this as \(| {\phi (\vec \theta )} \rangle = V(\vec \theta )| {\bar 0} \rangle\) and remark that \(V(\vec \theta )\) is also referred to as the ansatz. We refer to all possible states that could be created by the circuit V as the ‘ansatz space’. Here, U k (θ k ) is the kth unitary gate, controlled by parameter θ k , and the gate can be regarded as a single- or two-qubit gate.

To simulate the imaginary time evolution of the trial state, we use McLachlan’s variational principle,41,42

$$\delta \left\| {(\partial /\partial \tau + H - E_\tau )\left| {\psi (\tau )} \right\rangle } \right\| = 0,$$ (3)

where \(\parallel \rho \parallel = {\mathrm{Tr}}[\sqrt {\rho \rho ^\dagger } ]\) denotes the trace norm of a state. By replacing |ψ(τ)〉 with \(| {\phi (\tau )} \rangle = | {\phi (\vec \theta (\tau ))} \rangle\), we effectively project the desired imaginary time evolution onto the manifold of the ansatz space. The evolution of the parameters is obtained from the resulting differential equation

$$\mathop {\sum}\limits_j {A_{ij}} \dot \theta _j = C_i,$$ (4)

where

$$\begin{array}{l}A_{ij} = \Re \left( {\frac{{\partial \left\langle {\phi (\tau )} \right|}}{{\partial \theta _i}}\frac{{\partial \left| {\phi (\tau )} \right\rangle }}{{\partial \theta _j}}} \right)\\ C_i = \Re \left( { - \mathop {\sum}\limits_\alpha {\lambda _\alpha } \frac{{\partial \left\langle {\phi (\tau )} \right|}}{{\partial \theta _i}}h_\alpha \left| {\phi (\tau )} \right\rangle } \right),\end{array}$$ (5)

and h α and λ α are the Pauli terms and coefficients of the Hamiltonian, as described above. The derivation of Eq. (4) can be found in the Supplementary Materials. As both A ij and C i are real, the derivative \(\dot \theta _j\) is also real, as required for parametrising a quantum circuit. Interestingly, although the average energy term E τ appears in Eq. (2), it does not appear in Eq. (4). This is because the ansatz applied maintains normalisation, as it is composed of unitary operators.

Imaginary time evolution with quantum circuits

By following a similar method to that introduced in ref., 28 we can efficiently measure A ij and C i using a quantum computer. We assume that the derivative of a unitary gate U i (θ i ) can be expressed as \(\partial U_i(\theta _i)/\partial \theta _i = \mathop {\sum}{_k {f_{k,i}}} U_i(\theta _i)\sigma _{k,i}\), with unitary operator σ k,i . The derivative of the trial state is given by \(\partial \left| {\phi (\tau )} \right\rangle /\partial \theta _i = \mathop {\sum} {_k {f_{k,i}}} \tilde V_{k,i}\left| {\bar 0} \right\rangle\), with \(\tilde V_{k,i} = U_N(\theta _N) \ldots U_{i + 1}(\theta _{i + 1})U_i(\theta _i)\sigma _{k,i} \ldots U_1(\theta _1)\). There are typically only one or two terms resulting from each derivative. As an example, when U i (θ i ) is a single-qubit rotation \(R_z(\theta _i) = e^{ - i\theta _i\sigma _z/2}\), the derivative \(\partial U_i(\theta _i)/\partial \theta _i = - i/2 \times \sigma _ze^{ - i\theta _i\sigma _z/2}\). The coefficients A ij and C i are given by

$$\begin{array}{l}A_{ij} = \Re \left( {\mathop {\sum}\limits_{k,l} {f_{k,i}^ \ast } f_{l,j}\left\langle {\bar 0} \right|\tilde V_{k,i}^\dagger \tilde V_{l,j}\left| {\bar 0} \right\rangle } \right),\\ C_i = \Re \left( {\mathop {\sum}\limits_{k,\alpha } {f_{k,i}^ \ast } \lambda _\alpha \left\langle {\bar 0} \right|\tilde V_{k,i}^\dagger h_\alpha V\left| {\bar 0} \right\rangle } \right).\end{array}$$ (6)

All of these terms are of the form \(a\Re \left( {e^{i\theta }\left\langle {\bar 0} \right|U\left| {\bar 0} \right\rangle } \right)\) and can be evaluated using the circuits shown in the Supplementary Materials.

With A(τ) and \(\vec C(\tau )\) at time τ, the imaginary time evolution over a small interval δτ can be simulated by evaluating \(\vec{\dot{\theta}}(\tau ) = A^{ - 1}(\tau )\cdot \vec C(\tau )\), and using a suitable update rule, such as the Euler method,

$$\vec \theta (\tau + \delta \tau ) \simeq \vec \theta (\tau ) + \dot{\vec\theta}(\tau )\delta \tau = \vec \theta (\tau ) + A^{ - 1}(\tau )\cdot \vec C(\tau )\delta \tau .$$ (7)

By repeating this process N T = τ total /δτ times, we can simulate imaginary time evolution over a duration τ total . Often, the satisfying parameter evolution is not unique and Eq. (4) is underdetermined. In that case, we can employ truncated singular value decomposition to approximately invert A, or Tikhonov regularisation to additionally constrain the parameters to vary smoothly. We elaborate upon these strategies in the Supplementary Materials.

A limitation of our variational method is that the ansatz may not be able to faithfully describe all states on the desired trajectory, much like its real-time counterpart.28 Even though such states lie in a small subspace of the full Hilbert space,38 it is difficult to prove that they can be generated by a given ansatz, despite promising numerical results.28 However, our numerical results are similarly promising for imaginary time, and demonstrate it to be a robust routine for energy minimisation. Moreover, we believe that when tasked with finding the ground state using imaginary time evolution, a small deviation from the true evolution is less problematic than when trying to simulate real-time evolution. This is because imaginary time evolution always drives a state towards the ground state (or one of the lowest eigenstates), whereas the real-time evolution of two closely separated states may be very different. Consequently, as long as errors due to an imperfect ansatz do not cause the simulation to become trapped in local minima, we do not mind if the evolution deviates from the path of true imaginary time evolution, as ultimately, it will still be driven towards the ground state. Nevertheless, designing ansätze that are well suited to imaginary time evolution is an interesting open problem.

Ground-state energy via imaginary time evolution

We apply our method to the problem of finding the ground-state energy of a many-body Hamiltonian, H. As with the VQE, our goal is to find the values of the parameters, \(\vec \theta\), which minimise the expectation value of the Hamiltonian

$$E_{\min } = \min _{\vec \theta }{\mathrm{ }}\langle {\phi (\vec \theta )} |H| {\phi (\vec \theta )} \rangle ,$$ (8)

where \(| {\phi (\vec \theta )} \rangle = V(\vec \theta )| {\bar 0} \rangle\) is our variational trial state. The VQE solves this problem by using a quantum computer to construct a good ansatz and measure the expectation value of the Hamiltonian, and a classical optimisation routine to obtain new values of the parameters. In order to preserve the exponential speedup of the VQE over classical methods, the trial state is constructed using a number of parameters that scales polynomially with the system size. However, because we may need to consider many possible values for each parameter, the total size of the parameter space still scales exponentially with the system size. Moreover, many optimisation algorithms, such as gradient descent, are liable to becoming trapped in local minima. This combination can make the classical optimisation step of the VQE very difficult.43

As described above, if the initial state has a non-zero overlap with the ground state, true propagation in imaginary time will evolve the system into the ground state, in the limit that τ → ∞. Classically, this has been leveraged as a powerful tool to find the ground-state energy of quantum systems.8,9,11 Using our method, we can efficiently simulate ansatz-based imaginary time evolution to find the ground state, using a quantum computer. In the numerical simulations described below, we use the Euler method to solve differential equations, which corresponds to the update rule for the parameters shown in Eq. (7). We prove in the Supplementary Materials that when δτ is sufficiently small, the average energy of the trial state, \(E(\tau)=\langle\phi(\tau)\vert {H}\vert {\phi}(\tau)\rangle\), always decreases when following the Euler update rule: E(τ + δτ) ≤ E(τ).

In this work, we consider gradient descent, a canonical classical optimisation method

$$\vec \theta (\tau + \delta \tau ) = \vec \theta (\tau ) + \vec G(\tau )\delta \tau = \vec \theta (\tau ) + \vec C(\tau )\delta \tau ,$$ (9)

where \(\vec G(\tau ) = -

abla E(\tau )\) is the gradient of E(τ) and \(\vec C(\tau ) \equiv -

abla E(\tau )\) is the same vector in Eq. (4). Classical optimisation methods only consider information about the average energy, and not about the ansatz itself, which is encoded in the matrix A, used only in variational imaginary time evolution.

Toy example

Here, we present two simple toy examples which highlight the difference between variational imaginary time evolution and gradient descent for finding the ground-state energy of Hamiltonians. Consider the following Hamiltonians

$$H_A = \left( {\begin{array}{*{20}{c}} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}} \right),\;\;H_B = \left( {\begin{array}{*{20}{c}} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}} \right)$$ (10)

with ansätze

$$\left| {\psi _A(\theta _1,\theta _2,\theta _3)} \right\rangle = e^{i\theta _3}CR_Y^{0,1}(\theta _2)R_X^0(\theta _1)\left| {00} \right\rangle ,$$ (11)

$$\left| {\psi _B(\theta _1,\theta _2,\theta _3)} \right\rangle = e^{i\theta _3}CR_Y^{0,1}(\theta _2)R_X^0(\theta _1)R_X^1(\theta _1)\left| {01} \right\rangle ,$$ (12)

prepared by circuits ((A) and (B), respectively). Here, \(CR_Y^{0,1}\) is a controlled Y rotation with control qubit 0 and target qubit 1, \(R_X^q(\theta _1)\) is a rotation of qubit q around the x-axis, and the rotation about the j-axis is \(R_{\sigma _j}(\theta ) = e^{ - i\theta \sigma _j/2}\) with Pauli matrices σ j . Note that θ 3 is a fictitious parameter which corresponds to the global phase. This is present only so that the evolution of the other parameters are not constrained to produce an oscillating global phase in time, as recently studied in ref. 44.

We study the ability of variational imaginary time and gradient descent to navigate the energy landscapes of the toy systems A and B, and present the results in Fig. 1. Figure 1a shows imaginary time robustly discovering the global minima of the energy landscape of system A, while gradient descent becomes trapped in local minima. Figure 1b shows imaginary time performing comparably with gradient descent for system B, despite it being only slightly more complicated than system A. This shows that it is still possible for imaginary time evolution to become trapped in local minima, when the ansatz is not sufficiently powerful for certain Hamiltonians.

Fig. 1 Comparison of variational imaginary time (top plot in each panel) and gradient descent (bottom plot in each panel) discovering the ground state in toy systems A (top panel) and B (bottom panel). The background colour indicates the energy \(\langle\psi(\theta _1,\;\theta _2,\;\theta _3)\vert {H}\vert\psi(\theta _1,\;\theta _2,\;\theta _3)\rangle\) with red and blue corresponding to the global maximum and ground-state energies, respectively. The arrows indicate the trajectories of the methods, and are coloured green if they converge to the true ground state, and red otherwise Full size image

Simulation of H 2 and LiH

We use our method to find the ground-state energy of the H 2 and LiH molecules in their minimal spin–orbital basis sets. We map the molecular fermionic Hamiltonians to qubit Hamiltonians using the procedure described in the Supplementary Materials. The H 2 Hamiltonian acts on two qubits, and considers the space of two electrons in four spin–orbitals. The LiH Hamiltonian acts on eight qubits, and considers an active space of two electrons in eight spin–orbitals. There are numerous possible choices for the ansatz circuit; we use a universal ansatz for H 2 29 and an ansatz inspired by the low-depth circuit ansatz45 for LiH, as shown in the Supplementary Materials. The simulation results for H 2 are shown in Fig. 2. We have used a universal ansatz, which is capable of representing all states along the imaginary time trajectory to confirm that our method can recover true imaginary time evolution, when the ansatz is sufficiently powerful. We attribute deviation from the true evolution to the use of an Euler update rule, and finite step size. Our simulations were able to converge to the ground state in all trials.

Fig. 2 Simulations of H 2 with random initial parameters and timestep δτ = 0.01. The red line is the exact ground-state energy. The dashed black line is the exact imaginary time evolution. The blue line is the variational imaginary time evolution. The inset plot shows the fidelity of variational imaginary time to true imaginary time evolution. Here we consider an internuclear distance of R = 0.75 Å. The inset plot and main plot share the same x-axis label Full size image

We compare the LiH results to those obtained using the VQE, with gradient descent as the classical optimisation routine. We use the low-depth circuit ansatz shown in the Supplementary Materials for our simulation, with 137 parameters. This is approximately a quarter of the number needed in a universal ansatz. We consider starting from a good initial state (the Hartree–Fock state for LiH), and also random initial states. We believe that the latter simulations provide a more thorough test of both methods.

We use the maximum stable step size δτ for each method such that energy monotonically decreases in the first 200 iterations. The stable timestep for imaginary time was 0.225, and for gradient descent it was 0.886. Figure 3 shows the imaginary time method outperforming gradient descent. It is able to locate the ground state more quickly, and accurately. This advantage is most noticeable for the case of random start states, where the obtained convergence rate is significantly higher than gradient descent. This may be most relevant for solving optimisation problems using the QAOA algorithm, where it is often harder to motivate a good initial starting state.

Fig. 3 Noise-free simulations of LiH at an internuclear distance of R = 1.45 Å. Simulations in the top plot begin in a small random perturbation (of at most, Δθ j = π/50) from the Hartree–Fock state. Simulations in the bottom plot begin with uniformly random parameters. The solid lines (against the left axis) indicate the fraction of 1280 simulations which, by the given iteration, have converged to within 1 mHartree of the true ground state. The dashed lines (against the right axis) indicate the average proximity to the true ground state of only the so-far converged simulations. Imaginary time and gradient descent use their maximum stable timesteps Full size image

It is natural to question whether the resource requirements of variational imaginary time evolution are comparable with those of gradient descent. We assess this by performing a simple resource estimation, and by examining its sensitivity to shot noise and to gate errors within the quantum computer. At each iteration, populating the gradient vector requires \({\cal{O}}(N_CN_HN_p)\) measurements, where N C is the number of measurements required to ascertain a Hamiltonian term to the required precision, N H is the number of terms in the Hamiltonian, and N p is the number of parameters used in the ansatz. For imaginary time, the total cost is \({\cal{O}}(N_CN_HN_p + N_p^2N_A)\), where N A is the number of measurements required to ascertain an element of the A matrix to the required precision. Sensible ansätze typically have fewer parameters N p than there are Hamiltonian terms N H (in our LiH simulations, N p = 137 and N H = 181). If the number of Hamiltonian terms is considerably larger than the number of parameters used, then the additional cost \({\cal{O}}(N_p^2N_A)\) of imaginary time can be dominated by the cost of calculating the gradient vector. While this is not true for our LiH simulations, we find that it is possible to further reduce the cost of imaginary time by using fewer measurements for each term in the A matrix than for each element of the gradient vector (N A ≪ N C ), while still maintaining imaginary time’s superior performance. We demonstrate this in Fig. 4, where we vary the number of measurements used to populate A, and simulate the methods under the effect of decoherence. The results show that imaginary time can perform significantly better than gradient descent under the presence of noise, even when significantly fewer measurements are made. However, if the gradient is not known to sufficient accuracy (N C < 2 × 104), the reliability of imaginary time evolution cannot be improved by increasing N A , and can even perform less effectively than gradient descent. Combined with imaginary time’s faster convergence and the tendency of gradient-based methods to become trapped in local minima, we expect finding the ground state to require substantially fewer measurements using imaginary time than gradient descent.