J. Phys. Soc. Jpn. 90, 032001 (2021) [33 Pages]

Hybrid Quantum-Classical Algorithms and Quantum Error Mitigation

+ Affiliations
1NTT Secure Platform Laboratories, NTT Corporation, Musashino, Tokyo 180-8585, Japan2Department of Materials, University of Oxford, Parks Road, Oxford OX1 3PH, United Kingdom3Center on Frontiers of Computing Studies, Department of Computer Science, Peking University, Beijing 100871, China4Stanford Institute for Theoretical Physics, Stanford University, Stanford, CA 94305, U.S.A.

Quantum computers can exploit a Hilbert space whose dimension increases exponentially with the number of qubits. In experiment, quantum supremacy has recently been achieved by the Google team by using a noisy intermediate-scale quantum (NISQ) device with over 50 qubits. However, the question of what can be implemented on NISQ devices is still not fully explored, and discovering useful tasks for such devices is a topic of considerable interest. Hybrid quantum-classical algorithms are regarded as well-suited for execution on NISQ devices by combining quantum computers with classical computers, and are expected to be the first useful applications for quantum computing. Meanwhile, mitigation of errors on quantum processors is also crucial to obtain reliable results. In this article, we review the basic results for hybrid quantum-classical algorithms and quantum error mitigation techniques. Since quantum computing with NISQ devices is an actively developing field, we expect this review to be a useful basis for future studies.

©2021 The Physical Society of Japan


1. Introduction   2

2. Basic Variational Quantum Algorithms   2

 2.1 Variational quantum eigensolver   3

 2.2 Real and imaginary time evolution quantum simulator   4

3. Variational Quantum Optimisation   5

 3.1 Quantum approximate optimisation algorithms   6

 3.2 Variational algorithms for machine learning   6

  3.2.1 Quantum circuit learning   7

  3.2.2 Data-driven quantum circuit learning   7

  3.2.3 Quantum generative adversarial networks   8

  3.2.4 Quantum autoencoder for quantum data compression   8

  3.2.5 Variational quantum state eigensolver   9

 3.3 Variational algorithms for linear algebra   9

 3.4 Excited state-search variational algorithms   10

  3.4.1 Overlap-based method   10

  3.4.2 Quantum subspace expansion   11

  3.4.3 Contraction VQE methods   11

  3.4.4 Calculation of the Green's function   11

 3.5 Variational circuit recompilation   12

 3.6 Variational-state quantum metrology   13

 3.7 Variational quantum algorithms for quantum error correction   13

  3.7.1 Variational circuit compiler for quantum error correction   14

  3.7.2 Variational quantum error corrector (QVECTOR)   14

 3.8 Dissipative-system variational quantum eigensolver   14

 3.9 Other applications   15

4. Variational Quantum Simulation   15

 4.1 Variational quantum simulation algorithms for density matrix   15

  4.1.1 Variational real time simulation for open quantum system dynamics   15

  4.1.2 Variational imaginary time simulation for a density matrix   15

 4.2 Variational quantum simulation algorithms for general processes   16

  4.2.1 Generalised time evolution   16

  4.2.2 Matrix multiplication and linear equations   16

  4.2.3 Open system dynamics   16

 4.3 Gibbs state preparation   17

 4.4 Variational quantum simulation algorithms for estimating the Green's function   17

 4.5 Other applications   17

5. Quantum Error Mitigation   17

 5.1 Extrapolation   18

  5.1.1 Richardson extrapolation   18

  5.1.2 Exponential extrapolation   19

  5.1.3 Methods to boost physical errors   19

  5.1.4 Mitigation of algorithmic errors   20

 5.2 Least square fitting for several noise parameters   20

 5.3 Quasi-probability method   21

 5.4 Quantum subspace expansion   22

 5.5 Symmetry verification   22

 5.6 Individual error reduction   23

 5.7 Measurement error mitigation   23

 5.8 Learning-based quantum error mitigation   24

  5.8.1 Quantum error mitigation via Clifford data regression   24

  5.8.2 Learning-based quasi-probability method   24

 5.9 Stochastic error mitigation   25

 5.10 Combination of error mitigation techniques   25

  5.10.1 Symmetry verification with error extrapolation   25

  5.10.2 Quasi-probability method with error extrapolation   26

  5.10.3 Symmetry verification with quasi-probability method   26

  5.10.4 Combining quasi-probability, symmetry verification and error extrapolation   27

6. Conclusion   27

 Acknowledgments   27

A. Derivation of Eq. (11) for Variational Quantum Simulation   27

B. Hadamard Test and Quantum Circuits for Variational Quantum Simulation   28

C. SWAP Test and Destructive SWAP Test   28

D. Methodologies for Optimisation   29

 D·1 Local cost function   29

 D·2 Hamiltonian morphing optimisation   30

E. Subspace Expansion   30

 References   30

1. Introduction

As the dimension of the Hilbert space of a quantum system increases exponentially with respect to the system size, general quantum systems are, in principle, hard to simulate on a classical computer. For example, systems manipulating tens to hundreds of qubits have been believed to be classically intractable, and they have been proposed for demonstrating quantum advantages over classical supercomputers in the so-called task of “quantum supremacy”.1) In October 2019, Google announced that they had successfully demonstrated quantum supremacy with a 53 qubit device, named Sycamore.2) The dimension of the computational state-space is as large as \(2^{53}\approx 9.0\times 10^{15}\). To sample one output of a quantum circuit on the 53 qubits, it is estimated that a classical supercomputer would need 10,000 years, whilst Sycamore only took 200 s. Although recent efforts have significantly reduced the classical simulation cost,3) classical simulation of a general quantum circuit will certainly become an intractable task as we increase the gate fidelity, the gate depth, or the number of qubits.

While the tasks considered in quantum supremacy are generally mathematically abstract problems, ultimately the field must progress to demonstrate true quantum advantage, i.e., to solve a problem of practical value with superior efficiency using a quantum device. Current quantum hardware only incorporates a small number (tens) of qubits with a non-negligible gate error rate, making it insufficient for implementing conventional quantum algorithms such as Shor's factoring algorithm,4) the phase estimation algorithm,5) and Hamiltonian simulation algorithms.6) These generally require one to accurately control millions of qubits when taking account of fault-tolerance.7)

Before realising a universal fault-tolerant quantum computer, a more feasible scenario for current and near-term quantum computing is the so-called noisy intermediate-scale quantum (NISQ) regime,8) where we control tens to thousands of noisy qubits with gate errors that may be on the order of \(10^{-3}\) or lower. Although NISQ computers are not universal, we may exploit them to solve certain computational tasks, such as chemistry simulation, significantly faster than classical computers, via a combination of quantum and classical computers.915) Intuitively, because a large portion of the computational burden is processed on the classical computer, fully coherent deep quantum circuits may not be required. As both quantum and classical computers are used, such simulation methods are called hybrid quantum-classical algorithms. In addition, to compensate computation errors, quantum error mitigation techniques can be used by a post-processing of the experiment data.14,16,17) Since quantum error mitigation does not necessitate encoding of qubits as full error correction does, it thus contributes to a huge saving of qubits, which is vital for NISQ simulation.

In this review paper, we aim to summarise the most basic ideas of hybrid quantum-classical algorithms and quantum error mitigation techniques. In Sect. 2, we introduce the basic algorithms — the variational quantum eigensolver and variational quantum simulation — for finding a ground state or simulating dynamical evolution of a many-body Hamiltonian.9,14,18,19) In Sect. 3, we show how the variational quantum eigensolver algorithm can be extended for general optimisation problems including machine learning problems, linear algebra problems, excited energy spectra, etc.2033) Meanwhile, we show in Sect. 4 that the variational quantum simulation algorithm may be extended as well for open systems,19,34) general processes,34) thermal states,19) and calculating the Green's function.35) Finally, in Sect. 5, we show several error mitigation methods for suppressing errors in NISQ computing.14,16,17,3647) This review does not cover the application of NISQ computers in solving specific physics problems and we refer to McArdle et al.48) and Cao et al.49) for reviews for its application in quantum computational chemistry, to Bauer et al.50) in quantum materials, etc.

2. Basic Variational Quantum Algorithms

Since NISQ devices can only apply a relatively shallow circuit on a limited number of qubits, conventional quantum algorithms may not be implemented on NISQ devices. Here we consider hybrid quantum-classical algorithms tailored to NISQ computing. Because the algorithms generally use parametrised quantum circuits and variationally update the parameters, they are also called variational quantum algorithms (VQAs).

For implementing VQAs,9,11,12) we first consider the parametrised trial wave function as \begin{equation} |\varphi(\vec{\theta})\rangle = U(\vec{\theta})|\varphi_{\textit{ref}}\rangle, \end{equation} (1) where \(U(\vec{\theta}) = U_{N}(\theta_{N})\cdots U_{k}(\theta_{k})\cdots U_{1}(\theta_{1})\) generally consists of single or two qubit gates, and \(\vec{\theta} = (\theta_{1},\theta_{2},\ldots,\theta_{N})^{T}\) is a vector of real parameters, and \(|\varphi_{\textit{ref}}\rangle\) is the initial state. Typically, when we have \(N_{q}\)-qubit quantum processors, we can choose \(|\varphi_{\textit{ref}}\rangle = |0\rangle^{\otimes N_{q}}\) or any initial state from a classical computation. Here, we refer to \(|\varphi(\vec{\theta})\rangle\) as the ansatz state, and \(U(\vec{\theta})\) as the ansatz circuit. The routine of a variational quantum algorithm typically works by preparing the trial state, measuring the state, and updating the parameters according to a classical algorithm on the measurement results. To circumvent the accumulation of physical errors, we generally assume that the ansatz \(U(\vec{\theta})\) is implemented with a shallow quantum circuit. The schematic figure of VQAs is shown in Fig. 1.

Figure 1. Schematic of variational quantum algorithms. The ansatz state \(|\psi(\vec{\theta})\rangle\) is generated via a short-depth parametrised quantum circuit and measured to extract classical data. The measurement result is fed to a classical computer to update the parameters.

Although there exist a large number of VQAs, they can be generally classified into two categories: variational quantum optimisation (VQO) and variational quantum simulation (VQS). VQO involves optimising parameters under a cost function. For example, when we minimise the energy of the state, i.e., the expectation value of the given Hamiltonian as a cost function, the cost function after optimisation approximates the ground state energy. The corresponding state also approximates the ground state. This is the so-called variational quantum eigensolver9,12) and other VQO algorithms can be similarly designed by properly changing the cost function to other metrics. While variational quantum optimisation aims to optimise a static target cost function, VQS aims to simulate a dynamical process, such as the Schrödinger time evolution of a quantum state.14,19) VQS algorithms can also be applied for optimising a static cost function, such as variational imaginary time simulation, or studying general many-body physics problems. The distinction between variational quantum optimisation and variational quantum simulation is not absolute, and algorithms for problems in one category may be adapted for those in the other category. Before showing how specific VQO or VQS algorithms work for specific tasks, in this section, we first illustrate the most basic VQO algorithm, a variational quantum eigensolver for finding ground state energy, and the most basic VQS algorithms for simulating real and imaginary time simulation.

Variational quantum eigensolver

The variational quantum eigensolver (VQE) is a hybrid quantum-classical algorithm for computing the ground state and the ground state energy of a Hamiltonian H of interest. In the seminal work by Peruzzo et al.,9) the VQE algorithm was theoretically proposed and experimentally demonstrated for finding the ground state energy of the HeH+ molecule using a two-qubit photonic quantum processor. We note that the conventional approach for finding eigenstates of a Hamiltonian with a universal quantum computer is by adiabatic state preparation and quantum phase estimation (QPE).51) We refer to McArdle et al.48) and Cao et al.49) for the review.

The VQE algorithm relies on the Rayleigh–Ritz variational principle, i.e., for any parameterised quantum state \(|\varphi(\vec{\theta})\rangle\), we have \begin{equation} \min_{\vec{\theta}}\langle\varphi(\vec{\theta})|H |\varphi(\vec{\theta})\rangle \geq E_{G}, \end{equation} (2) where \(E_{G}\) is the ground state energy of the Hamiltonian H and the minimisation is over all parameters \(\vec{\theta}\).12) As we will explain shortly, we can efficiently calculate \(\langle\varphi(\vec{\theta})|H |\varphi(\vec{\theta})\rangle\) with quantum processors. Therefore, by optimising parameters \(\vec{\theta}\) via a classical computer, considering \(E(\vec{\theta}) =\langle\varphi(\vec{\theta})|H |\varphi(\vec{\theta})\rangle\) as a to-be-minimised cost function, we can approximate the ground state energy and the ground state.

Now, we explain how to measure \(E(\vec{\theta}) =\langle\varphi(\vec{\theta})|H |\varphi(\vec{\theta})\rangle\) for a Hamiltonian H. As Pauli operators and products of them \(\{I, X, Y, Z \}^{\otimes N_{q}}\) form the complete basis for operators, any Hamiltonian can be expanded as \begin{equation} \begin{split} H &= \sum_{\alpha} f_{\alpha} P_{\alpha},\\ P_{\alpha} &\in \{I, X, Y, Z \}^{\otimes N_{q}}, \end{split} \end{equation} (3) where \(f_{\alpha}\) are real coefficients and \(\{X, Y, Z \}\) are Pauli operators. While an arbitrary \(N_{q}\)-qubit operator may have exponential terms in the expansion, Hamiltonians in reality are generally sparse so that the expansion only has the number of terms polynomial to \(N_{q}\). For example, the Hamiltonian of the 1D Ising model is \begin{equation} H = h\sum_{i} Z_{i} Z_{i+1} + \lambda \sum_{i} X_{i}, \end{equation} (4) where the number of terms is linear to the number of qubits. This also holds true for other systems. For example, to simulate the electronic structure of molecules, we consider the fermionic Hamiltonian \begin{equation} H_{f} = \sum_{ij} t_{ij} a_{i}^{\dagger} a_{j} +\sum_{ijkl} u_{ijkl} a^{\dagger}_{i} a_{k}^{\dagger} a_{l} a_{j}, \end{equation} (5) where \(a_{j}^{\dagger}\) (\(a_{j}\)) denotes the creation (annihilation) operator for a fermion in the j-th orbital, and \(t_{ij}\) and \(u_{ijkl}\) are the one and two electron interactions, which can be efficiently calculated by integrating the basis set wave functions.52,53) The Hamiltonian consists of a polynomial number of terms consisting of products of fermionic operators. They can be further mapped to qubit operators via encoding methods such as Jordan–Wigner, parity, and Bravi–Kitaev encodings.51,54,55) For example, the Jordan–Wigner transformation is defined as \begin{equation} \begin{split} a_{i}^{\dagger} &\rightarrow I^{\otimes i-1} \otimes \sigma^{-} \otimes Z^{\otimes N-i},\\ a_{i} &\rightarrow I^{\otimes i-1} \otimes \sigma^{+} \otimes Z^{\otimes N-i}, \end{split} \end{equation} (6) where \(\sigma^{\pm} = (X\mp i Y)/2\). We can therefore map the fermion Hamiltonian to the form of Eq. (3) with a polynomial number of Pauli operators.

By assuming a Pauli decomposition of the Hamiltonian \(H =\sum_{\alpha} f_{\alpha} P_{\alpha}\), the cost function \(E(\vec{\theta})\) becomes \begin{equation} E(\vec{\theta}) = \sum_{\alpha} f_{\alpha} \langle\varphi(\vec{\theta})|P_{\alpha}|\varphi(\vec{\theta})\rangle, \end{equation} (7) where each term \(\langle\varphi(\vec{\theta})|P_{\alpha}|\varphi(\vec{\theta})\rangle\) is evaluated by calculating the expectation value of each Pauli operator \(P_{\alpha}\). Note that the measurement of \(P_{\alpha}\) can be fully parallelised by using many quantum processors.

After we have obtained \(E(\vec{\theta})\), we update the parameters by using a classical computer to minimise the cost function. As an example, the gradient descent method updates the parameter settings as \begin{equation} \vec{\theta}^{(n+1)} = \vec{\theta}^{(n)}-a \nabla E(\vec{\theta}^{(n)}), \end{equation} (8) where \(\vec{\theta}^{(n)}\) and \(\vec{\theta}^{(n+1)}\) denote the parameters at the n-step and the \(n+1\)-step, respectively, \(a>0\) is a parameter determining the step size, and \(\nabla E(\vec{\theta}^{(n)})\) is the gradient of the cost function at \(\vec{\theta}^{(n)}\). The gradient descent method deterministically decreases the cost function to a local minimum. The concept of the VQE is illustrated in Fig. 2. In practice, it is important to opt for a fast and accurate optimisation method properly to reach the global minimum or feasible solution in a reasonable time. Also, the optimisation method should be robust to physical noise and shot noise in the quantum hardware. In addition to gradient based methods, another type of optimisation method is via direct search of the cost function. While the gradient may be more sensitive to physical noise — it typically vanishes exponentially in the number of qubits,56) direct search is believed to be more robust to physical noise, which may necessitate less repetitions.57) We refer to McArdle et al.48) for the review of classical optimisation algorithms.

Figure 2. Schematic of the variational quantum eigensolver. The expectation values of the Pauli operators \(P_{\alpha}\) are measured for the ansatz state, and the expectation value of the Hamiltonian is measured as a cost function on the classical computer. The cost function is sent to a classical optimiser to update the parameters.

Whether the VQE algorithm works also depends on the choice of the ansatz. To have an efficient quantum simulation algorithm, we need to use a suitable ansatz for the problem. If the ansatz state cannot express the solution, e.g., when the solution is a highly entangled state but the ansatz can only generate low entangled states, it cannot find the correct solution. In literature, several different types of ansätze are proposed for different purposes. For example, the unitary coupled cluster ansatz is known as a suitable physically inspired ansatz for electronic structure problems in chemistry.9,58,59) However, the unitary coupled cluster ansatz generally necessitates a complicated form of quantum circuit with gates applied on multiple number of qubits, where each multi-qubit gate could be decomposed as a sequence of two-qubit gates, and the number of multi-qubit gates is quadratic to the number of qubits and the number of electrons. Since the unitary coupled cluster ansatz involves many general two-qubit gates, it may be hard to implement on noisy quantum devices with short coherence time and limited connectivity. This problem might be circumvented by leveraging so-called “hardware efficient ansätze”. This family of ansätze might be more experimentally feasible,9,10,13) because they are constructed based on realisable demands on connectivity and gate operations that correspond to real quantum devices. However, a hardware efficient ansatz does not reflect the details of the simulated quantum system, and it has been shown that exponentially vanishing gradients (so-called barren plateaus) are liable to occur for randomly initialised parameters.60) There are several other methods proposed to circumvent the vanishing gradient problem.6164) We refer to McArdle et al.48) for a more detailed discussion for ansatz construction.

The disadvantage of the VQE method is that the correctness of the solution relies on the heuristic choice of the ansatz and the optimisation may be caught by a local instead of global minimum. Furthermore, the total number of measurements is \(O(\epsilon^{-2})\) for reaching a precision ϵ due to shot noise,9,12) which is quadratically worse than the conventional QPE algorithm that uses universal quantum computers.

The VQE algorithm has been experimentally demonstrated by several groups.9,10,6570) To date, the hydrogen chain was simulated with a 12-qubit superconducting system.71)

Real and imaginary time evolution quantum simulator

Now we introduce the basic algorithms for variational quantum simulation (AQS), in particular, for simulating real14) and imaginary18) time evolution. The real time evolution of a quantum system can be described via the Schrödinger equation as \begin{equation} \frac{d |\psi(t)\rangle}{dt} = - i H |\psi(t)\rangle, \end{equation} (9) where H is the Hamiltonian and \(|\psi(t)\rangle\) is the time-dependent state. The conventional approach for simulating the evolution is to realize the time evolution \(e^{-iH t}\) as a unitary circuit and the state at time t is obtained by applying the unitary to the state.6,7274) The circuit depth generally increases polynomially with respect to the evolution time t. Instead, variational quantum simulation algorithms assume that the quantum state \(|\psi(t)\rangle\) is represented by an ansatz quantum circuit, \(|\varphi (\vec{\theta}(t))\rangle = U(\vec{\theta}(t))|\varphi_{\textit{ref}}\rangle\), and the time evolution of the Schrödinger equation of the state \(|\psi(t)\rangle\) is mapped to the evolution of the parameters \(\vec{\theta}(t)\).

Different variational principles can be used to have different evolution equations of the parameters. The three most conventional variational principles are — The Dirac and Frenkel variational principle,75,76) McLachlan's variational principle,77) and the time-dependent variational principle.78,79) The Dirac and Frenkel variational principle is not suitable for variational quantum simulation because the equation of the parameters may involve complex solutions, which contradicts with the requirement that parameters are real. Although the other two variational principles both give real solutions, it is shown that the time-dependent variational principle could be more unstable and it cannot be applied for evolution of density matrices and imaginary time evolution. On the contrary, McLachlan's principle generally produces stable solutions and it is also applicable to all the other scenarios beyond real time simulation. We refer a detailed study of the three variational principles to Yuan et al.19)

Now, we focus on McLachlan's variational principle77) and show how to derive the evolution of the parameters that effectively simulates the time evolution. McLachlan's principle requires one to minimise the distance between the ideal evolution and the evolution of the parametrised trial state as \begin{equation} \delta \|(\partial/\partial t + iH)|\varphi (\vec{\theta}(t))\rangle \| = 0, \end{equation} (10) where \(\|{|}\varphi\rangle \| =\langle\varphi|\varphi\rangle\) and δ is the variation over the derivative of the parameters \(\dot{\theta}_{j}\). With real parameters \(\vec{\theta}\), the solution gives the evolution of the parameters \begin{equation} \sum_{j} M_{k,j}\dot{\theta}_{j} = V_{k}, \end{equation} (11) with coefficients \begin{equation} \begin{split} M_{k,j} &= \mathop{\text{Re}}\nolimits \left(\frac{\partial \langle\varphi(\vec{\theta}(t))|}{\partial \theta_{k}}\frac{\partial |\varphi(\vec{\theta}(t))\rangle}{\partial \theta_{j}}\right),\\ V_{k} &= -\mathop{\text{Im}}\nolimits\left(\langle\varphi(\vec{\theta}(t))|H \frac{\partial |\varphi(\vec{\theta}(t))\rangle}{\partial \theta_{k}} \right), \end{split} \end{equation} (12) where \(\mathop{\text{Re}}\nolimits(\cdot)\) and \(\mathop{\text{Im}}\nolimits(\cdot)\) are the real and imaginary parts, respectively. We refer to Appendix A for the derivation.

Next, we describe the variational imaginary time simulation algorithm.18) The normalised Wick-rotated Schrödinger equation80) is obtained by replacing t in Eq. (9) with = \(i\tau\), \begin{equation} \frac{d |\psi(\tau)\rangle}{d\tau} = - (H -\langle H\rangle)|\psi(\tau)\rangle, \end{equation} (13) where \(\langle H\rangle =\langle \psi(\tau)|H|\psi(\tau)\rangle\) is included for preserving the norm of the state \(|\psi(\tau)\rangle\). Notably, the imaginary time evolution can be leveraged for preparing a Gibbs state and for discovering the ground state of quantum systems.18,19) Following the same procedure for real time evolution, we first make use of McLachlan's principle, \begin{equation} \delta \|(\partial/\partial \tau +H-\langle H\rangle)|\varphi(\vec{\theta}(\tau))\rangle \| = 0, \end{equation} (14) which then gives the evolution of the parameters: \begin{equation} \sum_{j} M_{k,j} \dot{\theta}_{j} = C_{k}, \end{equation} (15) with M defined in Eq. (12) and C defined by \begin{equation} C_{k} = -{\mathop{\text{Re}}\nolimits}\left(\langle\varphi(\vec{\theta}(\tau))|H \frac{\partial |\varphi(\vec{\theta}(\tau))\rangle}{\partial \theta_{k}}\right) = - \frac{1}{2} \frac{\partial E(\vec{\theta})}{\partial \theta_{k}}, \end{equation} (16) with \(E(\vec{\theta}) = (\langle\varphi(\vec{\theta}(\tau))|H |\varphi(\vec{\theta}(\tau))\rangle)\). Note that the C vector is related to the gradient of the energy, implying that variational imaginary time simulation can be regarded as a generalisation of the gradient descent method. It has been numerically found that the variational imaginary time simulation algorithm might be less sensitive to local minima in contrast to simple gradient descent methods.18) In addition, when imaginary time evolution does reach a minimum that is not the ground state, it tends to be an excited eigenstate of the Hamiltonian, which can be thus exploited for finding general eigen-spectra.32) It has recently been observed that an equivalent formulation of the variational imaginary time algorithm can be obtained by exploiting the quantum natural gradient approach.8183) Notice that since the M matrix has to be measured, variational imaginary time simulation needs more measurements than the conventional gradient descent method. However, for an increasing system size and simulation time, the number of measurements required for M matrix is asymptotically negligible.84)

Given the current parameters \(\vec{\theta}\), we now show how to efficiently measure the M, V, and C terms with quantum circuits. Suppose \(|\varphi(\vec{\theta}(t))\rangle = U_{N}(\theta_{N})\cdots U_{k}(\theta_{k})\cdots U_{1}(\theta_{1})|\varphi_{\textit{ref}}\rangle\) with the derivative of each unitary \(U_{k}(\theta_{k})\) expressed as \begin{equation} \frac{\partial U_{k}(\theta_{k})}{\partial \theta_{k}} = \sum_{i} g_{k,i} U_{k} \sigma_{k,i}, \end{equation} (17) where \(\sigma_{k,i}\) are unitary operators and \(g_{k,i}\) are complex coefficients. For instance, assuming \(U_{k} (\theta_{k}) = e^{-i\theta_{k} X}\), we have \(\partial U_{k} (\theta_{k})/\partial\theta_{k} = -i U_{k} X\) and hence \(g_{k,i} = -i\) and \(\sigma_{k,i} = X\). Thus, the derivative of the ansatz state is \begin{equation} \frac{\partial |\varphi(\vec{\theta}(t))\rangle}{\partial \theta_{k}} = \sum_{i} g_{k,i} U_{k,i} |\varphi_{\textit{ref}}\rangle, \end{equation} (18) where we defined \begin{equation} U_{k,i} = U_{N} U_{N-1} \cdots U_{k+1} U_{k} \sigma_{k,i} \cdots U_{2} U_{1}. \end{equation} (19) Now each \(M_{k,j}\) can be written as \begin{equation} M_{k,j} = \sum_{i,q}\mathop{\text{Re}}\nolimits (g^{*}_{k,i}g_{j,q} \langle\varphi_{\textit{ref}}| U^{\dagger}_{k,i} U_{j,q} |\varphi_{\textit{ref}}\rangle). \end{equation} (20) Supposing the Hamiltonian is decomposed as \(H =\sum_{\alpha} f_{\alpha} \sigma_{\alpha}\), with real coefficients \(f_{\alpha}\) and Pauli operators \(\sigma_{\alpha}\), we have \(V_{k}\) and \(C_{k}\) as \begin{equation} \begin{split} V_{k} &= - \sum_{i,\alpha} \mathop{\text{Re}}\nolimits (ig^{*}_{k,i} f_{\alpha} \langle\varphi_{\textit{ref}}| U^{\dagger}_{k,i}\sigma_{\alpha}U |\varphi_{\textit{ref}}\rangle),\\ C_{k} &= - \sum_{i,\alpha} \mathop{\text{Re}}\nolimits (g^{*}_{k,i} f_{\alpha} \langle\varphi_{\textit{ref}}| U^{\dagger}_{k,i}\sigma_{\alpha}U |\varphi_{\textit{ref}}\rangle). \end{split} \end{equation} (21) Note that each term constituting M, V, and C can be written as a sum of terms as \begin{equation} a \mathop{\text{Re}}\nolimits (e^{i\theta} \langle\varphi_{\textit{ref}}| \mathcal{V} |\varphi_{\textit{ref}}\rangle), \end{equation} (22) where \(a,\theta \in\mathbb{R}\) depend on the coefficients \(g_{k,i}\) and \(f_{j}\), and \(\mathcal{V}\) is either \(U^{\dagger}_{k,i}U_{j,q}\) or \(U^{\dagger}_{k,i}\sigma_{\alpha}U\). Then we can calculate M, C, and V with the quantum circuits shown in Fig. 3. Refer to Appendix B for a detailed explanation about construction of the quantum circuit. Notice that comprehensive analysis on the sampling cost for M, V, and C has been given by van Straaten and Koczor.84) We summarise the variational algorithm for simulating real (imaginary) time evolution.

Figure 3. Quantum circuits for computing (a) \(\mathop{\text{Re}}\nolimits(e^{i\theta}\langle\varphi_{\textit{ref}}|U_{k,i}^{\dagger} U_{j,q}|\varphi_{\textit{ref}}\rangle)\) and (b) \(\mathop{\text{Re}}\nolimits(e^{i\theta}\langle\varphi_{\textit{ref}}| U_{k,i}^{\dagger}\sigma_{j} U|\varphi_{\textit{ref}}\rangle)\).14,18,19)

The schematic figure is also shown in Fig. 4. We can also simulate time-dependent Hamiltonian evolution by using the time-dependent Hamiltonian at each step. The accuracy of the simulation can be computed at each step by the distance between the evolution of the ansatz state and that of the ideal evolution.19) In the case of the real time simulation, we have \begin{equation} \begin{split} \Delta &= \|(\partial/\partial t + iH)|\varphi (\vec{\theta}(t))\rangle \|^{2}\\ &= \sum_{k,j} M_{k,j} \dot{\theta}_{k} \dot{\theta}_{j} -2 \sum_{k} V_{k} \dot{\theta}_{k} +\langle H^{2}\rangle, \end{split} \end{equation} (23) which is a function of M, V, and \(\langle H^{2}\rangle\). By minimising the simulation error Δ, several strategies have been proposed to adaptively construct the optimal ansatz circuit for variational quantum simulation.85,86) Similar arguments also hold for imaginary time evolution. Note that a variational real time simulation algorithm was demonstrated in an experiment using 4 superconducting qubits.87) In this experiment, adiabatic quantum computing was simulated, which was used for discovering eigenstates of an Ising Hamiltonian.

Figure 4. Schematic of the variational quantum simulation algorithm. The elements of M matrix and V (C) vectors are measured for the ansatz state. The results are sent to the classical computer to solve \(M\dot{\vec{\theta}} = V\) (C) to update the parameters, which will be fed to quantum processors.

3. Variational Quantum Optimisation

In this section, we illustrate several examples of variational optimisation algorithms for different problems. The key idea is to construct a Hamiltonian or a cost function such that the solution of the problem corresponds to the ground state or the minimum of the cost function.

Quantum approximate optimisation algorithms

The quantum approximate optimisation algorithm (QAOA)13) was initially proposed for solving classical optimisation problems. The algorithm works by mapping the classical problem to a Hamiltonian \(H_{P}\) so that the ground state corresponds to the solution. Since we try to solve a classical optimisation problem, and the Hamiltonian \(H_{P}\) is diagonal in the computational basis as \(H_{P} =\sum_{\alpha} f_{\alpha} P_{\alpha}\) where \(P_{\alpha}\in \{I, Z \}^{\otimes N_{q}}\).

As an example, we consider the Boolean satisfiability problem, which aims to find solutions of Boolean variables so that all given clauses in a propositional formula are true. The j-th Boolean variable denoted as \(x_{j}\) takes a value either 1 or 0, each of which corresponds to true and false. The Boolean satisfiability problem consists of \(x_{i}\lor x_{j}\), \(x_{i}\land x_{j}\), and \(\bar{x}\) operations. \(x_{i}\lor x_{j}\) becomes 1 when either of \(x_{i}\) or \(x_{j}\) is 1, \(x_{i}\land x_{j}\) is a product of \(x_{i}\) and \(x_{j}\), and \(\bar{x}\) operations flip the value x. An example of Boolean satisfiability problem is \((x_{1}\lor x_{2})\land (\bar{x}_{1}\lor x_{2})\land (\bar{x}_{1}\lor \bar{x}_{2})\) and the solution that all the clauses are true (with value 1) is \(x_{1} = 0\) and \(x_{2} = 1\).

The general form of this problem (with clauses involving three or more booleans) is generally NP-hard and has a wide range of applications in computer science and cryptography.88) To map the Boolean satisfiability problem to a Hamiltonian, we first get the Hamiltonian with its ground state being the solution of each clause. For example, the Hamiltonian for the first clause \((x_{1}\lor x_{2})\) is \begin{equation} H_{1} = \frac{1}{4} (I-Z_{1})(I-Z_{2}). \end{equation} (24) After having the Hamiltonian \(H_{k}\) for each clause, the Hamiltonian for all clauses is expressed as \(H_{P} =\sum_{k} H_{k}\).

Since the solution corresponds to the ground state of \(H_{P}\), we can try to solve the problem by using a quantum algorithm for searching for the ground state. The variational approach for solving such problems is first studied by Farhi et al.13) who introduced the ansatz \begin{equation} U(\vec{\theta}_{1}, \vec{\theta}_{2}) = \prod_{k = 1}^{D}e^{-i \theta_{1}^{(k)} H_{X}} e^{-i \theta_{2}^{(k)} H_{P}}, \end{equation} (25) with \(H_{X} =\sum_{j = 1}^{N_{q}} X_{j}\), D being the number of repetitions of the ansatz quantum circuit, \(\vec{\theta}_{1} = (\theta_{1}^{(1)},\theta_{1}^{(2)},\ldots)\), and \(\vec{\theta}_{2} = (\theta_{2}^{(1)},\theta_{2}^{(2)},\ldots)\). With initial states \(|{-},-,\ldots\rangle\) and \(|{-}\rangle ={\frac{1}{\sqrt{2}}}(|0\rangle-|1\rangle)\), we obtain the ansatz state \(|\varphi(\vec{\theta}_{1},\vec{\theta}_{2})\rangle = U(\vec{\theta}_{1},\vec{\theta}_{2})|{-},-,\ldots\rangle\). Directly optimising the parameters might be challenging for a fixed Hamiltonian, and Farhi et al.13) suggested to consider gradually changing a Hamiltonian in each step of optimisation as \begin{equation} H(t) = \left(1-\frac{t}{T} \right) H_{X} +\frac{t}{T} H_{P}, \end{equation} (26) with \(H(0) = H_{X}\) and \(H(T) = H_{P}\). Since \(|{-},-,\ldots\rangle\) is the ground state of \(H(0) = H_{X}\) and solution is the ground state of \(H(T) = H_{P}\), the optimisation emulates the process of adiabatic state preparation with \(D\rightarrow\infty\). With sufficiently large D, and adaptive optimisations of the parameters for each optimisation step t, the algorithm may still be able to find the ground state solution.

A recent thorough study of the performance of the QAOA on MaxCut problems can be found in Zhou et al.89) Utilising nonadiabatic mechanisms, a heuristic strategy was proposed to learn the parameters exponentially faster than the conventional approach. Meanwhile, the QAOA was implemented with 40 trapped-ion qubits.90)

Variational algorithms for machine learning

Now we introduce the application of variational quantum algorithms in machine learning. In general, machine learning provides a universal approach to learn the pattern of the given data and to predict or reproduce new data. For example, in supervised learning, the given data are described by \(\{(\vec{x}_{1}, y_{1}), (\vec{x}_{2}, y_{2}),\ldots, (\vec{x}_{N_{D}}, y_{N_{D}})\}\) with \(N_{D}\) being the number of the data. Then we try to construct the model \(y = f(\vec{x})\) so that it predicts the output \(y_{\textit{new}} = f(\vec{x}_{\textit{new}})\) for any new data \(\vec{x}_{\textit{new}}\). Supposing the model \(y = f(\vec{x},\vec{\theta})\) has parameters \(\vec{\theta}\), the training process is to tune the parameters to minimise a cost function, such as \begin{equation} C_{\textit{ML}}(\vec{\theta}) = \sum_{k} |y_{k}- f(\vec{x}_{k}, \vec{\theta}) |^{2}. \end{equation} (27) When the cost function \(C_{\textit{ML}}(\vec{\theta})\) is minimised to a small value, it indicates that the given data are well-modelled. In practice, there are different ways to choose the model. For example, a linear regression model is, \begin{equation} f(\vec{x}, \vec{\theta}) = \vec{w}\cdot \vec{x}+b, \end{equation} (28) with real parameters \(\vec{w}\) and b, and \(\vec{\theta} = (\vec{w}, b)\). In deep leaning, i.e., a multi-layer neural network, the model is described as \begin{equation} \begin{split} f(\vec{x}, \vec{\theta}) &= \mathbf{g}_{N_{d}}\circ \mathbf{u}_{N_{d}}\cdots \circ \mathbf{u}_{2} \circ \mathbf{g}_{1} \circ \mathbf{u}_{1} (\vec{x}),\\ \mathbf{u}_{k} (\vec{x}) &= W_{k} \vec{x}+ \vec{b}_{k}, \end{split} \end{equation} (29) where \(N_{d}\) is the depth of the neural network, \(\mathbf{g}_{k}\) is a nonlinear activation function and \(W_{k}\) and \(\vec{b}_{k}\) are a parametrised matrix and vector, respectively. To circumvent overfitting, we can add the norm of the parameters to the cost function to restrict the degree of freedom of the parameters, which is called regularisation.

Whether machine learning works or not is highly dependent on the choice of the model. Since quantum states could efficiently represent multipartite correlations that admit no efficient classical representation, quantum machine learning protocols are proposed involving quantum neural networks that consist of variational quantum circuits. Consequently, we can leverage a quantum neural network to dramatically enhance the representability of the model.20,21) In addition, as the ansatz circuit is a unitary operator, the norm of the quantum state is necessarily unity, and this constraint may lead to a natural regularisation of the parameters to avoid overfitting.20) The schematic figures for classical and quantum neural networks are shown in Fig. 5.

Figure 5. Comparison between (a) classical neural networks and (b) quantum neural networks used for supervised learning. The figure (b) is the quantum circuit proposed in quantum circuit leaning.20)

Note that there are quantum machine learning algorithms9194) that are based on universal quantum computing without variational quantum circuits. While these algorithms are proven to have exponential speedups over classical algorithms under certain conditions, they generally require a deep quantum circuit and are not suitable for NISQ devices.

In this section, we illustrate five examples of quantum machine learning algorithms. The first two algorithms are introduced for leaning classical data for solving a classical problem; the latter three are introduced for learning quantum data.

Quantum circuit learning

The quantum circuit learning (QCL) algorithm implements supervised learning with a variational quantum circuit instead of a classical neural network.20) In QCL, a state is prepared as \(|\varphi(\vec{x},\vec{\theta})\rangle = U(\vec{\theta})U(\vec{x})|\varphi_{\textit{ref}}\rangle\), and the output \(\{f(\vec{x}_{k},\vec{\theta}) \}\) is generated by measuring the state in a properly chosen basis. The quantum circuit can naturally introduce nonlinearality of the model f. Suppose the data is described as \(\{x_{k}, y_{k}\}\) and the initial state encoded with the information of x is \begin{equation} \rho_{\textit{in}}(x) = \frac{1}{2^{N_{q}}} \bigotimes_{k = 1}^{N _{q}} [I+x X_{k} +\sqrt{1-x^{2}} Z_{k}], \end{equation} (30) which can be generated by applying the rotational-Y gate, \(R_{y}(\sin^{-1} x)\) with \(x\in [-1, 1]\), to each qubit that is initialised in \(|0\rangle\). This state involves higher order terms of x up to the \(N_{q}\)th order, where terms such as \(x\sqrt{1-x^{2}}\) may enhance the learning process. Note that this argument can be naturally generalised to higher dimensional data. In addition, nonlinearality can be introduced via the measurement process and classical post-processing of the measurement outcome. As we have discussed above, two potential benefits of QCL are the representability of multipartite correlations and the unitarity of the quantum circuit for circumventing overfitting. Whether QCL can outperform classical neural network based machine learning still needs further study.

Data-driven quantum circuit learning

Data-driven quantum circuit learning (DDQCL) implements a generative model by using a variational quantum circuit.21) We briefly summarise the classical generative model. Suppose the data are described with \(\{\vec{x}_{1},\vec{x}_{2},\ldots,\vec{x}_{N_{D}} \}\), where \(N_{D}\) is the number of data and we assume \(\vec{x}\) has a binary representation, i.e., \(\vec{x} = \{x_{1},x_{2},\ldots,x_{N_{E}}\}\) with \(x_{j}\in \{-1, 1\}\) and \(N_{E}\) being the length of the binary string. We can assume the data is generated from an unknown probability distribution \(p_{D}(\vec{x})\), and the task of a generative model is to construct a model \(p_{M}(\vec{x}|\vec{\theta})\) to approximate the probability distribution \(p_{D}(\vec{x})\). The joint probability that the data is generated from \(p_{M}(\vec{x}|\vec{\theta})\) is \begin{equation} \mathcal{L}(\vec{\theta}) = \prod_{n = 1}^{N_{D}} p_{M}(\vec{x}_{n}| \vec{\theta}), \end{equation} (31) which is the so-called likelihood function. By maximising \(\mathcal{L}(\vec{\theta})\), we can obtain the optimised model probability distribution for the data. Alternatively, we can adopt the cost function \begin{equation} C_{\textit{DD}}(\vec{\theta}) = - \frac{1}{N_{D}} \sum_{n = 1}^{N_{D}} \log[\max(\varepsilon, p_{M}(\vec{x}_{n}| \vec{\theta}))], \end{equation} (32) where a small value \(\varepsilon>0\) is introduced for avoiding singularities of the cost function. For a classical generative model, an example model \(p_{M}(\vec{x}|\vec{\theta})\) could be generated via a Boltzmann machine on a classical computer.

For DDQCL, the model \(p_{M}(\vec{x}|\vec{\theta})\) is obtained from a quantum computer, for example, by the projection probability \begin{equation} p_{Q}(\vec{x}| \vec{\theta}) = |\langle \vec{x} | \varphi(\vec{\theta})\rangle|^{2}, \end{equation} (33) where \(|\varphi(\vec{\theta})\rangle\) is an ansatz state generated on the variational quantum circuit, \(|\vec{x}\rangle = |x_{1}, x_{2},\ldots, x_{N_{E}}\rangle\) is the computational basis, and the probability is defined according to the Born Rule. The quantum generative model is also referred to as the Born machine.95) Since quantum states can efficiently represent complex multipartite correlations and the model can be efficiently sampled by measuring the prepared state, DDQCL may be able to represent generative models that are classically challenging. In experiment, DDQCL has been applied to successfully learn Greenberger–Horne–Zeilinger states (GHZ) states and coherent thermal states by using a 4-qubit trapped ion device.21) The schematic figure for sampling \(p_{Q}(\vec{x}|\vec{\theta})\) is shown in Fig. 6.

Figure 6. A quantum circuit for sampling \(p_{Q}(\vec{x}|\vec{\theta})\).

Quantum generative adversarial networks

Quantum generative adversarial networks (QuGANs)22,96) are a quantum analogue of generative adversarial networks (GANs). Conventional GANs are composed of three parts — true data, generator, and discriminator, as shown in Fig. 7(a). The generator competes with the discriminator, where the former tries to produce fake data and the latter tries to determine whether the input data are true or fake. By optimising both the generator and the discriminator, the generator can learn the distribution of the true data until the discriminator cannot tell the difference between the true data and the fake data from the generator. GANs are generalised to quantum computing by replacing each part with a quantum system.22) By using a quantum circuit as the generator, we can represent the \(N_{q}\)-dimensional Hilbert space with \(\log N_{q}\) qubits and compute sparse and low-rank matrices with \(O(\text{poly}(\log N_{q}))\) steps. Suppose the true data are described with an ensemble of quantum states as a density matrix \(\rho_{\textit{true}}\), and the discriminator implements a quantum measurement. The schematic figure for quantum GANs is shown in Fig. 7(b).

Figure 7. (Color online) Schematic diagrams for (a) classical GANs and (b) quantum GANs. In classical GANs, the generator and the discriminator consist of classical neural networks, while quantum neural networks are used in quantum GANS.

Suppose the fake density matrix from the generator is produced from a parametrised quantum circuit as \(\rho_{G}(\vec{\theta}_{G})\) with parameters \(\vec{\theta}_{G}\), which aims to learn the true density matrix \(\rho_{\textit{true}}\). By using ancillary qubits, we can express density matrices with pure states and we refer to Dallaire-Demers and Killoran96) for detailed ansatz constructions. The discriminator implements a parametrised positive-operator valued measure \(\{P^{t}(\vec{\theta}_{D}), P^{f}(\vec{\theta}_{D})\}\), with parameters \(\vec{\theta}_{D}\) and \(P^{t}(\vec{\theta}_{D})+P^{f}(\vec{\theta}_{D}) = I\), to distinguish between \(\rho_{\textit{true}}\) and \(\rho_{G}(\vec{\theta}_{G})\). Assuming that \(\rho_{\textit{true}}\) and \(\rho_{G}(\vec{\theta}_{G})\) are sent to the discriminator randomly with equal probability, the probability that the discriminator fails is \begin{align} P_{\textit{fail}} &= \frac{1}{2}(\mathop{\text{Tr}}\nolimits[\rho_{G}(\vec{\theta}_{G}) P^{t}(\vec{\theta}_{D})]+\mathop{\text{Tr}}\nolimits[\rho_{\textit{true}} P^{f}(\vec{\theta}_{D})]),\\ &= \frac{1}{2}(C_{\textit{GA}}(\vec{\theta}_{D}, \vec{\theta}_{G})+1) \end{align} (34) with \(C_{\textit{GA}}(\vec{\theta}_{D},\vec{\theta}_{G}) =\mathop{\text{Tr}}\nolimits[(\rho_{G}(\vec{\theta}_{G}) -\rho_{\textit{true}}) P^{t}(\vec{\theta}_{D})]\) being proportional to the trace distance between \(\rho_{G}(\vec{\theta}_{G})\) and \(\rho_{\textit{true}}\) when optimised over \(P^{t}(\vec{\theta}_{D})\). With fixed parameters \(\vec{\theta}_{G}\) for the generator, we optimise the discriminator \(\vec{\theta}_{D}\) to minimise the failure probability \(P_{\textit{fail}}\) or \(C_{\textit{GA}}(\vec{\theta}_{D},\vec{\theta}_{G})\). With optimised discriminator, we fix \(\vec{\theta}_{D}\) and in turn optimise the generator \(\vec{\theta}_{G}\) to maximise the failure probability. By repeating this process, the parameters \(\vec{\theta}_{D}\) and \(\vec{\theta}_{G}\) arrives at an equilibrium with \(\rho_{G}(\vec{\theta}_{G}) =\rho_{\textit{true}}\), indicating a successful learning of the data.

Quantum autoencoder for quantum data compression

The classical autoencoder is used for compressing classical data as shown in Fig. 8(a). For an input set of training data \(\{\vec{x}_{1},\vec{x}_{2},\ldots,\vec{x}_{N_{D}} \}\), the autoencoder \(\mathcal{E}\) first encodes it to \(\{\vec{x}_{1}',\vec{x}_{2}',\ldots,\vec{x}_{N_{D}}' \}\) with each \(\vec{x}_{i}'\) being a smaller vector than \(\vec{x}_{i}\). A decoder \(\mathcal{D}\) is then applied to transform the data to \(\{\vec{x}_{1}'',\vec{x}_{2}'',\ldots,\vec{x}_{N_{D}}''\}\), with each \(\vec{x}_{i}''\) having the same size as \(\vec{x}_{i}\). The task of an autoencoder is to compress the input data into smaller size, with the requirement that the input data can be recovered from the compressed one with \(\sum_{k} \|\vec{x}_{k}''-\vec{x}_{k} \|^{2}\leq\varepsilon\) where ε is a desired accuracy.

Figure 8. Schematic figures for (a) classical autoencoder and (b) quantum autoencoder. The encoding operation \(\mathcal{E}\) compress the data, and the decoding operation \(\mathcal{D}\) decodes the data to the original dimension.

Quantum autoencoder implements a similar task for compressing quantum states.23,97) Here, we consider the scheme proposed in Romero et al.23) Consider the case with input states of \(n+m\) qubits being compressed to states of n qubits. Suppose the input states are an ensemble \(\{p_{k}, |\phi_{k}\rangle_{\textit{AB}} \}\) with probabilities \(p_{k}\) and unknown states \(|\phi_{k}\rangle_{\textit{AB}}\). Here the subsystems A and B consists of n and m qubits, respectively. With a compression circuit \(U(\vec{\theta})\) and a post-selection measurement (for example in the computational basis) on the m-qubits of system B, as shown in Fig. 8(b), each input state \(|\phi_{k}\rangle_{\textit{AB}}\) of \(n+m\) qubits is mapped to a state \(\rho_{k}^{\textit{comp}}\) of n qubits. The inverse process then decodes the compressed state and map each \(\rho_{k}^{\textit{comp}}\) to an output state \(\rho_{k}^{\textit{out}}\). The average fidelity between the inputs and outputs is \begin{equation} C_{\textit{AE}}^{(1)}(\vec{\theta}) = \sum_{k} p_{k} F(|\phi_{k}\rangle, \rho^{\textit{out}}_{k}(\vec{\theta})), \end{equation} (35) which serves as a cost function and can be computed via the SWAP test or the destructive SWAP test circuit. Refer to Appendix C for details. When \(C_{\textit{AE}}^{(1)}(\vec{\theta})\) is maximised to a desired accuracy, it indicates a successful compression of the input state ensemble. Meanwhile, the successful compression and decoding, i.e., \(C_{\textit{AE}}^{(1)} = 1\), are achieved if and only if \begin{equation} U(\vec{\theta}) |\phi_{k}\rangle_{\textit{AB}} = |\phi_{k}^{\textit{comp}}\rangle_{A} \otimes |\bar{0}\rangle_{B},\ \forall k, \end{equation} (36) where \(|\phi_{k}^{\textit{comp}}\rangle_{A}\) corresponds to the compressed state, and \(|\bar{0}\rangle_{B}\) is some fixed reference state. Thus we can also adopt a simpler cost function \begin{equation} C_{\textit{AE}}^{(2)}(\vec{\theta}) = \sum_{k} p_{k} \mathop{\text{Tr}}\nolimits[U(\vec{\theta}) |\phi_{k}\rangle\langle\phi_{k}|_{\textit{AB}} U(\vec{\theta})^{\dagger} I_{A} \otimes |\bar{0}\rangle \langle\bar{0}|_{B}], \end{equation} (37) which can be computed as the probability projected to \(I_{A}\otimes |\bar{0}\rangle\langle\bar{0}|_{B}\).

Variational quantum state eigensolver

Analysing eigenvalues and eigenvectors of the covariance matrix of data is crucial for extracting its important features. Such a process is called the principal component analysis (PCA), which has been widely used in data science and machine learning. The covariance matrix of the data could be uploaded onto a quantum computer.98100) Here we show how to use the variational quantum state eigensolver (VQSE) algorithm to diagonalise input density matrices ρ to \begin{equation} \rho = \sum_{j} \lambda_{j} |\lambda_{j}\rangle\langle\lambda_{j}|,\quad \langle \lambda_{i}|\lambda_{j}\rangle = \delta_{i,j}. \end{equation} (38) We focus on the VQSE algorithm introduced in Cerezo et al.,62) which only requires a single copy of the state and other schemes requiring two copies of the state can be found in LaRose et al.61) and Bravo-Prieto et al.101) Without loss of generality, we assume the eigenvalues are in an descending order, i.e., \(\lambda_{1}\geq \lambda_{2}\geq \cdots\geq \lambda_{f}\) where \(f =\mathrm{rank}(\rho)\).

We first map a parametrised quantum circuit \(U(\vec{\theta})\) to the input density matrix as \(\rho(\vec{\theta}) = U(\vec{\theta})\rho U^{\dagger}(\vec{\theta})\). Then we define a cost function \begin{equation} C_{\textit{DI}}(\vec{\theta}) = \mathop{\text{Tr}}\nolimits[\rho(\vec{\theta}) H], \end{equation} (39) with \(H =\sum_{j} E_{j} |\vec{x}_{j}\rangle\langle\vec{x}_{j}|\) being a diagonal Hamiltonian in the computational basis \(\{|\vec{x}_{j}\rangle\}\) with non-degenerate eigenvalues \(E_{1}<E_{2}<\cdots<E_{2^{N_{q}}}\). Then we have \begin{equation} C_{\textit{DI}}(\vec{\theta}) = \vec{E} \cdot \vec{q}(\vec{\theta}), \end{equation} (40) with \(\vec{E} = (E_{1}, E_{2},\ldots,E_{2^{N_{q}}})\), \(\vec{q}(\vec{\theta}) = (q_{1}, q_{2},\ldots,q_{2^{N_{q}}})\), and \(q_{j} =\mathop{\text{Tr}}\nolimits[\rho(\vec{\theta}) |\vec{x}_{j}\rangle\langle\vec{x}_{j}|]\). Since \(\vec{q}(\vec{\theta})\) is obtained from measuring ρ, the vector \(\vec{q}(\vec{\theta})\) can be obtained from applying a doubly stochastic matrix on the eigenvalue vector \(\vec{\lambda} = (\lambda_{1},\lambda_{2},\ldots,\lambda_{2^{N_{q}}})\). Here, we set \(\lambda_{k} = 0\) for \(k > f\). Hence \(\vec{\lambda}\) majorises the measurement probabilities \(\vec{q}(\vec{\theta})\), i.e., \(\vec{\lambda}\succ \vec{q} (\vec{\theta})\). That is, with \(\vec{\lambda} = (\lambda_{1},\lambda_{2},\ldots,\lambda_{2^{N_{q}}})\) and \(\vec{q}(\theta) = (q_{1},q_{2},\ldots, q_{2^{N_{q}}})\) in descending orders, we have \(\sum_{i = 1}^{k}\lambda_{i}\geq \sum_{i = 1}^{k} q_{i}\), \(\forall k = \{1,2,\ldots, 2^{N_{q}}\}\). Since the dot product is a Schur concave function, satisfying \(f(\vec{a})\succ f(\vec{b})\) \(\forall\vec{b}\succ \vec{a}\), we have \begin{equation} C_{\textit{DI}}(\vec{\theta}) = \vec{E} \cdot \vec{q}(\vec{\theta}) \geq \vec{E} \cdot \vec{\lambda}. \end{equation} (41) The equality holds when \(\vec{q}(\vec{\theta}) =\vec{\lambda}\), i.e., the diagonalisation is achieved.

Therefore, after minimising the cost function \(C_{\textit{DI}}(\vec{\theta})\) with optimal parameters \(\vec{\theta}_{\textit{opt}}\), the state \(\rho(\vec{\theta}_{\textit{opt}})\) is rotated to \(\rho(\vec{\theta}_{\textit{opt}}) =\sum_{i}\lambda_{j}|\vec{x}_{j}\rangle\langle\vec{x}_{j}|\) with \(\lambda_{1}\geq\lambda_{2}\geq\cdots\geq \lambda_{f}\). By measuring \(\rho(\vec{\theta}_{\textit{opt}})\) in the computational basis, we thus obtain \(|\vec{x}_{j}\rangle\) with probability determined by the eigenvalue \(\lambda_{j}\) and the the corresponding eigenvector of ρ is \(U^{\dagger}(\vec{\theta}_{\textit{opt}}) |\vec{x}_{j}\rangle\).

On the other hand, a time dependent cost function which combines local and global cost functions is used to make the best of their benefits.62) See Appendix D for an explanation of local and global cost functions. Also one can find applications of VQSE in quantum error mitigation14,16,17) and entanglement specroscopy.102,103)

Variational algorithms for linear algebra

Variational quantum algorithms can be used for solving matrix-vector multiplication and solving linear systems of equations. The task of matrix-vector multiplication is to obtain \(|v_{\mathcal{M}}\rangle =\mathcal{M} |v_{0}\rangle/\|\mathcal{M} |v_{0}\rangle\|\), where \(\mathcal{M}\) is a sparse matrix, \(|v_{0}\rangle\) is a given state vector, and \(\|{|}\psi\rangle \| =\sqrt{\langle\psi|\psi\rangle}\). Meanwhile, linear systems of equation is to solve a linear equation \(\mathcal{M} |v_{\mathcal{M}^{-1}}\rangle = |v_{0}\rangle\) to have \(|v_{\mathcal{M}^{-1}}\rangle =\mathcal{M}^{-1} |v_{0}\rangle\). There exist several variational quantum algorithms for implementing these tasks.30,31,104,105) Herein, we illustrate the methods introduced in Bravo-Prieto et al.30) and Xu et al.31)

We first consider matrix-vector multiplication proposed by Xu et al.,31) where the solution \(|v_{\mathcal{M}}\rangle\) corresponds to the ground state of the Hamiltonian, \begin{equation} H_{\mathcal{M}} = I - \frac{\mathcal{M}|v_{0}\rangle\langle v_{0}|\mathcal{M}^{\dagger}}{\| \mathcal{M} |v_{0}\rangle\|^{2}}. \end{equation} (42) When \(\mathcal{M}\) is a sparse matrix and the circuit for preparing \(|v_{0}\rangle\) is known, the expectation value \(E_{\mathcal{M}} =\langle\varphi(\vec{\theta})| H_{\mathcal{M}} |\varphi(\vec{\theta})\rangle\) can be efficiently evaluated by using the Hadamard test or the SWAP test circuit. Thus by minimising the expectation value \(E_{\mathcal{M}}\), we can find the ground state \(|v_{\mathcal{M}}\rangle\). Because the ground state has a unique ground state energy 0, we can further know whether the discovered state is the ground state or not, unlike the conventional VQE where the ground state energy is generally unknown. The matrix-vector multiplication algorithm can be applied for implementing Hamiltonian simulation. Suppose we want to simulate a time evolution operator \(U =\exp(- i H t)\) via Trotterisation \(U\approx\prod\exp(- i H\delta t)\). By setting \(M = 1- i H\delta t\), we can approximate each \(\exp(- i H\delta t)\) and hence the whole evolution as \(U = M^{N_{S}}+O(t^{2}/N_{S})\), where \(N_{S} = t/\delta t\). Therefore, by subsequently applying matrix-vector multiplication, we can simulate the time evolution of quantum systems.

Here we also consider the algorithms for solving linear equations independently proposed by Xu et al.31) and Bravo-Prieto et al.101) The solution is mapped to the ground state of the Hamiltonian \begin{equation} H_{\mathcal{M}^{-1}} = \mathcal{M}^{\dagger} (I- |v_{0}\rangle\langle v_{0}|) \mathcal{M}, \end{equation} (43) with the smallest eigenvalue 0 as well. This Hamiltonian was firstly proposed by Subaşı et al.,106) who applied adiabatic algorithms to find the ground state with a universal quantum computer. With the variational method, solving the ground state is similar to the one for matrix-vector multiplication. Furthermore, Xu et al.31) used Hamiltonian morphing optimisation for avoiding local minima, and Bravo-Prieto et al.101) used a local cost function for circumventing a barren plateau issue and run a simulation with up to 50 qubits. Refer to Appendix D for these optimisation methods.

Excited state-search variational algorithms

Next we show how to find excited states and excited energy spectra of a Hamiltonian. Calculating excited energy spectra is important for studying many-body quantum physics problems. For example, it can be used to study chemical reaction dynamics, important for creating new drugs and new methodologies for mass production of beneficial materials.53,107) In addition, evaluation of excited states enables us to calculate the photodissociation rates and absorption bands, which are essential for designing solar cells and investigating their dynamics.108,109) There exist several VQAs for evaluating excited states and excited energy spectra.32,33,36,110117) These algorithms can be used as a subroutine for other applications, e.g., the Green's function,35,118) non-adiabatic coupling, and Berry's phase119) and simulating real time evolution120) etc. In this section, we review three VQAs — the overlap-based method, the subspace expansion method, and the contraction VQE method — for calculating excited state energy, and the application in calculating the Green's function.

Overlap-based method

The overlap-based method first uses VQE to find the ground state and then sequentially obtains excited states by penalising the previously obtained eigenstates.32,33) Suppose that the ground state \(|\tilde{G}\rangle\) of the given Hamiltonian H is obtained from either the conventional VQE or variational imaginary time simulation. Now, suppose we replace the Hamiltonian H with the Hamiltonian \begin{equation} H^{\prime} = H+ \alpha |\tilde{G}\rangle\langle\tilde{G}|. \end{equation} (44) Here, α is a positive number which is chosen to be sufficiently larger compared to the energy gap between the ground and the first excited state of the Hamiltonian. Then the first excited state \(|E_{1}\rangle\) of the original Hamiltonian H becomes the ground state of the new Hamiltonian \(H^{\prime}\). Therefore, with the new Hamiltonian \(H^{\prime}\), we can obtain the first excited state \(|\tilde{E}_{1}\rangle\) of H with the VQE or variational imaginary time simulation on \(H^{\prime}\).

To realise the VQE or imaginary time evolution of \(H^{\prime}\), we need to measure the energy \(E'(\vec{\theta}) =\langle\varphi (\vec{\theta})| H^{\prime} |\varphi (\vec{\theta})\rangle\) of the trial state \(|\varphi (\vec{\theta})\rangle\) as \begin{equation} E'(\vec{\theta}) = \langle\varphi (\vec{\theta})| H |\varphi (\vec{\theta})\rangle +\alpha \langle \varphi (\vec{\theta})| \tilde{G}\rangle \langle \tilde{G}|\varphi (\vec{\theta})\rangle. \end{equation} (45) The first term can be computed in the same way as the conventional VQE method. The second term is the overlap between \(|\varphi (\vec{\theta})\rangle\) and \(|\tilde{G}\rangle\), which can be evaluated with the SWAP test circuit or the destructive SWAP test circuit.121,122) These quantum circuit necessitate two copies of the state. Note that destructive SWAP test circuit only leverages a shallow depth circuit. We leave a detailed explanation about the (destructive) SWAP test circuit in Appendix C. Alternatively, we can compute the overlap term without using two copies of the state but using a doubled depth of the quantum circuit.33) Denoting \(|\tilde{G}\rangle = U(\vec{\theta}_{G}) |\varphi_{\textit{ref}}\rangle\), the overlap term can be written as \begin{equation} \langle \varphi (\vec{\theta})| \tilde{G}\rangle \langle \tilde{G}|\varphi (\vec{\theta})\rangle = |\langle\varphi_{\textit{ref}}| U^{\dagger}(\vec{\theta}_{G}) U(\vec{\theta})|\varphi_{\textit{ref}}\rangle |^{2}. \end{equation} (46) This can be evaluated by applying \(U(\vec{\theta})\) and \(U^{\dagger}(\vec{\theta}_{G})\) to the reference state \(|\varphi_{\textit{ref}}\rangle\), and measuring the probability to observe \(|\varphi_{\textit{ref}}\rangle\).

After finding the first excited state of H, we can further find the second excited state by replacing the Hamiltonian H with \begin{equation} H''= H+\alpha (|\tilde{G}\rangle \langle\tilde{G}|+|\tilde{E}_{1}\rangle\langle\tilde{E}_{1}|). \end{equation} (47) Then the second excited state of H becomes the ground state of \(H''\), which could be similarly solved via the VQE or variational imaginary time simulation. It is not hard to see that this procedure can be repeated to sequentially discover other low energy eigenstates.

In the overlap-based method, an error happens when the discovered state \(|\tilde{G}\rangle\) is not the exact ground state, such as the case where it is a superposition of the ground state and excited states. Therefore when the VQE method gets trapped to a local minimum, the overlap-based method may fail to work. Note that this problem is severe since if the ground state was calculated incorrectly, all the subsequently calculated excited states are incorrect. Interestingly, it was numerically observed that this problem is less severe for variational imaginary time evolution.32) This is because when variational imaginary time evolution fails to find the ground state, it may instead converge to an eigenstate of the Hamiltonian based on the nature of imaginary time evolution.18,32) By penalising the discovered excited state, we can still construct a new Hamiltonian to find another low energy state.

Quantum subspace expansion

The quantum subspace expansion solves a generalised eigenvalue problem in terms of the given Hamiltonian in an expanded subspace around an approximated ground state, and the obtained eigenstates and eigenenergies correspond to those of the Hamiltonian.36) Note that the obtained spectrum is error-mitigated because the excited states are approximated as a linear combination of states in the expanded subspace. Refer to Sect. 5.4 for a detailed explanation about error mitigation effect of quantum subspace expansion.

Let \(|\tilde{G}\rangle\) be an approximation of the true ground state obtained from either the VQE or variational imaginary time simulation. Suppose we approximate an eigenstate of the Hamiltonian as \begin{equation} |\psi_{\textit{eig}}(\vec{c})\rangle \approx \sum_{m} c_{m}|\psi_{m}\rangle, \end{equation} (48) where \(|\psi_{m}\rangle = |\tilde{G}\rangle\) with \(m = 0\), \(|\psi_{m}\rangle\) (\(m\geq 1\)) are states in the expanded subspace, \(\vec{c} = (c_{0}, c_{1}, c_{2},\ldots)^{T}\), and \(\langle\psi_{\textit{eig}}(\vec{c})|\psi_{\textit{eig}}(\vec{c})\rangle = 1\). For fermionic systems, we can choose \(|\psi_{m}\rangle = a_{i}^{\dagger} a_{j} |\tilde{G}\rangle\), \(m = (i,j)\), with \(a_{j}^{\dagger}\) and \(a_{j}\) being the creation and annihilation operators. For spin systems, we can set \(|\psi_{m}\rangle = P_{m} |\tilde{G}\rangle\) with \(P_{m}\in \{I, X, Y, Z \}^{\otimes N_{q}}\). As the number of \(P_{m}\) increases, the subspace expands, which potentially improves the accuracy of the subspace expansion method. Denoting \(E(\vec{c},\vec{c}^{*}) =\langle \psi_{\textit{eig}}| H |\psi_{\textit{eig}}\rangle\), the state \(|\psi_{\textit{eig}}(\vec{c})\rangle\) is an eigenstate of H when \(E(\vec{c})\) corresponds to a minimum satisfying \begin{equation} \delta [E(\vec{c}, \vec{c}^{*}) - E \langle \psi_{\textit{eig}}(\vec{c})|\psi_{\textit{eig}}(\vec{c})\rangle] = 0. \end{equation} (49) Here \(\delta E(\vec{c},\vec{c}^{*}) =\sum_{i}\delta c_{i}\partial E(\vec{c})/\partial c_{i} +\text{c.c.}\) and E is the Lagrangian multiplier for the constraint \(\langle\psi_{\textit{eig}}(\vec{c})|\psi_{\textit{eig}}(\vec{c})\rangle = 1\). A solution of this equation results in \begin{equation} \tilde{H}\vec{c} = E \tilde{S} \vec{c}. \end{equation} (50) Here \(\tilde{H}\) and \(\tilde{S}\) are defined by \begin{equation} \tilde{H}_{\alpha \beta} = \langle \psi_{\alpha}|H|\psi_{\beta}\rangle,\quad \tilde{S}_{\alpha \beta} = \langle \psi_{\alpha}|\psi_{\beta}\rangle, \end{equation} (51) and E corresponds to the energy for the eigenstate. Both \(\tilde{H}\) and \(\tilde{S}\) can be efficiently measured. For example, with \(|\psi_{m}\rangle = P_{m} |\tilde{G}\rangle\) we have \(\tilde{H}_{\alpha\beta} =\langle\tilde{G}| P_{\alpha} H P_{\beta} |\tilde{G}\rangle\) and \(\tilde{S}_{\alpha\beta} =\langle\tilde{G}|P_{\alpha} P_{\beta} |\tilde{G}\rangle\), which can be respectively obtained by measuring the expectation value of the operators \(P_{\alpha} H P_{\beta}\) and \(P_{\alpha} P_{\beta}\) for the approximated ground state \(|\tilde{G}\rangle\). By solving the generalised eigenvalue problem of Eq. (50), we can obtain the information of eigenspectra of the Hamiltonian in the considered subspace. We refer to Appendix E for the derivation.

Contraction VQE methods

The contraction VQE methods firstly discover the lowest energy subspace and then construct eigenstates in that subspace. Here we introduce two contraction VQE methods — subspace-search VQE (SSVQE)111) and multistate contracted variant of VQE (MC-VQE).112)

The procedure of SSVQE is as follows. We first prepare a set of orthogonal states \(\{|\phi_{i}\rangle_{i = 0}^{k} \}\) (\(\langle\phi_{j} |\phi_{i}\rangle =\delta_{i,j}\)), such as states in the computational basis. Then we minimise the cost function \begin{equation} C_{\textit{CO}}^{(1)}(\vec{\theta}) = \sum_{j = 0}^{k} \langle \phi_{j}|U^{\dagger}(\vec{\theta})H U(\vec{\theta})|\phi_{j}\rangle \end{equation} (52) to constrain the subspace \(\{U(\vec{\theta}^{*})|\phi_{i}\rangle\}_{i = 0}^{k}\) to the lowest \(k+1\) eigenstates, where \(\vec{\theta}^{*}\) is the parameter set after the optimisation and we define 0th excited state \(|E_{0}\rangle\) as the ground state. At this stage, \(U(\vec{\theta}^{*})|\phi_{i}\rangle\) is generally a superposition of the eigenstate \(\{|E_{i}\rangle \}_{i = 0}^{k}\) of H. To project \(U(\vec{\theta}^{*})|\phi_{s}\rangle\) (\(s\in \{0,\ldots,k \}\)) to the kth excited state \(|E_{k}\rangle\), we maximise \begin{equation} C_{\textit{CO}}^{(2)}(\vec{\phi}) = \langle \phi_{s}| V^{\dagger} (\vec{\phi})U^{\dagger}(\vec{\theta}^{*}) H U(\vec{\theta}^{*}) V(\vec{\phi}) |\phi_{s}\rangle. \end{equation} (53) By finding the optimal parameters \(\vec{\phi}^{*}\), we can approximate the kth excited state \(|E_{k}\rangle\) by \(U(\vec{\theta}^{*}) V(\vec{\phi}^{*}) |\phi_{s}\rangle\). In practice, we can start with \(k = 0\) to find the ground state and then increase the value of k to find excited states.

For the MC-VQE method, it also first projects the subspace to the lowest energy subspace in the same way as SSVQE. Different from SSVQE, MC-VQE assumes the excited states can be expanded in the lowest energy subspace as \begin{equation} |E\rangle = \sum_{i = 0}^{k} c_{i} U(\vec{\theta}^{*}) |\phi_{i}\rangle, \end{equation} (54) and the goal is to find the vector \(\vec{c} = (c_{0}, c_{1},\ldots,c_{k})\). This task is similar to subspace expansion method mentioned above. Now, \(\vec{c}\) can be computed by solving the following eigenvalue problem \begin{equation} \tilde{H} \vec{c} = E \vec{c}, \end{equation} (55) where the matrix \(\tilde{H}\) is \begin{equation} \tilde{H}_{\alpha, \beta} = \langle \phi_{\alpha}|U^{\dagger}(\vec{\theta}^{*})H U(\vec{\theta}^{*})|\phi_{\beta}\rangle, \end{equation} (56) and E is the corresponding excited state energy. We can regard Eq. (55) as the special case of Eq. (50) with \(S = I\), because the states in the subspace are mutually orthogonal. The diagonal term of the matrix \(\tilde{H}\) can be evaluated as the same procedure to the conventional VQE. Non-diagonal terms of \(\tilde{H}\) can be obtained by calculating expectation values of H with states \(|\pm_{\alpha,\beta}\rangle = (|\phi_{\beta}\rangle\pm |\phi_{\alpha}\rangle)/\sqrt{2}\) and linearly combining them as \begin{align} 2 \tilde{H}_{\alpha,\beta} &= \langle +_{\alpha,\beta}|U^{\dagger}(\vec{\theta}) H U(\vec{\theta}) |+_{\alpha,\beta}\rangle\\ &\quad - \langle -_{\alpha,\beta}|U^{\dagger}(\vec{\theta}) H U(\vec{\theta}) |-_{\alpha,\beta}\rangle. \end{align} (57)

The potential problem of the contraction VQE methods is that the energy landscape of \(C_{\textit{CO}}^{(1)}(\vec{\theta})\) may be more complicated than the conventional VQE. This is because the conventional VQE only aims to find the correct ground state, while the contraction VQE methods need to find the correct unitary for all the \(k+1\) orthogonal states. The benefit of this method is that the opitimization of \(U(\vec{\theta})\) is averaged over multiple states, therefore the excited states can be calculated with an equal accuracy.

Calculation of the Green's function

Now, we show the application of the VQE algorithms in the calculation of the Green's function,35,118) which plays a crucial role in investigating many-body physics such as high \(T_{\text{c}}\) superconductivity,123) topological insulators,124) and magnetic materials.125) The definition of the retarded the Green's function at zero temperature is \begin{equation} G_{\alpha \beta}^{(R)}(t) = - i \Theta(t) \langle G| a_{\alpha}(t) a_{\beta}^{\dagger}(0)+a_{\beta}^{\dagger}(0) a_{\alpha}(t) |G\rangle. \end{equation} (58) Here, \(\Theta(t)\) is a Heaviside step function, \(a_{\alpha (\beta)}^{(\dagger)}\) is the annihilation (creation) fermionic operator for the fermionic mode \(\alpha (\beta)\), \(a_{\alpha}(t) = e^{i H t} a_{\alpha} e^{-i H t}\), and \(|G\rangle\) is a ground state. Here, we show how to calculate the Green's function with the algorithms for finding excited states. Note that another approach is to use the variational quantum simulation algorithm in Sect. 4.

For simplicity, we consider the Green's function in the momentum space for identical spins. We thus have \(\alpha =\beta = (k,\uparrow)\) and denote the Green's function as \(G_{k}^{(R)}(t)\). The Green's function has another expression called Lehmann representation as \begin{equation} G_{k}(t) = - i \sum_{m} e^{i(E_{G}- E_{m}) t} |\langle E_{m}| a_{k}^{\dagger} |G\rangle|^{2} \end{equation} (59) where \(|E_{m}\rangle\) and \(E_{m}\) are eigenstates and eigenenergies of the Hamiltonian. Thus, if the transition amplitudes \(|\langle E_{m}|a_{k}^{\dagger} |G\rangle|^{2}\) can be calculated, we can obtain the Green's function.

In literature, Endo et al.35) used the contraction VQE method and Rungger et al.118) employed the overlap method to calculate the transition amplitudes and further the Green's function. The algorithm using the overlap method118) was originally employed for computing the Green's function in the specific Jordan Wigner encoding, whose generalisation was further studied in Ibe et al.126) for general operators. Here we review the algorithm based on the MC-VQE method.35) By using the expression for mth excited state in Eq. (54), we have \begin{equation} |E_{m}\rangle \approx \sum_{i = 0}^{k} c_{i}^{(m)} U(\vec{\theta}^{*}) |\phi_{i}\rangle, \end{equation} (60) and hence \begin{equation} \langle E_{m}| a_{k}^{\dagger} |E_{n}\rangle = \sum_{ij} c_{i}^{(m)*} c_{j}^{(n)} \langle \phi_{i}| U^{\dagger} (\vec{\theta}^{*}) a_{k}^{\dagger} U (\vec{\theta}^{*}) |\phi_{j}\rangle. \end{equation} (61) Denoting \(a_{k}^{\dagger} = A_{k}+i B_{k}\), \(|\phi_{ij}^{\pm}\rangle = U (\vec{\theta}^{*})(|\phi_{i}\rangle\pm |\phi_{j}\rangle)/\sqrt{2}\) and \(|\phi_{ij}^{i\pm}\rangle = U (\vec{\theta}^{*})(|\phi_{i}\rangle\pm i|\phi_{j}\rangle)/\sqrt{2}\), where \(A_{k}\) and \(B_{k}\) are Hermitian operators, we have \begin{equation} \begin{split} &\mathop{\text{Re}}\nolimits[\langle\phi_{i}| U^{\dagger} (\vec{\theta}^{*}) A_{k} U (\vec{\theta}^{*}) |\phi_{j}\rangle] \\ &\quad= \langle\phi_{ij}^{+}| A_{k} |\phi_{ij}^{+}\rangle - \langle \phi_{ij}^{-}| A_{k} |\phi_{ij}^{-}\rangle,\\ &\mathop{\text{Im}}\nolimits[\langle\phi_{i}| U^{\dagger} (\vec{\theta}^{*}) A_{k} U (\vec{\theta}^{*}) |\phi_{j}\rangle] \\ &\quad= \langle \phi_{ij}^{i +}| A_{k} |\phi_{ij}^{i +}\rangle - \langle \phi_{ij}^{i -}| A_{k} |\phi_{ij}^{i -}\rangle, \end{split} \end{equation} (62) and similarly for \(B_{k}\). Thus we can calculate \(\langle\phi_{i}| U^{\dagger} (\vec{\theta}^{*}) a_{k}^{\dagger} U (\vec{\theta}^{*}) |\phi_{j}\rangle\) and hence \(\langle E_{m}| a_{k}^{\dagger} |E_{n}\rangle\) by measuring the expectation values of \(A_{k}\) and \(B_{k}\) for states \(|\phi_{ij}^{\pm}\rangle\) and \(|\phi_{ij}^{i\pm}\rangle\).

Note that based on calculation of the Green's function, Rungger et al.118) implemented dynamical mean-field theory (DMFT) calculation on two-site DMFT model by using a superconducting system and a trapped ion system. Also, in the work by Ibe et al.,126) the accuracy of the calculation of transition amplitudes are compared for the overlap method and the contraction VQE methods in detail.

Variational circuit recompilation

Circuit recompilation aims to approximate a given quantum circuit with ones that are compatible with a practical experiment hardware. Concerning noisy gates or restricted set of realisable gates, the compiler runs to reduce the circuit noise or the implementation cost, as shown in Fig. 9. For example, an arbitrary two-qubit unitary is generally not directly supported on a practical quantum hardware and we need to compile the unitary into a sequence of realisable gates. While a naive decomposition of the unitary may induce too many unnecessary gates for hardware with specific topological structures, finding efficient and simple decomposition of quantum circuits is vital for near-term quantum computing.

Figure 9. Schematic figure for circuit recompilation algorithms. A variational quantum circuit with hardware constraints tries to approximate the target quantum circuit.

Denote the target unitary as \(U_{T}\) and the variational circuit as \(U(\vec{\theta})\), which consists of gates compatible with the hardware. Circuit recompilation is to tune the parameters \(\vec{\theta}\) so that \(U_{T}\approx U(\vec{\theta})\). We can mathematically define a metric of the distance between \(U_{T}\) and \(U(\vec{\theta})\) via the average gate infidelity (AGI)127,128) \begin{equation} C_{I}(\vec{\theta}) = 1-\int d \varphi |\langle \varphi|U_{T}^{\dagger} U(\vec{\theta})|\varphi\rangle|^{2} \end{equation} (63) where pure states φ are randomly chosen according to the Haar measure. By construction, AGI indicates how different two unitary gates are on average for randomly sampled pure states, and it vanishes when the circuit is perfectly re-compiled. The AGI method has been applied for compiling high fidelity CNOT gate with cross-resonance gate suffering from crosstalk and single qubit operations. Furthermore, a high-fidelity four-qubit syndrome extraction circuit was recompiled to be achievable with simultaneous cross resonance drives under crosstalk.128)

Another related cost function127) is \begin{equation} C_{\textit{HS}}(\vec{\theta}) = 1-\frac{1}{d}|{\mathop{\text{Tr}}\nolimits}(U_{T}^{\dagger} U(\vec{\theta}))|^{2}, \end{equation} (64) which can be efficiently calculated with the “Hilbert–Schmidt test” circuit127) by using two copies of states. In particular, we have \begin{equation} \frac{1}{d^{2}}|\mathop{\text{Tr}}\nolimits[U_{T}^{\dagger} U(\vec{\theta})]|^{2} = |\langle \Psi^{+}| U_{T} \otimes U^{*}(\vec{\theta})|\Psi^{+}\rangle|^{2}, \end{equation} (65) where \(|\Psi^{+}\rangle = d^{-1}\sum_{i} |i\rangle\otimes |i\rangle\) is the maximally entangled state with d being the dimension of the system. Then, \(|\mathop{\text{Tr}}\nolimits[U_{T}^{\dagger} U(\vec{\theta})]|^{2}/d^{2}\) can be evaluated by applying \(U_{T}\otimes U^{*}(\vec{\theta})\) to \(|\Psi^{+}\rangle\) and measure the probability to observe \(|\Psi^{+}\rangle\). Note that \(C_{\textit{HS}}(\vec{\theta})\) and AGI is related,129,130) \begin{equation} C_{\textit{HS}}(\vec{\theta}) = \frac{d+1}{d} C_{I}(\vec{\theta}). \end{equation} (66) Thus \(C_{\textit{HS}}(\vec{\theta})\) can be used as an alternative cost function for circuit recompilation. Note that \(C_{I}(\vec{\theta})\) and \(C_{\textit{HS}}(\vec{\theta})\) generally exponentially vanish for an increasing system size. This problem is solved by using the weighted average of this global cost function and the corresponding local cost function (see Appendix D for an explanation of global and local costs). Then simple 9-qubit unitaries were successfully recompiled for noiseless simulator and for the Rigetti and IBM experimental processor.127)

Meanwhile, circuit recompilation can be implemented for specific input states, which may reduce the optimisation complexity.127,131) The goal is to find the recompiled circuit \(U(\vec{\theta})\) so that \(U_{T} |\psi_{\textit{in}}\rangle\approx U(\vec{\theta}) |\psi_{\textit{in}}\rangle\) holds for the state \(|\psi_{\textit{in}}\rangle\) and the target unitary \(U_{T}\). When \(|\psi_{\textit{in}}\rangle\) is a product state, we can easily define the Hamiltonian \(H_{\textit{in}}\) whose ground state is \(|\psi_{\textit{in}}\rangle\). For example, for \(|\psi_{\textit{in}}\rangle = |00\cdots 0\rangle\), we can set \(H_{\textit{in}} = -\sum_{j} Z_{j}\), where \(Z_{j}\) is the Pauli Z operator acting on the jth qubit. Then, by minimising the expectation value of \(H_{\textit{in}}\) for the trial state \(|\varphi(\vec{\theta})\rangle = U^{\dagger}(\vec{\theta}) U_{T} |\psi_{\textit{in}}\rangle\), we can obtain \(U(\vec{\theta})\) such that \(U^{\dagger}(\vec{\theta})U_{T} |\psi_{\textit{in}}\rangle\approx |\psi_{\textit{in}}\rangle\) and hence \(U_{T} |\psi_{\textit{in}}\rangle\approx U(\vec{\theta}) |\psi_{\textit{in}}\rangle\). We can leverage the conventional VQE method or variational imaginary time simulation for the optimisation and the fidelity of the recompiled state is lower-bounded as \begin{equation} F_{R} \geq 1- \frac{\delta}{E_{1}-E_{0}}, \end{equation} (67) where \(\delta =\langle \varphi(\vec{\theta})|H_{\textit{in}}|\varphi(\vec{\theta})\rangle-E_{0}\) is the deviation of the cost function from the ground state energy, and \(E_{0}\) and \(E_{1}\) are the ground state and first excited state energy. Note that \(E_{0} = -N_{q}\) and \(E_{1} = 2-N_{q}\) when the unitary acts on \(N_{q}\) qubits. This algorithm successfully recompiled a unitary operator involving 7 qubit system into another quantum circuit with different topology and dramatically reduced the number of two-qubit gates to 72 from 144, while the number of single-qubit gates increased to 77 from 42.131)

Variational-state quantum metrology

Quantum metrology aims to discover the optimal setup for probing a parameter with the minimal statistical shot noise.132134) The basic setup of quantum metrology is as follows — firstly, we prepare the initial probe state \(|\psi_{p}\rangle\) and evolve the state under the Hamiltonian \(H(\omega)\) with ω being the target parameter. After time t, the probe state can be described as \(|\psi_{p}(\omega, t)\rangle\), which is measured and analysed to extract the information of ω. Typically, ω can be a magnetic field, for example, with Hamiltonian \(H(\omega) =\omega \sum_{j}\sigma_{z}^{(j)}\) with \(\sigma_{z}^{(j)}\) being the Pauli Z operator, and the measurement is a Ramsey type measurement. When separable states are used as probes, the statistical error of the parameter behaves as \(\delta\omega \propto 1/\sqrt{N_{q}}\) with \(N_{q}\) being the number of qubits. This scaling is called the standard quantum limit (SQL).134) Notably the scaling can be improved by using entangled states, such as GHZ states, symmetric Dicke states, and squeezed states. In the absence of noise or for specific types of noise in the evolution under the Hamiltonian, the optimal strategy has been revealed. For example, with no environmental noise, the optimal probe state has been proved to be the GHZ state which achieves the Heisenberg limit \(\delta\omega \propto 1/N_{q}\).133135) However, in the presence of general types of noise, analytical arguments about the optimal strategy are usually very hard.

Variational-state quantum metrology is to use a quantum computer to find the optimal quantum state for quantum metrology with noisy hardware. Different proposals have been studied with either general variational quantum circuits136) or specific experimental setups,137) i.e., optical tweezer arrays of neutral atoms.138,139) In general, suppose the initial probe state is created on a quantum device, as \(|\varphi_{p} (\vec{\theta})\rangle\). By evolving the state under the Hamiltonian \(H(\omega)\) followed by a proper measurement, we can define a cost function to reflect the metrology performance. In particular, we can either use the quantum Fisher information (QFI)136) or the spin squeezing parameter.137,140) Here, we focus on the QFI, which characterises the minimum uncertainty of the estimated parameter as \begin{equation} \delta \omega \geq \frac{1}{\sqrt{N_{s} F_{Q}[\rho_{P}(\omega, t)]}}, \end{equation} (68) where \(F_{Q}[\rho_{P}(\omega, t)]\) is the QFI for the noisy output state \(\rho_{P}(\omega, t)\) after the evolution time t and \(N_{s}\) is the number of samples. Denoting \(T = N_{s}t\) to be the effective total time of all \(N_{s}\) samples, we can define a cost function as \begin{equation} C_{\textit{ME}}(\vec{\theta},t) = \frac{T}{t} F_{Q}[\rho_{P} (\omega, t, \vec{\theta})]. \end{equation} (69) For fixed T, we aim to optimise \(\vec{\theta}\) and t to maximise \(C_{\textit{ME}}(\vec{\theta},t)\). We show the schematic of this method in Fig. 10. Although QFI of mixed states cannot be directly evaluated in an efficient way, classical Fisher information (CFI) defined for a fixed measurement basis lower bounds QFI and it can be computed efficiently. We note that CFI is equivalent to QFI when the measurement basis is optimal, so QFI can in principle be obtained by optimising the measurement.

Figure 10. Schematic figure for variational quantum-state metrology. After the probe state is prepared by the variational quantum circuit, it evolves under the Hamiltonian \(H(\omega)\) with environmental noise, and is measured to evaluate the cost function.

With variational-state quantum metrology, a highly asymmetric state has been discovered for a 9-qubit system that outperforms previous results.136) Interestingly, even though the Hamiltonian and the noise model is symmetric under the permutation of qubits, there is a symmetry breaking for the optimal solution. Note that unlike conventional analytical approaches employed in quantum metrology, we do not have to know the noise model of the quantum device to obtain the optimised state. Recently, this algorithm was generalised to multi-parameter estimation.141)

Variational quantum algorithms for quantum error correction

Quantum error correction (QEC) makes use of a larger number of physical qubits to encode logical qubits to protect them against physical errors. In general, conventional formulation of QEC does not take into account the experimental implementation of the code. However, in the NISQ era, for example, the set of possible operations and the qubit topology are restricted for each physical hardware. Therefore, hardware-friendly implementation of QEC, tailored to actual experiments, is crucial for near-term quantum computers.142144) Here, we illustrate two examples143,144) for realising QEC on NISQ computers.

Variational circuit compiler for quantum error correction

This variational circuit compiler is for automatically discovering the optimal quantum circuit satisfying user-specified requirements for a given QEC code.143) Typically, such requirements tend to come from hardware properties, such as available gate sets, limited topology, and achievable error rate. More concrete example may be two-qubit gate implementation, e.g., superconducting qubits employing CNOT gate, and ion trap systems using Møren-Sørensen gate. We prepare an ansatz unitary \(U(\vec{\theta})\), or a process \(\mathcal{E}(\vec{\theta})\), that takes account of gate noise and reflects the requirements. The compiler is to optimise the parameters \(\vec{\theta}\) so that the ansatz emulates the encoded target state \(|\psi_{0}\rangle_{L}\) for a given QEC code.

The essential point of the compiler is to design the Hamiltonian whose ground state is the target state. Then the target state can be obtained via the conventional VQE or variational imaginary time simulation algorithm. Generally, the code space is defined by a set of commuting Pauli operators, so-called stabiliser generators. For example, when we consider the three qubit code, the logical state \(|\psi\rangle_{\text{three}} =\alpha |000\rangle+\beta |111\rangle\) is an eigenstate of \(Z_{1}Z_{2}\) and \(Z_{2} Z_{3}\) with eigenvalue 1. Suppose the target logical state \(|\psi_{0}\rangle_{L}\) is determined by the stabiliser generator set \(\{G_{k}\}\) with \(G_{k} |\psi_{0}\rangle_{L} = |\psi_{0}\rangle_{L}\). For determining a particular logical qubit state, we can choose an additional logical operator, \(M_{L} = |\psi_{0}\rangle_{L}\langle\psi_{0}|_{L}-|\psi_{0}^{\bot}\rangle_{L}\langle\psi_{0}^{\bot}|_{L}\), which can be decomposed as a linear combination of the logical \(I, X, Y, Z\) operators. Then the logical state \(|\psi_{0}\rangle\) is the unique ground state of the Hamiltonian \begin{equation} H_{L} = - \sum_{k} a_{k} G_{k} - a_{0} M_{L} \end{equation} (70) with energy \(E_{0} = -(\sum_{k} a_{k} +a_{0})\), where \(a_{k}, a_{0} >0\).

When using the variational algorithm to minimise the average energy, an approximation of the encoding circuit is found. Supposing the energy of the optimally discovered state is \(E_{\textit{dis}}\), the fidelity between the discovered state and the target logical state is lower bounded as \begin{equation} f \geq 1- (E_{\textit{dis}}-E_{0})/a, \end{equation} (71) where \(a =\min\{a_{k}, a_{0}\}\). This inequality ensures that if \(E_{\textit{dis}}\) is sufficiently close to \(E_{0}\), the discovered encoding circuit approximates the target QEC code well. The algorithm has been numerically tested for the five and seven qubit codes with different available gate sets and under noise free and noisy circuits.143)

Variational quantum error corrector (QVECTOR)

Variational quantum error corrector (QVECTOR) is for discovering a device-tailored quantum error correcting code.144) Different from the compiler for preparing a target logical state,143) QVECTOR aims to discover the optimal encode circuit that preserves the quantum state under noise. As shown in Fig. 11, the circuit \(U_{S}\) is used for preparing the to-be-encoded k-qubit state \(|\varphi\rangle\), \(V(\vec{\theta}_{1})\) and \(V^{\dagger}(\vec{\theta}_{1})\) are the noisy encoding and decoding circuits on \(n\geq k\) qubits, and \(W(\vec{\theta}_{2})\) is noisy gates for state recovery, which operates on \(n+r\) qubits. This quantum circuit corresponds to creating the \([n, k]\) quantum codes with r additional qubits. The parameters \(\vec{\theta}_{1}\) and \(\vec{\theta}_{2}\) are optimised to maximise the average code fidelity, \begin{equation} C_{\textit{QV}}(\vec{\theta}_{1}, \vec{\theta}_{2}) = \int_{\psi \in \mathcal{C}} \langle \psi| \mathcal{E}_{R} (\vec{\theta}_{1}, \vec{\theta}_{2})(|\psi\rangle \langle \psi|) |\psi\rangle\,d\psi, \end{equation} (72) where \(\mathcal{E}_{R} (\vec{\theta}_{1},\vec{\theta}_{2})\) is the encoding, recovery, and decoding operations \(V(\vec{\theta}_{1})\), \(W(\vec{\theta}_{2})\), and \(V^{\dagger}(\vec{\theta}_{1})\), and the integration is calculated following the Haar distribution with \(|\psi\rangle = U_{S} |0\rangle^{\otimes k}\). In practice, we can set \(U_{S}\) to be the unitary 2-design for efficiently calculating the average fidelity.145) With numerical simulation, QVECTOR learned the three qubit code146) under the phase damping noise, resulting in a six times longer \(T_{2}\) than conventional methods. In the presence of amplitude and phase damping noise, QVECTOR learned the quantum code outperforming the five qubit stabilizer code.

Figure 11. Schematic figure of the variational circuit for QVECTOR.

Dissipative-system variational quantum eigensolver

Dissipative-system variational quantum eigensolver (dVQE) is for obtaining the non-equilibrium steady state (NESS) in an open quantum system.147) In practice, quantum systems inevitably interact with their environments, and quantum states decohere owing to the noise. Therefore, the ability to simulate open quantum systems is indispensable for studying practical quantum phenomena. Notably, investigating NESS of open quantum systems is very important, e.g., in revealing the transport mechanism in nano-scale devices such as single-atom junctions.148)

The time independent Markovian open quantum dynamics can be described by the Lindblad master equation, \begin{equation} \frac{d}{dt}\rho = - i [H, \rho]+ \mathcal{L}(\rho), \end{equation} (73) where ρ is the system state, H is the Hamiltonian, \(\mathcal{L}(\rho) =\sum_{k} (2L_{k}\rho L_{k}^{\dagger} -L_{k}^{\dagger} L_{k}\rho-\rho L_{k}^{\dagger} L_{k})\) and \(L_{k}\) is a Lindblad operator. Denoting \(\mathcal{L}'(\rho) = -i [H,\rho]+\mathcal{L}(\rho)\), the non-equilibrium steady state corresponds to the state with \(\mathcal{L}'(\rho) = 0\). To find the steady state, we first vectorise the state \(\rho =\sum_{n m}\rho_{n m} |n\rangle\langle m|\) to \begin{equation} |\rho\rangle\!\rangle = \frac{1}{\mathcal{N}}\sum_{n m} \rho_{nm} |n\rangle |m\rangle, \end{equation} (74) where \(\mathcal{N}\) is a normalisation factor. With vectorised \(\mathcal{L}'\), the steady state satisfies \(\mathcal{L}' |\rho_{\textit{ness}}\rangle\!\rangle = 0\) and hence we have \begin{equation} \langle\!\langle \rho_{\textit{ness}}| \mathcal{L}'^{\dagger} \mathcal{L}' |\rho_{\textit{ness}} \rangle\!\rangle = 0, \end{equation} (75) where \begin{equation} \begin{split} \mathcal{L}' &= (-i(H\otimes I-I\otimes H^{T})+ \mathcal{D}_{v}),\\ \mathcal{D}_{v} &= \sum_{k} \left(L_{k} \otimes L_{k}^{*} -\frac{1}{2} L_{k}^{\dagger} L_{k} \otimes I -\frac{1}{2} I \otimes L_{k}^{T} L_{k}^{*} \right). \end{split} \end{equation} (76) Here \(A^{*}\) denotes the complex conjugate of an operator A.

As \(|\rho(\vec{\theta})\rangle\!\rangle\) is a vector, we can express this vector representation of the quantum state through using a pure trial state on a variational quantum circuit by imposing proper constraints so that the corresponding density matrix \(\rho(\vec{\theta})\) will be physical, i.e., positive semi-definiteness and hermiteness have to be satisfied. The trace of the state needs to be unity but this condition will be considered when the expectation value of a given observable is measured. We can prepare the vectorised physical state on a variational quantum circuit as \begin{equation} \begin{split} |\rho(\vec{\theta}) \rangle\!\rangle &= U(\vec{\theta}_{1}) \otimes U^{*}(\vec{\theta}_{1}) |\rho_{d}(\vec{\theta}_{2}) \rangle\!\rangle,\\ |\rho_{d}(\vec{\theta}_{2}) \rangle\!\rangle &= \frac{1}{\mathcal{N}} \sum_{j} \alpha_{j}(\vec{\theta}_{2}) |j\rangle_{s} \otimes |j\rangle_{a}, \end{split} \end{equation} (77) where \(\vec{\theta}\equiv \{\vec{\theta}_{1},\vec{\theta}_{2} \}\), \(\alpha_{j}(\vec{\theta_{2}})>0\), and \(\sum_{j}\alpha_{j}(\vec{\theta_{2}}) = 1\), \(|j\rangle\) is in the computational basis, and the corresponding density matrix is \(\rho(\vec{\theta}) =\sum_{j}\alpha_{j} U(\vec{\theta}_{1}) |j\rangle_{s}\langle j|_{s} U^{\dagger} (\vec{\theta}_{1})\). Here, \(\mathcal{N} =\sqrt{\sum_{j}\alpha_{j}^{2}}\) ensures the normalisation. The subscripts s and a denote system and ancilla, respectively. We refer to Yoshioka et al.147) for the detailed ansatz construction. By preparing a trial state \(|\rho(\vec{\theta})\rangle\!\rangle\), and minimising the cost function \(C_{\textit{NE}}(\vec{\theta}) =\langle\!\langle\rho(\vec{\theta}|\mathcal{L}'^{\dagger}\mathcal{L}' |\rho (\vec{\theta})\rangle\!\rangle\), we can obtain an approximation of NESS. After the optimal parameters \(\vec{\theta}_{1}^{(op)}\) and \(\vec{\theta}_{2}^{(op)}\) are found, we can measure the expectation value of any given observable O for \(\rho(\vec{\theta})\) by randomly generating the state \(U(\vec{\theta}_{1}^{(op)})|j\rangle\) with probability \(\alpha_{j}(\vec{\theta}_{2}^{(op)})\), and averaging the measurement outcomes. This method was demonstrated for 8-qubit dissipative Ising model with a 16-qubit classical simulation.

Other applications

There are other applications, such as VQAs for nonlinear problems,149) fidelity estimation,150) factoring,151) singular value decomposition,101,152) quantum foundations,153) circuit QED simulation,154) and Gibbs state preparation.152,155,156)

4. Variational Quantum Simulation

In this section, we review the variational quantum simulation algorithms for simulating the dynamical evolution of quantum systems19,34) and the application in simulation of open quantum systems,19,34) linear algebra tasks,34) Gibbs state preparation,19) and evaluating the Green's function.35)

Variational quantum simulation algorithms for density matrix

We first show how to generalise the simulation algorithm for real and imaginary time evolution from pure states to mixed states.19) The main idea is again to consider a parametrised representation of mixed states and map the dynamics to the evolution of the parameters. As we are considering mixed states, only McLachlan's variational principle applies, which leads to the evolution of parameters with information determined by the density matrix.19) Although conventional quantum computers operate on pure states, we can also represent mixed states with their purifications by using ancilla qubits.

Variational real time simulation for open quantum system dynamics

In practice, a quantum system interacts with its environment, so open quantum system simulation algorithms are useful for investigating practical quantum phenomena. Here we aim to simulate real time evolution of open quantum systems described by the Lindblad master equation \(d\rho/dt =\mathcal{L}(\rho)\) with \(\mathcal{L}(\rho)\) being a super-operator on the state as in Eq. (73). By parametrising the state as \(\rho(\vec{\theta}(t))\) with real parameters, and applying McLachlan's variational principle \(\delta \|{d}\rho(\vec{\theta}(t))/dt -\mathcal{L}(\rho) \| = 0\), we can obtain a similar equation of the parameters as the one for closed systems in Eq. (11) as \begin{equation} \sum_{j} M_{k,j} \dot{\theta}_{j} = V_{k}, \end{equation} (78) where \begin{equation} \begin{split} M_{k,j} &= \mathop{\text{Tr}}\nolimits\left[\left(\frac{\partial \rho(\vec{\theta}(t))}{\partial \theta_{k}} \right)^{\dagger} \frac{\partial \rho(\vec{\theta}(t))}{\partial \theta_{j}} \right],\\ V_{k} &= \mathop{\text{Tr}}\nolimits\left[\left(\frac{\partial \rho(\vec{\theta}(t))}{\partial \theta_{k}} \right)^{\dagger} \mathcal{L}(\rho) \right]. \end{split} \end{equation} (79) Note that the evaluation of M and V can be reduced to the computation of terms like \(c\cdot\mathop{\text{Re}}\nolimits(\mathop{\text{Tr}}\nolimits[e^{i\varphi}\rho_{1}\rho_{2}])\), with \(\rho_{1}\) and \(\rho_{2}\) being two quantum states, and \(c,\varphi \in\mathbb{R}\).19) By encoding the mixed state via its purification, this term can be evaluated via the SWAP test circuit. Note that, in order to simulate open system of \(N_{q}\) qubits, we first need \(2N_{q}\) qubits for representing its purification. We also need two copies of the purification for evaluating M and V, so we need in total \(4 N_{q}\) qubits to simulate open quantum system dynamics on \(N_{q}\) qubits. We review shortly that an alternative algorithm that simulates the stochastic Schrödinger equation which enables the simulation of \(N_{q}\)-qubit open systems on an \(N_{q}\)-qubit quantum hardware.

Variational imaginary time simulation for a density matrix

The variational quantum simulation algorithm can be applied for simulating imaginary time evolution of density matrices as well,19) which is defined as \begin{equation} \rho(\tau) = \frac{e^{-H \tau} \rho_{0} e^{-H \tau}}{\mathop{\text{Tr}}\nolimits[e^{-H \tau} \rho_{0} e^{-H \tau}]}, \end{equation} (80) where \(\rho_{0}\) is the initial state. The time derivative equation for \(\rho(\tau)\) is \begin{equation} \frac{d \rho(\tau)}{d \tau} = - \{H, \rho(\tau) \}+2 \langle H\rangle\rho(\tau), \end{equation} (81) where \(\{A, B \} = AB+BA\). By applying McLachlan's variational principle as \(\delta \|{d}\rho/d\tau + \{H,\rho(\tau) \}-2\langle H\rangle\rho(\tau) \| = 0\), we have the evolution of the parameters as \begin{equation} \sum_{j} M_{k,j} \dot{\theta}_{j} = W_{k}, \end{equation} (82) where \begin{equation} W_{k} = -{\mathop{\text{Tr}}\nolimits}\left[\left(\frac{\partial \rho(\vec{\theta}(t))}{\partial \theta_{k}}\right) \{H, \rho(\vec{\theta}(\tau)) \} \right]. \end{equation} (83) We note that \(W_{k}\) can be computed similarly to \(V_{k}\).

Variational quantum simulation algorithms for general processes

In this section, we review the variational quantum simulation algorithms for general processes, including the generalised time evolution, its application in solving linear algebra tasks, and simulating open system dynamics.34)

Generalised time evolution

Apart from real and imaginary time evolution, we consider the generalised time evolution defined as \begin{equation} A(t) \frac{d}{dt} |u(t)\rangle = |d u(t)\rangle, \end{equation} (84) where \(|d u(t)\rangle =\sum_{k} B_{k} (t) |u'_{k}(t)\rangle\), and \(A(t)\) and \(B_{k}(t)\) are general (possibly non-Hermitian) sparse matrices that may be efficiently decomposed as sums of Pauli operators, and each \(|u'_{k}(t)\rangle\) could be either \(|u(t)\rangle\) or any fixed states. The quantum states \(|u(t)\rangle\) and \(|u'_{k}(t)\rangle\) can be unnormalised and we assume a parametrisation of the states as \(|u(\vec{\theta})\rangle =\alpha(\vec{\theta}_{1}) |\psi(\vec{\theta}_{2})\rangle\), where \(\vec{\theta} = \{\vec{\theta}_{1},\vec{\theta}_{2}\}\), \(\alpha(\vec{\theta}_{1})\) is a classical coefficient, and \(|\psi(\vec{\theta}_{2})\rangle\) is a quantum state generated on a quantum computer. By using MchLaclan's variational principle \begin{equation} \delta \|{A}(t) \frac{d}{dt} |u(\vec{\theta}(t))\rangle - \sum_{k} B_{k}(t) |u'_{k}(t)\rangle \| = 0, \end{equation} (85) we obtain the evolution of the parameters similar to Eq. (11) as \begin{equation} \sum_{j} \bar{M}_{k,j} \dot{\theta}_{j} = \bar{V}_{k}. \end{equation} (86) Each matrix element \(\bar{M}_{k,j}\) or \(\bar{V}_{k}\) could be expanded as a sum of terms that can be efficiently measured via a quantum circuit. We refer to Endo et al.34) for details.

The real time evolution corresponds to \(A(t) = 1\) and \(|du(t)\rangle = -iH|u(t)\rangle\) and the imaginary time evolution corresponds to \(A(t) = 1\) and \(|du(t)\rangle = -(H-\langle u(t)|H|u(t)\rangle)|u(t)\rangle\) for a Hamiltonian H. Therefore, the generalised time evolution unifies real and imaginary time evolution. In addition, the generalised time evolution describes a general first-order differential equations with non-Hermitian Hamiltonians, which may have applications in non-Hermitian quantum mechanics.157) In the following, we show its application in solving linear algebra tasks and simulating the stochastic Schrödinger equation.

Matrix multiplication and linear equations

Now, we explain how we can apply the algorithm for generalised time evolution to realise matrix-multiplication and to solve linear systems of equations.34) This is an alternative method for linear algebra discussed in Sect. 3.3.31,101) For a sparse matrix \(\mathcal{M}\) and a (unnormalised) state vector \(|u_{0}\rangle\), we aim to obtain \begin{equation} |u_{\mathcal{M}}\rangle = \mathcal{M} |u_{0}\rangle,\quad |u_{\mathcal{M}}^{-1}\rangle = \mathcal{M}^{-1} |u_{0}\rangle, \end{equation} (87) for matrix-multiplication and solving linear systems of equations. In terms of matrix-multiplication, by setting \(|u_{\mathcal{M}}(t)\rangle = C(t)|u_{0}\rangle\) with \(C(t) =\frac{t}{T}\mathcal{M}+(1-\frac{t}{T})I\), we have \(|u_{\mathcal{M}}(0)\rangle = |u_{0}\rangle\) being the given vector and \(|u_{\mathcal{M}}(T)\rangle =\mathcal{M} |u_{0}\rangle = |u_{\mathcal{M}}\rangle\) being the solution. The time-dependent state \(|u_{\mathcal{M}}(t)\rangle\) follows a generalised time evolution \begin{equation} \frac{d}{dt} |u_{\mathcal{M}}(t)\rangle = D |u_{0}\rangle \end{equation} (88) with \(D = (\mathcal{M}-I)/T\), which corresponds to the case with \(A(t) = I\) and \(|du(t)\rangle = |u_{0}\rangle\) in Eq. (84). For solving linear systems of equations, by considering \(C(t)|u_{\mathcal{M}^{-1}}(t)\rangle = |u_{0}\rangle\) with \(C(t) =\frac{t}{T}\mathcal{M}+(1-\frac{t}{T})I\), we have \(|u_{\mathcal{M}^{-1}}(0)\rangle = |u_{0}\rangle\) being the given vector and \(|u_{\mathcal{M}^{-1}}(T)\rangle =\mathcal{M}^{-1}|u_{0}\rangle = |u_{\mathcal{M}^{-1}}\rangle\) being the solution. The state \(|u_{\mathcal{M}^{-1}}(t)\rangle\) follows the evolution equation as \begin{equation} C(t) \frac{d}{dt} |u_{\mathcal{M}^{-1}}(t)\rangle = - D |u_{\mathcal{M}^{-1}}(t)\rangle \end{equation} (89) which corresponds to the generalised time evolution of Eq. (84) with \(A(t) = C(t)\) and \(|du(t)\rangle = -D|u(t)\rangle\). Therefore, the ability of efficiently simulating generalised time evolution enables us to solve these two linear algebra problems.

Open system dynamics

Now we show how to simulate open quantum system dynamics with the variational algorithm of generalised time evolution.34) Instead of directly simulating the Lindblad master equation defined in Eq. (73), we consider its alternative representation via the stochastic Schrödinger equation, where the whole evolution is a mixture of pure state trajectories158) \begin{align} d|\psi_{c}(t)\rangle &= \left(-iH-\frac{1}{2}\sum_{k}(L_{k}^{\dagger} L_{k} - \langle L_{k}^{\dagger} L_{k}\rangle)\right) |\psi_{c}(t)\rangle\,dt\\ &\quad + \sum_{k}\left[\left(\frac{L_{k} |\psi_{c}(t)\rangle}{\|{L}_{k} |\psi_{c}(t)\rangle\|} - |\psi_{c}(t)\rangle\right)\,dN_{k}\right]. \end{align} (90) Here \(|\psi_{c}(t)\rangle\) is the state of each trajectory, \(d|\psi_{c}(t)\rangle = |\psi_{c}(t+dt)\rangle-|\psi_{c}(t)\rangle\), and \(dN_{k}\) is a random variable which takes either 0 or 1 and satisfies \(dN_{k}\,dN_{k^{\prime}} =\delta_{k k^{\prime}}\,dN_{k}\) and \(E[dN_{k}] =\langle \psi_{c}(t)|L^{\dagger}_{k} L_{k} |\psi_{c}\rangle\,dt\). This implies that the state \(|\psi_{c}\rangle\) jumps to \(L_{k} |\psi_{c}\rangle/\| L_{k} |\psi_{c}\rangle \|\) with a probability \(E[dN_{k}]\) when the state evolves from time t to \(t+dt\). Each trajectory can be regarded as a continuous evolution of the state continuously measured by \(\{O_{0} = I-\sum_{k}L_{k}^{\dagger} L_{k}\,dt,\ O_{k} = L_{k}^{\dagger} L_{k}\,dt \}\). When the measurement corresponds to \(O_{0}\), the state evolves under the generalised time evolution of Eq. (84) with \(A = I\) and \begin{equation} |du(t)\rangle = \left[-iH-\frac{1}{2}\sum_{k}(L_{k}^{\dagger} L_{k} - \langle L_{k}^{\dagger} L_{k}\rangle)\right]|u(t)\rangle. \end{equation} (91) This process can be simulated by the variational quantum simulation algorithm. When measurement \(O_{k}\) happens, the state jumps to \(L_{k} |\psi_{c}(t)\rangle/\|{L}_{k} |\psi_{c}(t)\rangle \|\), which can be simulated with the algorithm for matrix multiplication. However, because Lindblad operators \(L_{k}\) are generally local operator acting on a few qubits, it can be more efficiently simulated via a combination of real and imaginary time evolution. The idea is to decompose each \(L_{k}\) as \(L_{k} = UDV\) with unitary matrices U and V, and diagonal matrix D. Since all these matrices only act on a few qubits, the decomposition is efficient and we can further have \(U =\exp(-iH_{U}T_{U})\), \(V =\exp(-iH_{V}T_{V})\), and \(D =\exp(-H_{D}T_{D})\) for Hamiltonians \(H_{U}\), \(H_{V}\), and \(H_{D}\), and time \(T_{U}\), \(T_{V}\), and \(T_{D}\). Then the effect of \(L_{k}\) can be simulated by sequentially applying real time evolution of \(H_{V}\) with time \(T_{V}\), imaginary time evolution of \(H_{D}\) with time \(T_{D}\), and real time evolution of \(H_{U}\) with time \(T_{U}\).

Note that this algorithm only needs to control a single copy of the state, thus only uses one-fourth of the number of qubits compared to the algorithm presented in the previous section. The variational quantum simulation algorithm for the stochastic Schrödinger equation is numerically implemented for the 2D Ising Hamiltonian with 6 qubits.34) The simulation result witnesses a dissipation induced phase transition,159) indicating its potential in probing general interesting physics phenomena of many-body open systems with intermediate scale quantum hardware.

Gibbs state preparation

The variational quantum simulation algorithm for imaginary time evolution can be applied for preparing a Gibbs state.19) Starting at the maximally mixed state \(I_{d}/d\) with dimension d, imaginary time evolution of the state with Hamiltonian H and time τ prepares the state \(e^{-2H\tau}\), which corresponds to the Gibbs state at temperature \(T = 1/2\tau\). However, because the identity operator is invariant under any unitary transformation, we cannot directly input the maximally mixed state \(I_{d}/d\) to an unitary ansatz circuit to generate the Gibbs state. One resolution is to consider the purification of the Gibbs state. At time \(\tau = 0\), we input the maximally entangled state \(|\psi_{\text{max}}\rangle ={\frac{1}{\sqrt{d}}}\sum_{i = 1}^{d} |i\rangle_{s} |i\rangle_{a}\), where s and a denote the target system and the ancilla system for purification. It is easy to verify that \(\mathop{\text{Tr}}\nolimits_{a}[|\psi_{\text{max}}\rangle\langle\psi_{\text{max}}|] = I_{s}/d\) with \(\mathop{\text{Tr}}\nolimits_{a}[\cdot]\) denoting partial trace of ancilla. To realise imaginary time evolution on the system s, i.e., \(e^{-H_{s}\tau}|\psi_{\text{max}}\rangle/\|{e}^{-H_{s}\tau}|\psi_{\text{max}}\rangle\|\), we apply a joint unitary ansatz circuit on \(|\psi_{\text{max}}\rangle\) as shown in Fig. 12. Then we variatonally simulate imaginary time evolution by the evolution of the parameters of the joint unitary ansatz. The application of Gibbs state preparation in quantum Boltzmann machines was recently studied by Zoufal et al.160)

Figure 12. The ansatz quantum circuit for obtaining the Gibbs state for \(N_{q} = 2\).

Variational quantum simulation algorithms for estimating the Green's function

Now we discuss the application of the variational quantum simulation algorithms for calculating the Green's function and the spectral function.35) We refer to Sect. 3.4.4 for the review of the Green's function. The definition of the retarded the Green's function at zero temperature is \begin{equation} G_{\alpha \beta}^{(R)}(t) = - i \Theta(t) \langle G| c_{\alpha}(t) c_{\beta}^{\dagger}(0)+c_{\beta}^{\dagger}(0) c_{\alpha}(t) |G\rangle. \end{equation} (92) Here, \(\Theta(t)\) is a Heaviside step function, \(c_{\alpha (\beta)}^{(\dagger)}\) is an annihilation (a creation) fermion operator for the fermionic mode \(\alpha (\beta)\), \(c_{\alpha}(t) = e^{i H t} c_{\alpha} e^{-i H t}\), \(|G\rangle\) is the ground state, and we consider \(t>0\) for simplicity. We can transform fermionic operators to qubit operators51,54,55) as \begin{equation} c_{\alpha} \rightarrow \sum_{k} f_{k}^{(\alpha)} P_{k},\quad c_{\alpha}^{\dagger} \rightarrow \sum_{k} f_{k}^{(\alpha)*} P_{k}, \end{equation} (93) where \(f_{k}\in \mathbb{C}\) and \(P_{k}\) is a product of Pauli operators. Then the retarded the Green's function can be described as \begin{equation} G_{\alpha \beta}^{(R)}(t) = \sum_{k k'} f_{k}^{(\alpha)} f_{k'}^{(\beta)*} \langle G| e^{i H t} P_{k} e^{-iHt} P_{k'} |G\rangle. \end{equation} (94) To calculate each term \(\langle G| e^{i H t} P_{k} e^{-iHt} P_{k'} |G\rangle\), we first approximate the ground state as \(|\tilde{G}\rangle\) by using either the conventional VQE or variational imaginary time simulation. Then, by using real time variational quantum simulation algorithms, we can approximate \begin{equation} \begin{split} e^{-iHt}|G\rangle &\approx U(\vec{\theta}_{1}(t))|\tilde{G}\rangle,\\ e^{-iHt}P_{k'}|G\rangle &\approx U(\vec{\theta}_{2}(t))P_{k'} |\tilde{G}\rangle. \end{split} \end{equation} (95) Note that, the approximated time evolution operators \(U(\vec{\theta}_{1}(t))\) and \(U(\vec{\theta}_{2}(t))\) are optimised for \(|\tilde{G}\rangle\) and \(P_{k'} |\tilde{G}\rangle\), so they are generally different operators. Finally we have \begin{equation} \langle G| e^{i H t} P_{k} e^{-iHt} P_{k'} |G\rangle \approx \langle\tilde{G}|U(\vec{\theta}_{1}(t))^{\dagger} P_{k} U(\vec{\theta}_{2}(t))P_{k'} |\tilde{G}\rangle, \end{equation} (96) which can be evaluated by the Hadamard test circuit. Note that a mechanism to dramatically reduce the number of control operation from a ancilla qubit was also introduced by Endo et al.35)

Furthermore, it is worth noting that, by combining this method with the Gibbs state preparation algorithm in the last section,19) we can also compute the Green's function for finite temperatures.

Other applications

There are several other applications of variational quantum simulation algorithms. The first one is for financial problems.161) The partial differential equation for pricing is equivalent to imaginary time evolution, so we can apply variational imaginary time simulation algorithm to simulate it. Secondly, variational imaginary time simulation can be used to simulate non-hermitian transcorrelated Hamiltonian for reducing the resource overhead and improving simulation accuracy for electronic structure calculations.162) Finally, it can be used for preparing a Boltzmann distribution with only a single copy of a quantum state.163)

5. Quantum Error Mitigation

In this section, we review the error mitigation techniques for suppressing errors in noisy quantum devices. We will use U to denote the ideal quantum gate with the corresponding channel representation being \(\mathcal{U}(\cdot)\equiv U(\cdot)U^{\dagger}\). The noisy quantum gate can be written as \(\mathcal{E}\circ\mathcal{U}\) with \(\mathcal{E}\) being the noise channel. For simplicity, we assume that gate errors are Markovian, that is, the noise channels \(\mathcal{E}\) for different gates are independent.

With an input state \(\rho_{\textit{in}}\), the ideal output state \(\rho_{\textit{out}}^{\textit{ideal}}\) can be written as \begin{equation} \rho_{\textit{out}}^{\textit{ideal}} = \mathcal{U}_{N_{g}} \circ \mathcal{U}_{N_{g}-1} \cdots \circ \mathcal{U}_{1} (\rho_{\textit{in}}) \end{equation} (97) and the noisy output state \(\rho_{\textit{out}}\) we obtain in practice is \begin{equation} \rho_{\textit{out}} = \mathcal{E}_{N_{g}} \circ \mathcal{U}_{N_{g}} \circ\cdots \circ \mathcal{E}_{1}\circ \mathcal{U}_{1} (\rho_{\textit{in}}). \end{equation} (98) Here \(N_{g}\) is the number of gates.

When the noise is below a certain threshold, we can make use of quantum error correction (QEC) to recover the ideal state. However, implementing QEC requires a huge overhead of the number of qubits. For example, Fowler et al.164) have shown that the surface code, one of the most popular codes for practical fault-tolerant quantum computing, requires around a thousand qubits per logical qubit to perform Shor's algorithm with a reasonable success probability. With the limited number of qubits available in near-term quantum computers, realising QEC on them might not be practical.

Alternative schemes under the name of quantum error mitigation (QEM) are developed instead for error suppression on NISQ devices, which rely on the cleverer post-processing of measurement results. Instead of recovering the ideal output state, most QEM methods aim to recover the ideal measurement statistics. Supposing we measure an observable M on the output state, QEM methods target at recovering \(\langle M\rangle_{\textit{ideal}} =\mathop{\text{Tr}}\nolimits(\rho^{\textit{ideal}}_{\textit{out}} M)\) via a classical post-processing of the measurement results of a set of noisy quantum circuits. Such post-processing will usually increase the variance of the effective observable we obtained, leading to more samples needed to reduce the variance of the measurement result as shown in Fig. 13. Different QEM methods work under different assumptions of the noise for different purposes. Nevertheless, QEM methods are usually only effective when the error rate of the whole circuit is low enough. Since QEM methods are mostly designed for NISQ devices with shallow circuits, they are not scalable with deep quantum circuits with a large number of qubits.

Figure 13. (Color) Schematic of QEM. In the presence of noises, the probability distribution of the expectation value of an observable is shifted from the noise-free one. Using QEM we can make the probability distribution centred around the correct expectation value; however the variance is amplified and we need more samples to achieve the same accuracy as the one before QEM.


We first introduce the most simple yet very powerful error mitigation technique based on error extrapolation. The basic idea is to run the circuit with different error rates and use them to extrapolate for the error-free result, which is illustrated in Fig. 14.

Figure 14. (Color) Schematic of quantum error extrapolation. We boost the error rate and obtain other data points. Then we extrapolate the original result and those at boosted error rates to estimate the error-free result. Fitting curves should be chosen depending on the situation. Here, we show the case of two-point (green) and three-point (blue) Richardson extrapolation as examples.

Richardson extrapolation

We first review the Richardson extrapolation error mitigation method introduced by Li and Benjamin14) and Temme et al.16) We will look at a stochastic noise process \(\mathcal{E}_{m}\) in the form of \begin{equation} \mathcal{E}_{m} = (1-\varepsilon_{m}) \mathcal{I}+\varepsilon_{m} \mathcal{N}_{m}, \end{equation} (99) where \(\mathcal{I}\) is the identity map, \(\mathcal{N}_{m}\) is the noisy map, \(\varepsilon_{m}\) is a small error rate. For simplicity, we assume that all noise channels will follow the same error rate \(\varepsilon_{m} =\varepsilon\). Supposing we measure the expectation value of an observable M, we can expand the expectation value \(\langle M\rangle\) as a function of ε according to Taylor expansion, \begin{equation} \langle M\rangle(\varepsilon) = \langle M\rangle(0)+\sum_{k=1}^{n} M_{k} \varepsilon^{k}+O(\varepsilon^{n+1}). \end{equation} (100) Here \(M_{k}\) are the coefficients and \(\langle M\rangle(\varepsilon)\) is the expectation value with error ε.

In order to estimate the ideal measurement result \(\langle M\rangle(0)\), we obtain several measurements \(\langle M\rangle(\varepsilon)\) with different noise rates. While we may not be able to decrease the error rate, we can effectively increase it with methods we will discuss shortly. We thus consider several different noisy measurements \(\langle M\rangle(\alpha_{k}\varepsilon)\) with different error rates \(\alpha_{k}\varepsilon\) with \(1 =\alpha_{0}<\alpha_{1}<\alpha_{2}\cdots <\alpha_{n}\). Then we can consider Eq. (100) to the nth order and use Richardson extrapolation to approximate \(\langle M\rangle(0)\) as \begin{equation} \begin{split} \langle M\rangle_{\textit{est}}(0) &= \sum_{k = 0}^{n} \beta_{k} \langle M\rangle(\alpha_{k} \varepsilon),\\ &= \langle M\rangle(0)+O(\varepsilon^{n+1}). \end{split} \end{equation} (101) Here the coefficients \(\beta_{k}\) satisfy \(\sum_{k = 0}^{n}\beta_{k} = 1\), and \(\sum_{l = 0}^{n}\beta_{l}\alpha_{l}^{k} = 0\) for \(k = 1,2,\ldots,n\) and the solution gives \begin{equation} \beta_{k} = \prod_{i\neq k} \frac{\alpha_{i}}{\alpha_{k}-\alpha_{i}}. \end{equation} (102) We can see that the effect of calculation error is mitigated from ε to \(O(\varepsilon^{n+1})\), which is very effective given small ε.

The reader might think that the error can be arbitrarily suppressed by increasing the number of points n. This is not possible because the estimation of Eq. (101) also introduces a big shot noise from measuring the expectation values. In particular, the variance of the approximate \(\langle M\rangle_{\textit{est}}(0)\) is \begin{equation} \mathrm{Var}[\langle M\rangle_{\textit{est}}(0)] = \sum_{k = 0}^{n} \beta_{k}^{2} \mathrm{Var}[\langle M\rangle(\alpha_{k} \varepsilon)], \end{equation} (103) which is \(\gamma_{\textit{Ric}} =\sum_{k = 0}^{n}\beta_{k}^{2}\) larger than the variance of \(\langle M\rangle\), assuming that \(\mathrm{Var}[\langle M\rangle(\alpha_{k}\varepsilon)]\) is constant for all k. This implies that we have to take \(\gamma_{\textit{Ric}}\) times greater samples to achieve the same accuracy as the error-free ideal case when employing Richardson extrapolation. In the following, we regard \(\gamma_{\textit{Ric}}\) as the cost of error mitigation. Since the sampling cost \(\gamma_{\textit{Ric}}\) increases exponentially to n,165) we can only opt for a small constant value of n.

Notably, error mitigation via Richardson extrapolation has been experimentally implemented for finding the ground state energy of H2 and LiH with the variational quantum eigensolver method.166) The error mitigation method dramatically reduces the calculation error for several orders, leading to results close to the chemical accuracy.

Exponential extrapolation

The Richardson extrapolation assumes a valid Taylor expansion with a polynomial function of the error rates and negligible higher order terms. However the expansion based on polynomial function might be inaccurate for the practical scenario with small error rate ε and large number of gates \(N_{g}\). Again, suppose all the noise channels are Markovian and stochastic as \begin{equation} \mathcal{E} = (1-\varepsilon)\mathcal{I} + \varepsilon \mathcal{N}. \end{equation} (104) Then a sequence of \(N_{g}\) gates \(\mathcal{EU} =\mathcal{E}_{N_{g}}\circ \mathcal{U}_{N_{g}}\circ \cdots\mathcal{E}_{1}\circ\mathcal{U}_{1}\) can be expanded as \begin{align} \mathcal{EU} &= \prod_{j = 1}^{N_{g}} ((1-\varepsilon)\mathcal{I} + \varepsilon \mathcal{N}_{j}) \circ \mathcal{U}_{j}\\ &= \sum_{k = 0}^{N_{g}} (1-\varepsilon)^{N_{g}-k}\varepsilon^{k} \sum_{i = 1}^{\binom{N_{g}}{k}} \mathcal{G}_{k}^{i}\\ &= \sum_{k = 0}^{N_{g}} p_{k} \mathcal{G}_{k}. \end{align} (105) Here the second line regroups the expansion according to the number of errors and \(\mathcal{G}_{k}^{i}\) corresponds to one of the expansion with k errors. In the third line, we denote the binomial distribution as \(p_{k} =\binom{N_{g}}{k} (1-\varepsilon)^{N_{g}-k}\varepsilon^{k}\) and define the average of \(\mathcal{G}_{k}^{i}\) as \begin{equation*} \mathcal{G}_{k} = \frac{\displaystyle\sum_{i = 1}^{\binom{N_{g}}{k}} \mathcal{G}_{k}^{i}}{\binom{N_{g}}{k}}. \end{equation*} With sufficiently large \(N_{g}\) but small ε satisfying \(N_{g}\varepsilon = O(1)\), the binomial distribution \(p_{k}\) can be approximated by the Poisson distribution \begin{equation} p_{k}\approx e^{-N_{g} \varepsilon} \frac{(N_{g} \varepsilon)^{k}}{k!}, \end{equation} (106) and the noisy channel can be expanded as \begin{equation} \mathcal{EU} = e^{-N_{g} \varepsilon}\sum_{k = 0}^{N_{g}} \frac{(N_{g} \varepsilon)^{k}}{k!} \mathcal{G}_{k}, \end{equation} (107) which includes a factor \(e^{-N_{g}\varepsilon}\) that exponentially decays with \(N_{g}\varepsilon\).

Therefore, instead of a polynomial expansion of the expectation value, Endo et al.17) considered the exponential decay. In particular, approximating the sum of Eq. (107) to the first order, and considering the two noisy measurements \(\langle M\rangle(\varepsilon)\) and \(\langle M\rangle(\alpha\varepsilon)\) at the two noise rates ε and \(\alpha\varepsilon\) (\(\alpha>1\)), an approximation of \(\langle M\rangle(0)\) can be found as \begin{equation} \langle M\rangle_{\textit{est}}(0) = \frac{\alpha e^{N_{g}\varepsilon}\langle M\rangle(\varepsilon)- e^{N_{g} \alpha \varepsilon}\langle M\rangle(\alpha \varepsilon)}{\alpha -1}. \end{equation} (108) Then the error mitigation cost is \begin{equation} \gamma_{\textit{exp}} = \frac{\alpha^{2} e^{2N_{g} \varepsilon}+e^{2 N_{g} \alpha \varepsilon}}{(\alpha-1)^{2}}. \end{equation} (109) Note that the exponential extrapolation only introduced a different expansion to the error and we can similarly apply the least square fitting method for the general case with multiple or different types of error rates. Advantage of exponential extrapolation over linear Richardson extrapolation was numerically demonstrated in Endo et al.17) and Giurgica-Tiron et al.167) Exponential extrapolation was used on IBM's superconducting qubit device for implementing dynamical mean-field theory simulation via Hamiltonian simulation algorithm.168) Recently, exponential extrapolation was further generalised to multi-exponential extrapolation in the form of \begin{equation} \sum_{k} b_{k} \exp(- \Gamma_{k} \varepsilon) \end{equation} (110) via analytical arguments, which was numerically shown to be effective in examples that cannot be fitted using a single exponential curve.47)

Methods to boost physical errors

Since the extrapolation methods use measurement results with different error rates, here we review three different methods for effectively increasing the error rates. Firstly, by increasing the number of noisy gates, the amount of physical noise can be effectively increased. For example, letting \(U_{\textit{CN}}\) denote a controlled-NOT (CNOT) operation with \(U_{\textit{CN}}^{2} = I\), a sequence of \(2n+1\) noisy CNOT operations approximates one CNOT operation with about \(2n+1\) times greater physical error. More precisely, suppose that a gate process can be described as \(\mathcal{EU}_{\textit{CN}} = [(1-\varepsilon)\mathcal{I}+\varepsilon \mathcal{N}]\mathcal{U}_{\textit{CN}}\), and we have \((\mathcal{EU}_{\textit{CN}})^{2n+1}\approx [(1-(2n+1)\varepsilon)\mathcal{I}+(2n+1)\varepsilon\mathcal{N}]\mathcal{U}_{\textit{CN}}\) under the assumption that the noise process \(\mathcal{N}\) commutes with the gate process \(\mathcal{U}_{\textit{CN}}\) and \(\varepsilon\ll 1\). Thus we can see that the error rate is boosted from ε to \((2n+1)\varepsilon\). This method was employed to improve the performance of the simulation using IBM Q and Rigetti Quantum Cloud services.169) To obtain fine-grained resolution for boosted error rates, unitary folding method was introduced.167) Meanwhile, it is shown that by setting n to a random variable, the boosted error rate can take a continuous value.170) Let \(p_{m}\) be a probability that \(n = m\), we have \(\sum_{m} p_{m} (\mathcal{EU}_{\textit{CN}})^{2m+1}\approx [(1-(2\langle n\rangle+1)\varepsilon)\mathcal{I}+(2\langle n\rangle+1)\varepsilon\mathcal{N}]\mathcal{U}_{\textit{CN}}\), where \(\langle n\rangle\) is an expectation value of n. Then the error rate can be boosted by a factor of \(2\langle n\rangle+1\). Note that when the noise does not commute with the gate, the boosted error rate may be different from the desired value, which may further affect the error mitigation result.

The second method is via the re-scaling of the Hamiltonian in realising quantum gates.16,166) Suppose a gate operation in a quantum circuit is described as follows \begin{equation} \frac{d}{dt} \rho_{\lambda}(t) = - i [H(t), \rho_{\lambda}(t)]+ \lambda \mathcal{L}(\rho_{\lambda}(t)), \end{equation} (111) where \(H(t)\) is a time-dependent Hamiltonian, \(\mathcal{L}\) denotes a dissipative effect, and λ is the dissipation strength. Now we re-scale the Hamiltonian to \(\frac{1}{c} H(\frac{1}{c})\) with \(c>1\), and the corresponding state \(\rho_{\lambda} '(t)\) follows \begin{equation} \frac{d}{dt} \rho'_{\lambda}(t) = - i \left[\frac{1}{c}H\left(\frac{t}{c} \right), \rho'_{\lambda}(t) \right]+ \lambda \mathcal{L}(\rho'_{\lambda}(t)). \end{equation} (112) Then we can show \(\rho'_{\lambda}(c t) =\rho_{c\lambda}(t)\) under the assumption that \(\mathcal{L}\) is invariant under re-scaling. This can be interpreted as that by reducing pulse strength by a factor of \(1/c\) and increasing the pulse width, i.e., the evolution time c times greater, we can boost the noise strength from λ to \(c\lambda\). Note that when the dissipation is not re-scaling invariant, the factor c may become different from the expected one, leading to estimation errors in error mitigation. Hamiltonian re-scaling method was demonstrated in experiment using superconducting qubits166) and the IBM OpenPulse framework.171)

The third method is via Pauli twirling.14) Suppose the dominant error source are from two-qubit gates so that single-qubit gate error rate is sufficiently low. By randomly applying Pauli gates before and after a Clifford entangling two-qubit gates, we can first convert an arbitrary error into stochastic Pauli errors.172) Then, by randomly generating additional Pauli gates after the twirled two qubit gates, we can boost the physical error rate to any desired amount. This method assumes small single qubit error rate and exact knowledge of the error form. It may not work when the single-qubit error rate is high, or knowing the exact error channel is experimentally challenging.

Mitigation of algorithmic errors

In addition to mitigating physical errors of the circuit, the extrapolation error mitigation method can also be applied to mitigating algorithmic errors in Hamiltonian simulation via Trotterisation.173) Suppose the Hamiltonian of interest is decomposed as \(H =\sum_{j} H_{j}\), where \(H_{j}\) is a Hamiltonian which involves few-body interactions. According to the first order Trotter formula, the time evolution operator \(U(t) = e^{-iHt}\) can be approximated as \begin{equation} U(t) = \left(\prod_{j} e^{-iH_{j} t/N_{T}} \right)^{N_{T}} {}+ O(t^{2}/N_{T}), \end{equation} (113) where \(N_{T}\) is the number of Trotter steps. Let \(\varepsilon_{A} = 1/N_{T}\), which can be interpreted as algorithmic error due to insufficiency of the number of Trotter steps. We can increase the accuracy of the approximation by decreasing \(\varepsilon_{A}\) or equivalently having more Trotter steps. However increasing \(N_{T}\) also introduces more physical errors and a trade-off between the algorithmic error and the physical error needs be considered.

By properly choosing the optimal number of Trotter steps \(N_{T}\), we now show how to further mitigate the algorithmic error via the error mitigation method. We first define a finite Trotter expansion as \begin{equation} U_{\varepsilon_{A}}(t) \equiv \left(\prod_{j} e^{-iH_{j} t \varepsilon_{A}} \right)^{1/\varepsilon_{A}}, \end{equation} (114) where we have \(\lim_{\varepsilon_{A}\rightarrow 0} U_{\varepsilon_{A}}(t) = U(t)\) with \(N_{T}\rightarrow \infty\). Now suppose we apply \(U_{\varepsilon_{A}}(t)\) to an initial state \(|\psi_{\textit{in}}\rangle\) and measure an observable M on the output state. Then similar to the scenario of mitigating physical errors, we can expand the measurement result as a function of the algorithmic error \(\varepsilon_{A}\) as \begin{equation} \langle M\rangle(\varepsilon_{A}) = \langle M\rangle(0)+\sum_{k}^{n} M_{k} \varepsilon_{A}^{k}+O(\varepsilon_{A}^{n+1}). \end{equation} (115) Thus we can suppress algorithmic errors in a similar way to suppresing physical errors via the extrapolation method, in particular, by boosting the algorithmic error with a smaller number of Trotter steps. Practically we need to use QEM for physical errors first, and then apply extrapolation for algorithmic errors.

Least square fitting for several noise parameters

Now, consider quantum error mitigation via the least square fitting.40) Although the assumption in literature is good characterisation of error rates and amplification of noise for extrapolation is not used, it may be employed to prepare results corresponding to several error rates. Consider well-characterised different error rates \(\varepsilon_{1},\ldots,\varepsilon_{n'}\), the expansion of Eq. (100) up to the nth order corresponds to a linear equation, \begin{equation} \mathbf{E}\vec{\mathbf{m}} \approx \vec{\mathbf{r}}, \end{equation} (116) where \begin{equation} \mathbf{E}= \begin{pmatrix} 1 &\varepsilon_{1} &\ldots &\varepsilon_{1}^{n}\\ 1 &\varepsilon_{2} &\ldots &\varepsilon_{2}^{n}\\ \vdots &\vdots &\ddots &\vdots\\ 1 &\varepsilon_{n'} &\ldots &\varepsilon_{n'}^{n} \end{pmatrix}, \end{equation} (117) \(\vec{\mathbf{m}} = (\langle M\rangle(0), M_{1},\ldots, M_{n})^{T}\), and \(\vec{\mathbf{r}} = (\langle M\rangle(\varepsilon_{1}),\langle M\rangle(\varepsilon_{2}),\ldots,\langle M\rangle(\varepsilon_{n}))^{T}\). Here the number of error rates \(n'\) can be different from the expansion order n. A solution of the linear equation can be obtained from minimising \begin{equation} \|\mathbf{E}\vec{\mathbf{m}} - \vec{\mathbf{r}}\|^{2}, \end{equation} (118) whose solution corresponds to the Richardson extrapolation method when \(n = n'\). Here we use the vector norm \(\|\vec{\mathbf{v}}\| =\sqrt{\vec{\mathbf{v}}^{T}\cdot\vec{\mathbf{v}}}\).

In practice, there may exist multiple noise parameters from different noisy effects, e.g., \(T_{1}\) and \(T_{2}\) noise. Suppose that there is another noise parameter ν, and we expand the expectation value \(\langle M\rangle(\varepsilon,\nu)\) as \begin{align} \langle M\rangle(\varepsilon, \nu) &= \langle M\rangle(0)+M_{10} \varepsilon +M_{01} \nu\\ &\quad + M_{11} \varepsilon \nu +M_{20} \varepsilon^{2}+M_{02} \nu^{2}+\cdots. \end{align} (119) Suppose we consider the truncation up to the second order, we get the same linear expansion of Eq. (116) with \begin{equation} \mathbf{E}= \begin{pmatrix} 1 &\varepsilon_{1} &\nu_{1} &\varepsilon_{1} \nu_{1} &\varepsilon_{1}^{2} &\nu_{1}^{2}\\ 1 &\varepsilon_{2} &\nu_{2} &\varepsilon_{2} \nu_{2} &\varepsilon_{2}^{2} &\nu_{2}^{2}\\ \vdots &\vdots &\ddots &\vdots &\vdots &\vdots\\ 1 &\varepsilon_{n'} &\nu_{n'} &\varepsilon_{n'} \nu_{n'} &\varepsilon_{n'}^{2} &\nu_{n'}^{2} \end{pmatrix}, \end{equation} (120) and \begin{equation} \vec{\mathbf{m}} = \begin{pmatrix} \langle M\rangle(0)\\ M_{10}\\ M_{01}\\ M_{11}\\ M_{20}\\ M_{02} \end{pmatrix},\quad \vec{\mathbf{r}} = \begin{pmatrix} \langle M\rangle(\varepsilon_{1}, \nu_{1})\\ \langle M\rangle(\varepsilon_{2},\nu_{2})\\ \vdots\\ \langle M\rangle(\varepsilon_{n'},\nu_{n'}) \end{pmatrix}. \end{equation} (121) An estimation of \(\langle M\rangle(0)\) can be similarly obtained by minimising Eq. (118). This method was demonstrated on Rigetti's 8-qubit device.40)

Quasi-probability method

The quasi-probability method is another error mitigation technique, which was first introduced by Temme et al.16) for special channels and then generalised to practical Markovian noise by Endo et al.17) The main idea is for any noise process, its effect can be cancelled out by probabilistically implementing its inverse process. Considering a single noisy gate as \(\mathcal{E}\circ\mathcal{U}\) with ideal gate \(\mathcal{U}\) and noise \(\mathcal{E}\), we may find an operation \(\mathcal{E}^{-1}\) that inverts the noise with \(\mathcal{E}^{-1}\circ\mathcal{E} =\mathcal{I}\). Note that we can mathematically invert the noise \(\mathcal{E}\) for most practical cases, the inverse process is in general unphysical and it cannot be directly realised by applying unitary operations on quantum states. Nevertheless, for any basis set of quantum processes \(\{\mathcal{B}_{i} \}\), we can decompose the inverse channel as \(\mathcal{E}^{-1} =\sum_{i} q_{i}\mathcal{B}_{i}\) and consequently we have \begin{equation} \mathcal{U} = \mathcal{E}^{-1}\circ \mathcal{E} \circ\mathcal{U} = \sum_{i} q_{i} \mathcal{K}_{i}, \end{equation} (122) with \(\mathcal{K}_{i} =\mathcal{B}_{i}\circ\mathcal{E}\circ\mathcal{U}\). Therefore even if we cannot directly implement the inverse process, we can effectively realise it by applying basis operations \(\mathcal{B}_{i}\) to the noisy gate and linearly combining the channels \(\mathcal{K}_{i}\) with real coefficients \(q_{i}\). In particular, supposing we input the gate with state ρ and measure the output with an observable M, the ideal measurement statistic \(\langle M\rangle_{\mathcal{U}(\rho)} =\mathop{\text{Tr}}\nolimits [\mathcal{U}(\rho)M]\) can be obtained as \begin{equation} \langle M\rangle_{\mathcal{U}(\rho)} = \sum_{i} q_{i}\langle M\rangle_{\mathcal{K}_{i}(\rho)}, \end{equation} (123) where each \(\langle M\rangle_{\mathcal{K}_{i}(\rho)} =\mathop{\text{Tr}}\nolimits [\mathcal{K}_{i}(\rho)M] =\mathop{\text{Tr}}\nolimits [\mathcal{B}_{i}\circ\mathcal{E}\circ\mathcal{U}(\rho)M]\) corresponds to the measurement with noisy gates. Instead of measuring all \(\langle M\rangle_{\mathcal{K}_{i}(\rho)}\), we can define \(p_{i} = |q_{i}|/C_{\mathcal{E}}\) with \(C_{\mathcal{E}} =\sum_{i}|q_{i}|\) so that \begin{equation} \langle M\rangle_{\mathcal{U}(\rho)} = C_{\mathcal{E}}\sum_{i} \mathop{\text{sgn}}\nolimits(q_{i})p_{i}\langle M\rangle_{\mathcal{K}_{i}(\rho)}. \end{equation} (124) Therefore, we can probabilistically append the basis operation \(\mathcal{B}_{i}\) to the noisy channel \(\mathcal{E}\circ\mathcal{U}\) with probability \(p_{i}\). Then we measure the output state and multiply the measurement outcome with the parity \(\mathop{\text{sgn}}\nolimits(q_{i})\). The ideal measurement \(\langle M\rangle_{\mathcal{U}(\rho)}\) corresponds to the averaged measurement results multiplied by \(C_{\mathcal{E}}\). Note that, because we multiply \(C_{\mathcal{E}}\) to the averaged result, the variance is amplified with \(\gamma_{Q} = C_{\mathcal{E}}^{2}\), which is regarded as the error mitigation cost.

As an example, we consider error mitigation of the depolarising noise \begin{equation} \mathcal{D}(\rho) = \left(1-\frac{3}{4}p\right)\rho+\frac{p}{4}(X\rho X+Y \rho Y + Z \rho Z), \end{equation} (125) whose inverse process is mathematically given by \begin{equation} \mathcal{D}^{-1}(\rho) = C_{\mathcal{D}}[p_{1} \rho -p_{2} (X\rho X+Y \rho Y + Z \rho Z)], \end{equation} (126) where \(C_{\mathcal{D}} = (p+2)/(2-2p)>1\), \(p_{1} = (4-p)/(2p+4)\), \(p_{2} = p/(2p+4)\), and \(p_{1}+3p_{2} = 1\). Thus, for any ideal unitary \(\mathcal{U}\), it can be written as \begin{align} \mathcal{U} &= \mathcal{D}^{-1} \circ\mathcal{DU}\\ &= C_{\mathcal{D}} [p_{1} \mathcal{I}\circ\mathcal{DU}-p_{2} (\mathcal{X}\circ\mathcal{DU}+\mathcal{Y}\circ\mathcal{DU}+\mathcal{Z}\circ\mathcal{DU})]. \end{align} (127) Here, we denote the I, X, Y, and Z gates as \(\mathcal{I}\), \(\mathcal{X}\), \(\mathcal{Y}\), and \(\mathcal{Z}\), respectively and the noisy gate \(\mathcal{D}\circ \mathcal{U}\) as \(\mathcal{DU}\). To recover the ideal gate \(\mathcal{U}\) from the noisy gate \(\mathcal{DU}\), we append the X, Y, Z gates with probability \(p_{2}\). When the state is measured, the parity corresponding to the generated operation and the constant \(C_{\mathcal{D}}\) is multiplied to the measurement result. By repeating this procedure many times, we can recover the measurement outcome with the ideal gate.

The quasi-probability method can be applied to a general quantum circuit, as illustrated in Fig. 15. Let us assume that the ideal process of the entire quantum circuit is described as \(\prod_{k = 1}^{N_{g}}\mathcal{U}_{k}\), where \(\mathcal{U}_{k}\) corresponds to each gate. Suppose \(\mathcal{U}_{k}\) is decomposed as \begin{equation} \mathcal{U}_{k} = C_{k} \sum_{i_{k}} p_{i_{k}} \mathop{\text{sgn}}\nolimits (q_{i_{k}}) \mathcal{K}_{i_{k}}, \end{equation} (128) and the ideal process can be represented as \begin{align} \prod_{k = 1}^{N_{g}} \mathcal{U}_{k} &= \prod_{k = 1}^{N_{g}}C_{k} \sum_{i_{1} i_{2}\cdots i_{N_{g}}} \prod_{k = 1}^{N_{g}}p_{i_{k}} \prod_{k = 1}^{N_{g}}\mathop{\text{sgn}}\nolimits (q_{i_{k}}) \prod_{k = 1}^{N_{g}} \mathcal{K}_{i_{k}}\\ &= C_{\textit{tot}}\sum_{\vec{i}} p_{\vec{i}} \cdot \mathop{\text{sgn}}\nolimits(q_{\vec{i}})\cdot \mathcal{K}_{\vec{i}}, \end{align} (129) where \(\vec{i} = (i_{1},i_{2},\ldots,i_{N_{g}})\), \(C_{\textit{tot}} =\prod_{k = 1}^{N_{g}}C_{k}\), \(p_{\vec{i}} =\prod_{k = 1}^{N_{g}}p_{i_{k}}\), \(\mathop{\text{sgn}}\nolimits(q_{\vec{i}}) =\prod_{k = 1}^{N_{g}}\mathop{\text{sgn}}\nolimits (q_{i_{k}})\), and \(\mathcal{K}_{\vec{i}} =\prod_{k = 1}^{N_{g}}\mathcal{K}_{i_{k}}\). Therefore, we can similarly realise each noisy process \(\mathcal{K}_{\vec{i}}\) with probability \(p_{\vec{i}}\) and multiply the outcome with \(\mathop{\text{sgn}}\nolimits(q_{\vec{i}})\) to recover the effect of the ideal circuit. We also multiply \(C_{\textit{tot}}\) to the averaged outcome. The error mitigation cost is \(\gamma_{\textit{tot}} = C_{\textit{tot}}^{2}\), which becomes \(\gamma_{\textit{tot}}\approx e^{2 b\varepsilon N_{g}}\) when we assume \(C_{k} = 1+ b\varepsilon\) with positive coefficient b and error probability ε. For stochastic noise, b is generally less than 2.17) We can see that the cost increases exponentially to \(\varepsilon N_{g}\), i.e., the averaged number of errors in the whole circuit. Therefore, the quasi-probability method is efficient only if \(\varepsilon N_{g} = O(1)\).

Figure 15. (Color online) Schematic of the quasi-probability method. We apply the inverse channel to cancel the noise process \(\mathcal{E}\) with quasi-probability method. The inverse operation specified in the figure corresponds to the case of depolarising channel. For simplicity, we just show the case that two-qubit gate error is a tensor product of depolarising channels. We multiply the product of parities of generated operations \(\mathop{\text{sgn}}\nolimits(q_{\vec{i}})\) to the outcome. We compute the average of product of the parity and the outcome, which is multiplied with the cost \(C_{\textit{tot}}\) to approximate the ideal expectation value of the observable.

In contrast to the extrapolation method, we also need to exactly identify the noise model \(\mathcal{E}\) in quantum circuits to apply the quasi-probability method. However, state preparation and measurement (SPAM) errors in the conventional process tomography will result in errors of the estimated process and hence the effect of error mitigation. Such a problem was addressed in Endo et al.,17) where gate set tomography (GST) was applied so that the quasi-probability decomposition obtained from GST is free from SPAM errors. The same paper also showed how to use experimentally feasible operations to suppress general Markovian errors. The application of quasi-probability in the presence of non-Markovian errors, such as temporally and spatially correlated errors, was further studied by Huo and Li.174) Detailed theoretical analysis of the cost using resource theory approach was given by Takagi.175) The quasi-probability method has been demonstrated using superconducting176) and trapped ion systems177) for single- and two-qubit gates.

Quantum subspace expansion

The quantum subspace expansion (QSE) can not only be used for evaluating excited states, but also for quantum error mitigation in variational quantum eigensolver (VQE).36) Suppose that we already have the approximation of the ground state ρ of the Hamiltonian H via either the VQE or variational imaginary time evolution. The approximation ρ may not be the exact ground state of H due to gate errors or imperfect ansatz and optimisation. To mitigate such errors, we consider a small subset of operators \(S\subset \{I, X, Y, Z \}^{\otimes N_{q}}\) and a subspace \begin{equation} \left\{\rho_{\textit{QEM}} = \sum_{i,j} c_{i} c_{j}^{*} P_{i} \rho P_{j}\,|\, P_{i}\in S,\ c_{i}\in \mathbb{C},\ \mathop{\text{Tr}}\nolimits [\rho_{\textit{QEM}}] = 1 \right\}. \end{equation} (130) The QSE method is to minimise the energy of quantum states from the subspace \begin{equation} \min\nolimits_{\vec{c}} \mathop{\text{Tr}}\nolimits[\rho_{\textit{QEM}} H] \end{equation} (131) over all possible parameters \(\vec{c} = (c_{0}, c_{1},\ldots.)\). Such a minimisation can be reformulated as solving a generalised eigenvalue problem \begin{equation} \tilde{H} \vec{c} = E \tilde{S} \vec{c}, \end{equation} (132) where E is the eigenvalue, \(\tilde{H}_{ji} =\mathop{\text{Tr}}\nolimits[\rho P_{j} H P_{i}]\) and \(\tilde{S}_{ji} =\mathop{\text{Tr}}\nolimits[\rho P_{j} P_{i}]\). Note that both \(\tilde{H}\) and \(\tilde{S}\) can be efficiently measured on a quantum computer, and the problem can be efficiently solved on a classical computer given a small set of S. Once \(\vec{c}\) is determined, we can compute the expectation value of any operator M as \(\sum_{ij}c_{i} c_{j}^{*}\mathop{\text{Tr}}\nolimits[\rho P_{j} M P_{i}]\).

The QSE method is more effective for coherent errors such as over-rotation of quantum gates, and it may not mitigate all local stochastic errors. Specifically, the subspace expansion method is equivalent to optimising states from the subspace \(\{\rho_{\textit{QEM}} =\mathcal{P}\rho\mathcal{P}^{\dagger}/{\mathop{\text{Tr}}\nolimits}[\mathcal{P}\rho\mathcal{P}^{\dagger}]\}\) with \(\mathcal{P} =\sum_{i} c_{i}P_{i}\). Under a singular value decomposition of the operator \(\mathcal{P}\), its effect can be regarded as a combination of state rotation and projection, which may be able to correct coherent errors and stochastic errors, respectively. Small under- and over-rotation of quantum gates could be corrected via the unitary part of \(\mathcal{P}\), which makes QSE effective for coherent errors. For stochastic errors, the projection part of the operator \(\mathcal{P}\) is supposed to project out the errors. For example, considering ρ to be a mixture of the ideal pure state \(|\psi\rangle\langle\psi|\) and the noise σ as \(\rho = (1-\varepsilon)|\psi\rangle\langle\psi|+\varepsilon\sigma\), we can use a projection \(|\psi\rangle\langle\psi|\) to suppress the noise, by finding coefficients to satisfy the condition \(\mathcal{P}\equiv\sum_{i} c_{i}P_{i} = |\psi\rangle\langle\psi|\). However, a general quantum state \(|\psi\rangle\) may be expanded into an exponential number of Pauli terms, such a condition becomes hardly true given a small subset of S and hence the QSE method is less effective for stochastic errors on a general quantum state. Nevertheless, as we shortly discussion in the next section, when the target state preserves certain symmetries either inherently or by encoding via an error correcting code, the QSE method can indeed effectively suppress stochastic errors by efficiently projecting to the subspace with the correct symmetry. It can thus mitigate a large portion of stochastic errors and significantly improve the calculation accuracy when further combined with other QEM techniques.37)

In practice, the set S can be chosen to be creation and annihilation operators for fermionic Hamiltonians. The QSE method was experimentally demonstrated for mitigating errors in quantum chemistry calculation of ground and excited state energies of the H2 molecule.65)

Symmetry verification

When the physical system we try to simulate exhibits certain symmetries,178,179) they can be exploited for QEM using symmetry verification.37,38) We can design ansätze with unitary components that conserve the particle number and the spin symmetries, such as the unitary coupled cluster ansatz69) or certain hardware efficient ansätze.180) We will focus on symmetries that can be mapped to Pauli operators since they can be efficiently measured. For example, the parity operators for the particle number and for the spins, \(\hat{P}_{N}\) and \(\hat{P}_{N_{\uparrow\downarrow}}\), both have \(\pm 1\) eigenvalues and thus can be mapped to Pauli operators under a suitable encoding scheme.

While the ideal quantum state preserves the symmetries, states prepared by the noisy circuit may break them. Since symmetries are broken only if an error happens, symmetry verification works by discarding the cases that break the symmetries.37,38) We can implement the measurement of \(\hat{P}_{N}\) and \(\hat{P}_{N_{\uparrow\downarrow}}\) by implementing the parity check circuit with an ancilla qubit interacting with the register qubit as shown in Fig. 16. Since a general Pauli operator is the tensor product of local Pauli operators, the symmetry can also be measured using local Pauli measurement and post-processing in certain cases without using ancillae.181) If the symmetry outcome is not the expected one, the circuit run is discarded. More errors can be detected by considering more general symmetries, such as the one that preserves the particle number. Symmetry verification can also be combined with other QEM methods as will be discussed in Sect. 5.10.

Figure 16. Quantum circuit used for symmetry verification. This quantum circuit is for four register qubits. The ancilla qubit is measured in 0 (1) when the the total particle number is even (odd). Thus, by reading out the ancilla qubit, we can detect errors in the register qubits.

Alternatively, we can effectively implement the parity check circuit via a post-processing approach.38) Here we consider the conventional VQE scenario and the case where \(\hat{P}_{N}\) is measured, and error-free subspace corresponds to \(\hat{P}_{N} |\psi\rangle = m |\psi\rangle\) with \(m\in\{\pm 1\}\). For any noisy state ρ that may have components with both positive and negative eigenvalues of \(\hat{P}_{N}\), projecting the state to the subspace with correct symmetry is \begin{equation} \rho_{m} = \frac{M_{m} \rho M_{m}}{\mathop{\text{Tr}}\nolimits[M_{m} \rho]}, \end{equation} (133) where \(M_{m} =\frac{I+m P_{N}}{2}\). Supposing we measure the Hamiltonian H, which commutes with \(\hat{P}_{N}\), we have \begin{equation} \mathop{\text{Tr}}\nolimits[H \rho_{m}] = \frac{\mathop{\text{Tr}}\nolimits[H \rho]+ m\mathop{\text{Tr}}\nolimits[H \hat{P}_{N} \rho]}{1+m \mathop{\text{Tr}}\nolimits[\hat{P}_{N} \rho]}. \end{equation} (134) Therefore, we can measure \(\mathop{\text{Tr}}\nolimits[H\rho]\), \(\mathop{\text{Tr}}\nolimits[\hat{P}_{N}\rho]\), and \(\mathop{\text{Tr}}\nolimits[H\hat{P}_{N}\rho]\) to effectively obtain the measurement with the post-selected state \(\rho_{m}\). The state of Eq. (133) is consistent with the subspace states in the previous section. When allowing to vary m, optimising m would be equivalent to the QSE procedure. It indeed has been shown that the solution of QSE coincides with the state of Eq. (133) given the symmetry of the target ground state.38)

In contrast to the parity-check circuit based approach, the current one does not need the additional ancilla and the parity-check circuit even in the most general cases. However it does need more samples since the error is mitigated via post-processing. For example, suppose the probability that the noisy state is in the correct subspace is \(p =\mathop{\text{Tr}}\nolimits[M_{m}\rho]\). Then to achieve a sampling error ε, the parity-check circuit method requires \(\mathcal{O}(1/(p\varepsilon^{2}))\) samples and the post-processing method requires \(\mathcal{O}(1/(p^{2}\varepsilon^{2}))\) samples. The post-processing approach also applies for general symmetries such as the particle number.182) The post-processing approach was experimentally demonstrated by using a two-qubit superconducting processor.183)

When considering the logical states of a stabiliser quantum error correcting code, it has code symmetries described by the stabilisers. The conventional approach for realising error detection and error correction is based on the parity check circuit with additional ancillary qubits. In McClean et al.,184) the post-processing approach was extended for realising the error detection and error correction without the parity check circuit. Since an error correcting code has multiple symmetries, we can probabilistically apply each symmetry projection to force the state back into the code space. The post-processing error decoder for the five qubit quantum error correcting code was numerically implemented with an effective threshold of 50% and applied for chemistry simulation.184)

Individual error reduction

The individual error reduction method makes use of quantum error correction (QEC) on a single qubit and post-processing to mitigate errors.39) Assume that physical noise after each quantum gate is described using the Lindblad master equation \begin{equation} \begin{split} \frac{d \rho}{dt} &= \sum_{k} \mathcal{L}_{k} (\rho)\\ L_{k}(\rho) &= 2L_{k}^{\dagger} \rho L_{k} -L_{k}^{\dagger} L_{k} \rho -\rho L_{k}^{\dagger} L_{k}, \end{split} \end{equation} (135) and the duration of the noise process is τ. Note that \(L_{k}\) operates on the kth qubit. Suppose that the lth qubit is encoded as a logical qubit and the noise on it can be reduced by a factor of \(h_{l}\) by QEC. Then the evolution of the state is described as \begin{equation} \frac{d \rho_{l}}{dt} = \sum_{k \neq l} \mathcal{L}_{k} (\rho)+(1-h_{l})L_{l}(\rho_{l}). \end{equation} (136) Then, the individual error reduction method defines \begin{equation} \tilde{\rho}^{\textit{est}} \equiv \rho^{\textit{noisy}}- \sum_{l} \frac{1}{h_{l}}(\rho^{\textit{noisy}}-\rho^{\textit{noisy}}_{l}), \end{equation} (137) where \(\rho^{\textit{noisy}}\) is the output state from the noisy quantum circuit without QEC described by Eq. (135) and \(\rho_{l}^{\textit{noisy}}\) is the output state with QEC on the lth qubit described by Eq. (136). Then it can be shown that \(\tilde{\rho}^{\textit{est}}\) approximates the ideal quantum state as \begin{equation} \tilde{\rho}^{\textit{est}} = \rho_{\textit{ideal}}+O(\tau^{2}), \end{equation} (138) where \(\rho_{\textit{ideal}}\) is the error-free output state from the quantum circuit. Thus, we can estimate the ideal expectation value \(\langle M\rangle_{\textit{ideal}} =\mathop{\text{Tr}}\nolimits (\rho_{\textit{ideal}} M)\) for an observable M by measuring the expectation value \(\langle M\rangle =\mathop{\text{Tr}}\nolimits[M\rho^{\textit{noisy}}]\) and \(\langle M\rangle_{l} =\mathop{\text{Tr}}\nolimits[M\rho_{l}^{\textit{noisy}}]\), and post-processing them via \begin{equation} \langle \tilde{M}\rangle_{\textit{est}} = \langle M\rangle-\sum_{l} \frac{1}{h_{l}}(\langle M\rangle-\langle M\rangle_{l}). \end{equation} (139) While the individual error reduction method suppresses errors to the first order, higher order effects can be further mitigated by combining it with other error mitigation methods. We also note that this method requires QEC on a single qubit, thus making it harder to implement compared to other QEM methods.

Measurement error mitigation

Measurement error mitigation is designed for suppressing errors during the measurement process.4143) Suppose that the ideal measurement is described by a set of positive-operator valued measure (POVM) operators \(\{E_{k}\}\) and the probability to obtain the measurement result k is \(p_{k} =\mathop{\text{Tr}}\nolimits[E_{k}\rho]\) for state ρ. Denote the ideal error-free probability distribution as \(\vec{p}_{\textit{ideal}} = (p_{1}, p_{2},\ldots, p_{N_{P}})^{\text{T}}\) with \(N_{P}\) being the number of POVM elements. In the presence of a measurement error, it transforms the probability distribution as \begin{equation} \vec{p}_{\textit{noise}} = \mathrm{N}\cdot \vec{p}_{\textit{ideal}}, \end{equation} (140) with N being the transformation matrix. For example, considering projective measurement of a qubit with probability \(\vec{p} = (p_{0}, p_{1})\), the misalignment measurement error can be described by \begin{equation} \mathrm{N}= \begin{pmatrix} 1-\varepsilon &\eta\\ \varepsilon &1-\eta \end{pmatrix}, \end{equation} (141) with ε and η denoting the error probabilities that the outcome flip from 0 to 1 and vice versa.

In practice, estimation of the matrix N\(_{\textit{est}}\) can be obtained via quantum detector tomography when there is no detector crosstalk over different qubits,185) so that N\(_{\textit{est}}\) is a tensor product of transformation matrices. Then we can obtain the ideal error free probability vector via \begin{equation} \vec{p}_{\textit{est}} = \mathrm{N}_{\textit{est}}^{-1}\cdot \vec{p}_{\textit{noise}}. \end{equation} (142) Although, this may produce unphysical probability with negative component due to non-classical noise such as coherent errors. To circumvent this problem, we can adopt \begin{equation} \vec{p}^{*}_{\textit{est}} = \mathop{\mathrm{argmin}}\nolimits_{\vec{p}_{\textit{est}}}(\| \mathrm{N}\cdot \vec{p}_{\textit{est}} -\vec{p}_{\textit{noise}} \|) \end{equation} (143) subject to the constraint that \(\vec{p}_{\textit{est}}\) is normalised and its elements are positive, where \(\|\cdot\|\) denotes Euclidean norm. This method was applied to IBM's five-qubit device to give a significant improvement of results in Maciejewski et al.41) and Chen et al.42) Recently, measurement error mitigation technique was proposed to further suppress cross-talk errors between qubits during the readout process.186) This method was demonstrated on the IBM 20-qubit superconducting processor.

Learning-based quantum error mitigation

While most error mitigation methods rely on exact or partial information of the noise model, the learning-based QEM method aims to suppress errors via an automatical learning process.44,45,187) Herein, we illustrate two examples by Czarnik et al.45) and Strikis et al.44)

Quantum error mitigation via Clifford data regression

Assume that we are given test data \(\{x^{\textit{ideal}}_{k}, x^{\textit{noisy}}_{k}\}\) with \(\{x^{\textit{ideal}}_{k}\}\) and \(\{x^{\textit{noisy}}_{k}\}\) being the ideal and noisy expectation values of an observable corresponding to the same target quantum circuits. We aim to learn the relationship between \(\{x^{\textit{ideal}}_{k}\}\) and \(\{x^{\textit{noisy}}_{k} \}\) as \(x^{\textit{ideal}} = g(x^{\textit{noisy}},\vec{\theta})\) via regression, and then employ this relationship to infer the unknown ideal result from any given noisy result.44) Here, \(\vec{\theta}\) are free parameters to optimise. Note that the test data \(\{x^{\textit{ideal}}_{k}, x^{\textit{noisy}}_{k}\}\) have to be sampled efficiently. We assume that the ideal error-free results \(\{x_{k}^{\textit{ideal}} \}\) are generated by efficiently simulating Clifford quantum circuits on classical computers, and \(\{x^{\textit{noisy}}_{k} \}\) are sampled from Clifford quantum circuits on real noisy quantum circuits. Given the test data, we minimise the cost function, \begin{equation} \mathrm{C}(\vec{\theta}) = \sum_{k} (g(x^{\textit{noisy}}_{k},\vec{\theta})- x^{\textit{ideal}}_{k})^{2}, \end{equation} (144) to fit the relationship between the ideal and noisy measurement results. We can choose a linear ansatz \(g(x^{\textit{noisy}}_{k},\vec{\theta}) =\theta_{1} x^{\textit{noisy}}_{k}+\theta_{2}\), or a more complex ansatz via a neural network. Once the relationship \(x^{\textit{ideal}} = g(x^{\textit{noisy}},\vec{\theta})\) is found, we input a noisy output \(x^{\textit{noisy}}\) obtained from a general quantum circuits and use the output as an estimation of the ideal result \(x^{\textit{ideal}}\). This method is referred to as Clifford Data Regression44) and illustrated in Fig. 17. Clifford Data Regression was demonstrated by using the 16-qubit IBMQ quantum computer, and a 64-qubit classical simulator.44) Since the noise effect are naturally incorporated in the learning process, this method may work for both local Markovian noise and correlated noise.

Figure 17. (Color online) Schematic figures for Clifford data regression.44) The relationship between noisy results from a quantum computer and ideal results from a classical computer is learned via regression. The training data are sampled from Clifford circuits. Then we use the relationship to predict the error mitigated result for an arbitrary non-Clifford quantum circuit.

Learning-based quasi-probability method

Instead of directly learning the ideal measurement outcome from noisy ones, Strikis et al.45) introduced a learning-based method for realising quasi-probability error mitigation without depending on process tomography. Focusing on quantum circuits that consist of arbitrary single-qubit gates and Clifford two-qubit gates, which are sufficient for universal quantum computing. Suppose the dominant error source comes from the two-qubit gates, which are called frame gates. Denote the sequence of single-qubit gates as \(\mathbf{U}= \{U_{1}, U_{2},\ldots \}\), and the ideal and noisy circuits as \(\mathcal{E}^{\textit{ideal}}(\mathbf{U})\) and \(\mathcal{E}^{\textit{noisy}}(\mathbf{U})\), respectively. To recover the ideal process, the quasi-probability method applies single-qubit Pauli operations \(\mathbf{Q}= \{Q^{1}, Q^{2},\ldots \}\) before and after gates of each layer. Denote the noisy circuit with additional operations for error mitigation as \(\mathcal{E}^{\textit{noisy}}(\mathbf{U},\mathbf{Q})\). An example of quantum circuits is shown in Fig. 18. The quasi-probability method is to find coefficients \(q(\mathbf{Q})\) so that \begin{equation} \mathcal{E}^{\textit{est}}(\mathbf{U}) = \sum_{\mathbf{Q}} q(\mathbf{Q}) \mathcal{E}^{\textit{noisy}}(\mathbf{U}, \mathbf{Q}) \end{equation} (145) approximates \(\mathcal{E}^{\textit{ideal}}(\mathbf{U})\). The conventional quasi-probability QEM method relies on the exact error channel tomography to determine the coefficients, which may fail to work for spatially and temporally correlated errors whose gate set tomography is in general hard to efficiently implement.

Figure 18. (Color) An example of quantum circuits for learning-based quasi-probability method.45) The gates denoted as Q are for mitigating errors for frame gates \(\mathcal{E}\), state preparation errors \(\mathcal{E}_{\textit{st}}\) and measurement errors \(\mathcal{E}_{\textit{me}}\), and U is for sampling a training set \(\mathbb{T}\). We insert Q and U between every layer of gates.

The learning based approach provides an alternative solution without the tomography of the error channel.45) Denoting \(\mathrm{com}^{\textit{ideal}(\textit{est})}(\mathbf{U})\) to be the expectation value of the observable of interest corresponding to \(\mathcal{E}^{\textit{ideal}}(\mathbf{U})\) [\(\mathcal{E}^{\textit{est}}(\mathbf{U})\)], we aim to minimise \begin{equation} \mathrm{C} = \frac{1}{|\mathbb{T}|} \sum_{\mathbf{U}\in \mathbb{T}}|\mathrm{com}^{\textit{ideal}}(\mathbf{U})-\mathrm{com}^{\textit{est}}(\mathbf{U}) |^{2}, \end{equation} (146) over the coefficients \(q(\mathbf{Q})\) with a training set \(\mathbb{T}\). This cost function indicates the distance between the correct expectation values and the error-mitigated expectation values. Similarly to the approach proposed by Czarnik et al.,44) the training set \(\mathbb{T}\) is set to be the set of Clifford operations \(\mathbb{C}\). Therefore, the entire quantum circuit becomes a Clifford circuit and we can efficiently calculate the ideal expectation values. It is not hard to see that the error mitigation obtained through this procedure also works for arbitrary non-Clifford single-qubit gates with gate-independent errors, because any operation can be expressed as a linear combination of Clifford operations. Since the two-qubit frame gates are fixed, there is no assumption for the error model of frame gates.

Note that the space \(S_{\mathbf{Q}}\) of all the recovery operations \(\mathbf{Q}\) may exponentially increase with the number of gates of the quantum circuit. There are two ways to overcome this problem via the truncation method and variational optimisation method.45) For the truncation method, we convert general errors to Pauli errors via Pauli twirling and truncate the space to a subset whose dimension only increases polynomially with the circuit size. Alternatively, we can use the Monte-Carlo method to evaluate the cost function and variationally search the parameters to minimise the cost function. In practice, we can set the starting point of the optimisation to the one specified with the results of imperfect gate set tomography.

Stochastic error mitigation

The previous error mitigation methods mostly focus on the scenario of digital quantum simulation with discrete noisy gates. Now we review the stochastic error mitigation method for mitigating errors in continuous processes, which works for analog quantum simulation and digital quantum computing.46)

Suppose the dynamics of the system of interest is described by the Lindblad master equation as \begin{equation} \frac{d}{dt}\rho = - i [H, \rho]+ \mathcal{L}(\rho), \end{equation} (147) with the ideal evolution Hamiltonian H, noise Lindblad operator \(L_{k}\), and \(\mathcal{L}(\rho) =\sum_{k} (2L_{k}\rho L_{k}^{\dagger} -L_{k}^{\dagger} L_{k}\rho-\rho L_{k}^{\dagger} L_{k})\). Suppose the Lindblad operators are local and their strength is weak. Notice that because the effect of Lindblad operators propagates to the entire system due to time evolution, it is generally hard to suppress such a global effect of physical noise via conventional QEM methods. For simplicity, we also assume that the Hamiltonian is time-independent and we note that the result works for general time-dependent Hamiltonians. Now we express the evolution of the state from t to \(t+\delta t\) as \begin{equation} \rho_{i}(t+\delta t) = \mathcal{E}_{i} (\rho_{i}(t)), \end{equation} (148) with \(i =\mathcal{I},\mathcal{N}\) corresponding to the ideal and noisy process. To emulate the ideal evolution of \(\mathcal{E}_{\mathcal{I}}\) with the noisy evolution \(\mathcal{E}_{\mathcal{N}}\), we can implement the quasi-probability method by applying \begin{equation} \mathcal{E}_{\mathcal{I}} = \mathcal{E}_{\mathcal{Q}} \mathcal{E}_{\mathcal{N}}, \end{equation} (149) with the recovery operation \(\mathcal{E}_{\mathcal{Q}}\) decomposed as \begin{equation} \mathcal{E}_{\mathcal{Q}} = \sum_{i} q_{i} \mathcal{B}_{i} = C_{\mathcal{Q}} \sum_{i} \mathop{\text{sgn}}\nolimits(q_{i}) p_{i} \mathcal{B}_{i}. \end{equation} (150) Here, \(C_{\mathcal{Q}} =\sum_{i} |q_{i}|\), \(p_{i} = |q_{i}|/C_{\mathcal{Q}}\), and \(\mathcal{B}_{i}\) are tensor products of single qubit operators. To simulate the whole ideal evolution of time T, we can continuously apply the recovery \(\mathcal{E}_{\mathcal{Q}}\) with a time interval \(\delta t\) with a cost \(C_{T} = C_{\mathcal{Q}}^{N_{d}}\) where \(N_{d} = T/\delta t\). The schematic figures are shown in Fig. 19.

Figure 19. (Color) Schematic figures for applying quasi-probability operations continuously.46) Note that we can obtain the result corresponding to \(\delta t\rightarrow 0\) by using stochastic error mitigation method.

However, continuously applying the recovery operation could be challenging. Note that each recovery operation for a small \(\delta t\) is almost an identity operation. Therefore, instead of continuously applying every weak recovery operations, we can use the Monte Carlo method to stochastically realise strong recovery operations in a similar vein of stochastic Schrödinger equation. The stochastic error mitigation method can be combined with the extrapolation QEM method to further mitigate the model estimation error.46)

Combination of error mitigation techniques

Here we illustrate ways to combine different error mitigation techniques. More specifically we focus on possible combinations of error extrapolation, quasi-probability, and symmetry verification studied by Cai.47)

Symmetry verification with error extrapolation

Symmetry verification cannot detect errors that stand for transformations within the same symmetry subspace. When acting on the eigenstate of the symmetry, such errors are the errors that commute with the symmetry operator, and we refer to these errors as commuting errors. Similarly we also have anti-commuting errors, and individual occurrence of them would be detectable by symmetry verification. However, when anti-commuting errors occur an even number of times, they will commute with the symmetry and thus cannot be detected. The noisy expectation value of an observable M at the error rate ε after applying symmetry verification will be denoted as \(\langle M\rangle_{s}(\varepsilon)\). As shown in Fig. 20(a), \(\langle M\rangle_{s}(\varepsilon)\) still contains undetectable errors, which can be further mitigated by combining symmetry verification with error extrapolation using \begin{equation} \langle M\rangle_{\textit{est}} = \frac{\alpha \langle M\rangle_{s}(\varepsilon)-\langle M\rangle_{s}(\alpha \varepsilon)}{\alpha -1}. \end{equation} (151) Here we have used linear extrapolation as an example. Its effectiveness has been numerically verified by McArdle et al.37) and its application to Hubbard VQE has been discussed by Cai.181)

Figure 20. (Color) Diagrams showing how different error components are suppressed using different combinations of quantum error mitigation techniques: (a) using only symmetry verification, (b) combining quasi-probability and symmetry verification, and (c) combining quasi-probability, symmetry verification and error extrapolation.

Quasi-probability method with error extrapolation

It is also possible to combine quasi-probability method with error extrapolation.46,47) Rather than using quasi-probability method to completely suppress all the errors, we can use it to reduce the error rate instead without changing the form of the error channel. In this way, we can obtain the noisy expectation values at several reduced error rates, and apply error extrapolation using these data points. Compared to naive error extrapolation, it does not require the ability to adjust the error rate and can achieve lower estimation errors due to lower effective error rates, albeit at a higher sampling cost due to the use of quasi-probability.47) The combination of these two QEM methods has also been discussed for suppressing both physical and model estimation errors of a continuous process.46)

Symmetry verification with quasi-probability method

Besides reducing the effective error rates, quasi-probability can also be used for transforming the form of error channels.47) In particular, when combined with symmetry verification, we would naturally want to use quasi-probability to remove errors that cannot be detected by symmetry verification, i.e., the commuting errors. Note that the additional gates we apply for quasi-probability might anti-commute with the symmetry and in that case we need to flip our target symmetry outcome accordingly. Additional quasi-probability can be applied to suppress the noise level of the anti-commuting errors. For the remaining anti-commuting errors, the erroneous circuit runs with an odd number of them can be detected using symmetry verification, while the erroneous circuit runs with an even number of them will still remain since they cannot be detected using symmetry verification. This is illustrated in Fig. 20(b).

Letting μ to be the expected number of errors in each circuit run after applying quasi-probability, then using Eq. (107) the expectation value of the observable can be described as \begin{equation} \langle M\rangle(\mu) = e^{-\mu} \sum_{k = 0}^{\infty} \frac{\mu^{k}}{k!} \langle M\rangle_{k}, \end{equation} (152) where \(\langle M\rangle_{k}\) is the expectation value of the observable given k errors occurring in the circuit run.

For the exponential extrapolation that we discussed in Sect. 5.1.2, in which the observable follows a exponential decay curve: \begin{equation} \langle M\rangle(\mu) = \langle M\rangle(0) e^{- \Gamma_{d} \mu}, \end{equation} (153) we are essentially assuming \begin{equation} \langle M\rangle_{k} = \langle M\rangle(0) (1- \Gamma_{d})^{k}, \end{equation} (154) with \(0\leq\Gamma_{d}\leq 1\). Such an assumption is further justified through the analytical arguments by Cai.47)

As mentioned, after we apply symmetry verification, only the cases with an even number of errors will remain, giving a resultant expectation value of: \begin{equation} \begin{split} \langle M\rangle_{e} &= \frac{1}{\cosh\mu}\sum_{\text{even $k$}} \frac{\mu^{k}}{k!} \langle M\rangle_{k}\\ &= \frac{\cosh((1-\Gamma_{d})\mu)}{\cosh(\mu)} \langle M\rangle(0) \end{split} \end{equation} (155) where we note that the normalising constant for the error number probability distribution has changed from \(e^{-\mu}\) to \(\frac{1}{\cosh\mu}\). When compared to noisy expectation value \(\langle M\rangle(\mu)\) in Eq. (153), we can see that the symmetry-verified expectation value \(\langle M\rangle_{e}\) is always closer to the noise-free value \(\langle M\rangle(0)\) and will only approach \(\langle M\rangle(\mu)\) when μ is large.

Combining quasi-probability, symmetry verification and error extrapolation

The remaining errors in the last section after applying quasi-probability and symmetry verification can be further suppressed by using hyperbolic extrapolation.47) We saw that the circuit runs with the right symmetry are those with an even number of commuting errors and their expectation value is of the form \(\langle M\rangle_{e}\) in Eq. (155). Similarly, the circuit runs with the wrong symmetry are those with an odd number of commuting errors and their expectation value is \(\langle M\rangle_{o}\) of the form: \begin{equation} \langle M\rangle_{o} = \frac{\sinh((1-\Gamma_{d})\mu)}{\sinh(\mu)} \langle M\rangle(0). \end{equation} (156) Along with \(\langle M\rangle_{e}\) in Eq. (155), we can obtain the noiseless expectation value using \begin{equation} \langle M\rangle(0) = \mathop{\text{sgn}}\nolimits(\langle M\rangle_{e}) \sqrt{\langle M\rangle_{e}^{2} \cosh^{2}(\mu)- \langle M\rangle_{o}^{2} \sinh^{2}(\mu)}, \end{equation} (157) which is called hyperbolic extrapolation. This is illustrated in Fig. 20(c).

Hence, by using the expectation values obtained from the two different symmetry outcomes, we can estimate the noise-free expectation value using hyperbolic extrapolation, without needing to probe at multiple error rates like in the conventional error extrapolation. This combined method can achieve a much better balance between the sampling cost and the estimation errors by utilising the strengths of different error mitigation techniques to fight different parts of the errors. Its effectiveness has been numerically demonstrated using 8-qubit Fermi-Hubbard model simulation under Pauli errors.47)

6. Conclusion

In this review article, we illustrated two types of hybrid quantum-classical algorithms and different quantum error mitigation methods. The variational algorithms only employ shallow depth quantum circuits and are hence tailored for noisy intermediate-scale quantum (NISQ) devices. We have classified variational algorithms into variational quantum optimisation and variational quantum simulation. Variational quantum optimisation algorithms aim to optimise a cost function tailored to a specific problem, and have wide applications for Hamiltonian spectra, machine-learning, liner algebra, etc. On the other hand, variational quantum simulation algorithms are used for simulating dynamics of quantum systems, and have applications in open quantum system simulation, linear algebra, Gibbs state preparation, etc. Meanwhile, quantum error mitigation aims to suppress errors in NISQ devices so that variational quantum algorithms can be implemented to achieve a desired calculation accuracy. Since quantum error mitigation does not generally use encoding but rely on classical post-processing, they are applicable to NISQ devices with restricted number of qubits. We have reviewed several quantum error mitigation methods and effective combination of them. Since quantum computing with NISQ devices is yet a relatively young field, whether and how these variational algorithms and error mitigation methods can be applied to solve any practically meaningful problem is still under active research investigation. This article only aims to review the most basic results in NISQ computing, and we hope it will serve as a useful reference for future researches along this direction.


We are grateful for useful discussions with Yuuki Tokunaga, Yasunari Suzuki, Tyson Jones, Bálint Koczor, Sam McArdle, Armands Strikis, Jinzhao Sun, Nobuyuki Yoshioka, Kosuke Mitarai, Yuya Nakagawa, Yuichiro Matsuzkai, Hideaki Hakoshima, Shunsuke Kamimura, and Akira Sone. S.E. thanks Tomoyuki Uemiya and Atsushi Furukawa for insightful discussions. X.Y. acknowledges support from the Simons Foundation. Z.C. acknowledges support from St John's College, Oxford. This work is supported by MEXT Q-LEAP (Grant Nos. JPMXS0120319794 and JPMXS0118068682), JST ERATO (Grant No. JPMJER1601), and EPSRC Hub EP/T001062/1.

Appendix A:
Derivation of Eq. (11) for Variational Quantum Simulation

Applying McLachlan's variational principle,77) we can map the real time evolution of \(|\psi(t)\rangle\) to the evolution of the parameters \(\vec{\theta}(t)\). This is done by minimising the distance between the ideal evolution and the evolution of the ansatz state \begin{equation} \delta \|(\partial/\partial t + iH)|\varphi (\vec{\theta}(t))\rangle \| = 0. \end{equation} (A·1) Here we denote \(\|{|}\varphi\rangle \| =\langle\varphi|\varphi\rangle\) and we have \begin{equation} \begin{split} &\|(\partial/\partial t + iH)|\varphi (\vec{\theta}(t))\rangle \|\\ &\quad = ((\partial/\partial t +iH)|\phi(\vec{\theta}(t))\rangle)^{\dagger} (\partial/\partial t +iH)|\phi(\vec{\theta}(t))\rangle)\\ &\quad = \sum_{k,j} \frac{\partial \langle \varphi (\vec{\theta}(t))|}{\partial \theta_{k}} \frac{\partial |\varphi (\vec{\theta}(t))\rangle}{\partial \theta_{j}} \dot{\theta}_{k}^{*} \dot{\theta}_{j}\\ &\qquad + i\sum_{k} \frac{\partial \langle \varphi (\vec{\theta}(t))|}{\partial \theta_{k}} H |\varphi (\vec{\theta}(t))\rangle\dot{\theta}_{k}^{*}\\ &\qquad - i\sum_{k} \langle \varphi (\vec{\theta}(t))| H \frac{\partial |\varphi (\vec{\theta}(t))\rangle}{\partial \theta_{k}} \dot{\theta}_{k}\\ &\qquad + \langle \varphi (\vec{\theta}(t))|H^{2}|\varphi (\vec{\theta}(t))\rangle. \end{split} \end{equation} (A·2) Suppose \(\dot{\theta}_{k}\) is real and we have \begin{align} &\delta \|(\partial/\partial t + iH)|\varphi (\vec{\theta}(t))\rangle \|\\ &\quad = \sum_{k} \Biggl(\sum_{j}\biggl(\frac{\partial \langle \varphi (\vec{\theta}(t))|}{\partial \theta_{k}} \frac{\partial |\varphi (\vec{\theta}(t))\rangle}{\partial \theta_{j}}\\ &\qquad + \frac{\partial \langle \varphi (\vec{\theta}(t))|}{\partial \theta_{j}} \frac{\partial |\varphi (\vec{\theta}(t))\rangle}{\partial \theta_{k}} \biggr) \dot{\theta}_{j} \delta \dot{\theta}_{k}\\ &\qquad + i\biggl(\frac{\partial \langle \varphi (\vec{\theta}(t))|}{\partial \theta_{k}}H|\varphi (\vec{\theta}(t))\rangle\\ &\qquad - \langle \varphi (\vec{\theta}(t))| H \frac{\partial |\varphi (\vec{\theta}(t))\rangle}{\partial \theta_{k}} \biggr)\delta\dot{\theta}_{k}\Biggr) = 0, \end{align} (A·3) which results in \begin{equation} \sum_{j} M_{k,j}\dot{\theta}_{j} = V_{k}, \end{equation} (A·4) where \begin{equation} \begin{split} M_{k,j} &= \mathop{\text{Re}}\nolimits \left(\frac{\partial \langle\varphi(\vec{\theta}(t))|}{\partial \theta_{k}}\frac{\partial |\varphi(\vec{\theta}(t))\rangle}{\partial \theta_{j}}\right),\\ V_{k} &= −\mathop{\text{Im}}\nolimits\left(\langle\varphi(\vec{\theta}(t))|H \frac{\partial |\varphi(\vec{\theta}(t))\rangle}{\partial \theta_{k}} \right). \end{split} \end{equation} (A·5)

Appendix B:
Hadamard Test and Quantum Circuits for Variational Quantum Simulation

The Hadamard test circuit is for calculating the expectation value of a unitary operator U for a given state \(|\psi_{\textit{in}}\rangle\) as \(\langle\psi_{\textit{in}}| U |\psi_{\textit{in}}\rangle\). The quantum circuit for Hadamard test is shown in Fig. B·1. Suppose the initial state is prepared in \(\frac{1}{\sqrt{2}} (|0\rangle_{a}+e^{i\phi} |1\rangle_{a})\otimes |\psi_{\textit{in}}\rangle_{s}\), where a and s denote the ancilla and the system, respectively. By applying the controlled unitary operation \(|0\rangle\langle 0|_{a}\otimes I_{s} + |1\rangle\langle 1|_{a}\otimes U_{s}\), we obtain the state as follows \begin{equation} |\psi_{\textit{Had}}\rangle = \frac{1}{\sqrt{2}} (|0\rangle_{a} \otimes |\psi_{\textit{in}}\rangle_{s} + e^{i \phi} |1\rangle_{a} \otimes U_{s}|\psi_{\textit{in}}\rangle_{s}). \end{equation} (B·1) Then measuring the expectation value of the Pauli operator \(X_{a}\) for the ancilla qubit, we have \begin{equation} \langle \psi_{\textit{Had}}| X_{a} \otimes I_{s} |\psi_{\textit{Had}}\rangle = \mathop{\text{Re}}\nolimits(e^{i \phi} \langle \psi_{\textit{in}}|U |\psi_{\textit{in}}\rangle). \end{equation} (B·2) The measurement of the Pauli X operator can be implemented by applying Hadamard gate and subsequently measuring the state in the computational basis. From Eq. (B·2), we can compute the real and imaginary part of \(\langle\psi_{\textit{in}}| U |\psi_{\textit{in}}\rangle\) by changing the phase ϕ. When the input state is described by a mixed state \(\rho_{\textit{in}}\), following the same procedure, we can compute the expectation value \(\mathop{\text{Tr}}\nolimits[\rho_{\textit{in}} U]\).

Figure B·1. Quantum circuit for computing \(\mathop{\text{Re}}\nolimits(e^{i\phi}\langle \psi_{\textit{in}}|U |\psi_{\textit{in}}\rangle)\).

We can also compute \(\langle\psi_{\textit{in}}|V^{\dagger} U |\psi_{\textit{in}}\rangle\) for two unitary operators U and V by preparing the state \begin{equation} |\psi_{\textit{Had}}'\rangle = \frac{1}{\sqrt{2}} (|0\rangle_{a} \otimes V_{s} |\psi_{\textit{in}}\rangle_{s} + e^{i \phi} |1\rangle_{a} \otimes U_{s}|\psi_{\textit{in}}\rangle_{s}) \end{equation} (B·3) and measuring the Pauli X operator of the ancilla qubit. The quantum circuit for this task is shown in Fig. B·2.

Figure B·2. Quantum circuit for computing \(\mathop{\text{Re}}\nolimits(e^{i\phi}\langle\psi_{\textit{in}}|V^{\dagger} U |\psi_{\textit{in}}\rangle)\).

Now, we will explain that coefficients in variational quantum simulation algorithms can be evaluated based on the Hadamard test circuit. Here, we consider the M matrix but almost the same argument holds for the V and C vectors. Note that each term constituting the M matrix in the variational simulation algorithms can be represented as a sum of terms as \begin{equation} a \mathop{\text{Re}}\nolimits (e^{i\theta} \langle\varphi_{\textit{ref}}| U^{\dagger}_{k,i}U_{j,q} |\varphi_{\textit{ref}}\rangle), \end{equation} (B·4) where \(a,\theta \in\mathbb{R}\) and \begin{equation} U_{k,i} = U_{N} U_{N-1} \cdots U_{k+1} U_{k} \sigma_{k,i} \cdots U_{2} U_{1}. \end{equation} (B·5) Notice that we can compute Eq. (B·4) by setting \(V = U_{k,i}\), \(U = U_{j,q}\), and \(|\psi_{\textit{in}}\rangle = |\varphi_{\textit{ref}}\rangle\). Now, as V and U share a large portion of quantum gates, the controlled operation need to be applied only to \(\sigma_{k,i}\) and \(\sigma_{j,q}\), which will lead to the quantum circuit shown in Fig. 3(a).

Appendix C:
SWAP Test and Destructive SWAP Test

The SWAP test and Destructive SWAP test circuits are for evaluating the overlap \(\mathop{\text{Tr}}\nolimits[\rho\sigma]\) for two states ρ and σ. Let \(\mathit{SW}\) be the SWAP operation that satisfies \(\mathit{SW}|\psi\rangle\otimes |\phi\rangle = |\phi\rangle\otimes |\psi\rangle\). Note that the expectation value of SWAP operation gives the overlap term as follows \begin{align} \mathop{\text{Tr}}\nolimits[\mathit{SW}\rho \otimes \sigma] &= \mathop{\text{Tr}}\nolimits\left[\mathit{SW}\sum_{i} p_{i} |\psi_{i}\rangle\langle\psi_{i}| \sum_{j} q_{j} |\phi_{j}\rangle\langle \phi_{j}|\right]\\ &= \sum_{i j} p_{i} q_{j} \mathop{\text{Tr}}\nolimits[|\phi_{j}\rangle\langle\psi_{i}| |\psi_{i}\rangle\langle \phi_{j}|]\\ &= \sum_{i j} p_{i} q_{j} |\langle \psi_{i}| \phi_{j}\rangle |^{2} = \mathop{\text{Tr}}\nolimits[\rho \sigma], \end{align} (C·1) where \(\rho =\sum_{i} p_{i} |\psi_{i}\rangle\langle\psi_{i}|\), and \(\sigma =\sum_{j} q_{j} |\phi_{j}\rangle\langle\phi_{j}|\).

The SWAP test circuit is the quantum circuit where a unitary operator U is replaced with \(\mathit{SW}\) in Hadamard test circuit, and the phase of the ancilla is set to \(\phi = 0\).188) The quantum circuit is shown in Fig. C·1. Swap test circuit needs a relatively deep quantum circuit, e.g., the number of gates scales as \(46 N_{q} +2\) when the decomposition used in Ref. 17 is employed, where \(N_{q}\) is the number of qubits of the states ρ and σ.

Figure C·1. SWAP test circuit for computing the overlap of two states ρ and σ. The number of required qubits is \(2N_{q}+1\).

Meanwhile, Destructive SWAP test circuit requires a much shallower quantum circuit. \(\mathit{SW}\) can be rewritten by using Bell basis as \begin{equation} \begin{split} \mathit{SW}&= \prod_{j = 1}^{N_{q}} \mathit{SW}_{j},\\ \mathit{SW}_{j} &= (|B_{1}\rangle_{j} \langle B_{1}|_{j}+|B_{2}\rangle_{j} \langle B_{2}|_{j}\\ &\quad + |B_{3}\rangle_{j}\langle B_{3}|_{j}-|B_{4}\rangle_{j} \langle B_{4}|_{j}), \end{split} \end{equation} (C·2) where \begin{equation} \begin{split} |B_{1}\rangle_{j} &= \frac{1}{\sqrt{2}}(|0\rangle_{j}^{(\rho)} |0\rangle_{j}^{(\sigma)} +|1\rangle_{j}^{(\rho)} |1_{j}\rangle^{(\sigma)}),\\ |B_{2}\rangle_{j} &= \frac{1}{\sqrt{2}}(|0\rangle_{j}^{(\rho)} |0\rangle_{j}^{(\sigma)} -|1\rangle_{j}^{(\rho)} |1\rangle_{j}^{(\sigma)}),\\ |B_{3}\rangle_{j} &= \frac{1}{\sqrt{2}}(|0\rangle_{j}^{(\rho)} |1\rangle_{j}^{(\sigma)} +|1\rangle_{j}^{(\rho)} |0\rangle_{j}^{(\sigma)}),\\ |B_{4}\rangle_{j} &= \frac{1}{\sqrt{2}}(|0\rangle_{j}^{(\rho)} |1\rangle_{j}^{(\sigma)} -|1\rangle_{j}^{(\rho)} |0\rangle_{j}^{(\sigma)}). \end{split} \end{equation} (C·3) Here, \(|0\rangle_{j}^{(\rho)}\) (\(|0\rangle_{j}^{(\sigma)}\)) denotes the jth qubit of the state \(\rho(\sigma)\). Notice that \(\mathit{SW}_{j}\) can be represented as \begin{equation} \begin{split} \mathit{SW}_{j} &= \mathit{CN}_{j} H_{j}^{(\rho)} C_{j} H_{j}^{(\rho)} \mathit{CN}_{j},\\ C_{j} &= |0\rangle\langle 0|_{j}^{(\rho)}\otimes |0\rangle\langle 0|_{j}^{(\sigma)}+ |1\rangle\langle 1|_{j}^{(\rho)}\otimes |0\rangle\langle 0|_{j}^{(\sigma)}\\ &\quad + |0\rangle\langle 0|_{j}^{(\rho)}\otimes |1\rangle\langle 1|_{j}^{(\sigma)}-|1\rangle\langle 1|_{j}^{(\rho)}\otimes |1\rangle\langle 1|_{j}^{(\sigma)},\\ \mathit{CN}_{j} &= |0\rangle\langle 0|_{j}^{(\rho)} \otimes I_{j}^{(\sigma)} + |1\rangle\langle 1|_{j}^{(\rho)} \otimes X_{j}^{(\sigma)}, \end{split} \end{equation} (C·4) where \(H_{j}^{(\rho)}\) is a Hadamard gate acting on the jth qubit of the state ρ. Thus, the measurement of \(\mathit{SW}_{j}\) can be performed by using the quantum circuit shown in Fig. C·2. Due to the definition of \(C_{j}\), the measurement result becomes +1 when we obtain 00, 01, and 10, while it becomes −1 corresponding to 11. Then the measurement result of \(\mathit{SW}\) is the product of each measurement result of \(\mathit{SW}_{j}\). Note that Destructive Swap test circuits only need \(N_{q}\) controlled NOT gates and Hadamard gates, and each gate can be applied parallelly.

Figure C·2. Quantum circuit for computing the expectation value of \(\mathit{SW}_{j}\). \(\rho_{j}\) and \(\sigma_{j}\) are the jth qubit state of ρ and σ. When 00, 01, and 10 appear, the measurement result is +1, while it is −1 for 11. Unlike SWAP test, Destructive SWAP test does not require an ancilla qubit and its circuit depth is much shallower.

Appendix D:
Methodologies for Optimisation
Local cost function

Given an \(N_{q}\)-qubit system, a local cost function is constructed by the measurement on (\(m<N_{q}\)) qubits at a time, while a global cost function requires the measurement on \(N_{q}\) qubit simultaneously. Taking into account the hardware efficient ansatz with shallow quantum circuit depth [i.e., \(L = O(\log N_{q})\)] and \(m = O(\log N_{q})\), Cerezo et al.63) rigorously showed that a local cost function can be used for avoiding the barren plateau issue and demonstrating the trainability of the target quantum circuit.

Here we show a simple example given by Cerezo et al.63) Suppose we try to approximate a known target state \(|\psi_{T}\rangle\) by an ansatz state as \(|\psi_{T}\rangle\approx U(\vec{\theta})|0\rangle^{\otimes N_{q}}\). Let us consider the trivial case \(|\psi_{T}\rangle = |0\rangle^{\otimes N_{q}}\). Then the natural choice of the cost function would be \begin{equation} C_{G} (\vec{\theta}) = 1-|\langle \psi_{T}| U(\vec{\theta}) |0\rangle^{\otimes N_{q}}|^{2}, \end{equation} (D·1) which clearly involves measurement of all the qubits. Thus \(C_{G} (\vec{\theta})\) is a global cost function. On the other hand, we define the local cost function as \begin{equation} \begin{split} C_{L}(\vec{\theta}) &= \langle\varphi(\vec{\theta})| H^{(L)} |\varphi(\vec{\theta})\rangle,\\ H^{(L)} &= I- \frac{1}{N_{q}} \sum_{k = 1}^{N_{q}} |0_{k}\rangle\langle 0_{k}| \otimes I_{\bar{k}}, \end{split} \end{equation} (D·2) where \(I_{\bar{k}}\) is an identity operator acting on all the qubits except for the kth qubit. Note that each qubit is individually measured to have \(C_{L}(\vec{\theta})\), and \(C_{G}(\vec{\theta})\) and \(C_{L}(\vec{\theta})\) vanish under the same condition.

Now, we assume \begin{equation} U(\vec{\theta}) = \bigotimes_{k = 1}^{N_{q}} e^{-i \theta_{k} X_{k}/2} \end{equation} (D·3) for a concise explanation. Then we can obtain \begin{equation} \begin{split} C_{G}(\vec{\theta}) &= 1- \prod_{k = 1}^{N_{q}} \cos^{2}\left(\frac{\theta_{k}}{2} \right),\\ C_{L}(\vec{\theta}) &= 1- \frac{1}{N_{q}} \sum_{k = 1}^{N_{q}} \cos^{2}\left(\frac{\theta_{k}}{2} \right). \end{split} \end{equation} (D·4) From these representations, we can intuitively understand that while the gradient of \(C_{G}\) decreases exponentially in the number of qubits, it does not hold for \(C_{L}\). More concretely, we can show \begin{equation} \begin{split} \left\langle \frac{\partial C_{G}(\vec{\theta})}{\partial \theta_{k}} \right\rangle &= 0,\\ \mathrm{Var}\left[\frac{\partial C_{G}(\vec{\theta})}{\partial \theta_{k}} \right] &= \frac{1}{8}\left(\frac{3}{8} \right)^{N_{q} -1}, \end{split} \end{equation} (D·5) where the expectation value and the variance are taken for randomly sampled parameters \(\theta_{k}\in [-\pi,\pi]\). Thus we can see the gradient of \(C_{G}(\vec{\theta})\) vanishes exponentially in the number of qubits. Meanwhile, \begin{equation} \begin{split} \left\langle \frac{\partial C_{L}(\vec{\theta})}{\partial \theta_{k}} \right\rangle &= 0,\\ \mathrm{Var}\left[\frac{\partial C_{L}(\vec{\theta})}{\partial \theta_{k}} \right] &= \frac{1}{8 N_{q}^{2}}, \end{split} \end{equation} (D·6) thus the barren plateau problem does not happen for \(C_{L}(\vec{\theta})\).101)

Application of the above example is to design local cost functions for linear algebra problems. The local cost function corresponding to Eq. (43) is defined via the Hamiltonian \begin{equation} H_{\mathcal{M}^{-1}}^{(L)} = \mathcal{M}^{\dagger} U_{v_{0}} \left(I - \frac{1}{N_{q}} \sum_{k = 1}^{N_{q}} |0_{k}\rangle\langle 0_{k}| \otimes I_{\bar{k}} \right) U_{v_{0}}^{\dagger} \mathcal{M}, \end{equation} (D·7) where \(U_{v_{0}} |0\rangle^{\otimes N_{q}} = |v_{0}\rangle\). In this case, we can show \(C_{L}\leq C_{G}\leq N_{q} C_{L}\). Also, letting \(C_{L}' = C_{L}/\langle v_{0}| v_{0}\rangle\) and \(C_{G}' = C_{G}/\langle v_{0}| v_{0}\rangle\) to deal with the case where the cost functions are small because of the norm of \(|v_{0}\rangle\), we have \(C_{L}'\leq C_{G}'\leq N_{q} C_{L}'\). Therefore \(C_{L} = 0\) (\(C_{L}' = 0\)) is equivalent to \(C_{G} = 0\) (\(C_{G}' = 0\)). By leveraging local cost functions, it was shown that the barren plateau problem was alleviated using up to 50-qubit system.101)

Hamiltonian morphing optimisation

When naively employing conventional VQE method for parameter updates, the trial state may be trapped in local minima, and those algorithms must be repeated with random initial parameters until the energy become sufficiently close to zero. Alternatively, Hamiltonian morphing optimisation can be used to circumvent this problem.143) This method is analogous to adiabatic state preparation, which uses the fact that when the Hamiltonian is adiabatically changed from \(H_{i}\) to \(H_{f}\), the state is always the ground state of the corresponding Hamiltonian. Taking the linear equation as an example, when Hamiltonian morphing optimisation is applied, we start from \(H_{\mathcal{M}^{-1}}(0) = I-|v_{0}\rangle\langle v_{0}|\) with the corresponding ground state \(|v_{0}\rangle\). Then, by setting the Hamiltonian and the initial state to \(H_{\mathcal{M}^{-1}}(\delta t)\) and \(|v_{0}\rangle\), where \(H_{\mathcal{M}^{-1}}(t) =\mathcal{M}^{\dagger} (t) (I-|v_{0}\rangle\langle v_{0}|)\mathcal{M} (t)\) and \(\mathcal{M}(t) = (1-t/T)I +\frac{t}{T}\mathcal{M}\), we can obtain the ground state of \(H_{\mathcal{M}^{-1}}(\delta t)\). Through repeating this procedure until \(t = T\), we can obtain the ground state of \(H_{\mathcal{M}^{-1}}(T) = H_{\mathcal{M}^{-1}}\), i.e., \(|v_{\mathcal{M}^{-1}}\rangle\). When the ansatz is sufficiently powerful and the time step \(\delta t\) is small enough, this method necessarily can find the global minimum.

Appendix E:
Subspace Expansion

Here, we show the proof of Eq. (50). The expectation value of energy in the subspace is \begin{equation} E(\vec{c}, \vec{c}^{*}) = \sum_{\alpha \beta} c_{\alpha}^{*} \tilde{H}_{\alpha \beta} c_{\beta}, \end{equation} (E·1) and there is a constraint \begin{equation} 1 = \langle \psi_{\textit{eig}}(\vec{c})|\psi_{\textit{eig}}(\vec{c})\rangle = \sum_{\alpha \beta} c_{\alpha}^{*} \tilde{S}_{\alpha \beta} c_{\beta}. \end{equation} (E·2) When \(|\psi_{\textit{eig}}(\vec{c})\rangle\) is an eigenstate, corresponding to a local minimum, the following equation holds \begin{equation} \delta [E(\vec{c}, \vec{c}^{*}) - E \langle \psi_{\textit{eig}}(\vec{c})|\psi_{\textit{eig}}(\vec{c})\rangle] = 0, \end{equation} (E·3) where E is the Lagrangian multiplier. Thus we have \begin{equation} \sum_{\alpha \beta} [\delta c_{\alpha}^{*}(\tilde{H}_{\alpha \beta} c_{\beta} -E \tilde{S}_{\alpha \beta} c_{\beta})+ (c_{\alpha}^{*} \tilde{H}_{\alpha \beta} -E c_{\alpha}^{*} \tilde{S}_{\alpha \beta})\delta c_{\beta}] = 0, \end{equation} (E·4) which necessarily holds when \begin{equation} \tilde{H} \vec{c} = E \tilde{S} \vec{c}. \end{equation} (E·5)


  • 1 J. Preskill, arXiv:1203.5813. Google Scholar
  • 2 F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, B. Burkett, Y. Chen, Z. Chen, B. Chiaro, R. Collins, W. Courtney, A. Dunsworth, E. Farhi, B. Foxen, A. Fowler, C. Gidney, M. Giustina, R. Graff, K. Guerin, S. Habegger, M. P. Harrigan, M. J. Hartmann, A. Ho, M. Hoffmann, T. Huang, T. S. Humble, S. V. Isakov, E. Jeffrey, Z. Jiang, D. Kafri, K. Kechedzhi, J. Kelly, P. V. Klimov, S. Knysh, A. Korotkov, F. Kostritsa, D. Landhuis, M. Lindmark, E. Lucero, D. Lyakh, S. Mandrà, J. R. McClean, M. McEwen, A. Megrant, X. Mi, K. Michielsen, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, C. Neill, M. Y. Niu, E. Ostby, A. Petukhov, J. C. Platt, C. Quintana, E. G. Rieffel, P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger, V. Smelyanskiy, K. J. Sung, M. D. Trevithick, A. Vainsencher, B. Villalonga, T. White, Z. J. Yao, P. Yeh, A. Zalcman, H. Neven, and J. M. Martinis, Nature 574, 505 (2019). 10.1038/s41586-019-1666-5 CrossrefGoogle Scholar
  • 3 C. Huang, F. Zhang, M. Newman, J. Cai, X. Gao, Z. Tian, J. Wu, H. Xu, H. Yu, B. Yuan, M. Szegedy, Y. Shi, and J. Chen, arXiv:2005.06787. Google Scholar
  • 4 P. W. Shor, SIAM J. Comput. 26, 1484 (1997). 10.1137/S0097539795293172 CrossrefGoogle Scholar
  • 5 A. Y. Kitaev, arXiv:quant-ph/9511026. Google Scholar
  • 6 S. Lloyd, Science 273, 1073 (1996). 10.1126/science.273.5278.1073 CrossrefGoogle Scholar
  • 7 J. O’Gorman and E. T. Campbell, Phys. Rev. A 95, 032338 (2017). 10.1103/PhysRevA.95.032338 CrossrefGoogle Scholar
  • 8 J. Preskill, Quantum 2, 79 (2018). 10.22331/q-2018-08-06-79 CrossrefGoogle Scholar
  • 9 A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’brien, Nat. Commun. 5, 4213 (2014). 10.1038/ncomms5213 CrossrefGoogle Scholar
  • 10 A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature 549, 242 (2017). 10.1038/nature23879 CrossrefGoogle Scholar
  • 11 N. Moll, P. Barkoutsos, L. S. Bishop, J. M. Chow, A. Cross, D. J. Egger, S. Filipp, A. Fuhrer, J. M. Gambetta, M. Ganzhorn, A. Kandala, A. Mezzacapo, P. Müller, W. Riess, G. Salis, J. Smolin, I. Tavernelli, and K. Temme, Quantum Sci. Technol. 3, 030503 (2018). 10.1088/2058-9565/aab822 CrossrefGoogle Scholar
  • 12 J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New J. Phys. 18, 023023 (2016). 10.1088/1367-2630/18/2/023023 CrossrefGoogle Scholar
  • 13 E. Farhi, J. Goldstone, and S. Gutmann, arXiv:1411.4028. Google Scholar
  • 14 Y. Li and S. C. Benjamin, Phys. Rev. X 7, 021050 (2017). 10.1103/PhysRevX.7.021050 CrossrefGoogle Scholar
  • 15 G. Torlai, G. Mazzola, G. Carleo, and A. Mezzacapo, Phys. Rev. Res. 2, 022060 (2020). 10.1103/PhysRevResearch.2.022060 CrossrefGoogle Scholar
  • 16 K. Temme, S. Bravyi, and J. M. Gambetta, Phys. Rev. Lett. 119, 180509 (2017). 10.1103/PhysRevLett.119.180509 CrossrefGoogle Scholar
  • 17 S. Endo, S. C. Benjamin, and Y. Li, Phys. Rev. X 8, 031027 (2018). 10.1103/PhysRevX.8.031027 CrossrefGoogle Scholar
  • 18 S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, npj Quantum Inf. 5, 75 (2019). 10.1038/s41534-019-0187-2 CrossrefGoogle Scholar
  • 19 X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin, Quantum 3, 191 (2019). 10.22331/q-2019-10-07-191 CrossrefGoogle Scholar
  • 20 K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Phys. Rev. A 98, 032309 (2018). 10.1103/PhysRevA.98.032309 CrossrefGoogle Scholar
  • 21 M. Benedetti, D. Garcia-Pintos, O. Perdomo, V. Leyton-Ortega, Y. Nam, and A. Perdomo-Ortiz, npj Quantum Inf. 5, 45 (2019). 10.1038/s41534-019-0157-8 CrossrefGoogle Scholar
  • 22 S. Lloyd and C. Weedbrook, Phys. Rev. Lett. 121, 040502 (2018). 10.1103/PhysRevLett.121.040502 CrossrefGoogle Scholar
  • 23 J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum Sci. Technol. 2, 045001 (2017). 10.1088/2058-9565/aa8072 CrossrefGoogle Scholar
  • 24 A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, Quantum 4, 226 (2020). 10.22331/q-2020-02-06-226 CrossrefGoogle Scholar
  • 25 H.-L. Huang, Y. Du, M. Gong, Y. Zhao, Y. Wu, C. Wang, S. Li, F. Liang, J. Lin, Y. Xu et al., arXiv:2010.06201. Google Scholar
  • 26 Y. Du, T. Huang, S. You, M.-H. Hsieh, and D. Tao, arXiv:2010.10217. Google Scholar
  • 27 J. G. Vidal and D. O. Theis, arXiv:1901.11434. Google Scholar
  • 28 S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, arXiv:2001.03622. Google Scholar
  • 29 M. Schuld, R. Sweke, and J. J. Meyer, arXiv:2008.08605. Google Scholar
  • 30 C. Bravo-Prieto, R. LaRose, M. Cerezo, Y. Subasi, L. Cincio, and P. J. Coles, arXiv:1909.05820. Google Scholar
  • 31 X. Xu, J. Sun, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, arXiv:1909.03898. Google Scholar
  • 32 T. Jones, S. Endo, S. McArdle, X. Yuan, and S. C. Benjamin, Phys. Rev. A 99, 062304 (2019). 10.1103/PhysRevA.99.062304 CrossrefGoogle Scholar
  • 33 O. Higgott, D. Wang, and S. Brierley, Quantum 3, 156 (2019). 10.22331/q-2019-07-01-156 CrossrefGoogle Scholar
  • 34 S. Endo, J. Sun, Y. Li, S. C. Benjamin, and X. Yuan, Phys. Rev. Lett. 125, 010501 (2020). 10.1103/PhysRevLett.125.010501 CrossrefGoogle Scholar
  • 35 S. Endo, I. Kurata, and Y. O. Nakagawa, Phys. Rev. Res. 2, 033281 (2020). 10.1103/PhysRevResearch.2.033281 CrossrefGoogle Scholar
  • 36 J. R. McClean, M. E. Kimchi-Schwartz, J. Carter, and W. A. de Jong, Phys. Rev. A 95, 042308 (2017). 10.1103/PhysRevA.95.042308 CrossrefGoogle Scholar
  • 37 S. McArdle, X. Yuan, and S. Benjamin, Phys. Rev. Lett. 122, 180501 (2019). 10.1103/PhysRevLett.122.180501 CrossrefGoogle Scholar
  • 38 X. Bonet-Monroig, R. Sagastizabal, M. Singh, and T. O’Brien, Phys. Rev. A 98, 062339 (2018). 10.1103/PhysRevA.98.062339 CrossrefGoogle Scholar
  • 39 M. Otten and S. K. Gray, npj Quantum Inf. 5, 11 (2019). 10.1038/s41534-019-0125-3 CrossrefGoogle Scholar
  • 40 M. Otten and S. K. Gray, Phys. Rev. A 99, 012338 (2019). 10.1103/PhysRevA.99.012338 CrossrefGoogle Scholar
  • 41 F. B. Maciejewski, Z. Zimborás, and M. Oszmaniec, Quantum 4, 257 (2020). 10.22331/q-2020-04-24-257 CrossrefGoogle Scholar
  • 42 Y. Chen, M. Farahzad, S. Yoo, and T.-C. Wei, Phys. Rev. A 100, 052315 (2019). 10.1103/PhysRevA.100.052315 CrossrefGoogle Scholar
  • 43 H. Kwon and J. Bae, IEEE Trans. Comput. (2020) [DOI: 10.1109/TC.2020.3009664]. Google Scholar
  • 44 P. Czarnik, A. Arrasmith, P. J. Coles, and L. Cincio, arXiv:2005.10189. Google Scholar
  • 45 A. Strikis, D. Qin, Y. Chen, S. C. Benjamin, and Y. Li, arXiv:2005.07601. Google Scholar
  • 46 J. Sun, X. Yuan, T. Tsunoda, V. Vedral, S. C. Bejamin, and S. Endo, arXiv:2001.04891. Google Scholar
  • 47 Z. Cai, arXiv:2007.01265 [quant-ph]. Google Scholar
  • 48 S. McArdle, S. Endo, A. Aspuru-Guzik, S. C. Benjamin, and X. Yuan, Rev. Mod. Phys. 92, 015003 (2020). 10.1103/RevModPhys.92.015003 CrossrefGoogle Scholar
  • 49 Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, S. Sim, L. Veis, and A. Aspuru-Guzik, Chem. Rev. 119, 10856 (2019). 10.1021/acs.chemrev.8b00803 CrossrefGoogle Scholar
  • 50 B. Bauer, S. Bravyi, M. Motta, and G. K. Chan, arXiv:2001.03685. Google Scholar
  • 51 A. Aspuru-Guzik, A. D. Dutoi, P. J. Love, and M. Head-Gordon, Science 309, 1704 (2005). 10.1126/science.1113479 CrossrefGoogle Scholar
  • 52 T. Helgaker, P. Jorgensen, and J. Olsen, Molecular Electronic-Structure Theory (Wiley, New York, 2014). CrossrefGoogle Scholar
  • 53 A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory (Courier Corporation, New York, 2012). Google Scholar
  • 54 J. T. Seeley, M. J. Richard, and P. J. Love, J. Chem. Phys. 137, 224109 (2012). 10.1063/1.4768229 CrossrefGoogle Scholar
  • 55 S. B. Bravyi and A. Y. Kitaev, Ann. Phys. 298, 210 (2002). 10.1006/aphy.2002.6254 CrossrefGoogle Scholar
  • 56 S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, arXiv:2007.14384. Google Scholar
  • 57 T. G. Kolda, R. M. Lewis, and V. Torczon, SIAM Rev. 45, 385 (2003). 10.1137/S003614450242889 CrossrefGoogle Scholar
  • 58 M.-H. Yung, J. Casanova, A. Mezzacapo, J. Mcclean, L. Lamata, A. Aspuru-Guzik, and E. Solano, Sci. Rep. 4, 3589 (2014). 10.1038/srep03589 CrossrefGoogle Scholar
  • 59 J. Romero, R. Babbush, J. R. McClean, C. Hempel, P. J. Love, and A. Aspuru-Guzik, Quantum Sci. Technol. 4, 014008 (2018). 10.1088/2058-9565/aad3e4 CrossrefGoogle Scholar
  • 60 J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nat. Commun. 9, 4812 (2018). 10.1038/s41467-018-07090-4 CrossrefGoogle Scholar
  • 61 R. LaRose, A. Tikku, É. O’Neel-Judy, L. Cincio, and P. J. Coles, npj Quantum Inf. 5, 57 (2019). 10.1038/s41534-019-0167-6 CrossrefGoogle Scholar
  • 62 M. Cerezo, K. Sharma, A. Arrasmith, and P. J. Coles, arXiv:2004.01372. Google Scholar
  • 63 M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv:2001.00550. Google Scholar
  • 64 E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum 3, 214 (2019). 10.22331/q-2019-12-09-214 CrossrefGoogle Scholar
  • 65 J. I. Colless, V. V. Ramasesh, D. Dahlen, M. S. Blok, M. E. Kimchi-Schwartz, J. R. McClean, J. Carter, W. A. de Jong, and I. Siddiqi, Phys. Rev. X 8, 011021 (2018). 10.1103/PhysRevX.8.011021 CrossrefGoogle Scholar
  • 66 M. Ganzhorn, D. J. Egger, P. Barkoutsos, P. Ollitrault, G. Salis, N. Moll, M. Roth, A. Fuhrer, P. Mueller, S. Woerner, I. Tavernelli, and S. Filipp, Phys. Rev. Appl. 11, 044092 (2019). 10.1103/PhysRevApplied.11.044092 CrossrefGoogle Scholar
  • 67 C. Hempel, C. Maier, J. Romero, J. McClean, T. Monz, H. Shen, P. Jurcevic, B. P. Lanyon, P. Love, R. Babbush, A. Aspuru-Guzik, R. Blatt, and C. F. Roos, Phys. Rev. X 8, 031022 (2018). 10.1103/PhysRevX.8.031022 CrossrefGoogle Scholar
  • 68 P. J. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Jeffrey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, C. Neill, C. Quintana, D. Sank, A. Vainsencher, J. Wenner, T. C. White, P. V. Coveney, P. J. Love, H. Neven, A. Aspuru-Guzik, and J. M. Martinis, Phys. Rev. X 6, 031007 (2016). 10.1103/PhysRevX.6.031007 CrossrefGoogle Scholar
  • 69 Y. Shen, X. Zhang, S. Zhang, J.-N. Zhang, M.-H. Yung, and K. Kim, Phys. Rev. A 95, 020501 (2017). 10.1103/PhysRevA.95.020501 CrossrefGoogle Scholar
  • 70 C. Kokail, C. Maier, R. van Bijnen, T. Brydges, M. K. Joshi, P. Jurcevic, C. A. Muschik, P. Silvi, R. Blatt, C. F. Roos, and P. Zoller, Nature 569, 355 (2019). 10.1038/s41586-019-1177-4 CrossrefGoogle Scholar
  • 71 F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, S. Boixo, M. Broughton, B. B. Buckley, B. Buell, D. A. Burkett, N. Bushnell, Y. Chen, Z. Chen, B. Chiaro, R. Collins, W. Courtney, S. Demura, A. Dunsworth, E. Farhi, A. Fowler, B. Foxen, C. Gidney, M. Giustina, R. Graff, S. Habegger, M. P. Harrigan, A. Ho, S. Hong, T. Huang, W. J. Huggins, L. Ioffe, S. V. Isakov, E. Jeffrey, Z. Jiang, C. Jones, D. Kafri, K. Kechedzhi, J. Kelly, S. Kim, P. V. Klimov, A. Korotkov, F. Kostritsa, D. Landhuis, P. Laptev, M. Lindmark, E. Lucero, O. Martin, J. M. Martinis, J. R. McClean, M. McEwen, A. Megrant, X. Mi, M. Mohseni, W. Mruczkiewicz, J. Mutus, O. Naaman, M. Neeley, C. Neill, H. Neven, M. Y. Niu, T. E. O’Brien, E. Ostby, A. Petukhov, H. Putterman, C. Quintana, P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger, V. Smelyanskiy, D. Strain, K. J. Sung, M. Szalay, T. Y. Takeshita, A. Vainsencher, T. White, N. Wiebe, Z. J. Yao, P. Yeh, and A. Zalcman, Science 369, 1084 (2020). 10.1126/science.abb9811 CrossrefGoogle Scholar
  • 72 M. Suzuki, J. Math. Phys. 32, 400 (1991). 10.1063/1.529425 CrossrefGoogle Scholar
  • 73 A. M. Childs, A. Ostrander, and Y. Su, Quantum 3, 182 (2019). 10.22331/q-2019-09-02-182 CrossrefGoogle Scholar
  • 74 G. H. Low and I. L. Chuang, Quantum 3, 163 (2019). 10.22331/q-2019-07-12-163 CrossrefGoogle Scholar
  • 75 P. A. M. Dirac, Math. Proc. Cambridge Philos. Soc. 26, 376 (1930). 10.1017/S0305004100016108 CrossrefGoogle Scholar
  • 76 J. Frenkel, Bull. Am. Math. Soc. 41, 776 (1935). 10.1090/S0002-9904-1935-06189-0 CrossrefGoogle Scholar
  • 77 A. D. McLachlan, Mol. Phys. 8, 39 (1964). 10.1080/00268976400100041 CrossrefGoogle Scholar
  • 78 P. H. Kramer and M. Saraceno, Geometry of the Time-dependent Variational Principle in Quantum Mechanics (Springer, New York, 1981). CrossrefGoogle Scholar
  • 79 J. Broeckhove, L. Lathouwers, E. Kesteloot, and P. Van Leuven, Chem. Phys. Lett. 149, 547 (1988). 10.1016/0009-2614(88)80380-4 CrossrefGoogle Scholar
  • 80 G. C. Wick, Phys. Rev. 96, 1124 (1954). 10.1103/PhysRev.96.1124 CrossrefGoogle Scholar
  • 81 J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum 4, 269 (2020). 10.22331/q-2020-05-25-269 CrossrefGoogle Scholar
  • 82 N. Yamamoto, arXiv:1909.05074. Google Scholar
  • 83 B. Koczor and S. C. Benjamin, arXiv:1912.08660. Google Scholar
  • 84 B. van Straaten and B. Koczor, arXiv:2005.05172. Google Scholar
  • 85 Y.-X. Yao, N. Gomes, F. Zhang, T. Iadecola, C.-Z. Wang, K.-M. Ho, and P. P. Orth, arXiv:2011.00622 [quant-ph]. Google Scholar
  • 86 Z.-J. Zhang, J. Sun, X. Yuan, and M.-H. Yung, arXiv:2011.05283 [quant-ph]. Google Scholar
  • 87 M.-C. Chen, M. Gong, X. Xu, X. Yuan, J.-W. Wang, C. Wang, C. Ying, J. Lin, Y. Xu, Y. Wu, S. Wang, H. Deng, F. Liang, C.-Z. Peng, S. C. Benjamin, X. Zhu, C.-Y. Lu, and J.-W. Pan, Phys. Rev. Lett. 125, 180501 (2020). 10.1103/PhysRevLett.125.180501 CrossrefGoogle Scholar
  • 88 R. M. Karp, Complexity of Computer Computations (Springer, New York, 1972) p. 85. CrossrefGoogle Scholar
  • 89 L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, Phys. Rev. X 10, 021067 (2020). 10.1103/PhysRevX.10.021067 CrossrefGoogle Scholar
  • 90 G. Pagano, A. Bapat, P. Becker, K. S. Collins, A. De, P. W. Hess, H. B. Kaplan, A. Kyprianidis, W. L. Tan, C. Baldwin, L. T. Brady, A. Deshpande, F. Liu, S. Jordan, A. V. Gorshkov, and C. Monroe, Proc. Natl. Acad. Sci. U.S.A. 117, 25396 (2020). 10.1073/pnas.2006373117 CrossrefGoogle Scholar
  • 91 P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014). 10.1103/PhysRevLett.113.130503 CrossrefGoogle Scholar
  • 92 N. Wiebe, D. Braun, and S. Lloyd, Phys. Rev. Lett. 109, 050505 (2012). 10.1103/PhysRevLett.109.050505 CrossrefGoogle Scholar
  • 93 M. Schuld, I. Sinayskiy, and F. Petruccione, Phys. Rev. A 94, 022342 (2016). 10.1103/PhysRevA.94.022342 CrossrefGoogle Scholar
  • 94 S. Lloyd, M. Mohseni, and P. Rebentrost, Nat. Phys. 10, 631 (2014). 10.1038/nphys3029 CrossrefGoogle Scholar
  • 95 S. Cheng, J. Chen, and L. Wang, Entropy 20, 583 (2018). 10.3390/e20080583 CrossrefGoogle Scholar
  • 96 P.-L. Dallaire-Demers and N. Killoran, Phys. Rev. A 98, 012324 (2018). 10.1103/PhysRevA.98.012324 CrossrefGoogle Scholar
  • 97 K. H. Wan, O. Dahlsten, H. Kristjánsson, R. Gardner, and M. S. Kim, npj Quantum Inf. 3, 36 (2017). 10.1038/s41534-017-0032-4 CrossrefGoogle Scholar
  • 98 V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 100, 160501 (2008). 10.1103/PhysRevLett.100.160501 CrossrefGoogle Scholar
  • 99 V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. A 78, 052310 (2008). 10.1103/PhysRevA.78.052310 CrossrefGoogle Scholar
  • 100 F. De Martini, V. Giovannetti, S. Lloyd, L. Maccone, E. Nagali, L. Sansoni, and F. Sciarrino, Phys. Rev. A 80, 010302 (2009). 10.1103/PhysRevA.80.010302 CrossrefGoogle Scholar
  • 101 C. Bravo-Prieto, D. García-Martín, and J. I. Latorre, Phys. Rev. A 101, 062310 (2020). 10.1103/PhysRevA.101.062310 CrossrefGoogle Scholar
  • 102 H. Li and F. D. M. Haldane, Phys. Rev. Lett. 101, 010504 (2008). 10.1103/PhysRevLett.101.010504 CrossrefGoogle Scholar
  • 103 Y. Subaşı, L. Cincio, and P. J. Coles, J. Phys. A 52, 044001 (2019). 10.1088/1751-8121/aaf54d CrossrefGoogle Scholar
  • 104 H.-Y. Huang, K. Bharti, and P. Rebentrost, arXiv:1909.07344. Google Scholar
  • 105 D. An and L. Lin, arXiv:1909.05500. Google Scholar
  • 106 Y. Subaşı, R. D. Somma, and D. Orsucci, Phys. Rev. Lett. 122, 060504 (2019). 10.1103/PhysRevLett.122.060504 CrossrefGoogle Scholar
  • 107 G. Blasse and B. Grabmaier, Luminescent Materials (Springer, New York, 1994) p. 10. CrossrefGoogle Scholar
  • 108 R. N. Zare and D. R. Herschbach, Proc. IEEE 51, 173 (1963). 10.1109/PROC.1963.1676 CrossrefGoogle Scholar
  • 109 T. W. Ebbesen, K. Tanigaki, and S. Kuroshima, Chem. Phys. Lett. 181, 501 (1991). 10.1016/0009-2614(91)80302-E CrossrefGoogle Scholar
  • 110 R. Santagati, J. Wang, A. A. Gentile, S. Paesani, N. Wiebe, J. R. McClean, S. Morley-Short, P. J. Shadbolt, D. Bonneau, J. W. Silverstone, D. P. Tew, X. Zhou, J. L. O’Brien, and M. G. Thompson, Sci. Adv. 4, eaap9646 (2018). 10.1126/sciadv.aap9646 CrossrefGoogle Scholar
  • 111 K. M. Nakanishi, K. Mitarai, and K. Fujii, Phys. Rev. Res. 1, 033062 (2019). 10.1103/PhysRevResearch.1.033062 CrossrefGoogle Scholar
  • 112 R. M. Parrish, E. G. Hohenstein, P. L. McMahon, and T. J. Martínez, Phys. Rev. Lett. 122, 230401 (2019). 10.1103/PhysRevLett.122.230401 CrossrefGoogle Scholar
  • 113 G. Greene-Diniz and D. M. Ramo, arXiv:1910.05168. Google Scholar
  • 114 H. Kawai and Y. O. Nakagawa, Mach. Learn.: Sci. Technol. 1, 045027 (2020). 10.1088/2632-2153/aba183 CrossrefGoogle Scholar
  • 115 J. Tilly, G. Jones, H. Chen, L. Wossnig, and E. Grant, arXiv:2001.04941. Google Scholar
  • 116 P. W. K. Jensen, L. B. Kristensen, J. S. Kottmann, and A. Aspuru-Guzik, Quantum Sci. Technol. 6, 015004 (2020). 10.1088/2058-9565/abc096 CrossrefGoogle Scholar
  • 117 P. J. Ollitrault, A. Kandala, C.-F. Chen, P. K. Barkoutsos, A. Mezzacapo, M. Pistoia, S. Sheldon, S. Woerner, J. M. Gambetta, and I. Tavernelli, Phys. Rev. Res. 2, 043140 (2020). 10.1103/PhysRevResearch.2.043140 CrossrefGoogle Scholar
  • 118 I. Rungger, N. Fitzpatrick, H. Chen, C. Alderete, H. Apel, A. Cowtan, A. Patterson, D. M. Ramo, Y. Zhu, N. H. Nguyen, E. Grant, S. Chretien, L. Wossnig, N. M. Linke, and R. Duncan, arXiv:1910.04735. Google Scholar
  • 119 S. Tamiya and Y. O. Nakagawa, arXiv:2003.01706. Google Scholar
  • 120 K. Heya, K. M. Nakanishi, K. Mitarai, and K. Fujii, arXiv:1904.08566. Google Scholar
  • 121 J. C. Garcia-Escartin and P. Chamorro-Posada, Phys. Rev. A 87, 052330 (2013). 10.1103/PhysRevA.87.052330 CrossrefGoogle Scholar
  • 122 L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New J. Phys. 20, 113022 (2018). 10.1088/1367-2630/aae94a CrossrefGoogle Scholar
  • 123 H. Ding, T. Yokoya, J. C. Campuzano, T. Takahashi, M. Randeria, M. Norman, T. Mochiku, K. Kadowaki, and J. Giapintzakis, Nature 382, 51 (1996). 10.1038/382051a0 CrossrefGoogle Scholar
  • 124 M. Z. Hasan and C. L. Kane, Rev. Mod. Phys. 82, 3045 (2010). 10.1103/RevModPhys.82.3045 CrossrefGoogle Scholar
  • 125 J. M. Coey, Magnetism and Magnetic Materials (Cambridge University Press, Cambridge, U.K., 2010). CrossrefGoogle Scholar
  • 126 Y. Ibe, Y. O. Nakagawa, T. Yamamoto, K. Mitarai, Q. Gao, and T. Kobayashi, arXiv:2002.11724. Google Scholar
  • 127 S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, Quantum 3, 140 (2019). 10.22331/q-2019-05-13-140 CrossrefGoogle Scholar
  • 128 K. Heya, Y. Suzuki, Y. Nakamura, and K. Fujii, arXiv:1810.12745. Google Scholar
  • 129 M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Rev. A 60, 1888 (1999). 10.1103/PhysRevA.60.1888 CrossrefGoogle Scholar
  • 130 M. A. Nielsen, Phys. Lett. A 303, 249 (2002). 10.1016/S0375-9601(02)01272-0 CrossrefGoogle Scholar
  • 131 T. Jones and S. C. Benjamin, arXiv:1811.03147. Google Scholar
  • 132 S. F. Huelga, C. Macchiavello, T. Pellizzari, A. K. Ekert, M. B. Plenio, and J. I. Cirac, Phys. Rev. Lett. 79, 3865 (1997). 10.1103/PhysRevLett.79.3865 CrossrefGoogle Scholar
  • 133 V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 96, 010401 (2006). 10.1103/PhysRevLett.96.010401 CrossrefGoogle Scholar
  • 134 V. Giovannetti, S. Lloyd, and L. Maccone, Nat. Photonics 5, 222 (2011). 10.1038/nphoton.2011.35 CrossrefGoogle Scholar
  • 135 A. Shaji and C. M. Caves, Phys. Rev. A 76, 032111 (2007). 10.1103/PhysRevA.76.032111 CrossrefGoogle Scholar
  • 136 B. Koczor, S. Endo, T. Jones, Y. Matsuzaki, and S. C. Benjamin, New J. Phys. 22, 083038 (2020). 10.1088/1367-2630/ab965e CrossrefGoogle Scholar
  • 137 R. Kaubruegger, P. Silvi, C. Kokail, R. van Bijnen, A. M. Rey, J. Ye, A. M. Kaufman, and P. Zoller, Phys. Rev. Lett. 123, 260505 (2019). 10.1103/PhysRevLett.123.260505 CrossrefGoogle Scholar
  • 138 H. Labuhn, D. Barredo, S. Ravets, S. De Léséleuc, T. Macrì, T. Lahaye, and A. Browaeys, Nature 534, 667 (2016). 10.1038/nature18274 CrossrefGoogle Scholar
  • 139 H. Bernien, S. Schwartz, A. Keesling, H. Levine, A. Omran, H. Pichler, S. Choi, A. S. Zibrov, M. Endres, M. Greiner, V. Vuletić, and M. D. Lukin, Nature 551, 579 (2017). 10.1038/nature24622 CrossrefGoogle Scholar
  • 140 D. J. Wineland, J. J. Bollinger, W. M. Itano, F. Moore, and D. J. Heinzen, Phys. Rev. A 46, R6797 (1992). 10.1103/PhysRevA.46.R6797 CrossrefGoogle Scholar
  • 141 J. J. Meyer, J. Borregaard, and J. Eisert, arXiv:2006.06303. Google Scholar
  • 142 H. Chen, M. Vasmer, N. P. Breuckmann, and E. Grant, arXiv:1912.10063. Google Scholar
  • 143 X. Xu, S. C. Benjamin, and X. Yuan, arXiv:1911.05759. Google Scholar
  • 144 P. D. Johnson, J. Romero, J. Olson, Y. Cao, and A. Aspuru-Guzik, arXiv:1711.02249. Google Scholar
  • 145 C. Dankert, R. Cleve, J. Emerson, and E. Livine, Phys. Rev. A 80, 012304 (2009). 10.1103/PhysRevA.80.012304 CrossrefGoogle Scholar
  • 146 M. A. Nielsen and I. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, UK, 2002). Google Scholar
  • 147 N. Yoshioka, Y. O. Nakagawa, K. Mitarai, and K. Fujii, Phys. Rev. Research 2, 043289 (2020). 10.1103/PhysRevResearch.2.043289 CrossrefGoogle Scholar
  • 148 Y. Dubi and M. Di Ventra, Nano Lett. 9, 97 (2009). 10.1021/nl8025407 CrossrefGoogle Scholar
  • 149 M. Lubasch, J. Joo, P. Moinier, M. Kiffner, and D. Jaksch, Phys. Rev. A 101, 010301 (2020). 10.1103/PhysRevA.101.010301 CrossrefGoogle Scholar
  • 150 M. Cerezo, A. Poremba, L. Cincio, and P. J. Coles, Quantum 4, 248 (2020). 10.22331/q-2020-03-26-248 CrossrefGoogle Scholar
  • 151 E. Anschuetz, J. Olson, A. Aspuru-Guzik, and Y. Cao, International Workshop on Quantum Technology and Optimization Problems, 2019, p. 74. 10.1007/978-3-030-14082-3_7 CrossrefGoogle Scholar
  • 152 Y. Wang, G. Li, and X. Wang, arXiv:2005.08797. Google Scholar
  • 153 A. Arrasmith, L. Cincio, A. T. Sornborger, W. H. Zurek, and P. J. Coles, Nat. Commun. 10, 3438 (2019). 10.1038/s41467-019-11417-0 CrossrefGoogle Scholar
  • 154 A. Di Paolo, P. K. Barkoutsos, I. Tavernelli, and A. Blais, Phys. Rev. Res. 2, 033364 (2020). 10.1103/PhysRevResearch.2.033364 CrossrefGoogle Scholar
  • 155 A. N. Chowdhury, G. H. Low, and N. Wiebe, arXiv:2002.00055. Google Scholar
  • 156 J. Wu and T. H. Hsieh, Phys. Rev. Lett. 123, 220502 (2019). 10.1103/PhysRevLett.123.220502 CrossrefGoogle Scholar
  • 157 N. Moiseyev, Non-Hermitian Quantum Mechanics (Cambridge University Press, Cambridge, U.K., 2011). CrossrefGoogle Scholar
  • 158 C. Gardiner, P. Zoller, and P. Zoller, Quantum Noise: A Handbook of Markovian and Non-Markovian Quantum Stochastic Methods with Applications to Quantum Optics (Springer Science & Business Media, Berlin/Heidelberg, 2004) Vol. 56. Google Scholar
  • 159 J. Raftery, D. Sadri, S. Schmidt, H. E. Türeci, and A. A. Houck, Phys. Rev. X 4, 031043 (2014). 10.1103/PhysRevX.4.031043 CrossrefGoogle Scholar
  • 160 C. Zoufal, A. Lucchi, and S. Woerner, arXiv:2006.06004. Google Scholar
  • 161 F. Fontanela, A. J. Jacquier, and M. Oumgari, Available at SSRN 3499134. Google Scholar
  • 162 S. McArdle and D. P. Tew, arXiv:2006.11181. Google Scholar
  • 163 Y. Shingu, Y. Seki, S. Watabe, S. Endo, Y. Matsuzaki, S. Kawabata, T. Nikuni, and H. Hakoshima, arXiv:2007.00876. Google Scholar
  • 164 A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Phys. Rev. A 86, 032324 (2012). 10.1103/PhysRevA.86.032324 CrossrefGoogle Scholar
  • 165 X. Yuan, Z. Zhang, N. Lütkenhaus, and X. Ma, Phys. Rev. A 94, 062305 (2016). 10.1103/PhysRevA.94.062305 CrossrefGoogle Scholar
  • 166 A. Kandala, K. Temme, A. D. Córcoles, A. Mezzacapo, J. M. Chow, and J. M. Gambetta, Nature 567, 491 (2019). 10.1038/s41586-019-1040-7 CrossrefGoogle Scholar
  • 167 T. Giurgica-Tiron, Y. Hindy, R. LaRose, A. Mari, and W. J. Zeng, arXiv:2005.10921. Google Scholar
  • 168 T. Keen, T. Maier, S. Johnston, and P. Lougovski, Quantum Sci. Technol. 5, 035001 (2020). 10.1088/2058-9565/ab7d4c CrossrefGoogle Scholar
  • 169 E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski, Phys. Rev. Lett. 120, 210501 (2018). 10.1103/PhysRevLett.120.210501 CrossrefGoogle Scholar
  • 170 A. He, B. Nachman, W. A. de Jong, and C. W. Bauer, Phys. Rev. A 102, 012426 (2020). 10.1103/PhysRevA.102.012426 CrossrefGoogle Scholar
  • 171 J. Garmon, R. Pooser, and E. Dumitrescu, Phys. Rev. A 101, 042308 (2020). 10.1103/PhysRevA.101.042308 CrossrefGoogle Scholar
  • 172 J. J. Wallman and J. Emerson, Phys. Rev. A 94, 052325 (2016). 10.1103/PhysRevA.94.052325 CrossrefGoogle Scholar
  • 173 S. Endo, Q. Zhao, Y. Li, S. Benjamin, and X. Yuan, Phys. Rev. A 99, 012334 (2019). 10.1103/PhysRevA.99.012334 CrossrefGoogle Scholar
  • 174 M. Huo and Y. Li, arXiv:1811.02734. Google Scholar
  • 175 R. Takagi, arXiv:2006.12509. Google Scholar
  • 176 C. Song, J. Cui, H. Wang, J. Hao, H. Feng, and Y. Li, Sci. Adv. 5, eaaw5686 (2019). 10.1126/sciadv.aaw5686 CrossrefGoogle Scholar
  • 177 S. Zhang, Y. Lu, K. Zhang, W. Chen, Y. Li, J.-N. Zhang, and K. Kim, Nat. Commun. 11, 587 (2020). 10.1038/s41467-020-14376-z CrossrefGoogle Scholar
  • 178 S. Bravyi, J. M. Gambetta, A. Mezzacapo, and K. Temme, arXiv:1701.08213. Google Scholar
  • 179 K. Setia, R. Chen, J. E. Rice, A. Mezzacapo, M. Pistoia, and J. Whitfield, arXiv:1910.14644. Google Scholar
  • 180 P. K. Barkoutsos, J. F. Gonthier, I. Sokolov, N. Moll, G. Salis, A. Fuhrer, M. Ganzhorn, D. J. Egger, M. Troyer, A. Mezzacapo, S. Filipp, and I. Tavernelli, Phys. Rev. A 98, 022322 (2018). 10.1103/PhysRevA.98.022322 CrossrefGoogle Scholar
  • 181 Z. Cai, Phys. Rev. Appl. 14, 014059 (2020). 10.1103/PhysRevApplied.14.014059 CrossrefGoogle Scholar
  • 182 W. J. Huggins, J. McClean, N. Rubin, Z. Jiang, N. Wiebe, K. B. Whaley, and R. Babbush, arXiv:1907.13117. Google Scholar
  • 183 R. Sagastizabal, X. Bonet-Monroig, M. Singh, M. A. Rol, C. Bultink, X. Fu, C. Price, V. Ostroukh, N. Muthusubramanian, A. Bruno, M. Beekman, N. Haider, T. E. O’Brien, and L. DiCarlo, Phys. Rev. A 100, 010302 (2019). 10.1103/PhysRevA.100.010302 CrossrefGoogle Scholar
  • 184 J. R. McClean, Z. Jiang, N. C. Rubin, R. Babbush, and H. Neven, Nat. Commun. 11, 636 (2020). 10.1038/s41467-020-14341-w CrossrefGoogle Scholar
  • 185 J. Lundeen, A. Feito, H. Coldenstrodt-Ronge, K. Pregnell, C. Silberhorn, T. Ralph, J. Eisert, M. Plenio, and I. Walmsley, Nat. Phys. 5, 27 (2009). 10.1038/nphys1133 CrossrefGoogle Scholar
  • 186 S. Bravyi, S. Sheldon, A. Kandala, D. C. Mckay, and J. M. Gambetta, arXiv:2006.14044. Google Scholar
  • 187 A. Zlokapa and A. Gheorghiu, arXiv:2005.10811. Google Scholar
  • 188 A. K. Ekert, C. M. Alves, D. K. Oi, M. Horodecki, P. Horodecki, and L. C. Kwek, Phys. Rev. Lett. 88, 217901 (2002). 10.1103/PhysRevLett.88.217901 CrossrefGoogle Scholar

Author Biographies

Suguru Endo was born in Kanagawa, Japan. He received his bachelor's and master's degrees from Keio University in 2014 and 2016 and his Ph.D. degree from University of Oxford in 2019. He has been working as a researcher in NTT secure platform laboratories since 2020. His research interests focus on hybrid quantum-classical algorithms, quantum error mitigation, and circuit QED.

Zhenyu Cai was born in Guangdong, China. He obtained his Ph.D. degree from University of Oxford in 2020. Since then he has been a Junior Research Fellow in Physics at St John's College, Oxford. His research interests centre around quantum error correction, quantum error mitigation and the practical applications of near-term noisy quantum computers.

Simon C. Benjamin is the Professor of Quantum Technologies at Materials Department in the University of Oxford. From 2014 through 2020 he was an Associate Director of the £40M Oxford-led UK National Hub on Networked Quantum Information Technologies (NQIT) and he is part of the extended leadership team for the successor Hub in Quantum Computation and Simulation (HQCS). In Oxford, Simon leads a team of 12 applied theorists who look at various aspects of quantum computing, including architectures, fault tolerance, and algorithms that are robust against imperfections in the computer. In 2017 Simon co-founded the company Quantum Motion Technologies where he is now Chief Scientific Officer leading the theory and design effort.

Xiao Yuan received his Bachelor in theoretical physics from Peking University in 2012 and got his Ph.D. in physics from Tsinghua University in 2016. Then he worked as a postdoc at University of Science and Technology China in 2017, at Oxford University from 2017 to 2019, and at Stanford University from 2019 to 2020. He is now an assistant professor at Center on Frontiers of Computing Studies, Peking University. Xiao Yuan's research interests focus on three aspects of quantum information science, including near-term and universal quantum computing, quantum foundation, and quantum cryptography.