Quantum circuit modeling of a classical perceptron

A scheme of the quantum algorithm proposed in this work is shown in Fig. 1b. The input and weight vectors are limited to binary values, \(i_j, w_j \in \{-1, 1\}\), as in McCulloch-Pitts neurons. Hence, an m-dimensional input vector is encoded using the m coefficients needed to define a general wavefunction \(|\psi_i\rangle\) of N qubits. In practice, given arbitrary input \((\vec i)\) and weight \((\vec w)\) vectors

$$\vec i = \left( {\begin{array}{*{20}{c}} {i_0} \\ {i_1} \\ \vdots \\ {i_{m - 1}} \end{array}} \right),{\kern 1pt} \vec w = \left( {\begin{array}{*{20}{c}} {w_0} \\ {w_1} \\ \vdots \\ {w_{m - 1}} \end{array}} \right)$$ (1)

with \(i_j, w_j \in \{-1, 1\}\), we define the two quantum states

$$|\psi _i\rangle = \frac{1}{{\sqrt m }}\mathop {\sum}\limits_{j = 0}^{m - 1} {i_j} |j\rangle ;|\psi _w\rangle = \frac{1}{{\sqrt m }}\mathop {\sum}\limits_{j = 0}^{m - 1} {w_j} |j\rangle .$$ (2)

The states \(|j\rangle \in \{|00 \ldots 00\rangle, |00 \ldots 01\rangle, \ldots, |11 \ldots 11\rangle\}\) form the so-called computational basis of the quantum processor, i.e., the basis in the Hilbert space of N qubits corresponding to all possible states of the single qubits being either in \(|0\rangle\) or \(|1\rangle\). As usual, these states are labeled with integers \(j \in \{0, \ldots, m - 1\}\) arising from the decimal representation of the respective binary string. Evidently, if N qubits are used in the register, there are \(m = 2^N\) basis states labeled \(|j\rangle\) and, as outlined in Eq. (2), we can use factors ±1 to encode the m-dimensional classical vectors into a uniformly weighted superposition of the full computational basis.
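
As an illustration of the encoding in Eq. (2), the following minimal sketch builds the amplitude list of \(|\psi_i\rangle\) from a ±1 vector using plain Python. The helper names `encode` and `basis_label` are hypothetical and chosen here only for illustration.

```python
import math

def encode(vec):
    """Amplitudes of Eq. (2): entry j of the +/-1 vector becomes the coefficient of |j>."""
    m = len(vec)
    n_qubits = m.bit_length() - 1
    if m != 2 ** n_qubits or any(v not in (-1, 1) for v in vec):
        raise ValueError("need a +/-1 vector whose length is a power of two")
    return [v / math.sqrt(m) for v in vec]

def basis_label(j, n_qubits):
    """Binary string labelling the computational basis state |j>."""
    return format(j, "0{}b".format(n_qubits))

i_vec = [1, -1, -1, 1]                      # m = 4 entries, hence N = 2 qubits
psi_i = encode(i_vec)                       # uniformly weighted amplitudes, signs from i_vec
norm = sum(a * a for a in psi_i)            # squared norm of the encoded state
labels = [basis_label(j, 2) for j in range(4)]
```

Every amplitude has the same magnitude \(1/\sqrt{m}\), so the classical information sits entirely in the signs.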

The first step of the algorithm prepares the state \(|\psi_i\rangle\) by encoding the input values in \(\vec i\). Assuming the qubits to be initialized in the state \(|00 \ldots 00\rangle \equiv |0\rangle^{\otimes N}\), we perform a unitary transformation \(U_i\) such that

$$U_i|0\rangle ^{ \otimes N} = |\psi _i\rangle .$$ (3)

In principle, any m × m unitary matrix having \(\vec i\) in the first column can be used for this purpose, and we will give explicit examples in the following. Notice that, in a more general scenario, the preparation of the input state starting from a blank register might be replaced by a direct call to a quantum memory\(^{23}\) where \(|\psi_i\rangle\) was previously stored.

The second step computes the inner product between \(\vec w\) and \(\vec i\) using the quantum register. This task can be performed efficiently by defining a unitary transformation, \(U_w\), such that the weight quantum state is rotated as

$$U_w|\psi _w\rangle = |1\rangle ^{ \otimes N} = |m - 1\rangle .$$ (4)

As before, any m × m unitary matrix having \(\vec w^T\) in the last row satisfies this condition. If we apply \(U_w\) after \(U_i\), the overall N-qubit quantum state becomes

$$U_w|\psi _i\rangle = \mathop {\sum}\limits_{j = 0}^{m - 1} {c_j} |j\rangle \equiv |\phi _{i,w}\rangle {\mkern 1mu} .$$ (5)

Using Eq. (4), the scalar product between the two quantum states is

$$\begin{array}{*{20}{l}} {\langle \psi _w|\psi _i\rangle } \hfill & = \hfill & {\langle \psi _w|U_w^\dagger U_w|\psi _i\rangle } \hfill \\ {} \hfill & = \hfill & {\langle m - 1|\phi _{i,w}\rangle = c_{m - 1},} \hfill \end{array}$$ (6)

and from the definitions in Eq. (2) it is easily seen that the scalar product of input and weight vectors is \(\vec w \cdot \vec i = m\langle \psi _w|\psi _i\rangle\). Therefore, the desired result is contained, up to a normalization factor, in the coefficient \(c_{m-1}\) of the final state \(|\phi_{i,w}\rangle\). For an intuitive geometrical interpretation see Supplementary Information, Sec. I.
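
The chain of Eqs. (4)-(6) can be checked numerically on a small case. The sketch below uses a Householder reflection as one convenient choice of \(U_w\); this particular construction is an assumption made here for illustration (it is not the circuit-level construction discussed later), and it assumes m ≥ 2. All function names are hypothetical.

```python
import math

def encode(vec):
    """Amplitudes of Eq. (2) for a +/-1 vector."""
    m = len(vec)
    return [v / math.sqrt(m) for v in vec]

def householder_to_last(psi_w):
    """Real orthogonal (hence unitary) matrix sending psi_w to |m-1>; assumes m >= 2."""
    m = len(psi_w)
    v = list(psi_w)
    v[m - 1] -= 1.0                          # v = psi_w - e_{m-1}
    vv = sum(x * x for x in v)
    return [[(1.0 if r == c else 0.0) - 2.0 * v[r] * v[c] / vv
             for c in range(m)] for r in range(m)]

def apply(matrix, state):
    """Matrix-vector product with plain lists."""
    return [sum(a * s for a, s in zip(row, state)) for row in matrix]

i_vec = [1, -1, 1, 1]
w_vec = [1, -1, -1, 1]
m = len(i_vec)
U_w = householder_to_last(encode(w_vec))
rotated_w = apply(U_w, encode(w_vec))        # Eq. (4): should be |m-1>
phi = apply(U_w, encode(i_vec))              # Eq. (5)
c_last = phi[m - 1]                          # Eq. (6)
dot = sum(a * b for a, b in zip(w_vec, i_vec))
```

Because the reflection is symmetric, its last row coincides with the normalized weight vector, in line with the condition on \(U_w\) stated after Eq. (4), and `c_last` reproduces `dot / m`.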

In order to extract this information, we propose to use an ancilla qubit (a) initially set in the state \(|0\rangle\). A multi-controlled NOT gate between the N encoding qubits and the target a leads to\(^{24}\):

$$|\phi _{i,w}\rangle |0\rangle _a \to \mathop {\sum}\limits_{j = 0}^{m - 2} {c_j} |j\rangle |0\rangle _a + c_{m - 1}|m - 1\rangle |1\rangle _a$$ (7)

The nonlinearity required by the threshold function at the output of the perceptron is immediately obtained by performing a quantum measurement: indeed, measuring the state of the ancilla qubit in the computational basis produces the output \(|1\rangle_a\) (i.e., an activated perceptron) with probability \(|c_{m-1}|^2\). As will be shown in the following, this choice proves simultaneously very simple and effective in producing the correct result. However, it should be noted that refined threshold functions can be applied once the inner product information is stored on the ancilla.\(^{25,26,27}\) We also notice that both parallel and anti-parallel \(\vec i\) and \(\vec w\) vectors produce an activation of the perceptron, while orthogonal vectors always result in the ancilla being measured in the state \(|0\rangle_a\). This is a direct consequence of the probability being a quadratic function, i.e., \(|c_{m-1}|^2\) in the present case, in contrast with classical perceptrons, which can only be employed as linear classifiers in their simplest realizations. In fact, our quantum perceptron model can be efficiently used as a pattern classifier, as will be shown below, since it allows a given pattern and its negative to be interpreted on an equivalent footing. Formally, this intrinsic symmetry reflects the invariance of the encoding \(|\psi_i\rangle\) and \(|\psi_w\rangle\) states under addition of a global −1 factor.
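
To make the role of the measurement concrete, the following sketch computes the ideal activation probability \(|c_{m-1}|^2\) directly from the classical vectors and then mimics a finite number of ancilla measurements with a pseudo-random generator. It is a classical stand-in, not a circuit simulation; the function names and the seed are assumptions introduced here.

```python
import random

def activation_probability(i_vec, w_vec):
    """Ideal probability |c_{m-1}|^2 of reading the ancilla in |1>."""
    m = len(i_vec)
    overlap = sum(a * b for a, b in zip(i_vec, w_vec)) / m
    return overlap ** 2

def sample_ancilla(prob_one, shots, seed=0):
    """Estimate the activation frequency from `shots` simulated measurements."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(shots) if rng.random() < prob_one)
    return ones / shots

w = [1, -1, -1, 1]
p_parallel = activation_probability(w, w)                      # same pattern
p_antiparallel = activation_probability([-x for x in w], w)    # negative of the pattern
p_orthogonal = activation_probability([1, 1, 1, 1], w)         # zero inner product
p_partial = activation_probability([1, -1, 1, 1], w)           # one differing entry
estimate = sample_ancilla(p_partial, shots=8192)
```

The parallel and anti-parallel cases give the same probability, which is the sign symmetry discussed above, while the sampled `estimate` fluctuates around `p_partial` with the usual shot noise.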

Implementation of the unitary transformations

One of the most critical tasks to be practically solved when implementing a quantum neural network model is the efficient realization of unitary transformations. In machine learning applications, this might eventually discriminate between algorithms that show a true quantum advantage over their classical counterparts and those that do not.\(^{13}\) Here we discuss an original strategy for practically implementing the preparation of the input state \(|\psi_i\rangle\) and the unitary transformation \(U_w\) on quantum hardware. In particular, we will first outline the most straightforward algorithm one might think of employing, i.e., the “brute force” application of successive sign flip blocks. Then, we will show an alternative and more effective approach based on the generation of hypergraph states. In the next Section, we will see that only the latter allows a practical implementation of this quantum perceptron model on a real quantum device.

As a first step we define a sign flip block, \(\mathrm{SF}_{N,j}\), as the unitary transformation acting on the computational basis of N qubits in the following way:

$${\mathrm{SF}}_{N,j}|j^\prime\rangle = \left\{ {\begin{array}{*{20}{l}} {|j^\prime\rangle } \hfill & {{\mathrm{if}}} \hfill & {j \ne j^\prime } \hfill \\ { - |j^\prime\rangle } \hfill & {{\mathrm{if}}} \hfill & {j = j^\prime } \hfill \end{array}} \right..$$ (8)

In general, \(\mathrm{SF}_{N,j}\) is equivalent to a multi-controlled quantum operation between N qubits: for example, for any N and \(m = 2^N\), a controlled Z operation between N qubits (\(\mathrm{C}^N\mathrm{Z}\)) is a well-known quantum gate\(^{24}\) realizing \(\mathrm{SF}_{N,m-1}\), while a single qubit Z gate acts as \(\mathrm{SF}_{1,1}\). For a given input \(\vec i\), the unitary \(U_i\) can be obtained by combining simple single qubit rotations and sign flip blocks to introduce the required −1 factors in front of the \(|j\rangle\) basis states in the representation of the target \(|\psi_i\rangle\) (see details in the Methods section). As already anticipated, the whole problem is symmetric under the addition of a global −1 factor (i.e., \(|\psi_i\rangle\) and \(-|\psi_i\rangle\) are fully equivalent). Hence, there can be at most \(m/2 = 2^{N-1}\) independent −1 factors, and \(2^{N-1}\) sign flip blocks are needed in the worst case. A similar procedure can also lead to the other unitary operation in the quantum perceptron algorithm, \(U_w\). Evidently, the above strategy is exponentially expensive in terms of circuit depth as a function of the number of qubits, and requires an exponential number of N-controlled gates.
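
The brute-force strategy can be sketched at the level of amplitudes as follows: start from the uniform superposition (Hadamard gates on every qubit) and apply one sign flip block per required −1 factor, exploiting the global sign symmetry to never use more than m/2 blocks. This is a state-vector illustration with hypothetical function names, not a gate-level circuit.

```python
import math

def sign_flip_block(state, j):
    """SF_{N,j} of Eq. (8): flip the sign of the |j> amplitude only."""
    return [-a if k == j else a for k, a in enumerate(state)]

def prepare_with_sign_flips(i_vec):
    """Return (state, number of blocks used) for the brute-force strategy."""
    m = len(i_vec)
    negatives = [j for j, v in enumerate(i_vec) if v == -1]
    # Global sign symmetry: prepare -|psi_i> instead if that needs fewer blocks.
    if len(negatives) > m // 2:
        negatives = [j for j, v in enumerate(i_vec) if v == 1]
    state = [1.0 / math.sqrt(m)] * m          # H on every qubit applied to |0...0>
    for j in negatives:
        state = sign_flip_block(state, j)
    return state, len(negatives)

state, blocks = prepare_with_sign_flips([-1, -1, -1, 1])
worst_case_blocks = prepare_with_sign_flips([1, -1, -1, 1])[1]
```

For the first input the routine prepares \(-|\psi_i\rangle\) with a single block rather than \(|\psi_i\rangle\) with three, while the second input is a worst case for m = 4 and needs m/2 = 2 blocks.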

A more efficient solution can be given after realizing that the class of possible input- and weight-encoding states, Eq. (2), coincides with the set of the so-called hypergraph states (see Supplementary Information, Sec. II, available at https://doi.org/10.1038/s41534-019-0140-4). The latter are ubiquitous ingredients of many renowned quantum algorithms, and have been extensively studied and theoretically characterized.\(^{22,28}\) In particular, hypergraph states can be mapped into the vertices and hyperedges of generalized graphs, and can be prepared by using single qubit and (multi)-controlled Z gates, with at most a single N-controlled \(\mathrm{C}^N\mathrm{Z}\) and with the possibility of performing many p-controlled \(\mathrm{C}^p\mathrm{Z}\) gates (involving only p qubits, with p < N) in parallel. The design of the sequence for \(U_i\) and \(U_w\) involves a series of iterative steps described in the Methods section, and directly includes the algorithmic generation of hypergraph states based on a procedure that we will call the “hypergraph states generation subroutine” (HSGS). An example of the full sequence for a specific N = 4 case is shown in Fig. 2. Notice that our optimized algorithm involving hypergraph states successfully reduces the required quantum resources with respect to the brute force approach outlined in the previous paragraph. However, it may still involve an exponential cost in terms of circuit depth or clock cycles, i.e., of temporal steps of the quantum circuit when all possible parallelization of independent operations on the qubits is taken into account. Indeed, the sign-flip algorithm described above requires \(O(2^N)\) N-controlled Z gates when running on N qubits, in the worst case. Since any \(\mathrm{C}^N\mathrm{Z}\) can be decomposed into poly(N) elementary single and two-qubit gates,\(^{24}\) the overall scaling of the sign-flip approach is \(O(\mathrm{poly}(N)\,2^N)\).
On the other hand, the worst case for the HSGS, namely the fully connected hypergraph with N vertices, corresponds to applying each of the possible Z and \(\mathrm{C}^p\mathrm{Z}\) operations, for 2 ≤ p ≤ N, exactly once. Since all these operations commute, they can be arranged in such a way that, in every clock cycle, all the qubits are involved either in a single-qubit operation or in a multi-controlled one (e.g., a Z on a certain qubit and the \(\mathrm{C}^{N-1}\mathrm{Z}\) on the remaining ones can be done in parallel). The overall number of clock cycles is still \(O(2^N)\), as in the previous case, but now at most one slice contains an N-qubit operation, while all other slices with p < N can be decomposed into poly(p) elementary operations. In this respect, the proposed HSGS optimizes the number of multi-qubit operations. This may be a significant advantage, e.g., in currently available superconducting quantum processors, in which multi-qubit operations are not natively available.
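
The detailed HSGS is given in the Methods section and is not reproduced here. As a hedged illustration of the underlying idea, the sketch below lists one set of Z and \(\mathrm{C}^p\mathrm{Z}\) gates that turns the uniform superposition into a target sign pattern, visiting basis states by increasing number of 1s. The bit-mask representation of a gate (bit q of the mask set when qubit q participates, with qubit 0 as the least significant bit of j) and the function name are assumptions made for this example.

```python
def hsgs_gates(target):
    """List the qubit subsets (as bit masks) on which a Z or C^pZ is applied."""
    m = len(target)
    # Fix the global sign so that the |0...0> amplitude stays positive.
    goal = [v * target[0] for v in target]
    signs = [1] * m                           # signs after H on every qubit
    gates = []
    # Visit basis states by increasing number of 1s: p = 1 is a Z, p >= 2 a C^pZ.
    for mask in sorted(range(1, m), key=lambda j: bin(j).count("1")):
        if signs[mask] != goal[mask]:
            gates.append(mask)
            # A C^pZ on `mask` flips every basis state with all those bits set.
            signs = [-s if (j & mask) == mask else s for j, s in enumerate(signs)]
    return gates, signs

# Sign pattern of the input quoted in the Fig. 2 caption: i_0 = i_1 = -1, all others +1.
target = [-1, -1] + [1] * 14
gates, signs = hsgs_gates(target)
```

Each gate acts only on basis states with at least as many 1s as its own mask, so signs fixed at an earlier step are never disturbed. At most \(2^N - 1\) gates can appear (every non-empty subset of qubits), which corresponds to the fully connected hypergraph mentioned above.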

Fig. 2 Quantum circuit of an N = 4 perceptron. An example of a typical quantum circuit for a perceptron model with N = 4 qubits (i.e., capable of processing \(m = 2^4 = 16\) dimensional input vectors), which employs the algorithm for the generation of hypergraph states, including the HSGS (see main text). In this example, the input vector has elements \(i_0 = i_1 = -1\), and \(i_j = 1\) for j = 2,…,15, while the weight vector has elements \(w_2 = w_3 = w_4 = -1\), and 1 in all other entries. Multi-controlled \(\mathrm{C}^p\mathrm{Z}\) gates are denoted by vertical lines and black dots on the qubits involved. The HSGS is realized inside the \(U_i\) block after the initial \(H^{\otimes N}\) gate, and in the \(U_w\) block before the final \(H^{\otimes N}\) and \(\mathrm{NOT}^{\otimes N}\) operations.

Before proceeding, it is also worth pointing out the role of \(U_w\) in this algorithm, which is essentially to cancel some of the transformations performed to prepare \(|\psi_i\rangle\), or even all of them if the condition \(\vec i = \vec w\) is satisfied. Further optimization of the algorithm, lying beyond the scope of the present work, might, therefore, be pursued at the compiling stage. However, notice that the input and weight vectors can, in practical applications, remain unknown or hidden until runtime.

Numerical results and quantum simulations

We implemented the algorithm for a single quantum perceptron both on classical simulators working out the matrix algebra of the circuit and on cloud-based quantum simulators, specifically the IBM Quantum Experience real backends (https://quantumexperience.ng.bluemix.net) using the Qiskit Python development kit (https://qiskit.org/). Due to the constraints imposed by the actual IBM hardware in terms of connectivity between the different qubits, we limited the quantum simulation on the actual quantum processor to the N = 2 case. Nevertheless, even this small-scale example is already sufficient to show all the distinctive features of our proposed setup, such as the exponential growth of the dimension of the analyzable problems, as well as the pattern recognition potential. In general, as already mentioned, in this encoding scheme N qubits can store and process \(2^N\)-dimensional input and weight vectors. Thus, \(2^{2^N}\) different input patterns can be analyzed, corresponding also to the number of different \(\vec w\) that could be defined. Moreover, all binary inputs and weights can easily be converted into black and white patterns, thus providing a visual interpretation of the artificial neuron activity.

Going back to the case study with N = 2, binary images of \(2^2 = 4\) pixels can be managed, and thus \(2^{2^2} = 16\) different patterns could be analyzed. The conversion between \(\vec i\) or \(\vec w\) and 2 × 2 pixels visual patterns is done as follows. As depicted in Fig. 3a, we label each image by ordering the pixels left to right, top to bottom, and assigning a value \(n_j = 1(0)\) to a white (black) pixel. The corresponding input or weight vector is then built by setting \(i_j = ( - 1)^{n_j}\), or \(w_j = ( - 1)^{n_j}\). We can also uniquely assign an integer label \(k_i\) (or \(k_w\)) to any pattern by converting the binary string \(n_0 n_1 n_2 n_3\) to its corresponding decimal number representation. Under this encoding scheme, e.g., numbers 3 and 12 are used to label patterns with horizontal lines, while 5 and 10 denote patterns with vertical lines, and 6 and 9 are used to label images with checkerboard-like pattern. An example of the sequence of operations performed on the IBM quantum computer using hypergraph states is shown in Fig. 3c for \(\vec i\) corresponding to the index \(k_i = 11\), and \(\vec w\) corresponding to \(k_w = 7\).
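
The labeling convention just described can be written out in a few lines. In this sketch the binary string \(n_0 n_1 n_2 n_3\) is read with \(n_0\) as the most significant digit, as implied by the decimal conversion; the function names are hypothetical.

```python
def pattern_from_label(k, n_pixels=4):
    """Pixel values n_0 ... n_{n_pixels-1}: binary digits of k, most significant first."""
    return [(k >> (n_pixels - 1 - j)) & 1 for j in range(n_pixels)]

def vector_from_label(k, n_pixels=4):
    """Input or weight vector with entries (-1)^{n_j}."""
    return [(-1) ** n for n in pattern_from_label(k, n_pixels)]

def label_from_vector(vec):
    """Inverse map: read the -1 entries as 1 digits of the binary string."""
    return int("".join("1" if v == -1 else "0" for v in vec), 2)

i_vec = vector_from_label(11)     # binary string 1011
w_vec = vector_from_label(7)      # binary string 0111
horizontal = pattern_from_label(3)   # rows (0, 0) and (1, 1)
vertical = pattern_from_label(5)     # columns (0, 0) and (1, 1)
checkerboard = pattern_from_label(6)
```

With pixels ordered left to right and top to bottom, label 3 fills one row, label 5 fills one column, and label 6 alternates, matching the examples quoted in the text.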

Fig. 3 Results for N = 2 quantum perceptron model. a The scheme used to label the 2 × 2 patterns and a few examples of patterns. b Scheme of IBM Q-5 “Tenerife” backend quantum processor. c Example of the gate sequence for the N = 2 case, with input and weight vectors corresponding to labels \(k_i = 11\) and \(k_w = 7\). d Ideal outcome of the quantum perceptron algorithm, simulated on a classical computer. e Results from the Tenerife processor using the algorithm with multi-controlled sign flip blocks. f Results from the Tenerife processor using the algorithm for the generation of hypergraph states. In e and f we explicitly indicate the corresponding average discrepancies calculated with respect to the ideal case, as defined in the main text.

The Hilbert space of 2 qubits is relatively small, with a total of 16 possible values for \(\vec i\) and \(\vec w\). Hence, the quantum perceptron model could be experimentally tested on the IBM quantum computer for all possible combinations of input and weights. The results of these experiments, and the comparison with classical numerical simulations, are shown in Fig. 3d–f. First, we plot the ideal outcome of the quantum perceptron algorithm in Fig. 3d, where both the global −1 factor and the input-weight symmetries are immediately evident. In particular, for any given weight vector \(\vec w\), the perceptron is able to single out from the 16 possible input patterns only \(\vec i = \vec w\) and its negative (with output \(|c_{m-1}|^2 = 1\), i.e., the perfect activation of the neuron), while all other inputs give outputs no larger than 0.25. If the inputs and weights are translated into 2 × 2 black and white pixel grids, it is not difficult to see that a single quantum perceptron can be used to recognize, e.g., vertical lines, horizontal lines, or checkerboard patterns.
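
The ideal N = 2 outcome can be enumerated exhaustively with a few lines of classical code. The sketch below fills the 16 × 16 table of \(|c_{m-1}|^2\) values, using the same label convention as in the text; the names `vector_from_label` and `ideal_output` are hypothetical.

```python
def vector_from_label(k, m):
    """+/-1 vector whose -1 entries follow the binary digits of k (most significant first)."""
    return [(-1) ** ((k >> (m - 1 - j)) & 1) for j in range(m)]

def ideal_output(k_i, k_w, m=4):
    """Ideal activation probability |c_{m-1}|^2 for a pair of pattern labels."""
    i_vec, w_vec = vector_from_label(k_i, m), vector_from_label(k_w, m)
    return (sum(a * b for a, b in zip(i_vec, w_vec)) / m) ** 2

table = [[ideal_output(k_i, k_w) for k_i in range(16)] for k_w in range(16)]
activated = {(k_i, k_w) for k_w in range(16) for k_i in range(16)
             if table[k_w][k_i] == 1.0}
distinct_values = sorted({value for row in table for value in row})
```

Perfect activation occurs only when the input label equals the weight label or its bitwise complement (the negative image); for m = 4 every other pair gives either 0 or 0.25, since the inner product can only take the values 0, ±2, or ±4.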

The actual experimental results are then shown in Fig. 3e, f, where the same algorithm is run on the IBM Q 5 “Tenerife” quantum processor.\(^{29}\) First, we show in panel 3e the results of the non-optimized approach introduced in the previous Section, which makes direct use of sign flip blocks. We deliberately did not take into account the global sign symmetry, thus treating any \(|\psi_i\rangle\) and \(-|\psi_i\rangle\) as distinct input quantum states and using up to \(2^N\) sign flip blocks. We notice that even in such an elementary example the algorithm performs worse and worse with increasing number of blocks. However, it should also be emphasized that the underlying structure of the output is already quite evident, despite the quantitative inaccuracy of the quantum simulated outputs: indeed, a threshold of 0.5 applied to the measured output would be sufficient to successfully complete all the classification tasks, i.e., the simulated artificial neuron can correctly single out from all possible inputs any given (precalculated) weight vector.

On the other hand, a remarkably better accuracy, also on the quantitative side and with smaller errors, is obtained when using the algorithm based on the hypergraph states formalism, whose experimental results are shown in panel 3f and represent the main result of this work. In this case, the global phase symmetry is naturally embedded in the algorithm itself, and the results show symmetric performance over the whole range of possible inputs and weights. All combinations of \(\vec i\) and \(\vec w\) yield results either larger than 0.75 or smaller than 0.3, in good quantitative agreement with the expected results plotted in panel 3d. Again, as is also clear from the appearance of the plots, all the classification tasks are correctly carried out. In order to give a quantitative measure of the overall agreement between the ideal (Fig. 3d), sign flips (Fig. 3e) and hypergraph states (Fig. 3f) versions, we introduce the average discrepancy as

$${\cal D} = \frac{{\mathop {\sum}\nolimits_{i,w} {\left| {O(i,w) - O_{ideal}(i,w)} \right|} }}{{2^{2^{N + 1}}}}$$ (9)

where \(O(i,w) = \left| \frac{1}{m}\mathop {\sum}\nolimits_j {i_j} w_j \right|^2 = |c_{m - 1}|^2\) is the outcome of the artificial neuron for a given pair of input and weights as obtained on a real device, \(O_{ideal}(i,w)\) is the corresponding ideal result and \(2^{2^{N + 1}}\) is the total number of possible \(\vec i\), \(\vec w\) pairs. As reported also in Fig. 3, we find \({\cal D} \simeq 0.2364\) for the sign flips case, and \({\cal D} \simeq 0.0598\) for the version involving hypergraph states. As a technical warning, we finally notice that in all of the three cases shown in panels d–f of Fig. 3, the \(\mathrm{C}^p\mathrm{Z}\) operations were obtained by adding single qubit Hadamard gates on the target qubit before and after the corresponding \(\mathrm{C}^p\mathrm{NOT}\) gate. For p = 1 this is a CNOT gate, which is natively implemented on the IBM quantum hardware, while the case p = 2 is known as the Toffoli gate, for which a standard decomposition into 6 CNOTs and single qubit rotations is known.\(^{24}\)
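
The average discrepancy of Eq. (9) is a mean absolute deviation over all input-weight pairs and can be computed as in the sketch below. The small tables used here are hypothetical placeholder values chosen only to exercise the function; they are not data from the experiments.

```python
def average_discrepancy(measured, ideal):
    """Eq. (9): mean absolute deviation over all input-weight pairs."""
    n_pairs = len(ideal) * len(ideal[0])      # 2^(2^(N+1)) when every pair is listed
    total = sum(abs(o - o_id)
                for row, row_id in zip(measured, ideal)
                for o, o_id in zip(row, row_id))
    return total / n_pairs

# Hypothetical 2 x 2 excerpt of a results table (not experimental data).
ideal = [[1.0, 0.25], [0.25, 1.0]]
measured = [[0.9, 0.3], [0.2, 0.8]]
d = average_discrepancy(measured, ideal)
```

For the full N = 2 comparison the two arguments would be the 16 × 16 tables of measured and ideal outputs, so that the denominator equals \(2^{2^{N+1}} = 256\).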

Finally, in the spirit of showing the potential scalability and usefulness of this quantum model of a classical perceptron for classification purposes, we have applied the HSGS-based algorithm to the N = 4 qubits case by using the circuit simulator feature available in Qiskit (https://qiskit.org/ and https://qiskit.org/terra). For N = 4, \(2^{32}\) possible combinations of \(\vec i\) and \(\vec w\) vectors are possible, far too many to explore the whole combinatorial space as previously done for the 2 qubits in Fig. 3. To explicitly show a few examples, we have chosen a single weight vector, \(\vec w_t\), corresponding to a simple cross-shaped pattern when represented as a 4 × 4 pixels image (encoded along the same lines as the N = 2 case, see first panel in Fig. 3), and weighted it against several possible choices of input vectors. When a threshold \(O(i, w_t) > 0.5\) is applied to the outcome of the artificial neuron, 274 out of the total \(2^{16}\) possible inputs are selected as positive cases, and they correspond to patterns differing from \(\vec w_t\) (or from its negative) by at most two pixels. Geometrically speaking, the vectors corresponding to positive cases all lie within a cone around \(\vec w_t\). Some results are reported in Fig. 4 for a selected choice of input vectors, where the artificial neuron output is computed both with standard linear algebra and with a quantum circuit on a virtual and noise-free quantum simulator run on a classical computer. A larger set of examples is also reported in the Supplementary Information, Sec. IV.
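
The count of 274 positive cases can be reproduced by brute force, since the ideal output depends only on the number d of pixels in which an input differs from the weight pattern: \(|c_{m-1}|^2 = ((m - 2d)/m)^2\). The sketch below enumerates all \(2^{16}\) inputs as 16-bit integers; the reference pattern is an arbitrary stand-in (the count does not depend on it), and the function names are hypothetical.

```python
def ideal_output_from_distance(d, m):
    """|c_{m-1}|^2 for two +/-1 vectors differing in d of their m entries."""
    return ((m - 2 * d) / m) ** 2

def count_positive(m=16, threshold=0.5):
    """Count inputs (as m-bit integers) activated by a fixed weight pattern."""
    w_bits = 0  # stand-in reference pattern; any other choice gives the same count
    return sum(1 for x in range(2 ** m)
               if ideal_output_from_distance(bin(x ^ w_bits).count("1"), m) > threshold)

positives = count_positive()
two_pixels_off = ideal_output_from_distance(2, 16)
three_pixels_off = ideal_output_from_distance(3, 16)
```

Inputs at distance d ≤ 2 from the pattern or from its negative pass the threshold, giving 2 × (1 + 16 + 120) = 274, while three differing pixels already drop the output below 0.5.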

Fig. 4 Pattern recognition for N = 4. A possible choice of the weight vector, \(\vec w_t\), for the N = 4 case is represented in the first panel (top left), and a small selection of different input vectors is then simulated with the quantum model of perceptron. Above each input pattern, the quantitative answers of the artificial neuron, namely the values of \(|c_{m-1}|^2\), are reported as obtained either through standard linear algebra (ideal ‘exact’ results) or resulting from the simulation of the quantum algorithm (‘q. alg’, run on a classical computer, averaged over \(n_{shots} = 8192\) repetitions). The two versions agree within statistical error.

Based on these results, we have implemented an elementary hybrid quantum-classical training scheme, which is an adaptation of the perceptron update rule\(^{30}\) to our algorithm. After preparing a random training set containing a total of 3050 different inputs, of which 50 positive and 3000 negative ones according to the threshold defined above, the binary valued artificial neuron is trained to recognize the targeted \(\vec w_t\). This is obtained by using the noiseless Qiskit simulator feature, in which the artificial neuron output is computed through our proposed quantum algorithm, and the optimization of the weight vector is performed by a classical processor. We selected a random \(\vec w_0\) vector to start with, and then we let the artificial neuron process the training set according to well-defined rules and learning rates \(l_p\) and \(l_n\) for positive and negative cases, respectively, without ever conveying explicit information about the target \(\vec w_t\) (see further details in the Methods). An example of the trajectory of the system around the configuration space of possible patterns is shown in Fig. 5a, in which we computed the fidelity of the quantum state \(|\psi_w\rangle\) encoding the trained \(\vec w\) with respect to the target state \(|\psi _{w_t}\rangle\). In Fig. 5b, we report the average value of such fidelity over 500 realizations of the training scheme, all with the same initial pattern \(\vec w_0\) and the same training set. As can be seen, the quantum artificial neuron effectively learns the targeted cross-shaped pattern: an animated plot of a sample trajectory is also available in the Supplementary Information (see animated gif online at https://doi.org/10.1038/s41534-019-0140-4). Finally, we mention that \(l_p = 0.5\) is found to be the optimal learning rate in our case, with little effect produced by \(l_n\) (which we also set to \(l_n = 0.5\) for simplicity in the simulations reported here).
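
The precise update rules are given in the Methods and are not restated in this section. The sketch below therefore shows only one plausible perceptron-like step, under the assumption that a misclassified positive input pulls the weights towards it by flipping a fraction \(l_p\) of the differing entries, and a misclassified negative input pushes them away by flipping a fraction \(l_n\) of the coinciding entries. The neuron output is computed classically here, whereas in the scheme described above it comes from the quantum algorithm. The function names, the rounding choice, and the example vectors are all hypothetical.

```python
import math
import random

def output(i_vec, w_vec):
    """Ideal neuron output |c_{m-1}|^2, standing in for the quantum evaluation."""
    m = len(i_vec)
    return (sum(a * b for a, b in zip(i_vec, w_vec)) / m) ** 2

def update(w_vec, i_vec, is_positive, l_p=0.5, l_n=0.5, rng=random, threshold=0.5):
    """One step of an assumed perceptron-like rule; returns a new weight vector."""
    activated = output(i_vec, w_vec) > threshold
    if activated == is_positive:
        return list(w_vec)                    # correctly classified: no change
    if is_positive:                           # move w towards i
        pool = [j for j in range(len(w_vec)) if w_vec[j] != i_vec[j]]
        rate = l_p
    else:                                     # move w away from i
        pool = [j for j in range(len(w_vec)) if w_vec[j] == i_vec[j]]
        rate = l_n
    chosen = rng.sample(pool, math.ceil(rate * len(pool)))   # assumed rounding up
    return [-v if j in chosen else v for j, v in enumerate(w_vec)]

rng = random.Random(1)
example_input = [1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
w0 = [1] * 16
w1 = update(w0, example_input, is_positive=True, rng=rng)
differing_after = sum(1 for a, b in zip(w1, example_input) if a != b)
```

In this example the initial weights differ from the positive input in 7 entries, and a single step with \(l_p = 0.5\) flips 4 of them, so the output for that input increases.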