Which clothes should I wear? Which restaurant should we choose for lunch? Which article should I read first? We make many decisions in our daily lives. Can we look to nature to find a method for ‘efficient decision-making’? For a formal discussion, let us focus on the multi-armed bandit problem (BP), stated as follows. Consider a number of slot machines. Each machine, when pulled, rewards the player with a coin with a certain probability P_k (k ∈ {1, 2, …, N}). For simplicity, we assume that the reward is the same for every machine. To maximise the total reward, the player must judge quickly and accurately which machine has the highest reward probability. To accomplish this, the player should gather information about many machines in order to determine which is the best; in the process, however, the player should not fail to exploit the reward from the best machine known so far. These requirements are not easily met simultaneously, because there is a trade-off between ‘exploration’ and ‘exploitation’. The BP asks for the optimal strategy for maximising the total reward under these incompatible demands: either exploiting the information already collected, or exploring for new information that may yield higher pay-offs at some risk. Living organisms commonly encounter this ‘exploration-exploitation dilemma’ in their struggle to survive in an unknown world.

This dilemma has no known generally optimal solution. What strategies do humans and animals use to resolve it? Daw et al. found that the softmax rule is the best-fitting algorithm for human decision-making behaviour in the BP task1. The softmax rule selects among the options at random, with the degree of randomness set by a parameter analogous to the temperature in the Boltzmann distribution (see Methods). The findings of Daw et al. raised many exciting questions for future brain research2. How humans and animals respond to the dilemma, and the underlying neural mechanisms, remain important open questions.
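To make the role of the temperature-like parameter concrete, the following is a minimal sketch of softmax selection on a Bernoulli bandit. It is not the procedure of the Methods section: the fixed inverse temperature `beta`, the sample-mean estimates and all function names are assumptions made for illustration only.

```python
import math
import random

def softmax_probabilities(estimates, beta):
    """Boltzmann selection probabilities; beta acts as an inverse temperature."""
    top = max(estimates)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - top)) for q in estimates]
    total = sum(weights)
    return [w / total for w in weights]

def run_softmax_bandit(reward_probs, beta=5.0, steps=1000, seed=0):
    """Play a Bernoulli bandit with softmax selection (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms  # sample-mean reward of each machine (assumption)
    total_reward = 0
    for _ in range(steps):
        probs = softmax_probabilities(estimates, beta)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1 if rng.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the sample mean for the chosen machine.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return total_reward, pulls, estimates

if __name__ == "__main__":
    print(run_softmax_bandit([0.2, 0.8]))
```

With `beta = 0` every machine is chosen with equal probability (pure exploration); as `beta` grows, the choice concentrates on the machine with the highest estimate (pure exploitation).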

The BP was originally described by Robbins3, although the same problem in essence was also studied by Thompson4. However, the optimal strategy is known only for a limited class of problems in which the reward distributions are assumed to be ‘known’ to the players and an index called ‘the Gittins index’ is computed5,6. Furthermore, computing the Gittins index in practice is not tractable for many problems. Auer et al. proposed another index expressed as a simple function of the total reward obtained from a machine7. This upper confidence bound (UCB) algorithm is used worldwide for many applications, such as Monte-Carlo tree searches8,9, web content optimization10 and information and communications technology (ICT)11,12,13,14.
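As an illustration of such an index policy, the sketch below implements the standard UCB1 rule, in which each machine's index is its mean reward plus the exploration bonus sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the number of plays of machine j. The Bernoulli environment, the function names and the parameter values are assumptions for illustration; this is not code from any of the cited works.

```python
import math
import random

def ucb1_index(mean_reward, pulls, total_pulls):
    """UCB1 index: sample mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)

def run_ucb1(reward_probs, steps=1000, seed=0):
    """Play a Bernoulli bandit with UCB1 (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    pulls = [0] * n_arms
    means = [0.0] * n_arms
    total_reward = 0
    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # initial round: try every machine once
        else:
            # t - 1 plays have been made so far.
            arm = max(range(n_arms),
                      key=lambda j: ucb1_index(means[j], pulls[j], t - 1))
        reward = 1 if rng.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
        total_reward += reward
    return total_reward, pulls, means

if __name__ == "__main__":
    print(run_ucb1([0.1, 0.9], steps=2000))
```

The bonus term shrinks as a machine is played more often, so rarely tried machines are revisited occasionally while the machine with the best record is played most of the time.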

Kim et al. proposed a decision-making algorithm called the ‘tug-of-war model’ (TOW); it was inspired by the true slime mold Physarum15,16, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a ‘nonlocal correlation’ among the branches: a volume increment in one branch is immediately compensated for by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision-making. Thus, the TOW is a dynamical system that describes the spatiotemporal dynamics of a physical object (i.e. an amoeboid organism). The TOW connected ‘natural phenomena’ to ‘decision-making’ for the first time. This approach enables us to realise an ‘efficient decision maker’, that is, an object that can make decisions efficiently.
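The effect of the conservation law can be illustrated with a toy two-branch model. The sketch below is not the TOW equations of Kim et al.; the update rule, the `penalty` and `noise` parameters and all names are hypothetical. It only shows how a fixed total volume forces any gain in one branch to appear as an equal loss in the other, so that evidence about one machine immediately changes the standing of the other.

```python
import random

def run_conserved_two_branch(reward_probs, steps=1000, penalty=1.0,
                             noise=0.5, seed=0):
    """Toy two-machine decision maker with a conserved total volume.

    Branch A holds +displacement and branch B holds -displacement,
    so the two volumes always sum to zero (hypothetical toy rule).
    """
    rng = random.Random(seed)
    displacement = 0.0
    pulls = [0, 0]
    total_reward = 0
    for _ in range(steps):
        # A small fluctuation keeps the system exploring near the balance point.
        x_a = displacement + rng.uniform(-noise, noise)
        arm = 0 if x_a >= 0.0 else 1  # the more extended branch is played
        reward = 1 if rng.random() < reward_probs[arm] else 0
        delta = 1.0 if reward else -penalty
        # Volume gained by the chosen branch is taken from the other branch.
        displacement += delta if arm == 0 else -delta
        pulls[arm] += 1
        total_reward += reward
    return total_reward, pulls, (displacement, -displacement)

if __name__ == "__main__":
    print(run_conserved_two_branch([0.3, 0.7]))
```

Because a single shared quantity is updated, a loss on one machine counts at once as evidence in favour of the other, which is the sense in which the correlation is nonlocal in this toy picture.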

Here we demonstrate the physical implementation of the TOW with quantum dots (QDs) and optical near-field interactions by using numerical simulations. Semiconductor QDs have been used for innovative nanophotonic devices17,18 and optical near-field interactions have been successfully applied to solar cells19, LEDs20, diode lasers21 etc. We have already proposed QD systems for computing applications, such as the constraint satisfaction problem (CSP) and the satisfiability problem (SAT)22,23. We introduce a new application for decision-making by making use of optical energy transfer between QDs mediated by optical near-field interactions.

We use three types of cubic QDs with side lengths of a, √2a and 2a, which are respectively denoted QD_S, QD_M and QD_L [Fig. 1(a)]. We assume that five QDs are arranged one-dimensionally as ‘L-M-S-M-L’ or ‘M-L-S-L-M’, as shown in Fig. 1(b), where S, M and L are shorthand for QD_S, QD_M and QD_L, respectively. Owing to the steep electric fields in the vicinity of these QDs, an optical excitation can be transferred between QDs through resonant energy levels mediated by optical near-field interactions24,25,26,27,28. Note that an optical excitation is usually transferred from smaller QDs to larger ones owing to energy dissipation processes occurring at the larger QDs (details are described in the Supplementary Information). In addition, an optical near-field interaction follows a Yukawa-type potential, meaning that it can be engineered through the inter-dot distances,

$$U(r) = \frac{A \exp(-\mu r)}{r},$$

where r is the inter-dot distance and A and μ are constants29.
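Two quantitative points in this paragraph can be checked numerically. The sketch below evaluates a Yukawa-type interaction in its standard form, amplitude × exp(−decay × r) / r, and tests the level resonance under the idealised assumption that each cubic QD behaves as an infinite potential well, whose level energies scale as (n_x² + n_y² + n_z²) / L² for side length L. The function names, the default parameter values and the infinite-well model are assumptions for illustration, not taken from the paper.

```python
import math

def yukawa_interaction(distance, amplitude=1.0, decay=1.0):
    """Yukawa-type interaction strength: amplitude * exp(-decay * r) / r."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return amplitude * math.exp(-decay * distance) / distance

def box_level(quantum_numbers, side):
    """Level energy of an idealised cubic infinite well.

    The side is measured in units of a, and the energy is returned in
    units of h^2 / (8 m a^2).
    """
    return sum(n * n for n in quantum_numbers) / side ** 2

if __name__ == "__main__":
    # The interaction weakens rapidly as the dots are moved apart.
    for r in (0.5, 1.0, 2.0):
        print(r, yukawa_interaction(r))
    # Levels compared for dots of side a, sqrt(2) a and 2 a.
    print(box_level((1, 1, 1), 1.0))
    print(box_level((2, 1, 1), math.sqrt(2.0)))
    print(box_level((2, 2, 2), 2.0))
```

Under this idealisation, the (1, 1, 1) level of a dot of side a, the (2, 1, 1) level of a dot of side √2a and the (2, 2, 2) level of a dot of side 2a share the same energy, which is consistent with the resonant transfer path described in the text.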

Figure 1: (a) Energy transfer between quantum dots (QDs). Two cubic quantum dots, QD_S and QD_M, whose side lengths are a and √2a, respectively, are located close to each other. Optical excitations in QD_S can be transferred to the neighbouring QD_M via optical near-field interactions29, because there is a resonance between the level of quantum number (1, 1, 1) for QD_S (denoted S_1) and that of quantum number (2, 1, 1) for QD_M (M_2). (b) QD-based decision maker. The system consists of five QDs denoted QD_LL, QD_ML, QD_S, QD_MR and QD_LR. The energy levels in the system are summarised as follows. The (2, 1, 1)-levels of QD_ML, QD_MR, QD_LL and QD_LR are respectively denoted ML_2, MR_2, LL_2 and LR_2. The (1, 1, 1)-levels of QD_ML, QD_MR, QD_LL and QD_LR are respectively denoted ML_1, MR_1, LL_1 and LR_1. The (2, 2, 2)-levels of QD_LL and QD_LR are respectively denoted LL_3 and LR_3. The optical near-field interactions between the QDs are also indicated. (c) Schematic summary of the state transitions, showing the relaxation rates and the radiative decay rates.

When an optical excitation is generated at QD_S, it is transferred to the lowest energy levels in the QD_Ls; we observe negligible radiation from the QD_Ms. However, when the lowest energy levels of the QD_Ls are occupied by control light, which induces state-filling effects, the optical excitation at QD_S is more likely to be radiated from the QD_Ms30.

Here we regard photon radiation from the left QD_ML or the right QD_MR as the decision to select slot machine A or B, respectively. The intensities of the control light that induces state-filling at the left and right QD_Ls are modulated on the basis of the rewards obtained from the chosen slot machine. We call such a decision-making system the ‘QD-based decision maker (QDM)’. The QDM can be easily extended to N-armed (N > 2) cases, although we demonstrate only the two-armed case in this study.

It should be noted that the optical excitation transfer between QDs mediated by optical near-field interactions is fundamentally probabilistic; this is described below in detail on the basis of density matrix formalism. Until energy dissipation is induced, an optical excitation simultaneously interacts with potentially transferable destination QDs in the resonant energy level. We exploit such probabilistic behaviour for the function of exploration for decision-making.

It should also be emphasised that, conventionally, propagating light is assumed to interact with nanostructured matter in a spatially uniform manner (the well-known long-wavelength approximation), from which the state transition rules for optical transitions, including dipole-forbidden transitions, are derived. However, this approximation is not valid for optical near-field interactions in the subwavelength vicinity of an optical source; the inhomogeneity of the rapidly decaying optical near-fields allows even conventionally dipole-forbidden transitions17,22,23.