Microprocessor architecture

Figure 1a depicts the architectural block diagram of our microprocessor. For demonstration purposes, we minimized the transistor count and thus realized a device that operates on single-bit data only. We stress that this is not a fundamental limitation: the device is readily scalable to N-bit data, broadly speaking by connecting N of our devices in parallel. Although we reduced the architecture of our device to the essentials, it comprises all the basic building blocks that are common to most microprocessors. These are: an arithmetic logic unit (ALU), which forms the heart of the processor and is, in general, capable of performing basic arithmetic and logical operations (for simplicity, we have implemented here only logical conjunction and disjunction); an accumulator (AC), which holds one of the operands to be supplied to the ALU; an instruction register (IR), which stores the content of the program memory currently being executed, where the two most significant bits contain the instruction itself and the third bit contains the data; a control unit (CU), which receives the instruction code from the IR as input, orchestrates all resources by enabling components to access the internal bus via the control signals EA and EO, and conveys the operation selection code A/O to the ALU (conjunction, A/O=0; disjunction, A/O=1); a program counter (PC), which supplies the memory with the address of the active instruction; and, finally, an output register (OR), which allows the processor to transfer the results of a calculation to the output port. Although we retrieve the data directly from the program memory, our device can also process data stored in a separate data memory (Harvard architecture); in this case, the IR is supplied with an address that points to the data memory content, which is then placed on the bus. The memory is, as usual, implemented off-chip.

Figure 1: Microprocessor architecture. (a) Block diagram, showing the arithmetic logic unit (ALU) with inputs A and B, accumulator (AC), control unit (CU), instruction register (IR), output register (OR) and program counter (PC). Enable signals (EA and EO) and operation selection code (A/O) are supplied by the CU to the respective subunits. CLK signal generation and memory are implemented off-chip. (b) Timing diagram for the Nth instruction cycle. During the FETCH sequence the content of the memory is loaded into the IR and the address, stored in the PC, is increased. During the EXE sequence the command, stored in the IR, is executed. (c) Instruction set of the microprocessor. NOP is the no-operation instruction; LDA transfers data from the memory into the AC; AND and OR perform logical operations.

Figure 1b depicts the timing diagram of the device, which uses three clock (CLK) signals. Each instruction is executed in two sequences: a FETCH sequence followed by an execute (EXE) sequence. The FETCH sequence consists of two phases: in the first phase, the content of the external memory (at the address stored in the PC) is loaded into the IR; in the second phase, the PC is incremented. During the EXE sequence, which is implemented here in a single phase, the microprocessor decodes and executes the command stored in the IR. This cycle is repeated continuously. Each phase is triggered by its own CLK signal (CLK1, phase 1; CLK2, phase 2; CLK3, phase 3). To remain flexible in terms of clock rate and timing, we generated the CLK signals externally; an on-chip implementation is straightforward. Figure 1c summarizes the instruction set that we have implemented. The instructions are encoded with two bits; some of them are followed by one bit of data. The no-operation (NOP) instruction has no effect other than to increment the PC. LDA transfers data from the memory into the AC. AND and OR perform logical conjunction and disjunction operations, respectively.
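The instruction format described above can be sketched as a small encoder and decoder. This is an illustrative sketch, not the authors' tooling. The opcodes for LDA (01) and AND (10) follow from the bit sequences 010 and 101 in the worked example of the text, whereas NOP = 00 and OR = 11 are assumptions, as is padding NOP with a zero data bit. The function names `assemble` and `disassemble` are hypothetical.

```python
# Opcode table: LDA and AND follow from the worked example in the text
# (010 -> LDA 0, 101 -> AND 1); NOP = 00 and OR = 11 are ASSUMED.
OPCODES = {"NOP": 0b00, "LDA": 0b01, "AND": 0b10, "OR": 0b11}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(line: str) -> int:
    """Encode one line such as 'LDA 0' as a 3-bit word (opcode in the top two bits)."""
    parts = line.split()
    mnemonic = parts[0].upper()
    # NOP carries no data; the data bit is padded with 0 (assumption).
    data = int(parts[1]) if len(parts) > 1 else 0
    if mnemonic not in OPCODES or data not in (0, 1):
        raise ValueError(f"cannot assemble {line!r}")
    return (OPCODES[mnemonic] << 1) | data


def disassemble(word: int) -> tuple[str, int]:
    """Split a 3-bit word into (mnemonic, data bit)."""
    return MNEMONICS[(word >> 1) & 0b11], word & 1


if __name__ == "__main__":
    for text in ("LDA 0", "AND 1"):
        print(text, "->", format(assemble(text), "03b"))
```

With this table, `LDA 0` and `AND 1` encode to the two words that appear in the worked example of the text.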

It is instructive to consider a simple example. The program fragment

    LDA 0
    AND 1

first transfers, triggered by CLK1, the bit sequence 010 from the memory into the IR. CLK2 then increments the PC, so that the next instruction becomes available but is not yet loaded into the IR. Triggered by CLK3, the CU then signals the AC (EA=1) to receive the data (0) from the IR via the internal bus. With the next CLK1 signal, the content of the IR is updated (IR=101), and the CU enables the ALU to perform a logical conjunction operation (A/O=0) between the data on the bus (1) and the value stored in the AC during the previous instruction. Triggered by CLK3, the result of this operation (0) is finally written into the OR (EO=1).
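The sequence of events in this example can be replayed with a short behavioural model of the three-phase cycle. The sketch below is not the authors' implementation and rests on several assumptions: the opcodes NOP = 00 and OR = 11 (only LDA = 01 and AND = 10 follow from the bit sequences in the text), an ALU result that is written to the output register only and not back to the AC, and a program that simply stops at the end of memory rather than cycling continuously. The class name `OneBitCPU` and its fields are hypothetical.

```python
from dataclasses import dataclass, field

# LDA = 01 and AND = 10 follow from the example (010, 101);
# NOP = 00 and OR = 11 are ASSUMED.
NOP, LDA, AND, OR = 0b00, 0b01, 0b10, 0b11


@dataclass
class OneBitCPU:
    memory: list          # program memory: 3-bit words (2-bit opcode + 1 data bit)
    pc: int = 0           # program counter
    ir: int = 0           # instruction register
    ac: int = 0           # accumulator
    out: int = 0          # output register (called OR in the text)
    trace: list = field(default_factory=list)

    def control(self):
        """Control unit: map the opcode held in the IR to (EA, EO, A/O)."""
        opcode = self.ir >> 1
        ea = int(opcode == LDA)          # enable accumulator
        eo = int(opcode in (AND, OR))    # enable output register
        a_o = int(opcode == OR)          # 0 = conjunction, 1 = disjunction
        return ea, eo, a_o

    def cycle(self):
        self.ir = self.memory[self.pc]   # CLK1: FETCH phase 1, load the IR
        self.pc += 1                     # CLK2: FETCH phase 2, increment the PC
        ea, eo, a_o = self.control()     # CLK3: EXE, decode and execute
        bus = self.ir & 1                # data bit placed on the internal bus
        if ea:
            self.ac = bus
        if eo:
            # ASSUMPTION: the ALU result goes to the output register only.
            self.out = (self.ac | bus) if a_o else (self.ac & bus)
        self.trace.append((format(self.ir, "03b"), ea, eo, a_o, self.ac, self.out))

    def run(self):
        while self.pc < len(self.memory):
            self.cycle()
        return self.out


if __name__ == "__main__":
    cpu = OneBitCPU(memory=[0b010, 0b101])  # the two words from the example
    print("output register:", cpu.run())
    for row in cpu.trace:
        print(row)  # (IR, EA, EO, A/O, AC, OR)
```

Each trace row records the control signals after the EXE phase, which makes it easy to compare against the EA, EO and A/O values quoted in the text.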

Device implementation

We now come to the actual device implementation using a 2D semiconductor. Our microprocessor was fabricated in gate-first technology on a silicon wafer with 280-nm-thick silicon dioxide. The substrate fulfills no other function than acting as a carrier medium and could thus be replaced by glass³¹ or any other material, including flexible substrates¹⁴,¹⁵,¹⁶. We fabricated 18 devices per wafer, with FET channels made from large-area bilayer MoS₂ films grown by chemical vapour deposition (CVD). Two Ti/Au metal layers were used to interconnect the transistors and Al₂O₃ was used as gate oxide. A detailed description of the device fabrication steps can be found in Methods. Subunits such as the ALU or the IR were provided with metal pads for individual testing in a wafer probe station. All subunits were eventually bonded together and the sample was placed back into the probe chamber, where it remained in vacuum for final testing of the complete circuit.

Figure 2a (bottom) shows a schematic drawing of a MoS₂ FET obtained in this way. The devices exhibit a field-effect mobility of ∼3 cm² V⁻¹ s⁻¹, a threshold voltage V_T of ∼0.65 V (Supplementary Fig. 3), an on/off ratio of ∼10⁸, and uniform behaviour over a ∼50 mm² area of the wafer (Supplementary Fig. 4). The circuit is based on the NMOS logic family, where both pull-up (load) and pull-down networks were realized using n-type enhancement-mode FETs. The implementation of an inverter (see circuit schematic in Fig. 2d) using this logic family is shown in Fig. 2a (top). A careful design of the W/L ratios, where W and L denote the width and length of the FET channels, is crucial, as it determines the switching threshold voltage V_M and thus the ability to cascade logic stages. For simple analytic modelling, we performed calculations based on long-channel FET theory³². The pull-down FET is described by $I_{D2} = K_2\left[(V_{IN}-V_T)V_{OUT} - V_{OUT}^2/2\right]$ in the triode regime and $I_{D2} = (K_2/2)(V_{IN}-V_T)^2$ in the saturation regime (red curves in Fig. 2e). The load FET is operated in the sub-threshold regime (V_G1 = 0 < V_T), and thus acts as a current source, $I_{D1} = K_1\left[1-\exp(-\beta V_{DS1})\right]$, over a large drain voltage range, with β being the reciprocal of the thermal potential. From the circuit schematic in Fig. 2d, it is apparent that $V_{DS1} = V_{DD} - V_{OUT}$, and thus $I_{D1} = K_1\left[1-\exp(-\beta(V_{DD}-V_{OUT}))\right]$ (blue symbols in Fig. 2e). The parameters K_1 and K_2 are taken from the experiment (Fig. 2b). By equating both currents, $I_{D1} = I_{D2}$, we obtain a relation between V_OUT and V_IN, from which the switching threshold V_M can be determined (Supplementary Fig. 6). If both transistors are implemented with the same W/L ratio, V_M drops below 1 V (Supplementary Fig. 6b), resulting in a low noise margin, especially in the presence of additional hysteresis. Asymmetric transistor design, on the other hand, allows V_M to be shifted towards V_DD/2 (Supplementary Fig. 6a), resulting in improved switching behaviour. The W/L ratios of the pull-up and pull-down transistors were hence made 45/2 (μm/μm) and 7/5, respectively.
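The graphical construction described above, equating the load and pull-down currents to find V_OUT for a given V_IN, can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' calculation: it assumes the standard long-channel square-law expressions for the pull-down FET, a load current of the form K1[1 − exp(−β(V_DD − V_OUT))], a thermal potential of 25.9 mV (about 300 K), and it neglects sub-threshold conduction in the pull-down FET. K1 is set to the load current of about 0.55 μA quoted in the text; K2 is a purely hypothetical value chosen for illustration.

```python
import math

V_DD = 5.0           # supply voltage (V), as used for Fig. 2f
V_T = 0.65           # threshold voltage (V), from the text
BETA = 1 / 0.0259    # 1 / thermal potential (1/V), ASSUMING about 300 K
K1 = 0.55e-6         # load current scale (A); the text quotes I_D of about 0.55 uA
K2 = 0.4e-6          # pull-down parameter (A/V^2); HYPOTHETICAL, illustration only


def i_load(v_out):
    """Sub-threshold load acting as a current source (assumed functional form)."""
    return K1 * (1.0 - math.exp(-BETA * (V_DD - v_out)))


def i_pull_down(v_in, v_out):
    """Long-channel square-law model; sub-threshold conduction is neglected."""
    v_ov = v_in - V_T
    if v_ov <= 0.0:
        return 0.0
    if v_out < v_ov:                      # triode regime
        return K2 * (v_ov * v_out - v_out**2 / 2.0)
    return 0.5 * K2 * v_ov**2             # saturation regime


def bisect(f, lo, hi, steps=60):
    """Root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def v_out(v_in):
    """Output voltage at which load and pull-down currents are equal."""
    return bisect(lambda v: i_load(v) - i_pull_down(v_in, v), 0.0, V_DD)


def switching_threshold():
    """V_M is the input voltage at which V_OUT equals V_IN."""
    return bisect(lambda v: v_out(v) - v, 0.0, V_DD)


if __name__ == "__main__":
    for v_in in (0.0, 1.0, 2.0, 3.0, 5.0):
        print(f"V_IN = {v_in:.1f} V -> V_OUT = {v_out(v_in):.3f} V")
    print(f"V_M = {switching_threshold():.3f} V")
```

In this simplified model the switching threshold sits where the pull-down saturation current equals K1, so V_M ≈ V_T + sqrt(2·K1/K2); raising K1 relative to K2 (a wider load, a narrower pull-down) moves V_M upwards, which is the qualitative effect of the asymmetric design described in the text.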

Figure 2: Characterization of MoS₂ transistors and inverter. (a) Schematic drawing of an inverter circuit (top) and an individual MoS₂ transistor (bottom) in gate-first technology (see Supplementary Fig. 5 for corresponding micrograph). (b) Transfer characteristics of load (W/L=45/2) and pull-down (W/L=7/5) transistors. (c) Output characteristic for gate voltages between 1 and 5 V (in 1 V steps). (d) NMOS inverter circuit schematic. (e) Graphical construction to determine the output voltage V_OUT of an inverter for a given input voltage V_IN. The blue symbols show the load curve and the red lines are the output characteristics of the pull-down transistor (in 0.25 V steps). The intersection point of both curves determines V_OUT. (f) The solid line shows the measured voltage transfer characteristic of an inverter. By mirroring this curve (dashed line) a butterfly plot is obtained, from which the noise margin NM can be extracted by nesting the largest possible square in the grey shaded area.

Logic NAND gates with M inputs were implemented by connecting M pull-down transistors with W/L=(M × 7)/5 in series. The processor was realized by using a combination of these elements. The minimum feature size of 2 μm was chosen to be rather large for two reasons: it makes the design immune to sample inhomogeneities (for example, small holes, cracks and contamination in the MoS₂ film), and it allows for fast visual inspection of the lithographic structures with an optical microscope. Because of the immunity of 2D transistors to short-channel effects⁷,⁸,⁹,¹⁰, we expect comparable performance when the devices are scaled to sub-micrometre dimensions, provided that low contact resistance can be achieved.
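The W/L scaling rule for the NAND pull-down stack can be checked with a short calculation. Under the usual long-channel approximation (an assumption made here, not a statement from the source), M identical transistors in series behave like a single transistor whose channel is M times longer, so widening each device by a factor of M restores the W/L = 7/5 of the inverter pull-down transistor. The helper names `series_w_over_l` and `nand_pull_down_ratio` are hypothetical.

```python
from fractions import Fraction


def series_w_over_l(ratios):
    """Equivalent W/L of FETs in series: 1 / sum(L_i / W_i) (long-channel approximation)."""
    return 1 / sum(1 / Fraction(r) for r in ratios)


def nand_pull_down_ratio(m):
    """W/L of each of the M series pull-down FETs, as given in the text: (M x 7)/5."""
    return Fraction(m * 7, 5)


if __name__ == "__main__":
    for m in (2, 3, 4):
        stack = [nand_pull_down_ratio(m)] * m
        print(f"{m} inputs: per-FET W/L = {stack[0]}, stack W/L = {series_w_over_l(stack)}")
```

Exact fractions are used so that the comparison with 7/5 is not affected by floating-point rounding.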

Figure 2b shows the transfer characteristics of load and pull-down transistors, where the ∼14 times higher current through the former demonstrates reliable controllability of the device characteristics by geometrical scaling. The output characteristic, depicted in Fig. 2c, shows clear current saturation due to channel pinch-off at the drain. The voltage transfer characteristics of our inverters exhibit excellent performance over a wide supply voltage range between V_DD = 2 and 7 V, with input and output logic levels being perfectly matched. Figure 2f (solid line) shows the results for V_DD = 5 V, for which the voltage gain reaches values of A_V ≈ 60. Although the voltage transfer curve shows some hysteresis (which mostly stems from trap charges in the gate oxide), the noise margin of the inverter (see shaded area in Fig. 2f), NM ≈ 0.59 × (V_DD/2), is sufficiently large for integration into multi-stage logic circuits. The NAND gates showed comparable performance. We estimate a static power consumption of $P = V_{DD}(I_{D,L}+I_{D,H})/2 \approx 1.4$ μW per logic gate, where I_D,L and I_D,H denote the currents at V_IN = 0 and 5 V (Fig. 2e), respectively. The total power consumption of the circuit, consisting of 41 stages, is thus ∼60 μW.
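The power and noise-margin figures quoted above can be reproduced with a few lines of arithmetic. This sketch assumes that the static power of a gate is the supply voltage times the mean of the currents in the two logic states, that the current at V_IN = 0 is negligible, and that the current at V_IN = 5 V equals the load current of about 0.55 μA quoted elsewhere in the text. These are assumptions made for illustration; the variable and function names are hypothetical.

```python
V_DD = 5.0        # supply voltage (V), from the text
I_D_H = 0.55e-6   # current at V_IN = 5 V (A); taken from the quoted load current
I_D_L = 0.0       # current at V_IN = 0 V (A); ASSUMED negligible
N_STAGES = 41     # number of stages in the circuit, from the text


def static_power(v_dd, i_low, i_high):
    """Mean static power of one gate, assuming both logic states are equally likely."""
    return v_dd * (i_low + i_high) / 2


p_gate = static_power(V_DD, I_D_L, I_D_H)   # power per logic gate (W)
p_total = N_STAGES * p_gate                 # power of the whole circuit (W)
noise_margin = 0.59 * V_DD / 2              # NM from the text, in volts

if __name__ == "__main__":
    print(f"per gate: {p_gate * 1e6:.2f} uW")
    print(f"total:    {p_total * 1e6:.1f} uW")
    print(f"NM:       {noise_margin:.2f} V")
```

Under these assumptions the per-gate value rounds to the ≈1.4 μW given in the text, and the 41-stage total is consistent with the quoted ∼60 μW to one significant figure.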

A microscope image of the microprocessor is shown in Fig. 3a. The device is composed of 115 MoS₂ transistors and measures 0.6 mm² without bonding pads. Circuit schematics for a D-Latch and the ALU are shown in Fig. 3b,c, respectively. The complete schematic is presented in Supplementary Fig. 1. A D-Latch is a bi-stable circuit that can be used as a 1-bit data storage element, triggered by a CLK signal. It forms the basic building block of all our data registers (IR, AC and OR) and the PC. The ALU is a combinational logic circuit, entirely based on NANDs, that performs bitwise logic operations on 1-bit data. The additional input A/O tells the ALU which operation to perform. Measurements of the ALU output for different input logic states are presented in Supplementary Fig. 8.
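To illustrate how a D-Latch and a 1-bit ALU can be built from NAND gates alone, the sketch below gives one possible gate-level realization. It is not the schematic of Fig. 3b,c, which is not reproduced here; the gate arrangement (a textbook gated D-latch and a NAND-based 2:1 multiplexer selecting between AND and OR) and all names are assumptions chosen for illustration.

```python
def nand(*inputs):
    """M-input NAND on 0/1 values."""
    return 0 if all(inputs) else 1


def alu(a, b, a_o):
    """1-bit ALU from NANDs only: A AND B for a_o = 0, A OR B for a_o = 1."""
    and_ab = nand(nand(a, b), nand(a, b))   # inverting a NAND gives AND
    or_ab = nand(nand(a, a), nand(b, b))    # De Morgan: NAND(NOT A, NOT B) = A OR B
    not_sel = nand(a_o, a_o)                # NAND wired as an inverter
    # 2:1 multiplexer built from three NANDs.
    return nand(nand(and_ab, not_sel), nand(or_ab, a_o))


class DLatch:
    """Gated D-latch: four NANDs plus one NAND wired as an inverter."""

    def __init__(self):
        self.q, self.q_bar = 0, 1

    def step(self, d, clk):
        s = nand(d, clk)
        r = nand(nand(d, d), clk)
        # Iterate the cross-coupled NAND pair until it settles.
        for _ in range(4):
            self.q = nand(s, self.q_bar)
            self.q_bar = nand(r, self.q)
        return self.q


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b}  AND={alu(a, b, 0)}  OR={alu(a, b, 1)}")
    latch = DLatch()
    print("store 1:", latch.step(1, 1), "| hold with CLK=0:", latch.step(0, 0))
```

The latch is transparent while CLK = 1 and holds its value while CLK = 0, which is the behaviour needed for the registers and the PC described in the text.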

Figure 3: Device implementation using a 2D semiconductor. (a) Microscope image of the microprocessor. The two metal layers appear in different colour and are connected with via-holes. All subunits were provided with metal pads for individual testing. Labelled pads were used to connect the device to the periphery (memory, CLK signal generation, power supply, output), the others were wire bonded together to realize the internal connections. Scale bar, 50 μm. Circuit schematics of (b) D-Latch and (c) ALU, with W/L ratio in units of μm/μm for each transistor. IN, input; OUT, output. The complete microprocessor schematic is presented in Supplementary Fig. 1.

We first verified the functionality of the microprocessor by running the example program from above and measuring waveforms at different locations on the chip (see Methods for measurement details). As shown in Fig. 4a, the device is indeed able to deliver the correct result, with excellent signal integrity and with rail-to-rail performance, proving the ability to cascade logic stages based on 2D semiconductors. To further demonstrate the operability of the device, we present in Fig. 4b the results from a series of logical disjunction operations. The match between measured and expected outputs again shows correct operation. As shown in Supplementary Fig. 10, the device proved to be functional at CLK frequencies of 50 Hz. This is by no means a limitation of the TMD material itself, but is caused by the limitations of our measurement setup. Ultimately, the speed is limited by the current-driving capability of the pull-up transistor, which is operated in the sub-threshold regime (V_GS = 0 < V_T) and acts as a current source with I_D ≈ 0.55 μA. For a typical (external) capacitive load of C_L ≈ 1–10 pF, we estimate a maximum operation frequency of ≈2–20 kHz (Supplementary Fig. 11). To increase f_MAX, I_D could be increased by employing depletion-mode load FETs²⁰, by controlled chemical doping, by improving the carrier mobility of the 2D semiconductor or simply by reducing the transistor channel lengths.