

Available online at www.sciencedirect.com



Surface Science 532-535 (2003) 1193-1198



www.elsevier.com/locate/susc

## Clocked quantum-dot cellular automata shift register

Alexei O. Orlov \*, Ravi Kummamuru, R. Ramasubramaniam, Craig S. Lent, Gary H. Bernstein, Gregory L. Snider

Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA

## Abstract

The quantum-dot cellular automata (QCA) computational paradigm provides a means to reach the ultimate low limits of power dissipation by replacing binary encoding in currents and voltages with single-electron switching within arrays of quantum dots ("cells"). Clocked control over the cells allows power gain, memory and pipelining to be realized in QCA circuits. We present an experimental demonstration of a clocked two-stage QCA shift register (SR) and use it to mimic the operation of a multi-stage SR. Bit-error rates for binary switching operations in a metal tunnel junction device are investigated experimentally and discussed in the context of future molecular QCAs. © 2003 Elsevier Science B.V. All rights reserved.

Keywords: Electrical transport measurements; Quantum effects; Aluminum oxide; Metallic films

\*Corresponding author. Tel.: +1-574-631-9143; fax: +1-574-631-4393.

E-mail address: orlov.1@nd.edu (A.O. Orlov).

The exponential growth in the functional complexity of digital integrated circuits (ICs) over the last three decades has been based on the successful downscaling of the "work-horse" of modern electronics, the field-effect transistor (FET). As FETs become smaller, however, effects such as sub-threshold and gate leakage, together with increasing device density, will eventually lead to intolerable levels of power dissipation. For instance, the power dissipation per unit area of a Pentium® 4 chip is comparable to that of a home electric range-top unit, and the situation is getting worse as FETs continue to shrink. The main reason is that, despite vast improvements in IC fabrication technology, FETs still essentially act as current switches, much like the electro-mechanical relays used since the 1930s. Therefore, to achieve minimal power dissipation, it becomes necessary to find ways to encode and process binary information that are not based on current-switch paradigms.

The search for ways to perform binary operations with minimal power dissipation has a long history. Keyes and Landauer [1] suggested using a physical system that is taken continuously from a monostable into a bistable state and back to monostability (MBM) in a cyclic fashion (Fig. 1). Here, a particle in a time-varying double-well potential encodes binary "0" or binary "1" by its position in one of the two wells. In the initial phase (Restore), the system is in the monostable state, where it is not affected by inputs. During the Switch phase the system is transformed from the monostable to a bistable state, and the particular binary state is chosen by the external input, so that the binary information becomes stored in the system. During the Hold phase the information is preserved, and the particle in the well acts as an input for the subsequent stage (e.g., by means of Coulomb interaction). Finally, the system returns to its initial Restore state.

Fig. 1. Three states of the system [1]: (1) Switch, (2) Hold, and (3) Restore.
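The Restore/Switch/Hold cycle can be illustrated with a toy overdamped particle in the hypothetical quartic potential $U(x) = x^4/4 - c\,x^2/2 - f\,x$. Here the control parameter $c$ plays the role of the clock (monostable for $c < 0$, bistable for $c > 0$) and $f$ is the input bias. This is a minimal sketch under those assumptions, not the model of [1]; the potential, parameter values and function names are illustrative only.

```python
def force(x, c, f):
    """-dU/dx for the toy potential U(x) = x**4/4 - c*x**2/2 - f*x."""
    return -(x**3 - c * x - f)


def relax(x, c, f, steps=200, dt=0.01):
    """Overdamped (gradient-descent) motion of the particle at fixed c and f."""
    for _ in range(steps):
        x += dt * force(x, c, f)
    return x


def ramp(x, c_start, c_end, f, n=50):
    """Change the barrier parameter slowly (quasi-adiabatically) from c_start to c_end."""
    for k in range(n + 1):
        x = relax(x, c_start + (c_end - c_start) * k / n, f)
    return x


def mbm_cycle(bit, f_in=0.05):
    """One Restore -> Switch -> Hold -> Restore cycle; returns (x_hold, x_final)."""
    f = f_in if bit == 1 else -f_in
    x = relax(0.0, -1.0, 0.0, steps=2000)    # Restore: single well at x = 0
    x = ramp(x, -1.0, 1.0, f)                # Switch: input picks a well while the barrier rises
    x_hold = relax(x, 1.0, 0.0, steps=2000)  # Hold: input removed, barrier keeps the bit
    x_final = ramp(x_hold, 1.0, -1.0, 0.0)   # Restore: barrier lowered again
    x_final = relax(x_final, -1.0, 0.0, steps=2000)
    return x_hold, x_final


for bit in (0, 1):
    x_hold, x_final = mbm_cycle(bit)
    print(bit, round(x_hold, 3), round(x_final, 3))
```

During Hold the sign of `x_hold` records the bit even though the input has been removed. After Restore the particle is back at $x \approx 0$, ready for the next input.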

By changing the barrier slowly compared to the characteristic settling time of the system (quasi-adiabatically), the energy dissipated in the process can be reduced below the thermodynamic limit $kT \ln 2$. (For comparison, the energy dissipated in contemporary FET-based logic is about $10^6 \times kT$ per bit operation [2].) Applying these ideas to single electronics led to proposals for binary logic devices in which information is encoded in the positions of single electrons within coupled "cells" [3,4]. Lent et al. [3] developed a complete set of binary transformations that implement basic logic functions using geometrically arranged cells, each composed of four quantum dots at the corners of a square and charged with two extra electrons. This paradigm is known as quantum-dot cellular automata (QCA). Cyclic control over the barriers (clocking) in QCA [5] offers a way to realize the scheme of [1] in various physical systems that exhibit Coulomb blockade. Clocking also provides the means for information storage (Hold), so that a cell remains polarized even in the absence of the input signal. This makes pipelining possible: a large QCA array can be split into smaller sub-arrays, powered by separate clock lines and working on different parts of a computational problem.
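To put the two energies quoted above on a common scale, the sketch below converts $kT \ln 2$ and $10^6 \times kT$ into joules and electron-volts. The temperature of 300 K is an assumption (room temperature), because the text does not state one.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
EV = 1.602176634e-19    # 1 eV in joules (exact SI value)


def bit_energies(T=300.0):
    """Return (kT ln 2, 1e6 * kT) in joules at temperature T (kelvin); T = 300 K is assumed."""
    kT = K_B * T
    return kT * math.log(2), 1e6 * kT


landauer, fet = bit_energies()
print(f"kT ln 2 = {landauer:.3e} J = {landauer / EV * 1e3:.1f} meV")
print(f"10^6 kT = {fet:.3e} J = {fet / EV / 1e3:.1f} keV")
print(f"ratio   = {fet / landauer:.2e}")
```

The ratio is simply $10^6/\ln 2 \approx 1.4 \times 10^6$ and does not depend on the assumed temperature.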

Theoretical calculations [6] and experiments [7] have recently demonstrated the power gain in clocked QCA that is needed for logic-level restoration. To achieve the lowest levels of power dissipation, QCA arrays can be switched quasi-adiabatically, using clock signals roughly 10 times slower than the tunneling time [5]. For future molecular implementations of QCA, theoretical calculations [5] show picosecond switching times for quasi-adiabatic schemes operating at room temperature. Although minimal-size QCA arrays could ultimately be made from molecules, molecular QCA technology is not yet available. However, a working prototype can be built using any system that exhibits Coulomb blockade in electron transport between the dots forming the cells. The implementation of the MBM scheme for a classical metal system with tunnel junction (TJ) barriers was suggested in [5] and later refined in [8]. Toth and Lent applied the clocking scheme to a metal QCA system [9], and a basic unit, the QCA latch, was demonstrated experimentally in [10]. Here we present the results of experimental studies of a two-stage QCA shift register (SR) and discuss its inherent error statistics.

To fabricate a metal-dot QCA SR we use aluminum TJ technology, which combines electron beam lithography (EBL) with a suspended-mask technique and in situ oxidation [11]. This method produces thin-film aluminum dots separated by TJs. The advantages of this technology are its simplicity (only three processing steps: direct EBL writing, development, and metal deposition with oxidation), good uniformity (TJ resistance within one run typically varies by only 20–30%), and high yield (>90%) for a large number of junctions (>50) produced simultaneously. The charging energy of the aluminum "dots", $E_{\rm C} \sim 1 \text{ meV}$, limits the operating temperature to about 1 K. To satisfy the condition $E_{\rm C} \gg kT$, the experiments are performed in a dilution refrigerator at temperatures of 50–200 mK. Standard lock-in techniques are used to measure the conductance of single-electron transistors (SETs) acting as electrometers. To suppress the superconductivity of aluminum, the experiments are performed in a magnetic field of 1 T.
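The sketch below is a quick check of the condition $E_{\rm C} \gg kT$ across the quoted temperature range. It takes $E_{\rm C} = 1$ meV exactly; the text gives only $\sim 1$ meV.

```python
K_B_EV = 8.617333262e-5   # Boltzmann constant, eV/K
E_C = 1e-3                # charging energy in eV (assumed exactly 1 meV)


def ec_over_kt(T, e_c=E_C):
    """Ratio E_C / kT at temperature T (kelvin)."""
    return e_c / (K_B_EV * T)


for T in (0.05, 0.1, 0.2, 1.0):
    print(f"T = {T * 1e3:5.0f} mK   kT = {K_B_EV * T * 1e6:5.1f} ueV   E_C/kT = {ec_over_kt(T):6.1f}")
```

At 50–200 mK the ratio is roughly 60–230. At 1 K it falls to about 12, which is consistent with the statement that the charging energy limits operation to about 1 K.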

An SEM micrograph of the SR is shown in Fig. 2. The device consists of two QCA latches [10], L1 (made of dots D1–D3) and L2 (dots D4–D6), and two readout electrometers, E1 and E2. The two latches are coupled to each other by lateral capacitors, $C_{\rm C}$. Each latch consists of three micron-size Al "dots" separated by multiple tunnel junctions (MTJs); the area of each junction is about $50 \times 50$ nm². The MTJ design is used to suppress second-order tunneling processes [12], which can cause loss of information during the hold time. SET electrometers E1 and E2 measure the states of L1 and L2, respectively.

Fig. 2. SEM micrograph of the QCA SR.

The operation of the QCA SR of Fig. 2 is presented in Fig. 3. At the starting point, $t_0$, both L1 and L2 are set to the monostable, or "null", state. First, latch L1 is activated, i.e., switched from the null to a bistable state, while L2 is kept in the null state. To activate L1, a small differential signal $V_{\rm IN}$ corresponding to logical "0" (logical "1") is first applied to the inputs at $t_1$ ($t_7$) (Fig. 3). L1 remains in the null state until CLK1 is set HIGH at $t_2$ ($t_8$) (note that clock HIGH is actually a negative voltage). When CLK1 is set HIGH, L1 becomes active, and an electron is transferred to D3 (D1). The Coulomb barrier separating the end dots is now high, so the electron is locked in D3 (D1). Once L1 is locked, the input signal is removed at $t_3$ ($t_9$), and the state of L1 no longer depends on the input signal for as long as CLK1 remains HIGH. The electric dipole moment created by locking an electron in D3 acts as the input signal for L2, biasing D6 negatively relative to D4. Next, L2 is activated (CLK2 is set HIGH) at $t_4$ ($t_{10}$). As a result, an electron in L2 switches from D5 to D4. L2 holds the bit after CLK1 is removed at $t_5$ ($t_{11}$), for as long as CLK2 remains HIGH. When CLK2 is removed at $t_6$ ($t_{12}$), the SR returns to its initial null state and is ready to receive new binary input.

Fig. 3. Operation of the two-stage SR. Phase-shifted clock signals are applied to two capacitively coupled latches to propagate binary information from one latch to the next in a sequential manner controlled by the clock. Five successive traces are overlaid to demonstrate the level of repeatability.

To summarize, the operating cycle of a QCA SR is as follows: both L1 and L2 are in the null state $\rightarrow$ input applied to L1 $\rightarrow$ CLK1 is applied and L1 is active $\rightarrow$ input removed, L1 stores the bit $\rightarrow$ CLK2 is applied and L2 is active $\rightarrow$ CLK1 is removed, L2 stores the bit $\rightarrow$ CLK2 is removed, both L1 and L2 are back in the null state. The QCA SR thus operates exactly as predicted in [8,9]. An interesting feature of this design is that either latch can serve as the input to the other, which provides a means for bidirectional computation.
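The cycle above can be captured by an idealized, error-free state machine. In this sketch, a latch copies its driver when its clock goes HIGH, ignores later changes of that driver, and returns to null when its clock goes LOW. The event names, the `run` function, and this copy rule are assumptions made for illustration; they are not taken from [8,9].

```python
NULL = None  # null (monostable) latch state


def run(events):
    """Apply (signal, value) events to an idealized two-latch SR.

    signal is "input", "CLK1" or "CLK2"; returns (signal, value, L1, L2) after each event.
    """
    inp = NULL
    latch = {"L1": NULL, "L2": NULL}
    trace = []
    for signal, value in events:
        if signal == "input":
            inp = value                      # external input drives L1 only
        else:
            name = "L1" if signal == "CLK1" else "L2"
            if value == "HIGH":
                # Activated latch copies its driver: L1 uses the external input if present,
                # otherwise each latch is driven by its neighbour.
                if name == "L1":
                    driver = inp if inp is not NULL else latch["L2"]
                else:
                    driver = latch["L1"]
                latch[name] = driver
            else:
                latch[name] = NULL           # clock LOW: back to the null state
        trace.append((signal, value, latch["L1"], latch["L2"]))
    return trace


# t1..t6 of Fig. 3 for logical "0"
cycle = [("input", 0), ("CLK1", "HIGH"), ("input", NULL),
         ("CLK2", "HIGH"), ("CLK1", "LOW"), ("CLK2", "LOW")]
for step in run(cycle):
    print(step)
```

Because L1 falls back to L2 as its driver when no external input is present, the same model also covers the bidirectional use of the two latches.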

One of the crucial parameters for any logic device is the switching speed for binary operations. The operational speed of the QCA latch is determined primarily by the tunneling time of the electron, $\tau \approx R_{\rm J}C_{\rm J} \approx 10^{-10}$ s, where $R_{\rm J} \approx 3 \times 10^5\ \Omega$ and $C_{\rm J} \approx 3 \times 10^{-16}$ F are the resistance and the capacitance of the junction, respectively. For quasi-adiabatic operation this gives a switching "speed limit" of the order of 1 ns for this Al/AlO$_x$ prototype. Because of their much lower total capacitance ($C \sim 10^{-19}$ F), future molecular QCAs are expected to switch on picosecond time scales. Note that the clock speed in the present experiment is limited not by the switching speed of the latch but by parasitic RC time constants in the electrometer circuits. Since the temporal resolution of the SET readout is about 0.2 ms, events occurring on faster time scales cannot be resolved by the detector. To solve this problem, a radio-frequency SET electrometer [13] will be used in future experiments.
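The timing figures in this paragraph follow from simple arithmetic. The sketch assumes, as stated earlier in the paper, that quasi-adiabatic clocking is about 10 times slower than the tunneling time.

```python
R_J = 3e5           # junction resistance, ohm
C_J = 3e-16         # junction capacitance, F
ADIABATIC = 10      # clock period / tunneling time (approximate factor quoted in the text)
T_READOUT = 0.2e-3  # SET readout time resolution, s

tau = R_J * C_J                 # tunneling (RC) time of a junction
t_clock = ADIABATIC * tau       # quasi-adiabatic switching time
print(f"tau = {tau:.1e} s, quasi-adiabatic switching ~ {t_clock:.1e} s")
print(f"readout is ~{T_READOUT / t_clock:.0e} times slower than the latch")
```

The readout, not the latch, sets the experimental clock rate by roughly five orders of magnitude.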

We now show that the two-stage SR can be used to replicate the propagation of a single bit through a multi-stage SR. In a multi-stage SR, a bit is first written into the circuit by the input and then propagated using each latch as the input to the next stage. The same situation can be simulated with the two-stage SR by moving the bit back and forth between the two latches. Initially, a bit is written into the first latch by the input. Using L1 as the input, the bit is moved into L2, after which L1 is turned off. Then, using L2 as the input, the bit is copied back into L1, and L2 is turned off. This process can be repeated a number of times to achieve the same effect as transferring a bit through a long line of latches.

Fig. 4 shows the timing diagram of the experiment. The switching sequence is as follows. (a) All signals are zero and the two latches are in the null state. (b) The input (binary "0") is applied, and (c) CLK1 is set HIGH; L1 switches, storing the bit. (d) The input is removed, so the subsequent behavior of the system is no longer influenced by external inputs. (e) CLK2 is set HIGH, and L2 switches using L1 as its input. The bit is now stored in L2. (f) L1 is set to null (CLK1 set LOW). (g) Instead of applying CLK1 to a third latch in the line, it is applied to L1, which sees L2 as an input and switches accordingly. The bit is stored once again in L1. (h) CLK2 is set LOW and L2 switches to null. To simulate a line of 11 latches, 11 clock pulses (6 CLK1 and 5 CLK2) are used. This sequence is then repeated with an input of the opposite polarity (binary "1") to verify that inverting the input bit inverts the output bit.

Fig. 4. Experiment simulating a multiple-stage SR. An input is applied only at the beginning to write a bit. Then, the bit is transferred from one latch to the other. In the second half of the experiment, the sequence of events is repeated with an input of the reverse polarity.
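The back-and-forth protocol generalizes to any number of emulated stages. The sketch below generates the event order of steps (b)–(h) for $n$ stages and counts the clock pulses; for $n = 11$ it reproduces the 6 CLK1 and 5 CLK2 pulses quoted above. `shift_schedule` is a hypothetical helper written for this illustration, not software used in the experiment.

```python
from collections import Counter


def shift_schedule(n_stages):
    """Event list that emulates an n-stage SR with two latches (L1 on CLK1, L2 on CLK2)."""
    clocks = ("CLK1", "CLK2")
    events = [("input", "on"), ("CLK1", "HIGH"), ("input", "off")]  # steps (b)-(d)
    for k in range(1, n_stages):
        nxt, prev = clocks[k % 2], clocks[(k - 1) % 2]
        events.append((nxt, "HIGH"))   # next "stage" copies the held bit, as in (e) and (g)
        events.append((prev, "LOW"))   # previous stage returns to null, as in (f) and (h)
    events.append((clocks[(n_stages - 1) % 2], "LOW"))  # final reset to all-null
    return events


def pulse_counts(events):
    """Number of HIGH pulses on each clock line."""
    return Counter(sig for sig, val in events if val == "HIGH")


print(pulse_counts(shift_schedule(11)))
```

For $n$ stages the schedule uses $\lceil n/2 \rceil$ CLK1 pulses and $\lfloor n/2 \rfloor$ CLK2 pulses. For $n = 2$ it reduces to the basic two-stage cycle.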

Because QCA can exhibit power gain [6,7], no degradation in the voltage levels is observed as the bit is moved back and forth between the latches. The energy needed for signal restoration is provided by the clock signals, just as conventional power supplies provide it in FET logic. Bit propagation is not always error-free, however: Fig. 4 shows a sequence with zero errors, but this is not always the case. It is therefore critical to examine the sources of errors in QCA SR operation and their relative probabilities of occurrence.

Several physical mechanisms can cause errors in the operation of a single-electron latch [8]: (a) static errors, corresponding to the loss of information during the retention time; (b) switching errors, in which the system switches into the energetically unfavorable state because of thermal fluctuations or coupling to external noise sources; and (c) dynamic errors, which occur when switching is too fast compared to the tunneling rate, so that the system cannot relax to its final state. Analysis of the experimental data shows that the performance of our SR device is limited by switching errors, whose probability is

$$p_{\rm sw} = 0.5 \exp(-\Delta/kT),\tag{1}$$

where $\Delta \sim 0.2 E_{\rm C}$ is the energy difference between the electron states in the end dots caused by the input voltage [8]. The experimental data in Fig. 5 (triangles) show that switching errors in the input latch decrease exponentially with increasing external input signal, $V_{\rm IN}$. However, the signal that a QCA latch supplies as the input to the next latch in a line is fixed, because single-electron switching is discrete; the interaction between the two latches is set by the latch-to-latch coupling. Therefore, the latch-to-latch switching error remains constant, independent of the input.

Fig. 5. Experimental switching error in the QCA latch and SR as a function of external input signal at T = 70 mK. Triangles show the switching error of the input latch, to which the external signal is applied; solid dots show the switching error measured in the output latch driven by the input latch. The solid line shows calculations (Eq. (1)) for T = 100 mK. Error bars correspond to $(p_{\rm sw})^{1/2}$.
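Eq. (1) can be evaluated and inverted directly. The sketch below gives $p_{\rm sw}$ for a given $\Delta/kT$ and, conversely, the $\Delta/kT$ needed to reach a target error probability. Note that $p_{\rm sw} = 0.5$ at $\Delta = 0$: with no input, the latch picks either state at random. The target value of $10^{-3}$ is an arbitrary illustration.

```python
import math


def p_switch(delta_over_kT):
    """Switching-error probability of Eq. (1): 0.5 * exp(-Delta/kT)."""
    return 0.5 * math.exp(-delta_over_kT)


def delta_over_kT_for(p_target):
    """Invert Eq. (1): Delta/kT required for a given switching-error probability."""
    return math.log(0.5 / p_target)


for x in (0, 1, 2, 5, 10):
    print(f"Delta/kT = {x:2d}  ->  p_sw = {p_switch(x):.2e}")
print(f"p_sw = 1e-3 needs Delta/kT = {delta_over_kT_for(1e-3):.2f}")
```

Each additional unit of $\Delta/kT$ lowers the error probability by a factor of $e$. This is the exponential decrease seen in the input-latch data of Fig. 5.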

If the probability of error-free operation for a latch is $q_{\rm sw} = 1 - p_{\rm sw}$, and errors in different latches are independent, then the probability of at least one switching error in a line of *n* identical latches is

$$p_{\rm sw,line} = 1 - q_{\rm sw}^n. \tag{2}$$

For the device studied here, $p_{\rm sw} \approx 0.04$ in latch-to-latch switching, so that for a line of 11 latches $p_{\rm sw,line} \approx 0.36$. This number is rather high compared to the error rates achievable in other low-switching-energy digital circuits, such as single-flux-quantum circuits, with $p < 10^{-12}$ [14]. But as QCA elements shrink (and the corresponding charging energy grows), the probability of switching error is reduced dramatically. For example, for a molecular QCA system operating at 77 K with $E_{\rm C} \approx 1$ eV, Eq. (1) with $\Delta \sim 0.2 E_{\rm C}$ gives a switching error rate in a QCA latch of only $p_{\rm sw} \approx 4 \times 10^{-14}$, which would ensure reliable operation of multistage SRs.
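Both estimates in this paragraph can be reproduced from Eqs. (1) and (2). The sketch assumes independent errors in each latch, which is the assumption behind Eq. (2). For the molecular case it uses $\Delta = 0.2 E_{\rm C}$ with $E_{\rm C} = 1$ eV. A small Monte Carlo run with a fixed seed cross-checks Eq. (2).

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def p_line(p_sw, n):
    """Eq. (2): probability of at least one error in a line of n independent latches."""
    return 1.0 - (1.0 - p_sw) ** n


def p_line_mc(p_sw, n, trials=50_000, seed=1):
    """Monte Carlo estimate of the same probability (each latch fails independently)."""
    rng = random.Random(seed)
    bad = sum(any(rng.random() < p_sw for _ in range(n)) for _ in range(trials))
    return bad / trials


def p_switch(delta_eV, T):
    """Eq. (1) with Delta in eV and T in kelvin."""
    return 0.5 * math.exp(-delta_eV / (K_B_EV * T))


print(f"metal SR, 11 latches: {p_line(0.04, 11):.3f} (MC {p_line_mc(0.04, 11):.3f})")
print(f"molecular latch at 77 K: p_sw = {p_switch(0.2 * 1.0, 77.0):.1e}")
```

For small $p_{\rm sw}$, Eq. (2) is close to $n\,p_{\rm sw}$. Even an 11-stage molecular line would therefore stay in the $10^{-13}$ range under these assumptions.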

To conclude, we have experimentally demonstrated the operation of a two-stage QCA SR at $T \sim 100$ mK. Although the current prototype operates only at low temperatures, future generations of QCA are expected to work at higher temperatures, with much better speed and reliability than existing metal TJ prototypes.

## Acknowledgements

This research was supported in part by the W.M. Keck Foundation and NSF. We are thankful to A. Korotkov (UC Riverside) for suggesting bit-shifting experiments, W. Porod for helpful discussions, and K. Yadavalli for technical support.

## References

- [1] R.W. Keyes, R. Landauer, IBM J. Res. Dev. 14 (1970) 152.
- [2] The 2001 edition of the Semiconductor Industry Association's International Technology Roadmap for Semiconductors (ITRS) (http://public.itrs.net).
- [3] C.S. Lent, P.D. Tougaw, W. Porod, G.H. Bernstein, Nanotechnology 4 (1993) 49.
- [4] C.S. Lent, P.D. Tougaw, Proc. IEEE 85 (1997) 541.
- [5] K.K. Likharev, A.N. Korotkov, Science 273 (1996) 763.
- [6] J. Timler, C.S. Lent, J. Appl. Phys. 91 (2002) 823.
- [7] R.K. Kummamuru, J. Timler, G. Toth, C.S. Lent, R. Ramasubramaniam, A.O. Orlov, G.H. Bernstein, G.L. Snider, unpublished.


- [8] A.N. Korotkov, K.K. Likharev, J. Appl. Phys. 84 (1998) 6114.
- [9] G. Toth, C.S. Lent, J. Appl. Phys. 85 (1999) 2977.
- [10] A.O. Orlov, R.K. Kummamuru, R. Ramasubramaniam, G. Toth, C.S. Lent, G.H. Bernstein, G.L. Snider, Appl. Phys. Lett. 77 (2) (2001) 295.
- [11] T.A. Fulton, G.H. Dolan, Phys. Rev. Lett. 59 (1987) 109.
- [12] D.V. Averin, A.A. Odintsov, Phys. Lett. A 140 (1989) 251.
- [13] R.J. Schoelkopf, P. Wahlgren, A.A. Kozhevnikov, P. Delsing, D.E. Prober, Science 280 (1998) 1238.
- [14] Q.P. Herr, M.J. Feldman, Appl. Phys. Lett. 69 (1996) 694.