# Design and Implementation of a 16-Word by 1-Bit Register File Using Adiabatic Quantum Flux Parametron Logic

N. Tsuji, C. L. Ayala, *Member, IEEE*, N. Takeuchi, T. Ortlepp, Y. Yamanashi, *Member, IEEE*, and N. Yoshikawa, *Member, IEEE* 

Abstract—We have been developing extremely energy-efficient microprocessors using the adiabatic quantum flux parametron (AQFP) logic. In this study, we designed and fabricated an AQFP register file, which is one of the key building blocks in the microprocessor. The 16-word by 1-bit register file with dual output ports and a single input port was designed by using an AQFP cell library with a minimalist design. The circuit is composed of three decoders and feedback delay latches (D-latches), which are clocked by four-phase excitation currents. The circuits were fabricated using the AIST 10 kA/cm<sup>2</sup> Nb process. The total junction number and circuit area are 2544 and 3.1 mm  $\times$  5.4 mm, respectively. The estimated energy consumption is 18 aJ per clock cycle for 5 GHz operation. The latency is 1600 ps for 5 GHz operation. In the low-speed test, we confirmed the correct operations across 15 addresses.

*Index Terms*—superconducting circuits, QFP, adiabatic logic, latch, register file

## I. INTRODUCTION

Extremely energy-efficient logic devices are required in order to realize future high-end computing systems. Superconducting logic devices are attractive because of their low power consumption compared to complementary metal-oxide-semiconductor (CMOS) circuits, on which modern high-end computers are built. Recently, very energy-efficient superconducting logic devices based on rapid single-flux-quantum (RSFQ) logic [1] have been developed, including *LR*-biased RSFQ [2], energy-efficient RSFQ (eSFQ) [3], low-voltage RSFQ (LV-RSFQ) [4], and reciprocal quantum logic (RQL) [5].

We have been developing adiabatic quantum flux parametron (AQFP) logic [6], [7] as ultra-low-power superconductor logic circuits. In AQFP logic, the static power consumption is zero as a result of AC flux biasing, and the dynamic power consumption is considerably reduced due to adiabatic switching operations. Our circuit simulation results showed that the bit energy of an AQFP gate using underdamped Josephson junctions can be reduced to the sub-$k_BT$ level [8], [9], where $k_B$ is the Boltzmann constant and $T$ is the temperature.

The present study was supported by a Grant-in-Aid for Scientific Research (S) (No. 26220904) from the Japan Society for the Promotion of Science (JSPS).

N. Tsuji, Y. Yamanashi, and N. Yoshikawa are with the Department of Electrical and Computer Engineering, Yokohama National University, Yokohama 240-8501, Japan (e-mail: nyoshi@ynu.ac.jp).

C. L. Ayala and N. Takeuchi are with the Institute of Advanced Sciences, Yokohama National University, Yokohama 240-8501, Japan (e-mail: ayala@ynu.ac.jp).

T. Ortlepp is with CiS Research Institute for Microsensor Systems GmbH, 99099 Erfurt, Germany (e-mail: tortlepp@cismst.de).

We have been developing extremely energy-efficient microprocessors using the AQFP logic. In a previous study, we built an AQFP cell library with a minimalist design [10] and confirmed that AQFP logic gates using the cell library had reasonably wide operation margins in the experiment. We also constructed latches for AQFP logic circuits. The feedback latch, which holds data by propagating data through a feedback loop, has wide operation margins because it is composed of basic AQFP logic gates. We designed the feedback latches using an AQFP majority gate and experimentally confirmed correct operations [11]. The use of majority gates effectively reduced the number of junctions in the feedback latch.

In the present study, we designed and fabricated a 16-word by 1-bit register file composed of feedback delay latches (D-latches). This paper reports the overall design and the measurement results.

## II. AQFP LOGIC GATE

Figure 1 shows a circuit diagram of the AQFP buffer gate, which is composed of two superconducting loops including Josephson junctions, $J_1$ and $J_2$. Inductors, $L_1$ and $L_2$, are magnetically coupled to excitation inductors, $L_{x1}$ and $L_{x2}$, respectively, through coupling coefficients $k_1$ and $k_2$. In the figure, $I_{in}$, $I_{out}$, and $I_x$ are the input, output, and AC excitation currents, respectively. When the excitation fluxes are applied to each loop using $I_x$ while a small $I_{in}$ is provided to the gate, a single flux quantum (SFQ) is stored in either the left (state 0) or right (state 1) loop, depending on the direction of $I_{in}$. A large output current, whose direction depends on the direction of the input current, flows through the output inductor $L_q$. In the present study, we use four-phase excitation currents, whose phases are shifted by 90° relative to each other, to propagate information in the AQFP gates. As the AQFP gates are excited in turn by the four-phase excitation currents, information propagates through the gates.

Fig. 1. Circuit diagram of the AQFP buffer gate.

Fig. 2. Circuit diagram of the majority gate. The majority gate is composed of three buffer cells and a 3-to-1 branch cell.
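As a rough timing sketch of the four-phase scheme (not taken from the source), the Python snippet below assumes that each successive gate stage is excited exactly one phase, that is, a quarter of a clock period, after the previous one. The function names are hypothetical. Under that assumption a 5 GHz clock gives 50 ps per stage, and the 1600 ps latency quoted in the abstract would correspond to 32 stages; the source does not state the stage count, so that figure is an inference only.

```python
def phase_offsets_deg(num_phases=4):
    """Phase offset of each excitation current, in degrees."""
    return [360.0 * k / num_phases for k in range(num_phases)]


def stage_delay_ps(clock_ghz, num_phases=4):
    """Delay between consecutive gate stages, in picoseconds.

    Assumption: each stage is excited one phase after the previous one,
    so the delay is the clock period divided by the number of phases.
    """
    period_ps = 1000.0 / clock_ghz
    return period_ps / num_phases


def stages_for_latency(latency_ps, clock_ghz, num_phases=4):
    """Number of stages implied by a latency, under the same assumption."""
    return latency_ps / stage_delay_ps(clock_ghz, num_phases)


if __name__ == "__main__":
    print(phase_offsets_deg())              # offsets of the four phases
    print(stage_delay_ps(5.0))              # per-stage delay at 5 GHz
    print(stages_for_latency(1600.0, 5.0))  # stages implied by 1600 ps
```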

We built an AQFP cell library with a minimalist design [10], in which logic gates are effectively designed by arraying the four building block cells: buffer, NOT, constant, and branch. A constant-0 (constant-1) cell constantly outputs '0' ('1') thanks to the asymmetry in circuit parameters. Figure 2 shows a circuit diagram of a majority gate, which is composed of three buffer cells and a 3-to-1 branch cell. The majority gate outputs '1' when two or three of the three inputs are '1'. An AND (OR) gate can be realized by replacing one of the three buffer cells in the majority gate with a constant-0 (constant-1) cell. In this design, the majority, AND, and OR gates each require six junctions.
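The logical behavior described above can be sketched in a few lines of Python. This is a purely functional illustration, assuming ideal gates and using hypothetical function names; it says nothing about the junction-level circuit. Fixing one majority input to constant 0 yields AND, and fixing it to constant 1 yields OR.

```python
def majority(a, b, c):
    """Return 1 when two or three of the three inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0


def and_gate(a, b):
    """AND: a majority gate with one input tied to a constant-0 cell."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR: a majority gate with one input tied to a constant-1 cell."""
    return majority(a, b, 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b))
```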

## III. FEEDBACK DELAY LATCH

The feedback latch holds an internal state by propagating data through the feedback loop between input and output ports. In general, the feedback latch has wide operation margins, because it is composed of basic AQFP logic gates. In the previous study, we designed the feedback delay latch (D-latch) using the majority gate [11]. The use of majority gates effectively reduces the number of junctions in the feedback latch. Table I shows the truth table for the D-latch. The D-latch holds the current internal state,  $Q_n$ , when the enable signal, E, is 0. On the other hand, the internal state is overwritten by the data signal, D, when E is 1. Figure 3 shows a circuit diagram of the D-latch for the register file with dual output ports and single input port. The D-latch is composed of 42 junctions and clocked by four-phase excitation currents.

TABLE I. Truth table for the D-latch

| $E$ | $D$ | $Q_{n+1}$ | Description |
|-----|-----|-----------|-------------|
| 0   | 0   | $Q_n$     | Hold        |
| 0   | 1   | $Q_n$     | Hold        |
| 1   | 0   | 0         | Transfer    |
| 1   | 1   | 1         | Transfer    |



Fig. 3. Circuit diagram of the D-latch for the register file. $E$ is the enable signal, $D$ is the data signal, $Q_n$ is the current internal state, and $Q_{n+1, A}$ and $Q_{n+1, B}$ are the next internal states.
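The truth table of the D-latch can be expressed as a small behavioral model. The sketch below is an assumption-laden illustration in Python with hypothetical function names: it reproduces only the hold and transfer behavior of Table I and does not model the majority-gate feedback loop, the dual outputs, or the four-phase timing of the actual circuit.

```python
def d_latch_next(e, d, q):
    """Return the next state Q_{n+1} for enable e, data d, and state q."""
    if e not in (0, 1) or d not in (0, 1) or q not in (0, 1):
        raise ValueError("e, d, and q must each be 0 or 1")
    # E = 1: transfer D to the internal state. E = 0: hold Q_n.
    return d if e == 1 else q


def run_latch(inputs, q0=0):
    """Apply a sequence of (e, d) pairs and return the state after each step.

    The initial state q0 is an assumption of this sketch.
    """
    states = []
    q = q0
    for e, d in inputs:
        q = d_latch_next(e, d, q)
        states.append(q)
    return states


if __name__ == "__main__":
    # Write 1, hold, write 0, hold.
    print(run_latch([(1, 1), (0, 0), (1, 0), (0, 1)]))
```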

## IV. 16-WORD BY 1-BIT REGISTER FILE

### A. Design

We designed a 16-word by 1-bit register file with dual output ports and a single input port. Figure 4 shows the block diagram of the register file. The circuit is composed of a write decoder, two read decoders, two mergers, and feedback D-latches, which are clocked by four-phase excitation currents. The decoders are based on AND gates, and the mergers are composed of OR gates with a fan-in of 2. During a write operation, the data signal and write address signals are sent, the write decoder selects one of the D-latches, and the data is written to the selected D-latch. During a read operation, read address signals are sent, the read decoder selects one of the D-latches, and the data is read out from the selected D-latch through the mergers. We designed the registers at addresses 0 and 1 as a constant-0 register and a constant-1 register, respectively. This means that, when address 0 (1) is selected by the read decoder, the data '0' ('1') is always read out. When address 0 or 1 is selected by the write decoder, the write operation is not performed (effectively a 'nop' or 'no-operation'). Figure 5 shows the micrograph of the register file. The circuit was fabricated using the AIST high-speed standard process (AIST HSTP), a Nb integrated circuit process. The total junction number and circuit area are 2544 and 3.1 mm $\times$ 5.4 mm, respectively. The estimated energy consumption is 18 aJ per clock cycle for 5 GHz operation. The latency is 1600 ps for 5 GHz operation.
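For a sense of scale, the quoted energy figure can be converted into an average power and a per-junction average. The Python sketch below is simple arithmetic on the numbers stated above, with hypothetical function names. The per-junction value assumes the energy is spread evenly over all 2544 junctions, which is an averaging convenience rather than a claim about where the dissipation actually occurs.

```python
ENERGY_PER_CYCLE_J = 18e-18   # 18 aJ per clock cycle (estimated, from the text)
CLOCK_HZ = 5e9                # 5 GHz operation
JUNCTIONS = 2544              # total junction count


def average_power_w(energy_per_cycle_j, clock_hz):
    """Average power when the per-cycle energy is spent every clock cycle."""
    return energy_per_cycle_j * clock_hz


def energy_per_junction_j(energy_per_cycle_j, junctions):
    """Per-cycle energy divided evenly over all junctions (an average only)."""
    return energy_per_cycle_j / junctions


if __name__ == "__main__":
    power = average_power_w(ENERGY_PER_CYCLE_J, CLOCK_HZ)
    per_junction = energy_per_junction_j(ENERGY_PER_CYCLE_J, JUNCTIONS)
    print(f"average power: {power * 1e9:.1f} nW")
    print(f"energy per junction per cycle: {per_junction * 1e21:.2f} zJ")
```

Under these assumptions the average power is about 90 nW and the per-junction average is about 7.1 zJ per cycle.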



Fig. 4. Block diagram of the 16-word by 1-bit register file. The register file is composed of a write decoder, two read decoders, two mergers and D-latches.
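The architecture in the block diagram can be mirrored by a behavioral model. The following Python sketch is illustrative only: the class and method names are hypothetical, the power-up state of the writable latches is assumed to be 0, and pipelining and timing are ignored. The decoders are modeled as one-hot selectors and each merger as an OR over the gated latch outputs.

```python
class RegisterFile16x1:
    """Behavioral sketch of a 16-word by 1-bit, 1-write/2-read register file."""

    NUM_WORDS = 16
    CONSTANTS = {0: 0, 1: 1}  # address 0 reads 0, address 1 reads 1

    def __init__(self):
        # Assumed power-up state: writable cells start at 0.
        self._cells = [0] * self.NUM_WORDS
        for addr, value in self.CONSTANTS.items():
            self._cells[addr] = value

    @classmethod
    def _decode(cls, addr):
        """One-hot decode of a 4-bit address (the role of a decoder)."""
        if not 0 <= addr < cls.NUM_WORDS:
            raise ValueError("address must be in the range 0..15")
        return [1 if i == addr else 0 for i in range(cls.NUM_WORDS)]

    def write(self, addr, bit):
        """Write one bit; writes to the constant registers are no-ops."""
        if bit not in (0, 1):
            raise ValueError("bit must be 0 or 1")
        for i, enable in enumerate(self._decode(addr)):
            if enable and i not in self.CONSTANTS:
                self._cells[i] = bit

    def _read_port(self, addr):
        """Select one cell and OR-merge the gated outputs (the merger)."""
        out = 0
        for select, q in zip(self._decode(addr), self._cells):
            out |= select & q
        return out

    def read(self, addr_a, addr_b):
        """Read two addresses at once through output ports A and B."""
        return self._read_port(addr_a), self._read_port(addr_b)


if __name__ == "__main__":
    rf = RegisterFile16x1()
    rf.write(5, 1)
    rf.write(0, 1)  # no-op: address 0 is the constant-0 register
    rf.write(1, 0)  # no-op: address 1 is the constant-1 register
    print(rf.read(5, 0))
    print(rf.read(1, 2))
```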

### B. Experimental Result

Figure 6 shows the measurement result of the 16-word by 1-bit register file at 100 kHz, where $I_D$ is the data signal, $I_{w0}$, $I_{w1}$, $I_{w2}$, and $I_{w3}$ are the write address signals, $I_{ra0}$, $I_{ra1}$, $I_{ra2}$, and $I_{ra3}$ are the read address A signals, and $V_{outA}$ and $V_{outB}$ are the amplified output voltages of the readout DC superconducting quantum interference devices (SQUIDs). The DC SQUIDs were magnetically coupled to AQFP gates in the final excitation stage [6], so as to read out the dual data outputs from the register file. When the logic state of the AQFP gate coupled to a DC SQUID is 1, the DC SQUID shows a voltage transition.

First, the write decoder selects the registers from address 0 to 15 one by one using the write address signals, and data '0' is written to each register. This corresponds to the 'Write 0' step in Fig. 6. Second, the read decoder selects the registers from address 0 to 15 one by one using the read address A (B) signals, and the stored data are read out from all registers to output port A (B), where the read address B signals follow the same pattern as the read address A signals. This corresponds to the 'Read 0' step. While 'Read 0' is being performed, the write decoder selects the registers from address 0 to 15 one by one using the write address signals, and data '1' is written to all registers. This corresponds to the 'Write 1' step. Finally, the read decoder selects the registers from address 0 to 15 sequentially using the read address A (B) signals, and the stored data are read out from all registers to output port A (B). This corresponds to the 'Read 1' step. It should be noted that the registers at addresses 0 and 1 are the constant-0 register and the constant-1 register, respectively. Only the output from address 6 was unstable, so we confirmed correct operation for the remaining 15 addresses.
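The expected outputs of this test pattern for an ideal, fault-free device can be sketched as follows. The Python snippet is an illustration with hypothetical names. It assumes that, during the overlapped 'Read 0' and 'Write 1' steps, each address is read before the new '1' reaches it, and it does not model the instability observed at address 6.

```python
NUM_WORDS = 16
CONSTANTS = {0: 0, 1: 1}  # constant-0 and constant-1 registers


def write(cells, addr, bit):
    """Write one bit; the constant registers ignore writes."""
    if addr not in CONSTANTS:
        cells[addr] = bit


def run_test_sequence():
    """Return the (port A, port B) outputs of the 'Read 0' and 'Read 1' steps."""
    # Assumed initial state: writable cells at 0, constants at their values.
    cells = [CONSTANTS.get(addr, 0) for addr in range(NUM_WORDS)]

    # 'Write 0': write 0 to every address in turn.
    for addr in range(NUM_WORDS):
        write(cells, addr, 0)

    # 'Read 0' overlapped with 'Write 1'. Assumption: each address is read
    # before it is overwritten. Ports A and B use the same address pattern.
    read0 = []
    for addr in range(NUM_WORDS):
        read0.append((cells[addr], cells[addr]))
        write(cells, addr, 1)

    # 'Read 1': read every address in turn.
    read1 = [(cells[addr], cells[addr]) for addr in range(NUM_WORDS)]
    return read0, read1


if __name__ == "__main__":
    read0, read1 = run_test_sequence()
    print([a for a, _ in read0])  # port A during 'Read 0'
    print([a for a, _ in read1])  # port A during 'Read 1'
```

In this model the only '1' during 'Read 0' comes from address 1, and the only '0' during 'Read 1' comes from address 0, which matches the role of the two constant registers.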



Fig. 5. Micrograph of the 16-word by 1-bit register file. The total junction number is 2544 and the circuit area is  $3.1 \text{ mm} \times 5.4 \text{ mm}$ .

### C. Discussion

In this study, we measured eight chips. The best chip exhibited correct operation across 15 addresses, whereas the other chips showed correct operation across a smaller number of addresses. This might be due to the long wirings between AQFP logic gates in the register file. In AQFP logic, as the wiring becomes longer, the amplitude of signal currents is reduced and circuit yields deteriorate [10], [12]. Therefore, we believe that shortening the wiring will enable the register file to be fully operational.

In AQFP logic, long excitation lines create clock skew, which restricts the maximum operating frequency of AQFP circuits. The register file designed in this study has a large amount of clock skew, and our estimate indicates that the maximum operating frequency of this design is about 1 GHz. The maximum operating frequency can be increased by reducing the clock skew, which requires dividing the clock network into smaller blocks using an H-tree structure.



Fig. 6. Measurement result of the 16-word by 1-bit register file at 100 kHz. $I_D$ is the data signal, $I_{w0}$, $I_{w1}$, $I_{w2}$, and $I_{w3}$ are the write address signals, $I_{ra0}$, $I_{ra1}$, $I_{ra2}$, and $I_{ra3}$ are the read address A signals, and $V_{outA}$ and $V_{outB}$ are the amplified output voltages of the readout DC SQUIDs. The first '1' output during 'Read 0' is from address 1. 'Error' indicates that the output from address 6 is unstable.

## V. CONCLUSION

We designed and fabricated a 16-word by 1-bit AQFP register file. The circuit is composed of three decoders, two mergers and feedback D-latches and clocked by four-phase excitation currents. The total junction number and circuit area are 2544 and 3.1 mm  $\times$  5.4 mm, respectively. The estimated energy consumption is 18 aJ per clock cycle for 5 GHz operation. The latency is 1600 ps for 5 GHz operation. We confirmed correct operation across 15 addresses in the low-speed test.

## REFERENCES

- [1] K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems," *IEEE Trans. Appl. Supercond.*, vol. 1, no. 1, pp. 3–28, Mar. 1991.
- [2] Y. Yamanashi, T. Nishigai, and N. Yoshikawa, "Study of LR-loading technique for low-power single flux quantum circuits," *IEEE Trans. Appl. Supercond.*, vol. 17, no. 2, pp. 150–153, Jun. 2007.
- [3] O. A. Mukhanov, "Energy-efficient single flux quantum technology," *IEEE Trans. Appl. Supercond.*, vol. 21, no. 3, pp. 760–769, Jun. 2011.
- [4] M. Tanaka, M. Ito, A. Kitayama, T. Kouketsu, and A. Fujimaki, "18-GHz, 4.0-aJ/bit Operation of Ultra-Low-Energy Rapid Single-Flux-Quantum Shift Registers," *Jpn. J. Appl. Phys.*, vol. 51, 053102, May 2012.
- [5] Q. P. Herr, A. Y. Herr, O. T. Oberg, and A. G. Ioannidis, "Ultra-low-power superconductor logic," *J. Appl. Phys.*, vol. 109, no. 10, 103903, May 2011.
- [6] N. Takeuchi, D. Ozawa, Y. Yamanashi, and N. Yoshikawa, "An adiabatic quantum flux parametron as an ultra-low-power logic device," *Supercond. Sci. Tech.*, vol. 26, no. 3, 035010, Mar. 2013.
- [7] N. Takeuchi, K. Ehara, K. Inoue, Y. Yamanashi, and N. Yoshikawa, "Margin and Energy Dissipation of Adiabatic Quantum-Flux-Parametron Logic at Finite Temperature," *IEEE Trans. Appl. Supercond.*, vol. 23, no. 3, 1700304, Jun. 2013.
- [8] N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, "Simulation of sub-k<sub>B</sub>T bit energy operation of adiabatic quantum-flux-parametron logic with low bit error," *Appl. Phys. Lett.*, vol. 103, 062602, Aug. 2013.
- [9] N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, "Thermodynamic study of energy dissipation in adiabatic superconductor logic," *Phys. Rev. Appl.*, vol. 4, no. 3, p. 034007, Sep. 2015.
- [10] N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, "Adiabatic quantum-flux-parametron cell library adopting minimalist design," *J. Appl. Phys.*, vol. 117, 173912, 2015.
- [11] N. Tsuji, N. Takeuchi, Y. Yamanashi, T. Ortlepp, and N. Yoshikawa, "Majority Gate-Based Feedback Latches for Adiabatic Quantum Flux Parametron Logic," *IEICE Trans. Electron.*, vol. E99-C, no. 6, pp. 710-716, Jun. 2016.
- [12] D. Si, N. Takeuchi, K. Inoue, Y. Yamanashi, and N. Yoshikawa, "Yield analysis of large-scale adiabatic-quantum-flux-parametron logic: The effect of the distribution of the critical current," *Phys. C Supercond. its Appl.*, vol. 504, pp. 102–105, Sep. 2014.