# Beyond Logic Applications for Ferroelectric Field Effect Transistors

Michael Niemier University of Notre Dame





Applications and Systems Driven Center for Energy-Efficient Integrated Nanotechnologies (ASCENT)

This work was supported in part by ASCENT, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.


### How does technology scaling impact m/c scaling?



#### Slightly better improvements at low TDP, but still only 2X to 3X...

Robert Perricone, X. Sharon Hu, Joseph Nahas, and Michael Niemier, "Can Beyond-CMOS Devices Illuminate Dark Silicon?" to appear in Communications of the ACM, 2018.

"NRI research has explored a broad spectrum of beyond-CMOS devices for a 'new logic switch' to replace the current CMOS-based transistor ... a 'better switch' has not been found. Comprehensive benchmarking of beyond-CMOS devices ... has revealed little or no advantage of these devices over CMOS for conventional Boolean logic and the von Neumann architecture."

"some devices demonstrate unique characteristics suitable for novel architectures or computing paradigms, e.g., nonvolatility in logic devices, reconfigurability, [and/or] high computation density."

### What do ferroelectric devices offer?

#### Steep subthreshold swings



#### **Memory functionality**



Devices with integrated ferroelectrics are well-positioned to address the aforementioned space!

#### Analog synaptic behavior



H. Mulaosmanovic, Novel ferroelectric FET based synapse for neuromorphic systems, VLSI Symposium, 2016.


# **Talk outline**

- FeFET device, models
- FeFETs for logic-in-memory (LIM), compute-in-memory (CIM)
  - Emphasis on design/benchmarking of content addressable memories (LIM)
  - Briefly discuss FeFET-based CIM
- FeFETs for neuromorphic applications
  - FeFET-based analog synapse
  - FeFET-based (binary) convolutional neural networks (CNNs)
- Wrap-up

# Background

# **FeFET device structure & operating modes**





Interplay between FE material + underlying transistor capacitance results in different modes of operation:

- Non-volatile mode (device can maintain state)
- Steep switching mode (aimed at high performance)

Aziz, et al., "Computing with Ferroelectric FETs: Devices, Models, Systems, and Applications," p. 1289-1298, DATE 2018.

## Time-dependent Landau Khalatnikov (LK) model

• LK model is SPICE compatible

$$E = \alpha P + \beta P^3 + \gamma P^5 + \rho \frac{dP}{dt}$$

where $E$ is the electric field, $P$ is the polarization, $\alpha$, $\beta$, $\gamma$ are the static coefficients, and $\rho$ is the kinetic coefficient.

 $\alpha$ ,  $\beta$ ,  $\gamma$  calibrated to hafnium zirconium oxide (HZO)

| Parameter       | Value                                                    |
|-----------------|----------------------------------------------------------|
| α               | -7x10<sup>9</sup> m/F                                    |
| β               | 3.3x10<sup>10</sup> m<sup>5</sup>/F/coul<sup>2</sup>     |
| γ               | 7x10<sup>9</sup> m/F                                     |
| ρ               | 0.25                                                     |
| t<sub>fe</sub>  | 5.7 nm                                                   |



FeFET simulated by combining the self-consistent LK equation with the 45 nm PTM
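As a rough illustration of the LK equation, the sketch below solves for the zero-field polarization and integrates the time-dependent form with forward Euler. It uses the coefficient values listed above; the units of $\rho$ (taken here as Ω·m), the pulse amplitude and duration, and the time step are assumptions chosen only for illustration, and the transistor (PTM) coupling is not modeled.

```python
import math

# Coefficients from the slide; RHO units are an assumption (ohm*m).
ALPHA = -7e9
BETA = 3.3e10
GAMMA = 7e9
RHO = 0.25

def lk_field(p):
    """Static LK field E(P) = alpha*P + beta*P^3 + gamma*P^5."""
    return ALPHA * p + BETA * p**3 + GAMMA * p**5

def zero_field_polarization():
    """Nonzero root of E(P) = 0, solved as a quadratic in P^2."""
    disc = BETA**2 - 4.0 * GAMMA * ALPHA
    return math.sqrt((-BETA + math.sqrt(disc)) / (2.0 * GAMMA))

def evolve(p, e_applied, duration, dt=1e-13):
    """Forward-Euler integration of rho * dP/dt = E_applied - E_static(P)."""
    for _ in range(int(round(duration / dt))):
        p += dt * (e_applied - lk_field(p)) / RHO
    return p

pr = zero_field_polarization()
# Hypothetical pulses: 1 ns at +/-2e9 V/m, each followed by 1 ns at zero field.
p_prog = evolve(evolve(-pr, 2e9, 1e-9), 0.0, 1e-9)
p_erase = evolve(evolve(p_prog, -2e9, 1e-9), 0.0, 1e-9)
print(f"Pr = {pr:.3f}, after program = {p_prog:.3f}, after erase = {p_erase:.3f}")
```

The two stable zero-field states at opposite polarization are what the non-volatile mode relies on; an explicit Euler step is used only for brevity, and a SPICE solver would handle the same equation implicitly.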

### **Multi-Domain Preisach Model**



• The response of HZO film is described by the total contributions of many ideal ferroelectric domains of varying  $E_c^{\pm}$ .

K. Ni, M. Jerry, J.A. Smith, and S. Datta, "A Circuit Compatible Accurate Compact Model for Ferroelectric FETs," in VLSI Symposium 2018.
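A minimal sketch of the multi-domain idea follows: the film is treated as many ideal two-state domains, each with its own coercive field, and the net polarization is their average. The uniform spread of coercive fields, the symmetric $E_c^{+} = E_c^{-}$, and the normalized units are assumptions for illustration and are not the calibrated model from the cited work.

```python
def make_domains(n=200, ec_mean=1.0, ec_spread=0.5):
    """Coercive fields spread evenly over [mean - spread, mean + spread] (arbitrary units)."""
    step = 2.0 * ec_spread / (n - 1)
    return [ec_mean - ec_spread + i * step for i in range(n)]

def apply_field(states, ecs, field):
    """Each ideal domain flips up at field >= +Ec, down at field <= -Ec, else holds state."""
    new_states = []
    for state, ec in zip(states, ecs):
        if field >= ec:
            new_states.append(+1)
        elif field <= -ec:
            new_states.append(-1)
        else:
            new_states.append(state)
    return new_states

def polarization(states, p_sat=1.0):
    """Net polarization is the saturation value times the average domain state."""
    return p_sat * sum(states) / len(states)

ecs = make_domains()
erased = [-1] * len(ecs)                # start with every domain pointing down
partial = apply_field(erased, ecs, 1.0)  # switches only domains with Ec <= 1.0
full = apply_field(partial, ecs, 2.0)    # exceeds every Ec, so all domains switch
print(polarization(erased), polarization(partial), polarization(full))
```

Because only a subset of domains switches at an intermediate field, the same structure also yields the partial polarization states used later for analog synapses.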




## **Operation**

- Can switch FeFET polarization with:
  - Positive gate voltage pulse (program)
  - Negative gate voltage pulse (erase)
- Pulse causes stable, reversible Vt shift
  - Whether the device is in the low-Vt or high-Vt state depends on the dipole orientation
- 2 distinguishable states; their Vt separation is the memory window
  - Sense with readout of drain current



Pulse duration and amplitude may be traded off depending on application-level figures of merit

Dünkel, et al., "A FeFET based super-low-power ultra fast embedded NVM technology for 22 nm FDSOI and beyond," IEDM, 19.7.1, 2017.

# Logic-in-memory & Compute-in-memory

# Logic-in-memory: CAMs

#### **Content Addressable Memory (CAM)**

- Fast HW search O(1) for search intensive apps
- Often use ternary CAM (TCAM) i.e., store 1, 0, or X (where X is "don't care" (DC))
- TCAMs applicable to database apps, neural networks, *routers and switches*, etc.
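The ternary match rule can be sketched behaviorally as below. The stored words are hypothetical, and the Python loop is only a software stand-in: in a hardware TCAM every row is compared against the search data in parallel, which is where the O(1) search time comes from.

```python
def tcam_search(stored_words, key):
    """Return the addresses of all stored words that match key; 'X' matches 0 or 1."""
    matches = []
    for address, word in enumerate(stored_words):
        if len(word) != len(key):
            raise ValueError("stored word and search key must have equal width")
        # A row matches when every non-X stored bit equals the search bit.
        if all(s == "X" or s == k for s, k in zip(word, key)):
            matches.append(address)
    return matches

# Hypothetical 4-entry, 5-bit table.
table = ["101XX", "0110X", "011XX", "10011"]
print(tcam_search(table, "01101"))  # [1, 2]
```

A real array would typically add a priority encoder to pick one address when several rows match.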



#### **TCAM array architecture**

Figure: TCAM array (search line drivers, search lines, matchlines, sense amps, match address output); example search data = 0 1 1 0 1. Source: https://www.pagiamtzis.com/cam/camintro/

# **Emerging neuromorphic computing models**

#### e.g., Projection Networks

Train a full neural network and a lightweight network in lockstep





"The choice of the type of projection matrix $\mathbb{P}$ as well as representation of the projected space $\Omega_{\mathbb{P}}$ in our setup has a direct effect on the computation cost and model size. We propose to leverage an efficient randomized projection method using a modified version of locality sensitive hashing (LSH) to define $\mathbb{P}(.)$."

TCAM-supported hashing is again an important compute kernel
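To make the hashing kernel concrete, here is a minimal sketch of the standard random-hyperplane form of LSH, which maps a real-valued vector to a short bit signature. This is the textbook variant, not the modified version the quoted paper describes; the function names, signature width, and seed are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: '1' if the dot product is positive, else '0'."""
    bits = []
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(num_bits=16, dim=4)
v = [1.0, 0.5, -0.2, 0.3]
sig = lsh_signature(v, planes)
sig_scaled = lsh_signature([2.0 * x for x in v], planes)   # same direction
sig_opposite = lsh_signature([-x for x in v], planes)      # opposite direction
print(sig, hamming(sig, sig_scaled), hamming(sig, sig_opposite))
```

Vectors pointing in similar directions share most signature bits, so a signature could serve as a TCAM search key, with don't-care bits tolerating small differences.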

Sujith Ravi, "ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections," Google Research, Mountain View, CA, USA.

Look at TCAMs based on ASCENT technologies to support these models, other applications – consider FeFETs here...

# 4T, 2FeFET TCAMs (w/negative supply, LK model)



### 4T, 2FeFET TCAMs (w/o negative supply, LK model)



**Prior work** considered TCAM based on LK model with *negative* supply

|            | Step | WL0/WL1  | BL/$\overline{\text{BL}}$ | SL/$\overline{\text{SL}}$ |
|------------|------|----------|---------------------------|---------------------------|
| Write 0    | 1    | Vwrite/0 | 0/Vdd                     | 0                         |
|            | 2    | 0/Vwrite | Vdd/0                     |                           |
| Write 1    | 1    | Vwrite/0 | Vdd/0                     | 0                         |
|            | 2    | 0/Vwrite | 0/Vdd                     |                           |
| Don't care | 1    | Vwrite/0 | 0/Vdd                     | 0                         |
|            | 2    | 0/Vwrite | Vdd/0                     |                           |
| Search     |      | 0/0      | 0/0                       | data                      |

X. Yin et al., "Ferroelectric FET based Non-volatile Logic-in-Memory Circuits," IEEE TVLSI (in submission), 2018.







X. Yin et al., "Design and Benchmarking of 2-Ferroelectric FET TCAM," IEEE TCAS (in submission), 2018.

### **Benchmarking** (area comparisons)



<sup>A</sup> J. Li, R. K. Montoye, M. Ishii, and L. Chang, "1 Mb 0.41 µm² 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 896–907, 2014.

<sup>B</sup> S. Matsunaga, A. Katsumata, M. Natsui, T. Endoh, H. Ohno, and T. Hanyu, "Design of a nine-transistor/two-magnetic-tunnel-junction-cell-based low-energy nonvolatile ternary content-addressable memory," *Japanese J. of Applied Physics*, vol. 51, no. 2S, p. 02BM06, 2012.

# **Benchmarking methodology**

- All designs evaluated in context of 64x64 array
- Assume
  - 45 nm PTM
  - Inverter-based SA
  - Minimum sized transistors for TCAM cell, SA
- Extract wiring parasitics from DESTINY
  - M. Poremba, et al., "Destiny: A tool for modeling emerging 3D NVM and EDRAM caches," in *DATE*, 2015, pp. 1543–6.
- Delay assumes worst case
  - (i.e., 1-bit mismatch...)



### **Benchmarking** (other figures of merit)

| 64x64 array                   | 16T CMOS                         | 2T2R                                                 | 4T-2FeFET        | 2FeFET           |
|-------------------------------|----------------------------------|------------------------------------------------------|------------------|------------------|
| Technology node               | 45nm                             | 45nm                                                 | 45nm             | 45nm             |
| Cell area (µm²)               | 1.12 (7.5x)                      | 0.41 <sup>[1]</sup> (2.7x)                           | 0.65 (4.3x)      | 0.15 (1x)        |
| ON/OFF ratio                  | ~10<sup>6</sup>                  | ~100 <sup>[2]</sup>                                  | ~10<sup>4</sup>  | ~10<sup>4</sup>  |
| Search voltage                | 1V                               | 1V                                                   | 1V               | 1V               |
| Search delay (ps)             | 582 (1.7x)                       | 350 (1.03x)                                          | 1013 (3.0x)      | 341 (1x)         |
| Search energy (fJ/bit/search) | 1.0 (2.4x)                       | 1.2 (2.7x)                                           | 0.5 (1.3x)       | 0.4 (1x)         |
| Normalized EDP                | 4.1x                             | 2.8x                                                 | 3.8x             | 1x               |
| Write scheme                  | Voltage driven dynamic switching | Current driven                                       | Voltage driven   | Voltage driven   |
| Write voltage                 | 1V                               | Set 1.8V <sup>[3]</sup><br>Reset 1.2V <sup>[3]</sup> | ±4V              | ±4V              |
| Write time                    | < 2ns                            | ~ 10 ns                                              | 10 ns            | 10 ns            |
| Write energy (fJ/row)         | 309 (3.5x)                       | 288000 (3225x) <sup>[3]</sup>                        | 512 (5.7x)       | 89 (1x)          |
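The normalized energy-delay product (EDP) row can be approximately reproduced from the search delay and search energy rows, as in the sketch below. It uses the rounded table entries, so the ratios (roughly 4.3x, 3.1x, and 3.7x) differ slightly from the listed 4.1x, 2.8x, and 3.8x, which were presumably computed from unrounded data. The dictionary layout and function name are hypothetical.

```python
# (search delay in ps, search energy in fJ/bit/search), copied from the table.
designs = {
    "16T CMOS": (582, 1.0),
    "2T2R": (350, 1.2),
    "4T-2FeFET": (1013, 0.5),
    "2FeFET": (341, 0.4),
}

def normalized_edp(entries, baseline="2FeFET"):
    """EDP = delay * energy for each design, divided by the baseline design's EDP."""
    base_delay, base_energy = entries[baseline]
    base_edp = base_delay * base_energy
    return {name: (delay * energy) / base_edp
            for name, (delay, energy) in entries.items()}

edp = normalized_edp(designs)
for name, ratio in edp.items():
    print(f"{name}: {ratio:.2f}x")
```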




## FeFET-based CIM: architecture

#### **FeFET-based CIM architecture**





- CM/MM is a voltage-based sense scheme responsible for (N)OR logic and reads
- **CM** is a current-based sense scheme used for Boolean (N)AND, X(N)OR, and ADD; it also leverages the voltage scheme
- SUM and CARRY are additional circuitry that computes the sum and carry
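A purely behavioral sketch of the logic functions listed above follows. It assumes two rows are activated at once and that the sense circuitry can tell, per column, whether zero, one, or two of the selected cells store a 1; that thresholding picture and all function names are illustrative assumptions rather than the circuit from the cited work.

```python
def sense_columns(row_a, row_b):
    """Per column, count how many of the two activated cells store a 1."""
    return [a + b for a, b in zip(row_a, row_b)]

def cim_or(row_a, row_b):
    return [int(count >= 1) for count in sense_columns(row_a, row_b)]

def cim_and(row_a, row_b):
    return [int(count == 2) for count in sense_columns(row_a, row_b)]

def cim_xor(row_a, row_b):
    return [int(count == 1) for count in sense_columns(row_a, row_b)]

def cim_add(row_a, row_b):
    """Ripple-carry add with bits stored LSB first, built from the XOR and AND outputs."""
    total, carry = [], 0
    for x, g in zip(cim_xor(row_a, row_b), cim_and(row_a, row_b)):
        total.append(x ^ carry)       # SUM bit
        carry = g | (x & carry)       # CARRY bit passed to the next column
    return total, carry

a = [0, 1, 1, 0]  # 6, LSB first
b = [1, 1, 1, 0]  # 7, LSB first
print(cim_or(a, b), cim_and(a, b), cim_xor(a, b), cim_add(a, b))
```

The NOR, NAND, and XNOR variants are simply the complements of these outputs.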



### FeFET-based CIM: benchmarking



FeFET-CIM has speed-ups (energy reductions) of ~119X (~1.6X) and ~1.97X (~1.5X) over ReRAM and STT-RAM CIM for in-memory addition of 32-bit words

FeFET-CIM approach offers an average speedup of  $\sim$ 2.5X and energy reduction of  $\sim$ 1.7X when compared to a conventional (not in-memory) approach.

S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in Memory with Spin-Transfer Torque Magnetic RAM," School of Electrical and Computer Engineering, Purdue University.

Nishil Talati, Saransh Gupta, Pravin Mane, and Shahar Kvatinsky, "Logic Design Within Memristive Memories Using Memristor-Aided loGIC (MAGIC)," IEEE Transactions on Nanotechnology, vol. 15, no. 4, July 2016.



Dayane Reis, et al., to appear at ISLPED 2018.

# **Neuromorphic applications**

## **Inference & training**



### **Vector-matrix multiplication with crossbars**



Dense analog synaptic memory arrays perform MACs and update at the location of the data
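The multiply-accumulate performed by a crossbar can be sketched as follows: each cell contributes a current $V_i G_{ij}$ (Ohm's law) and each column wire sums the currents of its cells (Kirchhoff's current law). The voltages and conductances below are hypothetical values, and wire resistance and device non-idealities are ignored.

```python
def crossbar_vmm(voltages, conductances):
    """Column currents I_j = sum_i V_i * G[i][j] for an ideal crossbar."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            currents[j] += v * g   # each cell adds V_i * G_ij to its column
    return currents

# Hypothetical 3x2 array: row voltages in volts, conductances in siemens.
v_in = [0.1, 0.2, 0.0]
g = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
i_out = crossbar_vmm(v_in, g)
print(i_out)
```

All columns are read in one step, so the array computes a full vector-matrix product where the weights are stored.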

### **Analog synaptic device characteristics**



• Synaptic memory needs to be high density, low latency, energy efficient, and preserve high network accuracies.

# Analog synapses

#### **Filamentary O-RRAM**



- ✓ High density
- Electro-thermal switching
- ✓ <100ns pulse widths
- Low G<sub>max</sub>/G<sub>min</sub> ratios demonstrated thus far

#### Non-Filamentary RRAM



- ✓ High density
- Electro-thermal switching
- <100ns pulse widths to be demonstrated
- Low G<sub>max</sub>/G<sub>min</sub> ratios demonstrated thus far

#### **Ferroelectric FET**



- 2T design proposed
- ✓ Electric-field switching
- ✓ 75ns pulse widths
- ✓ Large and tunable Gmax/Gmin

Ferroelectric FET is a promising candidate for an analog synaptic memory device.

### **Ferroelectric FET analog synapse**




Electric-field controlled partial polarization switching in ferroelectric FETs can be harnessed for synaptic memory with nanosecond updates.

### Effect of Pulse Scheme on Pr



Multi-domain Preisach model accurately captures the response of the remnant polarization

### Simulated G vs Pulse Number: Scheme 3



• FeFET synapse response from simulated Pr in programming scheme 3.

### FeFET Analog Synapse: Scheme 3



• Partial polarization switching within the ferroelectric gate oxide results in a gradual decrease/increase (potentiation/depression) in V<sub>T</sub>.

## **Analog Synapse Benchmarking**



• FeFET under pulse scheme 3 exhibits reduced footprint, high accuracy, and low latency

### FeFET-based binary crossbars: circuits



Xiaoming Chen, Xunzhao Yin, Michael Niemier, and Xiaobo Sharon Hu, "Design and Optimization of FeFET-based Crossbars for Binary Convolutional Neural Networks," to appear in *Design, Automation, and Test in Europe (DATE)*, 2018.

## FeFET-based crossbars: benchmarking

#### Benchmarking assumptions for 64x64 crossbar array

- **FeFET:** 10nm FinFET, Tfe=10.5nm, VwL=0.6, Vw=0.6, VR=-0.55
- **RRAM:** $R_{on}=10K\Omega$, $R_{off}=1M\Omega$, Vw=2
- For both: VHL=0.3
- Average case: half input bits and half weights are 1
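For context on what a binary crossbar computes, the sketch below shows the XNOR-popcount form of a binary dot product that is commonly used in binary CNNs. The bit encoding (1 for +1, 0 for -1) and the example vectors are assumptions, chosen so that half the input bits and half the weights are 1 as in the average case above; the slides do not state that the benchmarked circuit uses exactly this formulation.

```python
def xnor_popcount(inputs, weights):
    """Count the positions where the input bit and the weight bit agree."""
    return sum(1 for x, w in zip(inputs, weights) if x == w)

def binary_dot(inputs, weights):
    """Dot product of the equivalent +1/-1 vectors: 2 * agreements - length."""
    return 2 * xnor_popcount(inputs, weights) - len(inputs)

# Hypothetical 8-bit example with four 1s in each vector.
x_bits = [1, 0, 1, 1, 0, 0, 1, 0]
w_bits = [1, 1, 0, 1, 0, 1, 0, 0]
print(xnor_popcount(x_bits, w_bits), binary_dot(x_bits, w_bits))  # 4 0
```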



|               | Cell area $(F^2)$ | $R_{\rm wire} (\Omega)$ | $C_{\text{wire}}$ (fF) |
|---------------|-------------------|-------------------------|------------------------|
| CMOS          | 150               | 0.245                   | 0.059                  |
| FeFET         | 60                | 0.155                   | 0.037                  |
| RRAM (1R*2)   | 4 (×2) [30]       | 0.04                    | 0.0096                 |
| RRAM (1T1R*2) | 20 (×2) [30]      | 0.09                    | 0.022                  |



Xiaoming Chen, Xunzhao Yin, Michael Niemier, and Xiaobo Sharon Hu, "Design and Optimization of FeFET-based Crossbars for Binary Convolutional Neural Networks," to appear in Design, Automation, and Test in Europe (DATE), 2018.

### **Takeaways ... promising metrics!**





#### www.src.org/program/jump



Semiconductor Research Corporation

@srcJUMP