The Frontiers of Robust Circuit Design in Sub-28nm Process Technologies

Jim Dodrill
ARM

“The conventional wisdom that led to our success in the past will no longer work in the future.”

Cliff Hou, VP R&D, TSMC, at DAC 2014
Welcome to the Frontier

Agenda

• The Cost of Design Margins
• FinFET Changes the Landscape
• Variation
  – Process, Voltage, and Temperature
  – Location and Layout Dependent Effects
• Random Failures
  – Radiation, Synchronization, and Noise
• Aging
  – BTI, HCI, EM, and TDDB
• Putting it All Together
Focus

- Focus is on SoC implementation aspects
  - Assume foundry and IP partners have done their best to offer reliable products

- For each failure mechanism:
  - Introduce the physical background
  - Compare planar vs. FinFET
  - Offer practical design advice

Device Lifetime

- Testing
  - defects
  - variation

- Random Failures
  - soft errors
  - synchronization
  - noise

- Aging
  - BTI, HCI
  - EM, TDDB

![Device Lifetime Diagram]
The Heart of the Matter...

“At the heart of reliability engineering is the fact that there is a distribution of lifetimes for each failure mechanism. With low failure rate requirements we are interested in the early time-range of the failure time distributions. There has been an increase in process variability with scaling (e.g., distribution of dopant atoms, CMP variations, and line-edge roughness). At the same time the size of a critical defect decreases with scaling. These trends will translate into an increased time spread of the failure distributions and, thus, a decreasing time to first failure.”

ITRS 2013, Process Integration, Devices, and Structures

Tactics

• Margining
• Thermal Management
• Dynamic Voltage and Frequency Scaling (DVFS)
  – In-situ monitors
  – Adaptive Supply Voltage (ASV) or Back-Bias (ABB)
• Redundant Components
  – Error Detection and/or Correction
  – Time Multiplexing (Resource or Task Allocation)
  – Cannibalization (Resource Sharing)
The Cost of Design Margins

Moore’s Law
Dark Silicon
Margin Cost

“Traditional VLSI design bypasses the analysis and optimization of such non-uniform, dynamic network by approximating the problem into optimization of uniform static network with certain guard band.”

Muhammad Alam,
“Reliability- and process-variation aware design of integrated circuits”
2008 Elsevier Ltd.
Moore’s Law is Still Alive

- The cost per transistor continues to reduce
  - Or does it?

[Charts: $/mm² (normalized), mm²/transistor (normalized), and $/transistor (normalized)]


Another View...

- Lower yield will drive costs up at 20nm and below, while competition will drive costs down at 28nm and above.
The Dark Silicon Apocalypse

“Where once we would spend exponentially increasing amounts of silicon area to buy performance, now, we will spend exponentially increasing amounts of silicon area to buy energy efficiency.”


Will increasing amounts of silicon area be used to buy reliability?

Of course!

Timing Margins

• More accurate modeling of timing variation results in more efficient implementations
FinFETs Change the Landscape

Planar vs. FinFET
Multi-patterned Interconnect

Introducing the FinFET

Source: GLOBALFOUNDRIES
Planar vs. FinFET

32 nm Planar Transistors

22 nm Tri-Gate Transistors

Source: Mark Bohr, et al. (Intel) (2011)

Fins are Very Thin

• This is a TEM image of the Intel 22nm Fin
• At 14nm, Fins will be about 20 atoms wide

Source: Chipworks (2012)
The Future...

Source: Martin van den Brink, "Many ways to shrink: The right moves to 10 nanometer and beyond," ASML (2014)

Delay vs. Voltage

Source: Mark Bohr, et al. (Intel) (2011)
**Power vs. Speed**

Source: Shien-Yang Wu, et al. (TSMC), IEDM (2013)

**Leakage vs. Achieved Frequency**

- Leakage reduction from FinFET is significant

Source: Leah Schuth, (ARM) (2014)
Interconnect

22 nm Process

14 nm Process

80 nm minimum pitch

52 nm (0.65x) minimum pitch

Source: S. Natarajan, et al. (Intel), IEDM (2014)

Wire Resistance

- The RC time constant of wires is increasing substantially as line widths reduce
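As a first-order sketch of why wire RC grows with scaling: if width and thickness both shrink by the same 0.65x as the minimum pitch, resistance per unit length rises by 1/0.65², about 2.4x, and so does the RC delay of a fixed-length wire. The sketch assumes bulk copper resistivity and a constant capacitance per unit length; the 40 nm × 80 nm starting cross-section and 0.2 fF/µm capacitance are hypothetical. Real narrow wires fare worse because of surface and grain-boundary scattering and barrier layers.

```python
# First-order sketch of wire RC under a uniform 0.65x shrink.
# Assumptions (hypothetical): bulk copper resistivity, constant capacitance
# per unit length, and a 40 nm x 80 nm starting cross-section.

RHO_CU_OHM_M = 1.7e-8      # approximate bulk copper resistivity, ohm*m


def resistance_per_um(width_nm, thickness_nm, rho=RHO_CU_OHM_M):
    """Resistance of a rectangular wire per micrometre of length (ohm/um)."""
    area_m2 = (width_nm * 1e-9) * (thickness_nm * 1e-9)
    return rho * 1e-6 / area_m2


def rc_delay_ps(length_um, width_nm, thickness_nm, cap_ff_per_um=0.2):
    """Distributed RC delay of an unbuffered line, 0.5 * r * c * L^2, in ps.

    cap_ff_per_um is a hypothetical 0.2 fF/um; driver and load are ignored.
    """
    r = resistance_per_um(width_nm, thickness_nm)   # ohm/um
    c = cap_ff_per_um * 1e-15                       # F/um
    return 0.5 * r * c * length_um ** 2 * 1e12


rc_before = rc_delay_ps(100, 40, 80)
rc_after = rc_delay_ps(100, 40 * 0.65, 80 * 0.65)
print(f"100 um wire: {rc_before:.1f} ps -> {rc_after:.1f} ps "
      f"({rc_after / rc_before:.2f}x)")
```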

Cell vs. Wire Delay

• Wire delay is becoming as big as cell delay
  – We’ve heard this for years, but it’s real now


Double Patterning

• Note the wire thickness and spacing differences in the two metal patterns, A & B.

Source: ITRS 2013 EDITION: INTERCONNECT

---

On-Chip Variation

Process, Voltage & Temperature Variation
Layout Dependent Effects
OCV Modelling
Sources of Variation

- Process (transistor and wire)
- Voltage
- Layout Dependent Effects (LDE)
- Temperature

Variation: Planar vs. FinFET

- Single fin FinFETs are not used due to high variation

<table>
<thead>
<tr>
<th>Source</th>
<th>Planar</th>
<th>FinFET</th>
</tr>
</thead>
<tbody>
<tr>
<td>Random Dopant</td>
<td>✓</td>
<td>✓ (less)</td>
</tr>
<tr>
<td>Line Edge Roughness</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>Gate Edge Roughness</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>Gate Granularity</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>Fin Edge Roughness</td>
<td></td>
<td>✓</td>
</tr>
<tr>
<td>Fin Height</td>
<td>✓</td>
<td></td>
</tr>
<tr>
<td>Fin Shape</td>
<td></td>
<td>✓</td>
</tr>
</tbody>
</table>

**Variation vs. Voltage**

- Delay and variation increase as voltage decreases

![Graph showing variation vs. voltage](image)

Source: Isadore Katz, CLK DA (2014)

---

**Voltage Variation**

- Chip dynamic voltage drop based on two different operating modes

![Images showing voltage variation](image)

A. Shanmugavel, Ansys Inc. (2013)
Temperature Variation

- Thermal conduction from the channel
  - planar transistors: tends toward the substrate
  - FinFET transistors: tends toward the metal

![Package Temperature](image1.png)  ![Die Power Density](image2.png)

A. Shanmugavel, Ansys Inc. (2013)

Layout Dependent Effects

- Non-uniformities in the surrounding context of a transistor add to delay variation
OCV Modeling

• Traditional OCV
  – Percentage derate applied to clock paths
  – Plus a fixed margin added to the clock uncertainty
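A minimal sketch of how a flat OCV derate enters a setup check, following the two bullets above. The function name, the 5% late/early derates, and the timing numbers (in ns) are hypothetical, and common path pessimism removal is ignored.

```python
# Hypothetical sketch of a setup check with traditional (flat) OCV:
# a percentage derate on the clock paths plus a fixed margin added to the
# clock uncertainty. All numbers are illustrative, in ns. Common path
# pessimism removal (CPPR) is ignored.


def setup_slack_flat_ocv(period, launch_clk, data, capture_clk, setup,
                         late_derate=1.05, early_derate=0.95,
                         uncertainty=0.03, ocv_margin=0.02):
    """Setup slack with flat derates applied to the two clock paths.

    The launch clock is derated late (slower) and the capture clock early
    (faster), the pessimistic combination for a setup check.
    """
    arrival = launch_clk * late_derate + data
    required = (period + capture_clk * early_derate
                - setup - (uncertainty + ocv_margin))
    return required - arrival


nominal = setup_slack_flat_ocv(1.0, 0.4, 0.5, 0.4, 0.05,
                               late_derate=1.0, early_derate=1.0)
derated = setup_slack_flat_ocv(1.0, 0.4, 0.5, 0.4, 0.05)
print(f"slack without derate: {nominal:.3f} ns, with derate: {derated:.3f} ns")
```

Here the 40 ps of lost slack comes entirely from the two 5% clock derates. Because the percentage is flat, a deep clock path is penalized the same way as a shallow one, which is the pessimism that depth-based tables try to reduce.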

AOCV Modeling

• Advanced OCV (AOCV)
  – Tables of derate values for cells and nets indexed by path depth
  – These tables are called Stage-Based OCV (SB-OCV)
  – May include location-based (LOCV) derate tables
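One way to picture a stage-based derate table is to assume each stage has independent random variation, so a path's relative sigma shrinks as 1/√depth and deep paths get smaller derates. This is a sketch under that assumption, with a hypothetical 4% per-stage sigma and a 3σ target; production tables come from the foundry or library provider.

```python
import math
from bisect import bisect_right

# Hypothetical sketch: build a stage-based (depth-indexed) late derate table,
# assuming per-stage random variation is independent so a path's relative
# sigma shrinks as 1/sqrt(depth). Real AOCV tables come from the foundry or
# library provider and may also be indexed by distance (LOCV).


def build_sbocv_table(depths, rel_sigma_per_stage=0.04, n_sigma=3.0):
    """Map path depth -> late derate factor (> 1)."""
    return {d: 1.0 + n_sigma * rel_sigma_per_stage / math.sqrt(d)
            for d in depths}


def lookup_derate(table, depth):
    """Use the entry for the largest tabulated depth <= depth.

    Rounding down to a shallower depth picks a larger late derate, so the
    lookup is pessimistic rather than interpolated.
    """
    depths = sorted(table)
    i = bisect_right(depths, depth) - 1
    return table[depths[max(i, 0)]]


table = build_sbocv_table([1, 2, 4, 8, 16])
for depth in (1, 3, 10, 40):
    print(depth, round(lookup_derate(table, depth), 4))
```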
Stage Based-OCV Limitation

• SB-OCV is limited to:
  – One timing arc per cell
  – One load/slew point

Delay Variation vs. Load/Slew

OCV Modeling

• Liberty Variance Format (LVF)
  – Table of sigma values for cell delay, transition time, and constraints in the Liberty model
  – Derates are calculated by the timing engine based on N*sigma and path depth
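As a rough illustration of the second bullet, the sketch combines per-arc sigmas along a path assuming the stages are independent Gaussians, so variances add, and compares the result with stacking every stage at its own N·sigma point. The 20 ps mean and 1.5 ps sigma per stage are hypothetical; real timing engines also handle correlation, slew propagation, and asymmetric early/late sigmas.

```python
import math

# Hypothetical sketch of how per-arc LVF sigmas could combine along a path.
# Assumes stage delays are independent Gaussians, so variances add.


def path_late_delay(stages, n_sigma):
    """stages: list of (mean_delay_ps, sigma_ps) per arc, e.g. from LVF tables."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean + n_sigma * sigma


def worst_case_sum(stages, n_sigma):
    """Pessimistic alternative: every stage sits at its own N*sigma point."""
    return sum(m + n_sigma * s for m, s in stages)


# Hypothetical 10-stage path: 20 ps mean, 1.5 ps sigma per stage.
path = [(20.0, 1.5)] * 10
print(round(path_late_delay(path, 3.0), 2))
print(worst_case_sum(path, 3.0))
```

For this 10-stage path the statistical combination gives an effective late derate of about 1.07 (214.2 ps over a 200 ps mean), versus about 1.23 for the stacked worst case, which is why deriving derates from N·sigma and path depth can recover margin.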

[Figure: LVF table of σ values indexed by load]

Statistical Constraint Margins

- Timing Constraints also have variation
  - Usually only hold and removal constraints are margined

Constraint Margins vs. Voltage

- The variation of constraints also increases as voltage decreases

Source: Isadore Katz, CLK DA (2014)
PVT Corners

Hold must be met across the entire process range.

[Figure: process distribution marking the Fastest, On-Die Mean Fast, Typical, On-Die Mean Slow, and Slowest points, with the full-yield setup corner]

Determining N*sigma

• LVF allows users to choose N*sigma
  – Native die yield should dominate, not timing yield

<table>
<thead>
<tr>
<th>Step</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Estimate manufacturing yield</td>
<td>95% (5/100 fail)</td>
</tr>
<tr>
<td>Choose a timing yield target that is better</td>
<td>97% (3/100 fail)</td>
</tr>
<tr>
<td>Estimate the number of near-critical hold paths</td>
<td>50K (1/50K = 2/100K)</td>
</tr>
<tr>
<td>Multiply the two fractions</td>
<td>6/10M fail per path (99.99994%)</td>
</tr>
<tr>
<td>Derive N*sigma</td>
<td>about 5σ</td>
</tr>
</tbody>
</table>
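The arithmetic in the table can be reproduced with Python's standard `statistics.NormalDist`. This sketch assumes a one-sided Gaussian tail and spreads the timing yield loss evenly over the near-critical paths (the same union-bound approximation as the multiplication step); the helper names are hypothetical.

```python
from statistics import NormalDist

# Sketch of the N*sigma derivation, assuming a one-sided Gaussian tail and
# that near-critical hold paths fail independently and rarely.


def per_path_fail_budget(timing_yield, n_paths):
    """Split the allowed timing yield loss evenly across near-critical paths."""
    return (1.0 - timing_yield) / n_paths


def n_sigma_for(fail_prob):
    """Standard deviations that leave fail_prob in the one-sided tail."""
    return NormalDist().inv_cdf(1.0 - fail_prob)


p = per_path_fail_budget(0.97, 50_000)   # 3/100 spread over 50K paths
n = n_sigma_for(p)
print(f"per-path fail probability {p:.1e}, N = {n:.2f} sigma")
```

The result is about 4.9σ, which rounds to the 5σ in the table.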
“Statisticians, like artists, have the bad habit of falling in love with their models.”

-- George Box

Practical Advice

• Use the latest, most accurate variation models available
  – Consult your foundry and IP provider for guidance

• Be sure to account for non-obvious variation
  – Layout Dependent Effects
  – Wire Variation

• Consider the cost of design margins
  – Yield loss vs. power, area, and design schedule
  – Prioritize good hold margins
Random Failures

Soft Error Rate (SER)
Synchronization
Random Telegraph Noise (RTN)

The Neutron Strike

• Radiation, including high-energy neutrons from cosmic rays and alpha particles (typically emitted by packaging materials), can create ionization that upsets logic

SER of Planar vs. FinFET

- The trend is for improvement in SER
  - FinFET has 3X-10X lower SER than planar


SER of Systems

- While the SER of individual state elements is improving, overall chip SER is not
  - Flip-flops dominate chip SER in systems whose memories are protected by ECC

SER vs. Voltage in FinFET

- SER increases as voltage decreases
  - Various FinFET technologies can improve SER

![Graph showing SER vs. Voltage in FinFET technologies]


Practical Advice

- Acquire FIT rate estimates from foundries and IP providers
  - Consider the lowest operating voltage domain
- Determine which state elements contribute the most to the probability of program failure
  - The “Architectural Vulnerability Factor” (AVF)
  - Harden those state elements
- Use appropriate error detection and correction
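A minimal sketch of the AVF ranking step: weight each block's raw FIT by its size and AVF, treat ECC-protected arrays as covered, and sort. The block names, FIT/Mbit values, and AVFs are invented for illustration, and ignoring multi-bit upsets in ECC arrays is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical sketch: rank state elements by AVF-weighted soft-error FIT.
# Raw FIT/Mbit values would come from your foundry or IP provider at the
# lowest operating voltage; every number below is invented for illustration.


@dataclass
class StateBlock:
    name: str
    bits: int
    fit_per_mbit: float   # raw FIT per megabit at the chosen voltage
    avf: float            # architectural vulnerability factor, 0..1
    ecc: bool = False     # True if single-bit upsets are corrected

    def effective_fit(self) -> float:
        # Simplification: ECC-protected arrays contribute nothing here,
        # ignoring multi-bit upsets and uncorrectable patterns.
        if self.ecc:
            return 0.0
        return self.bits / 1e6 * self.fit_per_mbit * self.avf


blocks = [
    StateBlock("L2 cache SRAM", 8 * 2**20 * 8, 500.0, 0.3, ecc=True),
    StateBlock("pipeline flip-flops", 200_000, 800.0, 0.2),
    StateBlock("configuration registers", 20_000, 800.0, 0.9),
]

ranked = sorted(blocks, key=StateBlock.effective_fit, reverse=True)
total = sum(b.effective_fit() for b in blocks)
for b in ranked:
    print(f"{b.name:26s} {b.effective_fit():7.1f} FIT")
print(f"{'total':26s} {total:7.1f} FIT")
```

In this made-up example the flip-flops account for all of the remaining FIT once the SRAM is covered by ECC, and the small but high-AVF configuration registers stand out as hardening candidates.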
Clock Domain Crossing

- Synchronizers are used when data crosses between two asynchronous clock domains.
  - That means data can change during the window between the setup and hold constraints.

![Synchronizer Diagram]

Metastability

- When setup and hold constraints are violated, the signals inside the receiving flip-flop can fail to resolve within a clock period.

![Metastability Diagram]
MTBF for Synchronizers

\[ MTBF = \frac{e^{T_S/\tau}}{T_W \cdot f_d \cdot f_c \cdot n} \]

- \( T_S \) is the resolution time, which is approximately the clock period
- \( \tau \) is the resolution time constant, a function of the latch design and PVT corner
- \( T_W \) is the time window, also a function of the latch design and PVT
- \( f_d \) is the data frequency
- \( f_c \) is the clock frequency
- \( n \) is the number of synchronizers in the entire system
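Plugging numbers into the formula shows how strongly MTBF depends on τ, since it sits in the exponent. The sketch uses hypothetical values: a 1 GHz clock with about 0.9 ns left for resolution, 100 MHz data, 1000 synchronizers, T_W = 20 ps, and τ rising from 20 ps to 25 ps at a lower voltage.

```python
import math

# Sketch of the synchronizer MTBF formula, in SI units.
# tau, T_W, and the frequencies are hypothetical placeholders; real values
# come from characterizing the synchronizer cell at each PVT corner.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sync_mtbf_s(t_s, tau, t_w, f_d, f_c, n=1):
    """MTBF = exp(T_S / tau) / (T_W * f_d * f_c * n), in seconds."""
    return math.exp(t_s / tau) / (t_w * f_d * f_c * n)


for label, tau in (("nominal voltage", 20e-12), ("low voltage", 25e-12)):
    mtbf = sync_mtbf_s(t_s=0.9e-9, tau=tau, t_w=20e-12,
                       f_d=100e6, f_c=1e9, n=1000)
    print(f"{label}: tau = {tau * 1e12:.0f} ps, "
          f"MTBF = {mtbf / SECONDS_PER_YEAR:.3g} years")
```

With these placeholder numbers, the 5 ps increase in τ takes the system MTBF from centuries to under a month.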

MTBF vs. Voltage

- MTBF reduces as voltage decreases
  - MTBF also decreases as the period decreases
  - VT choice is critical
Practical Advice

- Use specially constructed synchronizer cells
  - One cell, N-stages deep
  - Choose the lowest $V_T$ and shortest $L_G$ available
- Obtain the necessary MTBF parameters for your synchronizers
  - Use the model for an average transistor on a slow die
  - Calculate the MTBF at each voltage and frequency combination
  - Account for the total number of synchronizers in the computing system

Random Telegraph Noise

- Random Telegraph Noise (RTN) is caused by the capture and emission of carriers at traps (defects) in the oxide boundary

Random Telegraph Noise

- “The static variability of the source-induced RDF is found to overwhelm the dynamic on-current fluctuation due to RTN.”

RTN = Random Telegraph Noise
RDF = Random Dopant Fluctuation


---

Aging

- BTI: Bias Temperature Instability
- HCI: Hot Carrier Injection
- EM: Electromigration
- TDDB: Time Dependent Dielectric Breakdown
### BTI: Stress and Recovery

**PMOS:**
- Negative Bias Temperature Instability (NBTI)
- G: 0 (on) → D: 1
- G: 1 (off) → D: 1 or 0

**Stress**

**NMOS:**
- Positive Bias Temperature Instability (PBTI)
- G: 1 (on) → D: 0
- G: 0 (off) → D: 0 or 1

**Natural Recovery**

**Proactive Recovery**

Based on Lin Li, “Improving the Reliability of Microprocessors Under BTI and TDDB Degradations,” University of Pittsburgh (2014)

---

### BTI: Planar vs. FinFET

- The $V_T$ shift due to PBTI is lower, and due to NBTI is higher for FinFET

![Graph showing $V_T$ shift for various nodes](image)

Kyong Taek Lee, (Samsung) IEEE (2013)
BTI: Ring Oscillator Degradation

- $V_T$ shifts due to BTI lead to larger propagation delay


---

BTI vs. Voltage

- At higher voltage, delay degradation is greater

$$\Delta V_{th} = A\, e^{-E_a/kT}\, e^{\gamma V_{gs}}\, t^{n}$$
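A sketch evaluating the BTI model as ratios, so the arbitrary prefactor A cancels. The activation energy, voltage acceleration γ, and time exponent n are hypothetical fitting parameters, not values for any particular process.

```python
import math

# Sketch of the BTI model, evaluated as ratios so A cancels:
#   dVth = A * exp(-Ea / kT) * exp(gamma * Vgs) * t^n
# ea_ev, gamma, and n defaults are hypothetical fitting parameters.

K_B_EV = 8.617e-5                  # Boltzmann constant, eV/K
YEAR_S = 365.25 * 24 * 3600


def bti_dvth(t_s, vgs, temp_c, a=1.0, ea_ev=0.1, gamma=3.0, n=0.2):
    temp_k = temp_c + 273.15
    return (a * math.exp(-ea_ev / (K_B_EV * temp_k))
            * math.exp(gamma * vgs) * t_s ** n)


base = bti_dvth(YEAR_S, 0.8, 105)
print("10x stress time:", round(bti_dvth(10 * YEAR_S, 0.8, 105) / base, 3))
print("+100 mV on Vgs :", round(bti_dvth(YEAR_S, 0.9, 105) / base, 3))
print("125 C vs 105 C :", round(bti_dvth(YEAR_S, 0.8, 125) / base, 3))
```

With n = 0.2, ten times the stress time increases ΔV_th by only about 1.6x, while 100 mV of extra gate bias increases it by exp(0.3), about 1.35x.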

BTI and Variation

- Process variation remains a normal distribution after aging

[Graph showing normal distribution of threshold voltage before and after aging]

D. Angot, et al., IEDM (2013)

BTI: Stress vs. Relaxation

- Partial healing during relaxation leads to delay degradation that is state dependent

[Graph showing normalized degradation over time for PMOS NBTI at 125°C]

Kyong Taek Lee, (Samsung) IEEE (2013)
BTI and Duty Cycle

**Arbitrary Switching**

![Graph showing degradation vs. duty factor (100%, 75%, 50%, 25%, 5%) and duty cycle](image)

**Source:** Haldun Kufluoglu, “MOSFET Degradation due to Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) and its Implications for Reliability-Aware VLSI Design,” Purdue University (2007)

---

Path Rank Analysis

- Rank paths in fresh and aged design, sorted by slack
- Non-critical paths can become critical, and vice versa

<table>
<thead>
<tr>
<th>Path rank</th>
<th>% Timing Degradation</th>
<th>Path rank</th>
<th>% Timing Degradation</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Fresh (Dhrystone)</td>
<td>Aged (Dhrystone)</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>14084</td>
<td>179394</td>
<td>15.61</td>
</tr>
<tr>
<td>2</td>
<td>9781</td>
<td>145042</td>
<td>15.41</td>
</tr>
<tr>
<td>3</td>
<td>9329</td>
<td>134419</td>
<td>15.18</td>
</tr>
<tr>
<td>4</td>
<td>12345</td>
<td>1413427</td>
<td>17.57</td>
</tr>
<tr>
<td>5</td>
<td>6220</td>
<td>272323</td>
<td>15.67</td>
</tr>
<tr>
<td>6</td>
<td>36672</td>
<td>224034</td>
<td>15.46</td>
</tr>
<tr>
<td>7</td>
<td>7771</td>
<td>331934</td>
<td>15.76</td>
</tr>
<tr>
<td>8</td>
<td>11580</td>
<td>275422</td>
<td>15.56</td>
</tr>
<tr>
<td>9</td>
<td>28975</td>
<td>481425</td>
<td>16.06</td>
</tr>
<tr>
<td>10</td>
<td>20054</td>
<td>208561</td>
<td>15.24</td>
</tr>
</tbody>
</table>

**Source:** Vikas Chandra, et al. (ARM), “Workload dependent NBTI and PBTI analysis for a sub-45nm commercial microprocessor,” IEEE (2013)
Hold Time with Gated Clocks

• Since the rising edge of a gated clock spends most of its time in recovery, it does not significantly age
• The rising edge of an un-gated clock does age
• The resulting skew between gated and un-gated clock branches can lead to hold failures over time

Practical Advice (1/2)

• Set critical range to at least 10% of the clock period
  – Prevents area and leakage recovery from creating setup paths that will age to become critical
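A simplified check of the critical-range advice, assuming every path slows by the same fraction with age (real degradation is workload and path dependent). It flags paths that sit outside the critical range, and so could have been slowed by area or leakage recovery, yet fail after aging. The path names and delays are hypothetical.

```python
# Hypothetical sketch: uniform aging applied to setup paths, checking which
# paths sit outside the critical range (so recovery could have slowed them)
# yet fail once aged. Delays are in ns; names and numbers are illustrative.


def at_risk(paths, period, degradation, critical_range_frac):
    """Return names of paths outside the critical range that fail after aging."""
    risky = []
    for name, delay in paths:
        fresh_slack = period - delay
        aged_slack = period - delay * (1.0 + degradation)
        outside_range = fresh_slack > critical_range_frac * period
        if outside_range and aged_slack < 0:
            risky.append(name)
    return risky


paths = [("p1", 0.95), ("p2", 0.91), ("p3", 0.85), ("p4", 0.60)]
print(at_risk(paths, period=1.0, degradation=0.08, critical_range_frac=0.10))
print(at_risk(paths, period=1.0, degradation=0.08, critical_range_frac=0.03))
```

A path just outside a critical range r has delay (1 - r)·T and fails after uniform degradation d when (1 - r)(1 + d) > 1, that is when r < d/(1 + d). For 8% degradation that bound is about 7.4% of the period, so a 10% range covers it while 3% does not.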
Practical Advice (2/2)

- Most gated clocks should connect directly to sequential elements
  - Identify cases of clock tree stages beyond the gated clock and add extra hold margin
- Advocate with EDA vendors for static timing with aging based on time and switching activity
  - The issue of determining an appropriate switching activity remains

Hot Carrier Injection

\[ \Delta V_{th} \approx L_{eff} \times \left[ t \times \frac{I_d}{W} \times \left( \frac{I_{sub}}{I_d} \right)^m \right]^n e^{\left( \frac{-E_a}{kT} \right)} \]

HCI: Planar vs. FinFET

- HCI in FinFETs is less severe than in planar transistors

![Graphs showing HCI comparison between FinFET and planar transistors](image)


Practical Advice

- Limit the maximum transition time to reduce degradation due to HCI
  - This will ensure HCI has less effect than BTI
- Advocate with EDA vendors for static timing with aging due to HCI
  - Table based on input transition, output load, and switching activity
**Electromigration**

\[ MTTF = A j^{-n} e^{\frac{Q}{kT}} \]  
(Black’s Equation)

- Resistance changes as dislocations form
  - Failure criteria are specified as a given resistance shift in a percentage of samples
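Used as a ratio, Black's equation shows how current density and temperature trade against EM lifetime without knowing A. The exponent n = 2 and activation energy Q = 0.9 eV are placeholder values; use the EM model from your foundry.

```python
import math

# Sketch of Black's equation used as a ratio, so the prefactor A cancels:
#   MTTF2 / MTTF1 = (j2 / j1)^(-n) * exp(Q/k * (1/T2 - 1/T1))
# n = 2 and Q = 0.9 eV are placeholder values, not foundry data.

K_B_EV = 8.617e-5  # Boltzmann constant, eV/K


def black_mttf_ratio(j1, t1_c, j2, t2_c, n=2.0, q_ev=0.9):
    """MTTF at (j2, T2) relative to MTTF at (j1, T1); temperatures in C."""
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    return (j2 / j1) ** (-n) * math.exp(q_ev / K_B_EV * (1.0 / t2_k - 1.0 / t1_k))


print("2x current density:", black_mttf_ratio(1.0, 105, 2.0, 105))
print("105 C -> 125 C    :", round(black_mttf_ratio(1.0, 105, 1.0, 125), 3))
```

With these placeholders, doubling current density and running 20°C hotter (105°C to 125°C) each cut MTTF by roughly 4x.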


**EM: Resistance vs. Time**


**EM Trend**

- With stronger FinFET transistors and thinner interconnect, EM is becoming critical.

**EM and FinFETs**

- Self-heating in FinFETs may worsen EM in surrounding wires
  - Self-heating shows up as a sensitivity of switching-aging degradation to fin or gate count

---

http://semimd.com/blog/tag/rram/
Practical Advice

- Take advantage of the Blech effect whenever possible, especially in VIA stacks
- Build and verify power grids early
  - Limit placement density in areas with high power
- Limit wire length and maximum transition times to reduce RMS EM violations on signals
- Follow IP provider guidelines for cell level EM compliance
- Carefully balance operating temperatures and MTBF due to EM
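The Blech effect means a short segment whose current density × length product stays below a critical value does not suffer EM, because mechanical back-stress balances the electron wind. A minimal screen, assuming a placeholder critical product of 3000 A/cm (not a foundry value) and a hypothetical 50 nm × 100 nm wire:

```python
# Hypothetical Blech-length screen. JL_CRIT_A_PER_CM is a placeholder,
# not a foundry value; real limits depend on metal level and process.

JL_CRIT_A_PER_CM = 3000.0   # placeholder critical (j * L) product, A/cm
UM_TO_CM = 1e-4


def current_density_a_per_cm2(current_a, width_um, thickness_um):
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return current_a / area_cm2


def blech_immune(current_a, width_um, thickness_um, length_um,
                 jl_crit=JL_CRIT_A_PER_CM):
    """True if the segment's j * L product is below the critical value."""
    j = current_density_a_per_cm2(current_a, width_um, thickness_um)
    return j * length_um * UM_TO_CM < jl_crit


def max_immune_length_um(current_a, width_um, thickness_um,
                         jl_crit=JL_CRIT_A_PER_CM):
    """Longest segment that stays below the critical product, in um."""
    j = current_density_a_per_cm2(current_a, width_um, thickness_um)
    return jl_crit / j / UM_TO_CM


print(f"{max_immune_length_um(1e-4, 0.05, 0.1):.1f} um")
for length in (10.0, 20.0):
    print(length, blech_immune(1e-4, 0.05, 0.1, length))
```

At 0.1 mA in this cross-section (2 MA/cm²), segments shorter than about 15 µm would be treated as immune under the placeholder limit, which is why short via stacks are good candidates.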

Time Dependent Dielectric Breakdown

\[ t_{BD} = A_0 e^{-\beta V} e^{\left(\frac{E_a}{kT}\right)} \]

Source: ITRS 2013 EDITION: INTERCONNECT
**TDDB:**

- Electric fields are highest at wire tips
  - Line edge roughness contributes on wire sides

Fig. 6. (a) Planar SEM view of finger test structure (b) Electric field simulation of finger test structure (c) Planar SEM view of comb structure (corner) (d) Electric field simulation of comb structure (corner) (e) Planar SEM view of comb structure (body) (f) Electric field simulation of comb structure (body)

Source: Ong, IEEE IPFA (2012)

---

**TDDB vs. Spacing**

- TDDB is becoming a concern

## Summary

The Balancing Act

---

### The Balancing Act: Temperature

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Temperature</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td>*</td>
</tr>
<tr>
<td>Power</td>
<td>*</td>
</tr>
<tr>
<td>Variation</td>
<td>*</td>
</tr>
<tr>
<td>SER</td>
<td>*</td>
</tr>
<tr>
<td>Sync. MTBF</td>
<td>*</td>
</tr>
<tr>
<td>BTI/HCI</td>
<td>*</td>
</tr>
<tr>
<td>EM</td>
<td>*</td>
</tr>
<tr>
<td>TDDB</td>
<td>*</td>
</tr>
</tbody>
</table>

*Depends on voltage*
The Balancing Act: Voltage

Welcome to the Frontier