

## Physical-Aware, High-Capacity RTL Synthesis for Advanced Nanometer Designs

Sanjiv Taneja, Vice President, Product Engineering, Cadence Design Systems

ISQED 2013, March 4-6, Santa Clara, CA



## Agenda

1. Market trends and challenges

2. Physical effects of interconnect and congestion

3. Physical-aware RTL synthesis

4. Hierarchical flow

5. Summary



## Semiconductors – at the heart of the next technology wave



By 2020, there will be over 10 billion mobile internet devices, and at the core of each is a specialized semiconductor.

- Tablets
- Smartphones
- MP3 players
- Gaming devices
- Car electronics
- Mobile video
- Home entertainment
- Wireless appliances

Source: Morgan Stanley

#### cādence<sup>®</sup>

## SoC Design Challenges

| Metric                  | Component | 65nm                                        | 40nm       | 28nm            | 20nm                 | 14nm                               |
|-------------------------|-----------|---------------------------------------------|------------|-----------------|----------------------|------------------------------------|
| Performance             |           | 0.5-1 GHz                                   | 1-2 GHz    | 1.5-3 GHz       | 2-5 GHz              | >5 GHz                             |
| Design size (instances) |           | 10M                                         | 20M        | 50M             | 100M+                | 200M+                              |
| Power density (W/cm²)   | Dynamic   | 100                                         | 180        | 250             | 425                  | 650                                |
|                         | Leakage   | 50                                          | 120        | 250             | 425                  | 650                                |
| Mixed-signal content    |           | Increasing mixed-signal content in all SoCs |            |                 |                      |                                    |
| DFM                     |           | DRC                                         | DRC, Litho | DRC, Litho, LDE | DRC, Litho, LDE, DPT | DRC, Litho, LDE, DPT, FinFET, etc. |

Source: IBS



## Physical Interconnect Modeling Challenge: Impact of Physical Effects

| Metric                                           | 65nm                                                                        | 40nm    | 28nm      | 20nm                                                     | 14nm   |
|--------------------------------------------------|-----------------------------------------------------------------------------|---------|-----------|----------------------------------------------------------|--------|
| Performance                                      | 0.5-1 GHz                                                                   | 1-2 GHz | 1.5-3 GHz | 2-5 GHz                                                  | >5 GHz |
| Design size (instances)                          | 10M                                                                         | 20M     | 50M       | 100M+                                                    | 200M+  |
| Physical effects for interconnect delay modeling | Routing topology / detoured nets<br>Coupling capacitance / slew degradation |         |           | Resistance estimation<br>Layer awareness / via effects   |        |

**Layer assignment:** At 20nm, resistance per unit length ranges from 20 Ω/µm (M1-M5) to 8 Ω/µm (M6-M7) and down to 0.05 Ω/µm (M8).
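To show why layer assignment matters for delay, the sketch below computes the Elmore delay of a uniform distributed RC wire, 0.5 · R_total · C_total, for each resistance value quoted above. The wire capacitance per micron (`ASSUMED_C_FF_PER_UM`) is a hypothetical placeholder, not a figure from the slide. Driver resistance and sink load are ignored so that only the wire term is visible.

```python
# Per-layer resistance at 20nm in ohm/um (values quoted on the slide).
R_OHM_PER_UM = {"M1-M5": 20.0, "M6-M7": 8.0, "M8": 0.05}

# Hypothetical wire capacitance per unit length in fF/um (NOT from the slide).
ASSUMED_C_FF_PER_UM = 0.2


def wire_delay_ps(r_ohm_per_um: float, c_ff_per_um: float, length_um: float) -> float:
    """Elmore delay of a uniform distributed RC line: 0.5 * R_total * C_total.

    Driver resistance and sink load are deliberately ignored so that only
    the wire's own contribution is shown.
    """
    r_total_ohm = r_ohm_per_um * length_um
    c_total_f = c_ff_per_um * length_um * 1e-15   # fF -> F
    return 0.5 * r_total_ohm * c_total_f * 1e12   # s -> ps


if __name__ == "__main__":
    for layer, r in R_OHM_PER_UM.items():
        d = wire_delay_ps(r, ASSUMED_C_FF_PER_UM, 1000.0)
        print(f"{layer}: {d:.1f} ps for a 1 mm wire")
```

Under these assumptions, a 1 mm wire on M1-M5 has about 400x the intrinsic RC delay of the same wire on M8. An estimator that ignores layer assignment can therefore be far off for long critical nets.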

**Via effect:** At 28nm, via resistance can cause wire resistance to increase by 2x from one routing topology to another.

Source: IBS
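The via effect can be illustrated with a simple series-resistance model. In this model, a net's resistance is the sum of its wire segments plus one via resistance for each via the route uses. All numbers below are hypothetical, including the segment lengths and `VIA_OHM`. They were chosen only to show how two routes of equal wirelength can differ substantially once vias are counted.

```python
# Hypothetical per-via resistance in ohms (NOT from the slide).
VIA_OHM = 200.0


def net_resistance_ohm(segments, n_vias, via_ohm=VIA_OHM):
    """Series resistance of a routed net.

    segments: list of (length_um, ohm_per_um) tuples, one per wire segment.
    n_vias:   number of vias the route passes through.
    """
    wire = sum(length_um * r for length_um, r in segments)
    return wire + n_vias * via_ohm


if __name__ == "__main__":
    # 200 um of lower-layer wire (20 ohm/um, the M1-M5 figure above), two topologies.
    direct = net_resistance_ohm([(200.0, 20.0)], n_vias=2)
    detoured = net_resistance_ohm([(100.0, 20.0), (100.0, 20.0)], n_vias=20)
    print(direct, detoured, detoured / direct)
```

With these made-up values, the extra vias alone make the second route about 1.8x more resistive. The 2x figure on the slide comes from IBS data, not from this toy model.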


## GigaScale Design Closure Challenges



- Floorplan complexity: 1000+ hard IPs, 1000+ pins
- Block closure complexity: 10-30 MMMC (multi-mode, multi-corner) views
- Hierarchical assembly complexity: 50-100M instances, 10-25 blocks



## **Congestion Challenge**

Impact of Physical Effects

- Congestion due to poor floorplan
  - Adjusting floorplan can be the solution
    - Macro and port placement

- Congestion due to netlist structure
  - Cannot be fixed in physical implementation, or fixing it there costs PPA(\*)
  - Needs re-synthesis for best PPA and convergence in physical implementation

(\*) PPA = Performance, Power, Area

High congestion structures:

- Cross bars
- Barrel shifters
- Memory connected Mux chains
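A common way to approximate routing demand before routing is to spread each net's expected wirelength evenly over its bounding box. This is a RUDY-style estimate (Rectangular Uniform wire DensitY). The sketch below applies it on a coarse grid of routing cells. It is not the estimator used by any particular synthesis tool. It only shows why structures such as crossbars, where many nets span the same region, overflow the available tracks. The grid size, capacity and net list are hypothetical.

```python
def estimate_congestion(nets, width, height, capacity):
    """RUDY-style demand estimate on a width x height grid of routing cells.

    nets:     list of nets, each a list of (x, y) pin locations in grid units.
    capacity: routing tracks available per grid cell (assumed uniform).
    Returns a height x width map of demand / capacity.
    """
    demand = [[0.0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        hpwl = (x1 - x0) + (y1 - y0)              # half-perimeter wirelength
        n_cells = (x1 - x0 + 1) * (y1 - y0 + 1)   # grid cells in the bounding box
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                demand[y][x] += hpwl / n_cells    # spread demand uniformly
    return [[d / capacity for d in row] for row in demand]


def overflowed(ratio_map, threshold=1.0):
    """Grid cells whose estimated demand exceeds capacity."""
    return [(x, y) for y, row in enumerate(ratio_map)
            for x, r in enumerate(row) if r > threshold]


if __name__ == "__main__":
    # Four crossbar-like nets, all spanning row 0 of a 4 x 2 grid.
    bus = [[(0, 0), (3, 0)] for _ in range(4)]
    ratios = estimate_congestion(bus, width=4, height=2, capacity=2)
    print(overflowed(ratios))  # every cell in row 0 is over capacity
```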


## Need physically aware high-capacity synthesis to bridge the gap



8 © 2013 Cadence Design Systems, Inc. All rights reserved.


## Physical Interconnect Modeling Challenge: Impact of Physical Effects



#### Nets Dominate

The logical-physical gap remains

Band-aids include:

- Over-design
- Multiple iterations

Root cause not addressed in deployed solutions

Need physical awareness in an actionable logic design context


## Successive refinement of wire modeling

Multiple wire abstractions





#### Physical Layout Estimation (PLE): Wireload Model Replacement Improves Netlist Creation



- What is PLE?
  - A physical modeling technique that captures the behavior of the timing-closure P&R tool for use in RTL synthesis optimization
    - Result: better timing-power-area balance
- Uses actual design and physical library information
- Dynamically adapts to changing logic structures in the design
- Same runtime as a wireload model (WLM)

Models short wires well; these make up 80-90% of the wires in a design

Improves QoS and predictability over WLM
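For contrast, a wireload model estimates wire length from fanout alone. The sketch below mimics a Liberty-style fanout-to-length table with slope extrapolation. Interpolation between entries is simplified to linear. It also computes half-perimeter wirelength (HPWL) from pin locations for comparison. This is not an implementation of PLE, whose internals are not described here. It only shows the limitation PLE addresses: two nets with the same fanout get the same WLM estimate no matter where their pins are. The table values and pin coordinates are hypothetical.

```python
from bisect import bisect_left


def wlm_length(fanout, table, slope):
    """Wireload-model length estimate.

    table: {fanout: length} entries; slope extrapolates beyond the largest fanout.
    Between entries the length is linearly interpolated; below the smallest
    entry the first value is used (a simplification).
    """
    fanouts = sorted(table)
    if fanout <= fanouts[0]:
        return table[fanouts[0]]
    if fanout >= fanouts[-1]:
        return table[fanouts[-1]] + slope * (fanout - fanouts[-1])
    i = bisect_left(fanouts, fanout)
    lo, hi = fanouts[i - 1], fanouts[i]
    t = (fanout - lo) / (hi - lo)
    return table[lo] + t * (table[hi] - table[lo])


def hpwl(pins):
    """Half-perimeter wirelength of a net from its (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


if __name__ == "__main__":
    table = {1: 10.0, 2: 18.0, 4: 30.0}        # hypothetical lengths in um
    local_net = [(0, 0), (5, 0), (0, 5)]       # driver + 2 sinks, close together
    global_net = [(0, 0), (400, 0), (0, 300)]  # driver + 2 sinks, far apart
    print(wlm_length(2, table, slope=6.0))     # one estimate for both nets
    print(hpwl(local_net), hpwl(global_net))   # placement tells them apart
```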


### Congestion Optimization Techniques: Produce a Cleaner Design to Begin Physical Implementation

#### Morphing:

- Incrementally estimates and optimizes congestion
- Uses native, real-time congestion estimation

#### **Global whitespace distribution:**

- Redistributes whitespace around placed instances to reduce pin-access problems
- The amount of whitespace is calculated from:
  - Instance pin density, local congestion severity, and global interconnect
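The slide names three inputs to the whitespace calculation but does not say how they are combined. The sketch below assumes a simple weighted blend for illustration. The weights, the normalization of each input to [0, 1] and the 40% cap are hypothetical choices, not the tool's formula.

```python
def whitespace_fraction(pin_density, local_congestion, global_interconnect,
                        weights=(0.3, 0.5, 0.2), max_fraction=0.4):
    """Hypothetical padding rule for one placed instance.

    All three inputs are assumed normalized to [0, 1]; the weights and the
    cap on extra whitespace (max_fraction of cell width) are illustrative only.
    """
    w_pin, w_local, w_global = weights
    score = (w_pin * pin_density + w_local * local_congestion
             + w_global * global_interconnect)
    score = min(max(score, 0.0), 1.0)  # keep the blended score in [0, 1]
    return max_fraction * score


def padded_width(cell_width, **factors):
    """Cell width after adding the hypothetical whitespace computed above."""
    return cell_width * (1.0 + whitespace_fraction(**factors))


if __name__ == "__main__":
    # A pin-dense cell in a congested region gets more room than a quiet one.
    print(padded_width(1.0, pin_density=0.9, local_congestion=0.8, global_interconnect=0.5))
    print(padded_width(1.0, pin_density=0.1, local_congestion=0.0, global_interconnect=0.1))
```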

#### **Congestion & placement aware IOpt:**

- Dynamically estimate congestion for every move
- Incrementally place new gates to reduce congestion
- Structuring and cell selection is congestion-aware and placement-aware





## **Congestion-Driven DFT Logic Placement**

- Challenges:
  - Because of dense connectivity, standard placement algorithms can "clump" compression logic, causing local routing congestion
- Solution:
  - Specialized DFT-aware, congestion-driven placement algorithms can mitigate congestion without disturbing signal-path placement





### Physical-Aware Scan Chain Optimization: Superior Results

- Challenges:
  - Lack of physical information
  - Scan chain congestion
  - Impact on timing/SI

- Solution:
  - Scan chain built using DEF physical information
  - Proven to reduce scan wire congestion by 40%
  - Improved balancing
  - Shorter physical scan chains, reducing area and wire congestion
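A simple way to use placement data when stitching a scan chain is a greedy nearest-neighbor ordering over flop coordinates. The sketch assumes the coordinates have already been read from DEF into a dictionary. Flop names and locations are hypothetical. Real scan-chain tools also honor clock domains, lockup latches and chain-length balancing, which this sketch ignores.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order, loc):
    """Total Manhattan wirelength of a scan chain visiting flops in `order`."""
    return sum(manhattan(loc[a], loc[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor_order(loc, start):
    """Greedy placement-aware ordering: always stitch the closest unvisited flop.

    Ties are broken by flop name so the result is deterministic.
    """
    remaining = set(loc) - {start}
    order = [start]
    while remaining:
        here = loc[order[-1]]
        nxt = min(remaining, key=lambda f: (manhattan(loc[f], here), f))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Hypothetical flop coordinates (um), e.g. as read from a DEF COMPONENTS section.
    loc = {"f0": (0, 0), "f1": (100, 0), "f2": (10, 0), "f3": (110, 0)}
    netlist_order = ["f0", "f1", "f2", "f3"]
    placed_order = nearest_neighbor_order(loc, "f0")
    print(netlist_order, chain_length(netlist_order, loc))  # 290 um
    print(placed_order, chain_length(placed_order, loc))    # 110 um
```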





## Synthesis capabilities for advanced nodes

| Capability                                        | Description                                                                                                                                                                     |
|---------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Leakage optimization (multi-Vt) for advanced nodes | <ul> <li>Enhanced multi-Vt cell selection during global mapping</li> <li>Leakage power is more significant at 28nm and below</li> </ul>                                      |
| Improved slew degradation estimation              | <ul> <li>Enhanced slew degradation estimation in RC timing analysis</li> <li>Multi-threaded RC timing calculation for fast runtime with accuracy</li> </ul>                     |
| Layer assignment estimation and modeling          | <ul> <li>Estimates layer assignments for critical nets</li> <li>Passes layer assignment assumptions forward to physical implementation to ensure convergence</li> </ul>         |
| Advanced OCV (on-chip variation) support          | <ul> <li>Logic-depth-based cell delay variation (derating)</li> <li>More accurate than plain OCV, which applies a common derating value to all cells</li> </ul>                 |
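Depth-based derating can be sketched as a table lookup. The derate shrinks as logic depth grows, because random cell-to-cell variation partially cancels along a long path. The derate values below are hypothetical. The comparison uses a flat OCV derate equal to the depth-1 entry. Real AOCV tables are often also indexed by distance; this sketch uses depth only, as the slide describes.

```python
# Hypothetical late-derate table indexed by logic depth (NOT real library data).
AOCV_LATE_DERATE = {1: 1.15, 2: 1.12, 4: 1.09, 8: 1.06, 16: 1.04}

FLAT_OCV_LATE_DERATE = 1.15  # one value for every cell: the depth-1 worst case


def aocv_derate(depth, table=AOCV_LATE_DERATE):
    """Derate for the largest tabulated depth not exceeding `depth`.

    Stepping down to a shallower entry is conservative because the table
    decreases with depth.
    """
    eligible = [d for d in sorted(table) if d <= depth]
    return table[eligible[-1]] if eligible else table[min(table)]


def late_path_delay(cell_delays_ps, derate):
    """Apply one late derate to every cell delay on the path."""
    return sum(cell_delays_ps) * derate


if __name__ == "__main__":
    path = [20.0] * 8  # eight cells of 20 ps each
    print("flat OCV:", late_path_delay(path, FLAT_OCV_LATE_DERATE))
    print("AOCV:    ", late_path_delay(path, aocv_derate(len(path))))
```

With these assumed values, the depth-8 path is derated by 6% instead of 15%. This removes pessimism that a single flat derate would add to deep paths.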



## Evolution of Design Flow for Better Timing Closure



- Physical aware logic synthesis
  - Incremental congestion prevention
  - Structural datapath support
  - Physical aware clock gating
  - Physical aware logic structuring
  - Physical aware mapping

### Physical Aware Structuring Minimizes Congestion & Improves Timing

- Target high congestion structures
  - Cross bars
  - Barrel shifters
  - Memory connected Mux chains






- Physical-aware trade-off mux structuring
- Physical-aware tree rebalancing (huge number of inputs)
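Tree rebalancing for very wide inputs can be illustrated with an OR reduction. A naive chain grows linearly in depth, while a balanced tree grows logarithmically. A physically aware variant can also choose which inputs to pair based on their locations. The sketch below is illustrative only. The tuple-based netlist representation and the 1-D positions are hypothetical simplifications.

```python
def build_balanced(leaves):
    """Pair adjacent signals level by level into 2-input OR nodes.

    Depth is ceil(log2(n)) for n leaves; an odd leftover is promoted as-is.
    """
    level = list(leaves)
    while len(level) > 1:
        nxt = [("OR", level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def build_chain(leaves):
    """Linear OR chain, as a naive translation of the RTL might produce."""
    node = leaves[0]
    for leaf in leaves[1:]:
        node = ("OR", node, leaf)
    return node


def depth(node):
    """Number of 2-input gate levels from the deepest leaf to the root."""
    if isinstance(node, tuple):
        return 1 + max(depth(node[1]), depth(node[2]))
    return 0


def first_level_wire(leaves, pos):
    """Sum of distances between leaves paired at the first tree level (1-D positions)."""
    return sum(abs(pos[a] - pos[b]) for a, b in zip(leaves[0::2], leaves[1::2]))


if __name__ == "__main__":
    inputs = [f"in{k}" for k in range(256)]
    print(depth(build_chain(inputs)), depth(build_balanced(inputs)))  # 255 8

    # Physical awareness: pair inputs that sit near each other.
    pos = {"a": 0, "b": 100, "c": 1, "d": 101}  # hypothetical x-coordinates in um
    logical = ["a", "b", "c", "d"]
    placed = sorted(logical, key=pos.get)       # ["a", "c", "b", "d"]
    print(first_level_wire(logical, pos), first_level_wire(placed, pos))  # 200 2
```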




### Physical-Aware Restructuring: Improved Timing & Cleaner Placement

- Targets OR trees with shared subfunctions
  - Example shown: four inputs on four sides of the module, eight 256-bit OR trees





#### Physical-Aware Mapping: Optimized Timing with Increased Correlation

- Initial map is purely logical
  - "Logic schematic" with wire estimation
  - Long wires not predictable
  - Enables initial placement



- Mapping with placement knows register locations
  - Automatic path requirement adjustment
  - Mapping with respect to long-wire-aware timing
  - Logic "squeezed" to meet timing
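Once register locations are known, part of each path's timing requirement can be reserved for the wire between the launch and capture registers. The remainder is what mapping must meet with logic. The sketch assumes buffered long wires have roughly linear delay. The 0.1 ps/µm rate and the 30 ps setup time are hypothetical numbers. This is not a description of the tool's internal adjustment.

```python
def manhattan_um(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def logic_budget_ps(clock_period_ps, launch_xy, capture_xy,
                    wire_ps_per_um=0.1, setup_ps=30.0):
    """Time left for logic after reserving long-wire delay between two registers.

    Assumes (hypothetically) that an optimally buffered long wire has a delay
    roughly linear in length, wire_ps_per_um; both defaults are illustrative.
    A negative result means the logic must be "squeezed" or the registers moved.
    """
    wire_ps = manhattan_um(launch_xy, capture_xy) * wire_ps_per_um
    return clock_period_ps - setup_ps - wire_ps


if __name__ == "__main__":
    # Without register locations, no long-wire delay is reserved.
    print(logic_budget_ps(1000.0, (0, 0), (0, 0)))
    # With placement: registers 2 mm apart consume part of the cycle.
    print(logic_budget_ps(1000.0, (0, 0), (1500, 500)))
```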





## **Hierarchical Flow**

## Partitioning/Prototyping

- Generate partitions from the top-level design
  - Logical and physical partitions
- Create timing budgets for individual partitions
- Optimize each partition
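Timing budgets for partitions can be derived in many ways. The minimal sketch below assumes a proportional policy. Each segment of a cross-partition path receives a share of the clock period proportional to its estimated delay, so every segment sees the same relative slack. Partition names and delay estimates are hypothetical.

```python
def budget_path(clock_period_ps, segment_delays_ps):
    """Split one cross-partition path's clock period among its segments.

    Each partition (or the top-level glue) receives time in proportion to its
    estimated delay. This is one simple budgeting policy, assumed here for
    illustration.
    """
    total = sum(segment_delays_ps.values())
    return {name: clock_period_ps * d / total for name, d in segment_delays_ps.items()}


if __name__ == "__main__":
    # Hypothetical path: launch in partition A, cross the top level, capture in B.
    estimates = {"A": 300.0, "top": 100.0, "B": 400.0}
    print(budget_path(1000.0, estimates))  # {'A': 375.0, 'top': 125.0, 'B': 500.0}
```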

## Assembling the top-level design

- Assemble partitions together
- Various models can be used to represent partitions
  - Lib/LEF
  - Full netlist/DEF/SPEF
  - ILMs (interface logic models)

## Performing top-level timing closure

- Interface logic optimization




## Benefits of using ILMs

- Highly accurate representations of the original design
  - ILMs do not abstract; they simply discard what is not required for modeling boundary timing
- Small memory footprint and runtime
  - Up to 90% of the logic can be discarded
- Can be adapted easily to any stage of the design process
- Easy to replace one ILM with another
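The claim that an ILM discards rather than abstracts can be shown on a toy netlist graph. The model keeps only the logic between the ports and the first or last register stage. The graph format and cell names below are hypothetical. Real ILM generation also handles cases such as the clock network, which this sketch omits.

```python
from collections import deque


def interface_logic(fanout, inputs, outputs, registers):
    """Cells kept in a simplified interface logic model (ILM).

    Keeps the forward cone from input ports up to and including the first
    register stage, and the backward cone from output ports back to the last
    register stage. Internal register-to-register logic is discarded.
    Clock-network handling, which real ILMs also retain, is omitted.
    """
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    def cone(starts, graph):
        keep, queue = set(), deque(starts)
        while queue:
            node = queue.popleft()
            if node in keep:
                continue
            keep.add(node)
            if node in registers:
                continue  # stop at the register boundary
            queue.extend(graph.get(node, []))
        return keep

    return cone(inputs, fanout) | cone(outputs, fanin)


if __name__ == "__main__":
    # in1 -> g1 -> r1 -> g2 -> r2 -> g3 -> out1, plus internal logic r1 -> g4 -> r2
    fanout = {
        "in1": ["g1"], "g1": ["r1"],
        "r1": ["g2", "g4"], "g2": ["r2"], "g4": ["r2"],
        "r2": ["g3"], "g3": ["out1"],
    }
    kept = interface_logic(fanout, ["in1"], ["out1"], {"r1", "r2"})
    print(sorted(kept))  # ['g1', 'g3', 'in1', 'out1', 'r1', 'r2']
```

The internal gates `g2` and `g4` are dropped because they lie entirely between registers and do not affect boundary timing.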





## Summary

- Advanced technology nodes pose new challenges for RTL Synthesis
- Physical effects of interconnect and congestion need to be modeled at RTL for a convergent, predictable flow
- Logic structuring and global mapping need to be physically aware to generate layout-friendly netlists
- Hierarchical methodology needs to evolve to manage complexity while providing accurate modeling in a bottom-up flow
