# EE249 Lab Graphical System Design

October 4, 2012 Hugo A. Andrade, Kaushik Ravindran, Jeff C. Jensen

# **National Instruments**

- More than 50 international branches in over 45 countries
- Corporate headquarters in Austin, TX
- 6,400+ employees
- More than 1,000 products



### **National Instruments**

Offering graphical system design solutions to the Test and Measurement and Industrial/Embedded markets





### What We Do

National Instruments equips engineers and scientists with tools that accelerate productivity, innovation, and discovery





### **Diversity of Customers**



- Top 100 customers ≈ 35% of revenue
- More than 30,000 customers in more than 90 countries
- 95% of Fortune 500 manufacturing companies have adopted Virtual Instrumentation



## **Diversity of Applications**

#### No Industry >15% of Revenue in 2011



- Academic
- Advanced Research
- Automotive
- Big Physics
- Consumer Electronics
- Defense/Aerospace
- Energy
- Life Sciences
- Mobile Devices
- Semiconductors



# **Technology Overview**



# **NAE: Engineering Grand Challenges**



- Advance health informatics
- Engineer better medicines
- Develop carbon sequestration methods
- Secure cyberspace
- Engineer the tools of scientific discovery
- Provide access to clean water
- Advance personalized learning
- Manage the nitrogen cycle
- Reverse-engineer the brain
- Enhance virtual reality
- Make solar energy economical
- Provide energy from fusion
- Restore and improve urban infrastructure
- Prevent nuclear terror

#### http://www.engineeringchallenges.org/



ni.com

# **Build Better Systems Faster**



- Better Integration
- Higher Performance
- Lower Costs

We equip engineers and scientists with the tools that accelerate productivity, innovation, and discovery.

### The Traditional Approach to Automated Test



Source: Agilent, Keithley, and Nicolet



### The Customer Decision: Build or Buy in Embedded



### **Build**

- Custom HW/SW solution
- Long lead times for new products
- Significant resource requirements



### Buy

- Off-the-shelf hardware with LabVIEW
- Use fewer resources because systems are prebuilt
- Faster time to market



### The Virtual Instrumentation Approach



### The Software Is the Instrument



### Empowering Users Through Software

Providing unique differentiation and preserving customer investments



LEGO<sup>®</sup> MINDSTORMS<sup>®</sup> NXT *"the smartest, coolest toy of the year"* 







CERN Large Hadron Collider "the most powerful instrument on earth"



# **Graphical System Design**

#### A Platform-Based Approach



# System Design to Deployment





# Abstraction to the Pin




#### LabVIEW FPGA



VHDL

# Integration of Modular I/O and Commercial Technology



**Box Instruments** 





#### **PXI** Modular Instruments



# Faster System Development





#### Integrating Components

#### Integrated System Platform



# Integrating Software and Hardware Elements



Productive software and reconfigurable hardware for any system that needs measurement and control



### Software

#### COMMUNITY

- 140,000+ online members
- 250+ registered user groups
- 1000+ job postings online
- 400K+ children through LEGO

#### CONNECTIVITY

- 9000+ instrument drivers
- 8000+ example programs
- 1000+ motion drives
- 1000+ smart sensors
- 1000+ third-party PAC devices

#### COLLABORATION

- 280+ third-party add-ons
- 400+ solution partners
- 1000+ value-added resellers
- 35+ training courses

### Hardware

#### PROCESSOR

Intel, Microsoft, Freescale, Wind River: multi-core and real-time technology

#### FPGA

Xilinx Virtex & Spartan: reconfigurable hardware

#### IP

Control & signal processing IP & I/O drivers: built-in graphical IP, integrate user IP

#### I/O

Analog Devices, Texas Instruments: connect to any sensor & actuator

#### BUS

PCI/PCIe, Enet, USB, wireless, deterministic Enet: open architecture








# **Integrated Distributed Heterogeneous Platform**





High-Speed Data Streaming

- Synchronize memory access
- Fast data links for maximum performance

A/D Technology

- Multirate sampling
- Individual channel triggering



Microprocessors

- Floating-point processing
- Communications
- Multicore technology
- Reprogrammable

FPGAs

- High-speed control
- High-speed processing
- Reconfigurable
- True Parallelism
- High Reliability

I/O

- Custom timing & triggering
- Modular I/O
- Calibration
- Custom modules



# Advanced Data Acquisition

**ISIS** Proton Synchrotron





# **Semiconductor Test**

**Analog Devices** 





# **Pipeline Test and Validation**

Inertial Pipeline Inspection Gauge





# **EcoCAR Challenge**

Virginia Tech – 1<sup>st</sup> Place 2011





## Large-scale RT Applications

European Extremely Large Telescope




# E-ELT Primary Mirror (M1): Mirror and Segment Models







### LabVIEW™-Based Control Platform







### National Instruments Vision Evolved

"To do for embedded what the PC did for the desktop."

# **Graphical System Design**

#### Virtual Instrumentation

Instrumentation, RF, Digital, Distributed

Real-time measurements, Embedded monitoring, Hardware in the loop

#### **Embedded Systems**

Industrial control, RT/FPGA systems, Electronic devices, C code generation





#### **High-Level Development Tools**









# LabVIEW Virtual Instrument

### **Front Panel**

### **Block Diagram**





#### Getting Started





Screenshot: a new, empty "Untitled 1" Front Panel window (menus: File, Edit, View, Project, Operate, Tools, Window, Help; toolbar font setting: 15pt Application Font).





# Creating a VI

### Front Panel Window





# **Dataflow Programming**

- Block diagram execution
  - Dependent on the flow of data
  - Block diagram does NOT execute left to right
- Node executes when data is available to ALL input terminals
- Nodes supply data to all output terminals when done
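The firing rule above can be sketched as a tiny Python interpreter: a node fires only once every input terminal holds data, then writes its result to its output wire. This is a minimal illustration with hypothetical names (`Node`, `run_dataflow`) and a single output per node for brevity; it is not how LabVIEW's scheduler is implemented. The nodes are deliberately listed "right to left" to show that order on the diagram does not determine execution order.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    """Hypothetical block-diagram node: a function plus the wires it reads and drives."""
    name: str
    func: Callable[..., object]
    inputs: list[str]   # wires feeding this node's input terminals
    output: str         # wire driven by this node's (single) output terminal

def run_dataflow(nodes: list[Node], initial: dict[str, object]) -> tuple[dict[str, object], list[str]]:
    """Fire each node once ALL of its inputs are available; return wire values and firing order."""
    wires = dict(initial)   # wire name -> value currently on the wire
    pending = list(nodes)
    order: list[str] = []
    while pending:
        # Every node in `ready` could run in parallel: none depends on another.
        ready = [n for n in pending if all(w in wires for w in n.inputs)]
        if not ready:
            raise RuntimeError("deadlock: no node has all of its inputs available")
        for n in ready:
            wires[n.output] = n.func(*(wires[w] for w in n.inputs))
            order.append(n.name)
            pending.remove(n)
    return wires, order

# "multiply" is listed first, but it must wait for the value produced by "add".
diagram = [
    Node("multiply", lambda s, c: s * c, ["sum", "c"], "product"),
    Node("add", lambda a, b: a + b, ["a", "b"], "sum"),
]
values, order = run_dataflow(diagram, {"a": 2, "b": 3, "c": 4})
print(order)              # ['add', 'multiply']
print(values["product"])  # 20
```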





### **Structured Dataflow**





## LabVIEW as a Target Language

- Application Wizards/Patterns
- StateCharts
- MathScript
- Control and Simulation Diagram
- Express Nodes and X-nodes
- I/O Nodes



### **Application Wizards - Patterns**














FPGA Wizard, welcome page (screenshot). Left pane: **Add Item** and **Remove Item** buttons above an empty Timing Engines tree ("Click the Add Item button to add timing engines and functions."). Bottom row: **Generate Code**, **Save**, **Close**, **Help**. Help pane text:

> Welcome to the FPGA Wizard. The FPGA Wizard uses a configuration dialog to help you design and generate LabVIEW code for data acquisition (DAQ) and process-control applications. The wizard provides a starting point by using common FPGA architectures to generate code specific to your hardware. When you select the timing and type of I/O you want to perform, the FPGA Wizard generates a ready-to-run FPGA diagram, along with a host interface VI that enables you to communicate with the FPGA using the host computer. The generated code can be run as is, or you can further customize it to meet your specific measurement and control needs.
>
> To get started, select one of the three types of timing engines: Buffered DMA Input, Single-Point Continuous, or Single-Point Timed Loop.
>
> As you go through the dialogs of the FPGA Wizard, you can move your mouse cursor over each control to see an explanation of that control, or click the Help button at the bottom right to see the FPGA Wizard Help.



FPGA Wizard, configured (screenshot). Timing Engines tree:

- Buffered DMA Input
  - AI (Connector0/AI0)
  - DI (Connector0/DIO0)
- Single Point Timed I/O
  - PWM (Connector0/DIO1)

Selected function: PWM Generation, Resource = Connector0/DIO1, Polarity = Active High. Help pane: "Timing Engines tree: Displays the timing engines and functions that you add in a tree. The timing engines contain the functions you add."















## The G (LabVIEW) Language Model

- Homogeneous dataflow language
  - Structured case (switch, select) and loops
    - "Structured dataflow"
- Run-time scheduling
  - Explicit task-level parallelism
  - Implicit parallelism identified heuristically
- Synthesizable language
  - To machine code on x86 and PPC processors
  - To VHDL for FPGAs
  - To C for embedded processors
- Turing complete
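The "structured dataflow" idea can be approximated in Python: from the outside, a loop or case structure behaves like any other node. It waits for all of its inputs, runs its inner diagram, and only then produces outputs. The helper names below are hypothetical, and the sketch ignores parallel execution inside the structures.

```python
from typing import Callable

def for_loop_node(count: int, init: int, body: Callable[[int, int], int]) -> int:
    """A For Loop seen as one node: it fires once `count` and the shift register's
    initial value are both available, runs its inner diagram `count` times, and
    emits the final shift-register value only after the last iteration."""
    shift_register = init
    for i in range(count):  # i plays the role of the iteration terminal
        shift_register = body(i, shift_register)
    return shift_register

def case_node(selector: str, cases: dict[str, Callable[[int], int]], value: int) -> int:
    """A Case structure seen as one node: exactly one subdiagram executes, chosen by
    the selector. This sketch assumes every selector value has a matching case."""
    return cases[selector](value)

# Structures compose like ordinary nodes: the case node waits for the loop's output.
total = for_loop_node(5, 0, lambda i, acc: acc + i)  # 0 + 1 + 2 + 3 + 4
result = case_node("double", {"double": lambda v: 2 * v, "negate": lambda v: -v}, total)
print(total, result)  # 10 20
```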



# Evolution of LabVIEW Code Generation





### **DFIR - Background**

Data Flow Intermediate Representation (DFIR) is used today to separate front-end editors from back-end compilers (as illustrated below) and to provide a consistent framework for managing code generation and optimizations.
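To make the front-end/back-end separation concrete, here is a toy Python sketch. It is not DFIR itself; the `Op` class and all functions are hypothetical. One intermediate graph is built once, a shared optimization pass (dead-code elimination) runs on it, and two independent back-ends emit C-style and VHDL-style text from the same graph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    """One node of a toy dataflow IR: dest = lhs <op> rhs."""
    dest: str
    op: str   # "+", "-", or "*"
    lhs: str
    rhs: str

def dead_code_elimination(ir: list[Op], outputs: list[str]) -> list[Op]:
    """Shared optimization pass: keep only nodes whose results reach an output."""
    live = set(outputs)
    kept = []
    for o in reversed(ir):  # walk from outputs back toward inputs
        if o.dest in live:
            kept.append(o)
            live.update((o.lhs, o.rhs))
    return list(reversed(kept))

def to_c(ir: list[Op]) -> str:
    """Back-end 1: sequential C statements (assumes the IR list is topologically sorted)."""
    return "\n".join(f"int {o.dest} = {o.lhs} {o.op} {o.rhs};" for o in ir)

def to_vhdl(ir: list[Op]) -> str:
    """Back-end 2: concurrent VHDL signal assignments (declarations and bit widths omitted)."""
    return "\n".join(f"{o.dest} <= {o.lhs} {o.op} {o.rhs};" for o in ir)

# A front-end (here a hand-written list) builds the IR once; "t" is never used downstream.
ir = [Op("s", "+", "a", "b"), Op("t", "*", "a", "a"), Op("y", "*", "s", "c")]
optimized = dead_code_elimination(ir, outputs=["y"])
print(to_c(optimized))
print(to_vhdl(optimized))
```

Because the optimization runs on the shared graph, both back-ends benefit from it without duplicating the pass.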





### **DFIR - Background**

DFIR models existing G data flow semantics with arbitrary VI hierarchy. Wires are also modeled as nodes, which can generate custom code if needed.
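A hedged illustration of why modeling wires as nodes can help: a wire node can carry its own code-generation logic. In this hypothetical Python sketch, a wire whose source and sink types differ emits a C-style type conversion, while a wire with matching types emits a plain copy. The class and its fields are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Wire:
    """Hypothetical IR node for a wire between two terminals."""
    name: str
    src_type: str  # C type produced by the source terminal
    dst_type: str  # C type expected by the sink terminal

    def emit(self, src_expr: str) -> str:
        """Generate the C statement that moves data along this wire."""
        if self.src_type == self.dst_type:
            return f"{self.dst_type} {self.name} = {src_expr};"  # plain copy
        return f"{self.dst_type} {self.name} = ({self.dst_type}){src_expr};"  # coercion

plain = Wire("w1", "int32_t", "int32_t").emit("x")
coerced = Wire("w2", "int32_t", "double").emit("x")
print(plain)    # int32_t w1 = x;
print(coerced)  # double w2 = (double)x;
```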





### System Deployment

- Target-aware synthesis
- I/O Port Abstraction
  - I/O Classes
  - Protocol generation
- Channel Abstraction
  - FIFO
  - Loop-to-loop
  - Peer-to-peer
  - Board-to-host (DMA)
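A loop-to-loop FIFO channel can be sketched with Python's standard `queue.Queue`, using two threads to stand in for two independent loops. The bounded depth provides back-pressure: the producer blocks when the FIFO is full and the consumer blocks when it is empty. This is an analogy under stated assumptions (the `None` end-of-stream sentinel and the depth of 4 are choices of this sketch), not LabVIEW's FIFO API.

```python
import queue
import threading

def producer(fifo: queue.Queue, n: int) -> None:
    """Producer loop: writes n samples, blocking whenever the bounded FIFO is full."""
    for i in range(n):
        fifo.put(i * i)
    fifo.put(None)  # end-of-stream sentinel (a convention of this sketch)

def consumer(fifo: queue.Queue, out: list) -> None:
    """Consumer loop: reads until the sentinel, blocking whenever the FIFO is empty."""
    while True:
        item = fifo.get()
        if item is None:
            break
        out.append(item)

fifo: queue.Queue = queue.Queue(maxsize=4)  # small fixed depth, like a FIFO sized in hardware
received: list = []
threads = [
    threading.Thread(target=producer, args=(fifo, 10)),
    threading.Thread(target=consumer, args=(fifo, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # squares 0..81, in order
```

A single FIFO preserves ordering, so the consumer sees the samples in exactly the order they were produced even though the two loops run at independent rates.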



## System Deployment

- Timing
  - Expressing an order
    - Language constructs
    - Operating Environments
  - Reality of Platform timing
    - Static analysis
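The static-analysis bullet can be illustrated with the classic longest-path computation: logic that must complete within one clock period (for example, code inside a single-cycle Timed Loop in LabVIEW FPGA) meets timing only if its slowest combinational path fits in the period. The Python sketch below uses hypothetical node names and delays in picoseconds; real timing reports come from the FPGA vendor tools, not from a model like this.

```python
from graphlib import TopologicalSorter

def critical_path_ps(delays: dict[str, int], preds: dict[str, list[str]]) -> int:
    """Longest arrival time (ps) through a combinational DAG:
    arrival(n) = delay(n) + max(arrival(p) for p in predecessors(n)), or delay(n) for inputs."""
    arrival: dict[str, int] = {}
    for node in TopologicalSorter(preds).static_order():  # predecessors come first
        arrival[node] = delays[node] + max((arrival[p] for p in preds.get(node, [])), default=0)
    return max(arrival.values())

# Hypothetical path inside one clock domain (delays in picoseconds).
delays = {"mult": 3500, "add": 1200, "shift": 500, "compare": 800}
preds = {"add": ["mult"], "compare": ["add", "shift"]}

path = critical_path_ps(delays, preds)  # mult -> add -> compare
meets = {}
for clock_mhz in (200, 160):
    period_ps = 1_000_000 // clock_mhz  # 200 MHz -> 5000 ps, 160 MHz -> 6250 ps
    meets[clock_mhz] = period_ps - path >= 0
    print(f"{clock_mhz} MHz: path {path} ps, slack {period_ps - path} ps")
```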



### What Is LabVIEW FPGA?





### **Enforcing Dataflow in FPGA**









# System-Level Design

### **Concurrent Application**



#### Application trends

- Large # of parallel tasks
- Large node/channel counts
- High performance requirements
- E.g. streaming DSP applications

Implementation Gap

### Parallel Platform

#### Platform trends

- Large # of processing elements
- Heterogeneous processors and memories
- Distributed I/O
- E.g. Heterogeneous FPGA targets



## Modeling System-Level Designs

System-level designs introduce new modeling constructs:

- Systems
- Targets
- Mixed MoC Diagrams





# **RF Communications Applications Overview**





# **OFDM TX/RX Block Diagram**







# Streaming Model of the OFDM Transmitter



- Number of transmitters: set at compile time
- N<sub>u</sub> = {72, 180, 300, 600, 900, 1200}
  - Set at initialization time; determines the bandwidth
- CP mode = {'Normal', 'Extended'}
  - Set at run time to overcome inter-symbol interference; can be applied at a symbol boundary

viable for analysis and implementation?

CP Vector

Selection based on CP mode; elements must be applied at a symbol boundary
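The cyclic prefix itself is simple to state: copy the last N<sub>cp</sub> time-domain samples of an OFDM symbol to its front. The Python sketch below assumes a toy 8-sample symbol and hypothetical CP lengths (2 for 'Normal', 4 for 'Extended'); real OFDM systems use much longer symbols and prefixes. The mode is looked up once per symbol, which models applying a CP-mode change only at a symbol boundary.

```python
def add_cyclic_prefix(symbol: list[complex], cp_len: int) -> list[complex]:
    """Return the symbol with its last cp_len samples copied to the front."""
    if not 0 <= cp_len <= len(symbol):
        raise ValueError("cp_len must be between 0 and the symbol length")
    return symbol[len(symbol) - cp_len:] + symbol

# Hypothetical CP lengths for a toy 8-sample symbol.
CP_LEN = {"Normal": 2, "Extended": 4}

def transmit(symbols: list[list[complex]], mode_per_symbol: list[str]) -> list[complex]:
    """Serialize symbols; the CP mode is read once at each symbol boundary,
    so a mode change never takes effect in the middle of a symbol."""
    stream: list[complex] = []
    for symbol, mode in zip(symbols, mode_per_symbol):
        stream.extend(add_cyclic_prefix(symbol, CP_LEN[mode]))
    return stream

symbols = [[complex(n) for n in range(8)], [complex(n) for n in range(8, 16)]]
stream = transmit(symbols, ["Normal", "Extended"])
print(len(stream))  # 22 = (8 + 2) + (8 + 4)
```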
















Screenshot of the Output Schedule tab: a timeline (time axis from 0 to about 210,000) with one row per block, ZeroPad.lvdsp, FFT (Xilinx 7.0), and two FIR (Xilinx 5.0) instances; the ZeroPad row is annotated x3 and the FIR rows show repeated groups annotated x240 and x500.

Calculated Schedule View



# High-Level Model to FPGA blocks






# **RF Design Flow**



### **Related Frameworks**

- Ptolemy II
  - Pro: Prominent framework for exploring MoCs
  - Con: Code generation for HW not fully developed
- Grape-II
  - Pro: Facilitates emulation of SDF/CSDF on FPGAs
  - Con: Lacks smooth integration of IP
- LabVIEW FPGA
  - Pro: Commercial framework for generation of HW from dataflow models
  - Con: No synthesis and optimization of multi-rate models
- Xilinx System Generator
  - Pro: Commercial framework for HW generation from SR and DT models
  - Con: Not suitable for dataflow models; limited HW optimization
- Agilent SystemVue
  - Pro: Supports expressive dataflow models; libraries for RF+DSP applications
  - Con: No path to implementation on specific HW targets
- OpenDF and CAL
  - Pro: HW generation from dataflow models; generates VHDL
  - Con: Fewer analysis options; limited support for integration of commercial IP



### DSP Design Module – Summary

- Simplifies creation of complex DSP subsystems targeted for FPGA deployment, enabling:
  - Fast prototyping of real-time FPGA-based DSP subsystems
  - Integration of rich signal processing IP libraries that exploit the FPGA and surrounding DSP fabric
  - Design of signal processing IP blocks with LabVIEW FPGA or by importing third-party IP
  - Exploration of design trade-offs between timing requirements and resource constraints
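One way to picture the timing-versus-resource trade-off in the last bullet is a folded (time-multiplexed) FIR filter: the more clock cycles available per input sample, the fewer multipliers are needed. This is a minimal sketch under stated assumptions (one multiply-accumulate per multiplier per cycle, a hypothetical 64-tap filter); the function names are illustrative and not part of the DSP Design Module.

```python
def fir_multipliers(num_taps: int, cycles_per_sample: int) -> int:
    """Multipliers needed by a time-multiplexed (folded) FIR filter.

    Assumes each multiplier performs one multiply-accumulate per clock cycle
    and that a new input sample arrives every `cycles_per_sample` cycles.
    """
    if num_taps < 1 or cycles_per_sample < 1:
        raise ValueError("num_taps and cycles_per_sample must be positive")
    return -(-num_taps // cycles_per_sample)  # integer ceiling division


def explore(num_taps, candidate_budgets):
    """Map each cycle budget per sample to the multiplier count it requires."""
    return {c: fir_multipliers(num_taps, c) for c in candidate_budgets}


# Hypothetical 64-tap filter: tighter timing budgets need more multipliers.
print(explore(64, [1, 2, 4, 8, 16, 64]))
# {1: 64, 2: 32, 4: 16, 8: 8, 16: 4, 64: 1}
```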



## **Tool Flow**





# Tool Flow (Focus Areas)



### **GUI Screenshot**

Screenshot: LabVIEW DSP Design Module editing RMS.lvdsp. The ribbon offers Analysis (Apply Buffer Sizes, Show Schedule) and Annotations (Sample Counts, Fused Regions) and reports an execution time of 53 cycles and an initiation interval of 27 cycles. The palette lists Signal Source, Stream Manipulation, Delay, Downsample, Upsample, Distribute Stream, Interleave Stream, and Index Stream; the diagram contains SumSquare.lvdsp and SumAbs.lvdsp; the Schedule pane lists SumAbs.lvdsp, SumSquare.lvdsp, Square Root, and Subtract.
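The two numbers the screenshot reports, execution time (53 cycles) and initiation interval (27 cycles), combine in the usual way for a pipelined design. This sketch assumes a fixed initiation interval with no stalls; the 100 MHz clock in the usage lines is a hypothetical value, not one read from the screenshot.

```python
def total_cycles(n_iterations: int, execution_time: int, initiation_interval: int) -> int:
    """Cycles from the start of the first iteration to the end of the last.

    Assumes a new iteration starts every `initiation_interval` cycles with no
    stalls and each iteration takes `execution_time` cycles end to end.
    """
    if n_iterations < 1:
        return 0
    return execution_time + (n_iterations - 1) * initiation_interval


def iterations_per_second(clock_hz: float, initiation_interval: int) -> float:
    """Steady-state throughput: one iteration completes every II cycles."""
    return clock_hz / initiation_interval


print(total_cycles(1, 53, 27))            # 53
print(total_cycles(100, 53, 27))          # 53 + 99 * 27 = 2726
print(iterations_per_second(100e6, 27))   # about 3.7 million iterations/s at 100 MHz
```

The initiation interval, not the execution time, sets the steady-state throughput; the execution time only adds a one-time latency.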



# **MoCs for Streaming Applications**



Key trade-off: analyzability vs. expressivity

[1] Edward A. Lee, "Concurrent Models of Computation for Heterogeneous Software", EECS 290, 2004



### **Analysis and Optimization Features**

- Core dataflow optimizations
  - Model validation
    - Deadlock detection and boundedness check
  - Throughput and latency computation
  - Buffer size optimization (under throughput constraints)
  - Schedule computation
- Hardware specific optimizations
  - Resource constrained schedule computation
  - Retiming and fusion
  - Rate matching
  - IP interface synthesis

[1] S. S. Bhattacharyya, P. K. Murthy and E. A. Lee, "Software Synthesis from Dataflow Graphs," Kluwer Academic Publishers, Norwell, Mass, 1996.
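Three of the core features above (model validation, deadlock detection, and schedule computation) can be illustrated with standard SDF techniques in the spirit of [1]: solve the balance equations for a repetition vector, then fire actors greedily for one iteration. This is a textbook sketch, not the DSP Design Module's implementation; the function names and the edge-tuple format are assumptions.

```python
from fractions import Fraction
from math import gcd, lcm

# An edge is (src, dst, tokens produced per src firing, tokens consumed per dst firing).


def repetition_vector(actors, edges):
    """Smallest positive integer solution of q[src] * prod == q[dst] * cons
    for every edge, or None if the rates are inconsistent. Assumes a connected graph."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    if any(rate[s] * p != rate[d] * c for s, d, p, c in edges):
        return None  # inconsistent: some buffer would grow without bound
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(rate[a] * scale) for a in actors}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}


def sequential_schedule(q, edges, delays):
    """Fire each actor q[a] times, firing any actor whose inputs hold enough
    tokens. Returns the firing order, or None if the graph deadlocks.
    delays[i] is the number of initial tokens on edges[i]."""
    tokens = list(delays)
    remaining = dict(q)
    order = []
    progress = True
    while progress:
        progress = False
        for actor, left in remaining.items():
            ready = left > 0 and all(
                tokens[i] >= c for i, (s, d, p, c) in enumerate(edges) if d == actor)
            if ready:
                for i, (s, d, p, c) in enumerate(edges):
                    if d == actor:
                        tokens[i] -= c  # consume inputs
                    if s == actor:
                        tokens[i] += p  # produce outputs
                remaining[actor] -= 1
                order.append(actor)
                progress = True
    return order if all(n == 0 for n in remaining.values()) else None


edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
q = repetition_vector(["A", "B", "C"], edges)
print(q)                                      # {'A': 3, 'B': 2, 'C': 1}
print(sequential_schedule(q, edges, [0, 0]))  # ['A', 'A', 'B', 'A', 'B', 'C']
```

A cycle with no initial tokens makes `sequential_schedule` return None, which is the deadlock case; adding one delay on the back edge lets the iteration complete.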



### **Design Capture in DSP Design Module**





# Synthesis Results for OFDM Rx/Tx Example

| Resource Name   | Available Resource Elements | Transmitter Utilization | Receiver Utilization |
|-----------------|-----------------------------|-------------------------|----------------------|
| Slices          | 14,720                      | 43.1%                   | 79.2%                |
| Slice Registers | 58,880                      | 21.6%                   | 54.6%                |
| Slice LUTs      | 58,880                      | 24.7%                   | 57.3%                |
| DSP48s          | 640                         | 2.7%                    | 8.3%                 |
| Block RAM       | 244                         | 8.2%                    | 19.7%                |
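The utilization percentages in the synthesis table translate into approximate element counts by simple arithmetic. The sketch below copies the table's values; because the percentages are rounded, the counts are approximate. The Tx+Rx column only sums the two percentages; the table does not say whether both designs share one device.

```python
# Available resources and utilization (%) copied from the synthesis table.
AVAILABLE = {
    "Slices": 14_720,
    "Slice Registers": 58_880,
    "Slice LUTs": 58_880,
    "DSP48s": 640,
    "Block RAM": 244,
}
UTILIZATION = {  # resource: (transmitter %, receiver %)
    "Slices": (43.1, 79.2),
    "Slice Registers": (21.6, 54.6),
    "Slice LUTs": (24.7, 57.3),
    "DSP48s": (2.7, 8.3),
    "Block RAM": (8.2, 19.7),
}


def used(resource: str, percent: float) -> int:
    """Approximate element count implied by a (rounded) utilization percentage."""
    return round(AVAILABLE[resource] * percent / 100)


for name, (tx, rx) in UTILIZATION.items():
    print(f"{name:16s} Tx ~{used(name, tx):6d}  Rx ~{used(name, rx):6d}  "
          f"Tx+Rx {tx + rx:5.1f}%")
```

For example, the receiver's 79.2% of 14,720 slices is roughly 11,658 slices.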





### Successful collaboration with UCB

### Correct and Non-Defensive Glue Design using Abstract Models\*

Stavros Tripakis, University of California, Berkeley, CA, USA, stavros@eecs.berkeley.edu

Hugo Andrade, Arkadeb Ghosal, Rhishikesh Limaye, Kaushik Ravindran, Guoqiang Wang, Guang Yang, National Instruments Corp., Berkeley, CA, USA, {first.lastname}@ni.com

Jacob Kornerup, Ian Wong, National Instruments, Austin, TX, USA

ABSTRACT

Current hardware design practice often relies on integration of components, some of which may be IP or legacy blocks. While integration eases design by allowing modularization and component reuse, it is still done in a mostly ad hoc manner. Designers work with descriptions of components that are either informal or incomplete (e.g., documents in English, structural but non-behavioral specifications in IP-XACT) or too low-level (e.g., HDL code), and have little to no automatic support for stitching the components together. Providing such support is the glue design problem.

This paper addresses this problem using a model-based approach. The key idea is to use high-level models, such as dataflow graphs, that enable efficient automated analysis. The analysis can be used to derive performance properties of the system (e.g., component compatibility, throughput, etc.), optimize resource usage (e.g., buffer sizes), and even synthesize low-level code (e.g., control logic). However, these models are only abstractions of the real system, and often omit critical information. As a result, the analysis outcomes may be defensive (e.g., buffers that are too big) or even incorrect (e.g., buffers that are too small). The paper examines these situations and proposes a correct and non-defensive design methodology that employs the right models to explore accurate performance and resource trade-offs.

Categories and Subject Descriptors: B.6.3 [Hardware]: Design Aids

General Terms: Design, Theory, Verification

### Keywords

Glue design, Dataflow, Abstraction, Non-defensive

\*This work was supported in part by the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley, which receives support from the National Science Foundation (NSF awards #0720882 (CSR-EHS: PRET) and #0931843 (ActionWebs)), the U.S. Army Research Office (ARO #W911NF-07-2-0019), the U.S. Air Force Office of Scientific Research (MURI #FA9550-06-0312), the Air Force Research Lab (AFRL), the Multiscale Systems Center (MuSyC), one of six research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program, and the following companies: Bosch, National Instruments, Thales, and Toyota. This work was also supported by direct contribution and funding from the National Instruments Corporation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CODES+ISSS'11, October 9–14, 2011, Taipei, Taiwan. Copyright 2011 ACM 978-1-4503-0715-4/11/10 ...\$10.00.

### 1. INTRODUCTION

Both hardware and software design have […] evolved toward higher-level models and languages. In software, programming languages have evolved from […] to structured to object-oriented programming. Hardware design has evolved from transistor and gate layout […] synthesis to high-level synthesis. This evolution, sometimes called "raising the level of abstraction," […] the designer to focus on design properties that matter while hiding lower-level details. Abstraction is essential in order to manage the ever-increasing size and complexity of designs.

Another aspect of modern hardware design flows […] important in coping with large and complex systems is support for component-based design. This is manifested in methods that rely on integration of components such as Intellectual Property (IP) blocks from native and third-party sources, for instance, Xilinx CoreGen [36], National Instruments LabVIEW FPGA [25], or the OpenCores library […]. Complex designs are created by stitching together […]

While component-based design allows for modularization and component reuse, integration is still an ad hoc […], lacking the support of rigorous methodology, […] tools. In particular, the design of the requisite communication and control logic to connect the blocks is […] and error-prone process. The interfaces of these blocks expose low-level control and timing artifacts that the designer must manually reconcile to create systems that are not only valid (i.e., functionally correct) but also meet performance requirements (e.g., throughput and area constraints). We call this the glue design problem.

In this paper, we approach this problem as follows. Abstract, high-level models, called actors, are first […] for individual components. The actors are then […]

From Streaming Models to FPGA Implementations

Hugo Andrade, Jeff Correll, Amal Ekbal, Arkadeb Ghosal, Douglas Kim, Jacob Kornerup, Rhishikesh Limaye, Ankita Prasad, Kaushik Ravindran, Trung N Tran, Mike Trimborn, Gerald Wang, Ian Wong, Guang Yang

National Instruments Corporation, USA.

Abstract: Application advances in the signal processing and communications domains are marked by an increasing demand for better performance and faster time to market. This has motivated model-based approaches to design and deploy such applications productively across diverse target platforms. Dataflow models are effective in capturing these applications that are real-time, multi-rate, and streaming in nature. These models facilitate static analysis of key execution properties like buffer sizes and throughput. There are established tools to generate implementations of these models in software for processor targets. However, prototyping and deployment on hardware targets, such as FPGAs, are critical to the development of new applications. FPGAs are increasingly used in computing platforms for high performance streaming applications. Existing tools for hardware implementation from dataflow models are limited in their capabilities. To close this gap, we present DSP Designer, a framework to specify, analyze, and implement streaming applications on hardware targets. DSP Designer encourages a model-based design approach starting from a Parameterized Cyclo-Static Dataflow model. The back-end supports static analysis of execution properties and generates implementations for FPGAs. It also includes an extensive library of hardware actors and eases third-party IP integration. Overall, DSP Designer is an exploration framework that translates high-level algorithmic specifications to efficient hardware. In this paper, we illustrate the modeling, analysis, and implementation capabilities of DSP Designer. Through a detailed case study, we show that DSP Designer is viable for the design of next generation signal processing and communications systems.

### I. INTRODUCTION

Dataflow models are widely used to specify, analyze, and implement multi-rate computations that operate on streams of data. The Static Dataflow (SDF) model of computation is well known for describing signal processing applications [1]. An SDF model is a graph of computational actors connected by channels that carry streams of data. The semantics require the number of data tokens consumed and produced by an actor per firing to be fixed and pre-specified. This guarantees decidability of key execution properties, such as deadlock-free operation and bounded memory requirements [2].

Over the years, several extensions of SDF have been developed that improve the expressiveness of the model while preserving decidability, such as Cyclo-Static Dataflow (CSDF) [3], Parameterized Static Dataflow (PSDF) [4], Heterochronous Dataflow (HDF) [5], Scenario-Aware Dataflow (SADF) [6], and Static Dataflow with Access Patterns (SDF-AP) [7]. Complementing these modeling advances, algorithmic solutions for static analysis have been studied in depth. Viable techniques have been developed for computation of throughput, buffer sizes, and schedules [2] [8] [9].

The expressiveness of dataflow models in naturally capturing streaming applications, coupled with formal compile-time analyzability properties, has made them popular in the domains of multimedia, signal processing, and communications. These high-level abstractions are the starting points for model-based design approaches that enable productive design, fast analysis, and efficient correct-by-construction implementations. Ptolemy II [10], LabVIEW [11], and Simulink [12] are examples of successful tools built on the principles of model-based design from dataflow models.

These tools predominantly deliver software implementations for general purpose and embedded processor targets. However, increasing demands on performance of new applications and standards have motivated prototyping and deployment on hardware targets, such as Field Programmable Gate Arrays (FPGAs). FPGAs are integral components of modern computing platforms for high performance signal processing. Surprisingly, few studies have been directed to the synthesis of efficient hardware from dataflow models.

The configurability of FPGAs and constraints of hardware design bring unique implementation challenges and performance-resource trade-offs. FPGAs permit a range of implementation topologies of varying degrees of parallelism and communication schemes. Fine-grained specification of actor execution at the cycle level enables execution choices between fully specified static schedules and more flexible self-timed schedules. Communication between actors could be through direct wires, handshake protocols, shift registers, shared registers with scheduled access, or dedicated FIFO buffers. Each mechanism poses different requirements on the interface and glue logic to stitch actors. Finally, a key requirement for hardware design is the integration of pre-created configurable intellectual property (IP) blocks. Hardware actor models must capture relevant variations in data access patterns and execution characteristics of different configurations.

We address these challenges with DSP Designer, a framework for hardware-oriented specification, analysis, and implementation of streaming dataflow models. The intent is to enable DSP domain experts to express complex applications and performance requirements in an algorithmic manner and to auto-generate efficient hardware implementations. The main components of DSP Designer are: (a) a graphical specification language to design streaming applications, (b) an analysis engine to validate the model, select buffer sizes and optimize resource utilization to meet throughput constraints, and perform other pertinent optimizations, and (c) implementation support to generate an efficient hardware design and deploy it on Xilinx FPGAs. The specification is based on the Parameterized Cyclo-Static Dataflow (PCSDF) model of computation, which […]
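PCSDF builds on CSDF, in which an actor cycles through phases with different token rates. For a single CSDF edge, the balance equation is written over complete phase cycles, and each actor's firing count is its cycle count times its number of phases. The sketch below covers only that single-edge case; the function name is hypothetical, and PCSDF's run-time parameters are not modeled.

```python
from math import gcd


def csdf_edge_repetitions(prod_phases, cons_phases):
    """Minimal firings per iteration for producer P and consumer C on one CSDF edge.

    prod_phases[i] is the number of tokens P produces in phase i, and
    cons_phases[j] the number C consumes in phase j; phases repeat cyclically.
    Balance is required over whole cycles:
        cycles_P * sum(prod_phases) == cycles_C * sum(cons_phases),
    and each actor fires a whole number of cycles (cycles * phase count).
    """
    produced, consumed = sum(prod_phases), sum(cons_phases)
    if produced <= 0 or consumed <= 0:
        raise ValueError("each actor must move at least one token per cycle")
    g = gcd(produced, consumed)
    cycles_p, cycles_c = consumed // g, produced // g
    return cycles_p * len(prod_phases), cycles_c * len(cons_phases)


# P produces 1, 0, 1 tokens in its three phases; C consumes 1 token in each of 4 phases.
print(csdf_edge_repetitions([1, 0, 1], [1, 1, 1, 1]))  # (6, 4)
# An SDF edge is the one-phase special case: 3 tokens out, 2 tokens in.
print(csdf_edge_repetitions([3], [2]))  # (2, 3)
```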

A Heterogeneous Architecture for Evaluating Real-Time One-Dimensional Computational Fluid Dynamics on FPGAs

Isaac Liu, Edward A. Lee, Electrical Engineering and Computer Science, UC Berkeley, Berkeley, California, USA, {liuisaac, eal}@eecs.berkeley.edu

Matthew Viele, R&D, Drivven, Inc., Elizabeth, Colorado, USA, mviele@drivven.com

Guoqiang Wang, Hugo Andrade, R&D, National Instruments Corp., Berkeley, California, USA, {gerald.wang, hugo.andrade}@ni.com

Abstract: Many fuel systems for diesel engines are developed with the help of commercial one-dimensional computational fluid dynamics (1D CFD) solvers that model and simulate the behavior […] flow through the interconnected pipes off-line. This paper […] a novel framework to evaluate 1D CFD models in real time on an FPGA. This improves fuel pressure estimation and […] loop on fuel delivery, allowing for a cleaner and more efficient engine. The real-time requirements of the models are […] by the physics and geometry of the problem being solved. In […] framework, the interconnected pipes are partitioned into […] sub-volumes that compute their pressure and flow rate […] time step based upon neighboring values. We use timing […] synchronization and multiple Precision Timed (PRET) cores to ensure the real-time constraints are met. […] the programmability of FPGAs, we use a configurable heterogeneous architecture to save hardware resources, […]

second order effects such as cavitation and thermal gradients that are taken into account in the GT-SUITE calculations. The second order effects are small, but important for designing a well-behaved system. However, there is a salient distinction between an off-line research-oriented approach like GT-SUITE and a real-time approach like the one presented here. So long as the real-time code is sufficiently accurate to allow improved fuel pressure estimation, it can close the loop of fuel delivery, allowing for a more precise air/fuel ratio control and thus a cleaner and more efficient engine.
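The abstract describes pipes partitioned into sub-volumes that each update pressure and flow from neighboring values every time step. The sketch below illustrates that neighbor-only update pattern with a textbook linearized (acoustic) 1D pipe model on a staggered grid and an explicit time step limited by the CFL condition. It is an assumption for illustration, not the solver used in the paper or in GT-SUITE, and the fluid constants are illustrative.

```python
def step(p, u, dt, dx, rho, a):
    """One explicit time step of the linearized 1D pipe equations

        dp/dt = -rho * a**2 * du/dx,    du/dt = -(1 / rho) * dp/dx

    on a staggered grid: p[i] is the pressure of sub-volume i (N values) and
    u[i] the velocity at the face between sub-volumes i-1 and i (N+1 values).
    u[0] and u[N] stay zero, modeling closed pipe ends.
    """
    if a * dt / dx > 1.0:
        raise ValueError("CFL condition a*dt/dx <= 1 is violated")
    n = len(p)
    new_u = list(u)
    for i in range(1, n):  # interior faces: driven by the two neighboring pressures
        new_u[i] = u[i] - dt / (rho * dx) * (p[i] - p[i - 1])
    # Sub-volumes: pressure changes with the net flow through their two faces.
    new_p = [p[i] - dt * rho * a * a / dx * (new_u[i + 1] - new_u[i]) for i in range(n)]
    return new_p, new_u


# Illustrative values: 1 m pipe, 50 sub-volumes, rho = 850 kg/m^3, a = 1400 m/s.
n, length = 50, 1.0
dx, rho, a = length / n, 850.0, 1400.0
dt = 0.5 * dx / a          # CFL number 0.5
p = [100e5] * n            # 100 bar everywhere ...
p[n // 2] += 10e5          # ... plus a 10 bar pulse in one sub-volume
u = [0.0] * (n + 1)
p0_total = sum(p)
for _ in range(200):
    p, u = step(p, u, dt, dx, rho, a)
print(abs(sum(p) - p0_total) / p0_total < 1e-9)  # True: closed pipe conserves total pressure
```

Because every sub-volume reads only its own faces and every face only its two neighbors, each update is local, which is what makes this kind of model amenable to parallel hardware; the CFL check is what bounds how large the real-time step can be for a given discretization.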

1D CFD is used when the system to be evaluated can be described as a network of pipes. The advantage of 1D CFD over its 2D and 3D cousins is the greatly reduced number of nodes to be solved, and the simplified equations in each […]

Static Dataflow with Access Patterns: Semantics and Analysis

Arkadeb Ghosal\*, Rhishikesh Limaye\*, Kaushik Ravindran\*, Stavros Tripakis\*\*, Ankita Prasad\*, Guoqiang Wang\*, Trung N Tran\*, Hugo Andrade\*

\*National Instruments Corp., Berkeley, CA, USA, {firstname.lastname}@ni.com

\*\*University of California, Berkeley, CA, USA, stavros@eecs.berkeley.edu

Abstract: […] multimedia applications are commonly […] Cyclo-Static Dataflow (SDF/CSDF) […] explicitly specifies how much data is […] per firing during computation. This […] compile-time analyzability of many […], such as deadlock absence, channel […], and throughput. However, SDF/CSDF is limited […] how data is accessed in time […] often leads to implementations that are […] (e.g., use more resources than necessary) or […] (e.g., use insufficient resources). In […] model called Static Dataflow with Access Patterns (SDF-AP) that captures the timing of […] (production and consumption). This […] semantics of SDF-AP, defines key properties […] execution and discusses algorithms […] under correctness and resource constraints. […] presented to evaluate these analyses […] estimate the resources needed. […] applications modeled by SDF-AP.

Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]: Signal processing systems

General Terms: […], Algorithms, Experimentation

Keywords: […] semantics, access patterns

### 1. INTRODUCTION

Static Dataflow (SDF) is a model of computation to specify […] multi-rate computations that […] streams of data [13]. An SDF model […] graph of computational actors […] channels. The SDF semantics require […] tokens consumed and produced by […] and pre-specified. This guarantees […] properties: existence of deadlock […] infinite computation, through[…], […] schedule [1, 13]. The expressiveness of the SDF model in naturally capturing streaming applications, coupled with its strong compile-time predictability properties, has made it popular in the domains of multimedia, digital signal processing, and communications. While the standard SDF model is untimed, it is a common practice to associate worst-case execution time (WCET) models to analyze the timing behavior of applications [7, 12, 14, 15, 20]. These timing annotations enable static analysis of SDF models and mapping solutions to specific platforms under resource and performance constraints. Worst-case timing models have been applied to capture execution behavior of SDF actors for software and hardware implementations. However, these timing models suffer a key deficiency: they lose information about the precise timing of consumption and production of tokens by an actor during a firing cycle. The problem is particularly evident when SDF models are used to capture hardware implementations. Many hardware IP blocks require that data tokens be delivered to them at precisely specified clock cycles from the start of execution. This loss of timing information in SDF models results in suboptimal analysis and implementations that conservat[…]

For example, consider a design connecting a producer P to a consumer C. P produces 1 token per firing and executes in 1 time unit, and C consumes 8 tokens per firing and executes in 8 clock cycles. Suppose that the IP block implementing C requires 8 tokens to be delivered in 8 consecutive cycles. Unfortunately, the SDF timing model is not sufficiently expressive to capture this behavior. The semantics of SDF assumes that an actor cannot start firing until sufficient tokens are present at the inputs. As a result, if the above example is modeled with SDF, C cannot start firing until after P completes eight firings. Therefore, a buffer of size at least 8 must be added between P and C: C may start execution only after the buffer has collected 8 tokens from P. While this is a valid implementation, it is sub-optimal in terms of allocation of buffer resources. In contrast, a better implementation can exploit knowledge about the behavior of C and determine that a buffer of size one is sufficient.

Cyclo-Static Dataflow (CSDF) [2] is a generalization of SDF that appears to resolve the problem. CSDF "breaks" a firing into finer-grained phases, and specifies consumption and production of tokens for each phase. But CSDF still relies on the same basic hypothesis as SDF, i.e., that an actor will wait until sufficient tokens have accumulated at the input channels before beginning a phase. Unfortunately, this hypothesis violates requirements related to the precise timing of token accesses. In the example above, C requires that it receive 8 tokens in 8 consecutive clock cycles once it commences firing. CSDF cannot capture this constraint and as a result can lead to incorrect implementations [19

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2012, June 3–7, 2012, San Francisco, California, USA. Copyright 2012 ACM […]\$10.00.
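The P/C example can be checked with a small cycle-level simulation. The timing model is an assumption made for illustration: P writes one token at the start of a cycle, and C, once started, reads one token per cycle for 8 consecutive cycles, later in the same cycle. `peak_occupancy` is a hypothetical helper, not the SDF-AP analysis from the paper. Starting C only after P's eighth firing, as SDF semantics require, needs 8 buffer slots; starting it as soon as the first token exists, as C's access pattern allows, needs 1; and if P is slower than C's fixed access pattern assumes, an eager start reads from an empty buffer.

```python
def peak_occupancy(n_tokens, consumer_start, producer_period=1):
    """Peak buffer fill for a producer P and an IP-style consumer C.

    P writes its k-th token at cycle k * producer_period. Starting at cycle
    `consumer_start`, C reads one token per cycle for n_tokens consecutive
    cycles and cannot pause. Raises RuntimeError if C reads an empty buffer.
    """
    last_write = (n_tokens - 1) * producer_period
    end = max(last_write, consumer_start + n_tokens - 1)
    fill = peak = written = 0
    for cycle in range(end + 1):
        if written < n_tokens and cycle == written * producer_period:
            fill += 1  # P writes at the start of the cycle
            written += 1
        peak = max(peak, fill)
        if consumer_start <= cycle < consumer_start + n_tokens:
            if fill == 0:
                raise RuntimeError(f"buffer underflow at cycle {cycle}")
            fill -= 1  # C reads later in the same cycle
    return peak


print(peak_occupancy(8, consumer_start=8))  # 8: SDF-style start after P's eighth firing
print(peak_occupancy(8, consumer_start=0))  # 1: start as soon as the first token exists
try:
    peak_occupancy(8, consumer_start=0, producer_period=2)
except RuntimeError as err:
    print(err)  # buffer underflow at cycle 1
```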


### **Trends in Future Computational Platforms**

- Rapid increase in multi-core and many-core processors
- Convergence of architectures
- Gains in performance (speed, memory, etc.)
- Sophisticated power/thermal management
- Unreliability from manufacturing technology
- NoC, high-speed memory interfaces, specialized I/O, reconfigurable fabric, etc.





# **Future Heterogeneous Architectures**

Xilinx Zynq Extensible Processing Platform



### Synthesis on Heterogeneous Platforms

Motivation: To develop an automatic system-level synthesis and exploration framework to deploy high-level application specifications onto heterogeneous platforms

### Goals:

- Develop system-level language for the domain expert
- Improve productivity while maintaining performance
- Provide an exploration framework to evaluate cost/quality and derive the optimal platform/mapping
- Allow system-level simulation/verification/validation to ensure model requirements are met



### Y-Chart: A Disciplined System Design Methodology



### Challenges

- Models for heterogeneous platform architectures
  - Computation, communication, I/O, storage, UI, cloud
- Mapping and optimization (for distributed computation and communication)
  - Allocation, binding, reuse, and scheduling
- Appropriate application description level
  - Models of computation, domain-specific knowledge
- System-level validation
  - Testing, simulation, verification
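The mapping challenge (allocation and binding) can be made concrete with a deliberately tiny exhaustive search. Everything here is hypothetical: the actors, the two processing elements, and their execution times are invented for illustration, and the cost (load on the busiest processing element) ignores communication and precedence, which a real Y-chart exploration would model.

```python
from itertools import product


def best_binding(exec_time, pes):
    """Exhaustively bind each actor to a processing element (PE), minimizing
    the load on the most-loaded PE.

    exec_time[actor][pe] is the actor's execution time on that PE, or None if
    the actor cannot run there. Returns (cost, binding), or None if no
    feasible binding exists. Communication and precedence are ignored.
    """
    actors = list(exec_time)
    best = None
    for choice in product(pes, repeat=len(actors)):
        if any(exec_time[a][pe] is None for a, pe in zip(actors, choice)):
            continue  # infeasible: some actor bound to a PE it cannot use
        load = {pe: 0 for pe in pes}
        for a, pe in zip(actors, choice):
            load[pe] += exec_time[a][pe]
        cost = max(load.values())
        if best is None or cost < best[0]:
            best = (cost, dict(zip(actors, choice)))
    return best


EXEC_TIME = {  # hypothetical cycles per firing; None = cannot run there
    "fft": {"cpu": 40, "fpga": 5},
    "fir": {"cpu": 30, "fpga": 4},
    "ctrl": {"cpu": 2, "fpga": None},
}
cost, binding = best_binding(EXEC_TIME, ["cpu", "fpga"])
print(cost, binding)  # 9 {'fft': 'fpga', 'fir': 'fpga', 'ctrl': 'cpu'}
```

Exhaustive search grows as (number of PEs) to the power of (number of actors), which is why practical exploration frameworks rely on heuristics or solvers rather than enumeration.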

