
Zynq and HLS accelerate the SDR design process

HW Architecture

- Viterbi Decoder

- AXI DMA

- Filter RX

Demo

Design Process - Details

SDK

BIN Deployment

SDK

IP Integration

HW Accelerator Design Flow

Vivado

IP Integration

Vivado

Export SDK

BSP

HLS

IP Deployment

Vivado

Synthesis

Vivado

Bitstream

Output: Binary

Source Code

C / C++ Code

3.4 HW Accelerators

Vivado

Place & Route

HLS

Cosimulation

HLS

gcc Compiler

Minor Code Modification

Output: Bitstream

HLS

Synthesis and

HW Tuning

HLS

Code Verification

TestBench

Output: IP Block

3 HW accelerators have been designed:

- Demodulator, Viterbi Decoder and 48-tap FIR Filter

- The IP interfaces: AXI-Stream and AXI-Lite

- The maximum data transfer rate is 9.6 GB/s

- The DMA IP is exploited

- A fixed-point data type (ap_fixed) is used

- 100% of C Code Reused

- 0 % VHDL

- Cosimulation is VERY FAST!!!
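To illustrate the fixed-point data type mentioned above: Vivado HLS provides ap_fixed, which needs the Xilinx headers. The sketch below instead emulates a signed 16-bit value with 15 fractional bits using plain int16_t, so it builds with any C++ compiler. The helper names (to_q15, from_q15, mul_q15) are hypothetical, and the Q1.15 format is an assumption; the slides do not state the word length actually used.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical Q1.15 helpers: 1 sign bit, 15 fractional bits.
constexpr int kFracBits = 15;
constexpr double kScale = 32768.0;  // 2^15

// Convert a real value to Q1.15 with rounding and saturation.
inline std::int16_t to_q15(double x) {
    double scaled = std::round(x * kScale);
    if (scaled > 32767.0) scaled = 32767.0;
    if (scaled < -32768.0) scaled = -32768.0;
    return static_cast<std::int16_t>(scaled);
}

// Convert a Q1.15 value back to a real number.
inline double from_q15(std::int16_t q) {
    return static_cast<double>(q) / kScale;
}

// Multiply two Q1.15 values. The 32-bit product carries 30 fractional
// bits, so it is shifted right by 15 to return to Q1.15.
// This sketch truncates and does not saturate the result.
inline std::int16_t mul_q15(std::int16_t a, std::int16_t b) {
    std::int32_t prod = static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    return static_cast<std::int16_t>(prod >> kFracBits);
}
```

For example, to_q15(0.5) gives 16384, and multiplying two such values with mul_q15 gives 8192, i.e. 0.25. The quantization error of a single conversion stays below one least significant bit (1/32768).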

How to Design a HW Accelerator

FIR 48 TAPS Example Exploiting AXI LITE

Data Type Float

AXI Lite @100MHz

SW Output took 3211.76 us.

HW Output took 4952.69 us.

1st HLS optimization: PIPELINE pragma, clock @ 125 MHz

SW Output took 3217.09 us.

HW Output took 2981.16 us.

2nd HLS optimization: float to ap_fixed, clock @ 150 MHz

SW Output took 3224.38 us.

HW Output took 1549.92 us.

AXI Lite is not the optimal solution for exchanging data
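A minimal sketch of what a 48-tap FIR written for HLS might look like, in the float version that the first measurements refer to. The function name fir48, its signature and the choice of directives are assumptions for illustration; the slides do not show the actual source. The pragmas are Vivado HLS directives, which a standard compiler ignores (possibly with a warning), so the same code also runs as plain C++.

```cpp
#include <cstddef>

constexpr std::size_t kTaps = 48;

// Hypothetical top-level function: one call per input sample.
// The delay line is kept in a static array between calls.
float fir48(float sample, const float coeff[kTaps]) {
#pragma HLS INTERFACE s_axilite port=sample
#pragma HLS INTERFACE s_axilite port=coeff
#pragma HLS INTERFACE s_axilite port=return
    static float delay[kTaps] = {0.0f};
#pragma HLS ARRAY_PARTITION variable=delay complete dim=1

    // Shift the delay line by one position and insert the new sample.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
#pragma HLS UNROLL
        delay[i] = delay[i - 1];
    }
    delay[0] = sample;

    // Multiply-accumulate over all taps.
    float acc = 0.0f;
    for (std::size_t i = 0; i < kTaps; ++i) {
#pragma HLS PIPELINE
        acc += coeff[i] * delay[i];
    }
    return acc;
}
```

Feeding a unit impulse followed by zeros returns the coefficients one per call, which makes a convenient first check in a testbench. In the measurements above, the hardware version only overtakes software after pipelining (3217.09 us vs 2981.16 us) and reaches roughly a 2x gain once the data type becomes ap_fixed (3224.38 us vs 1549.92 us).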

Demo

2 Porting the C Code

3.1 Profiling

Porting Code To ARM

Using predefined options in the BSP and gcc, it is possible to measure the call time and the total number of calls of each function. The results are written to gmon.out.
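The kind of data this profiling step yields (call count and accumulated time per function) can be imitated by hand when the profiler is not available. The sketch below is manual instrumentation in standard C++, not the gprof mechanism itself; ProfileEntry, ScopedTimer and sum_of_squares are hypothetical names, and the workload merely stands in for a receiver routine.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-function record: call count and accumulated time.
struct ProfileEntry {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

// RAII helper: times the enclosing scope and updates the record on exit.
class ScopedTimer {
public:
    explicit ScopedTimer(ProfileEntry& e)
        : entry_(e), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        entry_.total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_);
        ++entry_.calls;
    }
private:
    ProfileEntry& entry_;
    std::chrono::steady_clock::time_point start_;
};

ProfileEntry g_sum_profile;  // one record per instrumented function

// Example workload standing in for a receiver routine.
double sum_of_squares(const double* x, int n) {
    ScopedTimer t(g_sum_profile);
    double acc = 0.0;
    for (int i = 0; i < n; ++i) acc += x[i] * x[i];
    return acc;
}
```

After a run, g_sum_profile.calls and g_sum_profile.total give the two figures of interest for that function. On a host build, compiling with gcc's -pg option and running gprof on the resulting gmon.out gives the same information for every function without touching the source.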

HW Configuration Vivado

Intrinsic Code NEON & ASM NEON HW Floating Point

Quick overview of HLS, Vivado and SDK

HDR Demo

3.3 SW Optimization

3 System

Optimization

  • Platforms exploited: Xilinx SDK 2014 and Vivado 2014.2
  • Input: HDR110b Simulator code (x86 platform)
  • Output: Binary and new C code (not Optimized)

1st: TX integration and verification

2nd: RX integration and verification

Minor software modifications, e.g. pointer management

Tips:

  • The RX is the critical part of the design
  • RX performance: 1722 us per bit against a required 100 us, about 17 times too slow
  • The code is not optimized to run on ARM
  • The data type is double

ARM NEON OpenSource Libraries

There are hundreds of ready-to-use ARM libraries; the most common ones follow.

The RX code has been optimized by exploiting:

  • Intrinsic NEON instructions: Matrix inversion, Matrix multiplication.
  • Generic Code optimization
  • ARM optimized libraries
  • Compiler optimization options
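As an illustration of the intrinsic NEON approach listed above, here is a small sketch of a 4x4 single-precision matrix multiplication. It is not the author's code: the function name matmul4x4, the column-major layout and the use of float are assumptions (NEON on the Zynq's Cortex-A9 operates on single precision, whereas the original code used double). A scalar fallback is included so the block also builds on non-ARM hosts.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// C = A * B for 4x4 float matrices stored in column-major order.
// C must not overlap A or B.
void matmul4x4(const float* A, const float* B, float* C) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Load the four columns of A once.
    float32x4_t a0 = vld1q_f32(A + 0);
    float32x4_t a1 = vld1q_f32(A + 4);
    float32x4_t a2 = vld1q_f32(A + 8);
    float32x4_t a3 = vld1q_f32(A + 12);
    for (int j = 0; j < 4; ++j) {
        // Column j of C is a combination of the columns of A,
        // weighted by the entries of column j of B.
        float32x4_t c = vmulq_n_f32(a0, B[4 * j + 0]);
        c = vmlaq_n_f32(c, a1, B[4 * j + 1]);
        c = vmlaq_n_f32(c, a2, B[4 * j + 2]);
        c = vmlaq_n_f32(c, a3, B[4 * j + 3]);
        vst1q_f32(C + 4 * j, c);
    }
#else
    // Portable scalar fallback with the same column-major layout.
    for (int j = 0; j < 4; ++j) {
        for (int i = 0; i < 4; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < 4; ++k) {
                acc += A[4 * k + i] * B[4 * j + k];
            }
            C[4 * j + i] = acc;
        }
    }
#endif
}
```

The NEON path processes one output column per iteration with four multiply-accumulate instructions, instead of sixteen scalar multiply-adds.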

Effort 200 h

Tips: Software modifications do not require regenerating the bitstream or the BSP, so they are simple and fast to check.

Effort: 200 h

Results

Results

3.2 Code Analysis

Performance: [Bitrate 3200, 300 byte]

  • Initial processing time: 1722 us per bit
  • After software optimization: 187 us per bit
  • With hardware accelerators: 147 us per bit

Required processing time per bit

Bit rate (bps)    Time per bit
2400              416 us
3400              277 us
6400              156 us
8000              125 us
9600              100 us

More than 95% of processing is allocated on SW.

The current Zynq design can demodulate up to 6400 bps; additional modifications are needed to accommodate the 8000 and 9600 bps modulations.
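The conclusion above follows from comparing the measured time per bit with the required time per bit. The sketch below redoes that comparison with the figures copied from the slides; the type and function names (RateBudget, max_supported_bitrate, speedup) are hypothetical.

```cpp
#include <cstddef>

// Values copied from the slides: bit rate and required time per bit.
struct RateBudget {
    int bitrate_bps;
    double required_us_per_bit;
};

constexpr RateBudget kBudgets[] = {
    {2400, 416.0}, {3400, 277.0}, {6400, 156.0}, {8000, 125.0}, {9600, 100.0},
};
constexpr std::size_t kNumBudgets = sizeof(kBudgets) / sizeof(kBudgets[0]);

// Measured processing time per bit at each stage, from the slides.
constexpr double kInitialUs = 1722.0;
constexpr double kSwOptimizedUs = 187.0;
constexpr double kHwAcceleratedUs = 147.0;

// Highest listed bit rate whose budget is met; 0 if none is met.
int max_supported_bitrate(double measured_us_per_bit) {
    int best = 0;
    for (std::size_t i = 0; i < kNumBudgets; ++i) {
        if (measured_us_per_bit <= kBudgets[i].required_us_per_bit &&
            kBudgets[i].bitrate_bps > best) {
            best = kBudgets[i].bitrate_bps;
        }
    }
    return best;
}

// Speedup of an optimized stage relative to a baseline.
double speedup(double baseline_us, double optimized_us) {
    return baseline_us / optimized_us;
}
```

With 147 us per bit, max_supported_bitrate returns 6400, matching the statement above; the 8000 bps row would need 125 us or less. The overall speedup from the initial port to the accelerated design is 1722 / 147, roughly 11.7x, of which about 9.2x comes from software optimization alone.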

FPGA occupancy < 15%

Decided Bit

AXI Interface

- The RX process is governed by a state machine (SM)

- Pointers are used heavily

- Matrix operations (inversion, multiplication, etc.) are used heavily

- Changing the data type from double to float or fixed point should drastically improve performance. However, this option is not practical because of the extensive modifications it would require in the source code.

- The approach taken is to optimize the SW locally by changing the low-level ("ground") routines, e.g. lsl, lsr, pow2, InvMatrix, and to integrate HW accelerators that operate in pipeline with the processor, reducing the processor workload.
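One plausible reading of the low-level routine changes is replacing general-purpose math calls with shifts and exponent adjustments. The sketch below shows that idea only; the function names are hypothetical and the original lsl, lsr and pow2 routines are not shown in the slides.

```cpp
#include <cmath>
#include <cstdint>

// Integer power of two via a left shift; valid for n < 32.
inline std::uint32_t pow2_u32(unsigned n) {
    return std::uint32_t{1} << n;
}

// Divide an unsigned value by 2^n via a right shift; valid for n < 32.
inline std::uint32_t div_pow2_u32(std::uint32_t x, unsigned n) {
    return x >> n;
}

// Scale a float by 2^n without calling pow(): ldexp adjusts the exponent.
inline float scale_pow2(float x, int n) {
    return std::ldexp(x, n);
}
```

Each of these avoids a call to a general pow() or a division, which is where a routine that runs once per sample can lose a noticeable share of its time budget.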

Viterbi

Decoder

RX Receiver

Demod

Raw Sample

Tips: Using the Zynq 7030 we have estimated an additional improvement of about 20 % with no software or hardware changes.

System Definition

Multi Platform vs Single Platform

- HW Developer kit ZedBoard Zynq 7020

- Design Tools: Vivado, SDK and HLS

- Third-party libraries: RTOS and lwIP

1 System Definition

System boot Process

AMBA BUS

Traditional Flow

ZYNQ & HLS

Development

Platform

ALL IN ONE

Deployment

Platform

Effort : 160 h

Design

Platform

Finding

Is it possible to optimize the SDR design process?

Traditional Flow Vs HLS Flow

Model Design C/C++

Model Design (Matlab C/C++)

SW Design C/C++

SW Design (ASM, C/C++)

FW Design C/C++

FW Design (VHDL & Verilog)

Assessment

System Integration

System C/C++

Integration

Large Team

4 people

Micro Team

1 engineer

- Zynq Solution vs DSP Solution

- Zynq is flexible and powerful

- Zynq is scalable: 7030, 7045 and 7100

- Zynq and HLS do not require a HW background: everything is C/C++

- Zynq has a comprehensive ecosystem and a large community

- HW cost and BOM comparison:

- Zynq solution leads to a simplified schematic

- The use of Zynq Platform contributes to extend the system life span

- Effort:

960h(DSP) Vs 560h(Zynq)

WaveForm Expert

HW platform Expert

100% Fully working Design

GP Engineer (HW)

New Platform and New Tools

Verify the effectiveness of Xilinx tools

- High Level Synthesis (HLS)

- Zynq HW platform

C. S.: Porting of Mil-std 188 HDR 110B-C

Introduction

How much will it cost?

  • Dev. Kits start from 400 $ and go up to 5000 $
  • Tool: 10K HLS license, free with a Dev. Kit bundle
  • Tool: 5K Vivado license, free with a Dev. Kit bundle

Extend the suite with Matlab and Simulink HIL i.e. MIDCAS

SDR HDR 110B Details

Current Implementation

  • Modem designed for Naval communication
  • Based on MIL-STD-188-110B HDR, Appendix C
  • Operates in HF with different data rates (from 2400 up to 12800 bps)
  • Modulations supported: QPSK, 8PSK, 16QAM, 32QAM and 64QAM

ZYNQ & HLS

Zynq Architecture

1-2 Gops

HDR 110B

10-1000+Gops

RX Section

TX Section

Zynq Family

HLS

Vivado High-Level Synthesis accelerates IP creation by enabling C, C++ and SystemC specifications to be directly targeted into Xilinx All Programmable devices without the need to manually create RTL.
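In practice this means the same C/C++ function serves both as the software model and as the hardware specification, and a C testbench checks it before and after synthesis. The sketch below shows that pattern under stated assumptions: scale_and_offset is a hypothetical function invented for illustration, and run_testbench stands in for the body of a testbench main, following the usual convention that a return value of 0 means pass.

```cpp
#include <cmath>

// Hypothetical function under test: the kind of C/C++ routine handed to HLS.
void scale_and_offset(const float in[8], float out[8], float gain, float offset) {
    for (int i = 0; i < 8; ++i) {
#pragma HLS PIPELINE
        out[i] = gain * in[i] + offset;
    }
}

// Testbench body: returns the number of mismatches against a
// reference computed directly in software (0 means pass).
int run_testbench() {
    float in[8];
    float out[8];
    for (int i = 0; i < 8; ++i) {
        in[i] = 0.25f * static_cast<float>(i);
    }
    scale_and_offset(in, out, 2.0f, 1.0f);

    int errors = 0;
    for (int i = 0; i < 8; ++i) {
        float expected = 2.0f * in[i] + 1.0f;  // golden reference
        if (std::fabs(out[i] - expected) > 1e-6f) {
            ++errors;
        }
    }
    return errors;
}
```

The same testbench can then be reused unchanged for C simulation and for C/RTL cosimulation, which is what makes the verification step in this flow inexpensive.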

HLS Flow

Antonio Di Marzo

Thanks,

Questions

By Antonio Di Marzo

Design Process

Introduction

  • HW & SW Architecture definition
  • Porting the current HDR code on Zynq
  • Profiling
  • Designing HW accelerators with HLS
  • Software optimization exploiting the NEON
  • Quick overview on SDR HDR 110B
  • Zynq HW Platform Key Features
  • HLS Overview
  • Xilinx Design Flow

DEMO

Assessment and lessons learned

  • Results
  • Reference and lectures
