Copy of Embedded Image Processing using ARM based FPGA

by

brera romeos

on 11 December 2014


Embedded Image Processing using ARM based FPGA
Danielle Sullivan and Eliza Bailey
May 6th, 2014
Swarthmore College

FPGAs and Image Processing
Altera Cyclone V SoCKit FPGA
Major Benefits of FPGAs
Performance
Time to Market
Cost
Long-Term Maintenance

System on Chip (SoC) FPGA Benefits
Improve system performance through high-bandwidth interconnect between Hard Processor System (HPS) and FPGA
Configurable
Hardware
Interface between software and hardware
Lower power
Design process simplified
One chip compared to two
Embedded ARM based FPGA
ARM based SoC FPGA combines the following in a single SoC:
FPGA
HPS (microcontroller)
HPS includes:
multiport memory controller
peripheral elements
flash memory
dual-core ARM processor (CPU/microprocessor)
System Development Flow
System Development Flow Cont.
Hardware
JTAG - FPGA fabric programming
e.g. GHRD
Software
Embedded Linux - precompiled image saved on SD card
Serial over USB - serial communication with embedded Linux
via PuTTY
Ethernet -
Enables communication with embedded Linux file system
Remote debugging

Cross Compilation
Cross Compiler - a compiler capable of creating executables for a system other than the one on which the compiler is running
How this applies to us:
We separate our build environment from the target environment by pre-compiling our vip_mpeg file
Embedded Linux is our target environment

Altera Golden Reference Design
Tested and confirmed functional HPS configuration
Setup peripheral functions
FPGA Configuration
Configured Qsys system
integration between the modules
Enable SW/HW handoff
GHRD is a very specific design that controls very specific customizable peripherals
Configuring this setup gave us experience in:
Design flow process
System integration within SoC designs

Demo 1: Blinking LED
Demo 2: VIP Reference Design
Comparing Hardware and Software
Hardware vs. Software run-time comparison of bouncing picture-in-picture application
The application running on the embedded ARM FPGA architecture versus a pure software implementation on the CPU via OpenCV's Python interface
Conclusions
The OpenCV implementation had a run-time of 3 min 33 sec, compared to an SoC run-time of 15 min 46 sec
Using hardware for the image processing move function was faster than using OpenCV
memcpy was long and misleading
copying data from DDR3 on HPS to DDR3 on FPGA


Thank you for listening.

Specifically we would like to thank:
Professor Chen-Huan Chiang
Edmond Jaoudi

QUESTIONS?
PuTTY - Cross Compilation and Debugging
SoCKit Board Development
Hardware
Qsys - Make appropriate changes to VIP mixer demo parameters
MATLAB - Generate new logo
Memory Initialization File (.mif)

Software
Modify C Code to meet new design needs
Change sizing parameters
Modify moving function to our new needs
OpenCV Implementation
Developed our program via OpenCV's Python extension
optimized for fast image reading, display, and processing

Program Summary
One output image, three different portions of image
large image
smaller second image
logo
The position of the smaller picture changes by a set delta x and delta y each step
If the smaller picture hits the boundary of the window, the delta x or y values are negated
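
The bounce rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function name `step_position`, the parameter names, and the top-left-corner convention are all assumptions made for the example.

```python
def step_position(x, y, dx, dy, win_w, win_h, pic_w, pic_h):
    """Advance the small picture by one step and bounce it off the window edges.

    (x, y) is assumed to be the top-left corner of the small picture.
    All names here are illustrative, not taken from the original program.
    """
    # Negate a delta when the next step would push the picture past a boundary.
    if x + dx < 0 or x + dx + pic_w > win_w:
        dx = -dx
    if y + dy < 0 or y + dy + pic_h > win_h:
        dy = -dy
    return x + dx, y + dy, dx, dy
```

Calling the function once per output frame and feeding its result back in gives the bouncing picture-in-picture motion; the window and picture sizes are whatever the application uses.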
Timing
Function Run-time Analysis
SoC:
get_tick_count()
reading, decoding, moving

OpenCV:
time.time()
reading, moving
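
On the OpenCV side, per-stage timing with `time.time()` could look roughly like the sketch below. The helper names `time_stage` and `accumulate` are hypothetical, and the stage passed in would be whatever function does the reading or moving in the real program.

```python
import time


def time_stage(stage, *args):
    """Run one stage (e.g. reading or moving) and return (result, elapsed seconds)."""
    start = time.time()
    result = stage(*args)
    elapsed = time.time() - start
    return result, elapsed


def accumulate(totals, name, elapsed):
    """Add one measurement to a per-stage running total and return the totals."""
    totals[name] = totals.get(name, 0.0) + elapsed
    return totals
```

Summing the per-frame measurements for each stage makes it possible to compare reading and moving separately rather than only the total run-time.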
Lab 1
FPGA - purely hardware approach (no processor involvement)
No software
Lab 2
Hardware used the Golden Reference Design
Provide link from one pin of CPU to LED output
Software
GPIO running in the embedded Linux
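
As a rough idea of what driving an LED from GPIO under embedded Linux can look like, the sketch below toggles a GPIO `value` file. It assumes the Linux sysfs GPIO interface and a pin that has already been exported and set as an output; the talk does not say which interface or pin number was used, so the directory argument and the function name `blink` are illustrative.

```python
import os
import time


def blink(gpio_dir, blinks, half_period_s=0.5, sleep=time.sleep):
    """Toggle a GPIO 'value' file on and off `blinks` times.

    gpio_dir is assumed to be an exported sysfs GPIO directory such as
    /sys/class/gpio/gpioN (N is board-specific and not given in the talk).
    """
    value_path = os.path.join(gpio_dir, "value")
    written = []
    for _ in range(blinks):
        for state in ("1", "0"):  # LED on, then LED off
            with open(value_path, "w") as f:
                f.write(state)
            written.append(state)
            sleep(half_period_s)
    return written
```

Passing `sleep` as a parameter keeps the function easy to try on a development machine against an ordinary directory before running it on the board.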
Demo: OpenCV Implementation
Results - Moving Function
The final application shows a visible lag in the background frame of our SW/HW version
Partially attributed to the slow reading and decoding times
OpenCV's reading time for the background image is on average 74.4 times quicker than the libmpeg2 equivalent
libmpeg2 reading of layer 1 is also slower
The effect is so small that no delay is visible to the naked eye
OpenCV's reading function incorporates both reading and decoding

The move function was approximately 3 000 times faster in the FPGA than in OpenCV

Motivation
Real-time processing of images is computationally expensive
As imaging technology progresses, pixel counts and image quality are growing quickly
Image processing requires an increasingly large amount of processing power
The performance of these computationally intensive tasks could be greatly improved by the parallelism offered by an FPGA
Qsys
Compare run-times of software and hardware image processing
Gain familiarity with Embedded Design System for SoC FPGA
Learn IP Core-based SoC design methodology
e.g. Altera Video Image Processing (VIP) Design Suite with Qsys
Future Work
A solution combining OpenCV with the SoC platform would take advantage of each system's benefits
Comparing SoC and OpenCV implementations of more complex image processing techniques could reveal run-time gains
The board does not support real-time video input

[Figure labels: VGA, Ethernet, USB, JTAG, USB, UART, Power]
Results
Software Implementation
3min 33s
SoC Implementation
15min 46s
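
To put the two total run-times on a common scale, the short calculation below converts both to seconds and takes their ratio. Only the two times come from the slides; the helper name `to_seconds` is illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + seconds


software_s = to_seconds(3, 33)   # OpenCV implementation: 213 s
soc_s = to_seconds(15, 46)       # SoC implementation: 946 s

ratio = soc_s / software_s
print(f"SoC run took {ratio:.2f}x as long as the OpenCV run")  # about 4.44x
```

This is the end-to-end figure, which includes reading and decoding; it is separate from the move-function comparison.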
Results - Moving Function
Timing in the software part of the SoC only incorporates the time needed to write the data to the registers
Hardware moving is done in parallel with software
Hardware cannot be timed in software so we have to rely on hardware latencies to approximate the time
Approximations found by multiplying the clock period by the sum of the stage latencies: (1 / 300 MHz) * (<1 + <1 + <1 + 3)
Pipeline: input -> FR -> CPS -> MIX -> CVO -> output
Latencies: FR < 1, CPS < 1, MIX < 1, CVO 3
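
The approximation above can be worked through numerically. The sketch below treats each "< 1" latency as at most one clock period, which is an assumption made to obtain an upper bound; the 300 MHz clock and the latency values are from the slides, and the function name is illustrative.

```python
CLOCK_HZ = 300e6  # 300 MHz clock, as stated on the slide


def latency_upper_bound_s(latency_bounds, clock_hz=CLOCK_HZ):
    """Upper bound on the hardware time: sum of latencies times the clock period."""
    return sum(latency_bounds) / clock_hz


# Each "< 1" latency is bounded above by 1; the last stage contributes 3.
bound = latency_upper_bound_s([1, 1, 1, 3])
print(f"hardware move time is under {bound * 1e9:.0f} ns")  # under 20 ns
```

Since the last stage alone contributes 3 clock periods (10 ns), the estimate falls between 10 ns and 20 ns under these assumptions.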
LED
3 000 Speed Up
Demo: SoC Implementation