Transcript

Previous

Goal

Current

The End

Conclusion

Backend Design Criteria

Question: Is it possible to construct a language agnostic backend for high-level languages without sacrificing performance?

CAPE: C-Targeting Array Processing Engine

Language agnostic

Support a programming model not a specific language

Programming Model

  • High-level
  • Declarative
  • Array-oriented

Language integration via intermediate representation

Efficient

Target performance comparable to straightforward hand-coded C/C++ for the same application

Question: Is it possible to construct a language agnostic backend for high-level languages without sacrificing performance?

Language Integration

  • Map abstractions
  • vector bytecode - intermediate representation

Internal Representation BhIr

  • Annotated vector bytecode

Transformations

  • Optimization
  • Normalization
  • Fusion, grouping bytecode sequences
  • Code generator for array operations with parallelization and composition of multiple array operations

  • Caching JIT-Compiler and object storage for array operation kernels

  • Runtime instrumenting compilation, buffer management, array operation scheduling and execution
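To make "fusion, grouping bytecode sequences" concrete, here is a minimal sketch. The `Op` record, the opcode names, and `greedy_fuse` are hypothetical and are not Bohrium's actual bytecode or fusion algorithm. The sketch only groups consecutive instructions that iterate over the same shape; a real fuser would also have to check data dependencies.

```python
from collections import namedtuple

# Hypothetical bytecode instruction: opcode, output operand, inputs, iteration shape.
Op = namedtuple("Op", "opcode out ins shape")


def greedy_fuse(program):
    """Group consecutive instructions that share an iteration shape into kernels."""
    kernels = []
    for op in program:
        if kernels and kernels[-1][-1].shape == op.shape:
            kernels[-1].append(op)   # same iteration space: extend the current kernel
        else:
            kernels.append([op])     # shape changed: start a new kernel
    return kernels


program = [
    Op("MUL", "t0", ("a", "a"), (1000,)),
    Op("MUL", "t1", ("b", "b"), (1000,)),
    Op("ADD", "t2", ("t0", "t1"), (1000,)),
    Op("SQRT", "r", ("m",), (10, 10)),
]

kernels = greedy_fuse(program)
print([[op.opcode for op in kernel] for kernel in kernels])
```

The first three instructions end up in one kernel, so a code generator could emit a single loop for them instead of three.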

What

Why

Bohrium: a virtual machine approach to portable parallelism

Mads R.B. Kristensen, Simon A.F. Lund, Troels Blum, Kenneth Skovhede, Brian Vinter.

In proceedings of the Parallel & Distributed Processing Symposium Workshops (IPDPSW14)

NIELS BOHR INSTITUTE

Implementation Scope

Tools of the trade

Future / Ongoing Work

FACULTY OF SCIENCE

UNIVERSITY OF COPENHAGEN

DENMARK

Productivity

niels

numerically intensive expression language for science

Reconfigurable: BH_STACK=[cape,cluster_proxy,gpu]

Interactive environment via

Array Descriptor

Performance

Automatic Mapping of Array Operations to Specific Architectures

Array Operations

  • Element-wise aka map, zip operator over array(s)
  • Reduction
  • Scan
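The three classes of array operations listed above can be illustrated with plain NumPy. This is only a sketch of the programming model as seen from the frontend; the array values are made up.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

mapped = np.sqrt(a)              # element-wise over one array (map)
zipped = a + b                   # element-wise over two arrays (zip)
total = np.add.reduce(a)         # reduction: 1 + 2 + 3 + 4
running = np.add.accumulate(a)   # scan: running (prefix) sums

print(zipped, total, running)
```

None of these expressions states how the loops are ordered or parallelized, which is the freedom a backend can exploit.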

Programming Pitfalls

Performance

Correctness

  • Deadlocks
  • Race-conditions

#directives

C / C++ / Fortran

OpenMP / pthreads / Qthread

MPI

OpenACC / LEO

PGAS

OpenCL / CUDA

for efficient hardware utilization

CPUs, APUs, Hybrid, FPGA, and clusters of them configured in shared and distributed memory systems...

Heat Equation in C and OpenMP and MPI with Latency Hiding

Heat Equation in C and OpenMP and MPI

Heat Equation in C

Heat Equation in C and OpenMP

Heat Equation in Python / NumPy
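The slide code itself is not part of this transcript. As a stand-in, a heat equation solver in NumPy might look like the sketch below: a Jacobi iteration with a 5-point stencil expressed entirely as array operations on views. The function name, grid size, and boundary values are assumptions chosen for illustration.

```python
import numpy as np


def heat_equation(grid, iterations):
    """Jacobi iteration with a 5-point stencil; boundary cells stay fixed."""
    center = grid[1:-1, 1:-1]   # views into grid, no data is copied
    north = grid[:-2, 1:-1]
    south = grid[2:, 1:-1]
    west = grid[1:-1, :-2]
    east = grid[1:-1, 2:]
    for _ in range(iterations):
        # The right-hand side is evaluated into a temporary before being written back.
        center[:] = 0.2 * (center + north + south + west + east)
    return grid


grid = np.zeros((6, 6))
grid[0, :] = 100.0              # hot top boundary
heat_equation(grid, iterations=1)
print(grid[1, 1:-1])
```

There are no explicit loops over grid cells, threads, or messages, in contrast to the C, OpenMP, and MPI variants named on the neighbouring slides.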

Bohrium Processing Unit

Goal: ASIC for executing Bohrium Bytecode

Simon Andreas Frimann Lund

Mads R. B. Kristensen

Brian Vinter

  • High flops-to-watt ratio
  • Low latency
  • FPGA prototype
  • Languages used by TOP15 Computational Finance / Financial Engineering / "Quant Programs"

https://www.quantnet.com/mfe-programs-rankings/

  • As well as the University of Copenhagen
  • HIPERFIT industry partners with an affinity for APL

  • Collaborative effort
  • There is even more to it

November 13, 2016

WOLFHPC 2016 in conjunction with SC16

HIPERFIT

Array Operation Fusion

Fusion Fail

C99

  • OpenMP

Experimental

  • OpenACC
  • LEO

Greedy

Optimal

  • Improve mathematical models for Finance
  • Express them in verifiable Domain-Specific Languages (DSLs)
  • Execute them efficiently on High Performance Systems

Encore

Fusion of Parallel Array Operations

Mads R.B. Kristensen, Simon A.F. Lund, Troels Blum, James Avery.

In proceedings of the 2016 International Conference on Parallel Architectures and Compilation Techniques (PACT16).

CAPE: Xeon PHI

Performance

  • best case 2x speedup
  • often a slowdown
  • WHY!?

Allocation ~250MB/s

Transfer ~5.5GB/s
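A quick back-of-the-envelope calculation shows what the two rates above imply for one offloaded buffer. The 1 GB buffer size is a hypothetical example, MB and GB are assumed to be decimal units, and both rates are assumed to be sustained.

```python
ALLOC_RATE = 250e6       # bytes/s, "Allocation ~250MB/s" from the slide
TRANSFER_RATE = 5.5e9    # bytes/s, "Transfer ~5.5GB/s" from the slide


def offload_overhead(nbytes):
    """Seconds spent allocating and transferring one buffer of nbytes."""
    return nbytes / ALLOC_RATE, nbytes / TRANSFER_RATE


alloc_s, transfer_s = offload_overhead(1e9)   # hypothetical 1 GB buffer
print(f"allocation {alloc_s:.2f} s, transfer {transfer_s:.2f} s")
```

Under these assumptions allocation costs about 22 times more than the transfer, which is one plausible reading of why buffer reuse on the device matters.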

Codegen specialization

  • Flattening
  • Array contraction
  • Array operation composition
  • Array shape => Loop constructs
  • Checks buffer-references for aliasing
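Array contraction can be sketched as follows. CAPE generates C, so this Python version only mirrors the shape of the generated loops; the function names are hypothetical. In the first variant every array operation materializes a full temporary array, in the second the operations are composed into one loop and the temporaries shrink to scalars.

```python
def without_contraction(a, b):
    # Each array operation writes a full temporary array to memory.
    t0 = [x * x for x in a]
    t1 = [y * y for y in b]
    return [u + v for u, v in zip(t0, t1)]


def with_contraction(a, b):
    # One composed loop; the temporaries t0 and t1 are contracted to scalars.
    out = [0.0] * len(a)
    for i in range(len(a)):
        t0 = a[i] * a[i]
        t1 = b[i] * b[i]
        out[i] = t0 + t1
    return out


print(with_contraction([1.0, 2.0], [3.0, 4.0]))
```

Both variants compute the same result; the contracted one avoids allocating and streaming two temporary arrays.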

Data management

  • device allocation
  • transfer to/from device
  • data persistence

https://github.com/safl/offload/tree/master/mic

make broken

./broken

Code generation

  • parallelization
  • specialization

SIMD utilization

Memory Management

Allocation and de-allocation of buffers backing array storage

Alignment

GPUs and Accelerators

  • data transfer: host <-> device
  • data persistence: buffer-reuse on device

Software Victim cache

  • Delay de-allocation
  • Reuse buffers
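The idea of delaying de-allocation and reusing buffers can be sketched like this. The class name, its capacity policy, and the exact-size matching rule are assumptions made for illustration, not a description of the actual implementation.

```python
class VictimCache:
    """Holds back recently freed buffers so same-sized allocations can reuse them."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of freed buffers held back
        self.freed = []            # freed buffers, oldest first

    def allocate(self, nbytes):
        for i, buf in enumerate(self.freed):
            if len(buf) == nbytes:
                return self.freed.pop(i)   # hit: reuse a freed buffer
        return bytearray(nbytes)           # miss: fresh allocation

    def free(self, buf):
        self.freed.append(buf)             # delay the actual de-allocation
        if len(self.freed) > self.capacity:
            self.freed.pop(0)              # evict the oldest buffer for real


cache = VictimCache(capacity=2)
first = cache.allocate(1024)
cache.free(first)
second = cache.allocate(1024)
print(second is first)
```

Iterative array programs tend to allocate and free temporaries of the same size in every iteration, which is the access pattern such a cache targets.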

Thread Management

CAPE codegen takes SIMD into consideration; however, the current implementation relies on auto-vectorization by the backend C compiler.

Brittle, for example: gcc often fails where commercial compilers prevail.

A simplistic approach using #pragma omp simd [...] did not yield the expected results.

Investigate further and possibly expand codegen with intrinsics or other explicit means of assuring the compiler that vectorization makes sense.

Thanks

Doubling the Performance of Python/NumPy With Less Than 100 SLOC

Simon A.F. Lund, Kenneth Skovhede, Mads R.B. Kristensen, Brian Vinter. In proceedings of the 3rd Python for High Performance and Scientific Computing (PyHPC13@SC13)

Implemented with HWLOC

Multi-core and MIC

  • # threads
  • Control core/thread affinity

Baseline: Python/NumPy

Baseline: Serial C99

python [-m bohrium] benchmark.py

CAPE-AC: Without array contraction

CAPE: WITH array contraction

https://github.com/bh107/benchpress.git rev. 0aa2942

https://github.com/bh107/bohrium.git rev. b4d3586

www.erda.dk/public/archives/YXJjaGl2ZS0xSWhQSmU=/published-archive.html

User knows his high-level array-oriented programming

Knows the configuration of the computing system

GPUs,

Accelerators,

Command-line interface via docopt

Combined: the high-level array-oriented programming model and its declarative nature provide implicit data-parallel operations and give the backend freedom to decide how to compute them efficiently.
