Goal
Current
The End
Conclusion
Question: Is it possible to construct a language-agnostic backend for high-level languages without sacrificing performance?
CAPE: C-Targeting Array Processing Engine
Language agnostic
Support a programming model, not a specific language
Programming Model
Language integration via intermediate representation
Efficient
Target performance comparable to straightforward hand-coded C/C++ for the same application
Language Integration
Intermediate Representation: BhIR
Transformations
What
Why
Future / Ongoing Work
NIELS BOHR INSTITUTE
FACULTY OF SCIENCE
UNIVERSITY OF COPENHAGEN
DENMARK
Productivity
numerically intensive expression language for science
Reconfigurable: BH_STACK=[cape,cluster_proxy,gpu]
Array Descriptor
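A minimal sketch of what an array descriptor can look like, assuming a strided model in which a view is a base buffer plus a start offset, a shape and per-dimension strides counted in elements. The type view_t, the helpers view_offset and view_nelem, and MAX_DIM are hypothetical names for illustration, not CAPE's or Bohrium's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on the number of dimensions. */
#define MAX_DIM 16

/* Hypothetical array descriptor (view): field names are illustrative. */
typedef struct {
    double  *base;            /* buffer backing the array storage      */
    int64_t  ndim;            /* number of dimensions                  */
    int64_t  start;           /* offset of the first element in base   */
    int64_t  shape[MAX_DIM];  /* extent of each dimension              */
    int64_t  stride[MAX_DIM]; /* step in elements for each dimension   */
} view_t;

/* Offset into base of the element at the multi-index idx[0..ndim-1]. */
static int64_t view_offset(const view_t *v, const int64_t *idx)
{
    int64_t off = v->start;
    for (int64_t d = 0; d < v->ndim; ++d)
        off += idx[d] * v->stride[d];
    return off;
}

/* Number of elements described by the view. */
static int64_t view_nelem(const view_t *v)
{
    int64_t n = 1;
    for (int64_t d = 0; d < v->ndim; ++d)
        n *= v->shape[d];
    return n;
}
```

With such a descriptor, slicing and transposition change only start, shape and stride; no data is copied. For example, column 1 of a 3x4 row-major array is the view start = 1, shape = {3}, stride = {4} over the same buffer.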
Performance
Array Operations
Programming Pitfalls
Performance
Correctness
#directives
C / C++ / Fortran
OpenMP / pthreads / Qthreads
MPI
OpenACC / LEO
PGAS
OpenCL / CUDA
and clusters of them, configured in shared and distributed memory systems...
Simon Andreas Frimann Lund
Mads R. B. Kristensen
Brian Vinter
November 13, 2016
WOLFHPC 2016 in conjunction with SC16
Array Operation Fusion
Fusion Fail
C99
Experimental
Greedy
Optimal
Encore
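To illustrate what fusing array operations means at the C level, the sketch below assumes the two-operation program t = a + b; y = t * c and shows one loop per operation next to a single fused loop. The function names are hypothetical, and the example does not implement the greedy or optimal fusion strategies; it only shows the effect of fusing two operations.

```c
#include <stddef.h>

/* Unfused: one loop per array operation, as a naive per-operation
 * code generator might emit for  t = a + b;  y = t * c  */
static void unfused(size_t n, const double *a, const double *b,
                    const double *c, double *t, double *y)
{
    for (size_t i = 0; i < n; ++i)
        t[i] = a[i] + b[i];
    for (size_t i = 0; i < n; ++i)
        y[i] = t[i] * c[i];
}

/* Fused: both operations share one loop, so each t[i] is consumed
 * right after it is produced instead of being streamed through memory
 * a second time. t is still written: fusion alone does not remove
 * the temporary array. */
static void fused(size_t n, const double *a, const double *b,
                  const double *c, double *t, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        t[i] = a[i] + b[i];
        y[i] = t[i] * c[i];
    }
}
```

Fusion is only legal when no operation in the fused loop needs elements that another iteration has not yet produced; elementwise operations like these always qualify.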
Performance
Codegen specialization
Code generation
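A sketch of one possible form of codegen specialization, assuming 1-D double arrays: a generic kernel handles arbitrary strides, while a specialized kernel covers the common contiguous case with a plain unit-stride loop that C compilers vectorize more readily. The kernel names and the runtime dispatcher add_1d are hypothetical; a code generator could instead make this choice when it emits the code.

```c
#include <stdint.h>

/* Generic kernel: any 1-D strided layout. The runtime strides make
 * the access pattern unknown to the C compiler. */
static void add_strided(int64_t n,
                        const double *a, int64_t sa,
                        const double *b, int64_t sb,
                        double *c, int64_t sc)
{
    for (int64_t i = 0; i < n; ++i)
        c[i * sc] = a[i * sa] + b[i * sb];
}

/* Specialized kernel: all operands contiguous (stride 1). A simple
 * unit-stride loop; the compiler may vectorize it, inserting a runtime
 * overlap check if needed since in-place use (c == a) is allowed. */
static void add_contiguous(int64_t n, const double *a, const double *b,
                           double *c)
{
    for (int64_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: pick the specialized kernel when it applies. */
static void add_1d(int64_t n, const double *a, int64_t sa,
                   const double *b, int64_t sb, double *c, int64_t sc)
{
    if (sa == 1 && sb == 1 && sc == 1)
        add_contiguous(n, a, b, c);
    else
        add_strided(n, a, sa, b, sb, c, sc);
}
```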
Allocation and de-allocation of buffers backing array storage
Alignment
GPUs and Accelerators
CAPE code generation takes SIMD into consideration; however, the current implementation relies on auto-vectorization by the backend C compiler.
This is brittle; for example, GCC often fails to vectorize where commercial compilers succeed.
A simplistic approach using #pragma omp simd [...] did not yield the expected results.
Investigate further and possibly extend code generation with intrinsics or other explicit means of assuring the compiler that vectorization is safe.
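The sketch below illustrates the allocation, alignment and SIMD points under stated assumptions: buffers allocated with POSIX posix_memalign to a hypothetical 64-byte ALIGNMENT, and a loop that uses C99 restrict plus #pragma omp simd as explicit vectorization hints. It shows the kind of hint discussed here, not CAPE's generated code, and the helper names buffer_alloc, buffer_free and axpy are hypothetical. Whether the loop is actually vectorized still depends on the compiler and its flags.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign under strict C99 */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical alignment: 64 bytes is a cache line on common x86 CPUs
 * and the width of an AVX-512 register. */
#define ALIGNMENT 64

/* Allocate n doubles aligned to ALIGNMENT bytes.
 * Returns NULL for n == 0, on size overflow, or on allocation failure. */
static double *buffer_alloc(size_t n)
{
    void *p = NULL;
    if (n == 0 || n > SIZE_MAX / sizeof(double))
        return NULL;
    if (posix_memalign(&p, ALIGNMENT, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}

/* Memory obtained from posix_memalign is released with plain free(). */
static void buffer_free(double *p)
{
    free(p);
}

/* y = a*x + y. restrict promises that x and y do not overlap, removing
 * the aliasing question that often blocks auto-vectorization. The pragma
 * requests a SIMD loop; it is honoured only with OpenMP SIMD support
 * enabled (e.g. -fopenmp-simd in GCC and Clang) and ignored otherwise. */
static void axpy(size_t n, double a, const double *restrict x,
                 double *restrict y)
{
#pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```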
Thanks
Implemented with HWLOC
Multi-core and MIC
Baseline: Python/NumPy
Baseline: Serial C99
python [-m bohrium] benchmark.py
CAPE-AC: Without array contraction
CAPE: WITH array contraction
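A minimal sketch of array contraction, assuming the expression y = (a + b) * c: once the two operations are fused into one loop, the temporary a + b is produced and consumed within the same iteration and is not needed afterwards, so it can live in a scalar instead of an n-element array. This saves both the allocation and the memory traffic of writing and re-reading the temporary. The function name is hypothetical and the code is not CAPE's generated output.

```c
#include <stddef.h>

/* y = (a + b) * c after fusion AND array contraction: the temporary
 * never leaves the loop body, so it is held in a scalar (typically a
 * register) rather than in an n-element array. */
static void fused_contracted(size_t n, const double *a, const double *b,
                             const double *c, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double t = a[i] + b[i]; /* contracted temporary */
        y[i] = t * c[i];
    }
}
```

Contraction is only valid when nothing outside the fused loop reads the temporary; if the program used a + b again later, the array would have to be kept.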
https://github.com/bh107/benchpress.git rev. 0aa2942
https://github.com/bh107/bohrium.git rev. b4d3586
www.erda.dk/public/archives/YXJjaGl2ZS0xSWhQSmU=/published-archive.html
User knows his high-level array-oriented programming
Knows the configuration of the computing system
Combined: the high-level array-oriented programming model and its declarative nature provide implicit data-parallel operations and leave the backend free to decide how to compute them efficiently.