Scripting Language Performance Through Interoperability

Talk given at the High Performance Scripting Languages (HPSL) workshop at PPoPP15.
by

Simon Andreas Frimann Lund

on 23 March 2016

Transcript of Scripting Language Performance Through Interoperability

Python/Chapel interoperability module
pyChapel
pychapel.rtfd.org
FFI
Module Compiler
Usage Examples
that turns
Python / Numpy
into an array language
npbackend
also known as the Python module
npbackend, how does that help?
And how do I use it?
npbackend - Usage and User Interface
Goals
Legacy support
Minimally Intrusive
Graceful degradation

Enable Use of multiple "targets"
Bohrium
Numexpr
libgpuarray
pyChapel
... <insert your backend here>
Benchmarks
Shallow Water
Domain: 2000x2000
Iterations: 100
Heat Equation
Domain: 3000x3000
Iterations: 100
Measuring elapsed wall-clock in seconds
Comparing NumPy to npbackend with different targets
Comparing NumPy to NumPy through npbackend to measure overhead
Intel Xeon E5640, 2.66 GHz
12MB, LLC
96GB DDR3, Main memory
NVIDIA GeForce GTX 460, 1GB GDDR5
With: GCC 4.8.2, OpenCL 1.2, Linux 3.13, Python 2.7, and NumPy 1.8.1.

Average of three runs, deviation from
the mean used as verification.
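
The measurement procedure described here (elapsed wall-clock seconds, averaged over three runs, with the deviation from the mean as a sanity check) could be sketched roughly as follows. The names `benchmark` and `speedup` are illustrative assumptions and not part of the original benchmark suite. The sketch also assumes Python 3 for `time.perf_counter`, whereas the reported numbers were obtained with Python 2.7.

```python
import statistics
import time


def benchmark(fn, runs=3):
    """Time fn() `runs` times; return (mean seconds, largest deviation from the mean)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()          # wall-clock timer
        fn()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    max_deviation = max(abs(s - mean) for s in samples)
    return mean, max_deviation


def speedup(baseline_seconds, candidate_seconds):
    """Speedup of a candidate relative to a baseline, e.g. 2.0 means twice as fast."""
    return baseline_seconds / candidate_seconds


# Hypothetical workload standing in for one benchmark iteration.
mean_seconds, deviation = benchmark(lambda: sum(range(10000)))
```

A large deviation relative to the mean would suggest that the runs were disturbed and should be repeated.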
npbackend, facilitating target optimizations
Indirection Layer
Lazy / Deferred Evaluation

Construct numeric kernels
Loop fusion
Array Contraction
Dead code elimination
Common subexpressions
...
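
As a rough illustration of how lazy / deferred evaluation opens the door to loop fusion and array contraction, the sketch below records element-wise operations instead of executing them, then evaluates the whole expression in a single pass without intermediate arrays. The `Lazy` class is a hypothetical toy written in plain Python. It is not the npbackend or Bohrium implementation.

```python
import operator


class Lazy:
    """Toy deferred array: stores how to compute element i, not the values."""

    def __init__(self, element_fn, size):
        self.element_fn = element_fn   # maps an index to a value
        self.size = size

    @classmethod
    def from_list(cls, data):
        return cls(lambda i: data[i], len(data))

    def _binary(self, other, op):
        # Build a new deferred expression; nothing is computed here.
        if isinstance(other, Lazy):
            if other.size != self.size:
                raise ValueError("size mismatch")
            return Lazy(lambda i: op(self.element_fn(i), other.element_fn(i)), self.size)
        return Lazy(lambda i: op(self.element_fn(i), other), self.size)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    __rmul__ = __mul__   # scalar * Lazy; multiplication by a scalar commutes

    def evaluate(self):
        # One fused loop over the whole expression, no temporaries.
        return [self.element_fn(i) for i in range(self.size)]


B = Lazy.from_list([1, 2, 3])
C = Lazy.from_list([10, 20, 30])
alpha = 2
A = B + alpha * C        # only records the expression
result = A.evaluate()    # computed in a single pass
```

An eager implementation would materialize `alpha * C` as a temporary array before adding `B`. Deferring the work lets a target see the full expression and decide how to fuse it.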
Exemplified
Results
Numexpr x2
BohriumCPU x2
libgpuarray x3.7
BohriumGPU x12
Results
Numexpr x2.2
BohriumCPU x2.6
libgpuarray x5.6
BohriumGPU x18
Scripting Language Performance Through Interoperability
Productivity vs Performance
Choose one?
Language Concerns
R
Python
MATLAB
OpenMP/pthreads
CUDA
OpenCL/CUDA/OpenACC
MPI
GPUs,
Accelerators
APUs,
Hybrid,
FPGA
CPUs
For efficiency
Modeling
Visualization
Data Analysis
Experimentation


The Best of Both Worlds?
High Productivity Computing Systems (HPCS) aka
High Productivity High Performance Languages
X10
Fortress
UPC
Chapel
Parallel Programming Languages
Shared Memory
Distributed Memory
Convenient notation
DEAD
Java "ish"
C/C++ "ish"
Alive and well: open source, with an active team at Cray Inc. and academic collaborations
Designed from scratch with heritage / lessons learned from HPF and ZPL.
Multi-resolution: high-level abstractions are implemented within the language itself and can be peeled off.
Chapel
pragmas
C / C++ / Fortran
Locale Abstraction
Hierarchical
Abstract unit of target architecture
Reasoning about locality and affinity

PGAS
Public vs Private determined by scoping

Express:

begin on Locale[0]
....
begin on node.left do
search(node.left)

Probing:
locale.(physicalMemory, id, name, coreCount, etc.)

Code-blocks / Expression
begin
cobegin
coforall
sync
serial
On Data
atomic variables
sync variables
Global-view abstractions
forall loops
High-level: A = B + alpha * C
Mapped to hardware with Domain Maps using Locales
Default strategies user-overridable
in one slide
First-class index-set
dense
sparse
strided
associative
un-structured

Handles:
Memory layout
Distribution
Domains
Loop constructs
Iterators
Parallel Iterators
I/O
Bringing it all together
Python Approaches
Implicit
Explicit
Intrusion
pyCuda
pyOpenCL
PyMIC
Hardware APIs
npbackend
Bohrium
Numba
function specialization
jit compilation
@decorators
Parakeet
pyChapel
Data-parallelism
Libraries
Pythran
cython
Pythran
JIT / Dynamic Compilers
Static Compilation
NumPy / SciPy
Language extensions
Parallel language constructs
ctypes
swig
Fwrap
pyChapel
FFIs / wrappers
Copperhead
Specialization
Lowlevel interfaces
"Modern" feel
Separating NumPy API From Implementation. Kristensen, Mads R. B., Lund, Simon A. F., Blum, Troels, and Skovhede, Kenneth. In Proceedings of the 4th Workshop on Python for High Performance and Scientific Computing (PyHPC14@SC14).
or getting the best of multiple worlds...
From a Python perspective...
Chapel
chapel-for-python-programmers.rtfd.org
However...
Multiple distinct models for expressing parallelism
Low-level hardware instrumentation
Target Specific
Not really
what we wanted...
Python/NumPy
a Portable and Scalable Language for Scientific Computing
Loss of high-level abstractions: less like Python, more like "convenient" low-level programming
Python Approaches - Applied
Future Work / Current Limitations
Acknowledgements
Thanks to the Chapel Team!
My supervisor and colleagues
Especially: Brad Chamberlain, Lydia Duncan, Thomas Van Doren, Elliot Ronaghan, and Ben Harshbarger.
FFI - Type support
Expand NumPy ndarray support
Make type-declaration optional
Auto-create export declarations

Module compiler
Expand type-parsing and type-mapping

npbackend target
Implicit acceleration of Python/NumPy
Using FFI / decorators behind the scene

Parallelism and Locality
Currently only node-level; does not leverage all the capabilities of Chapel
HIPERFIT: This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center 'HIPERFIT: Functional High Performance Computing for Financial Information Technology' (hiperfit.dk) under contract number 10-092299.
This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the 'High Performance High Productivity' project under contract number 09-067060.
Science
Finance
Rosen
Figure from Code Complete by Steve McConnell
Chapel
Python/NumPy
Python/NumPy + pyChapel
Python/NumPy
Python/NumPy + pyChapel
Python/NumPy
Python/NumPy + pyChapel
Take some market data
Run some analytics / modeling on the data
Visualize the results

Setup simulation from dataset
Run simulation
Reduce data and analyze results
Rosen-filter, kernel-function from NumPy/SciPy software stack.
Behind the scenes
Arguments are cast or converted to equivalent types
Scalar arguments are passed by value
Array-metadata is copied
Array-data is passed by reference
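
The split between by-value scalars and by-reference array data can be illustrated with the standard-library `ctypes` and `array` modules. This is only a sketch of the general calling convention. It is not pyChapel's actual marshalling code, and `scale_in_place` is a hypothetical stand-in for a compiled kernel.

```python
import array
import ctypes


def scale_in_place(buf, n, factor):
    """Stand-in for a compiled kernel: gets a buffer, its length, and a scalar."""
    for i in range(n):
        buf[i] *= factor     # writes go straight to the caller's memory
    factor = 0.0             # rebinding the local scalar does not affect the caller


data = array.array("d", [1.0, 2.0, 3.0])

# Array data: wrap the existing buffer, no copy is made.
view = (ctypes.c_double * len(data)).from_buffer(data)

# Scalar: converted to the equivalent C type and handed over by value.
factor = ctypes.c_double(2.0)

scale_in_place(view, len(data), factor.value)

# The view and the original array share one address.
same_memory = ctypes.addressof(view) == data.buffer_info()[0]
```

Because only the metadata (here the length) is copied, the cost of a call stays independent of the array size.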

Extern decorators
@Chapel ( Chapel code )
@FromC ( C code )
@FromC ( Third-party libraries )
threading
multiprocessing
mpi4py
Retarget without modifying the application

NPBE_TARGET="gpu"

Controlled via environment variables
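
A minimal sketch of how retargeting via an environment variable might work, combined with graceful degradation when the requested target is unavailable. Only the variable name `NPBE_TARGET` comes from the talk. The registry, the target names `loop` and `map`, and every function below are hypothetical and do not reflect npbackend's real internals.

```python
import operator
import os


def add_loop(a, b):
    """Default target: explicit Python loop."""
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


def add_map(a, b):
    """Alternative target: same result, different implementation."""
    return list(map(operator.add, a, b))


# Hypothetical registry of available targets.
TARGETS = {"loop": add_loop, "map": add_map}
DEFAULT_TARGET = "loop"


def select_target(environ=os.environ):
    """Pick the implementation named by NPBE_TARGET, falling back to the default."""
    name = environ.get("NPBE_TARGET", DEFAULT_TARGET)
    return TARGETS.get(name, TARGETS[DEFAULT_TARGET])


def add(a, b, environ=os.environ):
    """Application-facing call; the application never names a target itself."""
    return select_target(environ)(a, b)
```

The application only ever calls `add`. Switching targets is a matter of launching it with a different `NPBE_TARGET` value.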
Python / NumPy / SciPy for program definition, data description, and interaction
Straightforward sequential semantics
Let the user program close to their domain, not close to the hardware

Obtain performance through data-parallelism of array operations
Transparently map operations to Chapel via npbackend targeting pyChapel

And when the abstractions and mappings fail...
Provide an alternative to C ...
... an alternative with a unified model for parallel computations
With high-level data-parallel constructs to do most of the lifting and ...
... syntax and semantics close to Python
Provide Chapel as the underlying language empowering the Python user
an approach to Python as a High Performance Scripting Language
Observations when running the above
A constant compilation overhead of ~3 seconds per module, plus a varying amount of time depending on the compiled code.
Overhead is amortized with speedups ranging from 1.8x to 15.0x
Call overhead was not measurable when comparing the pyChapel and Chapel implementations of the rosen-kernel, indicating sufficiently low call overhead.
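
To get a feel for when the compilation overhead is amortized, consider a kernel that originally takes T seconds and runs s times faster once compiled, with a one-off overhead of c seconds. Compilation pays off when T/s + c < T, that is, when T > c / (1 - 1/s). The sketch below evaluates this bound. It assumes only the constant ~3 second part of the overhead and ignores the part that varies with the compiled code, so the results are optimistic lower bounds rather than measurements.

```python
def break_even_runtime(compile_overhead, speedup):
    """Smallest original runtime (seconds) at which compiling starts to pay off."""
    if speedup <= 1.0:
        raise ValueError("no break-even point without a speedup above 1")
    return compile_overhead / (1.0 - 1.0 / speedup)


# Constant overhead of ~3 s with the lowest and highest reported speedups.
low = break_even_runtime(3.0, 1.8)     # about 6.75 s
high = break_even_runtime(3.0, 15.0)   # about 3.21 s
```

Under these assumptions, kernels that run for more than a handful of seconds in total recover the compilation cost within a single execution.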
The reference implementation of Python, CPython, makes all of the above possible

Rich support for Python <-> C-interoperability
Although other approaches exist, C-interoperability is the main driver for Python performance
chapel.cray.org
pychapel.rtfd.org
chapel-for-python-programmers.rtfd.org
Chapel
pyChapel
Chapel for Python Programmers