Coordinate Descent for Big Data - Multicore, Cluster, GPU

by Martin Takáč on 31 October 2013


Transcript of Coordinate Descent for Big Data - Multicore, Cluster, GPU

Coordinate Descent Methods for Big Data Optimization
Martin Takáč
Failure of naive parallelism!
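A small illustration of how naive parallelism can fail. This is a minimal sketch on a toy objective chosen here for illustration, f(x1, x2) = (x1 + x2 - 1)^2, and is not taken from the slides; the function names are hypothetical. Two workers each minimize f exactly along their own coordinate from the same starting point, and both steps are then added together.

```python
# Toy objective (an assumption made for illustration): f(x1, x2) = (x1 + x2 - 1)^2.
def f(x):
    return (x[0] + x[1] - 1.0) ** 2

def exact_coordinate_step(x, i):
    """Step that minimizes f along coordinate i while the other coordinate is fixed."""
    other = x[1 - i]
    return (1.0 - other) - x[i]

def naive_parallel_iteration(x):
    # Both workers read the same x and compute their steps independently.
    steps = [exact_coordinate_step(x, i) for i in range(2)]
    # All steps are added without any damping.
    return [x[i] + steps[i] for i in range(2)]

def damped_parallel_iteration(x, tau=2):
    # Averaging the tau steps is one simple, conservative remedy.
    steps = [exact_coordinate_step(x, i) for i in range(2)]
    return [x[i] + steps[i] / tau for i in range(2)]

x = [0.0, 0.0]
naive_values = []
for _ in range(4):
    x = naive_parallel_iteration(x)
    naive_values.append(f(x))

damped = damped_parallel_iteration([0.0, 0.0])
print(naive_values)   # the objective never drops below its starting value of 1.0
print(f(damped))      # the averaged step reaches the minimum value 0.0
```

In this toy case the undamped iterates jump between (0, 0) and (1, 1), so the objective stays at 1, while averaging the two steps lands on a minimizer in one iteration.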
The Problem
LASSO with 1 Billion Variables
Problem size: 500GB
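For reference, LASSO is usually written as: minimize 0.5*||Ax - b||^2 + lambda*||x||_1 over x. The following is a minimal serial sketch of randomized coordinate descent for that objective, using dense Python lists for readability. It is not the implementation from the talk, which targets sparse data at a far larger scale; the function names and the tiny test problem are assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal map of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_objective(A, b, x, lam):
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xj) for xj in x)

def coordinate_descent_lasso(A, b, lam, iterations, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = [-bi for bi in b]  # residual = A x - b, with x = 0 initially
    col_sq = [sum(A[r][j] ** 2 for r in range(m)) for j in range(n)]
    for _ in range(iterations):
        j = rng.randrange(n)      # pick one coordinate uniformly at random
        if col_sq[j] == 0.0:
            continue              # an all-zero column cannot change the fit
        grad_j = sum(A[r][j] * residual[r] for r in range(m))
        # Exact minimization of the objective along coordinate j.
        new_xj = soft_threshold(x[j] - grad_j / col_sq[j], lam / col_sq[j])
        delta = new_xj - x[j]
        if delta != 0.0:
            for r in range(m):    # keep the residual in sync with x
                residual[r] += delta * A[r][j]
            x[j] = new_xj
    return x

# Hypothetical 2x2 problem with an identity data matrix.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x = coordinate_descent_lasso(A, b, lam=1.0, iterations=50)
print(x)  # [2.0, 0.0]
```

Maintaining the residual incrementally means each update touches only one column of A, which is the property that makes coordinate descent attractive when A is very large and sparse.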
joint work with Peter Richtárik, Jakub Mareček
(Multicore, Cluster, GPU, ....)
Multicore, shared memory
Cluster, MPI/OpenMP
GPU
#CPUs: 32
RAM: 20-30GB, sometimes 1-2TB
Following theory
all processors read the same x
each processor chooses a different coordinate
each processor computes its update
x is updated by all processors afterwards
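The four steps above can be sketched as one synchronous iteration. This is a minimal illustration that assumes a user-supplied `partial_step(x, i)` function (a hypothetical name) returning the step for coordinate i. It uses Python threads only to show the structure: in CPython they do not give real multicore speedup, and the separable example objective is an assumption chosen so that simultaneous updates are safe.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def synchronous_parallel_step(x, partial_step, num_workers, rng):
    snapshot = tuple(x)                               # 1. every worker reads the same x
    coords = rng.sample(range(len(x)), num_workers)   # 2. workers get distinct coordinates
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # 3. each worker computes its update from the shared snapshot
        steps = list(pool.map(lambda i: partial_step(snapshot, i), coords))
    # First synchronization point: all steps are computed before any is applied.
    for i, step in zip(coords, steps):                # 4. x is updated afterwards
        x[i] += step
    # Second synchronization point: all writes finish before the next read of x.
    return coords

# Hypothetical separable objective f(x) = 0.5 * sum((x_i - c_i)^2).
c = [1.0, 2.0, 3.0, 4.0]

def partial_step(x, i):
    return c[i] - x[i]   # exact minimizer along coordinate i

x = [0.0, 0.0, 0.0, 0.0]
updated = synchronous_parallel_step(x, partial_step, num_workers=4, rng=random.Random(0))
print(x)  # [1.0, 2.0, 3.0, 4.0]
```

The two comments marking synchronization points correspond to waiting for all workers before writing and again before the next read.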
Asynchronous implementation
requires 2 synchronization steps per iteration!
Tesla K20X
#Cuda cores: 2688
max single precision performance: 3.95 Tflops
GDDR5: 6 GB (Memory bandwidth: 250 GB/sec)
SIMD architecture
sensitive to memory access patterns
Nice sampling of coordinates is not good!
What about creating blocks?
many nodes, communication using MPI
each node many cores => OpenMP parallelization
warp size = 2
Synchronous vs. asynchronous
FP (Fully Parallel)
PS (Parallel and Serial blocks)
Lasso problem with 3 TB data matrix
additional memory requirements
How many updates on node for RA-PS variant
avg. computation time for one coordinate
duration of one all-reduce operation
total computation time
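One plausible way to relate the quantities listed above is a simple additive cost model. It is an assumption made here for illustration, not a formula stated in the slides: each round performs a fixed number of coordinate updates on a node and then one all-reduce, with no overlap between computation and communication. All names and the example numbers are hypothetical.

```python
def estimated_total_time(rounds, updates_per_round, time_per_update, allreduce_time):
    """Additive estimate: compute and communicate strictly one after the other."""
    compute = rounds * updates_per_round * time_per_update
    communicate = rounds * allreduce_time
    return compute + communicate

# Hypothetical numbers: 10 rounds, 1000 updates per round,
# 1 microsecond per update, 10 milliseconds per all-reduce.
total = estimated_total_time(10, 1000, 1e-6, 0.01)
print(total)
```

Under this model, doing more updates per node between all-reduce operations amortizes the communication cost, at the price of working with older information from other nodes.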
4 sockets, each socket with 8 cores
UoE
IBM Research