Apache Hama - Streaming and Big Data Processing @ Daum Communications

Seminar @ Daum Communications, May 2, 2012

Edward Yoon

on 20 March 2014


Transcript of Apache Hama - Streaming and Big Data Processing @ Daum Communications

Apache Hama
Edward J. Yoon
Streaming and Big Data Processing
Edward J. Yoon
Founder of Apache Hama
Committer of Apache Hadoop, BigTop
Author of a Hadoop book for Korean readers
Translator of MongoDB (O'Reilly)
Mentor of Google Summer of Code
Who Am I?
Open Source
Written In Java
Under Apache 2.0 License
Currently 3 official releases, 6 PMC members and active contributors.
What's Hama?
Provides a Pure BSP computing engine
M/R-like Input/Output Formatter
SequenceFile, Text, Accumulo, HBase, ..
Hadoop-like Job manager
Checkpoint Recovery
Pregel-like Graph computing framework
Network Weather Monitoring System
Message-passing paradigm style of application development
Provides flexible, simple, and small APIs
Can perform better than MPI for communication-intensive applications
Impossibility of deadlocks or collisions in the communication mechanisms
Compared to M/R and MPI
Originally introduced by Valiant
A sequence of supersteps
Bulk Synchronous Parallel?
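To make "a sequence of supersteps" concrete, here is a minimal sketch in plain Java, assuming threads stand in for BSP peers and a CyclicBarrier stands in for the barrier synchronization. The class name SuperstepSketch and its run method are hypothetical and are not part of Hama. Each peer does local work, sends messages, waits at the barrier, and only then reads what it received.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of BSP supersteps; this is not the Hama API.
class SuperstepSketch {

    // Runs `peers` threads through two supersteps and returns the sum each peer computed.
    static long[] run(int peers) {
        List<ConcurrentLinkedQueue<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < peers; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(peers);
        long[] sums = new long[peers];
        Thread[] threads = new Thread[peers];

        for (int p = 0; p < peers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 1: local computation, then communication.
                    int local = id + 1;
                    for (ConcurrentLinkedQueue<Integer> inbox : inboxes) {
                        inbox.add(local);
                    }
                    // Barrier synchronization: nobody proceeds until all peers have sent.
                    barrier.await();
                    // Superstep 2: messages sent in superstep 1 are now visible.
                    long sum = 0;
                    for (int value : inboxes.get(id)) {
                        sum += value;
                    }
                    sums[id] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sums;
    }

    public static void main(String[] args) {
        // With 4 peers sending 1, 2, 3 and 4, every peer sums to 10.
        System.out.println(Arrays.toString(run(4)));
    }
}
```

Because a peer reads its inbox only after the barrier, it never sees a partially delivered set of messages, which is the property the superstep model relies on.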
Processing Big Data with complicated relationships
e.g., graph or network.
Iterative or Recursive scientific applications
Continuous Event Processing
So, fit for what?
Accepted to the Apache Incubator
Introduced BSP World
Ready to graduate to an Apache TLP!
Pluggable RPC Architecture for message transfer
e.g., Hadoop RPC, Avro RPC, etc.
Message Collector, Bundler, and Compressor to reduce network overheads and contentions
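The bundling idea can be sketched as follows, assuming messages are plain strings and peers are identified by name. MessageBundlerSketch, send, and flush are hypothetical names chosen for illustration and do not reflect Hama's actual classes. Messages are queued per destination and handed over as one bundle per peer at the end of a superstep, so each destination needs one transfer instead of one per message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of message bundling; not Hama's actual implementation.
class MessageBundlerSketch {
    // Outgoing messages grouped by destination peer name.
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    // Queue a message locally instead of sending it right away.
    void send(String peerName, String message) {
        outgoing.computeIfAbsent(peerName, k -> new ArrayList<>()).add(message);
    }

    // At the end of a superstep, return one bundle per destination and reset the queue.
    Map<String, List<String>> flush() {
        Map<String, List<String>> bundles = new LinkedHashMap<>(outgoing);
        outgoing.clear();
        return bundles;
    }
}
```

A real transport would serialize, and possibly compress, each bundle before handing it to the RPC layer; that step is left out here.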
public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer)
    throws IOException, SyncException;
BSP programming API
Pi calculation
Sparse Matrix-Vector multiplication
K-means Clustering[1]
Parallel Support Vector Machine[2]
BSP Examples
public void compute(Iterator<MSGTYPE> messages)

Graph API
In-link Count
Single Source Shortest Path
Bipartite Matching
Graph Examples
SSSP Performance
An SSSP for a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on a 3-rack cluster
Which is the Big Data?
See http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
Hama and Stream Processing
1. http://github.com/willmore/hama-kmeans
2. http://github.com/truncs/psvm
Real-Time Network Traffic Analysis
Analyze user actions and patterns
Social Target Marketing
PageRank calculation
Observe evolution of the Social Networks
Detect spam
Business Intelligence
Extract Hot Trends
Could be applied to
Daum Services
Search, News, Shopping, Yozm, Daum MyPeople, Poll, TV Pot, Maps, etc.
2002 March
Information Retrieval
Information Suggestion
Sexy + Suggestive
Who would you like to talk to?
See http://wiki.apache.org/hama
More Documentation
Develop your own BSP applications
public void compute(Iterator<IntWritable> messages) throws IOException {
  int currMax = getValue().get();
  sendMessageToNeighbors(new IntWritable(currMax));

  while (messages.hasNext()) {
    IntWritable next = messages.next();
    if (next.get() > currMax) {
      currMax = next.get();
    }
  }

  if (currMax > getValue().get()) {
    setValue(new IntWritable(currMax));
  } else {
    // body of the else branch is not preserved in this transcript
  }
}
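To see how a vertex-centric max-value computation like the one above converges, here is a single-process simulation in plain Java. It is only a sketch: MaxValueSketch and propagateMax are hypothetical names, the graph is given as adjacency arrays, and no Hama classes are used. It also assumes a common variation in which a vertex re-sends its value only when that value has just increased, and in which the run stops once a superstep produces no messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-process simulation; names below are not Hama classes.
class MaxValueSketch {

    // values[i] is the initial value of vertex i; neighbors[i] lists the vertices it sends to.
    static int[] propagateMax(int[] values, int[][] neighbors) {
        int n = values.length;
        int[] current = values.clone();

        // Superstep 0: every vertex sends its own value to its neighbors.
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int v = 0; v < n; v++) {
            for (int nb : neighbors[v]) {
                inbox.get(nb).add(current[v]);
            }
        }

        // Later supersteps: take the maximum of received values, forward only on improvement.
        while (true) {
            List<List<Integer>> nextInbox = emptyInboxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                int currMax = current[v];
                for (int message : inbox.get(v)) {
                    if (message > currMax) {
                        currMax = message;
                    }
                }
                if (currMax > current[v]) {
                    current[v] = currMax;
                    for (int nb : neighbors[v]) {
                        nextInbox.get(nb).add(currMax);
                        sent = true;
                    }
                }
            }
            if (!sent) {
                break; // no messages in flight, so no vertex can change again
            }
            inbox = nextInbox;
        }
        return current;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        int[] values = {3, 6, 2, 1};
        int[][] ring = {{1, 3}, {0, 2}, {1, 3}, {2, 0}};
        // On this 4-vertex ring every vertex ends up holding 6.
        System.out.println(Arrays.toString(propagateMax(values, ring)));
    }
}
```

Values only ever increase and are bounded by the global maximum, so the loop is guaranteed to terminate.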