Apache Hama
Edward J. Yoon
<edwardyoon@apache.org>
Streaming and Big Data Processing
Edward J. Yoon
@eddieyoon
Founder of Apache Hama
Committer of Apache Hadoop, BigTop
Author of a Hadoop book in Korean
Translator of O'Reilly's MongoDB book
Mentor of Google Summer of Code
Who Am I?
Open Source
Written In Java
Under Apache 2.0 License
Currently 3 official releases, 6 PMC members and active contributors.
What's Hama?
Really?
Provides a Pure BSP computing engine
M/R-like Input/Output Formatter
SequenceFile, Text, Accumulo, HBase, ...
Hadoop-like Job manager
Checkpoint Recovery
Pregel-like Graph computing framework
Network Weather Monitoring System
Characteristics
Supports the message-passing paradigm style of application development
Provides flexible, simple, and easy-to-use small APIs
Can perform better than MPI for communication-intensive applications
Guarantees the impossibility of deadlocks or collisions in the communication mechanisms
Compared to M/R and MPI
Originally introduced by Valiant
A sequence of supersteps
Bulk Synchronous Parallel?
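To make "a sequence of supersteps" concrete, here is a minimal single-process sketch in plain Java. It is not Hama code: the class name SuperstepSketch, the ring topology, and the max-propagation task are all assumptions chosen for illustration. Each superstep has three phases: local computation, communication, and a barrier after which the sent messages become visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy, single-process simulation of BSP supersteps (hypothetical, not the Hama API). */
final class SuperstepSketch {

    /** Runs the given number of supersteps on a ring of peers and returns each peer's value. */
    static int[] run(int[] initial, int supersteps) {
        int n = initial.length;
        int[] value = initial.clone();
        // inbox.get(p) holds the messages visible to peer p in the current superstep.
        List<List<Integer>> inbox = emptyBoxes(n);

        for (int step = 0; step < supersteps; step++) {
            List<List<Integer>> outbox = emptyBoxes(n);
            for (int p = 0; p < n; p++) {
                // 1. Local computation: fold in the messages sent during the previous superstep.
                for (int m : inbox.get(p)) {
                    value[p] = Math.max(value[p], m);
                }
                // 2. Communication: send the current value to the right-hand neighbor.
                outbox.get((p + 1) % n).add(value[p]);
            }
            // 3. Barrier synchronization: messages are delivered only after every peer is done.
            inbox = outbox;
        }
        return value;
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        int[] result = run(new int[] {3, 9, 1, 4}, 4);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

On a ring of n peers the maximum needs n supersteps to reach everyone, because a message sent in one superstep is only readable in the next.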
Processing Big Data with complicated relationships
e.g., graph or network.
Iterative or Recursive scientific applications
Continuous Event Processing
So, fit for what?
2008: Accepted to Apache Incubator
2010: Introduced BSP World
2012: Ready for graduation to Apache TLP!!!
Pluggable RPC Architecture for message transfer
e.g., Hadoop RPC, Avro RPC, etc.
Message Collector, Bundler, and Compressor to reduce network overhead and contention
Internals
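One way to picture what a message bundler does: group outgoing messages by destination peer, so that each destination gets one transfer per superstep instead of one per message. The sketch below is plain Java under that assumption. BundleSketch and its methods are hypothetical names and do not mirror Hama's internal classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of message bundling (not Hama's implementation). */
final class BundleSketch {

    /**
     * Groups messages by destination peer.
     * Each input entry is a two-element array: {destinationPeer, payload}.
     */
    static Map<String, List<String>> bundle(List<String[]> messages) {
        // TreeMap keeps destinations sorted, which makes the output deterministic.
        Map<String, List<String>> bundles = new TreeMap<>();
        for (String[] message : messages) {
            bundles.computeIfAbsent(message[0], peer -> new ArrayList<>()).add(message[1]);
        }
        return bundles;
    }

    /** Number of network transfers needed if every bundle is sent as a single unit. */
    static int transfers(Map<String, List<String>> bundles) {
        return bundles.size();
    }

    public static void main(String[] args) {
        List<String[]> outgoing = new ArrayList<>();
        outgoing.add(new String[] {"peer1", "a"});
        outgoing.add(new String[] {"peer2", "b"});
        outgoing.add(new String[] {"peer1", "c"});

        Map<String, List<String>> bundles = bundle(outgoing);
        System.out.println(bundles + " -> " + transfers(bundles) + " transfers for "
                + outgoing.size() + " messages");
    }
}
```

A compressor would then operate on each serialized bundle rather than on individual messages. That step is left out here.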
public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
BSP programming API
Pi calculation
Sparse Matrix-Vector multiplication
K-means Clustering[1]
Parallel Support Vector Machine[2]
BSP Examples
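As a rough picture of the Pi calculation example, here is a plain-Java simulation of the BSP pattern. Every peer computes a local Monte Carlo estimate and sends it to a master peer. After the barrier, the master averages what it received. This is a sketch under assumptions, not the example shipped with Hama: PiBspSketch, the seeds, and the sample counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Single-process simulation of a BSP-style Pi estimate (hypothetical, not the Hama API). */
final class PiBspSketch {

    /** Local computation on one peer: Monte Carlo estimate from points in the unit square. */
    static double localEstimate(long seed, int samples) {
        Random rnd = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    static double estimate(int peers, int samplesPerPeer) {
        // Superstep 1: every peer computes locally and "sends" its result to the master.
        List<Double> masterInbox = new ArrayList<>();
        for (int p = 0; p < peers; p++) {
            masterInbox.add(localEstimate(1000L + p, samplesPerPeer));
        }
        // Barrier: in a real BSP engine, all peers would synchronize here.

        // Superstep 2: the master averages the received estimates.
        double sum = 0.0;
        for (double partial : masterInbox) {
            sum += partial;
        }
        return sum / masterInbox.size();
    }

    public static void main(String[] args) {
        System.out.println("Estimated Pi: " + estimate(4, 100_000));
    }
}
```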
public void compute(Iterator<MSGTYPE> messages) throws IOException;
Graph API
In-link Count
Single Source Shortest Path
PageRank
Bipartite Matching
Graph Examples
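To show how a vertex-centric Single Source Shortest Path might proceed superstep by superstep, here is a self-contained simulation in plain Java. It does not use the Hama Graph API. SsspSketch, the adjacency format, and the sample graph are assumptions, and edge weights are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-process simulation of vertex-centric SSSP (hypothetical, not the Hama Graph API). */
final class SsspSketch {
    static final int INF = Integer.MAX_VALUE;

    /** edges[v] lists the outgoing edges of vertex v as {target, weight} pairs. */
    static int[] shortestPaths(int[][][] edges, int source) {
        int n = edges.length;
        int[] dist = new int[n];
        Arrays.fill(dist, INF);

        List<List<Integer>> inbox = emptyBoxes(n);
        inbox.get(source).add(0); // the source starts with distance 0

        boolean anyMessage = true;
        while (anyMessage) { // one loop iteration corresponds to one superstep
            anyMessage = false;
            List<List<Integer>> outbox = emptyBoxes(n);
            for (int v = 0; v < n; v++) {
                // "compute": take the smallest distance proposed by incoming messages.
                int best = INF;
                for (int m : inbox.get(v)) {
                    best = Math.min(best, m);
                }
                if (best < dist[v]) {
                    dist[v] = best;
                    // The distance improved, so propose new distances to the neighbors.
                    for (int[] edge : edges[v]) {
                        outbox.get(edge[0]).add(best + edge[1]);
                        anyMessage = true;
                    }
                }
                // Otherwise the vertex sends nothing, which plays the role of voting to halt.
            }
            inbox = outbox; // barrier: messages become visible in the next superstep
        }
        return dist;
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Sample graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (5)
        int[][][] edges = { {{1, 4}, {2, 1}}, {{3, 5}}, {{1, 2}}, {} };
        System.out.println(Arrays.toString(shortestPaths(edges, 0)));
    }
}
```

The computation ends when a superstep produces no messages, meaning every vertex has effectively halted.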
SSSP Performance
An SSSP on a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on a 3-rack cluster
Which is the Big Data?
See http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
Hama and Stream Processing
[1] http://github.com/willmore/hama-kmeans
[2] http://github.com/truncs/psvm
Real-Time Network Traffic Analysis
Analyze user actions and patterns
Social Target Marketing
PageRank calculation
Observe the evolution of social networks
Detect spam
Business Intelligence
Extract Hot Trends
Could be applied to
Daum Services
Search, News, Shopping, Yozm, Daum MyPeople, Poll, TV Pot, Maps, etc.
1995
2010
2005
2002 March
Information Retrieval
Information Suggestion
Sexy + Suggestive
Who would you like to talk to?
See http://wiki.apache.org/hama
More Documentation
Develop your own BSP applications
http://wiki.apache.org/hama/DevelopBSP
Q&A?
Pregel?
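@Override
public void compute(Iterator<IntWritable> messages) throws IOException {
  // The vertex value is an IntWritable, so unwrap it with get().
  int currMax = getValue().get();
  this.sendMessageToNeighbors(new IntWritable(currMax));
  // Find the largest value received from the neighbors.
  while (messages.hasNext()) {
    IntWritable next = messages.next();
    if (next.get() > currMax) {
      currMax = next.get();
    }
  }
  // Adopt a larger value if one arrived; otherwise vote to halt.
  if (currMax > getValue().get()) {
    setValue(new IntWritable(currMax));
  } else {
    voteToHalt();
  }
}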
Apache Hama - Streaming and Big Data Processing @ Daum Communications
Seminar @ Daum Communications, May 2, 2012