Apache Hama - Streaming and Big Data Processing @ Daum Communications

Seminar @ Daum Communications, May 2, 2012
by Edward Yoon on 20 March 2014


Apache Hama
Edward J. Yoon
<edwardyoon@apache.org>
Streaming and Big Data Processing
Edward J. Yoon
@eddieyoon
Founder of Apache Hama
Committer of Apache Hadoop, BigTop
Author of a Hadoop book in Korean
Translator of MongoDB (O'Reilly)
Mentor of Google Summer of Code
Who Am I?
Open Source
Written In Java
Under Apache 2.0 License
Currently 3 official releases, 6 PMC members and active contributors.
What's Hama?
Really?
Provides a Pure BSP computing engine
M/R-like Input/Output Formatter
SequenceFile, Text, Accumulo, HBase, ..
Hadoop-like Job manager
Checkpoint Recovery
Pregel-like Graph computing framework
Network Weather Monitoring System
Characteristics
Supports the message-passing paradigm style of application development
Provides a small, flexible, simple, and easy-to-use API
Can perform better than MPI for communication-intensive applications
Guarantees the impossibility of deadlocks or collisions in the communication mechanisms
Compare to M/R and MPI
Originally introduced by Valiant
A sequence of supersteps
Bulk Synchronous Parallel?
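To make the idea of a sequence of supersteps concrete, here is a minimal sketch in plain Java. It does not use the Hama API: the class name SuperstepSketch, the ring topology, and the use of java.util.concurrent.CyclicBarrier as the synchronization barrier are all assumptions chosen for illustration. Each peer sends a token to its right-hand neighbor, waits at the barrier, and only then reads what its left-hand neighbor sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Illustrative only: a thread-based imitation of BSP supersteps, not Hama code.
public class SuperstepSketch {

    /** Peer i starts with the value i + 1; after peers - 1 supersteps every peer holds the sum of all values. */
    public static long[] ringSum(int peers) throws InterruptedException {
        List<Queue<Long>> outgoing = new ArrayList<>();
        List<Queue<Long>> incoming = new ArrayList<>();
        for (int i = 0; i < peers; i++) {
            outgoing.add(new ConcurrentLinkedQueue<>());
            incoming.add(new ConcurrentLinkedQueue<>());
        }
        // The barrier action runs once per superstep, after all peers arrive
        // and before any is released: it delivers the messages sent in that step.
        CyclicBarrier barrier = new CyclicBarrier(peers, () -> {
            for (int i = 0; i < peers; i++) {
                incoming.get(i).addAll(outgoing.get(i));
                outgoing.get(i).clear();
            }
        });
        long[] totals = new long[peers];
        Thread[] threads = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                long token = id + 1;
                long total = token;
                try {
                    for (int step = 0; step < peers - 1; step++) {
                        outgoing.get((id + 1) % peers).add(token); // communication
                        barrier.await();                           // barrier synchronization
                        token = incoming.get(id).remove();         // local computation
                        total += token;
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                totals[id] = total;
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return totals;
    }
}
```

Calling SuperstepSketch.ringSum(4) runs three supersteps on four threads. Because messages become visible only after the barrier, no peer ever reads a half-delivered inbox.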
Processing Big Data with complicated relationships
e.g., graph or network.
Iterative or Recursive scientific applications
Continuous Event Processing
So, fit for what?
2008: Accepted to Apache Incubator
2010: Introduced BSP World
2012: Ready to graduate to Apache TLP!
Pluggable RPC Architecture for message transfer
e.g., Hadoop RPC, Avro RPC, etc.
Message Collector, Bundler, and Compressor to reduce network overhead and contention
Internals
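The collector, bundler, and compressor idea can be sketched in plain Java as follows. This is not Hama's internal implementation: the class BundlerSketch, its method names, the use of strings as messages, and the choice of GZIP are all assumptions made for illustration. Messages are grouped per destination peer during a superstep, then each group is compressed so that one transfer per peer is enough.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Illustrative only: hypothetical names, not Hama's actual classes.
public class BundlerSketch {
    // Messages collected during the current superstep, keyed by destination peer.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    /** Collector: remember the message instead of sending it immediately. */
    public void collect(String peer, String message) {
        pending.computeIfAbsent(peer, k -> new ArrayList<>()).add(message);
    }

    /** Bundler and compressor: produce one compressed payload per destination peer. */
    public Map<String, byte[]> flush() throws IOException {
        Map<String, byte[]> bundles = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : pending.entrySet()) {
            bundles.put(entry.getKey(), compress(entry.getValue()));
        }
        pending.clear();
        return bundles;
    }

    /** Joins the messages with newlines and GZIP-compresses the result. */
    static byte[] compress(List<String> messages) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(String.join("\n", messages).getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

With this shape, three messages addressed to two peers result in two network transfers rather than three, which is the kind of saving the slide refers to.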
public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
BSP programming API
Pi calculation
Sparse Matrix-Vector multiplication
K-means Clustering[1]
Parallel Support Vector Machine[2]
BSP Examples
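As a rough illustration of the Pi calculation example, the sketch below estimates Pi by Monte Carlo sampling. It is plain Java and runs the peers one after another in a loop. The class PiSketch, its method names, and the per-peer seeds are assumptions, not part of Hama. In a BSP job each peer would compute its local estimate, send it to a master peer, and synchronize before the master averages the results.

```java
import java.util.Random;

// Illustrative only: a sequential imitation of the per-peer work, not Hama code.
public class PiSketch {

    /** One peer's work: sample points in the square [-1, 1] x [-1, 1] and count hits inside the unit circle. */
    static double localEstimate(long seed, int samples) {
        Random random = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = 2.0 * random.nextDouble() - 1.0;
            double y = 2.0 * random.nextDouble() - 1.0;
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // Circle area / square area = Pi / 4.
        return 4.0 * inside / samples;
    }

    /** The master's work after the barrier: average the estimates received from all peers. */
    static double estimate(int peers, int samplesPerPeer) {
        double sum = 0.0;
        for (int peer = 0; peer < peers; peer++) {
            sum += localEstimate(peer, samplesPerPeer); // stands in for a message from this peer
        }
        return sum / peers;
    }

    public static void main(String[] args) {
        System.out.println(estimate(4, 100_000));
    }
}
```

Only one number per peer crosses the network, so the job needs a single superstep of communication regardless of the sample count.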
public void compute(Iterator<MSGTYPE> messages) throws IOException;
Graph API
In-link Count
Single Source Shortest Path
PageRank
Bipartite Matching
Graph Examples
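The Single Source Shortest Path example can be sketched in the vertex-centric style without any Hama dependency. The class SsspSketch, the edge-array representation, and the sequential loop over vertices are assumptions for illustration, and edge weights are assumed to be non-negative. In each superstep a vertex takes the smallest distance it received and, if that improves its own distance, sends updated distances along its outgoing edges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a single-process imitation of vertex-centric supersteps.
public class SsspSketch {

    /** edges[i] = {from, to, weight}; unreachable vertices keep Integer.MAX_VALUE. */
    public static int[] shortestPaths(int vertexCount, int[][] edges, int source) {
        int[] dist = new int[vertexCount];
        Arrays.fill(dist, Integer.MAX_VALUE);
        List<List<Integer>> inbox = emptyInboxes(vertexCount);
        inbox.get(source).add(0); // the source starts with distance 0
        boolean anyMessage = true;
        while (anyMessage) {
            List<List<Integer>> next = emptyInboxes(vertexCount);
            anyMessage = false;
            for (int v = 0; v < vertexCount; v++) {
                int best = dist[v];
                for (int candidate : inbox.get(v)) {
                    best = Math.min(best, candidate);
                }
                if (best < dist[v]) {
                    dist[v] = best;
                    // Propagate the improved distance along outgoing edges.
                    for (int[] edge : edges) {
                        if (edge[0] == v) {
                            next.get(edge[1]).add(best + edge[2]);
                            anyMessage = true;
                        }
                    }
                }
                // A vertex with no improvement sends nothing, i.e. it votes to halt.
            }
            inbox = next; // barrier: messages become visible in the next superstep
        }
        return dist;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

The loop ends when a superstep produces no messages, which mirrors the point at which every vertex has voted to halt.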
SSSP Performance
SSSP on a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on a 3-rack cluster
Which is the Big Data?
See http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
Hama and Stream Processing
1. http://github.com/willmore/hama-kmeans
2. http://github.com/truncs/psvm
Real-Time Network Traffic Analysis
Analyze user actions and patterns
Social Target Marketing
PageRank calculation
Observe evolution of the Social Networks
Detect SPAMs
Business Intelligence
Extract Hot Trends
Could be applied to
Daum Services
Search, News, Shopping, Yozm, Daum MyPeople, Poll, TV Pot, Maps, ..., etc.
1995
2010
2005
2002 March
Information Retrieval
Information Suggestion
Sexy + Suggestive
Who would you like to talk to?
See http://wiki.apache.org/hama
More Documentation
Develop your own BSP applications
http://wiki.apache.org/hama/DevelopBSP
Q&A?
Pregel?
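@Override
public void compute(Iterator<IntWritable> messages) throws IOException {
  // Assumes getValue() returns an IntWritable, matching the setValue() call below.
  int currMax = getValue().get();
  // Share this vertex's current value with its neighbors.
  this.sendMessageToNeighbors(new IntWritable(currMax));

  // Keep the largest value received in this superstep.
  while (messages.hasNext()) {
    IntWritable next = messages.next();
    if (next.get() > currMax) {
      currMax = next.get();
    }
  }

  // Adopt a larger value if one arrived; otherwise vote to halt.
  if (currMax > getValue().get()) {
    setValue(new IntWritable(currMax));
  } else {
    voteToHalt();
  }
}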
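To see how a maximum value spreads through a graph with a compute method like the one above, the following plain-Java sketch simulates the supersteps in a single process. It is not Hama code: the class MaxValueSketch and the adjacency-array representation are assumptions. It is also a variant of the slide's logic, in that a vertex sends its value only in the first superstep or after its value has changed, which guarantees that the simulated run stops.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a single-process imitation of Pregel-style max propagation.
public class MaxValueSketch {

    /** neighbors[v] lists the vertices that v sends messages to. */
    public static int[] propagateMax(int[] initial, int[][] neighbors) {
        int n = initial.length;
        int[] value = initial.clone();
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> next = emptyInboxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                int currMax = value[v];
                for (int message : inbox.get(v)) {
                    currMax = Math.max(currMax, message);
                }
                boolean changed = currMax > value[v];
                value[v] = currMax;
                if (superstep == 0 || changed) {
                    for (int neighbor : neighbors[v]) {
                        next.get(neighbor).add(currMax);
                        sent = true;
                    }
                }
                // Otherwise the vertex stays silent, i.e. it votes to halt.
            }
            if (!sent) {
                return value; // no messages in flight: every vertex has halted
            }
            inbox = next;
        }
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

On a connected graph every vertex ends up holding the largest initial value, and the number of supersteps grows with the graph's diameter rather than with its size.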