Apache Hama - Streaming and Big Data Processing @ Daum Communications
Seminar @ Daum Communications, May 2, 2012
by Edward Yoon
on 11 March 2013
Prezi Transcript
Apache Hama
Edward J. Yoon <edwardyoon@apache.org>
Streaming and Big Data Processing

Edward J. Yoon
@eddieyoon
Founder of Apache Hama
Committer of Apache Hadoop, BigTop
Author of a Hadoop book for Korean readers
Translator of MongoDB (O'Reilly)
Mentor for Google Summer of Code
Who Am I?

Open Source
Written in Java
Under the Apache 2.0 License
Currently 3 official releases, 6 PMC members, and active contributors.
What's Hama?

Really?
Provides a pure BSP computing engine
M/R-like Input/Output Formatter
SequenceFile, Text, Accumulo, HBase, ...
Hadoop-like Job manager
Checkpoint Recovery
Pregel-like Graph computing framework
Network Weather Monitoring System
Characteristics

Supports a message-passing style of application development
Provides a small, flexible, simple, and easy-to-use API
Can perform better than MPI for communication-intensive applications
Guarantees that deadlocks or collisions cannot occur in the communication mechanisms
Compare to M/R and MPI

Originally introduced by Valiant
A sequence of supersteps
Bulk Synchronous Parallel?

Processing Big Data with complicated relationships, e.g., graphs or networks
Iterative or Recursive scientific applications
Continuous Event Processing
So, fit for what?

2008: Accepted to Apache Incubator
2010: Introduced BSP World
2012: Ready to graduate to an Apache TLP!!!

Pluggable RPC architecture for message transfer, e.g., Hadoop RPC, Avro RPC, etc.
Message Collector, Bundler, and Compressor to reduce network overhead and contention
Internals

public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer)
    throws IOException, SyncException;
BSP programming API

Pi calculation
Sparse Matrix-Vector multiplication
K-means Clustering[1]
Parallel Support Vector Machine[2]
BSP Examples

public void compute(Iterator<MSGTYPE> messages)
    throws IOException;
Graph API

In-link Count
Single Source Shortest Path
PageRank
Bipartite Matching
Graph Examples

SSSP Performance
An SSSP for a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on a 3-rack cluster
Which is the Big Data?

See http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
Hama and Stream Processing

[1] http://github.com/willmore/hama-kmeans
[2] http://github.com/truncs/psvm

Real-Time Network Traffic Analysis
Analyze user actions and patterns
Social Target Marketing
PageRank calculation
Observe evolution of the Social Networks
Detect spam
Business Intelligence
Extract Hot Trends
Could be applied to Daum Services

Search, News, Shopping, Yozm, Daum MyPeople, Poll, TV Pot, Maps, etc.

1995 2010 2005 2002 March
Information Retrieval
Information Suggestion
Sexy + Suggestive
Who would you like to talk to?

See http://wiki.apache.org/hama
More Documentation

Develop your own BSP applications
http://wiki.apache.org/hama/DevelopBSP
Q&A?

Pregel?
@Override
public void compute(Iterator<IntWritable> messages)
    throws IOException {
  // The vertex value is an IntWritable (see setValue below), so unwrap it.
  int currMax = getValue().get();
  this.sendMessageToNeighbors(new IntWritable(currMax));
  // Find the largest value received from the neighbors.
  while (messages.hasNext()) {
    IntWritable next = messages.next();
    if (next.get() > currMax) {
      currMax = next.get();
    }
  }
  // Keep the larger value, or vote to halt if nothing changed.
  if (currMax > getValue().get()) {
    setValue(new IntWritable(currMax));
  } else {
    voteToHalt();
  }
}
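
To see how the maximum-value compute() above behaves across supersteps, here is a minimal, self-contained sketch in plain Java. It does not use the Hama API: the class MaxValueSketch and the method propagateMax are hypothetical names, and the graph is just an adjacency array. It also assumes Pregel-style semantics that the slide does not spell out: a halted vertex wakes up when a message arrives, and a vertex sends its value only in superstep 0 or when its value has grown (unlike the slide, which sends on every call), so that the simulation is guaranteed to stop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxValueSketch {

    /** Creates one empty message list per vertex. */
    static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    /**
     * Spreads the largest vertex value through the graph, superstep by superstep.
     * neighbors[v] lists the vertices that v sends messages to.
     */
    static int[] propagateMax(int[] initialValues, int[][] neighbors) {
        int n = initialValues.length;
        int[] values = Arrays.copyOf(initialValues, n);
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        List<List<Integer>> inbox = emptyInboxes(n);

        int superstep = 0;
        boolean running = true;
        while (running) {
            // Messages sent now are only visible in the next superstep.
            List<List<Integer>> nextInbox = emptyInboxes(n);
            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (!active[v] && messages.isEmpty()) {
                    continue; // halted and nothing to wake it up
                }
                int currMax = values[v];
                for (int m : messages) {
                    currMax = Math.max(currMax, m);
                }
                boolean changed = currMax > values[v];
                values[v] = currMax;
                if (superstep == 0 || changed) {
                    active[v] = true;
                    for (int u : neighbors[v]) {
                        nextInbox.get(u).add(currMax);
                    }
                } else {
                    active[v] = false; // plays the role of voteToHalt()
                }
            }
            inbox = nextInbox; // stands in for the barrier between supersteps
            superstep++;
            running = false;
            for (int v = 0; v < n; v++) {
                if (active[v] || !inbox.get(v).isEmpty()) {
                    running = true;
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // A chain 0 - 1 - 2 - 3 with initial values 3, 6, 2, 1.
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(propagateMax(new int[] {3, 6, 2, 1}, chain)));
        // prints [6, 6, 6, 6]
    }
}
```

Every vertex in a connected component ends up holding that component's largest value; separate components converge independently.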
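
The BSP model mentioned in this transcript ("a sequence of supersteps" with message passing between peers, and the Pi calculation example) can be illustrated with a small plain-Java sketch. This is not Hama code: BspPiSketch and estimatePi are hypothetical names, threads stand in for BSP peers, a CyclicBarrier stands in for the barrier synchronization, and the sketch uses midpoint-rule integration of 4/(1+x^2) rather than whatever method the Hama Pi example uses, so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

class BspPiSketch {

    /** Estimates pi with numPeers simulated peers in two supersteps. */
    static double estimatePi(int numPeers, int stepsPerPeer) {
        // One message queue per peer; peer 0 acts as the master.
        List<ConcurrentLinkedQueue<Double>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(numPeers);
        double[] result = new double[1];
        double h = 1.0 / ((double) numPeers * stepsPerPeer);

        List<Thread> peers = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            final int peerId = p;
            Thread t = new Thread(() -> {
                try {
                    // Superstep 0: local computation, then send the partial sum to the master.
                    double partial = 0.0;
                    int start = peerId * stepsPerPeer;
                    for (int i = start; i < start + stepsPerPeer; i++) {
                        double x = (i + 0.5) * h;
                        partial += 4.0 / (1.0 + x * x);
                    }
                    inboxes.get(0).add(partial * h);

                    // Barrier: no peer continues until every peer has sent its message.
                    barrier.await();

                    // Superstep 1: the master combines the messages it received.
                    if (peerId == 0) {
                        double sum = 0.0;
                        for (double message : inboxes.get(0)) {
                            sum += message;
                        }
                        result[0] = sum;
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            peers.add(t);
            t.start();
        }
        for (Thread t : peers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(4, 100_000));
    }
}
```

Because messages are only read after the barrier, the master never sees a partially filled inbox. That separation of computing, communicating, and synchronizing is the property the superstep structure is meant to provide.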




