Apache Hama - Streaming and Big Data Processing @ Daum Communications

Seminar @ Daum Communications, May 2, 2012
by Edward Yoon on 11 March 2013

Prezi Transcript

Apache Hama: Streaming and Big Data Processing
Edward J. Yoon <edwardyoon@apache.org>

Who Am I?

  • Edward J. Yoon, @eddieyoon
  • Founder of Apache Hama
  • Committer of Apache Hadoop, BigTop
  • Author of a Hadoop book for Korean readers
  • Translator of MongoDB, O'Reilly
  • Mentor of Google Summer of Code

What's Hama?

  • Open source
  • Written in Java
  • Under the Apache 2.0 License
  • Currently 3 official releases, 6 PMC members, and active contributors

Really?

Characteristics

  • Provides a pure BSP computing engine
  • M/R-like input/output formatters: SequenceFile, Text, Accumulo, HBase, ..
  • Hadoop-like job manager
  • Checkpoint recovery
  • Pregel-like graph computing framework
  • Network weather monitoring system

Compare to M/R and MPI

  • Supports a message-passing style of application development
  • Provides small, flexible, simple, and easy-to-use APIs
  • Can perform better than MPI for communication-intensive applications
  • Guarantees the impossibility of deadlocks or collisions in the communication mechanisms

Bulk Synchronous Parallel?

  • Originally introduced by Valiant
  • A sequence of supersteps

So, fit for what?

  • Processing Big Data with complicated relationships, e.g., a graph or network
  • Iterative or recursive scientific applications
  • Continuous event processing

  • 2008: Accepted to Apache Incubator
  • 2010: Introduced BSP to the world
  • 2012: Ready to graduate to an Apache TLP!!!

Internals

  • Pluggable RPC architecture for message transfer, e.g., Hadoop RPC, Avro RPC, ..., etc.
  • Message collector, bundler, and compressor to reduce network overheads and contentions

BSP programming API

    public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer)
        throws IOException, SyncException;

BSP Examples

  • Pi calculation
  • Sparse matrix-vector multiplication
  • K-means clustering [1]
  • Parallel support vector machine [2]

  1. http://github.com/willmore/hama-kmeans
  2. http://github.com/truncs/psvm

Graph API

    public void compute(Iterator<MSGTYPE> messages) throws IOException;

Graph Examples

  • In-link count
  • Single source shortest path
  • PageRank
  • Bipartite matching

SSSP Performance

An SSSP for a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on a 3-rack cluster.

Which is the Big Data?

Hama and Stream Processing

See http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html

Could be applied to Daum Services

  • Real-time network traffic analysis
  • Analyze user actions and patterns
  • Social target marketing
  • PageRank calculation
  • Observe evolution of the social networks
  • Detect spam
  • Business intelligence
  • Extract hot trends

Search, News, Shopping, Yozm, Daum MyPeople, Poll, TV Pot, Maps, ..., etc.

1995, 2010, 2005, 2002, March

  • Information Retrieval
  • Information Suggestion
  • Sexy + Suggestive
  • Who would you like to talk to?

More Documentation

See http://wiki.apache.org/hama

Develop your own BSP applications

http://wiki.apache.org/hama/DevelopBSP

Q&A?

Pregel?

    @Override
    public void compute(Iterator<IntWritable> messages) throws IOException {
      // Assumes the vertex value is an IntWritable (as setValue() below
      // implies), so it is unwrapped with get() before comparing.
      int currMax = getValue().get();
      this.sendMessageToNeighbors(new IntWritable(currMax));
      // Track the largest value received from the neighbors.
      while (messages.hasNext()) {
        IntWritable next = messages.next();
        if (next.get() > currMax) {
          currMax = next.get();
        }
      }
      // Adopt a larger value if one arrived; otherwise go inactive.
      if (currMax > getValue().get()) {
        setValue(new IntWritable(currMax));
      } else {
        voteToHalt();
      }
    }
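
The max-value compute() method at the end of the transcript can be tried without a cluster. The following is a minimal single-process sketch in plain Java, not the Hama API: the class name MaxValuePropagation, the run() method, and the array-based graph layout are all hypothetical names chosen for illustration. It models supersteps as loop iterations and the barrier as an inbox/outbox swap. One deliberate difference from the slide: a vertex here sends its value only in superstep 0 or after its value changes, which is the usual Pregel formulation and lets the loop end once every vertex has voted to halt and no messages are in flight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-process simulation of Pregel-style max-value propagation. */
public class MaxValuePropagation {

    /**
     * initialValues[i] is the starting value of vertex i;
     * neighbors[i] lists the vertices that vertex i sends messages to.
     * Returns the vertex values after all vertices have halted.
     */
    public static int[] run(int[] initialValues, int[][] neighbors) {
        int n = initialValues.length;
        int[] values = Arrays.copyOf(initialValues, n);
        boolean[] halted = new boolean[n];
        List<List<Integer>> inbox = emptyBoxes(n);

        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> outbox = emptyBoxes(n);
            boolean anyActive = false;

            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (halted[v] && messages.isEmpty()) {
                    continue; // halted vertices stay idle until a message arrives
                }
                halted[v] = false; // an incoming message reactivates the vertex
                anyActive = true;

                // Same core logic as compute(): take the max of own value and messages.
                int currMax = values[v];
                for (int m : messages) {
                    currMax = Math.max(currMax, m);
                }
                boolean changed = currMax > values[v];
                values[v] = currMax;

                if (superstep == 0 || changed) {
                    // Messages are delivered in the next superstep.
                    for (int target : neighbors[v]) {
                        outbox.get(target).add(currMax);
                    }
                } else {
                    halted[v] = true; // corresponds to voteToHalt()
                }
            }

            if (!anyActive) {
                return values; // every vertex halted and no messages pending
            }
            inbox = outbox; // barrier synchronization between supersteps
        }
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // A path graph 0 - 1 - 2 - 3 with undirected edges.
        int[] values = {3, 6, 2, 1};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(run(values, neighbors)));
    }
}
```

In a connected graph every vertex ends up holding the global maximum; in a disconnected graph each component converges to its own maximum, because values travel only along edges.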