Hadoop

Demo Presentation by Hamada Zahera, 24 January 2013


Transcript of Hadoop

Hadoop, the Yellow Elephant

Hadoop is an Apache framework for distributed processing of large data sets (terabytes, petabytes). "Hadoop is an elephant, because it carries heavy data."

Introduction

Hadoop consists of two parts:
  • Storage: the Hadoop Distributed File System (HDFS)
  • Processing: the MapReduce engine

CrossValidation-Hadoop

The CrossValidation-Hadoop package performs training and testing of learning algorithms, and also visualizes the ROC curve. The cross-validation itself is executed in parallel.

MapReduce Job

A Hadoop job is implemented with the MapReduce programming model. The user implements two functions:
  • Map() — goal: map the input data set into <key,value> pairs. Input: a <K1,V1> pair (the key and value read from the input file format). Output: zero or more <key,value> pairs.
  • Reduce() — goal: execute the functional job of the application on the values grouped by key. Input: <K1, list of values>. Output: <K1, result>.
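As a concrete illustration of the Map()/Reduce() contract, here is a minimal local simulation in Python. Hadoop jobs are normally written in Java against the Hadoop API; this standalone sketch only mimics the map → shuffle → reduce flow, and the summing reducer is an arbitrary stand-in for an application-specific Reduce().

```python
from collections import defaultdict

def mapper(offset, line):
    """Map(): turn one input record into zero or more <key, value> pairs.
    Here a line like "1990 50" becomes the pair (1990, 50)."""
    year, temp = line.split()
    yield int(year), int(temp)

def reducer(key, values):
    """Reduce(): run the application's job on all values grouped under one key.
    The real function is application-specific; summing is just an illustration."""
    return key, sum(values)

def run_job(lines):
    # Shuffle: group every mapped value by its key, as Hadoop does
    # automatically between the map and reduce phases.
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            groups[k].append(v)
    return {k: reducer(k, vs)[1] for k, vs in sorted(groups.items())}

print(run_job(["1990 50", "1990 60", "1995 15", "1995 60", "1998 55"]))
# {1990: 110, 1995: 75, 1998: 55}
```

Note that user code supplies only mapper() and reducer(); in Hadoop the grouping (shuffle) between the two phases is done by the framework.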
Job Execution Flow

Input file format: the big input file is split into records, and each record reaches a mapper as a <byte offset, line> pair, e.g.:

  <0,   "00 11 01 ...">
  <106, "00 3 11 9 ...">
  <212, "01 4 12 50 ...">
  <318, "02 5 12 60 ...">
  <424, "05 5 14 50 ...">
  ...

Map: each mapper emits intermediate <key,value> pairs, e.g. <year, temperature>:

  <1990, 50>
  <1990, 60>
  <1995, 15>
  <1995, 60>
  <1998, 55>
  ...

Shuffle: the framework groups the mapped values by key:

  <1990, [50, 60, ...]>
  <1995, [15, 60, ...]>
  <1998, [55, ...]>

Reduce: each reducer processes one group and writes its part of the output data set:

  <1990, 111>
  <1995, 22>
  <1998, 78>

In CrossValidation-Hadoop, every reducer performs the job <train(), test()>: each reducer trains on its training set and evaluates on its test set, so the folds run in parallel.
Mahout

Mahout is a scalable machine learning package designed to process large data sets. It implements many algorithms across several topics: clustering, recommendation, classification, etc. Most of these algorithms are implemented to run on Hadoop, and some are implemented in parallel. A list of organizations using Mahout is maintained on the "Powered by Mahout" wiki page [3].

References

[1] http://hadoop.apache.org/
[2] http://mahout.apache.org/
[3] https://cwiki.apache.org/MAHOUT/powered-by-mahout.html