Statistics powered by Hadoop

by Sebastian Müller on 11 July 2013


Transcript of Statistics powered by Hadoop

Autoscout Dealer Statistics
powered by Hadoop
Data flow
Logfiles (multiple) → Hadoop → Hive → Oracle
We are talking about...
18 000 000 Data points
and 4 Gigabytes
... per day!
We calculate about...
12 values for each of our 2M classifieds
15 values for each of our 40K Dealers
... so we have over 24.5M results
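The result count quoted above can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures from the slides (2M classifieds, 40K dealers); the variable names are illustrative.

```python
# Rounded figures taken from the slides.
VALUES_PER_CLASSIFIED = 12
CLASSIFIEDS = 2_000_000
VALUES_PER_DEALER = 15
DEALERS = 40_000

classified_results = VALUES_PER_CLASSIFIED * CLASSIFIEDS  # 24,000,000
dealer_results = VALUES_PER_DEALER * DEALERS              # 600,000
total_results = classified_results + dealer_results

print(f"{total_results:,} results")  # 24,600,000 results
```

With these rounded inputs the total comes to 24.6M, which is consistent with "over 24.5M".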
Let's do it...
Flume
Sqoop
Data structure
01:00:00 2012-05-13 pv 212769979 mobil.autoscout24.fr d3fc0700-caf7-4a2c-95a1-11f8b1407e7c
Timestamp
Action
Content
Source
Anonymous user identifier
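To make the record layout concrete, here is a minimal parsing sketch in Python. It assumes the fields are whitespace-separated and that the timestamp covers the first two tokens (time, then date); the slides do not state either, and the `LogRecord` and `parse_line` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime  # "Timestamp" (assumed: time token + date token)
    action: str          # "Action", e.g. "pv"
    content: str         # "Content"
    source: str          # "Source"
    user_id: str         # "Anonymous user identifier"


def parse_line(line: str) -> LogRecord:
    """Split one whitespace-separated log line into its five fields."""
    time_part, date_part, action, content, source, user_id = line.split()
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return LogRecord(timestamp, action, content, source, user_id)


# The sample line shown on the slide.
sample = (
    "01:00:00 2012-05-13 pv 212769979 mobil.autoscout24.fr "
    "d3fc0700-caf7-4a2c-95a1-11f8b1407e7c"
)
record = parse_line(sample)
print(record.timestamp, record.action, record.content, record.source)
```

Unpacking into exactly six tokens means a malformed line raises a `ValueError` instead of being silently misread, which seemed the safer choice for a sketch.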
The data is stored in Hive for structure and partitioned by date and hour for fast access
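Partitioning by date and hour means each record lands in a directory identified by those two values, so a query for one hour only reads that directory. The sketch below derives such a partition path from a timestamp. Hive lays out partitions as `key=value` directories, but the column names `dt` and `hr` are assumptions; the slides do not name them.

```python
from datetime import datetime


def partition_path(timestamp: datetime) -> str:
    """Build a Hive-style partition directory (hypothetical columns dt and hr)."""
    return f"dt={timestamp:%Y-%m-%d}/hr={timestamp:%H}"


# Timestamp of the sample log line from the slides.
print(partition_path(datetime(2012, 5, 13, 1, 0, 0)))  # dt=2012-05-13/hr=01
```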
Aggregation for classifieds, saved in Hive
Create an Oracle table and export the classified results to Oracle
Aggregation for customers, saved in Hive
Again, create an Oracle table and export
Tell the rest of the world that the job is done
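The aggregation steps run as Hive queries in the presentation; the queries themselves are not part of this transcript. As a rough illustration of the idea only, the following Python sketch counts events per classified and action. It assumes the "Content" field holds a classified ID, and the second ID in the sample data is made up.

```python
from collections import Counter

# (content, action) pairs; only the first ID and "pv" come from the slides,
# the second ID is invented for illustration.
events = [
    ("212769979", "pv"),
    ("212769979", "pv"),
    ("300000001", "pv"),
]


def count_events(pairs):
    """Count how often each (content, action) pair occurs."""
    return Counter(pairs)


counts = count_events(events)
for (content, action), n in sorted(counts.items()):
    print(content, action, n)
```

The dealer-level aggregation would need a mapping from classifieds to dealers, which the transcript does not describe, so it is left out here.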
Performance
This example covered only 2 hours of data
Our nightly job processes a whole day in 15 minutes
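For a sense of scale, the figures above imply an average processing rate. This back-of-the-envelope sketch assumes the nightly job handles the full 18 000 000 daily data points within the quoted 15 minutes.

```python
DATA_POINTS_PER_DAY = 18_000_000  # from the slides
JOB_MINUTES = 15                  # from the slides

job_seconds = JOB_MINUTES * 60
points_per_second = DATA_POINTS_PER_DAY / job_seconds

print(f"{points_per_second:,.0f} data points per second")  # 20,000
```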