
toto aiman

on 20 December 2012


Transcript of Hadoop

A system for processing mind-bogglingly large amounts of data.

MapReduce: a programming model that applies to many large-scale computing problems.

The need for Hadoop
History of Hadoop
Architectures
Framework of MapReduce: the Map phase and the Reduce phase. It's written in Java.
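The Map phase and the Reduce phase are easiest to see in the classic word-count job. The following is a minimal single-process sketch, not Hadoop code: `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical helper names, and a real Hadoop job would run many map and reduce tasks in parallel across the cluster. Hadoop's own MapReduce API is Java; Python is used here only to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key into one output record."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document plays the role of one input split handled by a map task.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))  # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Keeping the shuffle as a separate step mirrors how the framework groups intermediate keys between the two phases, so the map and reduce functions never need to know about each other.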
HBase is an open-source version of Google's Bigtable. It has three components: the HBase Master, HRegionServers and the HBase client.

MapReduce nodes
• JobTracker: keeps track of all the map and reduce jobs running across the nodes.
• TaskTracker: performs the map and reduce tasks assigned by the JobTracker.

Does the same as HBase, with extra time and computation.

HDFS facts

Pig and Pig Latin
• A query language, similar to SQL
• Creates a flow of data as a sequence of small steps
• Users can build their own functions for special processing
• Users can focus on semantics rather than efficiency

Pig Latin example: finding the top 5 most visited pages by users aged between 18 and 25.
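The Pig Latin example (top 5 pages visited by users aged 18 to 25) can be sketched in plain Python to show the dataflow steps such a script expresses: filter, join, group, count, order and limit. This is an illustration only: the record layouts `(name, age)` and `(user, url)` and the function name `top_pages` are hypothetical, and Pig itself would compile these steps into MapReduce jobs instead of running them in one process.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layouts: a user is (name, age), a page visit is (user, url).
User = Tuple[str, int]
Visit = Tuple[str, str]


def top_pages(users: List[User], visits: List[Visit], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most visited urls among users aged 18 to 25, as (url, clicks)."""
    # FILTER users BY age between 18 and 25 (inclusive).
    # Assumes each user name appears only once in `users`.
    young = {name for name, age in users if 18 <= age <= 25}

    # JOIN visits with the filtered users, then GROUP BY url and COUNT the visits.
    clicks = Counter(url for user, url in visits if user in young)

    # ORDER BY clicks DESC (ties broken by url for a stable result), then LIMIT n.
    ranked = sorted(clicks.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    users = [("alice", 20), ("bob", 30), ("carol", 18), ("dave", 25)]
    visits = [
        ("alice", "a.com"), ("alice", "b.com"), ("carol", "a.com"),
        ("bob", "a.com"), ("dave", "c.com"), ("dave", "a.com"), ("carol", "b.com"),
    ]
    print(top_pages(users, visits))  # prints [('a.com', 3), ('b.com', 2), ('c.com', 1)]
```

Using a set for the filtered users turns the join into a membership test per visit; a real Pig JOIN would also handle duplicate keys and data too large to fit in memory.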
The example takes 15 hours instead of 4 hours in Pig Latin.

HBase components
• HBase Master: serves the HRegionServers and handles the tables' administrative functions.
• HRegionServer: handles client read and write requests and gets the list of regions to serve.
• HBase client: locates the HRegionServer and communicates with the HBase Master.

The form of a column address is <family>:<label>.
• A value is addressed by (row key, column family, column).
• Example: "alice@stanford.edu" is stored at (alice, info:email).

HDFS and HBase store massive amounts of data.

What is Hadoop? But I'm getting ahead of myself...

Advantages of Hadoop
• Cost-friendly and accessible
• Fast and time-saving
• Easy to store huge amounts of data
• Easy to handle the rate of computation
• Flexible in processing data
• Flexible in programming languages

Limitations of Hadoop
• Plain Hadoop is hard to manage
• Plain Hadoop is hard to keep alive
• Plain Hadoop is hard to use
• Plain Hadoop is not optimized for your hardware
• Costly to maintain and develop

Hive is a data warehousing system for Hadoop.

Hive is used for ad-hoc queries, data summarization and analysis. It defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data.

Hive architecture
• User Interface: currently a command-line interface where the user can enter HiveQL queries.
• Driver: receives the queries and retrieves the session-specific information for the other components.
• Compiler: parses the query into different query blocks and expressions. It also generates the query plan and optimizes it to produce an execution plan.
• Execution Engine: executes the plan generated by the compiler.
• MetaStore: stores the metadata on the different tables and partitions.

In education and pretty much every startup out there... OK, enough history.
How does it work?

Education, research, online learning environments

University Initiative to Address Internet-Scale Computing Challenges
• A cluster of processors running Apache’s Hadoop project
• A Creative Commons-licensed university curriculum
• Open-source software to help students develop programs for clusters running Hadoop
• A website to encourage collaboration among universities

http://code.google.com/edu/content/parallel.html
http://lucene.apache.org/hadoop/
http://socialstudent.com

Abeer Al-Harbi
Isra'a Al-Doulatly
Toqa Aiman Mukhiemer