
BUILDING HIVE

A PETABYTE SCALE DATA WAREHOUSE USING HADOOP

RICHI

MONISHA

APURVA

  • An Open Source
  • Not easy to use, as it required writing MAP-REDUCE programs

  • Lacked the expressiveness of popular query languages like SQL
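To illustrate why hand-written map-reduce was considered low level, here is a minimal in-process Python sketch of the map, shuffle, and reduce phases for a grouped count. This is not Hadoop's API. The function names and the sample (page, user) rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input rows: (page, user), e.g. parsed from log files.
Row = tuple[str, str]

def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, int]]:
    # Emit one (key, 1) pair per row, keyed by the grouping column.
    for page, _user in rows:
        yield page, 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Collect all values for each key; Hadoop does this between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Combine each key's values into a single aggregate.
    return {key: sum(values) for key, values in groups.items()}

rows = [("home", "u1"), ("search", "u2"), ("home", "u3")]
counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)  # {'home': 2, 'search': 1}
```

All three phases must be written by hand to get a result that a SQL-like language expresses as one short `GROUP BY` query. That gap is what Hive's query language is meant to close.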

PRIMITIVE TYPES

DATA MODEL

  • Integers: bigint (8 bytes), int (4 bytes), smallint (2 bytes), tinyint (1 byte). All integers are signed.
  • Floating-point numbers: float (single precision), double (double precision)
  • String
  • Hive stores data in tables

  • Tables consist of a number of rows and columns

  • Each column has an associated type
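The byte widths listed for the integer types fix the values each type can hold. The minimal sketch below assumes the usual two's-complement encoding, as used by Java's byte, short, int, and long (Hive is written in Java). It computes those ranges and checks a row against a typed schema. The column names and values are hypothetical.

```python
# Byte widths of Hive's signed integer types, as listed above.
INTEGER_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def signed_range(num_bytes: int) -> tuple[int, int]:
    # Two's complement: the top bit carries the sign, so n bits span
    # -2**(n-1) .. 2**(n-1) - 1.
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def fits(value: int, type_name: str) -> bool:
    # True if the value can be stored in a column of the given integer type.
    low, high = signed_range(INTEGER_WIDTHS[type_name])
    return low <= value <= high

for name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name:>8}: {low} .. {high}")

# Hypothetical table schema: each column has an associated type.
schema = [("user_id", "bigint"), ("age", "tinyint")]
row = (9_000_000_000, 130)
checks = [fits(value, col_type) for (_, col_type), value in zip(schema, row)]
print(checks)  # [True, False]: 130 exceeds tinyint's maximum of 127
```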

HADOOP

COMPLEX TYPES

PROBLEMS

TECHNOLOGY TO ADDRESS SCALING NEEDS

QUERY LANGUAGE

DATA

SCALABILITY

ISSUE

  • Associative arrays (map)
  • Lists (list)
  • Structs (struct)
  • SUBSET OF SQL
  • VARIOUS TYPES OF JOINS (INNER, LEFT, OUTER, CARTESIAN PRODUCTS), GROUP BY, AGGREGATION, UNION, AND CREATE TABLE, AS IN SQL.
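The join types listed above differ only in which row pairs they keep. This minimal sketch shows the semantics with nested loops over Python lists, chosen for clarity. Hive itself compiles queries into map-reduce jobs rather than running loops like these. The tables are hypothetical, and the join key is assumed to be the first column of each row.

```python
from itertools import product

# Hypothetical tables as lists of tuples; the join key is the first column.
users = [(1, "ann"), (2, "bob"), (3, "cy")]         # (id, name)
clicks = [(1, "home"), (1, "search"), (2, "home")]  # (user_id, page)

def inner_join(left, right):
    # Keep only row pairs whose join keys match; drop the duplicate key column.
    return [lrow + rrow[1:] for lrow in left for rrow in right if lrow[0] == rrow[0]]

def left_outer_join(left, right):
    # Like an inner join, but unmatched left rows survive, padded with None (SQL NULL).
    pad = (None,) * (len(right[0]) - 1) if right else ()
    result = []
    for lrow in left:
        matches = [lrow + rrow[1:] for rrow in right if lrow[0] == rrow[0]]
        result.extend(matches or [lrow + pad])
    return result

def cartesian_product(left, right):
    # Every left row paired with every right row; no join condition at all.
    return [lrow + rrow for lrow, rrow in product(left, right)]

print(inner_join(users, clicks))
# [(1, 'ann', 'home'), (1, 'ann', 'search'), (2, 'bob', 'home')]
print(left_outer_join(users, clicks)[-1])     # (3, 'cy', None)
print(len(cartesian_product(users, clicks)))  # 9
```

The Cartesian product grows as the product of the two table sizes, which is why it is usually the most expensive of these on large data.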

EXPENSIVE

TIME-CONSUMING

Growth from 15 TB to 700 TB

EXAMPLE

CREATE TABLE t1(st string, f1 float, li list<map<string, struct<p1:int, p2:int>>>);
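The `CREATE TABLE` example nests complex types three levels deep, so its column type needs one closing `>` for each of `list<`, `map<`, and `struct<`. Below is a hypothetical recursive-descent parser sketch for the `list<...>`, `map<...,...>`, and `struct<name:type,...>` forms used in these slides. HiveQL as documented today spells the list type `array<...>`. The sketch makes that nesting explicit and rejects unbalanced brackets.

```python
import re

# Primitive type names from the slides; complex types are parsed recursively.
PRIMITIVES = {"tinyint", "smallint", "int", "bigint", "float", "double", "string"}
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[<>,:])")

def tokenize(text):
    # Split a type string into identifiers and the symbols < > , :
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def peek(tokens, i):
    # Token at position i, or None past the end of input.
    return tokens[i] if i < len(tokens) else None

def expect(tokens, i, symbol):
    # Consume the given symbol or fail with a readable message.
    if peek(tokens, i) != symbol:
        raise ValueError(f"expected {symbol!r}, found {peek(tokens, i)!r}")
    return i + 1

def parse_type(tokens, i):
    # Parse one type starting at tokens[i]; return (tree, next index).
    name = (peek(tokens, i) or "").lower()
    if name in PRIMITIVES:
        return name, i + 1
    if name == "list":
        i = expect(tokens, i + 1, "<")
        element, i = parse_type(tokens, i)
        return ("list", element), expect(tokens, i, ">")
    if name == "map":
        i = expect(tokens, i + 1, "<")
        key, i = parse_type(tokens, i)
        i = expect(tokens, i, ",")
        value, i = parse_type(tokens, i)
        return ("map", key, value), expect(tokens, i, ">")
    if name == "struct":
        i = expect(tokens, i + 1, "<")
        fields = []
        while True:
            field = peek(tokens, i)
            if field is None or not field.isidentifier():
                raise ValueError(f"expected a field name, found {field!r}")
            i = expect(tokens, i + 1, ":")
            field_type, i = parse_type(tokens, i)
            fields.append((field, field_type))
            if peek(tokens, i) == ">":
                return ("struct", tuple(fields)), i + 1
            i = expect(tokens, i, ",")
    raise ValueError(f"unknown type {peek(tokens, i)!r}")

def parse(text):
    # Parse a complete type string and reject leftover tokens.
    tokens = tokenize(text)
    tree, i = parse_type(tokens, 0)
    if i != len(tokens):
        raise ValueError(f"unexpected trailing tokens {tokens[i:]}")
    return tree

print(parse("list<map<string,struct<p1:int,p2:int>>>"))
# ('list', ('map', 'string', ('struct', (('p1', 'int'), ('p2', 'int')))))
```

Feeding it `list<map<string,struct<p1:int,p2:int>>` (only two closing brackets) raises `ValueError: expected '>', found None`, because the outer `list<` is never closed.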

TYPES

THANK YOU

LIMITATIONS

  • PRIMITIVE TYPES
  • COMPLEX TYPES

Reaching the Goal

INSERT OVERWRITES EXISTING DATA: NEW ROWS CANNOT BE APPENDED TO AN EXISTING TABLE OR PARTITION
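The limitation above means an insert replaces a partition's contents instead of adding to them. This minimal sketch models that behavior with a Python dict standing in for a table's partitions. It says nothing about how Hive actually lays data out on disk. The partition name and rows are hypothetical.

```python
# Hypothetical in-memory stand-in for a table: partition name -> rows.
Table = dict[str, list[tuple]]

def insert_overwrite(table: Table, partition: str, rows: list[tuple]) -> None:
    # Replace whatever the partition held; earlier rows are discarded.
    table[partition] = list(rows)

def append_by_rewrite(table: Table, partition: str, new_rows: list[tuple]) -> None:
    # With overwrite-only inserts, "adding" rows means rewriting old + new together.
    insert_overwrite(table, partition, table.get(partition, []) + list(new_rows))

clicks: Table = {}
insert_overwrite(clicks, "ds=2009-01-01", [("u1", "home")])
insert_overwrite(clicks, "ds=2009-01-01", [("u2", "search")])
print(clicks["ds=2009-01-01"])  # [('u2', 'search')]: the first row was overwritten
append_by_rewrite(clicks, "ds=2009-01-01", [("u3", "home")])
print(clicks["ds=2009-01-01"])  # [('u2', 'search'), ('u3', 'home')]
```

The workaround costs a full rewrite of the partition on every "append", which is why the restriction matters for large tables.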

They are used in applications ranging from summarization jobs to advanced machine learning algorithms.
