Big Data - Adatlabor

by Zoltan C. Toth on 11 September 2015


Transcript of Big Data - Adatlabor

Big Data
What is big data?
"big data is when the size of the data becomes part of the problem." (Roger Magoulas)
YARN
HDFS
large blocks with 3 replicas
High Availability namenode
Distributed computation and data storage
on commodity hardware
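The HDFS idea of storing files as large blocks with three replicas can be sketched as a small simulation. It assumes the Hadoop 2.x defaults (a 128 MB `dfs.blocksize` and a `dfs.replication` of 3); the datanode names and the round-robin placement are hypothetical simplifications, not HDFS's actual rack-aware placement policy.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # default dfs.replication

@dataclass
class Block:
    index: int
    offset: int
    length: int
    replicas: list

def split_into_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into fixed-size blocks and assign each
    block to `replication` distinct datanodes. Placement is a simple
    round-robin (hypothetical), NOT HDFS's rack-aware policy."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    blocks = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be shorter
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
        index += 1
    return blocks
```

A 300 MB file on four datanodes yields two full 128 MB blocks and one 44 MB block, each stored on three different nodes, so losing any single node loses no data.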
MapReduce
data locality
combiners
streaming API
no overwriting
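The MapReduce terms above (mapper, combiner, shuffle, reducer, data locality) can be illustrated with an in-memory word count. This is a single-process sketch, not Hadoop's Java API; each "split" stands in for the input one mapper would read, ideally on a node that already holds that HDFS block.

```python
from collections import Counter, defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in the input line
    for word in line.lower().split():
        yield word, 1

def combine(pairs):
    # Combiner: pre-aggregate one mapper's output locally, so fewer
    # (word, count) pairs have to cross the network in the shuffle
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def shuffle(mapper_outputs):
    # Shuffle/sort: group every value emitted for the same key
    groups = defaultdict(list)
    for word, count in chain.from_iterable(mapper_outputs):
        groups[word].append(count)
    return dict(sorted(groups.items()))

def reduce_phase(word, counts):
    # Reducer: sum the partial counts for one key
    return word, sum(counts)

def word_count(splits):
    # One mapper (plus its combiner) per input split
    mapper_outputs = [
        combine(chain.from_iterable(map_phase(line) for line in split))
        for split in splits
    ]
    grouped = shuffle(mapper_outputs)
    return dict(reduce_phase(w, c) for w, c in grouped.items())
```

Reusing the summing logic as a combiner is only safe because addition is associative and commutative; a combiner for, say, an average would need to carry (sum, count) pairs instead.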
JobTracker
TaskTrackers
HDFS
Tez
MapReduce
Flume
Azkaban
NoSQL
*
orchestration
The Hadoop Ecosystem
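Orchestration tools such as Azkaban, Pinball and Chronos run jobs according to a dependency graph (a DAG). The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group a hypothetical pipeline into waves of jobs that could run in parallel; the job names are invented for illustration, and this is not how any of those schedulers is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical daily pipeline: each job maps to the set of jobs it depends on
PIPELINE = {
    "ingest_logs": set(),
    "clean_logs": {"ingest_logs"},
    "load_users": set(),
    "join_sessions": {"clean_logs", "load_users"},
    "train_model": {"join_sessions"},
    "daily_report": {"join_sessions"},
}

def execution_waves(dag):
    """Group jobs into waves: every job in a wave has all its dependencies
    finished in earlier waves, so jobs within one wave may run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every job in the wave succeeded
    return waves
```

A real scheduler would call `done` only as each job actually succeeds, and would also handle retries and failures.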
Streaming API
pandas
scikit-learn
Pinball
Chronos
ML
machine learning
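The Streaming API lets any executable that reads standard input and writes standard output act as a mapper or reducer, which is one way Python tools such as pandas and scikit-learn can run inside Hadoop jobs. A minimal word-count sketch of the line protocol, assuming the default tab-separated `key<TAB>value` format; the script name `wc.py` is hypothetical.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Streaming mapper: plain text in, "key<TAB>value" lines out
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming reducer: the framework sorts mapper output by key,
    # so all lines for the same word arrive next to each other
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: `python wc.py map` or `python wc.py reduce`
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The whole job can be tried locally without Hadoop, since a shell `sort` plays the role of the shuffle: `cat input.txt | python wc.py map | sort | python wc.py reduce`.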
The Problem
Text files have no schema
MapReduce is hard to learn
Need a place for ad-hoc analysis
The Solution
Run SQL on Hadoop
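Running SQL over plain text files works because Hive applies the table schema when data is read ("schema on read") rather than when it is written. A minimal sketch of the idea, assuming a hypothetical three-column table stored as comma-separated text (Hive's own default field delimiter is Ctrl-A, `\001`); fields that cannot be parsed become `None`, roughly as Hive returns NULL.

```python
from datetime import date

# Hypothetical table schema: (column name, converter applied at read time)
SCHEMA = [("user_id", int), ("country", str), ("signup", date.fromisoformat)]

def read_with_schema(lines, schema=SCHEMA, delimiter=","):
    """Schema on read: the raw text is never rewritten; each line is parsed
    against the schema only when a query reads it. Missing or malformed
    fields become None instead of failing the whole query."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        row = {}
        for i, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        yield row

def select_where(rows, column, value):
    # Tiny stand-in for: SELECT * FROM t WHERE column = value
    return [r for r in rows if r[column] == value]
```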
Structure
Shell
Driver
Compiler
Execution Engine (MR/Tez)
Metastore (usually MySQL)
Performance optimisation
(Data on HDFS)
Partitioning: one subdirectory per partition value (key=value)

hdfs:///user/hive/warehouse/mytable/dt=2015-02-02
hdfs:///user/hive/warehouse/mytable/dt=2015-02-03
hdfs:///user/hive/warehouse/mytable/dt=2015-02-04
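Because each partition is its own directory, a query that filters on `dt` only needs to list and read the matching directories (partition pruning). A sketch over paths like the ones above, assuming the predicate compares ISO-formatted date strings, which sort lexicographically in date order; the function names are hypothetical.

```python
def partition_value(path, key="dt"):
    # Read the partition value from a directory name such as ".../dt=2015-02-03"
    name = path.rstrip("/").rsplit("/", 1)[-1]
    k, sep, v = name.partition("=")
    return v if sep and k == key else None

def prune(partition_dirs, predicate, key="dt"):
    """Partition pruning: evaluate the WHERE-clause predicate on the directory
    name alone, so files in non-matching partitions are never opened."""
    selected = []
    for path in partition_dirs:
        value = partition_value(path, key)
        if value is not None and predicate(value):
            selected.append(path)
    return selected
```

For example, `WHERE dt >= '2015-02-03'` corresponds to `prune(dirs, lambda d: d >= "2015-02-03")`, which skips the `dt=2015-02-02` directory entirely.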

Support for Indexes
MSCK REPAIR TABLE mytable
when a partition directory is added outside Hive
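If partition directories are created directly on HDFS (for example by copying files in), the metastore does not know about them until `MSCK REPAIR TABLE` scans the table location and registers them. A toy model of that reconciliation, assuming the metastore is just a set of known `dt` values; `msck_repair` is a hypothetical helper, and like the command in Hive releases of this period it only adds partitions, never drops them.

```python
def msck_repair(metastore_partitions, table_dirs, key="dt"):
    """Register partition directories found under the table location that the
    metastore does not know yet. Mutates metastore_partitions and returns the
    newly added partition values in sorted order."""
    on_disk = set()
    for d in table_dirs:
        name = d.rstrip("/").rsplit("/", 1)[-1]
        k, sep, v = name.partition("=")
        if sep and k == key:
            on_disk.add(v)
    missing = sorted(on_disk - metastore_partitions)
    metastore_partitions.update(missing)  # add only; stale entries are kept
    return missing
```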
write once, read often
Optimized file formats: ORC/Parquet/Avro/RCFile
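ORC and Parquet are columnar formats (RCFile is an earlier hybrid that stores row groups column by column, while Avro is row-oriented): the values of one column are stored together, so a query reading a single column can skip the rest, and runs of similar values compress well. This fits the write-once, read-often pattern. A simplified sketch of the layout plus run-length encoding; real ORC and Parquet files add further encodings, compression and per-block statistics that this does not model.

```python
def to_columnar(rows, columns):
    # Columnar layout: store each column's values contiguously instead of
    # storing whole rows one after another
    return {c: [r[c] for r in rows] for c in columns}

def run_length_encode(values):
    # Adjacent repeated values in a column collapse into (value, count) pairs
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(e) for e in encoded]
```

Summing `clicks` then touches only `columns["clicks"]`, never the `dt` or `country` values.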
Managed vs External tables
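The practical difference between managed and external tables shows up on `DROP TABLE`: for a managed table Hive deletes both the metadata and the data files, for an external table only the metadata. A toy model of that behaviour; the `Warehouse` class and the paths are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Warehouse:
    # Toy model: "files" maps a directory to the files in it,
    # "metastore" maps table name -> (location, is_external)
    files: dict = field(default_factory=dict)
    metastore: dict = field(default_factory=dict)

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.files.setdefault(location, [])

    def drop_table(self, name):
        location, external = self.metastore.pop(name)  # metadata always goes
        if not external:
            # Managed table: Hive owns the data, so DROP deletes the files too
            self.files.pop(location, None)
        # External table: the files stay where they are
```

External tables therefore suit data that other tools (for example a Flume ingest) also write to or read from.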