Big Data Overview - New

by Zoltan C. Toth on 11 March 2017


Transcript of Big Data Overview - New

Big Data
What is big data?
"Big data is when the size of the data becomes part of the problem." (Roger Magoulas)
YARN
HDFS
128MB blocks
~3 replicas
no overwrite: WORM (write once, read many)
High Availability namenode
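The block size and replica count above translate directly into storage arithmetic. A minimal sketch, assuming a hypothetical 1000 MB file and the slide's figures (128 MB blocks, 3 replicas); the function name `hdfs_footprint` is illustrative and not part of any Hadoop API:

```python
import math

BLOCK_SIZE_MB = 128  # block size from the slide
REPLICATION = 3      # replica count from the slide


def hdfs_footprint(file_size_mb: float) -> dict:
    """Estimate block count and raw disk usage for a single file."""
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        "blocks": blocks,
        # Every block is stored REPLICATION times across the cluster.
        "block_replicas": blocks * REPLICATION,
        # A partial last block only occupies its actual size on disk.
        "raw_storage_mb": file_size_mb * REPLICATION,
    }


footprint = hdfs_footprint(1000)  # hypothetical 1000 MB file
print(footprint)
```

Under these assumptions the file splits into 8 blocks, stored as 24 block replicas that occupy about 3000 MB of raw disk.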
Distributed data storage and computing on commodity hardware
MapReduce
data locality
combiners
streaming API
JobTracker
TaskTrackers
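The role of a combiner can be sketched in plain Python. This is a single-process toy, not Hadoop code: the input splits and function names are hypothetical, and the "shuffle" is just a dictionary. It shows how pre-aggregating on the mapper side reduces the number of records sent to the reducers:

```python
from collections import Counter, defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Pre-aggregate one mapper's output locally, before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_phase(shuffled):
    """Sum the partial counts collected for each word."""
    return {word: sum(counts) for word, counts in shuffled.items()}


# Hypothetical input: one list of lines per mapper (one per input split).
splits = [["big data big"], ["data big"]]

shuffled = defaultdict(list)
records_sent = 0
for split in splits:
    mapped = [pair for line in split for pair in map_phase(line)]
    for word, count in combine(mapped):
        shuffled[word].append(count)  # stands in for the network shuffle
        records_sent += 1

result = reduce_phase(shuffled)
print(result, records_sent)
```

Without the combiner, the two mappers would send five (word, 1) records; with it they send four, and the saving grows with the amount of repetition inside each split.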
The Hadoop Ecosystem
HDFS
MapReduce
Tez
Flume
NoSQL
ML (machine learning)
Orchestration: Pinball, Airflow, Azkaban
The Problem
Text files have no schema
MapReduce is hard to learn
Need a place for ad-hoc analysis
The Solution
Run SQL on Hadoop
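The idea of putting a schema on top of plain text files and then querying them ad hoc can be illustrated with a small analogy. This sketch uses Python's built-in `sqlite3` module as a stand-in for a SQL-on-Hadoop engine; the sample rows, column names, and `users` table are all hypothetical, and a real engine would read the files in place on HDFS rather than copy them into a database:

```python
import sqlite3

# Hypothetical content of a tab-delimited text file: no schema attached.
raw_lines = ["1\talice\t34", "2\tbob\t27", "3\tcarol\t41"]

# The schema is declared separately from the data (names and types assumed).
schema = [("id", int), ("name", str), ("age", int)]


def parse(line):
    """Apply the declared schema to one raw text line."""
    fields = line.split("\t")
    return tuple(cast(value) for (_, cast), value in zip(schema, fields))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)", [parse(line) for line in raw_lines]
)

# Ad-hoc analysis in SQL instead of a hand-written MapReduce job.
avg_age = conn.execute("SELECT AVG(age) FROM users").fetchone()[0]
print(avg_age)
```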
Structure
Shell
Driver
Compiler
Execution Engine (MR/Tez)
Metastore (usually MySQL)
Data on HDFS
Managed vs External tables
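The slide does not name the tool, but the components listed match Apache Hive, so this sketch assumes Hive's semantics: dropping a managed table deletes its data, while dropping an external table removes only the metadata. The `ToyMetastore` class, table names, and directory layout are hypothetical, and local temporary directories stand in for HDFS paths:

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy stand-in for a metastore: maps table names to data locations."""

    def __init__(self):
        self.tables = {}  # name -> (data_dir, is_external)

    def create_table(self, name, data_dir, external=False):
        self.tables[name] = (Path(data_dir), external)

    def drop_table(self, name):
        data_dir, external = self.tables.pop(name)
        if not external:
            # Managed table: the engine owns the data, so it is deleted too.
            shutil.rmtree(data_dir)
        # External table: only the metadata entry is removed.


root = Path(tempfile.mkdtemp())
managed_dir = root / "warehouse" / "clicks"  # hypothetical warehouse path
external_dir = root / "landing" / "logs"     # hypothetical external path
for directory in (managed_dir, external_dir):
    directory.mkdir(parents=True)
    (directory / "part-00000").write_text("row\n")

metastore = ToyMetastore()
metastore.create_table("clicks", managed_dir)
metastore.create_table("logs", external_dir, external=True)
metastore.drop_table("clicks")
metastore.drop_table("logs")

managed_exists = managed_dir.exists()
external_exists = external_dir.exists()
print(managed_exists, external_exists)

shutil.rmtree(root)  # clean up the temporary directories
```

After both drops, the managed table's directory is gone while the external table's files are still in place, which is why external tables suit data that other tools also read.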
A typical Data Infrastructure
App servers
Production Database
Messaging System
Stream Processing
Monitoring Tool
Data Lake (Storage)
Computational Framework
Orchestration Tool (ETL Tool)
Data Warehouse
BI Tool