BIG DATA

by Revital Berman on 2 September 2013


Transcript of BIG DATA

BIG DATA
Problems with Traditional DBMSs
Huge quantities of information - Capturing, Managing and Processing DATA
Working with DATA CLUSTERS, rather than SINGLE NODE SERVERS.
Unstructured data, e.g. pictures, audio, 3D models, etc.
Varied types of data sources that need to be synchronized.
A platform designed to handle large quantities of unstructured / multi-structured data.
An alternative to the traditional RDBMS.
Mechanism: Highly optimized key–value stores intended for simple retrieval & appending operations
Eventual consistency ==> ACID (Atomicity, Consistency, Isolation, Durability) is NOT guaranteed
Implementations:
BigTable (developed by Google)
Dynamo (Amazon)
Apache Cassandra, etc.
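The eventual-consistency trade-off can be illustrated with a minimal Python sketch, assuming a toy in-memory key-value store with two replicas and a last-write-wins merge rule. The `Replica` class and its methods are hypothetical and are not the API of BigTable, Dynamo, or Cassandra. A write accepted by one replica is invisible to the other until they exchange state, so a read in between can return stale data; after the exchange, both replicas converge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a hypothetical toy key-value store (not a real database API)."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def put(self, key, value, ts):
        # Simple write: only this replica sees it until replication runs.
        self.data[key] = (ts, value)

    def get(self, key):
        # Simple retrieval by key; returns None if this replica has no copy yet.
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Last-write-wins: keep whichever version carries the newer timestamp.
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)


a, b = Replica("a"), Replica("b")
a.put("user:1", "Alice", ts=1)
print(b.get("user:1"))  # None: replica b has not received the write yet (stale read)

# Anti-entropy step: the replicas exchange state in both directions.
b.merge_from(a)
a.merge_from(b)
print(b.get("user:1"))  # Alice: the replicas have converged
```

No locking or transaction spans both replicas here, which is exactly why the ACID guarantees listed above do not hold between writes and merges.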
Big Data - USES
Science Uses:
DNA - Cracking the Genome
NASA NCCS - Space Observations & Simulations.
Future Predictions based on information
Private Uses:
Search Engines
DNA / Photo / Audio analysis / Facial Recognition (e.g. Facebook)
GPS analysis for traffic reports
Business Uses - BI (e.g. Walmart):
DSS (Decision Support Systems)
Personalized advertising / marketing
at Twitter, Yahoo!, Facebook, Google, etc.
Public Sector Uses:
Barack Obama's campaign
FBI terror tracking
NoSQL - Not Only SQL
Map-Reduce Framework
Parallelizes processing problems across
huge data sets, using a large number of computers
--> MIMD (Multiple Instructions, Multiple Data).
Cluster/grid purpose --> "marshalling":
Overall management of the whole process.
Distributing the data into nodes/branches (servers) and replicating it (to achieve fault tolerance).
Providing redundancy of results.
Node purpose:
Maps the data received (divides the problem into sub-problems & distributes them further).
Processes various tasks.
Result:
Achieving scalability & fault-tolerance
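The divide / process / combine pattern above can be sketched in Python, assuming worker threads stand in for cluster nodes and a sum of squares stands in for the per-node task. The function names are hypothetical, and the sketch deliberately omits replication and failure handling. Threads share one Python process (and its GIL), so this shows the structure of the job rather than real speed-up; an actual framework runs these tasks on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Divide the input into roughly equal slices (the "sub-problems")."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """Hypothetical work one node performs on its slice: a sum of squares."""
    return sum(x * x for x in chunk)


def run_job(data, n_nodes=4):
    chunks = split_into_chunks(data, n_nodes)
    # Each worker thread plays the role of one node processing its chunk in parallel.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # The coordinator ("marshalling") combines the partial results into one answer.
    return sum(partial_results)


print(run_job(list(range(1, 101))))  # 338350
```

Because each chunk is independent, adding nodes only changes how many chunks run at once, which is where the scalability claim comes from.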
Weaknesses
Security & Privacy Risks
Expensive; requires skilled staff
Unreliable conclusions:
At these data volumes, statistical anomalies become substantial.
Intuitive results vs. structured, reliable results.
Too much information complicates decision making.
A lack of best practices (new technology).
What is "Big Data"?
The 4 V's:
HIGH Volume
HIGH Velocity
HIGH Variety
Veracity - is the data valid & accurate?
Big Data Implementation
MapReduce Architecture Example
Apache Hadoop
A Big Data implementation.
Originally written for Yahoo.
Used by Yahoo, Facebook, Amazon, IBM, Twitter, America Online, etc.
Allows parallel processing of petabyte-scale data by thousands of processing units (nodes)
spread across computer clusters.

Advantages
Grid computing - massive parallelism across the grid.
Flexibility:
Scalability: in size, geographically, etc.
Elasticity: allocates & releases resources, adapting to the current load (dynamic tree).
Robust: fault tolerance (graceful degradation).
Transparent to the end user.
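The elasticity point can be made concrete with a simple scaling rule, sketched in Python under stated assumptions: load is measured in some unit such as requests per second, each node is assumed to handle a fixed capacity, and the cluster is clamped between a minimum and maximum size. The function `nodes_needed` and its parameters are hypothetical and do not represent Hadoop's scheduler.

```python
import math


def nodes_needed(current_load, capacity_per_node, min_nodes=1, max_nodes=100):
    """Hypothetical scaling rule: how many nodes should run for the current load.

    The cluster grows when load rises and shrinks (releasing nodes) when it
    falls, always staying between min_nodes and max_nodes.
    """
    wanted = math.ceil(current_load / capacity_per_node)
    return max(min_nodes, min(max_nodes, wanted))


# Assumed capacity: each node handles 500 load units (e.g. requests per second).
for load in (200, 4_800, 120_000, 900):
    print(load, "->", nodes_needed(load, capacity_per_node=500))
```

Re-evaluating the rule periodically and starting or stopping nodes to match its result is the "allocate & release" loop; the upper clamp models a finite pool of machines.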

MAP REDUCE - Logical
The methods are processed in parallel,
on distributed nodes of a cluster.

MAP procedure:
Performs filtering & sorting of data into QUEUES, using Key-Value Pairs:
Map (k1,v1) --> list(k2,v2)

REDUCE procedure:
Performs summary & grouping operations
on the processed data / results that are
gathered back:
Reduce (k2, list (v2)) --> list(v3)
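The two signatures above can be traced with the classic word-count example, sketched here in plain Python on a single machine. The names `map_fn`, `shuffle`, and `reduce_fn` are hypothetical and are not Hadoop's Java API; in a real cluster the framework itself performs the shuffle step (grouping and sorting intermediate pairs by key) between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(mapped_pairs):
    """Group intermediate values by key, giving (k2, list(v2)) for each key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())  # sorted by key, like the framework's sort step


def reduce_fn(word, counts):
    """Reduce(k2, list(v2)) -> list(v3): here, the total count for one word."""
    return [sum(counts)]


documents = {1: "big data big clusters", 2: "data nodes"}

intermediate = []
for doc_id, text in documents.items():
    intermediate.extend(map_fn(doc_id, text))

result = {word: reduce_fn(word, counts)[0] for word, counts in shuffle(intermediate)}
print(result)  # {'big': 2, 'clusters': 1, 'data': 2, 'nodes': 1}
```

Each `map_fn` call depends only on its own document and each `reduce_fn` call only on its own key, which is what lets both phases run in parallel on different nodes.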
Handling Big Data / Architecture
HDFS - Hadoop Distributed File-System
Cuts large files into medium-sized chunks (blocks) & distributes
them across the cluster's machines, replicating each block several times.
The file system is mapped by the "NameNode" server, which tracks where every block is stored.
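The splitting and replication scheme can be sketched in Python, assuming a toy 1000-byte "file", a 256-byte block size, four DataNodes, and a replication factor of 3 (HDFS's default replication factor; real block sizes are tens to hundreds of megabytes). The helper names are hypothetical, and the round-robin placement is a simplification: real HDFS placement also considers racks and free space.

```python
def split_into_blocks(file_bytes, block_size):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [file_bytes[i:i + block_size] for i in range(0, len(file_bytes), block_size)]


def place_blocks(n_blocks, datanodes, replication=3):
    """Build a NameNode-style map: block index -> DataNodes holding a copy.

    Hypothetical round-robin placement that puts each replica on a distinct node.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


data = b"x" * 1000  # a toy 1000-byte "file"
blocks = split_into_blocks(data, block_size=256)
namenode_map = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

print(len(blocks))       # 4 blocks: 256 + 256 + 256 + 232 bytes
print(namenode_map[0])   # ['dn1', 'dn2', 'dn3']
print(namenode_map[3])   # ['dn4', 'dn1', 'dn2']
```

If one DataNode fails, every block it held still has two other copies listed in the map, which is how replication provides fault tolerance.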
Background
The amount of stored data grows by 2.5 quintillion bytes (2.5 * 10^18) every day!
By the end of the current decade, the amount of data will be 50 times what it is today.
The rise of the Internet in the '90s
The rise of ubiquitous devices (laptops, smartphones)
Digitization of information: books, music, films, etc.
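Taking these two figures at face value, and assuming the daily figure is in bytes and that "the end of the current decade" means roughly seven years after this 2013 presentation, the arithmetic works out as follows (1 ZB = 10^21 bytes):

```latex
\begin{aligned}
\text{per year:}\quad & 2.5\times10^{18}\ \tfrac{\text{bytes}}{\text{day}} \times 365\ \text{days} \approx 9.1\times10^{20}\ \text{bytes} \approx 0.9\ \text{ZB},\\
\text{50-fold growth in 7 years:}\quad & r = 50^{1/7} \approx 1.75 \;\Rightarrow\; \text{about } 75\%\ \text{growth per year}.
\end{aligned}
```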