BIG DATA
Problems with Traditional DBMSs
Huge quantities of information - Capturing, Managing and Processing DATA
Working with DATA CLUSTERS rather than SINGLE-NODE SERVERS.
Unstructured data, e.g. pictures, audio, 3D models, etc.
Varied types of data sources that need to be synchronized.
A platform designed to handle large quantities of unstructured / multi-structured DATA.
An alternative to the traditional RDBMS.
Mechanism: Highly optimized key–value stores intended for simple retrieval & appending operations
Eventual consistency ==> ACID (Atomicity, Consistency, Isolation, Durability) is NOT guaranteed
Bigtable (developed by Google)
Apache Cassandra, etc.
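A toy sketch of the two ideas above, a key-value store built for simple retrieval and append operations, and eventual consistency, might look like the following. Everything here is hypothetical: the class names, the single global logical clock standing in for real per-write timestamps, and the explicit `sync()` call that plays the role of background replication. A write lands on one replica first and the others only see it after `sync()`, so a read from another replica can briefly return stale data; that is the trade-off eventual consistency accepts in exchange for speed and availability.

```python
class Replica:
    """One copy of the key-value data: key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value, ts):
        # Last-write-wins: keep the update only if it is newer than what we hold
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Hypothetical replicated key-value store with lazy (eventual) propagation."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet delivered: (replica_index, key, value, ts)
        self.clock = 0     # simplification: one global logical clock

    def put(self, key, value, coordinator=0):
        self.clock += 1
        self.replicas[coordinator].apply(key, value, self.clock)
        # The other replicas are updated later, not as part of this write
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.pending.append((i, key, value, self.clock))

    def append(self, key, item, coordinator=0):
        # Read-modify-write on the coordinator's copy; values are lists here
        current = self.replicas[coordinator].get(key) or []
        self.put(key, current + [item], coordinator)

    def get(self, key, replica=0):
        return self.replicas[replica].get(key)

    def sync(self):
        # Stand-in for background replication: deliver every pending update
        for i, key, value, ts in self.pending:
            self.replicas[i].apply(key, value, ts)
        self.pending.clear()
```

For example, after `store.put("user:1", ["login"])`, reading `store.get("user:1", replica=1)` returns `None` until `store.sync()` runs, after which every replica returns `['login']`.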
Big Data - USES
DNA - Cracking the Genome
NASA NCCS - Space Observations & Simulations.
Future Predictions based on information
DNA / Photo / Audio analysis / Facial Recognition (e.g. Facebook)
GPS analysis for traffic reports
Business Uses - BI (e.g. Walmart):
DSS (Decision Support Systems)
Personalized advertising / Marketing
in Twitter, Yahoo!, Facebook, Google, etc.
Public Sector Uses:
Barack Obama's campaign
FBI terror tracking
NoSQL - Not Only SQL
Parallelize processing problems across huge data sets, using a large number of computers
--> MIMD (Multiple Instruction, Multiple Data).
Cluster/Grid purposes --> "Marshalling":
Overall management of the whole process.
Distributing the data to nodes/branches (servers) and replicating it (to achieve fault tolerance).
Providing for redundancy of results
Maps the data received (divides the problem into sub-problems & distributes them further).
Processing various tasks.
Achieving scalability & fault-tolerance
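The marshalling steps above (divide the problem, distribute and replicate the pieces, process them in parallel, and survive a failed node) can be sketched in a few lines. This is a single-machine simulation under stated assumptions: threads stand in for cluster nodes, a "failure" is just a set of node names, the round-robin replica placement is a simplification, and the per-chunk task (a sum of squares) is a hypothetical placeholder for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Divide the problem into roughly equal sub-problems (chunks)."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def place(n_chunks, nodes, replicas=2):
    """Assign each chunk to `replicas` distinct nodes (round-robin)."""
    replicas = min(replicas, len(nodes))
    return {i: [nodes[(i + r) % len(nodes)] for r in range(replicas)]
            for i in range(n_chunks)}


def process_chunk(chunk):
    # Hypothetical per-node task: sum of squares of the chunk
    return sum(x * x for x in chunk)


def run_job(data, nodes, failed=frozenset(), n_chunks=4, replicas=2):
    chunks = split(data, n_chunks)
    placement = place(len(chunks), nodes, replicas)

    def task(i):
        # Try the primary copy first, then fall back to the replicas
        for node in placement[i]:
            if node not in failed:
                return process_chunk(chunks[i])
        raise RuntimeError(f"chunk {i}: every replica is on a failed node")

    # Threads simulate the nodes working on their chunks in parallel
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partial_results = list(pool.map(task, range(len(chunks))))
    return sum(partial_results)  # combine the sub-results
```

With three nodes and two copies of every chunk, `run_job(list(range(10)), ["n1", "n2", "n3"], failed={"n1"})` still returns 285, because each chunk that lived on `n1` has a replica elsewhere; losing two of the three nodes raises an error instead.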
Security & Privacy Risks
Expensive; requires specialized skills
At these data volumes, statistical anomalies become substantial.
Intuitive results vs. structured, reliable results.
Too much information complicates decision making.
A lack of best practices (new technology)
What is "Big Data"?
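The 4 V's:
Volume - how much data is there?
Velocity - how fast is it generated & processed?
Variety - how many different types & formats?
Veracity - is the data valid & accurate?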
MapReduce Architecture Example
Originally written for Yahoo.
Used by Yahoo, Facebook, Amazon, IBM, Twitter, America Online, etc.
Allows parallel processing of petabyte-scale data by thousands of processing units (nodes) spread across computer clusters.
Grid Computing - massive parallelism across a GRID.
Scalability: in size and geographically.
Elasticity: allocate & release resources adaptively, according to the current load (dynamic tree).
Robustness: fault tolerance (graceful degradation).
Transparency to the end user.
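Elasticity can be illustrated with a simple scaling rule: size the cluster so that each node runs at about a target utilization, within fixed bounds. This is a hypothetical sketch; the function name, the 70% target and the node limits are illustrative assumptions, not taken from any particular grid or cloud scheduler.

```python
import math


def desired_node_count(current_load, capacity_per_node,
                       target_utilization=0.7, min_nodes=1, max_nodes=100):
    """Hypothetical elasticity rule: enough nodes to keep each near the target utilization."""
    if capacity_per_node <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(current_load / (capacity_per_node * target_utilization))
    # Clamp: never release the last node, never allocate beyond the pool
    return max(min_nodes, min(max_nodes, needed))
```

With a capacity of 100 requests/s per node, a load of 1,000 requests/s gives 15 nodes (1000 / 70 is about 14.3, rounded up); when the load drops to zero the cluster shrinks back to the single minimum node, and a huge spike is capped at the 100-node pool.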
MAP REDUCE - Logical
The Map and Reduce methods are processed in parallel on distributed nodes of a cluster.
Map: performs filtering & sorting of the data into QUEUES, using key-value pairs:
Map (k1,v1) --> list(k2,v2)
Reduce: performs summary & grouping operations on the processed data (the Map results), grouped by key:
Reduce (k2, list (v2)) --> list(v3)
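The Map and Reduce signatures above can be made concrete with word count, the classic MapReduce illustration. In this hypothetical, single-process sketch `map_fn` turns one (document id, text) pair into (word, 1) pairs, a shuffle step groups the pairs by word, and `reduce_fn` sums each word's list of counts. On a real cluster the map calls and the per-key reduce calls would run on different nodes; here they simply run one after another.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id, text):
    # Map (k1, v1) --> list(k2, v2): emit (word, 1) for every word; doc_id is unused
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    # Group intermediate values by key: list(k2, v2) --> {k2: list(v2)}
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word, counts):
    # Reduce (k2, list(v2)) --> list(v3): summarise all counts for one word
    return [sum(counts)]


def map_reduce(documents):
    mapped = chain.from_iterable(map_fn(k, v) for k, v in documents.items())
    grouped = shuffle(mapped)
    # Each key's reduce call is independent, so it could run on any node
    return {key: reduce_fn(key, values)[0] for key, values in sorted(grouped.items())}
```

For example, `map_reduce({"d1": "Big data big clusters", "d2": "data nodes"})` returns `{'big': 2, 'clusters': 1, 'data': 2, 'nodes': 1}`.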
Handling Big Data / Architecture
HDFS - Hadoop Distributed File-System
Cuts large files into medium-size chunks and distributes them across the cluster's nodes (replicating each piece a few times).
The file-system (which chunks are stored on which nodes) is mapped by the "NameNode" server.
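A minimal sketch of the HDFS idea: a file is cut into fixed-size blocks, each block is copied to several DataNodes, and a NameNode-like table records where every block lives, so a read can skip dead nodes. The function names, the tiny 4-byte block size and the round-robin placement are illustrative assumptions; real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def store_file(data, datanodes, block_size=4, replication=3):
    """Split `data` into blocks and place each block on `replication` DataNodes.

    Returns (namenode, storage): the NameNode's block map and each DataNode's local blocks.
    """
    replication = min(replication, len(datanodes))
    namenode = []                           # [(block_index, [datanode, ...]), ...]
    storage = {dn: {} for dn in datanodes}  # datanode -> {block_index: bytes}
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        # Round-robin placement: a simplification, not HDFS's rack-aware policy
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        for dn in targets:
            storage[dn][index] = block
        namenode.append((index, targets))
    return namenode, storage


def read_file(namenode, storage, dead=frozenset()):
    """Reassemble the file, reading each block from the first live replica."""
    out = bytearray()
    for index, targets in namenode:
        live = next((dn for dn in targets if dn not in dead), None)
        if live is None:
            raise IOError(f"block {index} lost: all replicas are on dead nodes")
        out += storage[live][index]
    return bytes(out)
```

With four DataNodes and three copies of each block, a 20-byte file split into five blocks can still be read back intact after any two nodes fail.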
The amount of stored data grows by 2.5 quintillion bytes (2.5 * 10^18) every day!
By the end of the current decade, the amount of data will be 50 times what it is today.
The rise of the Internet in the 90s
The rise of ubiquitous devices (laptops, smartphones)
Digitization of information - books, music, films, etc.
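To put the daily growth figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the 2.5 quintillion figure is in bytes (as it is commonly quoted), that the rate stays constant over a 365-day year, and that 1 ZB = 10^21 bytes:

```latex
2.5 \times 10^{18}\ \tfrac{\text{bytes}}{\text{day}} \times 365\ \text{days}
  = 9.125 \times 10^{20}\ \text{bytes}
  \approx 0.91\ \text{ZB per year}
```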