BIG DATA

by BONDAZ Rafaèle, on 24 June 2014


Transcript of BIG DATA

Application
BIG DATA
Presentation plan
Thanks for your attention
Definition
Technology
Issues
BIG DATA
History
SWOT ANALYSIS
A large data set, mostly generated on the internet.
It also refers to a method of data analysis.
Big data (mass data) is difficult to manage with conventional storage and processing solutions.
DATABASE DESIGN AND MODELLING
DATABASE MANAGEMENT SYSTEM
DATA STORAGE
SPECIFICATIONS

Massively distributed
Store unstructured data
Document-oriented or key-value storage
Increase reliability and availability
Improve performance


Examples: MongoDB, Cassandra (Facebook), Couchbase, HBase
NoSQL (Not only SQL)
Non-relational DBMS
The basic logical unit is no longer the table
Offers alternative solutions to traditional databases and analysis
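To make "the table is no longer the logical unit" concrete, here is a minimal sketch of a key-value store whose values are schema-free documents. The `DocumentStore` class and its `put`/`get`/`find` methods are hypothetical and only illustrate the idea; they are not the API of MongoDB, Couchbase or any other product named above.

```python
from typing import Any, Dict, List, Optional

class DocumentStore:
    """Hypothetical in-memory key-value store holding schema-free documents."""

    def __init__(self) -> None:
        # Each key maps to a "document" (a dict); documents need not share fields.
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        # Store (or overwrite) a copy of the whole document under its key.
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Direct lookup by key: the basic access path of a key-value store.
        return self._docs.get(key)

    def find(self, field: str, value: Any) -> List[str]:
        # Naive full scan; production stores typically add secondary indexes.
        return [k for k, d in self._docs.items() if d.get(field) == value]

store = DocumentStore()
store.put("user:1", {"name": "Alice", "city": "Paris"})
store.put("user:2", {"name": "Bob", "followers": 120, "tags": ["music", "sport"]})
```

The two users have different fields, which a fixed relational table would not allow without schema changes or empty columns.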
Three challenges (the 3V rule):
VOLUME: a large volume of data to process
VARIETY: various types of information and data sources
VELOCITY: a high processing speed to achieve
Hadoop
Project of the Apache Software Foundation
Open-source framework

Composed of several modules, including:
HDFS (Hadoop Distributed File System)
MapReduce

MapReduce works in two steps:
The MAP step
The REDUCE step
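The two MapReduce steps can be illustrated with the classic word-count example. This is a minimal single-machine sketch in plain Python, assuming nothing about Hadoop's actual Java API: the function names are hypothetical, and the `shuffle` step (grouping values by key) is what Hadoop performs between MAP and REDUCE while distributing the work across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_step(line: str) -> List[Tuple[str, int]]:
    # MAP: turn one input record into intermediate (key, value) pairs.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all intermediate values by key (handled by the framework in Hadoop).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    # REDUCE: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = [pair for line in lines for pair in map_step(line)]
    return dict(reduce_step(k, v) for k, v in shuffle(intermediate).items())

counts = word_count(["big data is big", "data is everywhere"])
# counts == {"big": 2, "data": 2, "is": 2, "everywhere": 1}
```

Because each `map_step` call only sees one line and each `reduce_step` call only sees one key, both steps can run in parallel on different machines.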
VOLUME
VARIETY
VELOCITY
Increase in the amount of data produced
New computing tools and technologies: connectivity and portability
Lower storage costs with cloud computing
Storage architecture must be redesigned to handle large volumes of data.

Cloud computing: distributed computing over a network
Parallelized database systems: the database load is balanced among servers
Improved processing speeds
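One common way to balance database load among servers is hash partitioning (sharding): each record key is hashed and the hash decides which server stores it. The sketch below assumes a fixed number of servers and uses hypothetical names (`server_for`, `partition`); it is an illustration of the technique, not the scheme used by any particular database.

```python
import hashlib
from typing import Dict, List

def server_for(key: str, num_servers: int) -> int:
    # Stable hash (unlike the built-in hash(), which is salted per process).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def partition(keys: List[str], num_servers: int) -> Dict[int, List[str]]:
    # Assign every key to exactly one server according to its hash.
    shards: Dict[int, List[str]] = {i: [] for i in range(num_servers)}
    for key in keys:
        shards[server_for(key, num_servers)].append(key)
    return shards

shards = partition([f"customer:{i}" for i in range(10_000)], num_servers=4)
sizes = {server: len(keys) for server, keys in shards.items()}
# Each of the 4 servers receives roughly a quarter of the 10,000 keys.
```

A drawback of plain modulo hashing is that changing the number of servers moves most keys; consistent hashing is the usual alternative when servers are added or removed often.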

Many data sources:
Enterprise Resource Planning (ERP), social networks, web pages...
Structured and unstructured data:
80% of information is unstructured

90% of information is not exploited...

Real-time analysis: data stream mining
The frequency at which data are generated, captured and shared keeps increasing.
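The slides do not name a specific stream-mining method; as one illustrative example, the Misra-Gries algorithm finds frequent items in a single pass with bounded memory, which is the kind of constraint real-time analysis imposes. With `k` as a parameter, it keeps at most `k - 1` counters, and any item occurring more than n/k times in a stream of n items is guaranteed to be among the results (some results may be false positives, and counts are underestimates). The hashtag stream is made-up sample input.

```python
from typing import Dict, Hashable, Iterable

def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass, at most k - 1 counters: items seen more than n/k times survive."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# The stream is consumed item by item (here an iterator), never stored in full.
hashtags = iter(["#bigdata", "#nosql", "#bigdata", "#hadoop",
                 "#bigdata", "#bigdata", "#cloud"])
candidates = misra_gries(hashtags, k=3)
# candidates == {"#bigdata": 3, "#cloud": 1}
```

Memory stays proportional to `k` no matter how long the stream runs, at the cost of approximate counts.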
STRENGTHS
BETTER costs
BETTER processing
A SUPPORT for industry and scientific research

Provides an unprecedented degree of accuracy and flexibility

WEAKNESSES

Big data requires human interpretation to process information


Beware of pollution effects! More data does not necessarily mean better data
OPPORTUNITIES
Changes the information landscape

Boosts traditional business intelligence (BI)

Huge opportunity for processing new data formats (audio, video and pictures)

Gathers more specific information on customer consumption patterns

Spreads to all kinds of application areas and companies



THREATS
Rafaèle BONDAZ / Séverine BOUCHET / Inès SENHADJI
Definition
History
Issues
Applications
Technologies
SWOT
90% of the data in the world were created in the past two years
Timeline: 1940-1960, 1960-1990, 1997, 2001, 2008
Since 2008
Big-data computing is considered the biggest innovation in computing of the last decade.
1940-1960
University teachers noticed that the number of books was growing exponentially.
1960-1990
“We call this the problem of big data”
2001
Big data can be found everywhere
1998
Computer scientists thought about a way to gather and group all the important information into data storage.
Sciences
Politics
Private Sector
Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.

The United States Federal Government owns six of the ten most powerful supercomputers in the world.
Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science.
Analysts estimate that enterprises will spend $34 billion on big data investments in 2013.
Big Brother is watching you...
Any questions?