hadoopDemo

by Sam zhang on 8 February 2013


Transcript of hadoopDemo

Big Data / Hadoop
Proof of Concept

Technical challenge:
A typical drive, 1990 versus 2010:
Size: 1,370 MB to 1,000,000 MB
Transfer speed: 4.4 MB/s to 100 MB/s
Time to read the whole drive: about 5 minutes to more than 2 hours 30 minutes
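
The read times follow from dividing drive size by transfer speed. A minimal sketch of that arithmetic is below; the function names and the 100-drive parallel case are illustrative assumptions, not figures from the presentation.

```python
def read_time_seconds(size_mb, speed_mb_per_s, drives=1):
    """Seconds to read size_mb when the data is split evenly across
    `drives` disks that are all read in parallel."""
    return size_mb / (speed_mb_per_s * drives)


def fmt(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Figures quoted on the slide
t_1990 = read_time_seconds(1_370, 4.4)
t_2010 = read_time_seconds(1_000_000, 100)

# Hypothetical: the same 2010 data spread over 100 drives read at once
t_parallel = read_time_seconds(1_000_000, 100, drives=100)

print("1990 drive:", fmt(t_1990))
print("2010 drive:", fmt(t_2010))
print("2010 data on 100 drives:", fmt(t_parallel))
```

With these inputs the single 2010 drive takes a little under 2 hours 47 minutes, somewhat above the rounded slide figure, while the hypothetical 100-drive split takes under 2 minutes.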
Hadoop solution:
Hadoop Distributed File System (HDFS), block size = 64 MB
Use many cheap computers working together.

Opportunities and Threats for FBL?

Data-driven decision-making

Telematics - make usage-based insurance (UBI) possible
"One of the largest challenges insurers will face in 2012 and beyond is capturing and interpreting data from a growing number of structured and unstructured sources, including but not limited to social media, policyholder behavior and telematics," said Jamie Yoder, insurance advisory practice co-leader at PwC. "Insurers that apply advanced analytical techniques to harness the power of big data will be better able to understand their customers, tailor products to meet their needs, and enhance the overall customer experience."

Insurance fraud prevention/detection

IT logs analysis
Support IT as a business
Measure and visualize IT activities
Data-driven decision-making

Agenda
What is Big Data
What is Hadoop
Why Hadoop
Opportunities and threats
Demo
How
Recommendation

Sentiment analysis of FBL and our competitors

Variety
Volume
Velocity

Variety:
80% of all data is unstructured data

Volume:
IDC estimate of the "digital universe":
2006: 0.18 zettabytes
2011: 1.8 zettabytes
1 zettabyte = 1,000 exabytes
= 1,000,000 petabytes
= 1,000,000,000 terabytes
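
The unit ladder and the two IDC figures can be checked with a few lines of arithmetic. This is a sketch only: the constant and variable names are illustrative, and the compound annual growth rate is derived from the two quoted data points rather than stated in the presentation.

```python
# Decimal unit ladder from the slide: 1 ZB expressed in smaller units
ZB_IN_EB = 1_000
ZB_IN_PB = 1_000_000
ZB_IN_TB = 1_000_000_000

# IDC "digital universe" estimates quoted on the slide
size_2006_zb = 0.18
size_2011_zb = 1.8

# Overall growth between the two estimates
growth_factor = size_2011_zb / size_2006_zb
years = 2011 - 2006

# Implied compound annual growth rate (derived, not from the slide)
annual_growth = growth_factor ** (1 / years) - 1

print(f"2011 estimate: {size_2011_zb * ZB_IN_TB:,.0f} TB")
print(f"Growth 2006-2011: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_growth:.1%}")
```

A tenfold increase over five years corresponds to roughly 58% growth per year.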

J. B.: FBL
1.0 petabyte of storage
4.0 petabytes in archive
Perspectives
Hadoop was created by Doug Cutting to support Nutch, a web-search software project
Hadoop is a Big Data OS
Hadoop is a disruptive, innovative tool for BI
Hadoop is just a Java application
Hadoop is like SQL Server, but it processes unstructured data
Hadoop is like excellent teamwork, but the team members are cheap computers

Questions:
How much does it cost to move to AWS.amazon.com?
What is the highest request rate to FBLFinancial.com?

Hadoop will not replace the database

Hadoop is very scalable, but NOT expensive:
Hadoop nodes are built on cheap x86 servers. Each node costs about $4,000.
Most relational database deployments run at around $10,000 to $12,000 per terabyte.

Hadoop solution providers (distributors):
IBM, EMC, AWS, etc.
Microsoft (with Hortonworks)
Many new vendors

Big vision, start simple, fast

Sentiment Analysis
Usability: foodmood.in
What are people saying about telematics/UBI on Twitter, Facebook, and/or LinkedIn in the last year, this year, and this month in MN, IA, and MO?

IT Logs Analysis
Who accesses what, and at what time?
Estimate data transfer cost in the cloud - AWS pricing is based on data transfer in and out
Request rate for all the services
How critical is it if a specific service is down at 2:00 PM, 5:00 PM, 10:00 PM, or 11:00 PM?

UBI (Usage-Based Insurance)
Required infrastructure
New pricing model
Legal issues?

Fraud detection:
Need complex machine learning?
Legal issues?

Hadoop supports our vision:
Simple,
Modern,
Fast

ROI
Normal BI: 89%
BI + Predictive modeling: 145%
Big Data: 1066%
- Source: EMC presentation

In production for some organizations:
A health insurance company recovered $35 million yearly by using Greenplum
- Source: EMC presentation

The Coalition Against Insurance Fraud estimates that in 2006 a total of about $80 billion was lost in the United States due to insurance fraud. That amount could buy FBL Financial about 90 times over (market capitalization: $900 million on Feb. 6, 2013).

[Architecture diagram: x86 infrastructure; Hadoop 1.0.4; Cascading library (Java application); Pig (Pig script); web servers (HTML5 with SVG, D3); log folders 1 to n; working folders; map tasks producing structured data; reduce tasks; analytics and machine learning]

Velocity:
90% of the data was generated in the last two years (?)
- Sources: a few presentations

J. B.: FBL data growth rate is 30% to 60%

Extract linguistic and subjective information: opinions, attitudes, emotions, and perspectives
Understand our customers and marketplace
Data-driven decision-making

An insurer instituted a pilot program that offered lower rates to policyholders in exchange for the ability to put on-board sensors on motor vehicles. These sensors gathered telematics data to monitor the driving behavior of policyholders.

- Source: Oracle white paper

Sentiment analysis in social media:
True sentiment analysis is hard, but we can start with a simple model and implementation:
Some errors are OK
A simple learning model:
Emoticons: :-) :( :/
Lengthening: greeeeeeeat
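
One way to read the simple model above is as two cheap signals: emoticons for polarity and lengthened words for emphasis. The sketch below is an illustration under assumptions, not the implementation used in the demo: the emoticon sets, the choice to treat ":/" as negative, and the function name `simple_sentiment` are all hypothetical.

```python
import re

# Assumed polarity of emoticons (the slide only lists :-) :( :/)
POSITIVE_EMOTICONS = {":-)", ":)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":/"}

# A letter repeated three or more times in a row, e.g. "greeeeeeeat"
LENGTHENED = re.compile(r"([A-Za-z])\1{2,}")


def simple_sentiment(text):
    """Return (score, emphasized) for a short message.

    score: +1 per positive emoticon, -1 per negative emoticon.
    emphasized: True if any word contains a lengthened letter run.
    Only whitespace-separated emoticons are counted, so some are missed;
    the model accepts such errors for the sake of simplicity.
    """
    score = 0
    for token in text.split():
        if token in POSITIVE_EMOTICONS:
            score += 1
        elif token in NEGATIVE_EMOTICONS:
            score -= 1
    emphasized = bool(LENGTHENED.search(text))
    return score, emphasized


print(simple_sentiment("UBI sounds greeeeeeeat :-)"))
print(simple_sentiment("my claim was denied :("))
print(simple_sentiment("telematics pricing announced today"))
```

Because each message is scored independently, the same function could serve as the per-record step of a map task, with a reduce step summing scores per topic or region.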
Nutch has been installed.