What's the deal with big data?

by Paul Boal on 23 April 2013

What's so "big" about "big data"?

Concept
History
Hype
Technology
Skills
Promise
Culture

"Data, data everywhere..."

What to do...

The term "big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
--McKinsey & Company, 2011

Characterized by:
Volume
Velocity
Variety
... and Value

...the old levers for capturing value... do not take full advantage of the insights that big data provides.
--McKinsey & Company, 2013

2008 - Yahoo claims largest Hadoop cluster
10,000 CPU cores

2010 - Facebook claims largest Hadoop cluster
21 PB storage

2011 - IBM Watson wins Jeopardy!
(Used Hadoop to build its knowledge base)

2012 - Facebook
100 PB storage

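2013 - Facebook, Yahoo, LinkedIn, Salesforce, Data Warehouse Vendors

Hadoop

# Load plotting and split-apply-combine libraries
library(ggplot2)
library(plyr)
setwd("C:\\Workspace3\\nurse_registration")
options(stringsAsFactors=FALSE)

# Read the extract; column types are given explicitly
d<-read.csv("DataOut.csv", header=TRUE,
  colClasses=c("factor","numeric","numeric","numeric","numeric","numeric","factor","factor","factor","factor"))

# Empirical CDF of PCT, computed within each CO_DESCR group
e<-ddply(d, .(CO_DESCR), transform, ecdf=ecdf(PCT)(PCT))

# Step plot of the ECDF, colored by Is.Nurse
p<-ggplot(e, aes(PCT, ecdf, color=Is.Nurse))
p + geom_step()

# Keep nurse rows only, exclude two CO_DESCR values, and filter on minutes
d1<-subset(d, Is.Nurse=='TRUE' &
  CO_DESCR!='MERCY HEALTH' &
  CO_DESCR!='MERCY HTH NON-CONSOLIDATED' &
  TOTAL_MINUTES > 120 &
  ADT_MINUTES < 12)

# Recompute the per-group ECDF on the filtered data
e<-ddply(d1, .(CO_DESCR), transform, ecdf=ecdf(PCT)(PCT))

# ECDF per CO_DESCR, legend anchored in the bottom-right corner
ggplot(e, aes(PCT, ecdf, color=CO_DESCR)) +
  geom_step() +
  coord_cartesian(ylim = c(0,1), xlim=c(0,100)) +
  theme(legend.justification=c(1,0), legend.position=c(1,0))

# Density of PCT per CO_DESCR
ggplot(d1, aes(x=PCT, color=CO_DESCR)) + geom_density()

# Summary statistics of PCT per CO_DESCR
ddply(d1, ~CO_DESCR, summarise, mean=mean(PCT), median=median(PCT), sd=sd(PCT))

# Box plots per CO_DESCR with the mean overlaid as a diamond
# (fun= replaces fun.y=, which is deprecated since ggplot2 3.3.0)
ggplot(d1, aes(x=CO_DESCR, y=PCT)) +
  geom_boxplot() + stat_summary(fun=mean, geom="point", shape=5, size=4) +
  coord_cartesian(ylim = c(0,.10)) +
  theme(axis.text.x=element_text(angle=90))

# Histogram of ADT_MINUTES for the filtered rows
hist(d1$ADT_MINUTES, breaks=12)

"Getting it right"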
Right customer
Right product
Right price
Right time

Personalization

Healthcare:
Right living
Right care
Right provider
Right value
Right innovation

Connecting the dots

Do we trust our data?
Do we trust our analysts?
Do we trust statistics?
Do we NOT trust decision makers?

1. Learn the tools and skills
Data analysis (R, Python)
Statistics (R)
SQL
Hadoop

2. Don't get caught up in the hype
Practice "connecting the dots"
Stay engaged through user groups
Listen and continue learning
Don't forget to ask "and... so what?"

3. Be creative
This is a space where innovation is key.
Have to take chances with ideas.
Explore the data.
Document what you've tried and why.

4. Be patient
There's a long way to go before "big data" is mainstream: 5-10 years.
Things are going to change a lot in the next several years, so you have to be flexible.
Commit yourself to learning. Your boss is unlikely to tell you to go learn "big data" -- unless you're really lucky.

Paul Boal
Dir Data Management, Mercy
April 2013