What's the deal with big data?
Paul Boal on 23 April 2013
Transcript of What's the deal with big data?
What to do...
The term "big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
--McKinsey & Company, 2011
Characterized by:
... and Value
...the old levers for capturing value... do not take full advantage of the insights that big data provides.
--McKinsey & Company, 2013
2008 - Yahoo claims largest Hadoop cluster
10,000 CPU cores
2010 - Facebook claims largest Hadoop cluster
21 PB storage
2011 - IBM Watson wins Jeopardy!
(Used Hadoop to build its knowledge base)
2012 - Facebook
100 PB storage
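2013 - Facebook, Yahoo, LinkedIn, Salesforce, Data Warehouse Vendors
Hadoop
# d: data frame with columns CO_DESCR, PCT, Is.Nurse, TOTAL_MINUTES, ADT_MINUTES
# (not defined anywhere in the transcript)
library(plyr)     # ddply(), summarise()
library(ggplot2)
# Empirical CDF of PCT within each company, colored by nurse flag
e <- ddply(d, .(CO_DESCR), transform, ecdf=ecdf(PCT)(PCT))
p <- ggplot(e, aes(PCT, ecdf, color=Is.Nurse))
p + geom_step()
# Nurses only; drop two CO_DESCR values and filter on minutes
d1 <- subset(d, Is.Nurse=='TRUE' &
  CO_DESCR!='MERCY HEALTH' &
  CO_DESCR!='MERCY HTH NON-CONSOLIDATED' &
  TOTAL_MINUTES > 120 &
  ADT_MINUTES < 12)
e <- ddply(d1, .(CO_DESCR), transform, ecdf=ecdf(PCT)(PCT))
# The layer after the trailing "+" is missing from the transcript;
# geom_step() is assumed here because the first ECDF plot uses it
ggplot(e, aes(PCT, ecdf, color=CO_DESCR)) +
  coord_cartesian(ylim = c(0,1), xlim=c(0,100)) +
  geom_step()
ggplot(d1, aes(x=PCT, color=CO_DESCR)) + geom_density()
# Per-company summary statistics
ddply(d1, ~CO_DESCR, summarise, mean=mean(PCT), median=median(PCT), sd=sd(PCT))
ggplot(d1, aes(x=CO_DESCR, y=PCT)) +
  geom_boxplot() +
  stat_summary(fun=mean, geom="point", shape=5, size=4) +  # fun.y in ggplot2 < 3.3.0
  coord_cartesian(ylim = c(0,.10))
hist(d1$ADT_MINUTES, breaks=12)
"Getting it right"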
Right time
Personalization
Healthcare:
Right innovation
Connecting the dots
Do we trust our data?
Do we trust our analysts?
Do we trust statistics?
Do we NOT trust decision makers?
1. Learn the tools and skills
Data analysis (R, Python)
Hadoop
2. Don't get caught up in the hype
Practice "connecting the dots"
Stay engaged through user groups
Listen and continue learning
Don't forget to ask "and... so what?"
3. Be creative
This is a space where innovation is key.
Have to take chances with ideas.
Explore the data.
Document what you've tried and why.
4. Be patient
There's a long way to go before "big data" is mainstream: 5-10 years.
Things are going to change a lot in the next several years, so you have to be flexible.
Commit yourself to learning. Your boss is unlikely to tell you to go learn "big data" -- unless you're really lucky.
Paul Boal
Dir Data Management, Mercy