Strata London: Hadoop and Beyond
Director, Data Science

[Figure: Data Warehouse and Mainframe. Labels: Enterprise fit, Exploratory fit, Analytical orientation, Integration, Diversification. Source: McKinsey Global Institute]

Thank you @duncan3ross

Real-world architectures
Edouard Servan-Schreiber, Director of Solution Architecture
@duncan3ross @edouardss

What analysts need
- All data, all the time
- Timely

- Logical modeling is not evil
- ODBC tells you a database is alive; it's not a data transport tool
- Time is critical for deploying analysis
- Taking analytical data from operational systems creates load on those systems
- Consistency can be more important than accuracy
- Think of deployment from the start
- Data security is critical
- Try to minimise data movement

The journey to big data
How do we get out of this mess?
We can't rely on standards (Source: xkcd.com)
Overthrowing the high priests of IT @edouardss
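One way to read "try to minimise data movement" is to push an aggregation to where the data lives instead of pulling raw rows across a connection and aggregating on the client. The sketch below is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for any database connection (ODBC or otherwise), and the table `txn`, its columns and the generated data are all hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical "operational" table: one row per transaction.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (customer_id INTEGER, amount REAL)")
    rows = [(i % 100, float(i % 7)) for i in range(10_000)]
    conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
    return conn


def totals_by_pulling_rows(conn: sqlite3.Connection):
    # Moves every row out of the database, then aggregates client-side.
    totals: dict[int, float] = {}
    moved = 0
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM txn"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
        moved += 1
    return totals, moved


def totals_in_database(conn: sqlite3.Connection):
    # Pushes the aggregation to the data; only one row per customer moves.
    result = conn.execute(
        "SELECT customer_id, SUM(amount) FROM txn GROUP BY customer_id"
    ).fetchall()
    return dict(result), len(result)


if __name__ == "__main__":
    conn = make_demo_db()
    pulled, moved_pull = totals_by_pulling_rows(conn)
    pushed, moved_push = totals_in_database(conn)
    assert pulled == pushed  # same answer either way
    print(f"rows moved: {moved_pull} vs {moved_push}")  # rows moved: 10000 vs 100
```

Both functions return the same totals, but the first moves 10,000 rows out of the database while the second moves 100. The gap grows with the volume of detailed data, which is exactly the data analysts say they want more of.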
QFS

We are data miners: we have the most fun when we're finding out interesting things from data. We want more data, more detailed data, all the time, instantly accessible and available for analysis.
Unfortunately the world doesn't always cooperate.
This isn't primarily a technical presentation, since it comes from an analytical perspective, but it does address the current technological landscape and sketch some solutions.
We don't have a magic bullet, and we would suggest that you be suspicious of anyone who says they do: the world is too complex (and too much fun) for that.

- Tool independence
- Deployment to real use: GB to PB
- The ability to add or create new data
- Independence from the high priests of IT
- Closeness to business users
- Must not have an unexpected impact on the operational world

Technology, Social, Data, Magic
- Faster/cheaper hardware will make problems go away
- We can put it in the cloud

Some concrete ways to learn from (and interact with) dead tech

Typical analysis: interpretation of images, networks, collaborative filtering...
Typical analysis: real-time lookups, usage spikes, data collection and simple aggregation
Typical analysis: classical data mining, segmentation, predictive modelling
Typical analysis: don't know, but it's expensive

Programmers vs data analysts
Let's get Hadoop

The way out is to make sure that data is useful:
- There must be a business problem; don't just show off your skills
- It's not about the algorithm or the technology
New technologies all have a benefit, ...

What we're learning, and where it hurts
Today's kludge is tomorrow's code base (Source: xkcd.com)

[Figure: Tensions between Analytical RDBMS, M/R, Analytical DB and NoSQL. Labels: Real time, Traditional DM, Math-intensive DM, Processed data, Less processed data, Iterative analysis, All the data, Reporting, "?"]
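The tensions diagram lists M/R (MapReduce), the programming model behind Hadoop. As a minimal sketch of that model, and not Hadoop's own API, the single-process Python below walks through its three stages: a map function emits key/value pairs, a shuffle groups values by key, and a reduce function combines each group. All function names are hypothetical. In Hadoop the map and reduce tasks run in parallel across a cluster and the shuffle moves data over the network, which is one commonly cited reason that iterative analysis, needing a full pass per iteration, fits M/R less comfortably.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for each word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all values for one key into a single total.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    # Run map over every record, shuffle, then reduce each key group.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `map_reduce(["big data", "Big analysis"])` returns `{"big": 2, "data": 1, "analysis": 1}`. Because the mapper sees one record at a time and the reducer sees one key at a time, each stage can be split across machines; that independence is what makes the model scale, and also what makes multi-pass algorithms costly.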