Strata London: Hadoop and Beyond

Real world architectures

Duncan Ross

on 2 October 2012

Transcript of Strata London: Hadoop and Beyond

Hadoop and Beyond
Duncan Ross, Director Data Science

[Diagram labels: Data Warehouse, Mainframe, Enterprise fit, Exploratory fit, Analytical orientation, Process orientation, Integration, Diversification. Source: McKinsey Global Institute]

Thank you
@duncan3ross
duncan.ross@teradata.com

Real world architectures
Edouard Servan-Schreiber, Director of Solution Architecture
@edouardss
edouard@10gen.com

What analysts need
All data, all the time
Timely

Logical modeling is not evil
ODBC tells you a db is alive, it's not a data transport tool
Time is critical for deploying analysis
Taking analytical data from operational systems creates loading
Consistency can be more important than accuracy
Think of deployment from the start
Data security is critical
Try to minimise data movement

The journey to big data
How do we get out of this mess?
We can't rely on standards
Source: xkcd.com
Overthrowing the high-priests of IT
QFS

We are data miners - we have most fun when we're finding out interesting things from data. We want more data, more detailed data, all the time - instantly accessible and available for analysis.

Unfortunately the world doesn't always cooperate.

This isn't primarily a technical presentation, in that it comes from an analytical perspective, but it addresses the current technological landscape, and sketches some solutions.

We don't have a magic bullet, and would suggest that you be suspicious of anyone who says that they do: the world is too complex (and fun) for that.

Tool independence
Transparent access
Deployment to real use - GB to PB
The ability to add or create new data
Separable workload
Independent control
Processing power
Independence from the high priests of IT
Closeness to business users
Must not create unexpected impact on operational world

Technology
Social
Data
Magic

Faster/cheaper hardware will make problems go away
We can put it in the cloud
Some concrete ways to learn from (and interact with) dead tech

Typical Analysis: interpretation of images, networks, collaborative filtering...
Typical Analysis: real time lookups, usage spikes, data collection and simple aggregation
Typical Analysis: classical data mining, segmentation, predictive modelling
Typical Analysis: don't know, but it's expensive

Programmers vs data analysts
Let's get Hadoop

The way out is to make sure that data is useful
There must be a business problem, don't just show off your skill
It's not about the algorithm/technology
New technologies all have a benefit,

What we're learning, and where it hurts
Today's kludge is tomorrow's code base
Source: xkcd.com

[Diagram labels: Analytical, Hadoop, Exploratory, RDBMS, Tensions, M/R, Analytical DB, NoSQL, Real time, Traditional DM, Math intensive DM, Processed data, Less processed data, Iterative analysis, All the data, ? ? ?, Reporting]