Intro to Data Science & Predictive Analysis

Demo to BI Engineers on analytics

Prezi Hill

on 10 April 2015

Transcript of Intro to Data Science & Predictive Analysis

and show me cool stuff
decision tree (C4.5)
regression and neural networks
Like so many buzzwords,
I thirst for explanations
data science, data mining, machine learning, statistical inference, supervised learning, unsupervised learning, big data, clustering, predictive analysis, big science, business intelligence, analytics, prescriptive analysis, text mining, text analysis, unstructured analysis, pattern recognition

Data Analytics,
and so should you!

easy to get started,
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets
-Robert C. Holte, 1993, Computer Science Department, University of Ottawa
not too much math
Data Scientist: The Sexiest Job of the 21st Century, Davenport, Patil, HBR, 2012
Data scientists’ most basic, universal skill is the ability to write code.
A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
K-Fold Cross Validation
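A minimal sketch of the idea behind k-fold cross validation, assuming the usual definition: shuffle the rows, cut them into k folds, and let each fold serve once as the test set while the remaining folds train the model. The function name `k_fold_indices` and its parameters are made up for illustration; it is not the routine used by RapidMiner, R, or SAS.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every row lands in exactly one test fold."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)          # shuffle so folds are not ordered by row
    folds = [idx[i::k] for i in range(k)]     # k roughly equal slices
    for i, test in enumerate(folds):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, test


# Hypothetical usage: 10 rows, 5 folds. Fit on `train`, score on `test`,
# then average the 5 scores to estimate out-of-sample performance.
for train, test in k_fold_indices(10, 5):
    print(sorted(test), "held out;", len(train), "rows used for training")
```

Averaging the score over all folds gives a steadier estimate than a single train/test split, at the cost of fitting the model k times.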
Social Media Tips
- Start with the most successful sites
- Place a key focus on quality content
- Always respond to comments
- Join conversations and share your thoughts
- Use promotions and giveaways
- Don't make selling your main focus
- Be consistent with posting times, schedule updates and posts on evenings and weekends

Explore Data
Select Features
Prepare Data
Model, Validate
Data Mining Funnel
Data Warehousing
Data Science
by Billy Hill
(SAS via Enterprise Miner)
Sample - enough rows to discover patterns without overwhelming
Explore - sort, max, min, describe, plot

(Cross Industry Standard Process for Data Mining)
Business Understanding
understanding objectives and requirements, converting that knowledge into a data mining problem
Data Understanding
data collection, get familiar with data, identify data quality problems, discover insights into data, detect interesting subsets to form hypotheses for hidden information
Feature Selection, Dimensionality Reduction

Arguably the most important phase, and one that will be repeated

Lots of machine learning algorithms and statistical tools


As DW experts and BI Engineers,
you know more about this than me
[Infer] a function from labeled training data
Foundations of Machine Learning, 2012, Mohri
can be done via

proprietary systems like SAS
custom mid tier
drools or other biz rules system
ETL into data mart(s)

One Rule (1R) Algorithm
use most predictive feature
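As a rough sketch of how 1R can work: for each feature, map every value to the most common label seen with it, count the rows that mapping gets wrong, and keep the feature with the fewest errors. The function `one_rule` and the passenger rows below are invented for illustration (they are not the real Titanic data), and the sketch assumes categorical features only.

```python
from collections import Counter, defaultdict


def one_rule(rows, features, target):
    """Return (best_feature, rule, errors); rule maps feature value -> predicted label."""
    best = None
    for feat in features:
        counts = defaultdict(Counter)          # feature value -> label counts
        for row in rows:
            counts[row[feat]][row[target]] += 1
        # Predict the majority label for each value of this feature.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Rows not in the majority for their value are training errors.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (feat, rule, errors)
    return best


# Made-up toy rows, NOT real passenger records.
toy_rows = [
    {"sex": "female", "class": "first",  "survived": "yes"},
    {"sex": "female", "class": "third",  "survived": "yes"},
    {"sex": "female", "class": "second", "survived": "yes"},
    {"sex": "male",   "class": "first",  "survived": "no"},
    {"sex": "male",   "class": "third",  "survived": "no"},
    {"sex": "male",   "class": "second", "survived": "yes"},
    {"sex": "male",   "class": "third",  "survived": "no"},
    {"sex": "female", "class": "third",  "survived": "yes"},
]

feature, rule, errors = one_rule(toy_rows, ["sex", "class"], "survived")
print(feature, rule, errors)
```

On these toy rows the single feature `sex` misclassifies one row, while `class` misclassifies three, so 1R keeps `sex`.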
predicting a type/label/boolean
survival on Titanic
customer renewal
predicting a number
quarterly sales
stock price
decision tree
very powerful
easy to interpret
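One way to see why trees are easy to interpret is to look at how a split is chosen. C4.5, mentioned earlier, ranks candidate features by gain ratio: information gain divided by the entropy of the split itself. The sketch below covers only that scoring step for categorical features, not a full tree builder; the function names and toy rows are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, feature, target):
    """Information gain of splitting on `feature`, normalised by split entropy."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Weighted entropy of the target that remains after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up rows: `plan` separates the labels perfectly, `region` not at all.
demo_rows = [
    {"plan": "annual",  "region": "east", "renewed": "yes"},
    {"plan": "annual",  "region": "west", "renewed": "yes"},
    {"plan": "monthly", "region": "east", "renewed": "no"},
    {"plan": "monthly", "region": "west", "renewed": "no"},
]
for name in ("plan", "region"):
    print(name, gain_ratio(demo_rows, name, "renewed"))
```

A tree builder would split on the highest-scoring feature and then repeat the scoring inside each branch.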
RapidMiner Demo
neural net
RapidMiner Demo
DS: Polynomial
more than just pretty graphs
Guiding Themes
R demo
aka voting
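Assuming "voting" here means combining the predictions of several models, as a random forest does with its trees, a minimal sketch is a per-row majority vote. The helper `majority_vote` and the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per model, all of equal length."""
    # zip(*predictions) groups the models' votes row by row.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Three hypothetical models, each predicting a label for the same three rows.
model_a = ["yes", "no", "yes"]
model_b = ["yes", "yes", "no"]
model_c = ["no", "no", "yes"]
combined = majority_vote([model_a, model_b, model_c])
print(combined)
```

With an odd number of models and two labels there are no ties; otherwise `most_common` settles ties by the order in which labels were first seen.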
Dimension Reduction
aka, less is more
Hughes Phenomenon
With a fixed number of training samples, predictive power eventually declines as dimensionality increases past an optimal point
Curse of Dimensionality
more fields = more sparsity
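The sparsity claim can be illustrated with a small simulation, sketched here under simple assumptions: draw a fixed number of uniform random points, cut every field into a few bins, and measure what share of the resulting grid cells contains at least one point. The function `occupied_fraction` and its defaults are invented for this illustration.

```python
import random


def occupied_fraction(n_samples, n_dims, bins=4, seed=0):
    """Share of grid cells holding at least one of n_samples random points."""
    rng = random.Random(seed)
    cells = {
        tuple(int(rng.random() * bins) for _ in range(n_dims))  # cell of one point
        for _ in range(n_samples)
    }
    return len(cells) / bins ** n_dims


# Same 200 samples' worth of data, spread over more and more fields.
for dims in (1, 2, 4, 8):
    print(dims, "fields:", occupied_fraction(200, dims))
```

The number of cells grows as `bins ** n_dims` while the sample count stays at 200, so with 8 fields at most 200 of 65,536 cells can be occupied and the data cover well under 1% of the space.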
RapidMiner Demo, DS: Sonar
Random Forest Demo