Intro to Data Science & Predictive Analysis

Demo to BI Engineers on analytics
by

Prezi Hill

on 10 April 2015


Transcript of Intro to Data Science & Predictive Analysis

SHUT UP!
and show me cool stuff
decision tree (C4.5)
regression and neural networks
Like so much buzzwords,
I thirst for explanations
data science, data mining, machine learning, statistical inference, supervised learning, unsupervised learning, big data, clustering, predictive analysis, big science, business intelligence, analytics, prescriptive analysis, text mining, text analysis, unstructured analysis, pattern recognition

Data Analytics,
and so should you!

easy to get started,
aka
K.I.S.S.
principle
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets
-Robert C. Holte, 1993, Computer Science Department, University of Ottawa
not too much math
Data Scientist: The Sexiest Job of the 21st Century, Davenport, Patil, HBR, 2012
...
Data scientists’ most basic, universal skill is the ability to write code.
...
A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
K-Fold Cross Validation
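
As a rough illustration of the idea behind k-fold cross validation, the sketch below splits sample indices into k folds, with each fold serving once as the held-out test set while the rest are used for training. The function name k_fold_indices is hypothetical and not from the slides. The sketch does not shuffle; in practice the rows are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train, test) pairs of index lists covering range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so every row is used for validation once and for training k - 1 times.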
ensembling
classification
and

regression
Explore Data
Select Features
Prepare Data
Model, Validate
Operationalize
Data Mining Funnel
Data Warehousing
Data Science
by Billy Hill
SEMMA
(SAS via Enterprise Miner)
Sample - enough rows to discover patterns w/out overwhelming
Explore - sort, max, min, describe, plot

CRISP-DM
(Cross Industry Standard Process for Data Mining)
Business Understanding
understanding objectives and requirements, converting that knowledge into a data mining problem
Data Understanding
data collection, get familiar with data, identify data quality problems, discover insights into data, detect interesting subsets to form hypotheses for hidden information
Feature Selection, Dimensionality Reduction

Arguably the most important phase that will be repeated

Lots of machine learning algorithms and statistical tools

ETL,

As DW experts and BI Engineers,
you know more about this than me
[Infer] a function from labeled training data
Foundations of Machine Learning, 2012, Mohri
can be done via

proprietary systems like SAS
SQL
custom mid tier
drools or other biz rules system
reporting
dashboards
ETL into data mart(s)

One Rule (1R) Algorithm
i.e.,
use most predictive feature
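
A minimal sketch of the 1R idea, assuming categorical features: for each feature, map every value to its most common label, count the training errors, and keep the feature with the fewest errors. The function name one_rule and the tiny Titanic-style rows are made up for illustration and are not real data.

```python
from collections import Counter, defaultdict


def one_rule(rows, features, target):
    """Return (feature, rule, errors) for the single most predictive feature."""
    best = None
    for feature in features:
        # Count labels seen for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # The rule predicts the majority label for each feature value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Every row that is not in its value's majority is a training error.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Hypothetical toy rows, invented for this example.
rows = [
    {"sex": "female", "pclass": "first", "survived": "yes"},
    {"sex": "female", "pclass": "third", "survived": "yes"},
    {"sex": "female", "pclass": "second", "survived": "yes"},
    {"sex": "male", "pclass": "first", "survived": "no"},
    {"sex": "male", "pclass": "third", "survived": "no"},
    {"sex": "male", "pclass": "second", "survived": "no"},
    {"sex": "male", "pclass": "first", "survived": "yes"},
    {"sex": "female", "pclass": "third", "survived": "no"},
]

feature, rule, errors = one_rule(rows, ["sex", "pclass"], "survived")
print(feature, rule, errors)
```

On these toy rows the rule built on "sex" makes 2 training errors versus 3 for "pclass", so 1R picks "sex".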
classification
predicting a type/label/boolean
survival on Titanic
customer renewal
regression
predicting a number
quarterly sales
stock price
decision tree
very powerful
easy to interpret
RapidMiner Demo
regression
neural net
RapidMiner Demo
DS: Polynomial
more than just pretty graphs
K.I.S.S
overfit
underfit
tradeoff
validation
Guiding Themes
R demo
ensembling
aka voting
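
One simple form of ensembling is a majority vote: several models each predict a label, and the most common label wins. The sketch below assumes the per-model predictions are already available as lists; the helper name majority_vote and the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model prediction lists into one list by majority vote."""
    # zip(*predictions) groups the votes for each sample across models.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical predictions from three models on four samples.
model_a = ["yes", "no", "yes", "no"]
model_b = ["yes", "yes", "no", "no"]
model_c = ["no", "no", "yes", "no"]

combined = majority_vote([model_a, model_b, model_c])
print(combined)
```

Using an odd number of models avoids ties when there are only two labels.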
Dimension Reduction
aka, less is more
Hughes Phenomenon
With a fixed number of training samples, the predictive power reduces as the dimensionality increases
Curse of Dimensionality
more fields = more sparsity
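
A back-of-the-envelope way to see the sparsity: if each field is cut into a fixed number of bins, the number of possible cells grows exponentially with the number of fields, so a fixed training set is spread ever thinner. The numbers below (1000 samples, 10 bins per field) are assumptions chosen only for illustration.

```python
def samples_per_cell(n_samples, bins_per_field, n_fields):
    """Average number of samples per cell when every field is binned."""
    return n_samples / bins_per_field ** n_fields


for n_fields in (1, 2, 3, 6):
    density = samples_per_cell(1000, 10, n_fields)
    print(n_fields, "fields:", density, "samples per cell")
```

With one field there are 100 samples per cell on average; with three fields there is only one, and beyond that most cells are empty.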
RapidMiner Demo, DS: Sonar
Random Forest Demo