by Billy Hill
SEMMA (SAS via Enterprise Miner)
Sample - enough rows to discover patterns w/out overwhelming
Explore - sort, max, min, describe, plot
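A minimal sketch of the Sample and Explore steps, using only the Python standard library. The data, the helper names (`sample_rows`, `explore`), and the sample size are made up for illustration; Enterprise Miner does this through its own nodes.

```python
import random
import statistics


def sample_rows(rows, n, seed=0):
    """Sample: draw up to n rows at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def explore(values):
    """Explore: sort the values and report basic descriptive statistics."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }


# Hypothetical data: 10,000 made-up measurements.
data_rng = random.Random(42)
population = [data_rng.gauss(50, 10) for _ in range(10_000)]

subset = sample_rows(population, 500)  # enough rows to see the shape
summary = explore(subset)
print(summary)
```

The summary of the 500-row sample should sit close to the summary of the full 10,000 rows, which is the point of sampling first.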
CRISP-DM (Cross Industry Standard Process for Data Mining)
Business Understanding
understanding objectives and requirements, converting this knowledge into a data mining problem
Data Understanding
data collection, get familiar with data, identify data quality problems, discover insights into data, detect interesting subsets to form hypotheses for hidden information
Explore Data
Feature Selection, Dimensionality Reduction
Arguably the most important phase that will be repeated
Lots of machine learning algorithms and statistical tools
RapidMiner Demo, DS: Sonar
Select Features
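One simple way to select features is a correlation filter: score each field by how strongly it tracks the target and keep the top ones. This is a rough sketch of that one technique, not what the RapidMiner demo does; the column names and data are invented, and `pearson` and `rank_features` are hypothetical helpers.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def rank_features(columns, target):
    """Return feature names ordered by |correlation| with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: one informative column and two pure-noise columns.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(200)]
columns = {
    "signal": signal,
    "noise_a": [rng.gauss(0, 1) for _ in range(200)],
    "noise_b": [rng.gauss(0, 1) for _ in range(200)],
}
target = [s + rng.gauss(0, 0.5) for s in signal]

ranking = rank_features(columns, target)
print(ranking)
```

A filter like this only sees linear, one-field-at-a-time relationships, so it is a starting point rather than a full feature selection method.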
Hughes Phenomenon
With a fixed number of training samples, the predictive power reduces as the dimensionality increases
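A small experiment can illustrate the effect. In this sketch the training set is fixed at 50 rows, one field carries the class signal, and every extra field is pure noise; a 1-nearest-neighbour classifier stands in for "the model". All of these choices (classifier, sample sizes, class separation) are assumptions made for illustration.

```python
import random


def make_point(label, n_noise, rng):
    """One informative field (mean +/-1.5 by class) plus n_noise noise fields."""
    informative = rng.gauss(1.5 if label else -1.5, 1.0)
    return [informative] + [rng.gauss(0, 1) for _ in range(n_noise)]


def one_nn_accuracy(n_noise, n_train=50, n_test=200, seed=0):
    """Test accuracy of 1-nearest-neighbour with a fixed training set size."""
    rng = random.Random(seed)
    train = [(make_point(i % 2, n_noise, rng), i % 2) for i in range(n_train)]
    test = [(make_point(i % 2, n_noise, rng), i % 2) for i in range(n_test)]
    correct = 0
    for x, label in test:
        # Nearest training row by squared Euclidean distance.
        nearest = min(
            train, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0]))
        )
        correct += nearest[1] == label
    return correct / n_test


for n_noise in (0, 10, 100):
    print(f"{n_noise + 1:>4} fields: accuracy {one_nn_accuracy(n_noise):.2f}")
```

The training rows never change in number, yet accuracy tends to drop as fields are added, because the noise fields swamp the one informative field in the distance calculation.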
Curse of Dimensionality
more fields = more sparsity
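To make the sparsity concrete, the sketch below drops the same 1,000 random rows into a grid with 4 bins per field and measures what fraction of grid cells contain at least one row. The bin count, row count, and the `occupied_fraction` helper are arbitrary choices for illustration.

```python
import random


def occupied_fraction(dim, n_points=1000, bins=4, seed=0):
    """Fraction of the bins**dim grid cells hit by n_points uniform rows."""
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_points):
        # Each field falls in one of `bins` equal-width intervals on [0, 1).
        cell = tuple(int(rng.random() * bins) for _ in range(dim))
        cells.add(cell)
    return len(cells) / bins ** dim


for dim in (1, 2, 5, 10):
    print(f"{dim:>2} fields: {4 ** dim:>8} cells, "
          f"{occupied_fraction(dim):.4%} occupied")
```

With 10 fields there are 4^10 = 1,048,576 cells, so 1,000 rows can occupy at most about 0.1% of them: almost all of the space is empty.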
data science, data mining, machine learning, statistical inference, supervised learning, unsupervised learning, big data, clustering, predictive analysis, big science, business intelligence, analytics, prescriptive analysis, text mining, text analysis, unstructured analysis, pattern recognition
ETL,
As DW experts and BI Engineers,
you know more about this than me
Prepare Data
Data Scientist: The Sexiest Job of the 21st Century, Davenport, Patil, HBR, 2012
...
Data scientists’ most basic, universal skill is the ability to write code.
...
A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
Model, Validate
Operationalize
can be done via