These are all services that leverage data to make the invisible visible through automated intelligence, so that we may better explain, predict, and decide.
Machine Learning
The Big Data Exploratorium: Data Mining, from Patents to Memes
Wednesday, June 22, 2011 from 2:30 – 3:15pm in B302/03
John L. Taylor
Senior Data Analyst, iovation
johnnylogic@gmail.com
johnnylogic.org
Ingredients
Chefs
According to Some Cynics
According to Drew Conway
According to Hilary Mason
According to Indeed.com
According to Steve Miller
Maths!
Prob. Theory
Statistics
Numerical Analysis
Linear Algebra
Etc.
(from Wu, Kumar, et al. 2008. "Top 10 algorithms in data mining." Knowl Inf Syst 14:1–37. DOI 10.1007/s10115-007-0114-2)
Learning Theory
Tools
Here's a plot of the fraction of the mass concentrated in the corners as a function of dimension. For a 7-dimensional cube, about 96% of the mass is concentrated in one of its 128 "corners".
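The 96% figure can be checked with a short calculation. The slide does not define "corner" precisely, so this sketch assumes the usual reading: the part of a hypercube that lies outside its inscribed ball, with mass spread uniformly. The function name `corner_fraction` is made up for illustration.

```python
import math

def corner_fraction(d: int) -> float:
    """Fraction of a d-dimensional cube's volume lying outside its inscribed ball.

    Assumption: "corners" means everything in the cube [-1, 1]^d that is
    farther than 1 from the center, and mass is uniform over the cube.
    """
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # unit-radius ball
    cube_volume = 2.0 ** d                                    # side length 2
    return 1.0 - ball_volume / cube_volume

if __name__ == "__main__":
    for d in range(1, 11):
        # 2**d is the number of corners (vertices) of the d-dimensional cube.
        print(f"d={d:2d}  corners={2 ** d:4d}  fraction in corners={corner_fraction(d):.3f}")
```

Under that assumption the fraction for d = 7 comes out at roughly 0.96, and it keeps climbing toward 1 as the dimension grows, which is one common way to illustrate the curse of dimensionality.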
Data Understanding
Measurement Theory
Data Preparation
Recipes
Data Analysis Tools
Open Source and Open Access Rule the Data Science Realm!
Store
Analyze
Present
General Purpose
Data Acquisition
Regular Expressions
Data Warehousing
Data Description
NoSQL
Measurement Theory
Data Description
Data Quality Assessment
APIs
ETL
Websites
Data Marts
Data Preparation
Cubes
Surveys
Transducers
Data scraping
Data cleansing: removing outliers, placing things in standard form, and otherwise reducing the noisiness of data.
Data transformation: application of a deterministic mathematical function to each point in a data set.
Data imputation: the substitution of some value for a missing data point or a missing component of a data point. Once all missing values have been imputed, the dataset can then be analyzed using standard techniques for complete data.
Data weighting and balancing: should cases be treated the same, or somehow normalized?
Data filtering: high-pass and low-pass filters can be used to further cleanse data.
Data abstraction: should data be re-categorized or coarse-grained differently?
Data reduction
Data derivation: should new variables be created?
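A few of the preparation steps above can be sketched in plain Python. This is only an illustration under assumptions not found in the slides: the sample values are invented, imputation uses the median, outliers are dropped with Tukey fences at 1.5 times the interquartile range, and the transformation is a natural log. All function names are hypothetical.

```python
import math
import statistics

def impute_missing(values):
    """Data imputation: replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Data cleansing: drop points outside the Tukey fences (k times the IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def log_transform(values):
    """Data transformation: apply the same deterministic function, log(1 + x), to each point."""
    return [math.log1p(v) for v in values]

# Invented sample with two missing readings and one obvious outlier.
raw = [12.0, 15.0, None, 14.0, 13.0, 400.0, None, 16.0]

imputed = impute_missing(raw)        # no None values remain
cleaned = remove_outliers(imputed)   # the 400.0 reading is dropped
transformed = log_transform(cleaned)

if __name__ == "__main__":
    print("imputed:    ", imputed)
    print("cleaned:    ", cleaned)
    print("transformed:", [round(v, 3) for v in transformed])
```

The order matters here: imputing before removing outliers lets the median, which is robust to the extreme value, fill the gaps, whereas a mean-based fill would have been pulled toward 400.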
You Get The Picture!
Thank You!
Data Warehousing 101
Thursday, June 23, 2011 from 2:30 – 3:15pm in B201
Modeling
Evaluation
Machine Learning
Performance Measures
Estimating Performance
Comparing Performance
Select modeling technique
Create an experimental design
Build the model
Analyze
Collect
Deploy
Store
Obtain
Scrub
Explore
Model
iNterpret
CRISP-DM