What is Data?

Data Science

Business Intelligence : Transformation of raw data into meaningful and useful information for business analysis purposes

**Kanwal Prakash Singh**

Data Scientist

**Data Science**

Big Data

Small Data (really)?

Relation

structured

unstructured

Internet of Things

Extraction of knowledge / insights from data

Business Intelligence

Visualization

Statistics

Machine Learning

Forecasting

Optimizations

Operational Science

Predictive

Exploratory

Data Warehousing

Artificial Intelligence

Data Visualizers

Depict the analysis in an engaging way

Art and science

Data Scientist

A statistician who knows more programming than other statisticians, and a programmer who knows more statistics / ML / maths than other programmers

A Data Scientist can be a better BI manager / expert

A BI manager / expert who also knows statistics / ML / maths is, in effect, a Data Scientist

Databases

Relational

NoSQL

Structured

MapReduce

In-Memory Storage

Key-Value
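MapReduce splits a job into a map step that emits key-value pairs and a reduce step that aggregates all values sharing a key. Below is a minimal single-process sketch of the classic word-count example. The function names are hypothetical, and a real framework (e.g. Hadoop) would distribute the map, shuffle, and reduce phases across many machines rather than run them in one loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate the values for one key; for word count, sum them."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document is mapped independently: this is the parallelizable part.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    # Each key is reduced independently: also parallelizable.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data small data", "data science"]))
    # {'big': 1, 'data': 3, 'small': 1, 'science': 1}
```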

Describing Data

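Nominal : categorical, discrete labels with no order (e.g. color)

Ordinal : categories with a natural ordering or ranking, but unequal or unknown gaps between them (e.g. survey ratings)

Interval : like ordinal, but with meaningful, equal differences between values and no true zero (e.g. temperature in °C)

Ratio : like interval, but with a natural (true) zero, so ratios are meaningful (e.g. weight)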
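One practical consequence of these scales is which "typical value" is meaningful: the mode for nominal data, the median once values are ordered, the arithmetic mean once differences are meaningful, and ratio-based summaries such as the geometric mean only when there is a true zero. A small sketch of that rule follows; the `Scale` enum and `typical_value` function are hypothetical names, and ordinal values are assumed to be encoded as integer ranks.

```python
import statistics
from enum import Enum
from typing import Any, Sequence


class Scale(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def typical_value(values: Sequence[Any], scale: Scale) -> Any:
    """Return the most informative central-tendency measure the scale permits."""
    if scale is Scale.NOMINAL:
        # Categories have no order: only counting (the mode) makes sense.
        return statistics.mode(values)
    if scale is Scale.ORDINAL:
        # Order is meaningful but gaps are not: use the median. The low median
        # keeps the result an actual observed rank.
        return statistics.median_low(values)
    if scale is Scale.INTERVAL:
        # Differences are meaningful, so averaging is allowed.
        return statistics.mean(values)
    # Ratio: a true zero makes ratios meaningful, so the geometric mean is valid
    # (it requires strictly positive values).
    return statistics.geometric_mean(values)
```

For example, the mean of temperatures in °C is fine, but a statement like "20 °C is twice as warm as 10 °C" is not, because the interval scale has no true zero.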

Hypothesis

Null Hypothesis : the general statement or default position that there is no relationship between two measured phenomena
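Testing a null hypothesis means asking how surprising the observed data would be if it were true. A permutation test makes this concrete: if there is no relationship between group and outcome, the group labels are interchangeable, so shuffling them shows how large a difference arises by chance alone. The sketch below uses only the standard library; `permutation_test` is a hypothetical helper, and the difference in means is just one possible test statistic.

```python
import random
from typing import Optional, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 10_000,
                     seed: Optional[int] = 0) -> float:
    """Approximate two-sided p-value for H0: a and b come from the same distribution."""
    rng = random.Random(seed)
    observed = abs(_mean(b) - _mean(a))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel observations at random, as H0 allows
        diff = abs(_mean(pooled[n_a:]) - _mean(pooled[:n_a]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction avoids reporting an impossible p-value of exactly 0.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

A small p-value (conventionally below 0.05) is taken as evidence against the null hypothesis; a large one means the data are consistent with it, not that it is proven.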

A statistical model can be thought of as a pair (Y, P), where Y is the set of possible observations and P is a set of probability distributions over Y

A statistical model is a formalization of relationships between variables in the form of mathematical equations (source: Wikipedia)

Examples :
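A standard illustration is the normal linear regression model, which fits the pair definition of a statistical model: each observation is a vector of n real responses, and the family of distributions is indexed by the coefficients β and the noise variance σ². Here X is an n × p matrix of predictors, treated as fixed.

```latex
Y = \mathbb{R}^n, \qquad
P = \left\{ \mathcal{N}\!\left(X\beta,\; \sigma^2 I_n\right) \;:\; \beta \in \mathbb{R}^p,\ \sigma^2 > 0 \right\}
```

Equivalently, y = Xβ + ε with ε ~ N(0, σ² I_n).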

Errors & Bias

Type 1 : False positive, the incorrect rejection of a true null hypothesis

Type 2 : False negative, the incorrect failure to reject a false null hypothesis

Bias : systematic error, where estimates miss the target (true) value by a consistent amount
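The sketch below counts both error types over a batch of test decisions where the ground truth is known, and estimates bias as the average gap between an estimator's outputs and the true value. As a concrete case it simulates the variance estimator that divides by n instead of n − 1, which is known to be biased downward. All function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def error_counts(null_true: Sequence[bool], rejected: Sequence[bool]) -> Tuple[int, int]:
    """Return (type_1, type_2) counts. null_true[i] is the truth, rejected[i] the decision."""
    type_1 = sum(1 for t, r in zip(null_true, rejected) if t and r)          # false positive
    type_2 = sum(1 for t, r in zip(null_true, rejected) if not t and not r)  # false negative
    return type_1, type_2


def bias(estimates: Sequence[float], true_value: float) -> float:
    """Average signed deviation of the estimates from the target value."""
    return sum(estimates) / len(estimates) - true_value


def variance_divide_by_n(sample: Sequence[float]) -> float:
    """Sample variance with denominator n (the biased version)."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


if __name__ == "__main__":
    print(error_counts([True, True, False, False], [True, False, False, True]))  # (1, 1)
    # Samples of size 5 from N(0, 1), whose true variance is 1.
    rng = random.Random(42)
    estimates = [variance_divide_by_n([rng.gauss(0, 1) for _ in range(5)])
                 for _ in range(20_000)]
    # Expected bias is (n - 1)/n - 1 = -0.2 for n = 5.
    print(bias(estimates, 1.0))
```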

Machine Learning

Supervised : learning from a labeled data set.

Examples : linear / logistic regression, SVM, K-NN

Unsupervised : finding structure and relationships in unlabeled data.

Examples : K-means, DBSCAN

Construction and study of systems that can learn from data. Examples: explain it like I'm 5!

Classification : problem of identifying to which of a set of categories a new observation belongs

Clustering : grouping samples so that samples in the same group / cluster are more similar to each other than to samples in other groups

Regression : takes a group of random variables, thought to be predicting Y, and tries to find a mathematical relationship between them
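To make the three problem types concrete, here is a compact sketch on toy one-dimensional data: a k-nearest-neighbors classifier (classification), k-means with given starting centers (clustering), and an ordinary least-squares line (regression). These are minimal illustrations rather than production implementations, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: Sequence[Tuple[float, str]], x: float, k: int = 3) -> str:
    """Classification: label x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans_1d(points: Sequence[float], centers: List[float], iters: int = 10) -> List[float]:
    """Clustering: alternate assigning points to the nearest center and re-centering."""
    centers = list(centers)
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; leave it in place if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Regression: fit y = a + b*x by minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


if __name__ == "__main__":
    train = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
    print(knn_classify(train, 1.8))                      # low
    print(kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0]))  # [2.0, 11.0]
    print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))     # (1.0, 2.0)
```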

**ML**

Support Vector Machines

Linear Regression

K-means

K-Nearest Neighbors

Naive Bayes

Bayesian Networks

Decision Trees

Random Forests

Regression Trees

Hierarchical Clustering

Perceptron Training Algorithm

Artificial Neural Networks
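As one example from this list, the perceptron training algorithm fits a linear classifier by nudging its weights whenever it misclassifies a training point. The minimal sketch below assumes labels in {-1, +1} and a fixed learning rate. It learns the logical AND function, which is linearly separable, so the algorithm is guaranteed to converge.

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Tuple[Sequence[float], int]],
                     lr: float = 1.0, epochs: int = 100) -> Tuple[List[float], float]:
    """Return (weights, bias) for a perceptron trained on labels in {-1, +1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training has converged
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


AND_DATA = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]

if __name__ == "__main__":
    w, b = train_perceptron(AND_DATA)
    print([predict(w, b, x) for x, _ in AND_DATA])  # [-1, -1, -1, 1]
```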

Forecasting : making predictions about future events. Examples : weather forecasting, trading, sales forecasting, etc.
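One very simple forecasting method, shown here only as an illustration, is simple exponential smoothing: the next forecast is a weighted average of the latest observation and the previous forecast, F(t+1) = α·y(t) + (1 − α)·F(t). The sketch assumes the first forecast is initialized to the first observation and that the smoothing factor α lies in (0, 1]; other initializations are common. The sales numbers are toy values.

```python
from typing import Sequence


def exponential_smoothing_forecast(series: Sequence[float], alpha: float) -> float:
    """Return the one-step-ahead forecast for the period after the last observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]  # assumption: initialize with the first observation
    for y in series[1:]:
        # Blend the newest observation with the previous forecast.
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


if __name__ == "__main__":
    weekly_sales = [100, 110, 105, 120]  # toy numbers, not real data
    print(exponential_smoothing_forecast(weekly_sales, alpha=0.5))  # 112.5
```

A larger α reacts faster to recent changes; α = 1 reduces to the naive "next value equals the last value" forecast.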

Optimizations : minimizing / maximizing a cost function. Examples : gradient descent, K-means.
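Gradient descent minimizes a differentiable cost function by repeatedly stepping against its gradient. A minimal sketch on a one-parameter cost J(θ) = (θ − 3)², whose minimum is at θ = 3; the learning rate and iteration count are arbitrary choices for this toy case.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], theta0: float,
                     learning_rate: float = 0.1, iterations: int = 200) -> float:
    """Minimize a 1-D cost given its derivative, starting from theta0."""
    theta = theta0
    for _ in range(iterations):
        theta -= learning_rate * grad(theta)  # step downhill along the negative gradient
    return theta


def cost(theta: float) -> float:
    return (theta - 3.0) ** 2


def cost_grad(theta: float) -> float:
    return 2.0 * (theta - 3.0)  # derivative of (theta - 3)^2


if __name__ == "__main__":
    best = gradient_descent(cost_grad, theta0=0.0)
    print(best, cost(best))
```

With this cost each step shrinks the distance to the minimum by a factor of 1 − 2·0.1 = 0.8; a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.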

Skills

Curiosity

Commitment

Business savvy

Presentation

Creativity

Intuition

Skills

Analytical skill-set

Mathematics / statistics

Experiment Design

Customer Centric

Proactive

Storytelling

Collaborative

Programming

Machine Learning

Technology

Databases

Problems

Recommendation Engines

A/B Testing

Optimizations

Data Analytics

Demand/Supply

Rankings

Behavior Analysis

Warehousing

Real-Time Infra

Search!
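A/B Testing is usually analysed as a hypothesis test whose null hypothesis is that variants A and B have the same conversion rate. One common choice is the two-proportion z-test, sketched below with only the standard library. It relies on the normal approximation, so it assumes reasonably large samples; the function name and the numbers in the usage lines are illustrative.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common conversion rate if H0 is true
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```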

Solved Problems

Behavior Analysis : Clustered SVMs

Scheduling Algorithm

Supply Forecasting

Business Expansion Optimization

Demand/Supply

Graph Databases

Visualizations

http://housing.com/dsl/traffic-flux/bangalore/inbound

Genetic Algorithms

Filter

http://visual.ly/what-makes-good-data-scientist

Resources

Practice

Kaggle

Data Science Central

KDnuggets Blog

Coursera

Courses at IITB

Programming

Play with Data Sets

http://visual.ly/big-data-revolution

http://www.reddit.com/r/dataisbeautiful/