
Data Science

An overview of data science, followed by the attitude and skill set of a good data scientist. Presented at the Analytics Club, IIT Bombay.
by Kanwal Prakash Singh on 21 August 2014


Transcript of Data Science

What is Data?
Raw facts and figures
Data Science
Business Intelligence: the transformation of raw data into meaningful and useful information for business analysis purposes
Kanwal Prakash Singh
Data Scientist

Data Science
Big Data
Small Data (really)?
Relation
Structured
Unstructured
Internet of Things
Extraction of knowledge / insights from data
Business Intelligence
Visualization
Statistics
Machine Learning
Forecasting
Optimizations
Operations Research
Predictive
Exploratory
Data Warehousing
Artificial Intelligence
Data Visualizers
Depict the analysis in an engaging way
Art and science
Data Scientist
A statistician who knows more programming than other statisticians, and a programmer who knows more statistics / ML / maths than other programmers
A Data Scientist can be a better BI manager / expert
A BI manager / expert who also has strong statistics / ML / maths skills is a Data Scientist
Databases
Relational
NoSQL
Structured
MapReduce
In-Memory Storage
Key-Value
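MapReduce, listed above, is a programming model for processing large data sets rather than a storage engine. The following is a minimal, single-process sketch of its map, shuffle and reduce phases on the classic word-count task. The function names are hypothetical, and real frameworks such as Hadoop run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map on every document, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


print(word_count(["Big data", "small data"]))  # {'big': 1, 'data': 2, 'small': 1}
```

Because each map call and each per-key reduce are independent, a framework can distribute them; only the shuffle needs to move data between workers.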
Describing Data
Nominal: categorical, discrete, with no natural order
Ordinal: natural ordering, ranking
Interval: like ordinal, but the difference between values is defined
Ratio: like interval, but with a natural zero
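The level of measurement decides which summaries make sense. The sketch below uses Python's `statistics` module; the helper names and the level-to-statistic mapping are illustrative assumptions, not part of the original slides.

```python
from statistics import mean, median_low, mode

# Illustrative mapping (an assumption, not from the slides): the most
# informative measure of central tendency at each level of measurement.
CENTRAL_TENDENCY = {
    "nominal": mode,         # categories can only be counted
    "ordinal": median_low,   # order is meaningful; median_low returns an actual category
    "interval": mean,        # differences are meaningful, so averaging is too
    "ratio": mean,           # as interval, plus a natural zero
}


def central_tendency(values, level):
    """Summarise `values` with the statistic suited to `level`."""
    return CENTRAL_TENDENCY[level](values)


def ratios_meaningful(level):
    """'Twice as much' only makes sense on a ratio scale (natural zero)."""
    return level == "ratio"


print(central_tendency(["red", "blue", "red"], "nominal"))  # red
print(central_tendency([1, 3, 2, 5, 4, 4], "ordinal"))      # 3
print(ratios_meaningful("interval"))  # False: 20 °C is not "twice as hot" as 10 °C
```

`median_low` is used for ordinal data because the ordinary median may average two middle ranks, which is not meaningful when only the order is defined.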
Hypothesis
Null hypothesis: a general statement or default position that there is no relationship between two measured phenomena
A statistical model can be thought of as a pair (Y, P), where Y is the set of possible observations and P is a set of probability distributions on Y
A statistical model is a formalization of relationships between variables in the form of mathematical equations (source: Wikipedia)
Examples:
Errors & Bias
Type I error: false positive, the incorrect rejection of a true null hypothesis
Type II error: false negative, the incorrect failure to reject a false null hypothesis
Bias: systematic deviation from the target (true value) by some amount
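A short simulation can make Type I error and bias concrete. This sketch assumes normally distributed data with a known standard deviation (so a z-test applies); the helper names, sample sizes and seeds are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def type_i_error_rate(trials=5000, n=30, alpha=0.05, seed=0):
    """Simulate data for which H0 is TRUE; every rejection is a false positive."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(0.0, 1.0) for _ in range(n)], mu0=0.0, sigma=1.0, alpha=alpha)
        for _ in range(trials)
    )
    return rejections / trials


def variance_estimator_bias(n=5, trials=20000, seed=1):
    """Average error of the divide-by-n variance estimator (true variance = 1).
    Its expected value is (n - 1) / n, so the bias should be close to -1 / n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        total += sum((x - m) ** 2 for x in xs) / n
    return total / trials - 1.0


print(type_i_error_rate())        # should be close to alpha = 0.05
print(variance_estimator_bias())  # should be close to -1/5 = -0.2
```

The false-positive rate matches the chosen significance level alpha, which is exactly what alpha controls. Dividing by n - 1 instead (as `statistics.variance` does) removes the bias of the variance estimator.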
Machine Learning
Supervised: the data set is labeled.
Examples: linear/logistic regression, SVM, k-NN
Unsupervised: finding structures and relationships in unlabeled data.
Examples: k-means, DBSCAN
The construction and study of systems that can learn from data. Examples: explain it like I'm 5!
Classification: the problem of identifying which of a set of categories a new observation belongs to
Clustering: grouping samples so that samples in the same group (cluster) are more similar to each other than to samples in other groups
Regression: takes a group of random variables thought to predict Y and tries to find a mathematical relationship between them and Y
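To ground the classification and regression definitions, here is a plain-Python sketch of two simple supervised methods: a k-nearest-neighbours classifier and an ordinary least-squares line fit. The function names and the toy data are hypothetical; libraries such as scikit-learn provide tuned implementations.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classification: predict the majority label among the k training points
    nearest to `query` (Euclidean distance). `train` is a list of (point, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2)))           # A
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Both methods learn from labeled examples, which is what makes them supervised; k-NN stores the data and defers work to prediction time, while least squares compresses it into two parameters.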
ML
Support Vector Machines
Linear Regression
K-means
K-Nearest Neighbors
Naive Bayes
Bayesian Networks
Decision Trees
Random Forests
Regression Trees
Hierarchical Clustering
Perceptron Training Algorithm
Artificial Neural Networks
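Of the algorithms listed, k-means is perhaps the easiest to sketch. The version below is Lloyd's algorithm in plain Python, assuming points are tuples of numbers and using randomly chosen input points as the initial centroids. The function names and defaults are illustrative; production libraries add refinements such as k-means++ initialisation and multiple restarts.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means: alternate assignment and update steps
    until the centroids stop moving (or `iters` is exhausted)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen input points
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Each step can only lower the within-cluster sum of squared distances, which is why k-means also counts as an optimization method; it may, however, stop at a local minimum that depends on the initial centroids.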
Forecasting: making statements (predictions) about events that have not yet occurred. Examples: weather forecasting, trading, sales forecasting.
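Simple exponential smoothing is one of the most basic forecasting methods, chosen here for illustration (the slides do not name a specific method). Each new observation pulls the smoothed level toward itself by a factor `alpha`, and the final level serves as the forecast for the next period. The function name and the sales figures are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.
    Returns the one-step-ahead forecast (the final smoothed level)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


weekly_sales = [10, 12, 14]  # hypothetical data
print(exponential_smoothing_forecast(weekly_sales, alpha=0.5))  # 12.5
```

With `alpha = 1` the forecast is simply the last observation; smaller values average over more history. The forecast is flat, so series with trend or seasonality need extensions such as Holt-Winters.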
Optimization: minimizing or maximizing a cost function. Examples: gradient descent, k-means.
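As an example of minimizing a cost function, the sketch below fits a straight line by batch gradient descent on the mean squared error. The function name, learning rate and step count are assumptions chosen for this small example; too large a learning rate makes the iterates diverge.

```python
def gradient_descent_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = a + b * x by minimising the cost
    J(a, b) = (1/n) * sum((a + b*x - y) ** 2) with batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2 / n * sum(residuals)                             # dJ/da
        grad_b = 2 / n * sum(r * x for r, x in zip(residuals, xs))  # dJ/db
        a -= lr * grad_a  # step against the gradient
        b -= lr * grad_b
    return a, b


a, b = gradient_descent_line([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 4), round(b, 4))  # 1.0 2.0
```

For linear regression a closed-form least-squares solution exists, but gradient descent generalises to cost functions that have none, such as those of logistic regression or neural networks.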
Skills
Curiosity
Commitment
Business savvy
Presentation
Creativity
Intuition
Skills
Analytical skill-set
Mathematics / statistics
Experiment Design
Customer-Centric
Proactive
Storytelling
Collaborative
Programming
Machine Learning
Technology
Databases
Problems
Recommendation Engines
A/B Testing
Optimizations
Data Analytics
Demand/Supply
Rankings
Behavior Analysis
Warehousing
Real-Time Infra
Search!
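A/B testing usually comes down to a hypothesis test. A common choice for conversion rates is the two-proportion z-test sketched below, whose null hypothesis is that variants A and B convert at the same rate. The function name and the conversion counts are invented for illustration.

```python
import math
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.
    H0: variants A and B have the same conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 2000 visitors per variant.
z, p = ab_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")  # z is roughly 3, p well below 0.05
```

A small p-value (conventionally below 0.05) leads to rejecting the null hypothesis, accepting a Type I error rate equal to that threshold.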
Solved Problems
Behavior Analysis: Clustered SVMs
Scheduling Algorithm
Supply Forecasting
Business Expansion Optimization
Demand/Supply
Graph Databases
Visualizations
http://housing.com/dsl/traffic-flux/bangalore/inbound
Genetic Algorithms
Filter
http://visual.ly/what-makes-good-data-scientist
Resources
Practice
Kaggle
Data Science Central
KDnuggets Blog
Coursera
Courses at IIT Bombay
Programming
Play with Data Sets
http://visual.ly/big-data-revolution
http://www.reddit.com/r/dataisbeautiful/