ML Pres

by

Andriy Lazorenko

on 23 May 2017


Transcript of ML Pres

Unsupervised learning
Algorithms
K - means clustering
Two steps in K-means
Cluster Assignment Step
Move Centroid Step
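The two steps above can be sketched in Python (a minimal NumPy sketch; the toy data and the random initialization are my own assumptions, not from the slides):

```python
import numpy as np

def kmeans(X, k, num_iters=100, seed=0):
    """Minimal K-means: alternate cluster assignment and centroid moves."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(num_iters):
        # Cluster Assignment Step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move Centroid Step: each centroid moves to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs should be recovered as two clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans(X, k=2)
```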
Supervised learning
Linear regression and gradient descent
Introduction to machine learning
by Andriy Lazorenko
What is machine learning?
Classification
Motivational examples
More common:
Data Analytics (large DBs)
Computer Vision
Natural Language Processing
Robotics
Product Recommendations
Motivational example
Case study: predictive sales
Logistic regression
Support Vector Machine (SVM)
Decision trees
Anomaly detection
PCA
Summary
Skewed data
Train, C-V, Test sets
Dataset for training
Very often, more data is better than algorithm fine-tuning
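The train / cross-validation / test split mentioned above can be sketched as a shuffled three-way split (the 60/20/20 ratios are my assumption; they are a common convention, not stated in the slides):

```python
import random

def train_cv_test_split(data, seed=0):
    """Shuffle, then split 60% train / 20% cross-validation / 20% test."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_cv = int(0.6 * n), int(0.2 * n)
    train = data[:n_train]
    cv = data[n_train:n_train + n_cv]
    test = data[n_train + n_cv:]
    return train, cv, test

train, cv, test = train_cv_test_split(range(100))
```

Parameters are tuned on the cross-validation set so the test set stays untouched until the final evaluation.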

Learning curves
Dataset
Features
Evaluation
Neural
Networks
Select features
Transform
Scaling
Principal Component Analysis
Create features to represent it
Representation
Explore data
Evaluation metrics

How well (or badly) is your algorithm doing?
Confusion matrix:

                   Actual ' + '       Actual ' - '
Predicted ' + '    true positives     false positives
Predicted ' - '    false negatives    true negatives
Can we trust just accuracy?
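No: on skewed data a classifier that always predicts the majority class gets high accuracy while catching zero positives. A small sketch (the 95/5 class split is a toy assumption):

```python
# Toy skewed dataset: 95 negatives, 5 positives
actual = [0] * 95 + [1] * 5
# A useless classifier that always predicts the negative class
predicted = [0] * 100

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

accuracy = (tp + tn) / len(actual)            # 0.95 -- looks great
recall = tp / (tp + fn) if tp + fn else 0.0   # 0.0  -- misses every positive
precision = tp / (tp + fp) if tp + fp else 0.0
```

Precision and recall expose what accuracy hides on skewed classes.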
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
Motivational example
Case study: e-mail spam filters
'Eigenfaces': PCA on face recognition
K-means clustering
PCA
Anomaly detection
What data can help us answer our questions?
What data seems to be useful?
PCA: transform features into principal components
to use them as new features
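PCA transforms features into principal components that can serve as new features; a minimal sketch using NumPy's SVD (the toy data is my own assumption):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)        # PCA requires centered data
    # Rows of Vt are principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T       # new features: the projections

# Points lying almost on a line: one component captures nearly everything
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]])
Z = pca(X, n_components=1)   # shape (4, 1): 2 features -> 1 new feature
```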
SVM and k-means clustering both depend on a decision boundary computed from distances involving both axes jointly, so their results change when features are rescaled.

The decision tree, however, uses nothing like a gradient. It splits on one axis at a time, x or y but never both, so each split depends on a single feature rather than a joint combination, and scaling cannot change where the splits fall.

Linear regression follows a similar principle: it ignores the relationship between features. Each feature gets its own coefficient, which adjusts automatically with the feature's scale, so linear regression is likewise unaffected by feature scaling.
Why?
Feature scaling is an important tool in machine learning: we normalize the range of each feature so that no single feature dominates the others.
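A minimal min-max scaling sketch (the salary/experience toy data is my own assumption): before scaling, the large-range feature dominates any distance computation; after scaling, both features span [0, 1].

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) to the [0, 1] range."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

# Salary (thousands) dwarfs years-of-experience before scaling
X = np.array([[50.0, 1.0], [100.0, 5.0], [150.0, 10.0]])
X_scaled = min_max_scale(X)
# Now both columns run from 0 to 1 and contribute comparably
```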
Data types:
Numerical
Categorical
Time Series
Text
How do we handle
text
data?
make data understandable to a machine
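One common way to make text understandable to a machine is a bag-of-words count vector; a minimal sketch (the example documents are my own, and this is one possible representation, not necessarily the one the slides used):

```python
from collections import Counter

docs = ["buy cheap viagra now", "meeting notes for the team", "buy now now"]

# Build a shared vocabulary, then represent each document as word counts
vocab = sorted({word for doc in docs for word in doc.split()})
vectors = [[Counter(doc.split())[word] for word in vocab] for doc in docs]
# Each document is now a fixed-length numeric vector a model can consume
```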
Overfit
Underfit
Regularization
Bias-Variance and No. of features
Approaches to evaluate performance of an algorithm
Help the machine to learn!
Feature creation
Selecting relevant features
Feature transformation
Feature scaling
Remember learning curves
Regularization
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Batch gradient descent for linear regression
m = length(y);                    % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    predictions = X * theta;                      % hypothesis h(x)
    errors = predictions - y;                     % prediction errors
    theta = theta - (alpha / m) * (X' * errors);  % simultaneous update
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);  % cost after update
end
end
Can use libraries to implement grad descent with given cost function
Powerful when used with a technique called the "kernel trick", based on similarity to arbitrary points (landmarks); it can classify complex shapes like the one shown on the right
(this uses an SVM with a Gaussian (RBF) kernel, which is beyond the scope of the lecture)
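The "similarity to landmarks" idea can be written as a Gaussian (RBF) similarity function; a minimal sketch (the points and the sigma value are my own assumptions):

```python
import math

def rbf_similarity(x, landmark, sigma=1.0):
    """Gaussian (RBF) similarity: 1 at the landmark, decaying with distance."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2 * sigma ** 2))

# A point at the landmark has similarity 1; far-away points approach 0
near = rbf_similarity((1.0, 1.0), (1.0, 1.0))
far = rbf_similarity((10.0, 10.0), (1.0, 1.0))
```

These similarities become new features, on which a linear classifier can separate classes whose original boundary is a complex shape.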
Advantages:
No need for feature normalization (will cover later)
Nonlinear relationships between parameters do not affect tree performance
Easy to explain to managers =)
Familiar concept for physicists.
Sources and acknowledgements
Andrew Ng's Machine Learning course on Coursera
Udacity's "Introduction to machine learning course"
http://napitupulu-jon.appspot.com/
http://jomit.blogspot.com/
Full list of sources available online
Thanks to Ivan Shpotenko for his support in creating this presentation; his advice shaped it.
Contacts
https://www.facebook.com/andriy.lazorenko
andriy.lazorenko@gmail.com
Skype: andriy.lazorenko
https://vk.com/andriy.lazorenko
050-449-51-61