Transcript of ML Pres
K-means clustering
Two steps in K-means
Cluster Assignment Step
Move Centroid Step
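The two steps above can be sketched in plain NumPy (a minimal illustration with made-up toy data and a fixed number of iterations, not a full implementation with convergence checks):

```python
import numpy as np

def kmeans(X, k, num_iters=10, seed=0):
    """Minimal k-means: alternate the two steps for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(num_iters):
        # Cluster Assignment Step: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move Centroid Step: move each centroid to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs of two points each
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centroids = kmeans(X, k=2)
```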
Linear regression and gradient descent
Introduction to machine learning
by Andriy Lazorenko
What is machine learning?
Data Analytics (large DBs)
Natural Language Processing
Case study: predictive sales
Support Vector Machine (SVM)
Train, cross-validation (C-V), and test sets
Dataset for training
Very often more data is better than algorithm fine-tuning
Principal Component Analysis
Create features to represent it
How well is your algorithm doing?
                Predicted ' + '    Predicted ' - '
Actual ' + '    true positive      false negative
Actual ' - '    false positive     true negative
Can we trust just accuracy?
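A tiny sketch of why accuracy alone can mislead on imbalanced data (the class counts here are made up): a classifier that always predicts '-' still scores high accuracy while catching none of the positives.

```python
# 5 positives, 95 negatives; the classifier always predicts '-'
actual    = ['+'] * 5 + ['-'] * 95
predicted = ['-'] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall: fraction of actual positives the classifier found
tp = sum(a == '+' and p == '+' for a, p in zip(actual, predicted))
fn = sum(a == '+' and p == '-' for a, p in zip(actual, predicted))
recall = tp / (tp + fn)

print(accuracy)  # 0.95 -- looks good
print(recall)    # 0.0  -- catches no positives at all
```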
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
Case study: e-mail spam filters
'Eigenfaces': PCA on face recognition
What data can help us answer our questions?
What data seems to be useful?
features into principal components
to use them as new features
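Projecting features onto principal components can be sketched with a plain SVD in NumPy (toy data with two correlated features, made up for illustration):

```python
import numpy as np

# Toy data: two strongly correlated features
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

# PCA via SVD of the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the first principal component to use it as a new feature
X_new = Xc @ Vt[0]

# Fraction of the total variance captured by the first component
explained = S[0] ** 2 / (S ** 2).sum()
```

Because the two features are correlated, the first component captures most of the variance, so one new feature can stand in for the original two.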
SVM and k-means clustering both rely on a decision boundary computed jointly across features: the result involves trade-offs between the x and y axes at once, so they are sensitive to feature scaling.
The decision tree, however, involves no such trade-off. Each split cuts along the x-axis or the y-axis, but never both, so every split depends on one feature alone, not on features jointly.
Linear regression follows a similar principle in that it ignores relationships between the scales of features: each feature gets its own coefficient, which adjusts automatically to that feature's scale. As a result, its predictions are unaffected by feature scaling.
Feature scaling is an important tool in machine learning: we normalize the range of each feature so that no single feature dominates the others.
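A minimal sketch of one common scaling scheme, min-max normalization, which rescales every feature to the [0, 1] range (the feature names and numbers are made up):

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150.0, 175.0, 200.0]  # made-up feature on one scale
weights_kg = [50.0, 75.0, 100.0]    # made-up feature on another scale

print(min_max_scale(heights_cm))  # [0.0, 0.5, 1.0]
print(min_max_scale(weights_kg))  # [0.0, 0.5, 1.0]
```

After scaling, both features span the same range, so neither dominates a distance-based algorithm such as SVM or k-means.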
How do we handle data?
Make the data understandable to a machine
Bias-Variance and No. of features
Approaches to evaluate performance of an algorithm
Help the machine to learn!
Selecting relevant features
Remember learning curves
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);                    % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        predictions = X * theta;      % hypothesis for all examples
        errors = predictions - y;
        theta = theta - (alpha / m) * (X' * errors);
        J_history(iter) = (errors' * errors) / (2 * m);  % record cost
    end
end
Libraries can be used to implement gradient descent with a given cost function
Powerful when used with a technique called a "kernel", based on similarity to arbitrary points (landmarks); it can classify complex shapes like the one shown to the right
(it uses an SVM with a Gaussian (RBF) kernel, which is beyond the scope of this lecture)
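The Gaussian (RBF) kernel itself is simple to sketch: it scores the similarity of a point to a landmark, returning 1 for identical points and falling toward 0 as they move apart (the points and sigma below are made up; the full SVM is out of scope here, as the slide notes):

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity between a point and a landmark: 1 if identical, -> 0 when far apart."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

landmark = np.array([0.0, 0.0])
near = gaussian_kernel(np.array([0.1, 0.0]), landmark)  # close to 1
far  = gaussian_kernel(np.array([5.0, 5.0]), landmark)  # close to 0
```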
No need for feature normalization (will cover later)
Nonlinear relationships between parameters do not affect tree performance
Easy to explain to managers =)
Familiar concept for physicists.
Sources and acknowledgements
Andrew Ng's Machine Learning course on Coursera
Udacity's "Introduction to machine learning course"
Full list of sources available online
Thanks to Ivan Shpotenko for support on creating this presentation - his advice shaped the presentation