An introduction to Machine Learning - SLB
WebValley 2012, Transacqua (TN), Italy.
by Giuseppe Jurman on 21 June 2016


Transcript of An introduction to Machine Learning - SLB

A quick intro to statistical machine learning
by Giuseppe Jurman (with contribs by Roberto Visintainer)
FBK-Trento
WebValley Intl 2016
Examples of (animal) learning

Bait shyness: a rat eats the bait, gets sick, and afterwards avoids that bait (learning, NOT "built-in" knowledge).

Pigeon superstition [Skinner, 1947]:
  • Food delivered at regular intervals
  • At delivery time, each pigeon pecks at some object
  • Then the pigeon spends more and more time pecking at the same object
  • This increases the chance that the next food spraying finds the bird in the same location
  • Finally, this wrongly reinforces the pigeon's association of the food with the object

Bait shyness revisited [Garcia & Koelling, 1966]: when eating the bait is followed by an electric shock rather than by sickness, the rat keeps eating the bait. "A priori" knowledge:
  • Food & sickness can be related
  • Food & electric shock cannot be related
Incorporation of a priori knowledge is inevitable for the success of the learning algorithm: the "No free lunch" theorem!
Inference
1) Guess a law (starting from part of the available data)
2) Compute the consequences (learn how to make predictions)
3) Verify the agreement with experiments (compare the obtained predictions against the remaining available data)
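A minimal sketch of this loop in Python, assuming scikit-learn and its bundled Iris data (the dataset used later in these slides); the choice of model is illustrative:

  # 1) guess a law from part of the data, 2) compute predictions, 3) verify on the rest
  from sklearn.datasets import load_iris
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import accuracy_score
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

  model = LogisticRegression(max_iter=1000)   # the "guessed law" comes from a hypothesis space
  model.fit(X_train, y_train)                 # 1) learn from part of the available data
  y_pred = model.predict(X_test)              # 2) compute the consequences (predictions)
  print(accuracy_score(y_test, y_pred))       # 3) verify against the remaining data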
Why machine learning?
  • Complexity
  • Adaptivity
  • Problem dimension
  • Problem diversity
"
Select among competing hypotheses that which makes the fewest assumptions
"
Father William of Ockham (d'Okham) - Lex Parsimoniae
Statistical machine learning
The art of developing algorithms and techniques that allow computers to learn, i.e. to gain understanding by constructing models of observed data through inference from samples, with the intention of using them for prediction.
Types of learning
  • Supervised - data has target: Regression, Classification
  • Unsupervised - data has NO target: Density estimation, Maximum likelihood, Class discovery, Clustering
  • Reinforcement - data has PARTIAL target: how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward (Markov processes, Tic-Tac-Toe, (Multi-)armed bandits)
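A small illustration of the supervised/unsupervised distinction, assuming scikit-learn; the same samples are used once with and once without their target:

  from sklearn.cluster import KMeans
  from sklearn.datasets import load_iris
  from sklearn.svm import SVC

  X, y = load_iris(return_X_y=True)

  # Supervised: the data has a target y, and the task is to predict it (classification)
  clf = SVC().fit(X, y)
  print("training accuracy:", clf.score(X, y))

  # Unsupervised: the target is ignored, and the task is to find structure (clustering / class discovery)
  km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
  print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])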
(Almost) formally... [binary classification]
Finding a function f: X -> {-1, +1} in a hypothesis space H that predictively approximates the unknown distribution W. Each example in the training set D is independently and identically distributed (i.i.d.) according to W over the universe X. To assess the goodness of f, a loss function L(y, f(x)) is used, e.g. the 0-1 loss, which is 1 when f(x) differs from y and 0 otherwise.
Risk & training error
Expected risk
Empirical risk
Training error
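The slide's own formulas are not reproduced in the transcript; in the usual notation they read (expected risk averaged over the unknown distribution W, empirical risk / training error averaged over the n examples of D):

  \[
  R(f) = \mathbb{E}_{(x,y)\sim W}\bigl[L(y, f(x))\bigr],
  \qquad
  R_{\mathrm{emp}}(f) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
  \]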
Minimizing the training error is prone to overfitting and results in poor predictive performance.
Solution: add an inductive bias, restricting H.
The bias-variance dilemma
The generalization error can be decomposed into 3 parts:
  • Bias: the loss of the main prediction relative to the optimal prediction - depends on (the capacity of) H
  • Variance: the average loss of the predictions relative to the main prediction - depends on D
  • Noise: the remaining loss that cannot be eliminated - unpredictable
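For the squared loss this decomposition has a standard closed form (stated here for concreteness; the slide keeps the loss generic), with f_D the model learned from D, f* the optimal prediction, and sigma^2 the irreducible noise:

  \[
  \mathbb{E}\bigl[(y - f_D(x))^2\bigr]
  = \underbrace{\bigl(\mathbb{E}_D[f_D(x)] - f^*(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\bigl[\bigl(f_D(x) - \mathbb{E}_D[f_D(x)]\bigr)^2\bigr]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
  \]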
The capacity h(H) of H is a measure of its size, or complexity: it is the largest n such that there exists a set of n examples D_n for which one can always find an f in H that gives the correct answer on all examples in D_n, for any possible labeling (e.g., for the set of linear functions y = w · x + b in d dimensions, the capacity is d + 1: in the plane, 3 points in general position can be labeled correctly by a line in every possible way, but some labelings of 4 points cannot).
PREDICTIVITY
Regularization: a compromise between the complexity of the solution and the error on the training set.
In the setting n << d, beware the curse of dimensionality: the volume of the space is so large that the samples become sparse, jeopardizing the statistical significance of the methods.
No learning algorithm can work for all distributions.
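A sketch of the n << d regime with scikit-learn; the synthetic data and the choice of ridge regression as the regularizer are illustrative assumptions, not the slide's own example:

  import numpy as np
  from sklearn.linear_model import LinearRegression, Ridge

  rng = np.random.default_rng(0)
  n, d = 30, 300                                  # far fewer samples than dimensions
  w_true = np.zeros(d)
  w_true[:5] = 1.0                                # only 5 informative features
  X = rng.normal(size=(n, d))
  y = X @ w_true + 0.1 * rng.normal(size=n)
  X_new = rng.normal(size=(1000, d))              # fresh samples from the same distribution
  y_new = X_new @ w_true + 0.1 * rng.normal(size=1000)

  for name, model in [("unregularized least squares", LinearRegression()),
                      ("ridge, alpha=1.0", Ridge(alpha=1.0))]:
      model.fit(X, y)
      print(name, "-> R^2 on unseen data:", round(model.score(X_new, y_new), 3))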
Examples of classifiers
  • k-nearest neighbour
  • Support Vector Machine
  • Decision Tree
  • Artificial Neural Network
  • Linear Discriminant Analysis
  • Gaussian Mixture
The IRIS flower dataset [Fisher, 1936]
  • 50 samples from each of three species of Iris: Iris setosa, Iris virginica, Iris versicolor
  • 4 features: length and width of the sepals and petals
[Decision regions on the Iris data: Linear discriminant analysis, Decision tree, Gaussian mixture model, Support Vector Machine]
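A sketch of how three of the pictured models could be fit to the Iris data with scikit-learn (the Gaussian mixture is left out because scikit-learn treats it as an unsupervised estimator); the 5-fold evaluation is an illustrative choice:

  from sklearn.datasets import load_iris
  from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
  from sklearn.model_selection import cross_val_score
  from sklearn.svm import SVC
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  models = {
      "Linear discriminant analysis": LinearDiscriminantAnalysis(),
      "Decision tree": DecisionTreeClassifier(random_state=0),
      "Support Vector Machine": SVC(kernel="rbf"),
  }
  for name, model in models.items():
      scores = cross_val_score(model, X, y, cv=5)   # accuracy over 5 folds
      print(f"{name}: mean accuracy {scores.mean():.3f}")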
Two key topics... at least to mention
  • Model selection - tuning parameters and choosing the best model: K-fold cross validation, Bagging/Boosting
  • Feature importance (not the same for all variables): Feature selection, Feature ranking
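A sketch of both topics with scikit-learn on the Iris data: K-fold cross-validation to tune one SVM parameter, plus a simple univariate feature ranking; the grid of C values and the F-test criterion are illustrative choices:

  from sklearn.datasets import load_iris
  from sklearn.feature_selection import f_classif
  from sklearn.model_selection import GridSearchCV, KFold
  from sklearn.svm import SVC

  X, y = load_iris(return_X_y=True)
  feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

  # Model selection: choose the regularization parameter C by 5-fold cross-validation
  search = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": [0.1, 1, 10, 100]},
                        cv=KFold(n_splits=5, shuffle=True, random_state=0))
  search.fit(X, y)
  print("best C:", search.best_params_["C"], "- CV accuracy:", round(search.best_score_, 3))

  # Feature ranking: not all variables are equally important (here scored by a univariate F-test)
  F, _ = f_classif(X, y)
  for name, score in sorted(zip(feature_names, F), key=lambda t: -t[1]):
      print(f"{name}: F = {score:.1f}")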