CNN details
by Grzegorz Gwardys, 17 January 2016


Transcript of CNN details

Convolutional Neural Network
Examples
Convolutional Neural Network

more layers, different dims
GPU implementations
more data
Equations
general activation:
Training - neuron layers
Training - convolution and subsampling
common for both
Perceptron
Frank Rosenblatt
Multi-Layer Perceptron
XOR Problem
Auto-Associative Neural Networks
When training the AANN to reconstruct the input patterns, it will actually learn a mapping into a lower-dimensional space and the respective inverse mapping. Thus, the hidden layer learns a compact representation of the data, and this technique can be applied to data compression. If the neurons' activation functions are linear, it has been shown by Baldi and Hornik that the projection performed by this type of NN is equivalent to a PCA of the input data's covariance matrix, and the weight vectors of the output neurons correspond to its leading eigenvectors.
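
A minimal NumPy sketch of this idea (not from the slides; the data, sizes and names are illustrative): a linear auto-associative network trained to reconstruct its input learns the same subspace as the leading principal components.

import numpy as np

# Toy data: 200 centred samples in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X -= X.mean(axis=0)

n_hidden = 2                                        # size of the bottleneck (hidden) layer
W_enc = rng.normal(scale=0.1, size=(5, n_hidden))   # input -> hidden weights
W_dec = rng.normal(scale=0.1, size=(n_hidden, 5))   # hidden -> output weights
lr = 0.01

for epoch in range(2000):
    H = X @ W_enc                     # linear hidden layer: the compact representation
    X_hat = H @ W_dec                 # linear reconstruction of the input
    err = X_hat - X                   # reconstruction error
    grad_dec = H.T @ err / len(X)     # gradients of the mean squared reconstruction error
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# The leading eigenvectors of the data covariance (PCA) span the subspace
# that the rows of W_dec approach as training converges.
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
pca_basis = eigvecs[:, -n_hidden:]
print("reconstruction MSE:", np.mean(err ** 2))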
sigmoid:
linear:
hyperbolic tangent:
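
The formulas behind these labels did not survive the transcript; the standard definitions presumably meant are (with net input z = sum_i w_i * x_i + b and output o = f(z)):

sigmoid:            f(z) = 1 / (1 + exp(-z))
linear:             f(z) = z
hyperbolic tangent: f(z) = tanh(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z))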
Training Neural Networks - before the main course
Online training:
After presentation of each training example, the error is calculated, and the weights are updated accordingly.



Offline training:
The whole training set is propagated through the NN, and the respective errors are accumulated. Finally, the weights are updated using the accumulated error. This is also called batch training.
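
A hedged sketch of the two regimes, plus the mini-batch hybrid mentioned later in the slides, on a toy linear model (the model, data and batch size of 16 are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])          # toy regression targets
w = np.zeros(3)
lr = 0.01

def grad(w, xb, yb):
    # Gradient of the mean squared error of a linear model on a chunk of data.
    return xb.T @ (xb @ w - yb) / len(xb)

# Online training: update the weights after every single example.
for xi, yi in zip(X, y):
    w -= lr * grad(w, xi[None, :], np.array([yi]))

# Offline (batch) training: one update from the error accumulated over the whole set.
w -= lr * grad(w, X, y)

# Mini-batch training: update after each small chunk of the data.
for start in range(0, len(X), 16):
    w -= lr * grad(w, X[start:start + 16], y[start:start + 16])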

Back Propagation algorithm
Does not guarantee to find a global minimum, which is an inherent problem of gradient descent optimization. The combination of weights leading to a minimum of the error function E is considered to be a solution of the learning problem. In order to calculate the gradient of E at each iteration, the error function has to be continuous and differentiable.
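
The update rule implied here is plain gradient descent on E (eta is the learning rate; the symbol is the usual convention, not visible in the transcript):

w_new = w_old - eta * dE/dw

Following the negative gradient only descends into the nearest minimum, which is why only a local minimum is guaranteed.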
Training Neural Networks - details
Training Neural Networks - BP Algorithm
Differences to Lenet-5
Motivation
We have shown that a max pooling operation is vastly superior for capturing invariances in image-like data, compared to a subsampling operation.
Max pooling
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
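
A minimal NumPy sketch of non-overlapping 2x2 max pooling (the window size is an assumption, chosen only to illustrate the operation):

import numpy as np

def max_pool_2x2(fmap):
    # Keep only the strongest activation in every non-overlapping 2x2 block.
    h, w = fmap.shape
    fmap = fmap[:h - h % 2, :w - w % 2]      # drop an odd last row/column if present
    blocks = fmap.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))
# [[ 5.  7.]
#  [13. 15.]]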
ReLU - Rectifier Linear Units
Dropout
reduce over-fitting

For each forward pass and each back propagation, randomly 'turn off' each neuron with probability 0.5 (activation = 0).
faster training
even better results in accuracy
other variants such as noisy ReLUs and leaky ReLUs (see Wikipedia)
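
A hedged sketch of both operations (the layer shape is an assumption; the 0.5 drop probability follows the slide, and the test-time behaviour is the standard convention, not stated here):

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectifier: pass positive pre-activations through, clamp negatives to zero.
    return np.maximum(z, 0.0)

def dropout(a, p_drop=0.5, training=True):
    # Training: 'turn off' each neuron with probability p_drop (activation = 0).
    if not training:
        return a                              # at test time every neuron stays on
    mask = rng.random(a.shape) >= p_drop
    return a * mask

z = rng.normal(size=(4, 8))                   # pre-activations of a small layer
h = dropout(relu(z), p_drop=0.5)
# (The original dropout recipe also rescales activations at test time; omitted here for brevity.)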
Momentum
Weight decay
all weights and biases of NN
L2 regularization (there are others)
small positive constant
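
Written out, the two tricks take their usual forms (mu, eta and lambda are the conventional symbols; they are not legible in the transcript):

momentum:     v_new = mu * v_old - eta * dE/dw,    w_new = w_old + v_new
weight decay: E_total = E + (lambda / 2) * sum(w^2)
              (L2 regularization over all weights and biases of the NN; lambda is the small positive constant)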
Cross Validation
A common technique used to improve the generalization capacity of a NN is called cross-validation. Here, the training set is divided into two disjoint parts: one used for the actual training and the other for validation, i.e. to verify how well the NN performs on unknown data.
overfitting
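
A minimal sketch of such a split (the 80/20 proportion and the data are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# Divide the training set into two disjoint parts.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_val, y_val = X[idx[split:]], y[idx[split:]]

# During training, watch the error on (X_val, y_val): when it starts rising
# while the training error keeps falling, the network is overfitting.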
Heaviside function
scaled version of sigmoid
one neuron means one separation line
?
a lot of data
w1x1 + w2x2 + w3x3 + b = Ax + By + Cz + D = 0
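
A minimal sketch of one neuron as one separating hyperplane (the hand-picked weights implement logical AND and are purely illustrative):

import numpy as np

def heaviside(z):
    # Step activation of the original perceptron.
    return (z >= 0).astype(float)

def perceptron(x, w, b):
    # One neuron = one separating hyperplane  w.x + b = 0.
    return heaviside(x @ w + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w = np.array([1.0, 1.0])
b = -1.5                                      # hand-picked so the neuron computes AND
print(perceptron(X, w, b))                    # [0. 0. 0. 1.]
# No single line separates XOR's classes, which is what motivates the multi-layer perceptron.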

various data

What is this?
update to "observable"
w
kj
k
j
z
pj
local gradient
o
pk
update to "hidden"
w
kj
k
j
z
pj
o
pk
w
ji
i
local gradient for neuron j
1
x
pi
previous local gradient
w
kj
k
j
z
pj
o
pk
w
ji
i
x
pi
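
Reading the diagram labels as the standard backpropagation rules (this mapping is an assumption; eta is the learning rate):

output ("observable") weights:
    delta_pk = local gradient at output neuron k (from the output o_pk and the target)
    w_kj_new = w_kj + eta * delta_pk * z_pj

hidden weights:
    delta_pj = f'(z_pj) * sum_k ( delta_pk * w_kj )    (local gradient for neuron j,
                                                        built from the previous local gradients)
    w_ji_new = w_ji + eta * delta_pj * x_pi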
Conv-Full Connection

Presentation is mainly based on:
http://lmb.informatik.uni-freiburg.de/papers/download/du_diss.pdf
http://rogerioferis.com/VisualRecognitionAndSearch2014/material/presentations/GuangnanAndMajaDeepLearning.pdf
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxsc3ZydHV0b3JpYWxjdnByMTR8Z3g6Njg5MmZkZTM1MDhhZWNmZA
Mini-batch: in practice a hybrid approach is used (part of the dataset is propagated through the NN before each weight update)
Training Neural Networks - before details
Training Neural Networks - chain rule
z - y
w(u=0,v=0):  -1*d(0,0) - 2*d(1,0) ...
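
In general, the chain rule gives the gradient of E with respect to one shared filter weight as a sum over all output positions that use it (stated here as an assumption about what the worked example shows; the exact indexing depends on the convolution/correlation convention):

dE/dw(u,v) = sum over output positions (x,y) of  d(x,y) * input(x+u, y+v)

where d(x,y) is the local gradient at output position (x,y), and at the output layer the local gradient is derived from z - y.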
But ...
Stochastic Pooling
The size of the filters has to match the size/scale of the patterns we want to detect (task dependent)
A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity

Solution:
- connect each hidden unit to a small patch of the input
- share the weights across space (sketched below)
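
A minimal NumPy sketch of those two ideas (the 3x3 filter size and image size are assumptions): every output unit is connected only to a small patch, and one shared set of weights is slid across the whole image.

import numpy as np

def conv2d_valid(image, kernel):
    # Each output unit sees only a kh x kw patch of the input,
    # and the same kernel (shared weights) is reused at every position.
    kh, kw = kernel.shape
    H = image.shape[0] - kh + 1
    W = image.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))              # one shared 3x3 filter instead of a full dense weight matrix
print(conv2d_valid(image, kernel).shape)      # (6, 6)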

Weight updates:
Filter 14 seems to pick up vibrato singing.
Filter 242 picks up some kind of ringing ambience.
Filter 250 picks up vocal thirds, i.e. multiple singers singing the same thing.
Filter 253 picks up various types of bass drum sounds.

Spotify ...