CNN details
Convolutional Neural Network
more layers, different dimensions
Training - neuron layers
Training - convolution and subsampling
common for both
Auto-Associative Neural Networks
When the AANN is trained to reconstruct the input patterns, it actually learns a mapping into a lower-dimensional space and the respective inverse mapping. Thus, the hidden layer learns a compact representation of the data, and this technique can be applied to data compression. If the neurons' activation functions are linear, it has been shown by Baldi and Hornik that the projection performed by this type of NN is equivalent to a PCA of the input data's covariance matrix, and the weight vectors of the output neurons correspond to its leading eigenvectors.
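As an illustration (not from the slides), a minimal numpy sketch of such a linear AANN, trained by gradient descent and compared against the covariance matrix's leading eigenvectors; the data, sizes, and learning rate are made up:

```python
import numpy as np

# Sketch: a linear auto-associative NN (autoencoder) trained to reconstruct
# its input. With linear activations, the learned projection spans the same
# subspace as the leading principal components (Baldi & Hornik).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X = X - X.mean(axis=0)                       # center the data

k = 3                                         # hidden layer size (bottleneck)
W_enc = rng.normal(scale=0.1, size=(10, k))   # input -> hidden weights
W_dec = rng.normal(scale=0.1, size=(k, 10))   # hidden -> output weights
lr = 0.01

for _ in range(2000):
    H = X @ W_enc                 # hidden (compact) representation
    X_hat = H @ W_dec             # reconstruction of the input
    err = X_hat - X
    # gradient descent on the squared reconstruction error
    W_dec -= lr * (H.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

# Leading eigenvectors of the covariance matrix (PCA) for comparison
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
pcs = eigvecs[:, -k:]             # top-k principal directions
```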
Training Neural Networks - before the main course
After the presentation of each training example, the error is calculated and the weights are updated accordingly. This is called online (stochastic) training.
The whole training set is propagated through the NN, and the respective errors are accumulated. Finally, the weights are updated using the accumulated error. This is also called batch training.
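As an illustration (not from the slides), a minimal Python sketch of the two schemes for a single linear neuron; the data and learning rate are made up:

```python
import numpy as np

# Sketch of the two update schemes for a single linear neuron y = w.x
# with squared error; data and learning rate are illustrative.
rng = np.random.default_rng(1)
X, t = rng.normal(size=(100, 5)), rng.normal(size=100)
w, lr = np.zeros(5), 0.01

# Online (stochastic) training: update after every example
for x_i, t_i in zip(X, t):
    grad = (w @ x_i - t_i) * x_i
    w -= lr * grad

# Batch training: accumulate the error over the whole set, then update once
grad = sum((w @ x_i - t_i) * x_i for x_i, t_i in zip(X, t)) / len(X)
w -= lr * grad
```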
Backpropagation algorithm
does not guarantee finding a global minimum,
an inherent problem of gradient descent optimization.
The combination of weights leading to a minimum of the error function E is considered a solution of the learning problem.
In order to calculate the gradient of E at each iteration, the error function has to be continuous and differentiable.
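For reference, the gradient descent weight update this enables is the standard rule (the notation is assumed, not taken from the slides):

\[ \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}} \]

where \( \eta \) is the learning rate.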
Training Neural Networks - details
Training Neural Networks - BP Algorithm
Differences from LeNet-5
We have shown that a max pooling operation is vastly superior for capturing invariances in image-like data, compared to a subsampling operation.
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
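A small numpy sketch contrasting the two operations on a toy 4x4 feature map (values illustrative):

```python
import numpy as np

# Sketch contrasting 2x2 max pooling with 2x2 average subsampling
# on a toy feature map; shapes and values are made up.
fmap = np.arange(16, dtype=float).reshape(4, 4)

blocks = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)   # non-overlapping 2x2 tiles
max_pooled = blocks.max(axis=(2, 3))               # keeps the strongest response
subsampled = blocks.mean(axis=(2, 3))              # averages it away
```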
ReLU - Rectified Linear Units
Dropout: for each forward pass and each backpropagation, randomly 'turn off' each neuron with probability 0.5 (activation = 0).
This gives even better accuracy.
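A minimal sketch of this dropout scheme, assuming a made-up layer of 64 activations:

```python
import numpy as np

# Sketch of dropout as described above: on each pass, each neuron's
# activation is zeroed with probability 0.5. The layer size is made up.
rng = np.random.default_rng(2)
activations = rng.normal(size=64)        # hidden layer activations

p = 0.5
mask = rng.random(64) >= p               # True = neuron stays on
dropped = activations * mask             # 'turned off' neurons output 0
# At test time no neurons are dropped; activations are scaled by (1 - p)
test_time = activations * (1 - p)
```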
other variants exist, such as noisy ReLUs and leaky ReLUs (see Wikipedia)
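A quick sketch of the rectifier and its leaky variant; the 0.01 slope is a common default, not taken from the slides:

```python
import numpy as np

def relu(x):
    # standard rectifier: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # leaky variant: small non-zero slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-2, 2, 5)
print(relu(x), leaky_relu(x))
```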
all weights and biases of NN
L2 regularization (there are others)
small positive constant
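These fragments presumably label the parts of the usual L2 penalty; a standard reconstruction (not verbatim from the slides) is:

\[ E_{\text{reg}} = E + \frac{\lambda}{2} \sum_{w} w^2 \]

where the sum runs over all weights and biases of the NN and \( \lambda \) is the small positive constant.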
A common technique used to improve the generalization capacity of a NN is called cross-validation. Here, the training set is divided into two disjoint parts.
One is used for the actual training and the other for validation, i.e. to verify how well the NN performs on unseen data.
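A minimal sketch of such a split, assuming an 80/20 ratio (the ratio is not from the slides):

```python
import numpy as np

# Sketch of the split described above: one part for training, one for
# validation. The data and the 80/20 ratio are assumptions.
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))

idx = rng.permutation(len(X))               # shuffle example indices
cut = int(0.8 * len(X))
train_idx, val_idx = idx[:cut], idx[cut:]
X_train, X_val = X[train_idx], X[val_idx]   # disjoint parts
```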
scaled version of sigmoid
one neuron means one separation line
a lot of data
w1x1 + w2x2 + w3x3 + b = 0 has the same form as the plane equation Ax + By + Cz + D = 0: a single neuron defines a separating hyperplane.
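A tiny sketch of this view of a single neuron; the weights and the test point are made up:

```python
import numpy as np

# Sketch: a single neuron's decision boundary w.x + b = 0 is a
# (hyper)plane; the weights and the test point are illustrative.
w = np.array([1.0, -2.0, 0.5])   # w1, w2, w3  (A, B, C)
b = 0.3                           # bias        (D)

x = np.array([0.2, 0.1, -0.4])
side = np.sign(w @ x + b)         # which side of the plane x lies on
```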
What is this?
update to "observable"
update to "hidden"
local gradient for neuron j
previous local gradient
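For reference, the standard backpropagation recursion these labels refer to (notation assumed, not verbatim from the slides):

\[ \delta_j = \varphi'(v_j) \sum_k \delta_k \, w_{kj} \]

where \( \delta_j \) is the local gradient for neuron j, \( v_j \) its net input, and the \( \delta_k \) are the local gradients already computed for the following layer (the 'previous' ones in the backward pass).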
Presentation is mainly based on:
Mini-batch: in practice, a hybrid approach is used, where only a part of the dataset is propagated through the NN before each weight update.
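A minimal sketch of this mini-batch scheme for the same single-neuron setup as above; the batch size of 32 is an assumption:

```python
import numpy as np

# Sketch of the mini-batch hybrid: update after each slice of the data.
rng = np.random.default_rng(4)
X, t = rng.normal(size=(100, 5)), rng.normal(size=100)
w, lr, bs = np.zeros(5), 0.01, 32

for start in range(0, len(X), bs):
    Xb, tb = X[start:start + bs], t[start:start + bs]
    grad = Xb.T @ (Xb @ w - tb) / len(Xb)   # gradient over the mini-batch only
    w -= lr * grad
```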
Training Neural Networks - before details
Training Neural Networks - chain rule
The size of the filters has to match the size/scale of the patterns we want to detect (task dependent).
A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity
The convolutional remedy (see the sketch below):
- connect each hidden unit to a small patch of the input
- share the weights across space
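A minimal numpy sketch of these two ideas, local connectivity and weight sharing; the image and filter sizes are made up:

```python
import numpy as np

# Sketch: each output unit sees only a small patch of the input, and
# the same weights (the filter) are reused across space.
rng = np.random.default_rng(5)
image = rng.normal(size=(8, 8))
filt = rng.normal(size=(3, 3))    # one shared weight patch

out = np.zeros((6, 6))            # valid (cross-correlation) output
for i in range(6):
    for j in range(6):
        patch = image[i:i + 3, j:j + 3]      # local connectivity
        out[i, j] = np.sum(patch * filt)     # shared weights across space
```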
Filter 14 seems to pick up
Filter 242 picks up some kind of
Filter 250 picks up vocal thirds, i.e. multiple singers singing the same thing.
Filter 253 picks up various types of bass drum sounds.