**Convolutional Neural Network**

**Examples**

**Convolutional Neural Network**

more layers, different dimensions

GPU implementations

more data

Equations

general activation:
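The formula itself did not survive; presumably it is the standard weighted-sum form. A sketch in assumed notation, where $f$ is the nonlinearity, $w_{ji}$ the weights, and $b_j$ the bias of neuron $j$:

$$z_j = f\Big(\sum_i w_{ji}\, x_i + b_j\Big)$$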

Training - neuron layers

Training - convolution and subsampling

common for both

Perceptron

Frank Rosenblatt

Multi-Layer Perceptron

XOR Problem


Auto-Associative Neural Networks

When training the AANN to reconstruct the input patterns, it actually learns a mapping into a lower-dimensional space together with the respective inverse mapping. Thus, the hidden layer learns a compact representation of the data, and this technique can be applied to data compression. If the neurons' activation functions are linear, it has been shown by Baldi and Hornik that the projection performed by this type of NN is equivalent to a PCA of the input data's covariance matrix, and the weight vectors of the output neurons correspond to its leading eigenvectors.
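A minimal sketch of such a linear auto-associative network, trained by gradient descent on the reconstruction error; all sizes and hyperparameters are illustrative assumptions, not from the slides:

```python
import numpy as np

# Minimal sketch of a linear auto-associative network (autoencoder);
# data, sizes, and learning rate are illustrative.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))      # 500 patterns, 10 dimensions
X -= X.mean(axis=0)                 # center the data

d_hidden = 3                        # bottleneck: the compact representation
W_enc = rng.normal(scale=0.1, size=(10, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, 10))
lr = 0.01

for epoch in range(200):
    H = X @ W_enc                   # mapping into the lower-dimensional space
    X_hat = H @ W_dec               # the respective inverse mapping
    err = X_hat - X                 # reconstruction error
    # gradient descent on the squared reconstruction error
    W_dec -= lr * (H.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

# With linear activations, the learned projection spans the same subspace
# as the leading eigenvectors of the covariance matrix (Baldi & Hornik).
```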

sigmoid: $f(x) = \dfrac{1}{1 + e^{-x}}$

linear: $f(x) = x$

hyperbolic tangent: $f(x) = \tanh(x)$

Training Neural Networks - before the main course

Online training:

After presentation of each training example, the error is calculated, and the weights are updated accordingly.

Offline training:

The whole training set is propagated through the NN, and the respective errors are accumulated. Finally, the weights are updated using the accumulated error. This is also called batch training; both regimes are sketched below.
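A minimal sketch contrasting the two regimes on a single linear neuron; the toy data and learning rate are illustrative assumptions:

```python
import numpy as np

# Online vs. offline (batch) training for one linear neuron.

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])       # toy targets
lr = 0.01

# Online training: update the weights after every single example.
w = np.zeros(3)
for x_i, y_i in zip(X, y):
    err = x_i @ w - y_i
    w -= lr * err * x_i

# Offline (batch) training: accumulate the error over the whole
# training set, then apply one update with the accumulated gradient.
w = np.zeros(3)
for epoch in range(100):
    grad = X.T @ (X @ w - y) / len(X)
    w -= lr * grad
```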

Back Propagation algorithm

Backpropagation does not guarantee finding a global minimum, which is an inherent problem of gradient descent optimization.

The combination of weights leading to a minimum of the error function E is considered to be a solution of the learning problem.

In order to calculate the gradient of E at each iteration, the error function has to be continuous and differentiable.

Training Neural Networks - details

Training Neural Networks - BP Algorithm

**Differences to LeNet-5**

**Motivation**

We have shown that a max pooling operation is vastly superior for capturing invariances in image-like data, compared to a subsampling operation.

**Max Pooling**

Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
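A minimal sketch of max pooling over non-overlapping 2x2 windows; the window size and single-channel input are illustrative assumptions:

```python
import numpy as np

# Max pooling: each output value is the maximum of a 2x2 input window.

def max_pool_2x2(fmap):
    h, w = fmap.shape
    assert h % 2 == 0 and w % 2 == 0
    # group each non-overlapping 2x2 window, then take its maximum
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))   # 2x2 map of window maxima
```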

**ReLU - Rectified Linear Units**

**Dropout**

reduces over-fitting

for each forward pass and each backpropagation, randomly "turn off" each neuron with probability 0.5, i.e. set its activation to 0 (sketched below)

faster training

even better accuracy results
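A minimal sketch of such a dropout mask; the function name and shapes are illustrative:

```python
import numpy as np

# Dropout: each neuron is 'turned off' (activation = 0) independently
# with probability p during training.

rng = np.random.default_rng(2)

def dropout(activations, p=0.5):
    keep = rng.random(activations.shape) >= p   # keep with probability 1 - p
    return activations * keep                   # dropped neurons output 0

a = rng.normal(size=(6,))
print(dropout(a))    # roughly half the activations are zeroed
```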

other variants exist, such as noisy ReLUs and leaky ReLUs (see Wikipedia)
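For reference, the standard definitions (the 0.01 leak slope is a common default, not from the slides):

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ 0.01\,x, & x \le 0 \end{cases}$$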

Momentum
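The momentum update, in assumed notation with learning rate $\eta$ and momentum coefficient $\alpha$:

$$\Delta w(t) = -\eta\, \nabla E(t) + \alpha\, \Delta w(t-1)$$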

Weight decay

applied to all weights and biases of the NN

L2 regularization (there are others)

uses a small positive constant
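In assumed notation, the L2-regularized error with a small positive constant $\lambda$, summed over all weights and biases $w$:

$$E_{\text{reg}} = E + \frac{\lambda}{2} \sum_{w} w^2$$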

Cross Validation

A common technique used to improve the generalization capacity of a NN is called cross-validation. Here, the training set is divided into two disjoint parts: one used for the actual training and the other for validation, i.e. to verify how well the NN performs on unknown data.

When the error on the validation part starts to rise while the training error keeps falling, the NN is overfitting.
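A minimal sketch of such a disjoint split; the 80/20 ratio and the toy data are illustrative assumptions:

```python
import numpy as np

# Cross-validation split: hold out part of the training set to
# monitor how well the NN performs on unknown data.

rng = np.random.default_rng(5)
data = rng.normal(size=(100, 4))          # toy training set

idx = rng.permutation(len(data))          # shuffle before splitting
n_train = int(0.8 * len(data))
train, valid = data[idx[:n_train]], data[idx[n_train:]]

# Train only on `train`; after each epoch, measure the error on `valid`
# and stop once it starts to rise: the onset of overfitting.
```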

Heaviside function

a scaled version of the sigmoid

one neuron means one separating line


a lot of data

$w_1 x_1 + w_2 x_2 + w_3 x_3 + b = A x + B y + C z + D = 0$

various data

What is this?

update to "observable"

w

kj

k

j

z

pj

local gradient

o

pk

update to "hidden"

w

kj

k

j

z

pj

o

pk

w

ji

i

local gradient for neuron j

1

x

pi

previous local gradient
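A minimal sketch of these two update rules for a one-hidden-layer sigmoid network with a squared-error loss; the layer sizes, data, and learning rate are illustrative assumptions:

```python
import numpy as np

# Backpropagation for one hidden layer with sigmoid activations,
# implementing the two update rules above.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
x = rng.normal(size=5)                       # input pattern x_pi
t = np.array([0.0, 1.0])                     # targets for the two outputs
W_ji = rng.normal(scale=0.5, size=(3, 5))    # input -> hidden
W_kj = rng.normal(scale=0.5, size=(2, 3))    # hidden -> output
eta = 0.1

# forward pass
z = sigmoid(W_ji @ x)                        # hidden outputs z_pj
o = sigmoid(W_kj @ z)                        # network outputs o_pk

# local gradients of the output neurons (squared-error loss assumed)
delta_k = (t - o) * o * (1.0 - o)
# local gradients of the hidden neurons: previous local gradients
# propagated back through the weights w_kj
delta_j = (W_kj.T @ delta_k) * z * (1.0 - z)

# weight updates: eta * local gradient * input of the respective layer
W_kj += eta * np.outer(delta_k, z)
W_ji += eta * np.outer(delta_j, x)
```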


Conv-Full Connection

**Presentation is mainly based on:**

http://lmb.informatik.uni-freiburg.de/papers/download/du_diss.pdf

http://rogerioferis.com/VisualRecognitionAndSearch2014/material/presentations/GuangnanAndMajaDeepLearning.pdf

https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxsc3ZydHV0b3JpYWxjdnByMTR8Z3g6Njg5MmZkZTM1MDhhZWNmZA

Mini-batch: in practice, a hybrid approach is used (a part of the dataset is propagated through the NN before each weight update)
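A self-contained sketch of this hybrid regime; the batch size of 32 and the toy data are illustrative assumptions:

```python
import numpy as np

# Mini-batch training: one weight update per subset of the data.

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])           # toy targets
w = np.zeros(3)
lr = 0.01

for epoch in range(50):
    for s in range(0, len(X), 32):           # batches of 32 examples
        xb, yb = X[s:s + 32], y[s:s + 32]
        w -= lr * xb.T @ (xb @ w - yb) / len(xb)   # update per mini-batch
```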


Training Neural Networks - before details

Training Neural Networks - chain rule

$z - y$

$w(u{=}0, v{=}0)$: $-1 \cdot d(0,0) - 2 \cdot d(1,0)\,\dots$

But ...

Stochastic Pooling

The size of the filters has to match the size/scale of the patterns we want to detect (task dependent)

A standard neural net applied to images:

- scales quadratically with the size of the input

- does not leverage stationarity

Solution:

- connect each hidden unit to a small patch of the input

- share the weights across space (see the sketch below)
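A sketch showing that these two ideas, local patches plus shared weights, amount to a valid convolution; the filter and image values are illustrative:

```python
import numpy as np

# Each output unit is connected to a small input patch, and every
# output unit reuses the same weights: exactly a (valid) convolution.

def conv2d_valid(img, filt):
    fh, fw = filt.shape
    oh = img.shape[0] - fh + 1
    ow = img.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # same shared weights `filt` applied at every location
            out[r, c] = np.sum(img[r:r + fh, c:c + fw] * filt)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
filt = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_valid(img, filt))   # (4, 4) feature map
```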

Weight updates:

Filter 14 seems to pick up vibrato singing.

Filter 242 picks up some kind of ringing ambience.

Filter 250 picks up vocal thirds, i.e. multiple singers singing the same thing.

Filter 253 picks up various types of bass drum sounds.

Spotify ...