Transcript of ImageNet Classification with Deep Convolutional Neural Networks
Local Response Normalization
This local normalization scheme aids generalization.
Overlapping Pooling
Pooling layers in CNNs summarize the outputs of neighboring groups of neurons in the same kernel map.
Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap.
Models with overlapping pooling are slightly more difficult to overfit.
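To make the overlap concrete, here is a minimal 1-D sketch in NumPy (an illustration, not the authors' code): with stride s and pooling window size z, adjacent windows overlap exactly when z > s; the paper uses s = 2, z = 3.

```python
import numpy as np

def max_pool_1d(x, size, stride):
    """Max-pool a 1-D signal; pooling windows overlap when size > stride."""
    out = []
    for start in range(0, len(x) - size + 1, stride):
        out.append(x[start:start + size].max())
    return np.array(out)

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0, 7.0])
print(max_pool_1d(x, size=2, stride=2))  # non-overlapping (z=2, s=2): [3. 5. 6. 7.]
print(max_pool_1d(x, size=3, stride=2))  # overlapping (z=3, s=2), as in the paper: [3. 5. 6.]
```

In 2-D the same idea applies along both spatial axes.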
ImageNet Classification with Deep Convolutional Neural Networks
in NIPS 2012
Authors: A. Krizhevsky, I. Sutskever, G. E. Hinton
Presented by: Amir Shahroudy
Over 15 million labeled high-resolution images
Roughly 22,000 categories
Collected from the web
Labeled by humans using Amazon’s Mechanical Turk crowd-sourcing tool
Annual competition called the
ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
Uses a subset of ImageNet with
roughly 1000 images in each of 1000 categories.
1.2 million training images,
50,000 validation images,
and 150,000 testing images.
ImageNet consists of variable-resolution images, while the network requires a constant input dimensionality.
The images are therefore down-sampled to a fixed resolution of 256x256: each image is rescaled so that its shorter side has length 256, and the central 256x256 patch is cropped out.
The mean activity over the training set is then subtracted from each pixel, so the network is trained on the (centered) raw RGB values of the pixels.
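A sketch of this preprocessing, assuming Pillow and NumPy (the function and variable names are hypothetical; `mean_image` is the per-pixel mean precomputed over the training set):

```python
import numpy as np
from PIL import Image

def preprocess(path, mean_image, size=256):
    """Rescale the shorter side to `size`, center-crop, subtract the per-pixel mean."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = size / min(w, h)                               # shorter side -> 256
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    img = img.crop((left, top, left + size, top + size))   # central 256x256 patch
    x = np.asarray(img, dtype=np.float32)                  # (256, 256, 3)
    return x - mean_image                                  # centered raw RGB values
```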
Details of learning
The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels
The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.
3 Fully connected layers
The first two fully-connected layers are split across the GPUs: 2048 neurons each on GPU#1 and GPU#2, i.e. 4096 neurons per layer in total.
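For orientation, here is a minimal single-GPU sketch of the architecture in PyTorch. The layer sizes follow the paper, but the two-GPU channel split is not reproduced, and `padding=2` on the first convolution is the usual fudge that makes the stated 224x224 crops line up; the authors used their own CUDA implementation, not this code.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch; layer sizes follow the paper, two-GPU split omitted."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            # padding=2 so 224x224 crops produce 55x55 feature maps
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),   # output fed to a 1000-way softmax
        )

    def forward(self, x):                   # x: (batch, 3, 224, 224)
        x = torch.flatten(self.features(x), 1)
        return self.classifier(x)           # logits; softmax applied in the loss
```

Training with `nn.CrossEntropyLoss` applies the 1000-way softmax implicitly.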
A single GTX 580 GPU has only 3GB of memory,
which limits the maximum size of the networks.
It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU.
Therefore, the net is spread across two GPUs.
Current GPUs can read from and write to one another’s memory directly, without going through host machine memory.
This scheme reduces our top-1 and top-5 error rates by 1.7% and 1.2%, respectively.
Nonlinearity: Rectified Linear Units (ReLUs)
Denoting by $a^i_{x,y}$ the activity of a neuron computed by applying kernel $i$ at position $(x, y)$ and then applying the ReLU nonlinearity, the response-normalized activity is
$$ b^i_{x,y} = a^i_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big( a^j_{x,y} \big)^2 \Big)^{\beta} $$
where the sum runs over $n$ adjacent kernel maps at the same spatial position and $N$ is the total number of kernels in the layer.
The constants $k$, $n$, $\alpha$, and $\beta$ are hyper-parameters whose values are determined using a validation set;
they used $k = 2$, $n = 5$, $\alpha = 10^{-4}$, and $\beta = 0.75$.
Response normalization reduces the top-1 and top-5 error rates by 1.4% and 1.2% respectively
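A direct NumPy transcription of the normalization formula above (a sketch, not the authors' GPU kernel):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across kernel maps.
    a: ReLU'd activations of shape (N_kernels, H, W)."""
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b
```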
This architecture has 60 million parameters.
With this many parameters there is a serious danger of overfitting. To combat overfitting:
Extracting random 224x224 patches (and their horizontal reflections) from the 256x256 images and training the network on these extracted patches
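A minimal NumPy sketch of this crop-and-reflect augmentation (names are illustrative):

```python
import numpy as np

def random_crop_and_flip(img, crop=224):
    """img: (256, 256, 3) array. Returns one random 224x224 patch,
    horizontally reflected with probability 0.5."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]     # horizontal reflection
    return patch
```

As the paper notes, this increases the size of the training set by a factor of 2048 (32 x 32 crop positions x 2 reflections), though the resulting examples are highly interdependent.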
Perform PCA on the set of RGB pixel values over the training set.
To each training image, add multiples of the found principal components, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1.
Concretely, to each RGB pixel the quantity $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3][\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$ is added, where $\mathbf{p}_i$ and $\lambda_i$ are the $i$-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values and $\alpha_i$ is the Gaussian random variable.
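A sketch of this PCA color augmentation in NumPy; note that the paper computes the eigen-decomposition once over the RGB values of the whole training set, whereas for brevity this illustration computes it from a single image:

```python
import numpy as np

def pca_color_augment(img, sigma=0.1):
    """Add multiples of the RGB principal components to an image.
    img: (H, W, 3) float array of RGB values."""
    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)            # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are p_i
    alphas = np.random.normal(0.0, sigma, 3)    # drawn once per image
    shift = eigvecs @ (alphas * eigvals)        # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return img + shift                          # broadcast over every pixel
```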
Training used stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. The update rule for weight $w$ was
$$ v_{i+1} := 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1} $$
where $i$ is the iteration index, $v$ is the momentum variable, $\epsilon$ is the learning rate, and the last term is the average over the $i$-th batch $D_i$ of the derivative of the objective with respect to $w$, evaluated at $w_i$.
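The update rule translates directly into code; this NumPy sketch is illustrative (names are hypothetical):

```python
def sgd_step(w, v, grad, eps, momentum=0.9, weight_decay=0.0005):
    """One step of the paper's update rule:
    v <- 0.9 v - 0.0005 eps w - eps grad;  w <- w + v."""
    v = momentum * v - weight_decay * eps * w - eps * grad
    return w + v, v
```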
Initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01
Dropout
Set to zero the output of each hidden neuron with probability 0.5
These neurons do not contribute to the forward pass and do not participate in backpropagation
Dropout is applied in the first two fully-connected layers.
It roughly doubles the number of iterations required to converge.
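A minimal NumPy sketch of dropout as described; at test time the paper uses all the neurons but multiplies their outputs by 0.5:

```python
import numpy as np

def dropout_forward(h, p=0.5, train=True):
    """Zero each hidden activation with probability p during training."""
    if train:
        mask = np.random.rand(*h.shape) >= p   # dropped units skip forward/backward pass
        return h * mask
    return h * (1.0 - p)                       # test time: all neurons, outputs scaled
```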