Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.
Deep Learning for Computer Vision
Transcript of Deep Learning for Computer Vision
Invented in the 80s.
Too computationally expensive
Got them started working in the 90s
LeCun (Director of AI Research, Facebook)
Started using GPUs
University of Toronto scaled them up in 2012 (AlexNet)
DNNresearch (Acquired by Google Deep Mind)
3x3 Convolution Example
Pushing the Limits!!
Deep Learning for Computer Vision
Features are key
Improvements come from better features.
Difficult to hand-engineer.
Why not learn features?
Why not learn a model end-to-end?
(lines, edges, color gradients)
by combining features
based on parts
2014 CVPR Tutorial on Deep Learning for Vision (Contain many of the pictures from this presentation)
LeNet Deep Learning Tutorial
Stanford CS231n - Convolutional Neural Networks for Visual Recognition
What does convolution do?
Apply learned filters to get feature maps
Apply a filter throughout image
Detects feature in any part of the image
Higher layers learn more complex objects
State of the Art (SOTA) Networks
SOTA 2012 for classification
Subsampling Layer (MaxPooling)
Reduces Number of parameters
Neural Networks on Images
NN are good classifiers. Why not use them for images?
MNIST image: 28x28x1
(28 wide, 28 high, 1 color channel)
= 784 weights for each neuron in the first layer
Larger Image: 224x224x3 = 150,528 weights for single neuron!!
Need something with fewer learned parameters.
What kind of filters are learned?
Use ConvNet as Feature Extractor
Take the output of the network as a low dimensional representation of the image.
Classify on these features or use for other tasks.
Neural Network Refresher
A Short History of Convolutional Neural Networks
Output is a projection (linear) of the input.
28x28 pixel images (normally cropped from 32x32)
Grayscale (pixel value 0-255 or normalized to 0-1)
50,000 Training Examples
10,000 Validation Examples
10,000 Testing Examples
Output is a function (non-linear) of the input.
RELU - Rectified Linear Unit:
Most common in Computer Vision
Efficient to compute
No vanishing gradient
Used on output layer
Squashes inputs to predict a class
Produces probability for each class
Convolutional Neural Network (ConvNet or CNN)
Apply Learned filters to all areas of the image.
Focus on strongest features
Use Fully Connected Layers and Softmax
Single Layer MNIST Classifier
Reference (not pictured):
2 layers, 8192 neurons each (>65M parameters): 0.95% error
Reuse learned weights
SOTA 2014 for classification
Achieves 0.92 % error rate with (406k parameters)
~6.6% Top-5 error
~15.3% Top-5 error
By Jimmy Whitaker
Very Broad Field
Motion Analysis, Scene Reconstruction, Image Restoration
Intuition Behind Deep Neural Nets