Deep Learning for Computer Vision

An overview of Deep Learning for computer vision, including a short introduction to Neural Networks and Convolutional Neural Networks.

Jimmy Whitaker

on 11 August 2015


Transcript of Deep Learning for Computer Vision

Computer Vision
Typical Approach
Lessons Learned
CNN Components
Invented in the 80s
Too computationally expensive at the time
LeCun (now Director of AI Research, Facebook) got them working in the 90s
Researchers started using GPUs
University of Toronto scaled them up in 2012 (AlexNet)
DNNresearch (acquired by Google)
"Deep ConvNets"
3x3 Convolution Example
Pushing the Limits!!
Caption Generation:

Video sequences:

Deep Learning for Computer Vision
Features are key
Improvements come from better features.
Difficult to hand-engineer.
Why not learn features?
Why not learn a model end-to-end?
(lines, edges, color gradients)
by combining features
based on parts
2014 CVPR Tutorial on Deep Learning for Vision (contains many of the pictures in this presentation)

LeNet Deep Learning Tutorial

Stanford CS231n - Convolutional Neural Networks for Visual Recognition
What does convolution do?
Apply learned filters to get feature maps
Apply a filter throughout image
Detects feature in any part of the image
Higher layers learn more complex objects
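The idea above can be sketched in a few lines of pure Python. This is a minimal illustration of "apply a filter throughout the image" (as in most deep learning libraries, it actually computes cross-correlation; the image and filter values are made up for illustration):

```python
# Minimal sketch: slide a 3x3 filter over a grayscale image.
def conv2d(image, kernel):
    """'Valid' convolution: output shrinks by kernel_size - 1 per dimension."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A vertical-edge filter responds wherever intensity changes left-to-right.
edge = [[1, 0, -1],
        [1, 0, -1],
        [1, 0, -1]]
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
feature_map = conv2d(img, edge)  # same filter applied at every location
```

Because the same 9 weights are reused everywhere, the edge is detected no matter where in the image it appears.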
State of the Art (SOTA) Networks
SOTA 2012 for classification
Subsampling Layer (MaxPooling)
Shrinks the feature maps, cutting computation and downstream parameters
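A quick sketch of 2x2 max pooling with stride 2 (values are illustrative): each window keeps only its strongest activation, so a 4x4 map becomes 2x2.

```python
def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool(fmap)  # 4x4 -> 2x2: three quarters of the values dropped
```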

Neural Networks on Images
NNs are good classifiers, so why not use them directly on images?
MNIST image: 28x28x1
(28 wide, 28 high, 1 color channel)
= 784 weights for each neuron in the first layer

Larger image: 224x224x3 = 150,528 weights for a single neuron!

Need something with fewer learned parameters.
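The counts above are simple arithmetic, and they show why weight sharing matters: a fully connected neuron needs one weight per input pixel, while a small convolutional filter reuses the same handful of weights everywhere.

```python
# Fully connected: every input pixel gets its own weight per neuron.
def weights_per_neuron(width, height, channels):
    return width * height * channels

mnist = weights_per_neuron(28, 28, 1)     # 784
larger = weights_per_neuron(224, 224, 3)  # 150,528

# A 3x3 conv filter, by contrast, reuses the same 9 weights per input
# channel at every image location.
conv_filter = 3 * 3 * 3  # 27 weights for a 3-channel image
```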

What kind of filters are learned?
Use ConvNet as Feature Extractor
Take the output of the network as a low-dimensional representation of the image.
Classify on these features or use for other tasks.
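As a sketch of the feature-extractor idea: suppose a hypothetical `features(img)` returned a ConvNet's penultimate-layer activations. With such embeddings, even a simple nearest-neighbour rule by cosine similarity can classify new images (the vectors below are made-up stand-ins for real network outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy feature vectors standing in for real ConvNet activations.
gallery = {"cat": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9]}
query = [0.8, 0.2, 0.1]

# Label the query with its nearest neighbour in feature space.
label = max(gallery, key=lambda k: cosine(query, gallery[k]))
```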

Neural Network Refresher
Hidden Layer
Neuron has:
Connections (Weights)
Activation Function
Layer has:
Network has:
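The refresher above fits in a few lines: a neuron is a weighted sum of its inputs plus a bias, passed through an activation function (a sigmoid here, purely as an example).

```python
import math

# A neuron: weighted sum of inputs plus a bias, through an activation.
# A layer is just many such neurons sharing the same inputs.
def neuron(inputs, weights, bias, activation):
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

out = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)  # sigmoid(0) = 0.5
```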
A Short History of Convolutional Neural Networks
Output is a projection (linear) of the input.
MNIST Dataset
28x28 pixel images (often padded to 32x32, as in LeNet)
Grayscale (pixel value 0-255 or normalized to 0-1)
50,000 Training Examples
10,000 Validation Examples
10,000 Testing Examples
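The 0-1 normalisation mentioned above is just a division by the maximum pixel value (the pixel values here are illustrative):

```python
# Normalise raw 0-255 grayscale values to the 0-1 range.
row = [0, 51, 255]
normalised = [p / 255.0 for p in row]  # [0.0, 0.2, 1.0]
```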
Traditional Approach
Output is a function (non-linear) of the input.
Activation Function
ReLU - Rectified Linear Unit:
Most common in Computer Vision
Efficient to compute
Does not saturate for positive inputs, so gradients do not vanish
Softmax:
Used on the output layer
Squashes inputs to predict a class
Produces a probability for each class
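Both activations are one-liners; the max-subtraction in the softmax below is the standard trick for numerical stability and does not change the result:

```python
import math

def relu(z):
    # Gradient is 1 for positive inputs, so it does not vanish there.
    return max(0.0, z)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # one probability per class, summing to 1
```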
Convolutional Neural Network (ConvNet or CNN)
Apply learned filters to all areas of the image.
Focus on strongest features
Use Fully Connected Layers and Softmax
Single Layer MNIST Classifier
Reference (not pictured):

2 layers, 8192 neurons each (>65M parameters): 0.95% error
Reuse learned weights
Simple Illustration
SOTA 2014 for classification
Achieves a 0.92% error rate (406k parameters)
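Parameter counts like the ones on these slides come from a simple formula: a conv layer has k*k*in_channels weights plus one bias per output filter, and a dense layer has one weight per input plus a bias per output. A sketch with a hypothetical LeNet-style stack (not necessarily the 406k model above):

```python
def conv_params(in_ch, out_ch, k):
    """Each output filter: k*k*in_ch weights plus one bias."""
    return (k * k * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out

# Hypothetical small CNN for 28x28 MNIST digits (illustrative only):
total = (conv_params(1, 32, 3)       # conv1: 320
         + conv_params(32, 64, 3)    # conv2: 18,496
         + dense_params(64 * 7 * 7, 128)  # after two 2x2 poolings
         + dense_params(128, 10))    # class scores
```

Almost all of the parameters sit in the first dense layer, which is why shrinking feature maps before flattening matters so much.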
~6.6% Top-5 error on ImageNet (GoogLeNet, 2014)
~15.3% Top-5 error on ImageNet (AlexNet, 2012)
Object Recognition
Very Broad Field
Motion Analysis, Scene Reconstruction, Image Restoration
Intuition Behind Deep Neural Nets