Deep Learning for Computer Vision

An overview of Deep Learning for computer vision, including a short introduction to Neural Networks and Convolutional Neural Networks.
by Jimmy Whitaker on 11 August 2015


Transcript of Deep Learning for Computer Vision

Computer Vision
Typical Approach
Lessons Learned
CNN Components
Invented in the 80s.
Fukushima
Too computationally expensive
Made to work in practice in the 90s
LeCun (Director of AI Research, Facebook)
Started using GPUs
University of Toronto scaled them up in 2012 (AlexNet)
DNNresearch (acquired by Google)
"Deep ConvNets"
3x3 Convolution Example
Pushing the Limits!!
Caption generation
Video sequences

Deep Learning for Computer Vision
Features are key
Improvements come from better features.
Difficult to hand-engineer.
Why not learn features?
Why not learn a model end-to-end?
Extract features (lines, edges, color gradients)
Construct parts by combining features
Discover objects based on parts
Resources
2014 CVPR Tutorial on Deep Learning for Vision (contains many of the pictures used in this presentation)

LeNet Deep Learning Tutorial

Stanford CS231n - Convolutional Neural Networks for Visual Recognition
What does convolution do?
Apply learned filters to get feature maps
Apply a filter throughout image
Detects feature in any part of the image
Higher layers learn more complex objects
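As a rough illustration of applying one filter across a whole image, here is a minimal sketch in plain Python. The helper name `conv2d_valid`, the 5x5 image and the hand-picked vertical-edge filter are all assumptions made for this example; in a real ConvNet the filter values would be learned, and a library would be used instead of nested loops.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the filter.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked 3x3 vertical-edge filter (an assumption; normally learned).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The same nine weights are reused at every position, so the filter responds wherever the dark-to-bright edge sits under it and gives zero over the uniform region.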
State of the Art (SOTA) Networks
SOTA 2012 for classification
Subsampling Layer (MaxPooling)
Shrinks the feature maps, which reduces computation and the number of parameters in later layers
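A minimal sketch of max pooling, assuming non-overlapping 2x2 windows (the window size, the helper name `max_pool2d` and the sample values are illustrative choices, not taken from the slides):

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)
```

Each 2x2 window is replaced by its strongest response, so a 4x4 map becomes 2x2 and the layers that follow see a quarter as many values.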

Neural Networks on Images
NNs are good classifiers. Why not use them for images?
MNIST image: 28x28x1
(28 wide, 28 high, 1 color channel)
= 784 weights for each neuron in the first layer

Larger image: 224x224x3 = 150,528 weights for a single neuron!!

Need something with fewer learned parameters.
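The arithmetic above can be checked with a few lines of Python. The function name is made up for this sketch, and the 3x3x3 shared filter at the end is an illustrative size, included only to show how small a convolutional filter is by comparison.

```python
def weights_per_neuron(width, height, channels):
    """Weights one fully connected neuron needs to see every input value."""
    return width * height * channels


mnist_weights = weights_per_neuron(28, 28, 1)
larger_weights = weights_per_neuron(224, 224, 3)

# Illustrative comparison: one 3x3 filter over 3 color channels,
# shared across every position of the image.
shared_filter_weights = 3 * 3 * 3

print(mnist_weights, larger_weights, shared_filter_weights)
```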

What kind of filters are learned?
Use ConvNet as Feature Extractor
Take the output of the network as a low dimensional representation of the image.
Classify on these features or use for other tasks.

Neural Network Refresher
Neuron
Hidden Layer
Neuron has:
Inputs
Connections (Weights)
Activation Function
Output
Layer has:
Neurons
Network has:
Layers
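The pieces listed above can be put together in a minimal sketch. The bias term, the sigmoid activation and all of the numbers are assumptions added for illustration; the slides do not specify them.

```python
import math


def sigmoid(z):
    """One possible activation function (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def layer(inputs, weight_rows, biases, activation):
    """A layer is a list of neurons that all read the same inputs."""
    return [
        neuron(inputs, weights, bias, activation)
        for weights, bias in zip(weight_rows, biases)
    ]


# Hypothetical hidden layer with two neurons and two inputs.
hidden = layer(
    inputs=[1.0, 2.0],
    weight_rows=[[0.5, -0.25], [1.0, 1.0]],
    biases=[0.0, -3.0],
    activation=sigmoid,
)
print(hidden)
```

A network is then a chain of such layers, with each layer's outputs becoming the next layer's inputs.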
A Short History of Convolutional Neural Networks
Backpropagation
Output is a projection (linear) of the input.
MNIST Dataset
28x28 pixel images (normally cropped from 32x32)
Grayscale (pixel value 0-255 or normalized to 0-1)
50,000 Training Examples
10,000 Validation Examples
10,000 Testing Examples
Traditional Approach
Output is a function (non-linear) of the input.
Activation Function
RELU - Rectified Linear Unit:
Most common in Computer Vision
Efficient to compute
Mitigates the vanishing gradient problem (gradient is 1 for positive inputs)
Softmax:
Used on output layer
Squashes inputs to predict a class
Produces probability for each class
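Both activation functions fit in a few lines. This is a minimal sketch using only the standard library; the example scores are made up, and subtracting the maximum score inside softmax is a common numerical-stability trick rather than something stated in the slides.

```python
import math


def relu(z):
    """Rectified Linear Unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, z)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))
print(probs, predicted)
```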
Convolutional Neural Network (ConvNet or CNN)
Convolution:
Apply Learned filters to all areas of the image.
Pooling:
Focus on strongest features
Classification:
Use Fully Connected Layers and Softmax
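To see how the three stages fit together, the sketch below tracks only the shape of the data as it moves through a small ConvNet. The layer sizes (5x5 filters, 20 and 50 feature maps, 2x2 pooling) are hypothetical values picked for illustration and are not claimed to be the network from the slides.

```python
def conv_output(shape, filter_size, num_filters):
    """Shape after a convolution with no padding and stride 1."""
    height, width, _ = shape
    return (height - filter_size + 1, width - filter_size + 1, num_filters)


def pool_output(shape, size):
    """Shape after non-overlapping size x size pooling."""
    height, width, channels = shape
    return (height // size, width // size, channels)


shape = (28, 28, 1)                 # MNIST-sized input
shape = conv_output(shape, 5, 20)   # convolution: 20 feature maps
shape = pool_output(shape, 2)       # pooling: keep strongest responses
shape = conv_output(shape, 5, 50)   # convolution: 50 feature maps
shape = pool_output(shape, 2)       # pooling again

# Flatten before handing the features to fully connected layers and softmax.
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)
```

Under these assumed sizes, the fully connected classifier receives a few hundred features instead of raw pixels.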
Single Layer MNIST Classifier
Reference (not pictured):

2 layers, 8192 neurons each (>65M parameters): 0.95% error
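A quick sanity check on the ">65M parameters" figure, assuming the reference network is fully connected with a 784-value input, two hidden layers of 8192 neurons, 10 output classes, and one bias per neuron. That layout is an assumption of this sketch; the slide gives only the layer count and width.

```python
def dense_parameters(layer_sizes):
    """Total weights plus biases for a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per neuron (assumed)
    return total


# Assumed layout: 28x28 input -> 8192 -> 8192 -> 10 classes.
total = dense_parameters([784, 8192, 8192, 10])
print(f"{total:,}")
```

Most of the parameters sit in the 8192x8192 connection between the two hidden layers, which on its own already exceeds 65 million.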
Reuse learned weights
Simple Illustration
SOTA 2014 for classification
Achieves 0.92% error rate with 406k parameters
SOTA 2014: ~6.6% Top-5 error
SOTA 2012: ~15.3% Top-5 error
By Jimmy Whitaker
Object Recognition
Classification
Localization
Detection
Segmentation
Very broad field: recognition, motion analysis, scene reconstruction, image restoration
Intuition Behind Deep Neural Nets