Preprocessing Computer Vision

A lab presentation that lists various preprocessing steps in computer vision. For other material, see my official page at www.cacs.louisiana.edu/~axs2573/
by

Anurag Singh

on 9 November 2014


Transcript of Preprocessing Computer Vision

Preprocessing Methods in Computer Vision: A Discussion

Introduction
Features
Pixel-Level
What are features?
Dimensionality Reduction
Anurag Singh
Why preprocess?
It is difficult to model the mapping from RGB values directly to world state
Preprocessing is as important as the choice of model
Not always done to reduce noise
Remove information that does not pertain to the task at hand
RGB values fluctuate with factors such as ambient lighting, camera properties, etc.
Methods are ad hoc
Chosen based on experience
Trivia: originated in late Middle English (c. 1400 AD)
What are Features?
"Feature" in the English language can be a
noun (a distinctive attribute or aspect of something) or a
verb (to have as a prominent attribute or aspect)
In statistics, features are called covariates or predictor variables
In computer vision, features can be many things, from
corners (keypoints) and edges to high-level descriptors (they are all important)
Feature Detection
Feature Descriptor
Feature Matching
General aim: to become familiar with preprocessing methods in computer vision
Some of the methods, such as dimensionality reduction, can be applied to other areas of applied machine learning
My aim: I wanted to review the material :) and giving a talk or presentation is probably one of the best ways to make yourself understand things
Second aim: to make a technical presentation in a non-linear format (not easy!)
Whitening
Provides invariance to fluctuations caused by changes in ambient light
Applied to individual pixels
The image is transformed into a new set of gray (color) values
Various methods exist
Whitening
Linear Filters
Texton Map
Global Optimization
Whitening: each pixel x is normalized by the image mean μ and standard deviation σ, x' = (x − μ) / σ, so the result has zero mean and unit variance
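
As a minimal illustration (assuming NumPy; the image below is a random stand-in for real data), whitening can be sketched as:

import numpy as np

def whiten(image):
    """Per-pixel whitening: subtract the image mean and divide by the
    standard deviation so the result has zero mean and unit variance."""
    image = image.astype(np.float64)
    mu = image.mean()
    sigma = image.std()
    return (image - mu) / (sigma + 1e-8)   # small epsilon avoids division by zero

img = np.random.randint(0, 256, size=(64, 64))   # stand-in image
white = whiten(img)
print(white.mean(), white.std())                 # ~0.0 and ~1.0
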
Gaussian Blur
Gabor Filter
Laplacian
Difference of Gaussian
Gabor filter
Selective in both orientation and scale
Similar to mammalian visual perception
Gaussian blur: convolves the image with a 2D Gaussian
The result blurs the image, which reduces noise
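
A short sketch of these linear filters with OpenCV (an assumed dependency; the input is a random stand-in image):

import cv2
import numpy as np

img = np.random.randint(0, 256, (128, 128), dtype=np.uint8)   # stand-in image

# Gaussian blur: convolve with a 2D Gaussian to suppress noise
blurred = cv2.GaussianBlur(img, ksize=(5, 5), sigmaX=1.5)

# Gabor kernel: a Gaussian multiplied by a sinusoid, selective in
# orientation (theta) and scale (sigma, lambd)
gabor = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=np.pi / 4,
                           lambd=10.0, gamma=0.5, psi=0)
gabor_response = cv2.filter2D(img, ddepth=cv2.CV_32F, kernel=gabor)
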
Canny Edge
SIFT: Scale-Invariant Feature Transform
Adaptive Non-Maximal Suppression (ANMS)
Methods for detecting features
Conclusion
The image is first blurred
It is convolved with a pair of orthogonal derivative filters
Orientation: θ = atan2(V, H)
Amplitude: A = √(H² + V²)
where H is the horizontal filter response and V is the vertical filter response
The amplitude is thresholded at an arbitrary value (the result is noisy)
Non-maximum suppression
Each pixel is assigned to one of 4 orientation bins: 0, 45, 90, 135 degrees
The current pixel is set to zero if a neighboring pixel perpendicular to its orientation has greater amplitude
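
A hedged sketch of these steps with OpenCV and NumPy (not the presenter's code; the image is a random stand-in, and cv2.Canny bundles the thresholding and non-maximum suppression):

import cv2
import numpy as np

img = np.random.randint(0, 256, (128, 128), dtype=np.uint8)   # stand-in image
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)                  # step 1: blur

# Step 2: orthogonal derivative filters (H = horizontal, V = vertical)
H = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
V = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)

amplitude = np.sqrt(H**2 + V**2)      # edge strength
orientation = np.arctan2(V, H)        # edge orientation

# Full pipeline (thresholding + non-maximum suppression) in one call
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
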
Linc Lab Seminar SP'13
A non-linear presentation
Most feature detectors look for local maxima, which leads to an uneven spatial distribution of features
Regions with higher contrast produce more points
Solution: look at a neighborhood of radius r
Each locally detected feature is compared to its neighborhood, and only points whose response is significantly (by 10%) greater than that of all neighbors within radius r are kept
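
A simple O(n²) ANMS sketch in NumPy, assuming keypoints are given as (x, y) positions with response strengths; the anms helper below is a hypothetical name, not a library function:

import numpy as np

def anms(xy, response, num_keep, robustness=0.9):
    """Adaptive non-maximal suppression (simple O(n^2) version).
    xy: (N, 2) keypoint positions, response: (N,) detector responses.
    A point's suppression radius is its distance to the closest point whose
    response is ~10% stronger; keep the num_keep points with largest radii."""
    n = len(response)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = response > response[i] / robustness
        if np.any(stronger):
            d = np.linalg.norm(xy[stronger] - xy[i], axis=1)
            radii[i] = d.min()
    return np.argsort(-radii)[:num_keep]

xy = np.random.rand(500, 2) * 100        # toy keypoint positions
resp = np.random.rand(500)               # toy responses
selected = anms(xy, resp, num_keep=50)
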
Texton Map
A texton is a discrete variable that maps a pixel to one of a set of texture classes
Each pixel is replaced by the key (index) of its corresponding texture class
Useful in semantic segmentation
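
A hedged texton-map sketch, assuming SciPy and scikit-learn are available: per-pixel responses to a small filter bank are clustered with k-means, and each pixel is replaced by its cluster index:

import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def texton_map(image, n_textons=8):
    """Cluster per-pixel filter-bank responses into texture classes and
    return an image of texton indices."""
    image = image.astype(np.float64)
    # Tiny filter bank: Gaussian smoothings plus x/y derivatives at three scales
    responses = []
    for sigma in (1.0, 2.0, 4.0):
        responses.append(ndimage.gaussian_filter(image, sigma))
        responses.append(ndimage.gaussian_filter(image, sigma, order=(0, 1)))  # d/dx
        responses.append(ndimage.gaussian_filter(image, sigma, order=(1, 0)))  # d/dy
    stack = np.stack(responses, axis=-1)            # H x W x F
    feats = stack.reshape(-1, stack.shape[-1])      # one feature row per pixel
    labels = KMeans(n_clusters=n_textons, n_init=10).fit_predict(feats)
    return labels.reshape(image.shape)              # texton index per pixel

textons = texton_map(np.random.rand(64, 64))        # stand-in image
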
Higher response where the image changes
Zero where the image is flat (no change in contrast)
SIFT: Scale-Invariant Feature Transform
Histogram of Oriented Gradients
Bag of Words (Dictionary of Features)
Shape context descriptor
A silhouette can be a better representation than raw RGB values
Example: 3D joint-angle estimation, which should not depend on clothing
The shape context descriptor is a fixed-length vector that characterizes the object contour
Computation:
It constructs a more detailed characterization of spatial structure
It is a useful preprocessing step for quasi-regular structures, e.g. pedestrian detection
Orientation and amplitude are quantized into 9 bins
Cell descriptors: 9-D orientation histograms over 6 x 6 pixel cells
Block descriptors: formed by concatenating 3 x 3 blocks of cells
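
A sketch using scikit-image's hog (an assumed dependency), with the bin, cell, and block sizes quoted above; the input is a random stand-in image:

import numpy as np
from skimage.feature import hog

img = np.random.rand(96, 96)   # stand-in grayscale image

descriptor = hog(img,
                 orientations=9,          # 9 orientation bins
                 pixels_per_cell=(6, 6),  # 6 x 6 pixel cells
                 cells_per_block=(3, 3),  # blocks of 3 x 3 cells
                 block_norm='L2-Hys')
print(descriptor.shape)                   # one long concatenated block-descriptor vector
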
The bag of words attempts to characterize a larger region or an entire image by summarizing the statistics of its descriptors
Each descriptor is considered a word from a dictionary of possible descriptors
The dictionary is computed by finding interest points in a large number of images and clustering their descriptors
To compute the bag of words for an image, each descriptor is replaced by the nearest entry from the dictionary
It does not use spatial information
It works remarkably well for object recognition
Drawback: one cannot do object localization. Why?
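
A bag-of-words sketch assuming OpenCV's SIFT and scikit-learn are available; random noise images stand in for a real training set:

import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def sift_descriptors(img):
    _, desc = sift.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

# 1) Dictionary: collect descriptors from many training images and cluster them
train_imgs = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(5)]
all_desc = np.vstack([sift_descriptors(im) for im in train_imgs])
n_words = min(16, len(all_desc))            # small dictionary keeps this toy example safe
dictionary = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)

# 2) Bag of words: replace each descriptor by its nearest dictionary entry
#    and count word occurrences (all spatial information is discarded)
def bag_of_words(img):
    words = dictionary.predict(sift_descriptors(img))
    hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
    return hist / max(hist.sum(), 1)

print(bag_of_words(np.random.randint(0, 256, (128, 128), dtype=np.uint8)))
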
Regularization
Markov Random Field
Image courtesy: http://cs.brown.edu/courses/cs143/results/proj3/senewman/
Image courtesy: Computer Vision: Models, Learning, and Inference
Image courtesy: Computer Vision: Models, Learning, and Inference
Harris Corner Detector
As the name suggests, it finds corners
A corner is a point in an image where the intensity varies in both directions
Can you think of a point where the intensity does not vary, or varies in only one direction?
Image structure tensor:
Image courtesy: Computer Vision: Models, Learning, and Inference
Solve for the singular values A1 and A2 of the tensor
If both A1 and A2 are large, the point is a corner
Image courtesy: Computer Vision: Models, Learning, and Inference
Image courtesy: Computer Vision: Models, Learning, and Inference
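
A NumPy/OpenCV sketch of the structure tensor and the "both eigenvalues large" test (cv2.cornerHarris is the library shortcut; the image here is a random stand-in):

import cv2
import numpy as np

img = np.random.rand(128, 128).astype(np.float32)   # stand-in image

# Image gradients
Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)

# Structure tensor entries, smoothed over a local window:
#   S = [[<Ix*Ix>, <Ix*Iy>],
#        [<Ix*Iy>, <Iy*Iy>]]
Sxx = cv2.GaussianBlur(Ix * Ix, (5, 5), 1.5)
Sxy = cv2.GaussianBlur(Ix * Iy, (5, 5), 1.5)
Syy = cv2.GaussianBlur(Iy * Iy, (5, 5), 1.5)

# Eigenvalues of the 2x2 tensor at every pixel
trace = Sxx + Syy
det = Sxx * Syy - Sxy ** 2
disc = np.sqrt(np.maximum(trace ** 2 / 4 - det, 0))
lam1 = trace / 2 + disc
lam2 = trace / 2 - disc

# Corner: both eigenvalues large (flat: both small; edge: one large, one small)
corners = lam2 > 0.5 * lam2.max()
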
Two major points:
Matching strategies
Efficient data structures and algorithms
Matching strategies
Matched points are found at a low error rate
Matched point correspondences are passed to the next stage
Euclidean distance between descriptors, conditioned on a threshold
Efficient data structures and algorithms
Comparing every candidate match is not efficient
Multi-dimensional hashing
Locality-sensitive hashing
k-d trees
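
A matching sketch with SciPy's k-d tree (an assumed dependency): nearest-neighbor lookup plus a Euclidean-distance threshold; the descriptors are random stand-ins and the threshold value is arbitrary:

import numpy as np
from scipy.spatial import cKDTree

# Stand-in descriptor sets for two images (one row per descriptor)
desc_a = np.random.rand(200, 128)
desc_b = np.random.rand(250, 128)

tree = cKDTree(desc_b)                    # index image B's descriptors
dist, idx = tree.query(desc_a, k=1)       # nearest neighbor in B for each descriptor of A

threshold = 0.8                           # task-dependent Euclidean threshold
matches = [(i, j) for i, (d, j) in enumerate(zip(dist, idx)) if d < threshold]
print(f"{len(matches)} putative correspondences passed to the next stage")
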
The image is blurred with two Gaussian filters of different widths
The difference of the two blurred images is taken
The result is normalized
Useful for edge detection and for finding scale-space extrema
Image courtesy: Modeling Gig ;)
Image courtesy: Computer Vision: Algorithms and Applications
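
A minimal difference-of-Gaussians sketch, assuming OpenCV:

import cv2
import numpy as np

img = np.random.rand(128, 128).astype(np.float32)   # stand-in image

g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)       # ksize (0, 0): derived from sigma
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=2.0)

dog = g1 - g2                                        # band-pass response
dog = (dog - dog.min()) / (dog.max() - dog.min())    # normalize to [0, 1]
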
Questions?
Notable references:
CSCE (CMPS) 508 lecture notes
Computer Vision: Models, Learning, and Inference (S. Prince)
Computer Vision: Algorithms and Applications (R. Szeliski)
Invariant to scale, location, and orientation
Steps:
The image is blurred with K Gaussian kernels
The blurred images are stacked together
Extrema are identified in a 3D voxel neighborhood (x, y, scale)
Extremum positions are refined to sub-pixel accuracy
Orientation: 36 bins covering 360 degrees, each value with local support
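
A hedged sketch of the extrema-finding step with SciPy (in practice SIFT stacks difference-of-Gaussian images rather than plain blurred ones):

import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)   # stand-in image

# Blur with K Gaussian kernels and stack along a scale axis
sigmas = [1.0, 1.6, 2.6, 4.0, 6.4]
stack = np.stack([ndimage.gaussian_filter(img, s) for s in sigmas], axis=0)

# A voxel is an extremum if it equals the max (or min) of its 3x3x3 neighborhood
max_f = ndimage.maximum_filter(stack, size=3)
min_f = ndimage.minimum_filter(stack, size=3)
extrema = (stack == max_f) | (stack == min_f)

scale_idx, ys, xs = np.nonzero(extrema)
print(f"{len(xs)} candidate keypoints before sub-pixel refinement")
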
Regularization notation: d(i,j) → known data points, w(i,j) → data penalty, f(i,j) → unknown points, s(i,j) → interaction potentials
Descriptors are compact representations that summarize the contents of an image region
The SIFT descriptor usually complements the SIFT detector
Orientation and amplitude maps are computed around each interest point
A 16 x 16 window divided into a non-overlapping 4 x 4 grid of cells is created
In each cell, an 8-D histogram of the image orientations is computed
The 4 x 4 = 16 histograms are concatenated into a single 128 x 1 vector
The descriptor is invariant to constant intensity changes
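
With OpenCV (assuming a build that includes SIFT), detection and 128-D descriptor computation are a single call:

import cv2
import numpy as np

img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)   # stand-in image

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each row is one 128-D descriptor (16 cells x 8 orientation bins)
print(descriptors.shape if descriptors is not None else "no keypoints found")
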
Regularization fits models to data when the solution space is severely under-constrained
Recover an unknown function f(x,y) from data points d(x,y) (an inverse problem)
Goal:
Formulate the task as a transformation (filtering)
Use some optimization method
Infer the best solution (energy minimization)
Use: most computer vision tasks are inverse problems
Steps involved:
Construct a global energy function that describes the solution parameters
Find the minimum-energy solution
Sparse linear system
Iterative techniques
Consciously leaving out the math details
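
A hedged 1-D illustration of "construct a global energy, then solve a sparse linear system", assuming SciPy: the energy is a quadratic data term plus a smoothness term, and setting its gradient to zero gives the sparse system below:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Noisy 1-D observations d of an unknown smooth signal f
n = 200
x = np.linspace(0, 1, n)
d = np.sin(2 * np.pi * x) + 0.3 * np.random.randn(n)

lam = 50.0                                       # smoothness weight
W = sp.identity(n, format="csr")                 # data-penalty weights (all ones here)
D = sp.diags([-1, 1], [0, 1], shape=(n - 1, n))  # first-difference operator

# E(f) = ||f - d||^2 + lam * ||D f||^2  ->  (W + lam * D^T D) f = W d
A = (W + lam * (D.T @ D)).tocsr()
f = spsolve(A, W @ d)                            # minimum-energy solution
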
Bayesian statistical modeling:
Prior assumptions about the solution space,
log-likelihood, and energy minimization using the maximum a posteriori (MAP) estimate
The probability in an MRF is a Gibbs distribution
Good for labeling problems
Used for interactive segmentation and structure detection
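
A small hedged sketch of MAP labeling on a binary grid MRF using iterated conditional modes (a simple approximate minimizer chosen here for brevity, not necessarily the method used in the presentation):

import numpy as np

def icm_binary(unary, beta=1.0, n_iters=5):
    """MAP estimate of binary labels on a grid MRF by iterated conditional modes.
    unary: (H, W, 2) data penalties for labels 0/1; beta: Potts smoothness weight.
    The Gibbs distribution is P(f) proportional to exp(-E(f))."""
    H, W, _ = unary.shape
    labels = unary.argmin(axis=2)                  # start from the data-only solution
    for _ in range(n_iters):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts term: penalty beta for disagreeing with a neighbor
                        costs += beta * (np.arange(2) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels

# Toy usage: noisy observations of a half-black / half-white image
noisy = (np.random.rand(32, 32) < 0.3) ^ (np.arange(32)[:, None] < 16)
unary = np.stack([noisy.astype(float), 1.0 - noisy], axis=2)  # penalties for labels 0, 1
clean = icm_binary(unary, beta=2.0)
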
Speeded Up Robust Features (SURF)
Representation as an integral image
Sum over a box = A − B − C + D, where A, B, C, D are the integral-image values at the box corners
Hessian-matrix-based interest points
They are fast and are computed at every point
The Hessian matrix at x = (x, y) at scale σ is given as
SURF descriptor:
Haar wavelet responses
Calculate the dominant orientation of an interest point
Extract a 64-D descriptor vector based on sums of wavelet responses
Image courtesy: "SURF: Speeded Up Robust Features" by Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool
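
A NumPy sketch of the integral-image box sum: once the summed-area table is built, the sum inside any rectangle comes from its four corner values, A − B − C + D:

import numpy as np

img = np.random.rand(100, 100)   # stand-in image

# Summed-area table with a zero row/column prepended for easy indexing
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four corner lookups: A - B - C + D."""
    A = ii[y1, x1]   # bottom-right
    B = ii[y0, x1]   # top-right
    C = ii[y1, x0]   # bottom-left
    D = ii[y0, x0]   # top-left
    return A - B - C + D

print(np.isclose(box_sum(10, 20, 50, 60), img[10:50, 20:60].sum()))   # True
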
Original Image
SIFT points
General idea: data live in a high-dimensional space and are reduced to a lower-dimensional one
Some of the techniques:
Linear methods
PCA: describes as much of the variance in the data as possible
LDA: maximizes the linear separability between classes of data points
Non-linear methods
Multidimensional scaling (MDS): retains the pairwise distances between data points as much as possible (good for visualization, e.g. of MRI data)
Isomap: improves on MDS by preserving pairwise geodesic distances
Kernel PCA: linear PCA in a high-dimensional space constructed using a kernel function (as in an SVM)
Local non-linear methods
LLE: a local technique for dimensionality reduction, similar to Isomap in that it constructs a graph representation of the data points
Laplacian eigenmaps: the local properties are based on pairwise distances between near neighbors (clustering)
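
A quick sketch of several of these methods with scikit-learn (an assumed dependency), on random stand-in data; each call maps high-dimensional rows to a low-dimensional embedding:

import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X = np.random.rand(300, 50)   # 300 samples in 50 dimensions (stand-in data)

pca_2d    = PCA(n_components=2).fit_transform(X)                        # max-variance linear projection
kpca_2d   = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)    # linear PCA in a kernel space
isomap_2d = Isomap(n_components=2).fit_transform(X)                     # preserves pairwise geodesic distances
lle_2d    = LocallyLinearEmbedding(n_components=2).fit_transform(X)     # local, graph-based
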
A few red strokes on the foreground object
A few blue ones on the background
The system computes color distributions for the foreground and background
It then solves a binary MRF
MRF example
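
OpenCV's GrabCut is one related implementation of this idea (not necessarily the exact system shown): stroke-seeded foreground/background color models plus a binary MRF solved with graph cuts; the image below is a random stand-in:

import cv2
import numpy as np

img = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)   # stand-in color image

# Mask seeded by user strokes: definite foreground / definite background
mask = np.full(img.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)
mask[50:70, 60:100] = cv2.GC_FGD    # "red strokes" on the object
mask[0:10, :] = cv2.GC_BGD          # "blue strokes" on the background

bgd_model = np.zeros((1, 65), np.float64)   # internal color-model state
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)

foreground = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))   # binary MRF labeling
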
Can you identify the image?
Do you know the story behind the photo?
Canny example
"Most iconic picture"