**Preprocessing Methods**

in Computer Vision : A Discussion

Introduction

**Features**

**Pixel-Level**

What are features?

Dimensionality Reduction

**Anurag Singh**

Why preprocess?

It is difficult to model the mapping from RGB values directly to world state

Preprocessing is as important as the choice of model

It is not always done to reduce noise

It removes information that does not pertain to the task at hand

RGB values fluctuate with factors such as ambient lighting, camera properties, etc.

Methods are ad hoc

Chosen based on experience

Trivia :- The word "feature" originated in late Middle English (c. 1400 AD)

What are Features?

"Feature" in the English language can be a

noun (a distinctive attribute or aspect of something) or a

verb (have as a prominent attribute or aspect)

In statistics, features are called covariates or predictor variables

In computer vision, features can be many things:

corners (keypoints) and edges up to high-level descriptors (all of them are important)

Feature Detection

Feature Descriptor

Feature Matching

General aim :- To become familiar with preprocessing methods in computer vision

Some of the methods, such as dimensionality reduction, also apply to other areas of applied machine learning

My aim :- I wanted to review the material :) and giving a talk or presentation is probably one of the best ways to make yourself understand things.

Second aim :- To make a technical presentation in a non-linear format (not easy!!)

Feature Descriptor

Feature Detection

Feature Matching

Whitening

To provide invariance to fluctuations caused by changes in ambient light

Applied at individual pixels

The image is transformed into a new set of gray (color) values

Various methods :-

Whitening

Linear Filters

Texton Map

Global Optimization

Whitening

Mean :- μ = (1/N) Σ xₙ

Variance :- σ² = (1/N) Σ (xₙ − μ)²

Each pixel value x is mapped to x' = (x − μ) / σ
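A minimal NumPy sketch of the per-image whitening step above (the function name `whiten` and the small epsilon guard are my own additions; the slide only specifies the mean/variance normalization):

```python
import numpy as np

def whiten(image):
    """Map gray values to zero mean and unit variance, giving
    invariance to constant brightness and contrast changes."""
    mu = image.mean()
    sigma = image.std()
    # small epsilon (an added assumption) guards against flat images
    return (image - mu) / (sigma + 1e-8)
```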

Gaussian Blur

Gabor Filter

Laplacian

Difference of Gaussian


Selective in both orientation and scale

Similar to mammalian visual perception

Convolves the image with a 2D Gaussian kernel

The result blurs the image, reducing noise
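The noise-reduction effect can be sketched with SciPy's `gaussian_filter` (the white-noise input and the sigma value are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# White Gaussian noise stands in for a noisy image; convolving with
# a 2D Gaussian kernel smooths it, shrinking the noise variance.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(64, 64))
blurred = gaussian_filter(noisy, sigma=2.0)  # sigma sets the blur scale
```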

Canny Edge

SIFT : Scale Invariant Feature Transform

Adaptive Non Maximal Suppression(ANMS)

Methods for detecting features

Conclusion

The image is first blurred

It is then convolved with a pair of orthogonal derivative filters

Orientation :- θ = atan2(V, H)

Amplitude :- A = √(H² + V²)

where H is the horizontal derivative filter response and V the vertical one

The amplitude is thresholded at an arbitrary value (the result is noisy)

Non max Suppression

Each pixel is assigned to one of 4 orientation bins: 0, 45, 90, 135 degrees

The current pixel is set to zero if any neighboring pixel perpendicular to it has greater amplitude.
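The early Canny stages (blur, orthogonal derivative filters, orientation binning, non-maximum suppression) might be sketched as follows; `canny_nms` is a hypothetical helper, and the suppression here compares along the gradient direction, which is the usual reading of the perpendicular-neighbor rule:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def canny_nms(image, sigma=1.0):
    """Sketch of Canny's early stages: Gaussian blur, orthogonal
    derivative filters, gradient orientation quantized into 4 bins
    (0, 45, 90, 135 degrees), then non-maximum suppression along
    the gradient direction."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    h = sobel(smoothed, axis=1)  # horizontal derivative filter response
    v = sobel(smoothed, axis=0)  # vertical derivative filter response
    amplitude = np.hypot(h, v)
    orientation = np.rad2deg(np.arctan2(v, h)) % 180.0
    bins = (np.round(orientation / 45.0) % 4).astype(int)
    # neighbor offsets along the gradient direction for each bin
    offsets = {0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1)}
    out = np.zeros_like(amplitude)
    rows, cols = amplitude.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = offsets[bins[r, c]]
            a, b = amplitude[r + dr, c + dc], amplitude[r - dr, c - dc]
            if amplitude[r, c] >= max(a, b):  # keep only local maxima
                out[r, c] = amplitude[r, c]
    return out
```

On a vertical step edge this keeps only the pixels nearest the edge, thinning the broad gradient response to a narrow ridge.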

**Linc Lab Seminar SP'13**

**A non linear presentation**

Most feature detectors look for local maxima, which causes an uneven spatial distribution of feature points

Regions with higher contrast end up with more points

Solution :- Look at a neighborhood of radius r

Each locally detected feature is compared to its neighborhood, and the 10% of feature points with the largest response are selected.
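A brute-force sketch of that rule (the function name `adaptive_nms` and the O(N²) distance computation are my simplifications; practical implementations are more efficient):

```python
import numpy as np

def adaptive_nms(points, responses, radius, keep_frac=0.10):
    """ANMS sketch: a point survives only if no stronger response
    lies within `radius`; the top `keep_frac` of survivors (by
    response) are returned. `points` is an (N, 2) array."""
    points = np.asarray(points, dtype=float)
    responses = np.asarray(responses, dtype=float)
    survivors = []
    for i in range(len(points)):
        d = np.linalg.norm(points - points[i], axis=1)
        nearby = (d > 0) & (d <= radius)
        if not np.any(responses[nearby] > responses[i]):
            survivors.append(i)
    # keep the strongest 10% (by default) of the surviving points
    survivors.sort(key=lambda i: -responses[i])
    n = max(1, int(round(keep_frac * len(survivors))))
    return [points[i] for i in survivors[:n]]
```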

Texton Map

A texton is a discrete variable that maps a pixel to one of a set of texture classes

Each pixel is replaced by the key (index) of its corresponding texture class

Useful in Semantic Segmentation

Higher response where the image changes

Zero where the image is flat (no change in contrast)

SIFT : Scale Invariant Feature Transform

Histogram of Oriented Gradients

Bag of Words (Dictionary of Features)

Shape context descriptor

A silhouette is a better representation than raw RGB values

Example :- 3D joint angle estimation, which should not depend on clothing

The shape context descriptor is a fixed-length vector that characterizes the object contour

Computation :

It constructs a more detailed characterization of spatial structure

It is a useful preprocessing step for quasi-regular structures, e.g. in pedestrian detection

Orientation and amplitude are quantized into 9 bins

Cell descriptors :- a 9-D orientation histogram per 6 x 6 pixel cell

Block descriptors :- formed by concatenating 3 x 3 blocks of cells
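A sketch of the cell-histogram stage (block concatenation and normalization omitted); `hog_cells` is a hypothetical helper, using the slide's 9 orientation bins and 6 x 6 pixel cells:

```python
import numpy as np

def hog_cells(image, cell=6, n_bins=9):
    """HOG sketch: amplitude-weighted gradient-orientation histograms
    over non-overlapping cell x cell pixel cells."""
    gy, gx = np.gradient(image.astype(float))
    amp = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # quantize orientation into n_bins bins over [0, 180) degrees
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    h, w = image.shape
    rows, cols = h // cell, w // cell
    desc = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            b = bins[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            a = amp[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            desc[r, c] = np.bincount(b, weights=a, minlength=n_bins)
    return desc
```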

It attempts to characterize a larger region or an entire image by summarizing statistics of descriptors

Each descriptor is considered a word from a dictionary of possible descriptors

The dictionary is computed by finding interest points in a large number of images.

To compute the bag of words for an image, each descriptor is replaced by its nearest entry in the dictionary.

It doesn't use spatial information

It works remarkably well for object recognition.

Drawback :- One can't do object localization, why?
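The nearest-entry replacement step might look like this, assuming the dictionary has already been learned (e.g. by clustering interest-point descriptors); `bag_of_words` is a hypothetical helper:

```python
import numpy as np

def bag_of_words(descriptors, dictionary):
    """Replace each descriptor with its nearest dictionary entry and
    histogram the counts; spatial layout is deliberately discarded."""
    # pairwise Euclidean distances: (N descriptors) x (K dictionary words)
    d = np.linalg.norm(descriptors[:, None, :] - dictionary[None, :, :], axis=2)
    words = d.argmin(axis=1)  # index of nearest word for each descriptor
    return np.bincount(words, minlength=len(dictionary))
```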

Regularization

Markov Random Field

Image courtesy :- http://cs.brown.edu/courses/cs143/results/proj3/senewman/

Image Courtesy :- Computer Vision: Models, Learning, and Inference


Harris Corner Detector

As the name suggests, it finds corners

A corner is a point in an image where intensity varies in both directions.

Can you think of a point where intensity doesn't vary, or varies in only one direction?

Image structure tensor :- a 2 x 2 matrix of smoothed derivative products, [[Ix·Ix, Ix·Iy], [Ix·Iy, Iy·Iy]], where Ix and Iy are the horizontal and vertical image derivatives

Image Courtesy :- Computer Vision: Models, Learning, and Inference

Solve for the singular values (eigenvalues) λ1, λ2 of the tensor

If both λ1 and λ2 are large, the point is a corner
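A structure-tensor sketch of the detector; rather than solving for the two singular values explicitly, this uses the standard det/trace score, which is large and positive only when both are large (the constant k = 0.05 is a conventional choice, not from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.0, k=0.05):
    """Harris corner sketch: smooth the products of image derivatives
    to form the structure tensor S, then score det(S) - k * trace(S)^2."""
    img = image.astype(float)
    ix = sobel(img, axis=1)  # horizontal derivative
    iy = sobel(img, axis=0)  # vertical derivative
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```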

Image Courtesy :- Computer Vision: Models, Learning, and Inference


Two major points :-

matching strategies

efficient data structure and algorithm

Matching strategies

Matched points are found at a low error rate

Matched point correspondences are passed to the next level

Matching uses Euclidean distance conditioned on a threshold

Efficient data structure and algorithm

Comparing all candidate matches is not efficient

multi-dimensional hashing

locality sensitive hashing

k-d trees
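A sketch combining both points: a k-d tree (via SciPy) replaces the all-pairs comparison, and matches are accepted by thresholded Euclidean distance. The function name and threshold value are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, threshold=0.5):
    """Nearest-neighbor matching sketch: a k-d tree over desc_b avoids
    comparing every candidate pair; a match (i, j) is accepted only
    if the Euclidean distance is at most `threshold`."""
    tree = cKDTree(desc_b)
    dist, idx = tree.query(desc_a)  # nearest entry in desc_b per row of desc_a
    return [(i, int(j)) for i, (d, j) in enumerate(zip(dist, idx))
            if d <= threshold]
```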

The image is blurred with two Gaussian filters

The difference is taken

It is normalized

Useful for edge detection and finding scale-space extrema
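The steps above as a sketch (normalization omitted; sigma and the scale ratio k are illustrative, with k ≈ 1.6 a common choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma=1.0, k=1.6):
    """DoG sketch: blur with two Gaussians of different scales and
    subtract; the response is band-pass, peaking at blob-like
    structures and vanishing in flat regions."""
    img = image.astype(float)
    return gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
```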

Image courtesy :- Modeling Gig ;)

Image Courtesy :- Computer Vision: Algorithms and Applications

Question ?

Notable References :-

CSCE (CMPS) 508 lecture notes

Computer Vision: Models, Learning, and Inference

Computer Vision: Algorithms and Applications

Scale-, location-, and orientation-invariant

Steps :

The image is blurred with K Gaussian kernels

The blurred images are stacked together

Extrema are identified in a 3D voxel neighborhood

Extrema positions are refined to sub-pixel accuracy

Orientation :- 36 bins cover 360 degrees; each value has local support

d(i,j) -> known points, w(i,j) -> data penalty, f(i,j) -> unknown points, s(i,j) -> interaction potentials

These are compact representations that summarize the contents of an image region

The SIFT descriptor usually complements the SIFT detector.

Orientation and amplitude maps are computed around interest points

A 16 x 16 window is divided into a non-overlapping 4 x 4 grid

At each cell, an 8-D histogram of the image orientations is computed.

The 4 x 4 = 16 histograms are concatenated to make a single 128 x 1 vector

The descriptor is invariant to constant intensity changes.
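A simplified sketch of the descriptor computation for a single 16 x 16 patch (full SIFT also applies Gaussian weighting, rotation normalization, and clipping, all omitted here); since only gradients enter, constant intensity shifts cancel out:

```python
import numpy as np

def sift_like_descriptor(patch):
    """SIFT-like descriptor sketch: an 8-bin, amplitude-weighted
    orientation histogram in each cell of a 4 x 4 grid over a
    16 x 16 patch, concatenated into a normalized 128-D vector."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    amp = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
    bins = np.minimum((ang / 45.0).astype(int), 7)  # 8 orientation bins
    desc = []
    for r in range(0, 16, 4):
        for c in range(0, 16, 4):
            b = bins[r:r + 4, c:c + 4].ravel()
            a = amp[r:r + 4, c:c + 4].ravel()
            desc.append(np.bincount(b, weights=a, minlength=8))
    desc = np.concatenate(desc)  # 16 cells x 8 bins = 128-D
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```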

To fit models to data when the solution space is severely under-constrained

Recover an unknown function f(x,y) from data points d(x,y) (an inverse problem)

Goal :-

Formulate the task of transformation (filtering)

Use some optimization method

Infer the best solution (energy minimization)

Use :- Most computer vision tasks are inverse problems

Steps involved :-

Construct a global energy function that describes solution parameters

Find minimum energy solution

Sparse Linear system

Iterative technique

Consciously leaving out the math details

Bayesian statistical modeling :-

prior assumptions about the solution space,

log-likelihood and energy minimization using maximum a posteriori (MAP)

The probability in an MRF is a Gibbs distribution

Good for labeling problems

Used for interactive segmentation, structure detection.

Speeded Up Robust Features (SURF)

Representation as an integral image

Sum over a box = A − B − C + D, where A, B, C, D are the integral-image values at the box corners
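The A − B − C + D lookup can be sketched directly with cumulative sums (`integral_image` and `box_sum` are hypothetical helper names):

```python
import numpy as np

def integral_image(image):
    """Cumulative sums along both axes; any box sum afterwards costs
    just four lookups, independent of box size."""
    return image.astype(float).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of image[r0:r1+1, c0:c1+1] from the integral image `ii`,
    via the four-corner combination A - B - C + D."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```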

Hessian Matrix based interest points

They are fast and are computed at every point

The Hessian matrix at point x = (x, y) and scale σ is given as H(x, σ) = [ Lxx(x, σ)  Lxy(x, σ) ; Lxy(x, σ)  Lyy(x, σ) ], where Lxx is the convolution of the second-order Gaussian derivative with the image at x (similarly for Lxy and Lyy)

SURF Descriptor :-

Haar wavelet responses.

Calculate the dominant orientation of an interest point

Extract a 64-D descriptor vector based on sums of wavelet responses

Image courtesy :- "SURF: Speeded Up Robust Features" by Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool

Original Image

SIFT points

General idea :- Data lie in a high-dimensional space and are reduced to a lower-dimensional one

Some of the techniques :-

Linear Methods

PCA : describes as much of the variance in the data as possible

LDA : maximize the linear separability between data points

Non Linear

Multidimensional scaling (MDS) :- retains the pairwise distances between data points as much as possible (good for visualization, e.g. of MRI data)

Isomap :- improves on MDS by preserving pairwise geodesic distances

Kernel PCA :- linear PCA in a high-dimensional space constructed using a kernel function (as in SVMs)

Local non-linear

LLE :- a local technique for dimensionality reduction, similar to Isomap in that it constructs a graph representation of the data points

Laplacian Eigenmaps :- the local properties are based on the pairwise distances between near neighbors (clustering)
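As one concrete example from the list above, a minimal PCA sketch via the SVD of centered data (the function name is mine; sign flips of the components are inherent to the SVD):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA sketch: project centered data onto the top right-singular
    vectors, i.e. the directions of largest variance."""
    Xc = X - X.mean(axis=0)
    # rows of vt are the principal directions, ordered by variance
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```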

A few red strokes mark the foreground object

A few blue ones mark the background.

The system computes color distributions for the foreground and background

It then solves a binary MRF.

MRF example

Can you identify the image?

Do you know the story behind the photo?

Canny example

"Most iconic picture"