Visual Feature Graphs
and Image Recognition
1. Quantised graphs
Introduction
A. Why graph structures?
B. Challenges
A. Supervised image recognition
B. A finite-dimensional representation for attribute graphs
Why is image recognition a hard problem?
How to represent images?
Related work: bags of words and point localisation
Related work: the bag of words (BoW) representation
"Here is an image. What does it contain?"
Image space → Feature sampling (SIFT, SURF, Harris-Laplace...) → Feature space quantisation (K-means, agglomerative hierarchical...) → Concept space
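The sampling-and-quantisation pipeline can be sketched end to end. This is a toy sketch: random vectors stand in for SIFT/SURF descriptors, and a plain k-means replaces a production clusterer.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means on descriptor vectors; returns k codeword centres."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre, then recompute centres
        labels = np.argmin(((descriptors[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def bow_histogram(descriptors, centres):
    """Quantise each descriptor to its nearest codeword and count occurrences."""
    labels = np.argmin(((descriptors[:, None] - centres[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centres)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
train_desc = rng.normal(size=(200, 8))   # stand-in for training SIFT descriptors
centres = build_codebook(train_desc, k=16)
h = bow_histogram(rng.normal(size=(50, 8)), centres)
```

The histogram `h` is the orderless BoW representation: all point layout information is discarded at this step.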
Related work: the bag of words representation
Graph representations vs. Graph distances
Visual feature graph construction
- Local features
- Finite dimensionality
- Robust appearance
Summary
Problems
Graph representation
- Visual feature sampling
- Codebook construction
- Word count
- No layout information
- Feature space quantisation
Pyramid of BoW:
- "The pyramid match kernel: discriminative classification with sets of image features", Grauman & Darrell, CVPR 2005
- "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories", Lazebnik et al., CVPR 2006
Correlatons:
- "Discriminative object class models of appearance and shape by correlatons", Savarese et al., CVPR 2006
- Nodes: visual features
- Edges: nearest neighbours relative to point scale
N x N weight matrix
(N = #feature points)
- Node ordering
- Variable number of nodes
- "Representing and recognizing the visual appearance of materials using three-dimensional textons", Leung & Malik, IJCV 2001
- "Visual categorization with bags of keypoints", Csurka et al., ECCV 2004
- "Local features and kernels for classification of texture and object categories: a comprehensive study", Zhang et al., IJCV 2007
Graph edit distance:
- "On a relation between graph edit distance and maximum common subgraph", Bunke, Pattern Recognition Letters, 1997
- "Bayesian graph edit distance", Myers et al. PAMI 2000
Graph matching:
- "Inexact Graph Matching Using Estimation of Distribution Algorithms", Bengoetxea, PhD Thesis, 2002
- "Thirty years of Graph Matching in Image Recognition", Conte et al., IJPRAI, 2004
Limitations:
- NP-complete problem
- Cannot scale to > 1000 graph nodes
PhD Thesis
Example
Commute time distance
Large intra-class variability
Supervised training
Original feature graph
Definition
Appearance-collapsed feature graph
Representation of the appearance-collapsed feature graph
"For a random walk started at node i, CT(i,j) is the average number of steps required to reach node j for the first time, and then return to i for the first time"
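The commute time has a closed form via the Moore-Penrose pseudo-inverse of the graph Laplacian: CT(i,j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij). A minimal numpy sketch:

```python
import numpy as np

def commute_time_matrix(W):
    """Commute-time distances from a symmetric edge-weight matrix W.
    CT(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij), where L+ is the
    pseudo-inverse of the Laplacian and vol(G) the sum of node degrees."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)
    vol = d.sum()
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)

# Path graph 0-1-2: commute time grows with effective resistance.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
CT = commute_time_matrix(W)
```

On this path graph, CT(0,1) = 4 and CT(0,2) = 8: commute time equals vol(G) times the effective resistance between the two nodes.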
= Visual feature quantisation bins
Ill-posed problem
Challenges of image recognition
Computation
Original image space:
How to build a relevant graph structure?
embedding of the graph nodes in a space where: d(i,j) = CT(i,j)
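Such an embedding follows from the Laplacian spectrum: scaling each eigenvector by 1/sqrt(lambda_k) and the whole by sqrt(vol(G)) gives coordinates whose squared Euclidean distances are exactly the commute times. A sketch, assuming the unnormalised Laplacian:

```python
import numpy as np

def commute_time_embedding(W, tol=1e-9):
    """Embed graph nodes so that squared Euclidean distances equal
    commute times: coordinate k of node i is
    sqrt(vol) * phi_k(i) / sqrt(lambda_k), over non-zero eigenpairs of L."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    lam, phi = np.linalg.eigh(L)
    keep = lam > tol                       # drop the zero eigenvalue(s)
    coords = phi[:, keep] / np.sqrt(lam[keep])
    return np.sqrt(d.sum()) * coords       # one row of coordinates per node

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Y = commute_time_embedding(W)
ct01 = ((Y[0] - Y[1]) ** 2).sum()          # recovers CT(0, 1)
```

Unlike Isomap, no geodesic distance estimation is needed; the embedding comes directly from the eigendecomposition.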
Challenges of image recognition
Commute time embedding vs Isomap
Several possibilities:
- Transition matrix
- Shortest path distance matrix
- Commute time distance matrix
Shortest path distance matrix
Isomap embedding
Commute time embedding
Commute time embedding
(Qiu & Hancock 2007)
Goal: incorporate point layout info.
Requirements: invariance to changes in:
- scale,
- orientation,
- illumination,
- appearance (small changes).
Image labelling
Supervised image recognition
Thomas Herbert's Relation of Some Years Travels in Africa and Asia, 1677
"Let us mention the Dodo whose body is big and round. His corpulence gives it a slow and lazy walk. There are some nearing 50 pounds in weight. Its sight is of more interest than its taste and he looks melancholic as if he was sorry that Nature had given him such small wings for so big a body. Some have their head capped with a dark down, some had the top of their head bald and whitish as if it had been washed. They have a long and curved bill with the nostril openings halfway to the tip. It is greenish yellow. Their eyes are round and shiny and they have a fluffy plumage. Their tail looks like the sparse beard of a Chinese, made up of three or four short feathers. Their feet are thick and black and their toes powerful. They have a fiery stomach allowing them to digest stones like ostriches do."
- The picture was taken outside,
- on the seashore,
- it contains people, boats, some water and buildings,
- the picture was shot in XingCheng, China.
Quantised graph representation: Summary
An image representation that includes point layout information
An orderless representation:
Training data:
{(image, label)}
Object detection
Datasets
- Visual feature graph construction
- Feature space quantisation
- Graph collapse
- Distance (commute time) matrix of the collapsed graph
SceneClass-13
Satellite-8
Graz02-bike
"Node i is connected to node j if and only if i is near to j relative to their respective scales"
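A minimal sketch of this connection rule. The exact normalisation is an assumption here (mean of the two point scales); the experiments use a scale-normalised distance threshold of 3.

```python
import numpy as np

def build_feature_graph(positions, scales, threshold=3.0):
    """Adjacency matrix for a visual feature graph: i ~ j when their
    distance, divided by the mean of the two point scales, is below
    `threshold` (an illustrative choice of normalisation)."""
    n = len(positions)
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(positions[i] - positions[j])
            if dist / (0.5 * (scales[i] + scales[j])) < threshold:
                A[i, j] = A[j, i] = True
    return A

positions = np.array([[0., 0.], [1., 0.], [10., 0.]])
scales = np.array([1.0, 1.0, 1.0])
A = build_feature_graph(positions, scales)
```

Because the rule only uses distance ratios, the resulting graph is invariant to rigid geometric changes and to uniform rescaling of the image.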
Quickbird 0.5m, panchromatic
- Segmentation available
- Large background clutter
Semantic gap
C. Outline
C. Results
defended by
D. Conclusion
Contributions
Limitations
Talk outline
Graphs vs. BoW
Synthetic dataset
- Performance gain < 2 pp
- Possible sources of performance loss:
Good classification rate: 92.75%
- Graph construction invariant to rigid geometric changes
- Graph collapse
- Representation by the graph distance matrix
- Consistent performance gain
------------ Part 1 ------------
Feature quantisation
Graph collapse
Representation in Euclidean space
Graph of quantised features
Support Vector Machine (SVM) classification
- Datasets: 1/2 training, 1/2 testing
- One vs One SVM classification
- SURF features
- Commute time distance
- Node connection: scale-normalised distance < 3
Graph of unquantised features
Classification by optimal NBNN
How to avoid feature quantisation?
---------------- Part 2----------------
Graph of unquantised features
- Graph = Multiple sets of point pairs
- We need a classifier for {{points}}
Régis Behmo
Ecole Centrale Paris
Conclusions
2. Unquantised graphs
PhD advisors
Nikos Paragios
Véronique Prinet
15 September 2010
A. Summary
B. Contributions
C. Limitations
Image representation
Visual feature graph
Classification by optimal NBNN
Representation by distance matrices
- Gain of graph structure < 2 pp
- Non-model dependent graph construction
- Graph structure robust to rigid geometric changes
- Graph representation for classification
A. Images as point clouds
Image classification
Optimal NBNN
Appearance-collapsed feature graphs
- Model overfitting in optimal NBNN
- Long-distance interactions not relevant
B. Density-correction parameters estimation
Distance-collapsed feature graph
- Parametric classifier of point clouds
- Linear
- Multi-channel
- NBNN < BoW + linear SVM < opt. NBNN
Predicted class label (equiprobable classes): c* = argmax_c sum over features x in I of log p(x | c)
Positive and negative feature sets:
Visual feature log likelihood
- Binary classification:
- Parameter estimation by cross validation
- Hinge loss minimisation
Parzen-Rosenblatt estimator
Nearest neighbor approximation
= {training features from class c}
= "distance from x to nearest neighbor in E"
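The resulting NBNN decision rule (Boiman et al., 2008) picks the class whose training feature set E_c yields the smallest summed nearest-neighbour distances. A brute-force toy sketch with random stand-in descriptors:

```python
import numpy as np

def nbnn_classify(query_features, class_features):
    """Naive Bayes Nearest Neighbors: choose the class minimising the sum,
    over query descriptors x, of the squared distance from x to its
    nearest neighbour in that class's training feature set E_c."""
    scores = {}
    for c, feats in class_features.items():
        d2 = ((query_features[:, None] - feats[None]) ** 2).sum(-1)
        scores[c] = d2.min(axis=1).sum()   # nearest-neighbour distance per x
    return min(scores, key=scores.get)

rng = np.random.default_rng(0)
class_features = {
    "bike": rng.normal(0.0, 1.0, size=(40, 4)),
    "background": rng.normal(3.0, 1.0, size=(40, 4)),
}
query = rng.normal(0.0, 1.0, size=(10, 4))   # drawn near the "bike" cloud
label = nbnn_classify(query, class_features)
```

Real systems replace the brute-force scan with an approximate nearest-neighbour index; the decision rule is unchanged.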
NBNN vs Optimal NBNN
Feature log-likelihood:
Naive Bayes Nearest Neighbors (NBNN, Boiman 2008):
Optimal NBNN:
Parameter estimation
Density-correction parameters
Distance to negative feature set
Distance to positive feature set
Constrained linear program:
#variables = #validation images + 3
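A simplified sketch of such a hinge-loss linear program (toy per-image sums, not the thesis' exact formulation): three shared correction parameters plus one slack variable per validation image.

```python
import numpy as np
from scipy.optimize import linprog

def fit_correction_lp(S_neg, S_pos, n_feats, y):
    """Fit density-correction parameters by hinge-loss minimisation, cast
    as a linear program (an illustrative simplification of optimal NBNN).
    Per-image score: f_i = a_neg * S_neg[i] - a_pos * S_pos[i] + b * n_feats[i],
    where S_neg / S_pos are summed nearest-neighbour distances to the
    negative / positive feature sets. Variables: (a_neg, a_pos, b) + slacks."""
    m = len(y)
    c = np.concatenate([np.zeros(3), np.ones(m)])   # minimise sum of slacks
    A = np.zeros((m, 3 + m))
    for i in range(m):
        A[i, 0] = -y[i] * S_neg[i]
        A[i, 1] = y[i] * S_pos[i]
        A[i, 2] = -y[i] * n_feats[i]
        A[i, 3 + i] = -1.0                          # -y_i * f_i - xi_i <= -1
    b = -np.ones(m)
    bounds = [(0, None), (0, None), (None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:3]

# Toy validation data: positives have large d- sums and small d+ sums.
S_neg = np.array([10., 9., 2., 1.])
S_pos = np.array([1., 2., 9., 10.])
n_feats = np.array([5., 5., 5., 5.])
y = np.array([1., 1., -1., -1.])
a_neg, a_pos, b0 = fit_correction_lp(S_neg, S_pos, n_feats, y)
scores = a_neg * S_neg - a_pos * S_pos + b0 * n_feats
```

This is why the parameters cannot be set by hand: they are class-dependent and fall out of the optimisation over validation images.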
Optimal NBNN
vs NBNN vs BoW
D = feature space dimensionality
#points of class c in training set
Density correction parameters:
- are class-dependent,
- cannot be set by hand.
E. Publications
D. Perspectives
- SceneClass13 dataset
- Sum of one vs one classifiers
- "Towards Optimal Naive Bayes Nearest Neighbors", Behmo, Marcombes, Dalalyan, Prinet, ECCV 2010
- "Graph Commute Times for Image Representation", Behmo, Paragios, Prinet, CVPR 2008
- "An Application of Graph Commute Times to Image Indexing", Behmo, Paragios, Prinet, IGARSS 2008
- Naive Bayes assumption relaxation
D. Related work
C. Classifier properties
Nearest-Neighbour classifiers
- "Lazy learning", David Aha, 1997
- "Large margin nearest neighbor classifiers", Domeniconi et al., Neural Networks, 2005
(1) Linearity
Kernels on point clouds
(2) Multiple channels
The prediction function is linear:
- "A modified Hausdorff distance for object matching", Dubuisson, Jain, CVPR 1994
- "The intermediate matching kernel for image local features", Boughorbel et al., Neural Networks 2005
Images as multiple point clouds
Results (1/2)
- Graz02 dataset
- Bike vs Background
- Multichannel optimal NBNN: 5 color SIFT
Optimal NBNN multi-channel classification:
Classification by detection
Object detection, Classification by detection
Corrected nearest-neighbor distances to channel n of +/- class
Graphs as multiple point clouds
Results (2/2)
- SceneClass13 dataset
- Image = Graph of unquantised features
- Shortest path distance
Fast object detection with optimal NBNN
Classification by detection
Image = Graph =
{node pairs separated by distance 0}
+ {node pairs separated by distance 1}
+ {node pairs separated by distance 2}
+ ...
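This decomposition can be computed directly from shortest-path distances. A sketch using scipy (channel 0 pairs each node with itself, i.e. no graph structure):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def pair_channels(adjacency, max_length=2):
    """Split node pairs into channels by shortest-path distance, mirroring
    Image = Graph = {pairs at distance 0} + {pairs at distance 1} + ..."""
    D = shortest_path(adjacency, unweighted=True, directed=False)
    n = adjacency.shape[0]
    channels = {k: [] for k in range(max_length + 1)}
    for i in range(n):
        for j in range(i, n):
            d = D[i, j]
            if d <= max_length:
                channels[int(d)].append((i, j))
    return channels

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
ch = pair_channels(A)
```

Each channel then feeds one term of the multi-channel classifier, so the graph structure enters through the channel index rather than through any node ordering.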
Goal: find the image subwindow J that maximizes
Paths of zero length (No graph structure)
+ Paths of length 1 (Connected nodes)
Solution: branch-and-bound subwindow search
- Search space: {all rectangular subwindows}
- Branch: cut space in half
- Upper bound: Lampert & Blaschko, CVPR 2008
- Average complexity: O(N²)
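A compact sketch of branch-and-bound subwindow search in this style: the search state is a set of rectangles given by coordinate intervals, and the upper bound counts positive point weights inside the largest rectangle of the set and negative weights inside the smallest.

```python
import heapq
import numpy as np

def upper_bound(points, weights, rect_set):
    """Admissible bound over a set of rectangles (Lampert & Blaschko style):
    positive weights in the largest rectangle + negative weights in the smallest."""
    (x0l, x0h), (y0l, y0h), (x1l, x1h), (y1l, y1h) = rect_set
    x, y = points[:, 0], points[:, 1]
    in_large = (x >= x0l) & (x <= x1h) & (y >= y0l) & (y <= y1h)
    in_small = (x >= x0h) & (x <= x1l) & (y >= y0h) & (y <= y1l)
    return (weights[in_large & (weights > 0)].sum()
            + weights[in_small & (weights < 0)].sum())

def branch(rect_set):
    """Split the widest coordinate interval in half."""
    k = int(np.argmax([hi - lo for lo, hi in rect_set]))
    lo, hi = rect_set[k]
    mid = (lo + hi) // 2
    left, right = list(rect_set), list(rect_set)
    left[k], right[k] = (lo, mid), (mid + 1, hi)
    return tuple(left), tuple(right)

def subwindow_search(points, weights, width, height):
    """Best-first branch-and-bound over axis-aligned subwindows,
    maximising the sum of point weights inside the window."""
    root = ((0, width), (0, height), (0, width), (0, height))  # x0, y0, x1, y1 intervals
    heap = [(-upper_bound(points, weights, root), root)]
    while heap:
        neg_bound, rect_set = heapq.heappop(heap)
        if all(lo == hi for lo, hi in rect_set):
            (x0, _), (y0, _), (x1, _), (y1, _) = rect_set
            return (x0, y0, x1, y1), -neg_bound   # bound is exact on a singleton
        for child in branch(rect_set):
            (x0l, _), (y0l, _), (_, x1h), (_, y1h) = child
            if x0l <= x1h and y0l <= y1h:         # a valid rectangle remains
                heapq.heappush(heap, (-upper_bound(points, weights, child), child))
    return None, 0.0

points = np.array([[1, 1], [2, 2], [5, 5]])
weights = np.array([1.0, 1.0, -1.0])              # two positive points, one negative
best_rect, best_score = subwindow_search(points, weights, width=6, height=6)
```

With per-feature weights given by the corrected nearest-neighbour distances, the same routine performs optimal-NBNN detection; integral images replace the direct point sums for speed.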
- "Combining efficient object localization and image classification", Harzallah et al., ICCV 2009
- "Discriminative subvolume search for efficient action detection", Yuan et al., CVPR 2009