
Visual Feature Graphs and Image Recognition

1. Quantised graphs

Introduction

A. Why graph structures?

B. Challenges

A. Supervised image recognition

B. A finite-dimensional representation for attribute graphs

Why is image recognition a hard problem?

How to represent images?

Related work: bags of words and point localisation

Related work: the bag of words (BoW) representation

"Here is an image. What does it contain?"

[BoW pipeline diagram: image space → feature sampling (visual features: SIFT, SURF, Harris-Laplace...) → feature space quantisation (K-means, agglomerative hierarchical...) → concept space → image BoW vector]

Related work: the bag of words representation

Summary (+):

  • Local features
  • Finite dimensionality
  • Robust appearance

Problems (-):

  • Visual feature sampling
  • Codebook construction
  • Word count
  • No layout information
  • Feature space quantisation

Graph representations vs. graph distances

Visual feature graph construction

Graph representation

Pyramid of BoW:

  • "The pyramid match kernel: discriminative classification with sets of image features", Grauman & Darrell, CVPR 2005
  • "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories", Lazebnik et al., CVPR 2006

Correlatons:

  • "Discriminative object class models of appearance and shape by correlatons", Savarese et al., CVPR 2006
  • Nodes: visual features
  • Edges: nearest neighbours relative to point scale

N x N weight matrix

(N = #feature points)

  • Node ordering
  • Variable number of nodes
  • "Representing and recognizing the visual appearance of materials using three-dimensional textons", Leung & Malik, IJCV 2001
  • "Visual categorization with bags of keypoints", Csurka et al., ECCV 2004
  • "Local features and kernels for classification of texture and object categories: a comprehensive study", Zhang et al., IJCV 2007

Graph edit distance:

  • "On a relation between graph edit distance and maximum common subgraph", Pattern Recognition Letters, 1997
  • "Bayesian graph edit distance", Myers et al. PAMI 2000

Graph matching:

  • "Inexact Graph Matching Using Estimation of Distribution Algorithms", Bengoetxa, PhD Thesis, 2002
  • "Thirty years of Graph Matching in Image Recognition", Conte et al., IJPRAI, 2004

Limitations:

  • NP-complete problem
  • Cannot scale to > 1000 graph nodes

PhD Thesis

  • Problem
  • Solution

Example

Commute time distance

Large intra-class variability

Supervised training

Original feature graph

Definition

Appearance-collapsed feature graph

Representation of the appearance-collapsed feature graph

"For a random walk started at node i, CT(i,j) is the average number of steps required to reach node j for the first time, and then return to i for the first time"

Collapsed graph nodes = visual feature quantisation bins

Ill-posed problem

Challenges of image recognition

Computation

Original image space:

How to build a relevant graph structure?

Graph weight matrix

Embedding of the graph nodes in a space where d(i, j) = CT(i, j)
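A minimal numpy sketch of such an embedding (the spectral construction is standard; the function name and the toy path graph are illustrative, not from the talk):

```python
import numpy as np

def commute_time_embedding(W):
    """Embed the nodes of a connected graph so that squared Euclidean
    distances between embedded nodes equal commute times.

    W: symmetric (N, N) weight matrix. Returns (N, N-1) coordinates
    (the constant null eigenvector of the Laplacian is dropped).
    """
    d = W.sum(axis=1)
    vol = d.sum()                       # graph volume = sum of degrees
    L = np.diag(d) - W                  # combinatorial Laplacian
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    vals, vecs = vals[1:], vecs[:, 1:]  # drop the (near-)zero eigenvalue
    return np.sqrt(vol) * vecs / np.sqrt(vals)

# Toy check on a 4-node path graph: ||x_0 - x_1||^2 equals CT(0, 1) = 6.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = commute_time_embedding(W)
print(((X[0] - X[1]) ** 2).sum())  # ~6.0
```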

Challenges of image recognition

Commute time embedding vs Isomap

Several possibilities:

  • Transition matrix
  • Shortest path distance matrix
  • Commute time distance matrix

[Figure: Isomap embedding vs. commute time embedding. Graph = {nodes, edges}; Isomap embeds the shortest path distance matrix, while the commute time embedding (Qiu & Hancock 2007) embeds the commute time matrix; both are visualised by embedding the graph nodes in RGB space]

Goal: incorporate point layout information.

Requirements: invariance to changes in:

  • scale,
  • orientation,
  • illumination,
  • appearance (small changes).

Image labelling

Phone

Supervised image recognition

Thomas Herbert's Relation of Some Years Travels in Africa and Asia, 1677

"Let us mention the Dodo whose body is big and round. His corpulence gives it a slow and lazy walk. There are some nearing 50 pounds in weight. Its sight is of more interest than its taste and he looks melancholic as if he was sorry that Nature had given him such small wings for so big a body. Some have their head capped with a dark down, some had the top of their head bald and whitish as if it had been washed.They have a long and curved bill with the nostrils openings half way to the tip. It is greenish yellow. Their eyes are round and shiny and they have a fluffy plumage. Their tail looks like the sparsely beard of a Chinese made up of three or four short feathers. Their feet are thick and black and their toes powerful. They have a fiery stomach allowing them to digest stones like ostriches do".

Model

Test image

  • The picture was taken outside,
  • on the seashore,
  • it contains people, boats, some water and buildings,
  • the picture was shot in XingCheng, China.

Quantised graph representation: Summary

An image representation that includes point layout information

An orderless representation:

Graph nodes

Classifier

Training

Training data:

{(image, label)}

Predicted label

Object detection

Graph edges

Datasets

  • Visual feature graph construction
  • Feature space quantisation
  • Graph collapse (see the sketch after this list)
  • Distance (commute time) matrix of the collapsed graph
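A sketch of the two graph-side steps, under one natural reading of the slides: the appearance collapse merges every node that falls in the same quantisation bin (so collapsed nodes are the quantisation bins) and accumulates edge weights, and the commute time matrix then follows from the Laplacian pseudoinverse identity quoted earlier. The function names are hypothetical.

```python
import numpy as np

def collapse_graph(A, words, K):
    """Appearance collapse: merge all feature nodes sharing a
    quantisation bin. A: (N, N) feature-graph weight matrix;
    words: integer bin index of each node. Entry (a, b) of the
    (K, K) result accumulates the edge weight between bins a and b."""
    N = len(words)
    M = np.zeros((K, N))
    M[words, np.arange(N)] = 1.0        # K x N bin-membership matrix
    return M @ A @ M.T

def commute_times(W):
    """CT(i, j) = vol(G) * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])."""
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.linalg.pinv(L)              # Laplacian pseudoinverse
    d = np.diag(Lp)
    return W.sum() * (d[:, None] + d[None, :] - 2.0 * Lp)
```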

SceneClass-13

Satellite-8

Graz02-bike

"Node i is connected to node j if and only if i is near to j relatively to their respective scales"

Node i

Node j
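A minimal sketch of this construction. The threshold of 3 is the value quoted in the experiments; normalising the pairwise distance by the geometric mean of the two detection scales is an assumption made here for illustration.

```python
import numpy as np

def build_feature_graph(xy, scales, threshold=3.0):
    """Adjacency matrix of the visual feature graph: node i connects
    to node j when their scale-normalised distance is below
    `threshold`."""
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    norm = np.sqrt(scales[:, None] * scales[None, :])  # pairwise scale
    A = (dist / norm < threshold).astype(float)
    np.fill_diagonal(A, 0.0)                           # no self-loops
    return A

# Example: three SURF-like keypoints; only the two nearby ones connect.
xy = np.array([[10.0, 10.0], [14.0, 11.0], [60.0, 60.0]])
scales = np.array([2.0, 2.0, 2.0])
print(build_feature_graph(xy, scales))
```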

Quickbird 0.5m, panchromatic

  • Segmentation available
  • Large background clutter

Semantic gap

C. Outline

C. Results

defended by

D. Conclusion

Contributions

Limitations

Talk outline

Graphs vs. BoW

Synthetic dataset

  • Performance gain < 2 pp
  • Possible sources of performance loss:

Good classification rate: 92.75%

  • Graph construction invariant to rigid geometric changes
  • Graph collapse
  • Representation by the graph distance matrix
  • Consistent performance gain

------------ Part 1 ------------

Feature quantisation

Graph collapse

+ class

Image

Representation in Euclidean space

Graph of quantised features

Support Vector Machine (SVM) classification

  • Datasets: 1/2 training, 1/2 testing
  • One vs One SVM classification
  • SURF features

- class

  • Commute time distance
  • Node connection: scale-normalised distance < 3

Image

Graph of unquantised features

Classification by optimal NBNN

How to avoid feature quantisation?

------------ Part 2 ------------

Graph of unquantised features

[Diagram: node pairs (A, B, C, D) of the unquantised feature graph, enumerated by path length: paths of length 0, paths of length 1, paths of length 2, paths of length 3, ...]

  • Graph = Multiple sets of point pairs
  • We need a classifier for {{points}}, i.e. for sets of point sets

Régis Behmo

Ecole Centrale Paris

Conclusions

2. Unquantised graphs

PhD advisors

Nikos Paragios

Véronique Prinet

15 September 2010

A. Summary

B. Contributions

C. Limitations

Image representation

Visual feature graph

Classification by optimal NBNN

Representation by distance matrices

  • Gain of graph structure < 2 pp
  • Non-model dependent graph construction

Multiple point clouds

  • Graph structure robust to rigid geometric changes
  • Graph representation for classification

Visual feature graphs

A. Images as point clouds

Image classification

Optimal NBNN

Appearance-collapsed feature graphs

  • Model overfitting in optimal NBNN
  • Long-distance interactions not relevant

B. Density-correction parameter estimation

Distance-collapsed feature graph

  • Parametric classifier of point clouds
  • Linear
  • Multi-channel
  • NBNN < BoW + linear SVM < opt. NBNN

Quantised features

Unquantised features

Predicted class label (equiprobable classes):

Training set

Positive and negative feature sets:

Visual feature log-likelihood

  • Binary classification:
  • Parameter estimation by cross validation
  • Hinge loss minimisation

Parzen-Rosenblatt estimator

Naive Bayes assumption:

Nearest neighbor approximation

Validation set

E_c = {training features from class c}

d(x, E) = "distance from x to nearest neighbor in E"

Training images to label

NBNN vs Optimal NBNN

Feature log-likelihood:

Naive Bayes Nearest Neighbors (NBNN, Boiman 2008):

Optimal NBNN:

constant across classes

class-dependent
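A runnable sketch of both decision rules. The affine per-class correction (alpha_c, beta_c) matches the "dilation/compression" and "translation" labels on the parameter-estimation slides; treat its exact form as an assumption here, the reference being the ECCV 2010 paper.

```python
import numpy as np

def nn_dist2(x, E):
    """Squared distance from descriptor x to its nearest neighbour in E."""
    return ((E - x) ** 2).sum(axis=1).min()

def nbnn_label(image_feats, class_feats):
    """NBNN (Boiman 2008): the per-class log-likelihood correction is
    taken as constant across classes, so only raw distances matter."""
    scores = {c: sum(nn_dist2(x, E) for x in image_feats)
              for c, E in class_feats.items()}
    return min(scores, key=scores.get)

def optimal_nbnn_label(image_feats, class_feats, alpha, beta):
    """Optimal NBNN: class-dependent correction of each
    nearest-neighbour distance (dilation/compression alpha_c,
    translation beta_c), learned on a validation set."""
    scores = {c: sum(alpha[c] * nn_dist2(x, E) + beta[c]
                     for x in image_feats)
              for c, E in class_feats.items()}
    return min(scores, key=scores.get)
```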

Parameter estimation

Predicted image label:

Error to minimize:

Density-correction parameters

Ground truth

Hinge loss

Distance to negative feature set

Distance to positive feature set

Constrained linear program:

#variables = #validation images + 3

(dilation/compression)

Where:

(translation)
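One plausible reading of these fragments as a runnable sketch: score each validation image by an affine function of its distances to the positive and negative feature sets (three parameters, hence "#validation images + 3" variables once one hinge slack is added per image), and minimise the total hinge loss as a linear program. The exact score form is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def fit_correction_lp(d_pos, d_neg, y):
    """Hinge-loss estimation of density-correction parameters as an LP.

    d_pos, d_neg: distances of each validation image to the positive
    and negative feature sets; y: labels in {+1, -1}. The score of
    image i is assumed to be s_i = a*d_pos[i] + b*d_neg[i] + c.
    """
    y = np.asarray(y, dtype=float)
    d_pos = np.asarray(d_pos, dtype=float)
    d_neg = np.asarray(d_neg, dtype=float)
    n = len(y)
    cost = np.concatenate([np.zeros(3), np.ones(n)])  # minimise sum of slacks
    # Hinge constraints: xi_i >= 1 - y_i * s_i  and  xi_i >= 0.
    A_ub = np.zeros((n, 3 + n))
    A_ub[:, 0] = -y * d_pos
    A_ub[:, 1] = -y * d_neg
    A_ub[:, 2] = -y
    A_ub[:, 3:] = -np.eye(n)
    res = linprog(cost, A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * 3 + [(0, None)] * n,
                  method="highs")
    return res.x[:3]  # a (dilation/compression), b, c (translation)
```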

Optimal NBNN

vs NBNN vs BoW

[Formula fragments: D, D/2, where D = feature space dimensionality; #points of class c in training set]

Density-correction parameters:

  • are class-dependent,
  • cannot be set by hand.

E. Publications

D. Perspectives

  • SceneClass13 dataset
  • Sum of one vs one classifiers

Publications:

  • "Towards Optimal Naive Bayes Nearest Neighbors", Behmo, Marcombes, Dalalyan, Prinet, ECCV 2010
  • "Graph Commute Times for Image Representation", Behmo, Paragios, Prinet, CVPR 2008
  • "An Application of Graph Commute Times to Image Indexing", Behmo, Paragios, Prinet, IGARSS 2008

Perspectives:

  • Naive Bayes assumption relaxation
  • Kernel optimal NBNN
  • Speed improvement

D. Related work

C. Classifier properties

Nearest-Neighbour classifiers

  • "Lazy learning", David Aha, 1997
  • "Large margin nearest neighbor classifiers", Domeniconi et al., Neural Networks, 2005

(1) Linearity

Kernels on point clouds

(2) Multiple channels

The prediction function is linear:
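The formula itself did not survive extraction; as a hedged illustration, a binary decision function of the following shape has the stated property:

$$f(I) = \sum_{x \in I} \Big[\big(\alpha^{-}\, d(x, E^{-}) + \beta^{-}\big) - \big(\alpha^{+}\, d(x, E^{+}) + \beta^{+}\big)\Big],$$

which is linear in the corrected nearest-neighbour distances and additive over the features of the image.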

  • "A modified Hausdorff distance for object matching", Dubuisson, Jain, CVPR 1994
  • "The intermediate matching kernel for image local features", Boughorbel et al., Neural Networks 2005

Images as multiple point clouds

Results (1/2)

  • Graz02 dataset
  • Bike vs Background
  • Multichannel optimal NBNN: 5 color SIFT

= nth feature channel

Optimal NBNN multi-channel classification:

Classification

Classification by detection

Object detection, Classification by detection

Corrected nearest-neighbor distances to channel n of +/- class
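Assembling these fragments, the multi-channel decision function plausibly takes the form (an illustration consistent with the slides, not a verbatim reconstruction):

$$f(I) = \sum_{n} \sum_{x \in I^{(n)}} \Big[\delta_n^{-}(x) - \delta_n^{+}(x)\Big],$$

where $I^{(n)}$ is the nth feature channel of image I and $\delta_n^{\pm}(x)$ are the corrected nearest-neighbour distances of x to channel n of the positive/negative class; the sign of f(I) gives the label.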

Graphs as multiple point clouds

Results (2/2)

  • SceneClass13 dataset
  • Image = Graph of unquantised features
  • Shortest path distance

Fast object detection with optimal NBNN

Classification by detection

Image = Graph = {node pairs separated by distance 0}
              + {node pairs separated by distance 1}
              + {node pairs separated by distance 2}
              + ...
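Since the talk uses the shortest path distance for this decomposition (see Results (2/2) above), the pair sets can be enumerated as in this sketch:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def pairs_at_distance(A, k):
    """Node pairs (i, j) whose shortest-path distance in the feature
    graph is exactly k; k = 0 yields the nodes themselves."""
    D = shortest_path(A, unweighted=True)   # hop-count distances
    return list(zip(*np.nonzero(D == k)))
```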

Goal: find the image subwindow J that maximizes

Paths of zero length (No graph structure)

+ Paths of length 1 (Connected nodes)

Solution: branch-and-bound subwindow search (see the sketch after the references below)

+ Paths of length 2

  • Search space: {all rectangular subwindows}
  • Branch: cut space in half
  • Upper bound: Lampert & Blaschko, CVPR 2008
  • Average complexity: O(N²)
  • "Combining efficient object localization and image classification", Harzallah et al. ICCV 2009
  • "Discriminative subvolume search for efficient action detection", Yuan et al., CVPR 2009