Visual Feature Graphs
and Image Recognition
1. Quantised graphs
Introduction
A. Why graph structures?
B. Challenges
A. Supervised image recognition
B. A finite-dimensional representation for attribute graphs
Why is image recognition a hard problem?
How to represent images?
Related work: bags of words and point localisation
Related work: the bag of words (BoW) representation
"Here is an image. What does it contain?"
Image space → Feature sampling (SIFT, SURF, Harris-Laplace...) → Feature space quantisation (K-means, agglomerative hierarchical...) → Concept space
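The sampling-and-quantisation pipeline can be sketched end to end. This is a toy sketch: random vectors stand in for SIFT/SURF descriptors, and a plain k-means replaces a production clusterer.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means on descriptor vectors; returns k codeword centres."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre, then recompute centres
        labels = np.argmin(((descriptors[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def bow_histogram(descriptors, centres):
    """Quantise each descriptor to its nearest codeword and count occurrences."""
    labels = np.argmin(((descriptors[:, None] - centres[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centres)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
train_desc = rng.normal(size=(200, 8))   # stand-in for training SIFT descriptors
centres = build_codebook(train_desc, k=16)
h = bow_histogram(rng.normal(size=(50, 8)), centres)
```

The histogram `h` is the orderless BoW representation: all point layout information is discarded at this step.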
Related work: the bag of words representation
Graph representations vs. Graph distances
Visual feature graph construction
- Local features
- Finite dimensionality
- Robust appearance
Summary
Problems
Graph representation
- Visual feature sampling
- Codebook construction
- Word count
- No layout information
- Feature space quantisation
Pyramid of BoW:
- "The pyramid match kernel: discriminative classification with sets of image features", Grauman & Darrell, CVPR 2005
- "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories", Lazebnik et al., CVPR 2006
Correlatons:
- "Discriminative object class models of appearance and shape by correlatons", Savarese et al., CVPR 2006
- Nodes: visual features
- Edges: nearest neighbours relative to point scale
N x N weight matrix
(N = #feature points)
- Node ordering
- Variable number of nodes
- "Representing and recognizing the visual appearance of materials using three-dimensional textons", Leung & Malik, IJCV 2001
- "Visual categorization with bags of keypoints", Csurka et al., ECCV 2004
- "Local features and kernels for classification of texture and object categories: a comprehensive study", Zhang et al., IJCV 2007
Graph edit distance:
- "On a relation between graph edit distance and maximum common subgraph", Bunke, Pattern Recognition Letters, 1997
- "Bayesian graph edit distance", Myers et al. PAMI 2000
Graph matching:
- "Inexact Graph Matching Using Estimation of Distribution Algorithms", Bengoetxea, PhD Thesis, 2002
- "Thirty years of Graph Matching in Image Recognition", Conte et al., IJPRAI, 2004
Limitations:
- NP-complete problem
- Cannot scale to > 1000 graph nodes
PhD Thesis
Example
Commute time distance
Large intra-class variability
Supervised training
Original feature graph
Definition
Appearance-collapsed feature graph
Representation of the appearance-collapsed feature graph
"For a random walk started at node i, CT(i,j) is the average number of steps required to reach node j for the first time, and then return to i for the first time"
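The commute time has a closed form via the Moore-Penrose pseudo-inverse of the graph Laplacian: CT(i,j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij). A minimal numpy sketch:

```python
import numpy as np

def commute_time_matrix(W):
    """Commute-time distances from a symmetric edge-weight matrix W.
    CT(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij), where L+ is the
    pseudo-inverse of the Laplacian and vol(G) the sum of node degrees."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)
    vol = d.sum()
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)

# Path graph 0-1-2: commute time grows with effective resistance.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
CT = commute_time_matrix(W)
```

On this path graph, CT(0,1) = 4 and CT(0,2) = 8: commute time equals vol(G) times the effective resistance between the two nodes.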
= Visual feature quantisation bins
Ill-posed problem
Challenges of image recognition
Computation
Original image space:
How to build a relevant graph structure?
embedding of the graph nodes in a space where: d(i,j) = CT(i,j)
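Such an embedding follows from the Laplacian spectrum: scaling each eigenvector by 1/sqrt(lambda_k) and the whole by sqrt(vol(G)) gives coordinates whose squared Euclidean distances are exactly the commute times. A sketch, assuming the unnormalised Laplacian:

```python
import numpy as np

def commute_time_embedding(W, tol=1e-9):
    """Embed graph nodes so that squared Euclidean distances equal
    commute times: coordinate k of node i is
    sqrt(vol) * phi_k(i) / sqrt(lambda_k), over non-zero eigenpairs of L."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    lam, phi = np.linalg.eigh(L)
    keep = lam > tol                       # drop the zero eigenvalue(s)
    coords = phi[:, keep] / np.sqrt(lam[keep])
    return np.sqrt(d.sum()) * coords       # one row of coordinates per node

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Y = commute_time_embedding(W)
ct01 = ((Y[0] - Y[1]) ** 2).sum()          # recovers CT(0, 1)
```

Unlike Isomap, no geodesic distance estimation is needed; the embedding comes directly from the eigendecomposition.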
Challenges of image recognition
Commute time embedding vs Isomap
Several possibilities:
- Transition matrix
- Shortest path distance matrix
- Commute time distance matrix
Shortest path distance matrix
Isomap embedding
Commute time embedding
Commute time embedding
(Qiu & Hancock 2007)
Goal: incorporate point layout info.
Requirements: invariance to changes in:
- scale,
- orientation,
- illumination,
- appearance (small changes).
Image labelling
Supervised image recognition
Thomas Herbert's Relation of Some Years Travels in Africa and Asia, 1677
"Let us mention the Dodo whose body is big and round. His corpulence gives it a slow and lazy walk. There are some nearing 50 pounds in weight. Its sight is of more interest than its taste and he looks melancholic as if he was sorry that Nature had given him such small wings for so big a body. Some have their head capped with a dark down, some had the top of their head bald and whitish as if it had been washed. They have a long and curved bill with the nostril openings halfway to the tip. It is greenish yellow. Their eyes are round and shiny and they have a fluffy plumage. Their tail looks like the sparse beard of a Chinese, made up of three or four short feathers. Their feet are thick and black and their toes powerful. They have a fiery stomach allowing them to digest stones like ostriches do."
- The picture was taken outside,
- on the seashore,
- it contains people, boats, some water and buildings,
- the picture was shot in XingCheng, China.
Quantised graph representation: Summary
An image representation that includes point layout information
An orderless representation:
Training data:
{(image, label)}
Object detection
Datasets
- Visual feature graph construction
- Feature space quantisation
- Graph collapse
- Distance (commute time) matrix of the collapsed graph
SceneClass-13
Satellite-8
Graz02-bike
"Node i is connected to node j if and only if i is near to j relative to their respective scales"
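A minimal sketch of this connection rule. The exact normalisation is an assumption here (mean of the two point scales); the experiments use a scale-normalised distance threshold of 3.

```python
import numpy as np

def build_feature_graph(positions, scales, threshold=3.0):
    """Adjacency matrix for a visual feature graph: i ~ j when their
    distance, divided by the mean of the two point scales, is below
    `threshold` (an illustrative choice of normalisation)."""
    n = len(positions)
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(positions[i] - positions[j])
            if dist / (0.5 * (scales[i] + scales[j])) < threshold:
                A[i, j] = A[j, i] = True
    return A

positions = np.array([[0., 0.], [1., 0.], [10., 0.]])
scales = np.array([1.0, 1.0, 1.0])
A = build_feature_graph(positions, scales)
```

Because the rule only uses distance ratios, the resulting graph is invariant to rigid geometric changes and to uniform rescaling of the image.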
Quickbird 0.5m, panchromatic
- Segmentation available
- Large background clutter
Semantic gap
C. Outline
C. Results
defended by
D. Conclusion
Contributions
Limitations
Talk outline
Graphs vs. BoW
Synthetic dataset
- Performance gain < 2 pp
- Possible sources of performance loss:
Good classification rate: 92.75%
- Graph construction invariant to rigid geometric changes
- Graph collapse
- Representation by the graph distance matrix
- Consistent performance gain
------------ Part 1 ------------
Feature quantisation
Graph collapse
Representation in Euclidean space
Graph of quantised features
Support Vector Machine (SVM) classification
- Datasets: 1/2 training, 1/2 testing
- One vs One SVM classification
- SURF features
- Commute time distance
- Node connection: scale-normalised distance < 3
Graph of unquantised features
Classification by optimal NBNN
How to avoid feature quantisation?
---------------- Part 2----------------
Graph of unquantised features
- Graph = Multiple sets of point pairs
- We need a classifier for {{points}}
Régis Behmo
Ecole Centrale Paris
Conclusions
2. Unquantised graphs
PhD advisors
Nikos Paragios
Véronique Prinet
15 September 2010
A. Summary
B. Contributions
C. Limitations
Image representation
Visual feature graph
Classification by optimal NBNN
Representation by distance matrices
- Gain of graph structure < 2 pp
- Non-model dependent graph construction
- Graph structure robust to rigid geometric changes
- Graph representation for classification
A. Images as point clouds
Image classification
Optimal NBNN
Appearance-collapsed feature graphs
- Model overfitting in optimal NBNN
- Long-distance interactions not relevant
B. Density-correction parameters estimation
Distance-collapsed feature graph
- Parametric classifier of point clouds
- Linear
- Multi-channel
- NBNN < BoW + linear SVM < opt. NBNN
Predicted class label (equiprobable classes): c* = argmax_c sum over features x in I of log p(x | c)
Positive and negative feature sets:
Visual feature log likelihood
- Binary classification:
- Parameter estimation by cross validation
- Hinge loss minimisation
Parzen-Rosenblatt estimator
Nearest neighbor approximation
= {training features from class c}
= "distance from x to nearest neighbor in E"
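The resulting NBNN decision rule (Boiman et al., 2008) picks the class whose training feature set E_c yields the smallest summed nearest-neighbour distances. A brute-force toy sketch with random stand-in descriptors:

```python
import numpy as np

def nbnn_classify(query_features, class_features):
    """Naive Bayes Nearest Neighbors: choose the class minimising the sum,
    over query descriptors x, of the squared distance from x to its
    nearest neighbour in that class's training feature set E_c."""
    scores = {}
    for c, feats in class_features.items():
        d2 = ((query_features[:, None] - feats[None]) ** 2).sum(-1)
        scores[c] = d2.min(axis=1).sum()   # nearest-neighbour distance per x
    return min(scores, key=scores.get)

rng = np.random.default_rng(0)
class_features = {
    "bike": rng.normal(0.0, 1.0, size=(40, 4)),
    "background": rng.normal(3.0, 1.0, size=(40, 4)),
}
query = rng.normal(0.0, 1.0, size=(10, 4))   # drawn near the "bike" cloud
label = nbnn_classify(query, class_features)
```

Real systems replace the brute-force scan with an approximate nearest-neighbour index; the decision rule is unchanged.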
NBNN vs Optimal NBNN
Feature log-likelihood:
Naive Bayes Nearest Neighbors (NBNN, Boiman 2008):
Optimal NBNN:
Parameter estimation
Density-correction parameters
Distance to negative feature set
Distance to positive feature set
Constrained linear program:
#variables = #validation images + 3
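A simplified sketch of such a hinge-loss linear program (toy per-image sums, not the thesis' exact formulation): three shared correction parameters plus one slack variable per validation image.

```python
import numpy as np
from scipy.optimize import linprog

def fit_correction_lp(S_neg, S_pos, n_feats, y):
    """Fit density-correction parameters by hinge-loss minimisation, cast
    as a linear program (an illustrative simplification of optimal NBNN).
    Per-image score: f_i = a_neg * S_neg[i] - a_pos * S_pos[i] + b * n_feats[i],
    where S_neg / S_pos are summed nearest-neighbour distances to the
    negative / positive feature sets. Variables: (a_neg, a_pos, b) + slacks."""
    m = len(y)
    c = np.concatenate([np.zeros(3), np.ones(m)])   # minimise sum of slacks
    A = np.zeros((m, 3 + m))
    for i in range(m):
        A[i, 0] = -y[i] * S_neg[i]
        A[i, 1] = y[i] * S_pos[i]
        A[i, 2] = -y[i] * n_feats[i]
        A[i, 3 + i] = -1.0                          # -y_i * f_i - xi_i <= -1
    b = -np.ones(m)
    bounds = [(0, None), (0, None), (None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:3]

# Toy validation data: positives have large d- sums and small d+ sums.
S_neg = np.array([10., 9., 2., 1.])
S_pos = np.array([1., 2., 9., 10.])
n_feats = np.array([5., 5., 5., 5.])
y = np.array([1., 1., -1., -1.])
a_neg, a_pos, b0 = fit_correction_lp(S_neg, S_pos, n_feats, y)
scores = a_neg * S_neg - a_pos * S_pos + b0 * n_feats
```

This is why the parameters cannot be set by hand: they are class-dependent and fall out of the optimisation over validation images.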
Optimal NBNN
vs NBNN vs BoW
D = feature space dimensionality
#points of class c in training set
Density correction parameters:
- are class-dependent,
- cannot be set by hand.
E. Publications
D. Perspectives
- SceneClass13 dataset
- Sum of one vs one classifiers
- "Towards Optimal Naive Bayes Nearest Neighbors", Behmo, Marcombes, Dalalyan, Prinet, ECCV 2010
- "Graph Commute Times for Image Representation", Behmo, Paragios, Prinet, CVPR 2008
- "An Application of Graph Commute Times to Image Indexing", Behmo, Paragios, Prinet, IGARSS 2008
- Naive Bayes assumption relaxation
D. Related work
C. Classifier properties
Nearest-Neighbour classifiers
- "Lazy learning", David Aha, 1997
- "Large margin nearest neighbor classifiers", Domeniconi et al., Neural Networks, 2005
(1) Linearity
Kernels on point clouds
(2) Multiple channels
The prediction function is linear:
- "A modified Hausdorff distance for object matching", Dubuisson, Jain, CVPR 1994
- "The intermediate matching kernel for image local features", Boughorbel et al., Neural Networks 2005
Images as multiple point clouds
Results (1/2)
- Graz02 dataset
- Bike vs Background
- Multichannel optimal NBNN: 5 color SIFT
Optimal NBNN multi-channel classification:
Classification by detection
Object detection, Classification by detection
Corrected nearest-neighbor distances to channel n of +/- class
Graphs as multiple point clouds
Results (2/2)
- SceneClass13 dataset
- Image = Graph of unquantised features
- Shortest path distance
Fast object detection with optimal NBNN
Classification by detection
Image = Graph =
{node pairs separated by distance 0}
+ {node pairs separated by distance 1}
+ {node pairs separated by distance 2}
+ ...
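This decomposition can be computed directly from shortest-path distances. A sketch using scipy (channel 0 pairs each node with itself, i.e. no graph structure):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def pair_channels(adjacency, max_length=2):
    """Split node pairs into channels by shortest-path distance, mirroring
    Image = Graph = {pairs at distance 0} + {pairs at distance 1} + ..."""
    D = shortest_path(adjacency, unweighted=True, directed=False)
    n = adjacency.shape[0]
    channels = {k: [] for k in range(max_length + 1)}
    for i in range(n):
        for j in range(i, n):
            d = D[i, j]
            if d <= max_length:
                channels[int(d)].append((i, j))
    return channels

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
ch = pair_channels(A)
```

Each channel then feeds one term of the multi-channel classifier, so the graph structure enters through the channel index rather than through any node ordering.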
Goal: find the image subwindow J that maximizes
Paths of zero length (No graph structure)
+ Paths of length 1 (Connected nodes)
Solution: branch-and-bound subwindow search
- Search space: {all rectangular subwindows}
- Branch: cut space in half
- Upper bound: Lampert & Blaschko, CVPR 2008
- Average complexity: O(N²)
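A compact sketch of branch-and-bound subwindow search in this style: the search state is a set of rectangles given by coordinate intervals, and the upper bound counts positive point weights inside the largest rectangle of the set and negative weights inside the smallest.

```python
import heapq
import numpy as np

def upper_bound(points, weights, rect_set):
    """Admissible bound over a set of rectangles (Lampert & Blaschko style):
    positive weights in the largest rectangle + negative weights in the smallest."""
    (x0l, x0h), (y0l, y0h), (x1l, x1h), (y1l, y1h) = rect_set
    x, y = points[:, 0], points[:, 1]
    in_large = (x >= x0l) & (x <= x1h) & (y >= y0l) & (y <= y1h)
    in_small = (x >= x0h) & (x <= x1l) & (y >= y0h) & (y <= y1l)
    return (weights[in_large & (weights > 0)].sum()
            + weights[in_small & (weights < 0)].sum())

def branch(rect_set):
    """Split the widest coordinate interval in half."""
    k = int(np.argmax([hi - lo for lo, hi in rect_set]))
    lo, hi = rect_set[k]
    mid = (lo + hi) // 2
    left, right = list(rect_set), list(rect_set)
    left[k], right[k] = (lo, mid), (mid + 1, hi)
    return tuple(left), tuple(right)

def subwindow_search(points, weights, width, height):
    """Best-first branch-and-bound over axis-aligned subwindows,
    maximising the sum of point weights inside the window."""
    root = ((0, width), (0, height), (0, width), (0, height))  # x0, y0, x1, y1 intervals
    heap = [(-upper_bound(points, weights, root), root)]
    while heap:
        neg_bound, rect_set = heapq.heappop(heap)
        if all(lo == hi for lo, hi in rect_set):
            (x0, _), (y0, _), (x1, _), (y1, _) = rect_set
            return (x0, y0, x1, y1), -neg_bound   # bound is exact on a singleton
        for child in branch(rect_set):
            (x0l, _), (y0l, _), (_, x1h), (_, y1h) = child
            if x0l <= x1h and y0l <= y1h:         # a valid rectangle remains
                heapq.heappush(heap, (-upper_bound(points, weights, child), child))
    return None, 0.0

points = np.array([[1, 1], [2, 2], [5, 5]])
weights = np.array([1.0, 1.0, -1.0])              # two positive points, one negative
best_rect, best_score = subwindow_search(points, weights, width=6, height=6)
```

With per-feature weights given by the corrected nearest-neighbour distances, the same routine performs optimal-NBNN detection; integral images replace the direct point sums for speed.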
- "Combining efficient object localization and image classification", Harzallah et al., ICCV 2009
- "Discriminative subvolume search for efficient action detection", Yuan et al., CVPR 2009