limitations of feature histograms:
- vectors of fixed dimension
- real numbers (just weights or frequency)
advantages of feature signatures:
- set of local features (aggregated, e.g., cluster centroids)
- variable size (suitable for both complex and simple objects)
- weigthed (e.g., cluster radius)
- set element is not limited to number, could be anything (e.g., vector)
Let's introduce an early-termination strategy for the lower bound computation, allowing to use just the best pivot pairs, for example, p of them
- assumption - close pivots to query and/or data are good
- prioritize pivots sorted w.r.t. the distances to query q and the object o
thank you for attention!
Ptolemaic Indexing of the Signature Quadratic Form Distance
Jakub Lokoč , Magnus Lie Hetland
Tomáš Skopal , Christian Beecks
SIRET Research Group, Charles University in Prague, Czech Republic
Norwegian University of Science and Technology, Norway
Data Management and Data Exploration Group, RWTH Aachen University, Germany
SISAP 2011, Lipari, Italy
Metric space model
Feature signatures
Signature Quadratic Form Distance (SQFD) is ptolemaic metric!
- triangle inequality
- lower-bounding using single pivot
Ptolemaic indexing
Fact: quadratic form distance (QFD) is ptolemaic metric
Proof idea:
- SQFD could be viewed as QFD having extremely large matrix A (each feature/centroid forms one dimension)
- matrix A of SQFD is just A's compact representation
- weight vectors <w >, <w > then become very sparse, while the zeros do not contribute to the vector-matrix product
- as long as the ground similarity function fs generates positive-definite matrix A (i.e., requirement of QFD's metricity), SQFD is ptolemaic metric as well
centroid
cluster weight
Descriptors in
multimedia databases
What's happening inside SQFD?
- Ptolemy's inequality
- lower-bounding using pair of pivots
Indexing similarity
- lower-bounding
- using cheap lower-bound distance LB(d(x,y)) <= d(x,y)
- if LB(d(q,o)) > query radius, then o cannot be in the result
- feature histograms
- feature signatures
minus
heuristic
gaussian
SQFD = adaptive distance for signatures
generalized from the original
quadratic form distance
(distance for histograms)
Intra (red) and inter (cyan) cluster comparison
... successful SQFD applications
in image retrieval and recently
3D model retrieval
Are there any
ptolemaic distances?
at least Quadratic form distance
(so also Euclidean) are ptolemaic
Ptolemaic Indexing of SQFD
Ptolemaic Indexing
Signature Quadratic Form
Distance (SQFD)
=
+
...is ptolemaic filtering better than metric filtering?
- smart distance for feature signatures
?
Experimental results
Sorting the pivots - another overhead?
not really...
- sorting all pivots w.r.t. query is performed just once per query
- sorting all pivots w.r.t. data objects requires an extension of the pivot tables structure (adding a pivot permutation for each data object)
(random data sampled from
10D Euclidean cube, 20 pivots)
Balanced
Unbalanced
Two strategies
- Unbalanced - fix the closest pivot to the data and use with all the rest
- Balanced - determine the optimal pair in a zig-zag way
- two pivots, instead of one
- natural pivot interpretation, one pivot proxy for query, the second one for data object
instead of p pivots, we have pivot pairs (?!)
- this is good for the tightness of lower bound obtained "for free"
- well, it is not really for free, it requires large CPU overhead if all the pivot pairs are used
Why ptolemaic indexing?
two datasets
- ALOI (72 000 images)
- MIR Flickr (25 000 images)
7-dimensional features extracted and clustered for each image, resulting in feature signature of 5-115 centroids per image
advantages vs. drawbacks?
- ordinary pivot tables (LAESA) could be used as ptolemaic access method
- additional requirement: need to store the distances between pivots
- during querying the ptolemaic lower bound is used instead of the triangle one
Ptolemaic Pivot Tables