Introducing 

Prezi AI.

Your new presentation assistant.

Refine, enhance, and tailor your content, source relevant images, and edit visuals quicker than ever before.

Loading…
Transcript

limitations of feature histograms:

  • vectors of fixed dimension
  • real numbers (just weights or frequency)

advantages of feature signatures:

  • set of local features (aggregated, e.g., cluster centroids)
  • variable size (suitable for both complex and simple objects)
  • weigthed (e.g., cluster radius)
  • set element is not limited to number, could be anything (e.g., vector)

Let's introduce an early-termination strategy for the lower bound computation, allowing to use just the best pivot pairs, for example, p of them

  • assumption - close pivots to query and/or data are good
  • prioritize pivots sorted w.r.t. the distances to query q and the object o

thank you for attention!

Ptolemaic Indexing of the Signature Quadratic Form Distance

+

*

Jakub Lokoč , Magnus Lie Hetland

Tomáš Skopal , Christian Beecks

#

*

+

SIRET Research Group, Charles University in Prague, Czech Republic

Norwegian University of Science and Technology, Norway

Data Management and Data Exploration Group, RWTH Aachen University, Germany

#

SISAP 2011, Lipari, Italy

Metric space model

Feature signatures

Signature Quadratic Form Distance (SQFD) is ptolemaic metric!

  • triangle inequality
  • lower-bounding using single pivot

fs

Ptolemaic indexing

Fact: quadratic form distance (QFD) is ptolemaic metric

Proof idea:

  • SQFD could be viewed as QFD having extremely large matrix A (each feature/centroid forms one dimension)
  • matrix A of SQFD is just A's compact representation

  • weight vectors <w >, <w > then become very sparse, while the zeros do not contribute to the vector-matrix product
  • as long as the ground similarity function fs generates positive-definite matrix A (i.e., requirement of QFD's metricity), SQFD is ptolemaic metric as well

centroid

cluster weight

Descriptors in

multimedia databases

What's happening inside SQFD?

  • Ptolemy's inequality
  • lower-bounding using pair of pivots

p

q

Indexing similarity

ground distance

  • lower-bounding
  • using cheap lower-bound distance LB(d(x,y)) <= d(x,y)
  • if LB(d(q,o)) > query radius, then o cannot be in the result
  • feature histograms
  • feature signatures

minus

heuristic

gaussian

SQFD = adaptive distance for signatures

generalized from the original

quadratic form distance

(distance for histograms)

Intra (red) and inter (cyan) cluster comparison

... successful SQFD applications

in image retrieval and recently

3D model retrieval

Are there any

ptolemaic distances?

at least Quadratic form distance

(so also Euclidean) are ptolemaic

Ptolemaic Indexing of SQFD

Ptolemaic Indexing

Signature Quadratic Form

Distance (SQFD)

  • ultimate performance

=

+

  • beyond the metric

space model

...is ptolemaic filtering better than metric filtering?

  • smart distance for feature signatures

?

Experimental results

depends...

(2D Euclidean space)

Sorting the pivots - another overhead?

not really...

  • sorting all pivots w.r.t. query is performed just once per query
  • sorting all pivots w.r.t. data objects requires an extension of the pivot tables structure (adding a pivot permutation for each data object)

(random data sampled from

10D Euclidean cube, 20 pivots)

Balanced

Unbalanced

Two strategies

  • Unbalanced - fix the closest pivot to the data and use with all the rest
  • Balanced - determine the optimal pair in a zig-zag way
  • two pivots, instead of one
  • natural pivot interpretation, one pivot proxy for query, the second one for data object

instead of p pivots, we have pivot pairs (?!)

  • this is good for the tightness of lower bound obtained "for free"
  • well, it is not really for free, it requires large CPU overhead if all the pivot pairs are used

Why ptolemaic indexing?

2

two datasets

  • ALOI (72 000 images)
  • MIR Flickr (25 000 images)

7-dimensional features extracted and clustered for each image, resulting in feature signature of 5-115 centroids per image

( )

p

advantages vs. drawbacks?

  • ordinary pivot tables (LAESA) could be used as ptolemaic access method
  • additional requirement: need to store the distances between pivots
  • during querying the ptolemaic lower bound is used instead of the triangle one

Ptolemaic Pivot Tables

Learn more about creating dynamic, engaging presentations with Prezi