Image Annotation and Understanding

Ximo Castilla

on 1 October 2012

Transcript of Image Annotation and Understanding

Image Annotation and Understanding
Joaquin Castilla Escobar
Kyoto University, Matsuyama Laboratory

What is image annotation?
Image annotation is the process of automatically assigning labels to a digital image. (Source: Wikipedia)

Why do we need it?
It is impossible to manually annotate millions of images!
...so what can we do?

Methods
In the beginning...
But...
Content-Based Image Retrieval (CBIR)
Query by concept

Supervised approaches
- Extraction of specific semantics: inside/outside, city/landscape, etc.
- A set of images with and without the concept is used to train a classifier
- Each image in the dataset is annotated with respect to the presence or absence of the concept
Lack of generality
Not scalable

Unsupervised approaches
- Use of latent variables to represent hidden states of the world
- Each state induces a joint distribution on the space of semantic labels and image descriptors
Problem to face: the semantic gap between high-level concepts and low-level image features
When we receive an image to annotate:
- Extract feature vectors
- Get the set of labels that maximizes the joint distribution

Continuous-Space Relevance Model (CRM)
Idea: associate words with image regions
More formally stated: learn the joint probability distribution P(v,l) for an image over its visual features v and its associated set of labels l
Annotation: compute the conditional likelihood P(l|v)
Retrieval: compute the query likelihood P(lq | vj) for each image j in the dataset
Segment images
Calculate joint distribution P(regions|words)

Annotation by search
Concept
- Given a new image, perform a k-nearest-neighbors search
- Annotate the image with the most popular labels of its neighbors
Some works include a topic-level text model such as LDA to calculate similarities

University of Tokyo, Intelligent Systems and Informatics Laboratory
Canonical Contextual Distance (CCA)
- Uses both visual features and labels to measure similarity at the same time
- Can be used even when no labels are provided
- Multimodal search: query image + text query

Joint Equal Contribution (JEC)
Idea: combine distance measures defined over different feature spaces
Color
- RGB with L1 measure
- HSV with L1 measure
- LAB with KL-divergence
Texture
- Gabor wavelet with L1 measure
- Haar wavelet with L1 measure
Each feature contributes equally

Idea
Use saliency
Why? The searched concept is usually in the salient region of the image
So we can combine distance measures as in JEC and give more importance to saliency features
Adding a topic/term-level text model for similarity could also be tried

End
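
The annotation-by-search idea with a JEC-style combined distance can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation. The feature names ("rgb", "hsv", "lab"), the dictionary layout, the max-based scaling of each distance, and the functions combined_distances and annotate are all assumptions made for the example. Passing non-equal weights is one possible way to "give more importance" to some features, as the saliency idea suggests.

```python
from collections import Counter

import numpy as np


def l1(a, b):
    """L1 distance between two histograms."""
    return float(np.abs(a - b).sum())


def kl(a, b, eps=1e-10):
    """KL divergence between two histograms (smoothed and renormalised)."""
    a = (a + eps) / (a + eps).sum()
    b = (b + eps) / (b + eps).sum()
    return float(np.sum(a * np.log(a / b)))


# Hypothetical feature layout: each image is a dict of named histograms.
DISTANCES = {"rgb": l1, "hsv": l1, "lab": kl}


def combined_distances(query, dataset, weights=None):
    """Combine per-feature distances from the query to every dataset image.

    Each per-feature distance is scaled by its maximum over the dataset
    (an assumption of this sketch) so that features are comparable.
    With weights=None every feature contributes equally.
    """
    if weights is None:
        weights = {name: 1.0 for name in DISTANCES}
    total = np.zeros(len(dataset))
    for name, dist_fn in DISTANCES.items():
        d = np.array([dist_fn(query[name], img[name]) for img in dataset])
        if d.max() > 0:
            d = d / d.max()
        total += weights[name] * d
    return total / sum(weights.values())


def annotate(query, dataset, labels, k=3, n_labels=2, weights=None):
    """Label the query with the most popular labels of its k nearest images."""
    dist = combined_distances(query, dataset, weights)
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(label for i in nearest for label in labels[i])
    return [label for label, _ in votes.most_common(n_labels)]
```

A call such as annotate(query, dataset, labels, k=5, weights={"rgb": 1, "hsv": 1, "lab": 2}) would weight the hypothetical "lab" feature twice as much as the others; texture or saliency features could be added to DISTANCES in the same way.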