Image Annotation and Understanding
Supervised approaches
- Extraction of specific semantics: Indoor/Outdoor, City/Landscape, etc.
- A set of images with and without the concept is used to train a classifier
- Each image in the dataset is annotated with respect to the presence or absence of the concept
- Drawbacks: lack of generality; not scalable

Unsupervised approaches
- Use latent variables to represent hidden states of the world
- Each state induces a joint distribution on the space of semantic labels and image descriptors
- Main problem: the semantic gap between high-level concepts and low-level image features

When we receive an image to annotate:
- Extract its feature vectors
- Output the set of labels that maximizes the joint distribution

Continuous-Space Relevance Model (CRM)
- Idea: associate words with image regions
- More formally: learn the joint probability distribution P(v, l) of an image over its visual features v and its associated set of labels l
- Annotation: compute the conditional likelihood P(l|v)
- Retrieval: compute the query likelihood P(lq|vj) for each image j in the dataset
- Steps: segment the images, then calculate the joint distribution P(regions|words)

Annotation by search
- Concept: given a new image, perform a k-Nearest Neighbors search and annotate the image with the most popular labels among its neighbors
- Some works include a topic-level text model such as LDA to calculate similarities

(University of Tokyo, Intelligent Information Systems Laboratory)

Canonical Contextual Distance (CCA)
- Uses both visual features and labels to measure similarity at the same time
- Can be used even when no labels are provided
- Enables multimodal search: query image + text query

Joint Equal Contribution (JEC)
- Idea: combine distance measures defined over different feature spaces
- Color features: RGB with the L1 measure; HSV with the L1 measure; LAB with KL-divergence
- Texture features: Gabor wavelets with the L1 measure; Haar wavelets with the L1 measure
- Each feature contributes equally

Proposed idea: use saliency
- Why? The searched concept usually lies in the salient region of the image
- So we can combine distance measures as in JEC while giving more weight to saliency features
- A topic/term-level text model for similarity could also be added
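The CRM annotation step described above — estimate the joint P(v, l) and rank labels by the conditional P(l|v) — can be sketched with a toy kernel-density estimate. This is an illustrative simplification, not the CRM formulation from the literature: the function name, Gaussian kernel, and bandwidth are assumptions made for the example.

```python
import numpy as np

def annotate_crm(train_feats, train_labels, query, bandwidth=1.0):
    """Toy CRM-style annotation: approximate the joint P(v, l) with a
    Gaussian kernel over training images, then rank labels by P(l | v).
    train_feats: (N, D) array of image feature vectors
    train_labels: list of N label sets
    """
    # Kernel weight of the query against each training image
    d2 = np.sum((train_feats - query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))

    # Accumulate kernel mass per label: proportional to the joint P(v, l)
    scores = {}
    for wi, labels in zip(w, train_labels):
        for lab in labels:
            scores[lab] = scores.get(lab, 0.0) + wi

    # Normalising removes the shared P(v) factor, giving P(l | v)
    total = sum(scores.values()) or 1.0
    return sorted(((lab, s / total) for lab, s in scores.items()),
                  key=lambda t: -t[1])
```

Annotation then amounts to taking the top-scoring labels of the returned ranking; retrieval would instead score each database image by the likelihood of the query labels.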
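The annotation-by-search scheme — k-Nearest Neighbors over image features, then label transfer by popularity — is a short algorithm. A minimal sketch, assuming Euclidean distance and the hypothetical helper name below:

```python
import numpy as np
from collections import Counter

def annotate_by_search(train_feats, train_labels, query, k=3, n_labels=2):
    """Annotation by search: find the k nearest training images and
    transfer their most frequent labels to the query image."""
    # Euclidean distance from the query to every training image
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]

    # Vote: count how often each label appears among the neighbors
    votes = Counter(lab for i in nearest for lab in train_labels[i])
    return [lab for lab, _ in votes.most_common(n_labels)]
```

The works mentioned above that add a topic-level text model (e.g. LDA) would replace the plain Euclidean distance here with a similarity that also compares topic distributions.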
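The JEC idea of averaging distances from different feature spaces can also be sketched briefly. This example assumes each per-feature distance is already scaled to a comparable range (the published method normalizes each distance by its maximum over the dataset); the function names are illustrative.

```python
import numpy as np

def l1(a, b):
    """L1 distance, as used for the RGB, HSV, Gabor and Haar features."""
    return float(np.abs(a - b).sum())

def kl_div(p, q, eps=1e-10):
    """KL-divergence between histogram-like features (used for LAB)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def jec_distance(feats_a, feats_b, measures):
    """Joint Equal Contribution: average the per-feature distances so
    every feature space contributes equally to the final distance.
    feats_*: dict feature_name -> vector; measures: name -> distance fn.
    Assumes the distances are pre-scaled to a comparable range."""
    total = sum(dist_fn(feats_a[name], feats_b[name])
                for name, dist_fn in measures.items())
    return total / len(measures)
```

The saliency extension proposed above would replace the uniform average with a weighted one, giving larger weights to distances computed on salient-region features.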