A Review of Human Gaze Control
A Model of Saliency-Based Visual Attention
for Rapid Scene Analysis
Contextual Guidance of Eye Movements and Attention in Real-World Scenes:
The Role of Global Features in Object Search
Human experiments
- Task: counting people/paintings/mugs
- Evaluation of eye movements
- Search was exhaustive regardless of the true number of targets present
- Smaller objects required longer search times
- Participants were highly consistent with one another
Summary
- Implemented a Bayesian computational model of attention and demonstrated the importance of scene-schema knowledge for visual search tasks
- Improved the performance of the saliency-map model by adding global features
- Provided a baseline system combining bottom-up and top-down processing
Issues
- The model is trained on a relatively small training set and may not generalize well to unseen test images
- EM is run from a single starting point, which can converge to a local rather than the global optimum; a possible fix is to run EM from several random initializations and keep the best fit (see the sketch after this list)
- Computing global features from horizontal layers instead of object recognition is computationally cheap, but it may fail to give a good scene description in some cases
- R, G, and B are computed separately; combining these channels may also help
- Only the bottom-up and full models were compared; experiments with a top-down-only model would strengthen the analysis
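The restart remedy above can be sketched in a few lines, here using scikit-learn's GaussianMixture as a stand-in for the paper's EM step; the data X and the number of components are hypothetical placeholders, not the paper's actual setup.

```python
# Minimal sketch: run EM from several random starting points and keep
# the fit with the highest log-likelihood lower bound. X and the number
# of components are placeholders, not the paper's actual setup.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))              # placeholder training data

best_model, best_score = None, -np.inf
for seed in range(10):                     # 10 random initializations
    gm = GaussianMixture(n_components=3, random_state=seed).fit(X)
    if gm.lower_bound_ > best_score:
        best_model, best_score = gm, gm.lower_bound_
```

Equivalently, GaussianMixture(n_components=3, n_init=10) performs the restarts internally and keeps the best run.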
General Discussion
- Modeling the early stage of visual processing
- Feed-forward and parallel structure
- Static images as stimuli
- What features should be involved in a saliency map and how are they weighted?
- Gaze control of moving objects?
- How to model episodic knowledge?
- How can such systems be made as efficient as human vision?
Paper 1:
- Bottom-up stimulus-based
- Uses only local features
- Multi-scale image properties: color, intensity, orientation
- Based on a neural network
Paper 2:
- Combines Bottom-up & Top-down
- Uses local & global features
- Saliency map: only orientation features, computed on the R, G, and B channels
- Based on a Bayesian framework
Visual Attention
- [1] J. M. Henderson. 2003. Human gaze control during real-world scene perception. Trends in Cognitive Sciences, 7(11), 498–504.
- [2] L. Itti, C. Koch, and E. Niebur. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
- [3] A. Torralba, A. Oliva, M. S. Castelhano, and J. M. Henderson. 2006. Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113(4), 766–786.
Leimin Tian
Shang Zhao
- Primate visual system
- Reduce complexity before time-consuming processing
- Select a subset of the scene
- Feature integration theory
- Pre-attentive stage
- Features registered in parallel
- Focused attention stage
- Features at the attended location are combined to perceive the whole object
- Bottom-up without top-down guidance
- Fast selection of a small number of interesting image locations
- Detect salient traffic signs quickly
- Robust to noise that does not directly interfere with the target's main feature
- Predicts objects of interest (e.g., faces and flags)
- Comparison with Spatial Frequency Content Model
- Spatial frequency: any structure that is periodic across position in space (a patch-wise SFC sketch follows this list)
- The SFC model performs poorly on images corrupted by speckle noise
- Salient regions are informative, not merely high in SFC
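One way to make the SFC comparison concrete: a minimal sketch that scores each image patch by the fraction of non-negligible 2-D FFT coefficients. The 16-pixel patch size and 3% threshold are illustrative assumptions, not the paper's exact parameters.

```python
# Patch-wise spatial-frequency-content (SFC) map: each patch is scored
# by the fraction of FFT coefficients whose magnitude exceeds a small
# fraction of the patch maximum. Parameters are illustrative only.
import numpy as np

def sfc_map(image, patch=16, thresh=0.03):
    h, w = image.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            block = image[i * patch:(i + 1) * patch,
                          j * patch:(j + 1) * patch]
            spec = np.abs(np.fft.fft2(block))
            out[i, j] = np.mean(spec > thresh * spec.max())
    return out  # high values = rich spatial frequency content
```

Under this measure, speckle noise scores high everywhere, which illustrates why high SFC alone is a poor proxy for saliency.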
- Images in 9 spatial scales
- Fine center: c = {2,3,4}
- Coarse surround: s = c + d, d = {3,4}
- Features:
- Normalized colors: R, G, B, Y
- Intensity: I = (R + G + B) / 3
- Orientations: {0°, 45°, 90°, 135°}
- Feature maps: center-surround, across-scale subtraction
- Intensity contrast: dark centers against bright surrounds, or vice versa
- Color double-opponency: red/green and blue/yellow
- Orientation contrast: center vs. surround
- Conspicuity maps:
- Normalization: promotes maps with strong peaks while suppressing homogeneous ones
- Across-scale addition
- Saliency map: equally weighted average of the three conspicuity maps (see the sketch after this list)
- Neural Network
- Leaky integrate-and-fire
- Winner-take-all
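The pipeline above can be sketched end to end for one channel. This is a minimal illustration of the intensity pathway, assuming a square float image of at least 256×256; the peak-based normalization is a simplified stand-in for the paper's N(.) operator, and the color and orientation channels would follow the same center-surround pattern.

```python
# Sketch of the intensity conspicuity pathway: 9-level Gaussian pyramid,
# center-surround across-scale subtraction for c in {2,3,4}, d in {3,4},
# peak-based normalization, and across-scale addition.
import numpy as np
from scipy.ndimage import zoom, gaussian_filter, maximum_filter

def pyramid(img, levels=9):
    """Gaussian pyramid: each level is smoothed and downsampled by 2."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def center_surround(pyr, c, s, shape):
    """Across-scale subtraction: resize both levels to a common size, then diff."""
    up = lambda k: zoom(pyr[k], (shape[0] / pyr[k].shape[0],
                                 shape[1] / pyr[k].shape[1]), order=1)
    return np.abs(up(c) - up(s))

def normalize(m):
    """Approximation of N(.): promote maps whose global peak stands out
    from the average local maximum; suppress homogeneous maps."""
    m = (m - m.min()) / (m.max() - m.min() + 1e-12)
    peaks = m[(m == maximum_filter(m, size=8)) & (m > 0.1)]
    mbar = peaks.mean() if peaks.size else 0.0
    return m * (1.0 - mbar) ** 2

def intensity_conspicuity(img):
    pyr = pyramid(img)
    shape = pyr[4].shape                         # common map resolution
    maps = [center_surround(pyr, c, c + d, shape)
            for c in (2, 3, 4) for d in (3, 4)]  # 6 feature maps
    return sum(normalize(m) for m in maps)       # across-scale addition
```

The final saliency map would be the equally weighted average of the intensity, color, and orientation conspicuity maps, each normalized once more, before the winner-take-all network selects the attended location.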
Conclusion and Discussion
- Summary
- Bottom-up saliency map to guide visual attention
- Feed-forward feature-extraction mechanisms
- Massively parallel architecture
- Multi-scale features: Intensity, color, orientation
- Biologically plausible neural networks
- Successfully detects locally salient targets
- Shows some robustness in noisy images
- Issues
- Model predictions vs. human fixation statistics
- Unimplemented feature types (e.g., T junctions or line terminators)
- A weighted linear combination of image properties
- Recurrent mechanism for contour completion and closure
High-quality visual information is acquired only from a limited spatial region surrounding the center of gaze (the point of fixation)
- Scene statistics
- Difference between fixated patches and unselected patches
- Saliency-map
- Model prediction of fixation using scene statistics (color, orientation, contrast, edge density, etc.)
- Gaze is the first step of visual cognition
- Eye movements serve as a window into the operation of the attentional system
- Eye movements also play an important role in studies of human language
Top-down Knowledge-driven
- Paper 2 is more recent work combining bottom-up and top-down processing
- Global context modulates local salient features
- A more integrated model of visual attention
- Episodic scene knowledge
- e.g., remembering that a particular clock hangs on a particular wall
- Scene-schema knowledge
- e.g., Superman is more likely to be found in the sky than on the road
- Task-related knowledge
- e.g., visual search vs. memorization
- Early studies of gaze control demonstrated that empty, uniform, and uninformative scene regions are often not fixated.
- Viewers instead concentrate their fixations, including the very first fixation in a scene, on interesting and informative regions.
- What is an "interesting and informative region"?
Compared the saliency-only model, the full model, and human observers
- All performed well above chance (chance defined by the ratio of target area to the whole image)
- The contextual guidance model performed better than the saliency-map-alone model
- In the people-search task, humans tended to start by fixating image locations selected by global features
- In the mug-search task, the saliency model performed almost as well as the full model
- Humans performed better than all models in all tasks
Saliency Map Alone vs. Contextual Guidance Model
- Paper 1: Bottom-up
- A Model of Saliency-Based Visual Attention for Rapid Scene Analysis (Itti et al., 1998)
- Paper 2: Bottom-up + Top-down
- Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features in Object Search (Torralba et al., 2006)
Conclusion and Discussion
- Provided a contextual guidance model combining bottom-up process with top-down process
- Modeled attention using a Bayesian framework
- Uses two parallel pathways: local features (saliency map) and global features (scene context)
- Features are computed in a feed-forward manner
- Top-down process mainly used scene-schema knowledge to select relevant image regions for visual search task
- Local features L: orientations computed from the R, G, and B channels separately via a filter bank, then reduced in dimensionality with PCA
- Bayesian combination of local features L and global (gist) features G for target presence O and location X: p(O=1, X | L, G) = p(L | O=1, X, G) * p(X | O=1, G) * p(O=1 | G) / p(L | G), with all densities estimated from the training set
- Scene-modulated saliency map: S(X) = p(L | G)^(-γ) * p(X | O=1, G), with the exponent γ fit by EM (a sketch follows below)
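To make the last equation concrete: a minimal sketch that combines a bottom-up term (rare local features, i.e., low p(L|G)) with a contextual prior over target location. The Gaussian band prior, the 64×64 size, and γ = 0.05 are illustrative assumptions; in the paper the densities are learned from training data and γ is fit with EM.

```python
# Scene-modulated saliency: S(X) = p(L|G)^(-gamma) * p(X|O=1,G).
# Rare local features and contextually likely locations both raise S.
import numpy as np

def contextual_saliency(p_L_given_G, p_X_given_OG, gamma=0.05):
    return p_L_given_G ** (-gamma) * p_X_given_OG

# Toy example: the scene gist predicts targets near image row 20
# (e.g., pedestrians' heads along a typical horizon line).
rng = np.random.default_rng(0)
p_L = rng.uniform(0.01, 1.0, size=(64, 64))      # placeholder feature likelihoods
rows = np.arange(64)[:, None]
p_X = np.exp(-0.5 * ((rows - 20.0) / 5.0) ** 2)  # Gaussian band prior
p_X = np.broadcast_to(p_X / p_X.sum(), (64, 64))
S = contextual_saliency(p_L, p_X)
print(np.unravel_index(S.argmax(), S.shape))     # most promising fixation
```

Because γ < 1 tempers the bottom-up term, the contextual prior dominates: the highest-scoring locations fall inside the expected band even when rarer features appear elsewhere.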
March 15, 2013
Thank you for your ATTENTION!