Perception and Cognition

Sahra Kunz

on 25 November 2016

Transcript of Perception and Cognition


Sahra Kunz - PhD CTA
October/November 2016

From Seeing to Perceiving to Drawing
How do we see?
The eye (is not a camera)

“In the study of perception, the starting point is not just a piece of the world, but a piece of the world plus an observer who looks at it. As a consequence, the data of the student of perception must have an additional source of variability.(…) To control the stimuli used in a perception experiment, a researcher would need to have precise knowledge of the physical world involved. Yet in this regard, the researcher is in the same position as the participants in the experiment.” (Massironi, 2002: 242)
On Perception
Those who study perception (Visual Perception) always influence the results of their experiments as they are, themselves, perceiving beings.
“Light reflected off objects does not fall on the eyes of mindless creatures; each of us is endowed with a thinking brain that we use more or less effectively to comprehend what the eye senses”. (Solso, 2001: 4)
Diagram of the eye
The path of light once it enters the eye:

- It crosses the CORNEA, a transparent membrane that refracts ambient light

- The IRIS contracts and expands making the PUPIL larger or smaller

- The LENS changes shape, adjusting its focus to the distance of the observed object

- The inside of the EYE is filled with a transparent gelatinous substance called the VITREOUS HUMOR, through which light travels to the projective surfaces without distortion

- Once inside the EYE, light is projected upon the RETINA and the FOVEA

- The RETINA transforms light into nervous impulses, which are sent to the VISUAL CORTEX

- The FOVEA is the spot on the RETINA with the greatest visual acuity
Simple diagram of the Retina
Rods and Cones in the Retina
The RETINA is composed of layers of cells, whose main function is to filter light. The most important of these cells (photoreceptors) are called RODS and CONES.

- RODS are used for vision at low light levels (night, twilight, dimly lit rooms, etc.). There are about 120 million of them in each eye, and they are distributed mainly in the periphery of the retina.

- CONES are used for visual experiences under normal lighting conditions and for COLOR. There are about 8 million of them in each eye, and they are distributed mainly around the FOVEA.
Visual pathways from the side
The eye is connected to the LATERAL GENICULATE BODY via the OPTIC TRACT
After a first bout of processing, the image is sent to the PRIMARY VISUAL CORTEX, located at the back of the skull
This is the physical structure of VISION
- but it is not enough for you to see!
Saccadic Eye Movements
Eye Movements are also indispensable for seeing:

Voluntary eye movements are created by the EXTRAOCULAR MUSCLES, or by tilting the head.

Involuntary eye movements are called SACCADES, and have a very short duration: from 150 to 200 milliseconds - we are not consciously aware of them.
The major functions of eye movements are:

1. Fixation: to position fixated objects of interest on the fovea, where visual acuity is highest.

2. Tracking: to keep fixated objects on the fovea, despite movements of the object or the observer's head.

"(…) The mobility of the eye allows the visual system to explore the environment by selectively sampling information in different directions.
This fact is important for a number of reasons, the foremost of which is that spatial acuity is highest in the central degree or two of the retina (the fovea) and then falls off rapidly toward the periphery." (Palmer, 1999: 520)
“What our two eyes in fact do is fixate on an object: we first adjust the positions of our eyes so that the images of the object fall on the two foveas; then we hold that position for a brief period, say, half a second; then our eyes suddenly jump to a new position by fixating on a new target whose presence somewhere out in the visual field has asserted itself, either by moving slightly, by contrasting with the background, or by presenting an interesting shape.
During the jump, or saccade (…) the eyes move so rapidly that our visual system does not even respond to the resulting movement of the scene across the retina; we are altogether unaware of the violent change.” (Hubel, 1995: 79)
We imagine that we are observing a smooth and constant visual world, when in fact what happens is that our brain has to censor the tumultuous movements and jumps performed by our eyes.
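The alternation between fixations and saccades described above can be illustrated with a minimal sketch of a velocity-threshold classifier, a common way of separating the two in recorded gaze data. It is not taken from the sources cited here: the sample format, the function names and the 30 deg/s threshold are all assumptions chosen for illustration.

```python
from math import hypot

def classify_samples(samples, velocity_threshold=30.0):
    """Label each interval between gaze samples as 'fixation' or 'saccade'.

    samples: list of (time_s, x_deg, y_deg) gaze positions (hypothetical format).
    velocity_threshold: angular speed in deg/s above which the eye is taken
    to be jumping; an assumed, illustrative value.
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        speed = hypot(x1 - x0, y1 - y0) / (t1 - t0)  # angular speed in deg/s
        labels.append("saccade" if speed > velocity_threshold else "fixation")
    return labels

def group_fixations(samples, labels):
    """Merge consecutive 'fixation' intervals into (start_s, end_s) spans."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "fixation":
            if start is None:
                start = samples[i][0]
        elif start is not None:
            spans.append((start, samples[i][0]))
            start = None
    if start is not None:
        spans.append((start, samples[-1][0]))
    return spans

# Hypothetical recording: steady gaze, one fast jump, steady gaze again.
gaze = [(0.00, 1.0, 1.0), (0.01, 1.0, 1.1), (0.02, 1.1, 1.1),
        (0.03, 6.0, 4.0), (0.04, 6.0, 4.1), (0.05, 6.1, 4.1)]
labels = classify_samples(gaze)
fixations = group_fixations(gaze, labels)
print(labels)
print(fixations)
```

In this toy recording the single fast jump splits the data into two fixation spans, which is the pattern the Hubel quotation describes: hold, jump, hold.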

This means that our brain exercises a degree of censorship over sensory stimuli. We also have to remember that vision is not the only sense operating at any given moment: touch, smell, hearing, taste and proprioception (the sense that allows us to know where our body is at any given time, without having to look at it) are active as well.
Alfred Yarbus and the intentionality of gaze
Internal Shape Representations: 2½D Sketch
Internal Shape representations:

Internal Shape Representations:
Canonical Views

Aaron - Harold Cohen
Representation of Spatial Experience (R.O.S.E.) - Edward Burton
Ray Kurzweil interviews Harold Cohen about AARON
Representation of shape through Drawing
Alfred L. Yarbus, a Russian psychologist, studied eye movements using a complex system for capturing fixation points. The markers were glued to the subjects' eyes. (Yarbus, 1967)
“If the observer carefully examines any point of a stationary object, he imagines objectively that he is fixating on this point with motionless eyes. Records show that in fact this process is accompanied by involuntary saccades of which the observer is unaware (sometimes resembling spasms of the eyes).” (Yarbus, 1967: 104-105)
“The order and duration of the fixations on elements of an object are determined by the thought process accompanying the analysis of the information obtained.” and as a consequence “people who think differently also, to some extent, see differently.” (Yarbus, 1967: 211)

One could extrapolate from this observation that, for instance, an "artist's eye" actually describes a different way of seeing, because an artist directs his gaze in a more focused way when observing an object or scene with a view to drawing or painting it.
Yarbus uncovered the chaotic movements (saccades) that the eye makes during the observation of objects.

He was able to trace the path of the eyes and superimpose it on the images his subjects observed.
The dots in the path correspond to FIXATIONS, which occur when the eye rests for a longer period on the observed object, voluntarily or not, perhaps because that particular point has more relevant information than others.
“Analysis of the eye movement records show that the elements attracting attention contain, in the observer’s opinion, or may contain, information useful and essential for perception. Elements on which the eye does not fixate, either in fact or in the observer’s opinion, do not contain such information.” (Yarbus, 1967: 183)
SACCADIC MOVEMENTS and FIXATION escape conscious control (they are involuntary), but the same cannot be said of ATTENTION.
This is visible in the recorded observation of the image of the girl, in which the familiarity of a human face creates a different reaction:
“When looking at a human face, an observer usually pays most attention to the eyes, the lips, and the nose. The other parts of the face are given much more cursory consideration.” (Yarbus, 1967: 183)
VISION sometimes does NOT PRIORITIZE the acquisition of new information:

“If the eye movements are recorded for several minutes during perception of an object, the record obtained will clearly show that, when changing points of fixation, the observer’s eye repeatedly returns to the same elements of the picture. Additional time spent on perception is not used to examine the secondary elements, but to reexamine the most important elements.”
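The observation that the eye keeps returning to the same elements can be made concrete by summing fixation time per region of a picture. The following is a minimal sketch under stated assumptions: the region names, the rectangle coordinates and the fixation records are invented for illustration and are not Yarbus's data.

```python
from collections import defaultdict

def dwell_time_by_region(fixations, regions):
    """Sum fixation durations per named region of the picture.

    fixations: list of (x, y, duration_s) records (hypothetical format).
    regions: dict mapping a name to a rectangle (x_min, y_min, x_max, y_max).
    Fixations outside every rectangle are counted under 'elsewhere'.
    """
    totals = defaultdict(float)
    for x, y, duration in fixations:
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += duration
                break
        else:
            totals["elsewhere"] += duration
    return dict(totals)

# Invented regions of a face image and an invented sequence of fixations.
regions = {"eyes": (30, 20, 70, 35), "mouth": (40, 60, 60, 70)}
fixations = [(50, 25, 0.5), (45, 65, 0.25), (52, 30, 0.75),
             (10, 90, 0.25), (48, 28, 0.5)]
totals = dwell_time_by_region(fixations, regions)
print(totals)
```

With these invented numbers most of the viewing time accumulates on one region because the gaze returns to it three times, rather than being spread over secondary elements.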
Yarbus also had subjects examine images after being directed to search for particular features.

a) is the original image;
b) subjects were asked to look for the outlines;
c) subjects were told to observe freely;
d) subjects were asked to count the straight lines in the image.

The example shown in b) is quite interesting, as in searching for the outlines of the images, subjects actually create a "drawing" of them.
This example of the observation of Ilya Repin's painting "An Unexpected Visitor" (1884) also reveals different intentions in the gaze of subjects.
1 - examine the painting freely
2 - estimate the material circumstances of the family
3 - assess the age of the characters
4 - determine the activities of the family prior to the visitor’s arrival
5 - remember the characters’ clothes
6 - surmise how long the “unexpected visitor” had been away
Modern approaches to the intentionality of gaze
Yarbus focused his research on people without formal artistic training. He limited it to the study of how the gaze changes according to the directions given to the subjects.
In 2001, Chris Miall and John Tchalenko (Miall & Tchalenko, 2001) conducted a similar study, but with an experienced painter (Humphrey Ocean) as their subject. Their study was based on live figure drawings. They wanted to examine how the eye of an artist moves about an object whilst drawing it.
This was the device used to record Ocean's eye movements
In their research, they concluded that:

“Ocean’s fixation duration remained at 0.6-1.0 seconds.(…) The novices’ durations were about half as long. Furthermore, Ocean’s fixations were always single, whereas the novices’ were generally multiple. Ocean locked his gaze onto one position, apparently taking in a single detail, while the novices fixated on two or more positions, sometimes quite separate.” (Miall & Tchalenko, 2001: 38)

and that:

“Ocean’s drawing was frequently accompanied by repeated practice strokes. The pencil would move several times just above the paper’s surface, followed precisely by Ocean’s eyes, in a smooth movement. (…) Practice movements are seen in many tasks and sports requiring skilled movements and serve to refresh a short term “motor memory” of how the body moves.”
(Miall & Tchalenko, 2001: 37)
Their research indicated that:

- The gaze of untrained subjects (non-artists) is more dispersed;

- This dispersion occurs because several details of the object are fixated in quick succession, each for a shorter period of time;

- A trained artist fixates less often, but for longer, on each detail;

- While drawing, the artist's eyes closely follow the movements of the hand.
From the early seventies on, the painter Harold Cohen has been developing a robot whose main function is to draw based upon a set of rules. This project is called AARON.
AARON started off by only being able to create abstract images, but in the 1980s it began to create figurative drawings, and in the 1990s complex scenes containing human figures and vegetation.
Harold Cohen defines the knowledge AARON has of the world as: “What it knows about a small range of world objects and what it knows about building visual representations.”
adding that:
“these two categories must be intimately inter-related in any satisfactory model of human knowledge-based performance. The conclusion is an obvious one: we can only represent what is representable in terms of available representational strategies.” (Cohen,1988: 10)
AARON drawing in the seventies
AARON's later drawings
AARON does not possess an image-recognition based input system, as it creates two-dimensional images from scratch.
It has its own shape repertoire, and a set of rules on how to use it.
This means that it is able to create drawings in which overlapping images and spatial relations are correct.
AARON raises an interesting question as far as intentionality in creation is concerned: is a drawing created by a machine the same thing as a drawing created by a human?
Harold Cohen says:

“A drawing is a drawing, not merely because it stands for something other than itself, but because we find in it evidence that the reference to that other something results from an intentional act.” (Cohen, 1982: 8)


“Human drawings are potentially interesting to human beings at least in large part because they have been made by other human beings” and that “for a machine to inspire a similar kind of interest in its products it would have to make its drawings in the same sort of way that humans produce theirs.” (Cohen, 1982: 8) and (Cohen, 1976: 17)
When speaking of intentionality in a drawing, and comparing a human with a machine, one can assume that a human will use much more sophisticated resources in creating a drawing:

“The lines which the artist draws to represent the outline of an object do not actually correspond to its edges, in the sense that an edge-finding algorithm will replace an abrupt tonal discontinuity with a line. In fact, the edges of an object in the real world are almost never delineated by an unbroken string of abrupt tonal discontinuities” (Cohen, 1979: 23)

As such, a human makes conscious decisions about which elements to include in a drawing, as opposed to image-recognition-based software.
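Cohen's point about edge-finding can be illustrated with a deliberately naive sketch. It marks a pixel wherever brightness changes abruptly towards the right or downwards. This is not Cohen's code or any particular vision library: the threshold, the function name and the tiny test image are assumptions for illustration only.

```python
def find_edges(image, threshold=50):
    """Mark pixels where brightness jumps abruptly to the right or below.

    image: list of rows of grayscale values (0-255).
    threshold: minimum brightness difference counted as an edge (assumed value).
    Returns a grid of the same size with 1 at edge pixels and 0 elsewhere.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < width else 0
            below = abs(image[r][c] - image[r + 1][c]) if r + 1 < height else 0
            if max(right, below) >= threshold:
                edges[r][c] = 1
    return edges

# Invented image: a dark shape on a light ground whose right-hand side
# (value 160) has only a weak contrast with the ground (value 200).
image = [
    [200, 200, 200, 200, 200],
    [200,  40,  40, 160, 200],
    [200,  40,  40, 160, 200],
    [200, 200, 200, 200, 200],
]
edges = find_edges(image)
for row in edges:
    print(row)
```

The weakly contrasting right-hand boundary of the shape is never marked, so the detected outline is broken there. A person drawing the same shape would normally close the contour anyway, which is the kind of decision the quotation attributes to the artist rather than to the algorithm.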
How a computer "sees" an image
How a human sees an image
A robot such as AARON simulates human activity and does not make autonomous decisions outside of its software limitations.
Its rule set, for example, is based on the observer's position (viewer centered).
This means that the drawings it produces are viewpoint dependent.
This, however, is not the only model for robotic drawing systems available.
In 1997, Edward Burton created ROSE (an acronym for Representation of Spatial Experience), a program intended to simulate the drawings of children. The main difference between AARON and ROSE is, in the words of its creator:

“Instead of projecting an image or a shadow of a virtual world observed from a specific vantage point, ROSE actively constructs a 2D drawn world that is equivalent to the 3D virtual world. The drawn world is constructed from a vocabulary of forms that are translations of its “perception” of the virtual world. As the drawings are equivalent worlds and not projections, ROSE does not use the familiar concept of a vantage point.” (Burton, 1997: 302)
Unlike AARON, ROSE starts producing its drawings from a three-dimensional model of the object (top left “experience”).

From this model it maps a flat representation (right - “representation”). This means that ROSE produces topological relations between objects, rather than viewpoint based (or projective) systems.

Also, ROSE does not possess an image-recognition based input system, as the object to be represented has to be introduced as a three-dimensional model.

ROSE, in a way, models certain processes based on the way Visual Perception works, but it has certain limitations in relation to humans:

“No amount of additional rules will address ROSE’s main problem: its drawing behavior is fundamentally different from that of a real child because ROSE’s drawings do not develop. Once ROSE finishes a drawing, the drawing is forgotten. The next drawing might be of a different subject and will have arbitrary random variations, but it will be exactly the same type of drawing. ROSE’s ability never changes without the intervention of a computer programmer.” (Burton, 1997: 303)
Much of the contemporary understanding of Visual Perception stems from research done in the field of artificial vision. One of the pioneers in this field was David Marr, who proposed the COMPUTATIONAL THEORY OF PERCEPTION.

“The study of vision must therefore include not only the study of how to extract from images the various aspects of the world that are useful to us, but also an inquiry into the nature of the internal representations by which we capture this information and thus make it available as a basis for decisions about our thoughts and actions.” (Marr, 1982: 3)

Marr's theory is heavily based on the study of how we construct internal representations of objects.
David Marr proposes three sequential steps in the recognition, understanding and storage of an image:

- PRIMAL SKETCH;

- 2½D SKETCH;

- 3D MODEL REPRESENTATION.

The purpose of these steps is to create a stable internal representation of images.
These three stages facilitate the recognition of objects in unfamiliar positions, and the recognition of types of objects (a chair is recognized as such even when there are significant variations in its shape).
Viewpoint-Invariant Theories
In 1987, the vision scientist Irving Biederman proposed a theory based on the work of David Marr (Biederman, 1987):

In this theory, the 3D Model Representation stage proposed by Marr is complemented by a visual alphabet composed of elemental shapes, called GEONS (Geometrical Ions).
There are 36 Geons in total which, combined in different ways, allow for thousands of different combinations - they are stable and multi-configurable volumetric descriptions.
- Geons are Viewpoint Invariant: objects or scenes are identifiable from multiple viewpoints.
- Geons are discriminable: objects or scenes can be distinguished from one another from all viewpoints
- Geons are resistant to Visual Noise: objects or scenes are recognizable even when viewing conditions are not ideal.
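A rough count shows how quickly a 36-element alphabet produces thousands of combinations. The sketch below is a simplification and not Biederman's own calculation: it counts only ordered selections of geons with repetition allowed, and ignores the spatial relations between geons, which would multiply the totals further.

```python
GEON_COUNT = 36  # number of geons stated in the text

def arrangements(parts):
    """Ordered selections of `parts` geons, repetition allowed.

    A simplifying assumption: relations between geons (on top of,
    beside, larger than, ...) are ignored.
    """
    return GEON_COUNT ** parts

two_part = arrangements(2)
three_part = arrangements(3)
print(two_part, three_part)
```

Even under this restricted count, two-geon objects already number over a thousand and three-geon objects tens of thousands.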

Some of Biederman's Geons and their application in objects
The central organizational principle is that certain properties of edges in a two-dimensional image are taken by the visual system as strong evidence that the edges in the three-dimensional world contain these same properties.

For example, if there is a straight line in the image (COLLINEARITY), the visual system infers that the edge producing that line in the three-dimensional world is also straight. The visual system ignores the possibility that the property in the image might be the result of a (highly unlikely) accidental alignment of the eye and a curved edge.

Smoothly curved elements in the images (CURVILINEARITY) are similarly inferred to arise from smoothly curved features in the three-dimensional world. These properties, and the others described later, have been termed NON-ACCIDENTAL (Witkin and Tenenbaum, 1983) in that they would rarely be produced by accidental alignments of viewpoint and object features and consequently are generally unaffected by slight variations in viewpoint. (Biederman, 1987: 119)
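The collinearity argument can be checked numerically. The sketch below assumes a simple pinhole projection (image coordinates are x/z and y/z), which is a modelling assumption and not part of Biederman's text. All point coordinates are invented. It shows a straight edge projecting to a straight line, a curved edge projecting to a curve, and one accidental alignment in which a bent edge lying in a plane through the eye projects to a straight line.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point onto the image plane (assumed model)."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def are_collinear(p, q, r, tolerance=1e-9):
    """True if three 2D points lie on one line (cross product close to zero)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) <= tolerance

# Three points along a straight 3D edge: start + t * direction.
start, direction = (1.0, 2.0, 4.0), (2.0, -1.0, 3.0)
straight_edge = [tuple(s + t * d for s, d in zip(start, direction))
                 for t in (0.0, 1.0, 2.0)]

# Three points along a curved edge facing the viewer.
curved_edge = [(-1.0, 1.0, 5.0), (0.0, 0.0, 5.0), (1.0, 1.0, 5.0)]

# A bent edge lying in the plane x = 0, which passes through the eye:
# the (highly unlikely) accidental alignment mentioned in the text.
aligned_bent_edge = [(0.0, 1.0, 5.0), (0.0, 2.0, 4.0), (0.0, 1.0, 3.0)]

straight_in_image = are_collinear(*[project(p) for p in straight_edge])
curved_in_image = are_collinear(*[project(p) for p in curved_edge])
accident_in_image = are_collinear(*[project(p) for p in aligned_bent_edge])
print(straight_in_image, curved_in_image, accident_in_image)
```

Only the third case breaks the inference from a straight image line to a straight edge, and it requires the edge to lie exactly in a plane through the viewpoint.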
Arnheim, R. (1969/1984). Visual Thinking.
Los Angeles: University of California Press.

Bar, M., & Biederman, I. (1998) Subliminal Visual Priming.
American Psychological Society, vol.9, nº6, 464-469

Biederman, I. (1987) Recognition by Components: a Theory of Human Image Understanding.
Psychological Review, 94, 115-147

Biederman, I., & Gerhardstein, P. C. (1993) Recognizing Depth Rotated Objects: Evidence and Conditions for Three-Dimensional Viewpoint Invariance.
Journal of Experimental Psychology; Human Perception and Performance, vol. 19, 1162-1182

Burton, E. (1997) Artificial Innocence: Interactions between the Study of Children’s Drawing and Artificial Intelligence.
MIT Press, Leonardo, Vol. 30, Nº4, 301-309

Cohen, H. (1976) The Material of Symbols.
First Annual Symposium on Symbols and Symbol Processes. University of Nevada, Las Vegas

Cohen, H. (1979) What is an Image?
Department of Visual Arts. University of California at San Diego

Cohen, H. (1982) How to make a Drawing.
Science Colloquium, National Bureau of Standards, Washington.

Cohen, H. (1988) How to Draw Three People in a Botanical Garden.
The University of California at San Diego, Department of Visual Arts, La Jolla

Hubel, D. H. (1988/1995) Eye, Brain and Vision.
New York: W. H. Freeman & Company

Ives, W., & Rovert, J. (1979) The Role of Graphic Orientations in Children's Drawings of Familiar and Novel Objects at Rest and in Motion. Merrill-Palmer Quarterly, 25, p. 281-292

Laeng, B., & Rouw, R. (2001) Canonical views of faces and the cerebral hemispheres.
Laterality: Asymmetries of Body, Brain, and Cognition, Volume 6, Number 3, p. 193-224, Psychology Press, part of the Taylor & Francis Group

Livingstone, M. (2002) Vision and Art, the Biology of Seeing.
New York: Harry N. Abrams

Marr, D. (1982) Vision – A Computational Investigation into the Human Representation and Processing of Visual Information.
New York: W. H. Freeman and Company

Massironi, M. (2002) The Psychology of Graphic Images (Seeing, Drawing, Communicating).
Mahwah: Lawrence Erlbaum Associates

Matthews, J. (2003) Drawing and Painting, Children and Visual Representation.
California: SAGE Publications.

Maynard, P. (2005) Drawing Distinctions, the Varieties of Graphic Expression.
Ithaca and London: Cornell University Press,

Miall, R. C. & Tchalenko, J. (2001) A Painter’s Eye Movements: A Study of Eye and Hand Movement during Portrait Drawing.
MIT Press, Leonardo, Vol. 34, Nº1, 35-40

Minsky, M., & Papert, S. (1972) Artificial Intelligence Progress Report.
MIT Artificial Intelligence Memo Nº 252, Cambridge: MIT

Nicolaides, K. (1969) The Natural Way to Draw. Boston, Massachusetts: Houghton Mifflin Company,

Palmer, S. E.; Rosch, E., & Chase, P. (1981) Canonical Perspective and the Perception of Objects.
J. Long & A. Baddeley (Eds.) NT

Palmer, S. E. (1999) Vision Science – From Photons to Phenomenology.
Cambridge: MIT Press

Solso, R. L. (1996/2001) Cognition and the Visual Arts.
Cambridge: MIT Press

Willats, J. (2005) Making Sense of Children’s Drawings.
Mahwah: Lawrence Erlbaum Associates,

Willats, J. (2006) Ambiguity in Drawing.
TRACEY (Online Journal of Contemporary Drawing Research)

Willats, J., & Durand, F. (2005) Defining Pictorial Style: Lessons from Linguistics and Computer Graphics.
Axiomathes/Springer, 319-351

Yarbus, A. L. (1967) Eye Movements and Vision.
Institute for Problems of Information Transmission, Moscow, New York: Plenum Press

"In general, people typically evidence little difficulty in recognizing a familiar object when they view that object from a different perspective in depth." (Biederman & Gerhardstein, 1993: 1162)

This ease in recognizing familiar objects in different views indicates that the brain has some way of deciphering what type of object is being seen, without having to evaluate its specific characteristics.

According to Biederman, this is related to the PRIMING that occurs during the observation of an object.
PRIMING “Perceiving an object once improves the accuracy and the speed of its recognition in a subsequent encounter.” (Bar & Biederman, 1998: 464) This means that after seeing an object for the first time, the brain is prepared to recognize that object or class of objects more easily.
PRIMING works for recognition of objects in "normal" views. For objects in less familiar views, Biederman proposes that object recognition is based on structural descriptions:

“It is the structural description (consisting of geons, their attributes, and their relations with adjacent geons) that allows the viewpoint invariance: if two views of an object activate the same structural descriptions, then they should be treated as equivalent by that object recognition system.” (Biederman & Gerhardstein, 1993: 1164)

As such, Geons are grouped in hierarchical causal relations, such as "parent-child".
Whenever the structural relations between objects are correct, the brain will have no difficulty in recognizing objects, even from an unusual or difficult viewpoint.
Stephen Palmer's MAXIMAL INFORMATION HYPOTHESIS (Palmer, 1999) is partially based on Biederman's model in which PRIMING serves as a tool for easier and faster image recognition.

In Palmer's model, many objects may be recognized more easily not only because of PRIMING, but also because the brain responds more rapidly to preferential views - CANONICAL VIEWS. These views provide the maximal amount of information about an object.
Viewpoint-Dependent Theories
To determine the canonicity of objects, Palmer conducted an experiment (Palmer, Rosch & Chase, 1981) in which a group of people were asked to identify and rate the recognizability of objects.

As the view of the object becomes more easily recognizable, it is attributed a lower numerical value.

The "canonical view" of the horse is numbered 1.60 (BEST) and corresponds to the view which presents the most relevant characteristics to correctly identify the shape.

The view numbered 6.36 (TOP) is the least canonical view of the group.

The "canonicity" of these views is intuitively predictable, as it seems to be based on more or less normal approaches one would make to a horse. For example:

- View 1.60 (BEST) is a relatively normal way to approach a horse;
- View 1.84 (SIDE), albeit normal, would never be maintained by a horse for long, as it moves about;
- View 4.12 (BACK-SIDE) would be a terrible way to approach a horse, as it could lead to a kick;
- Views 3.48 (SIDE-TOP) and 6.36 (TOP) present very unusual views, observed from a viewpoint uncommon for an average human being.
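The ratings quoted above can be ranked directly, remembering that a lower number means a more canonical view. This is only a sketch of the ordering: the five values are the ones given in the text, and the variable names are illustrative.

```python
# Ratings for the horse views quoted in the text (lower = more canonical).
ratings = {
    "BEST": 1.60,
    "SIDE": 1.84,
    "BACK-SIDE": 4.12,
    "SIDE-TOP": 3.48,
    "TOP": 6.36,
}

ranked = sorted(ratings, key=ratings.get)  # most canonical view first
most_canonical = ranked[0]
least_canonical = ranked[-1]
print(ranked)
```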
Canonical Views can be understood INTUITIVELY, as they are based on two main factors:

- The usual interaction with an object (distance, viewpoint, size, etc.)

- The amount of information contained in a particular view (depends on the configuration of the objects)
“When I ask you to conjure up an image of, say, a teacup, it is likely that your image is of a ‘standard’ teacup, that is, more or less, an idealized image. If I showed you an odd-shaped teacup and asked you what it was, you would probably call it a teacup. You may never actually have seen the idealized image you conjure up (or the odd-shaped teacup either), yet the mental image is clear. These images reside in memory and derive from numerous experiences with a large variety of teacups.”
(Solso, 2001: 120)
Canonical views are not the same type for all classes of objects. The most common types are:
Frontal or Side
This happens because: “Canonical views appear to provide the perceiver with what might be called the most diagnostic information about the object: the information that best discriminates it from other objects, given what the perceiver knows, derived from the views from which it is most often seen.” (Palmer, 1999: 421)
An example taken from drawing:
“Because we see people standing more often than we see them lying down, the problem of drawing a foreshortened recumbent figure is even more difficult. The artist, then, has two cognitive/perceptual problems to overcome: he or she must draw a reclining figure ‘in (geometric) perspective’ and must overcome the archetypal image of how people look when commonly perceived.”
(Solso, 2001: 181)
Egon Schiele - Sitzender Schwangerer Akt (1910)
EXTENDEDNESS is a complementary property of CANONICITY

“Children of all ages nearly always draw horses, boats, and cars from the side rather than from the front, reflecting the fact that such objects are long rather than round. On the other hand, they also found that people and owls were nearly always drawn from the front, although in this case the representation of extendedness is not critical because people and owls look equally long in front and side views. Presumably, with these objects, it was the representation of defining features (such as the eyes and the mouth) that was more important.” (Ives & Rovert, 1979: 281-292)
Canonical Views
The Human Face
“Considerations about what shape properties may be most relevant for the identification of a person from the face, plus evidence from previous studies, converge on the proposal that there is an optimal view of human faces (canonical view) that lies intermediate between the profile and full front view. This view is conventionally or colloquially referred to as 3/4 view.”
(Laeng & Rouw, 2001: 194)
Influenced by Stephen Palmer's early work, Bruno Laeng and Romke Rouw conducted a study to determine if there is a canonical view that influences speed of recognition as far as the human face is concerned.
Before this study, common understanding was that the three-quarter view of a human face corresponded to a rotation of about 45º.
Laeng & Rouw showed the participants four photographs of a human face in different views:
- Front View (0º)
- Three-Quarter View (22.5º), as considered by the researchers
- Beyond Three-Quarter View (45º)
- Side View (90º)
The purpose of the study was to determine recognition speed for each view.
The study determined that:

- The canonical view with optimal recognition speed corresponds to a 22.5º rotation.

- What is conventionally called a 3/4 view has a rotation of 22.5º and not 45º.

This is true for speed of recognition for unfamiliar faces, because for their own face: "(…) the 0º (frontal) view was in fact superior to the 22.5º view."
Or for familiar faces: "[with] faces of friends and partners, not just acquaintances or highly practiced photos of unfamiliar faces, the 0º and 22.5º views could be recognized equally quickly."
Topological representations
Sequence in Drawing
Viewer-Centered Descriptions
Drawing Systems and Denotational Systems

Charlie Rose: The Perceiving Brain - Sight and Visual Perception


While the image of an observed object passes through the perceptual system, the representation of its dimensionality goes through significant changes.

Objects start out with three dimensions (in the actual world).

- They are flattened into a two-dimensional image when light reflected from them passes through the eye.

- The brain then reconstructs three-dimensional representations of these objects.

For drawing, this transformation is not ideal:

“Since the retinal image in the eye is already flat, you would think that all artists would have to do is paint what the eye sees, before the brain gets access to that information.
The problem is that we don’t have conscious access to that retinal image; our visual perception is available to us only after the brain has processed it into a three-dimensional representation.” (Livingstone, 2002:100/101)
As far as internal representations of reality go, human perceptual mechanisms work in a combinatory way, and not unilaterally.

They create a balance between attributes specific to the object, and attributes specific to the observed view.

The type of object description influences:

- how the object is perceived
- how the object will be drawn
According to the object-representation models proposed by:
- David Marr (Marr, 1982)
- Irving Biederman (Biederman, 1987)
- John Willats (Willats, 1997)

Both object-centered and viewer-centered descriptions coexist in the object-recognition system.

These two types of descriptions coexist during the perceptual process and assume different roles in the storage of viewed objects in memory and in the increase of image recognition speed in different circumstances.
Object-Centered Descriptions

“Object-centered descriptions are by definition three-dimensional and cannot be directly transferred onto the picture surface.

Pictures can be derived from object-centered descriptions (as they are in computer programs), but this necessarily involves some kind of transformation from three dimensions to two. Just because a picture provides a view this does not mean that it must have been derived from a view.”
(Willats, 2005: 188)
Because this type of description possesses three-dimensional attributes, it never depends on the viewpoint from which an object is observed, but rather on its complete and intrinsic characteristics.

Although this type of description works well internally, it does not work well when transposed directly to a drawing.

Children often draw objects using object-centered descriptions (Topological representations). (Willats, 1997/2005)
Viewer-centered descriptions are part of immediate interaction with objects, and, as such, vary very frequently (whenever the observer moves in relation to the object, or the object in relation to the viewer).

Each movement corresponds to a particular “view” of the object.

Although perceptually more unstable, they are the basis of the drawing process.
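The contrast between the two kinds of description can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cube, the function name `view` and the choice of a simple rotation followed by dropping depth are all hypothetical, not a model taken from the cited authors. The object-centered description (the list of corners) never changes, while every rotation of the object relative to the viewer yields a different flat "view".

```python
import math

# Object-centered description (assumed example): the eight corners of a unit cube.
# It is three-dimensional and does not mention any viewpoint.
CUBE = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

def view(vertices, yaw_deg):
    """Viewer-centered description: rotate the object about the vertical
    axis by yaw_deg, then drop depth to obtain flat (x, y) positions."""
    a = math.radians(yaw_deg)
    c, s = math.cos(a), math.sin(a)
    return [(round(c * x + s * z, 6), round(y, 6)) for x, y, z in vertices]

front = view(CUBE, 0)    # seen face-on: front and back corners coincide
turned = view(CUBE, 30)  # after a small turn: all eight corners separate

print(len(set(front)), "distinct points in the frontal view")
print(len(set(turned)), "distinct points in the turned view")
```

The same stored object produces two different pictures, which is one way to read the claim that each movement corresponds to a particular view.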
“While it is possible to train oneself to draw with quantitative accuracy some aspects of the ‘true’ visual image, the very difficulty of learning this is in itself an indicator that the symbolic mode is the more normal manner of performance.

Even sophisticated adults often show a preference for unreal but tidy ‘isometric’ drawings over more ‘realistic’ perspective drawings.”
(Minsky, 1972:12)
drawing (spatial) systems, of which linear perspective is one common example:
map spatial relations in the scene into corresponding spatial relations on the picture surface

denotation systems:
map scene primitives into corresponding picture primitives
(Willats & Durand, 2005: 323)
Denotational Systems correspond to the way in which what is seen is transformed into what is drawn:
- and to the
these marks will have in a drawing.

Their function is to transform contours and outlines into a graphic language which in drawing can be perceived as “correct”.
Linear Perspective vs. Orthographic Projection
Drawing Systems
correspond to the
used to draw.

In total, there are 5 Drawing Systems:
- Linear Perspective
- Parallel Oblique Projection
- Orthographic Projection (Axonometric, Dimetric, Trimetric)
- Inverted Perspective
- Topological Geometry
Inverted Perspective
Topological Geometry - London Subway
Pictures can be recognized as being in linear perspective when:

- there is a change of scale with distance;
- the orthogonals (lines representing edges in the third dimension of the scene) converge to a vanishing point.

Pictures can be recognized as being in oblique projection when:

- there is no change of scale with distance;
- the orthogonals are parallel and run obliquely across the picture surface.
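The two sets of criteria above can be checked numerically. The following is a minimal sketch, assuming a pinhole-style perspective (divide by depth) and a cavalier-style oblique projection; the function names, the `focal`, `angle_deg` and `ratio` parameters and the sample scene are hypothetical choices made for illustration.

```python
import math

def perspective(point, focal=1.0):
    """Linear perspective (assumed pinhole model): divide by depth z."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def oblique(point, angle_deg=45.0, ratio=0.5):
    """Oblique projection: depth is drawn as a slanted offset of fixed
    direction, so size does not depend on distance."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x + ratio * z * math.cos(a), y + ratio * z * math.sin(a))

def drawn_length(project, p, q):
    return math.dist(project(p), project(q))

def cross(project, edge_a, edge_b):
    """Zero when the two drawn edges are parallel on the picture surface."""
    (a0, a1), (b0, b1) = [tuple(map(project, e)) for e in (edge_a, edge_b)]
    ux, uy = a1[0] - a0[0], a1[1] - a0[1]
    vx, vy = b1[0] - b0[0], b1[1] - b0[1]
    return ux * vy - uy * vx

# Two equal edges, one near (z = 2) and one far (z = 4), plus the two
# orthogonals that join them in the third dimension.
near = ((-1, -1, 2), (1, -1, 2))
far = ((-1, -1, 4), (1, -1, 4))
left_orthogonal = ((-1, -1, 2), (-1, -1, 4))
right_orthogonal = ((1, -1, 2), (1, -1, 4))

for name, project in (("perspective", perspective), ("oblique", oblique)):
    print(name,
          "near:", round(drawn_length(project, *near), 3),
          "far:", round(drawn_length(project, *far), 3),
          "orthogonals parallel:",
          abs(cross(project, left_orthogonal, right_orthogonal)) < 1e-9)
```

In the perspective case the far edge is drawn shorter and the orthogonals are not parallel (they head towards a common point); in the oblique case both edges keep the same length and the orthogonals stay parallel.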

“In line drawings, for example, lines in the picture are commonly used to stand for a variety of different features in the scene, including:
- edges,
- contours,
- thin wire-like forms such as hair, and
- the boundaries between areas of different colours or tones.” (Willats, 2006: 4)
Scene Primitives
Picture Primitives
Picture Primitives:
“(…) are abstract concepts. In practice, picture primitives are represented in paintings, drawings, tapestries, mosaics, engravings and so on by physical marks”
(Willats & Durand, 2005: 326)

These primitives are obtained from the transformations the Denotational Systems make on the Scene Primitives.

They are the smallest meaningful units in a picture.
in total:

which correspond to

which correspond to

TWO-DIMENSIONAL (surfaces and faces) which correspond to

THREE-DIMENSIONAL (lumps, sticks and slabs). These primitives have names that roughly indicate their shape:

- lumps are volumes whose extension is identical in all three directions;
- sticks are extended in only one direction;
- slabs are extended in two directions.

These are three-dimensional elements of a scene which are transformed into two-dimensional elements through drawing.
Through the action of Denotational Systems, observed objects are transformed into 3 types of Picture Primitives:

ZERO-DIMENSIONAL (dots or line-junctions) which are points or T-intersections (T-Junctions) in the drawing;

that can be the

that usually represent

In drawing, this means that people who use the same Drawing System (for example perspective) may use different Denotational Systems.
“(…) one can represent something by not drawing it is an important discovery in visual representation.

Leaving something out, in order to show it, is not an achievement made all at once but through a series of investigations.

The first step to representing occlusion is often to ‘cover’ or ‘hide’ a two-dimensional object by superimposition.”

(Matthews, 2003: 173)
Junctions in "Rectangular" type objects:

- L-Junctions,
- Y-Junctions,
- T-Junctions,
- Arrow-Junctions.
T-Junctions mark the point at which the outline (or edge) of one object passes behind the surface of another (when part of an object occludes another). These junctions are very important in drawing, as without them occlusion cues would not be possible.

Y-Junctions and Arrow-Junctions represent similar situations.
The difference between the two is the angle they make:

More than 180º: Y-Junction
Less than 180º: Arrow-Junction

In the picture they are identified in green and yellow, and represent intersections of three edges of an object.
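The angle rule can be tried out with a small classifier. This is a hedged sketch rather than a method from the cited sources: the function name `classify_junction`, the idea of describing each line by the direction (in degrees) in which it leaves the junction, and the choice of 180º as the dividing angle are all assumptions. The angle compared is the one swept between two outer lines on the side that contains the third line.

```python
def classify_junction(directions_deg, tol=1e-6):
    """Classify a junction from the directions (degrees) of the lines
    leaving it. Two lines give an L-Junction; three lines give a T-, Y-
    or Arrow-Junction depending on the angles between them."""
    if len(directions_deg) == 2:
        return "L-Junction"
    if len(directions_deg) != 3:
        raise ValueError("expected two or three lines")
    d = sorted(a % 360.0 for a in directions_deg)
    # Angles between neighbouring lines, going once around the junction.
    gaps = [(d[(i + 1) % 3] - d[i]) % 360.0 for i in range(3)]
    largest_gap = max(gaps)
    # Angle between the two outer lines, measured across the third line.
    spanned = 360.0 - largest_gap
    if abs(spanned - 180.0) <= tol:
        return "T-Junction"      # two lines form one straight line
    if spanned > 180.0:
        return "Y-Junction"
    return "Arrow-Junction"

print(classify_junction([0, 120, 240]))   # three evenly spread lines
print(classify_junction([60, 90, 120]))   # three lines bunched together
print(classify_junction([0, 180, 270]))   # a stem meeting a straight line
print(classify_junction([0, 90]))         # a simple corner
```

Real drawings would need some tolerance around 180º, since hand-drawn lines rarely align exactly; the `tol` parameter is only a placeholder for that.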
Junctions in "Smooth" Objects:

In the visual world there aren't only "rectangular type objects".
The second type are called smooth objects, and most objects fall into that category.

“Many naturally occurring objects such as fruit, people, and animals have curved surfaces that vary smoothly in three dimensions. Objects of this kind are sometimes referred to as smooth objects.”

(Willats, 2005: 113)
Here, we can see this junction error corrected.
The line of the belly functions both as a T-Junction (where it intersects the left hind leg) and as an End-Junction (where it ends before the right hind leg).
In this image we can see some junction errors in a drawing by a first year student.
Bad planning of a drawing can lead to incorrect representation of Junctions/Occlusion
Drawing sequence:
- First the nose was drawn (it is the closest to the observer)
- The eyes were next, as they intersect the nose in a T-Junction
- Then the mouth, which contains an occlusion zone where the upper lip connects with the lower part of the nose (newer lines are placed “behind” older ones)
- The glasses were next because “there are two T-junctions close together at the lower edge of the rim of the lens on the right.” Also, as the glasses are transparent, they had to be drawn after the other elements (if they were sunglasses they could be drawn first)
- The ear was next, also containing a Line-Junction in relation to the sideburns and the glasses.

This drawing was executed in the best possible sequence in order to avoid transparency errors.
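The effect of drawing order on transparency errors can be imitated with a toy model. It is a sketch under loud assumptions: the picture is reduced to a single row of cells, marks cannot be erased, and the names `draw`, `transparency_errors`, `arm` and `torso` are hypothetical. A farther shape is stopped wherever a nearer mark already exists (newer lines go "behind" older ones), but a nearer shape drawn late has to be drawn over what is already there.

```python
def draw(sequence, width):
    """Draw shapes in the given order on a row of cells.
    Each shape is (label, start, end, depth); smaller depth = closer."""
    canvas = [[] for _ in range(width)]   # marks accumulated in each cell
    for label, start, end, depth in sequence:
        for x in range(start, end):
            # A mark is made if the cell is empty or the new shape is
            # closer than everything already drawn there. Farther shapes
            # are simply left out (hidden line elimination).
            if all(depth < d for _, d in canvas[x]):
                canvas[x].append((label, depth))
    return canvas

def transparency_errors(canvas):
    """Cells where two shapes show through each other."""
    return sum(1 for cell in canvas if len(cell) > 1)

arm = ("A", 2, 5, 1)     # closest to the observer
torso = ("T", 0, 8, 2)   # partly hidden by the arm

planned = draw([arm, torso], 8)     # nearest object first
premature = draw([torso, arm], 8)   # torso placed too early

print("nearest first:", transparency_errors(planned), "errors")
print("torso first:  ", transparency_errors(premature), "errors")
```

Starting with the nearest object leaves no cell with overlapping marks, while placing the torso first forces the arm to be drawn across it.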
John Willats (Willats, 2005) reconstructs the drawing sequence in a child's drawing (ten years old) in which the premature placement of the face's outline created a Transparency Error.
In this drawing the student started with the head, torso and legs, and placed the visible arm too late.
As the arm is the closest object to the observer, the drawing should have started there.
As such, there are no T-Junctions where the arm occludes the torso.
This topological representation was created to display the sequence of stops. It displays the relations they hold, rather than the actual distances between them.
If you compare a portion of the map of London with the same portion of the London Subway map, you can see that the relations between the stops are sequential, and quickly deviate from the actual map.
They do not represent true distances, and don't reflect geographical specificities of the terrain.
“The brain is geared precisely to such topological features.

They inform the organism of the typical character of things, rather than of their particular measurements.”
(Arnheim, 1984: 77)

Drawings based on projective geometry represent objects from a particular point of view;

Drawings based on topological geometry represent only the most elementary spatial relations such as:
- touching,
- spatial order,
- enclosure
which are intrinsic to the scene and independent of any particular point of view

(Willats, 2005: 34)
“Topology is often described as ‘rubber sheet’ geometry.

If a figure is printed on a rubber sheet and the sheet is stretched or twisted, basic spatial relations such as proximity and enclosure will remain unchanged, although the distances between the marks may change and straight lines may not remain straight.”
(Willats, 1997: 70)
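The rubber-sheet idea can be tested on a small figure. The sketch below is illustrative only: the particular `stretch` function, the sampled circle and the sample marks are assumptions, chosen so that the distortion is smooth and one-to-one. It checks that enclosure and left-to-right order survive the stretch, while distances and straightness do not.

```python
import math

def stretch(point):
    """A hypothetical smooth, one-to-one distortion of the sheet."""
    x, y = point
    return (x + x ** 3, 2.0 * y + x ** 2)

def inside(point, outline):
    """Ray-casting test: is the point enclosed by the closed outline?"""
    px, py = point
    enclosed = False
    n = len(outline)
    for i in range(n):
        (x1, y1), (x2, y2) = outline[i], outline[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                enclosed = not enclosed
    return enclosed

# A figure "printed" on the sheet: a closed outline, one dot inside it,
# one dot outside it, and a straight row of marks.
outline = [(math.cos(math.radians(k)), math.sin(math.radians(k)))
           for k in range(360)]
dot_in, dot_out = (0.2, 0.1), (1.5, 0.3)
marks = [(-1.0, 0.0), (0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]

stretched_outline = [stretch(p) for p in outline]
stretched_marks = [stretch(m) for m in marks]

print("enclosure kept:",
      inside(stretch(dot_in), stretched_outline),
      not inside(stretch(dot_out), stretched_outline))
print("order kept:",
      [m[0] for m in stretched_marks] == sorted(m[0] for m in stretched_marks))
print("distance before/after:",
      math.dist(marks[0], marks[1]),
      round(math.dist(stretched_marks[0], stretched_marks[1]), 3))
```

The row of marks starts on a straight line and ends up on a curve, yet each mark keeps the same neighbours, which is the sense in which topological relations are independent of measurement.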
properties of topological relations
Spatial Order




They represent the immutable characteristics of the relations between objects and of the objects themselves.
The representation of the first two properties is often visible in the drawings of small children, although the marks produced may be difficult to interpret. (Willats, 1997, 2002).

For example, if you ask a child to draw a ball on top of a table, the produced scribbles may not bear any resemblance to the objects in question, but the child will name the scribble on top "ball" and the one on the bottom "table". The proximity between scribbles may also represent the property of "on top of".
In this image one can see:

in the almost equidistant placement of the three figures, clearly seated "side by side";

in the continuous line of the table-top (which contains the plates) and in the outlines of the faces, which contain the eyes, nose and mouth;

in the opposition between the three figures on one side of the table with the other figure, clearly separated by the extension of the table.
In actual physical pictures, the picture primitives are usually represented by discrete marks which can vary greatly in size.

In pointillist paintings, for example, the marks (dots of paint) are relatively large.

In photographic prints the marks (minute grains of pigment) are usually very small.
Large Picture Primitives
Seurat - Parade de cirque (1887-88)
- The PRIMAL SKETCH stage consists of a quick approximation of the more important structural aspects of an observed scene, such as:
- light intensity;
- basic geometric structures and lighting;
- transparency aspects.
It is viewer-centered (based on a specific viewpoint of the object or scene).
- The 2½D SKETCH stage searches for spatial aspects of the object or scene, such as gradients or textures that indicate depth, or changes in direction of the surface. This description is also viewer-centered.
In the images, one can see that the arrows indicate changes in orientation of the surface, and the contours represent spatial discontinuities.
- The 3D MODEL REPRESENTATION stage creates and stores three-dimensional descriptions of the objects or scenes.
This type of representation is object-centered (based on a structural and volumetric description of the object, and not dependent on a particular viewpoint).

This stage also creates VOLUMETRIC PRIMITIVES, whose function is to:

“make explicit the organization of the space occupied by an object and not just its visible surfaces”,

and possess a particular hierarchy:

“primitives of various size are included, arranged in a modular, hierarchical organization” (Marr, 1982: 330)

VOLUMETRIC PRIMITIVES maintain relations such as “above”, “below”, “behind”, “in front of”, “larger than” or “smaller than”.
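The "modular, hierarchical organization" of volumetric primitives can be pictured as a tree in which each primitive contains smaller ones. The following is only a rough sketch: the class name `VolumetricPrimitive`, the body-part example and all the lengths are hypothetical values chosen for illustration, not data from Marr.

```python
from dataclasses import dataclass, field

@dataclass
class VolumetricPrimitive:
    name: str
    length: float                      # hypothetical size, arbitrary units
    parts: list = field(default_factory=list)

    def walk(self, level=0):
        """Visit this primitive, then the smaller ones it contains."""
        yield level, self
        for part in self.parts:
            yield from part.walk(level + 1)

    def larger_than(self, other):
        """One of the relations the primitives maintain with each other."""
        return self.length > other.length

# Coarse description first, finer modules nested inside it.
hand = VolumetricPrimitive("hand", 0.2)
forearm = VolumetricPrimitive("forearm", 0.3, [hand])
arm = VolumetricPrimitive("arm", 0.7, [forearm])
body = VolumetricPrimitive("body", 1.8, [arm])

for level, primitive in body.walk():
    print("  " * level + primitive.name)
```

Because the description is object-centered, nothing in the tree refers to a viewpoint; relations such as "larger than" hold however the object is seen.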
"Capturing life" A film by John Tchalenko
Small Picture Primitives
Henri Cartier-Bresson - Paris. Pont des Arts (1946)
Types of Primitives
The normal rules for the representation of
have been inverted or ignored
representation of occlusion is the strongest of all depth cues in pictures;

therefore, the avoidance of the representation of occlusion in these drawings tends to make them look flat.
(Willats, 2005: 119)
concealing of one shape behind another in drawing is a special kind of overlap called occlusion.

Leaving out the lines of the hidden or ‘occluded’ part is called hidden line elimination.
(Matthews, 2003: 174)
T-junctions in line drawings denote the points where an edge or a contour passes behind a surface, and can occur in drawings of both smooth objects and objects having plane faces.

End-junctions occur only in line drawings of smooth objects and denote the points where the contours end.

Both types of junctions denote points of occlusion, where one surface occludes or hides another with respect to a viewer.

Occlusion is one of the most powerful depth cues in the perception of scenes, and the representation of occlusion in pictures can give a strong impression of depth and shape.
(Willats, 1997: 25)
In this drawing tutorial taken from the Internet we can see errors in the representation of Junctions, which result in inconsistencies in the representation of the elephant's belly.
Quantitative Methods in the analysis of Drawing
Written Part
Age Group
Study area
Drawing Experience
Drawing Frequency
Which is the correct representation?
Drawn Part
Analysis group:
- 44 first year students (SI)
- Q1 filled in at the beginning of year
- Q2 filled in at end of year
Copy of a face
from a photograph
Copy from a drawing
Copy of two boxes and a cylinder
from photographs
Draw a seated figure from memory
- Side view
- Front view
- Top view
Definition of Error Categories:

Internal Representation Errors (IR)
1. Viewpoint Errors/Rotation Errors
a. Vertical rotation
b. Horizontal rotation (left or right)
c. Limited perspective
d. Inverted perspective
2. Incomprehension of the object's characteristics
3. Stereotypes
4. Relative Placement errors

Manual/Material Errors (MM)
1. Intermittent line
2. Scale
3. Placement in the frame
Viewpoint Errors/Rotation Errors (IR)
a. Vertical rotation (IR)
b. Horizontal rotation
(left or right) (IR)
c. Limited perspective (IR)
d. Inverted perspective (IR)
2. Incomprehension of the object's characteristics (IR)
3. Stereotypes (IR)
4. Relative Placement errors (IR)
1. Intermittent line (MM)
2. Scale (MM)
3. Placement in the frame (MM)
Some ways of visualizing data:
Charting written answers
Deviation of angle in drawings of face (Q1 and Q2)
Deviation base angles (Q1 and Q2)
Deviation top right angle (Q1 and Q2)
Deviation top left angle (Q1 and Q2)
Drawings (Q1 and Q2)
“We have all seen thousands of cups and saucers, but we have not stored all of them in memory. We have stored some; but more importantly, we have formed a generalized impression of this class of objects that serves as a type of master model to which new items may be compared. We recognize and classify a variety of disparate objects (cups and saucers) as members of a class by rapidly comparing them with an ‘idealized’ image of the class.”
(Solso, 2001: 237)
Manual/Material Errors (MM)
Thank you
Picture Primitives and Scene Primitives