Authors: Robert Kurson
Familiar Size
Our knowledge of an object’s size affects how we perceive that object’s distance—and the distance and size of other objects around it.
The familiar size of the dolphins in the photo affects how we perceive their distance. Most of us would estimate that distance to be about ten feet. If, however, dolphins were the size of football fields, we might estimate that they were several thousand feet away in this photo. If dolphins were the size of insects, we might estimate their distance in this photo to be just a few inches.
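The geometry behind this cue can be sketched in a few lines. This is only an illustration under a simple pinhole-eye assumption, not a model of what the brain actually computes; the function name `distance_from_familiar_size` and the sample numbers are hypothetical. For an image of fixed size on the retina (a fixed visual angle), the estimated distance grows in direct proportion to the size we believe the object to be.

```python
import math

def distance_from_familiar_size(assumed_size, visual_angle_deg):
    """Distance at which an object of assumed_size subtends visual_angle_deg.

    The result is in the same unit as assumed_size.
    """
    half_angle = math.radians(visual_angle_deg) / 2.0
    return assumed_size / (2.0 * math.tan(half_angle))

# The same retinal image, interpreted under three different beliefs
# about the object's true size (all values are hypothetical).
visual_angle_deg = 5.0
for assumed_size in (0.1, 1.0, 100.0):
    d = distance_from_familiar_size(assumed_size, visual_angle_deg)
    print(f"assumed size {assumed_size:>6} -> estimated distance {d:.1f}")
```

Making the assumed size a hundred times larger makes the estimated distance a hundred times larger, which is the effect the dolphin example describes.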
Aerial Perspective
The air contains minuscule particles of water, dust, and pollution. The farther away an object is, the more particles we must look through, and therefore the hazier that object appears.
(Incidentally, aerial perspective doesn’t occur on the moon, which has no atmosphere and therefore no particles. Astronauts struggled to judge distance on the moon.)
Linear Perspective
Parallel lines converge on the retina as they recede in depth.
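A rough sketch of why this happens, assuming an idealized pinhole eye (the function names and the rail spacing below are hypothetical): a point at sideways position x and depth z lands on the image at a position proportional to x divided by z, so the gap between two parallel lines shrinks as depth increases.

```python
def project_x(x, z, focal_length=1.0):
    """Pinhole projection of sideways position x at depth z onto the image."""
    return focal_length * x / z

def image_gap(separation, z, focal_length=1.0):
    """Image-plane gap between two parallel lines `separation` apart at depth z."""
    left = project_x(-separation / 2.0, z, focal_length)
    right = project_x(separation / 2.0, z, focal_length)
    return right - left

# Two rails 1.5 units apart, viewed at increasing depth:
# the gap in the image keeps shrinking toward zero.
for z in (1.0, 10.0, 100.0):
    print(f"depth {z:>6}: image gap {image_gap(1.5, z):.4f}")
```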
Texture Gradient
As a surface gets farther away from us, its texture gets smaller and appears smoother.
Shape from Shading
When an object has a three-dimensional shape, some surfaces will be in the light and others will be in shadow.
These are just a few examples of the cues our brains use to transpose the two-dimensional images on our retinas into the perception of a three-dimensional world. One can hardly imagine the immense amount of knowledge about the world required to process these pictorial cues to depth, and to do it instantaneously, automatically, and unconsciously.
It turns out that these pictorial cues are themselves based on knowledge, a kind of statistical knowledge about what the world is like most of the time. Such pieces of knowledge are called “priors.” They represent what we believe about the world when we come upon a new visual scene. Here are some examples:
• Adults are between five and seven feet tall.
• Light tends to fall from above.
• Physical objects create shadows.
• Certain objects are a certain color.
• The lines in our culture are often at right angles to each other (as with buildings).
Consider this photo:
The inclusion of a barn, a boat, and a creek in this photograph greatly helps us judge the windmill’s size and distance. That’s because we possess prior knowledge—that barns, boats, and creeks are almost always a certain size. If the windmill were the only object in the photo, we might think it a toy, or we might judge it to be several times larger than it really is.
How does a person go about learning these pictorial cues and priors? By now, we’ve seen that much of visual learning is done in early childhood, through constant interaction and experimentation with the world and its objects. It’s the same with learning depth. A baby reaches, crawls, observes, tests, falls short, and goes too far, constantly calibrating visual cues against its tactile experience until the two-dimensional image on the retina translates automatically into a visual experience of depth. Infants aren’t even sensitive to the pictorial cues to depth until they’re about six months old, the age at which they start grabbing for objects. After that, the process of understanding and using pictorial-depth cues takes years to perfect. The task is astoundingly difficult: engineers still can’t build a machine that can compute depth as accurately and robustly as humans compute it. Yet the child does it without any help from the parents and over just a few years, all from interacting with its environment.
Motion Cues to Depth
Pictorial cues, remember, are just one of the ways in which the visual system goes about perceiving the world in depth. Another set of cues becomes available when the observer or the object is in motion. These are known as motion cues. Two of the most important are:
Motion Parallax
Nearby objects move faster on the retina than distant objects do.
Motion parallax can be observed by watching the passing scene from inside a moving car. Nearby objects—like houses—appear to fly past, while more distant objects—like mountains—seem hardly to move at all. We perceive the faster-moving objects to be nearer to us than the ones that are moving more slowly.
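The car example can be put in rough numbers. The sketch below assumes an observer moving in a straight line and a stationary object directly to the side; under that assumption the object's angular speed across the visual field is the observer's speed divided by the object's distance. The function name and the speed and distances are hypothetical.

```python
import math

def retinal_angular_speed(observer_speed, distance):
    """Angular speed (radians per second) of a stationary object that is
    directly to the side of an observer moving in a straight line."""
    return observer_speed / distance

car_speed = 25.0  # meters per second (hypothetical)
for label, distance in (("house", 20.0), ("mountain", 20000.0)):
    omega = retinal_angular_speed(car_speed, distance)
    print(f"{label}: {math.degrees(omega):.3f} degrees per second")
```

With these made-up figures the house sweeps across the visual field a thousand times faster than the mountain, simply because it is a thousand times closer.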
Kinetic Depth Effect
The motion of a two-dimensional representation can create a perception of its three-dimensional form.
It was the kinetic depth effect that occurred when May saw a square on Fine’s computer monitor leap into three dimensions as a cube when it began rotating on-screen.
Motion cues also rely on priors, though much simpler ones. Babies learn them more quickly and at a younger age than they learn the pictorial cues to depth. Babies can perceive moving objects within a few weeks. Depth from motion is understood by the age of four months, or perhaps even earlier.
Stereopsis
Stereopsis creates an impression of depth by comparing the small differences in the images produced by each eye.
Look at a nearby object. Cover one eye, then the other, then the first again. The object appears to move back and forth. The brain compares those two slightly different images to compute—and then perceive—the object’s depth.
Stereopsis, of course, occurs only in people who have two working eyes, and so is not applicable to May’s case. Stereopsis is not thought to be critical to good depth perception in humans, as it is useful only for objects that are about a yard from the body or closer. Beyond that, the distance between the two eyes is so small compared to the distance to the object that the images in the two eyes are essentially identical. Many people think that stereopsis is the reason people are able to see in depth, but if you shut one eye you can still reach out and pick up a coffee cup, and you can still drive. About 10 percent of the general population doesn’t have good stereoscopic vision, and even professional athletes have been known to lack it.
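The point about eye separation can be illustrated with a little geometry. The sketch below computes the angle between the two eyes' lines of sight when both fixate the same point, as a rough proxy for how different the two images are. It assumes an eye separation of 6.5 centimeters, a typical adult figure that does not come from the text, and the function name is hypothetical. It shows only that the angle shrinks with distance; it does not establish where stereopsis stops being useful.

```python
import math

def vergence_angle_deg(distance, eye_separation=0.065):
    """Angle (degrees) between the two eyes' lines of sight when both
    fixate a point straight ahead at `distance` (meters).

    eye_separation is an assumed typical value, in meters.
    """
    return math.degrees(2.0 * math.atan(eye_separation / (2.0 * distance)))

for distance in (0.25, 1.0, 10.0, 100.0):
    angle = vergence_angle_deg(distance)
    print(f"{distance:>6} m -> {angle:.3f} degrees")
```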
OBJECT RECOGNITION
Human beings must be able to recognize objects in order to interact with them. That alone requires massive learning—there’s an endless number of objects in the world to know. But it’s even harder than that. We must also recognize each of the objects in the world from every possible viewing angle. How can that be possible? Consider this picture. What does it show?
We recognize that object as an elephant. It is the most readily recognized view of an elephant—called by some its “canonical” view. Now consider the next picture. What does it show?