Visual Attention and Motion
The human observer is highly efficient at detecting motion: a target that is detectable when stationary becomes even more detectable when it moves. The brain uses multiple cues, drawn from all of our senses, to perceive motion. The focus of this paper will be the visual system and how motion is perceived visually. Motion is perceived in part from the changing patterns of light on the retina. This cannot be the whole explanation, however, because we can perceive motion while keeping an image stable on the retina (for example, when following a moving object with our eyes), and we can change these light patterns by moving our head and eyes without perceiving the world as moving.
In order to turn these spatial patterns of light into information about motion, we must integrate and interpret visual information. We use motion as a cue for grouping objects in the environment, and the motion of one thing can affect the way other things are perceived to move. Things that move together are seen as belonging together, and things that are near objects in motion can themselves be perceived as moving.
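Grouping by shared motion (the Gestalt principle of common fate) can be sketched as clustering dots by their velocity alone, ignoring where they are. The dot format, the tolerance value, and the greedy first-match rule below are assumptions made for illustration; this is not a model taken from the sources cited in this paper.

```python
import math

# Hypothetical dot record: (x, y, vx, vy), velocity in position units per frame.
Dot = tuple[float, float, float, float]


def group_by_common_fate(dots: list[Dot], tol: float = 0.5) -> list[list[Dot]]:
    """Greedily group dots whose velocities lie within `tol` of a group's first member.

    Position is ignored entirely: only shared motion decides membership.
    """
    groups: list[list[Dot]] = []
    for dot in dots:
        _, _, vx, vy = dot
        for group in groups:
            _, _, gvx, gvy = group[0]
            if math.hypot(vx - gvx, vy - gvy) <= tol:
                group.append(dot)
                break
        else:  # no existing group moves like this dot: start a new one
            groups.append([dot])
    return groups


# Two dots drifting right and two drifting down, scattered across the display.
dots = [(0.0, 0.0, 1.0, 0.0), (5.0, 5.0, 1.1, 0.0),
        (2.0, 3.0, 0.0, -1.0), (9.0, 1.0, 0.1, -1.0)]
for group in group_by_common_fate(dots):
    print(group)
```

The far-apart dots at (0, 0) and (5, 5) end up in the same group because they share a velocity, which is the sense in which things that move together are seen as belonging together.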
We also use motion to interpret visual information in our environment. For instance, dots moving together in various patterns can create the percept of a 3-D object, and dots moving in certain patterns can create the percept of a human or animal in motion even without lines connecting them to form an outline. You can also change the perceived motion of an object by changing the focus of your attention (Mather, 1998).
Attention is one of those words, like anti-social, whose common use may or may not have any relation to its use in psychology. On any given day, one is likely to say, or be told, to pay attention; this is especially true of conversations between adults and children. Trying to study and quantify attention, however, creates a quandary. What is it exactly? How does it work? Even Webster's is of little help: the dictionary entry says that it means to keep one's mind on something. That describes the phrases "to pay attention" and "to attend," but one is still left wondering what the mechanism of attention is. Some researchers avoid the word attention altogether because it is difficult to say exactly what the term is meant to identify (Pashler, 1998; Thornton & Gilden, 2001; Watamaniuk & McKee, 1995, 1998). According to H. E. Pashler (1998), two primary ideas characterize attention: selectivity and capacity limitation. At any given moment we receive a great deal of perceptual information, and because we can become overwhelmed by trying to do too much at once, we must select the part of our environment that is relevant at that particular moment. Pashler goes on to identify the core phenomena addressed by attention research: selectivity of perception, voluntary control over this selectivity, and capacity limits in mental functioning that cannot be attributed merely to limitations of our sensory or motor systems.
When reading the literature on current attention research, there are a few key ideas to keep in mind. One is early versus late selection, that is, the point at which attention exerts its influence on perception. Early selection theories suggest that only the selected stimulus is processed sufficiently for identification. Late selection theories suggest that all stimuli are processed and that selection then determines which stimulus is brought into awareness. A third set of theories, referred to as controlled parallel, combines aspects of the first two and suggests that more than one thing can be processed when that is advantageous (Pashler, 1998). For the purposes of this paper, attention will be defined as the demand placed on cognitive resources (Thornton & Gilden, 2001). Attention has been described as endogenous (i.e., willful deployment of cognitive resources) or exogenous (i.e., stimulus-driven deployment of cognitive resources) (Wolfe, 1994). Endogenous attention is the same phenomenon described as top-down or cognitively demanding processing. Exogenous attention is alternatively referred to as bottom-up or pre-attentive processing and is considered relatively undemanding of cognitive resources. It is at the levels of integration and interpretation that attention comes into play in motion perception, and in the visual system in general (Mather, 1998; Pashler, 1998; Treisman & Gelade, 1980; Wolfe, 1994).
Feature Integration Theory
Feature Integration Theory (Treisman & Gelade, 1980) proposes that basic features, such as color, orientation, and motion, are registered pre-attentively and can be searched in parallel, whereas other things require focused attention and must be searched serially. These other things are generally believed to be conjoined basic features, or conjunctions. Other researchers have suggested that search errors occur because of noise at the decision level.
Frequently referred to as signal detection theory, this decision model suggests that we can perceive and process many things at once, but that in a divided attention task the amount of information to be searched increases the probability of error at the decision level. An example can be found in Eckstein (1998), who demonstrated that when low-level effects (i.e., similarity, eccentricity, and eye movements) are accounted for, search accuracy is consistent with a parallel and independent, but neurally noisy, visual system.
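The contrast between serial search and a noisy parallel decision can be made concrete with two toy models. The first function gives the textbook set-size predictions of a serial, self-terminating search, the kind of processing Feature Integration Theory attributes to conjunctions. The second is a Monte Carlo sketch of a parallel, independent, but noisy observer that uses a max decision rule. The timing constants, the d′ value, Gaussian noise, and the max rule are all assumptions for illustration, not parameters reported by Treisman and Gelade (1980) or Eckstein (1998).

```python
import random


def serial_search_rt(set_size: int, target_present: bool,
                     base_ms: float = 450.0, per_item_ms: float = 40.0) -> float:
    """Mean RT predicted by a serial, self-terminating search.

    Items are inspected one at a time and search stops at the target, so on
    average (set_size + 1) / 2 items are inspected when the target is present
    and all set_size items when it is absent. Timing values are hypothetical.
    """
    inspected = (set_size + 1) / 2 if target_present else set_size
    return base_ms + per_item_ms * inspected


def parallel_max_rule_accuracy(set_size: int, d_prime: float = 2.0,
                               trials: int = 20000, seed: int = 1) -> float:
    """Two-interval forced-choice accuracy of a parallel but noisy observer.

    Every item is encoded at once by an independent detector with unit-variance
    Gaussian noise; the target adds d_prime to one detector's mean response.
    The observer picks the interval holding the single largest response
    (max rule). Estimated by Monte Carlo simulation with a fixed seed.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Target interval: one signal-plus-noise item, the rest noise only.
        target_interval = max(
            [rng.gauss(d_prime, 1.0)]
            + [rng.gauss(0.0, 1.0) for _ in range(set_size - 1)]
        )
        # Blank interval: set_size noise-only items.
        blank_interval = max(rng.gauss(0.0, 1.0) for _ in range(set_size))
        correct += target_interval > blank_interval
    return correct / trials


for n in (1, 2, 4, 8):
    print(n, serial_search_rt(n, True), serial_search_rt(n, False),
          round(parallel_max_rule_accuracy(n), 3))
```

Both models predict worse performance as items are added, which is why set-size effects alone cannot decide between them: the serial model adds comparison time per item, while the parallel model simply gives the distractors more chances to produce the largest response.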
In general, motion can be considered an orientation in the space-time continuum (Adelson, 1991), and the human visual motion system prefers motion that is consistent over time. The illusion of motion can be created by rapidly showing a sequence of images that are shifted from frame to frame. This is called apparent motion, and as these shifting sequences are extended beyond two frames, the motion percept becomes stronger. Edward Adelson (1991) provides a nice demonstration of how these shifting sequences can be conceived as spatio-temporal orientation, using a flipbook like the ones found in boxes of Cracker Jack. If you were to take an especially thick flipbook and slice through the middle of it, you would see how the third dimension (representing time) extrudes the image, so that a moving shape appears as a slanted, oriented structure running through the stack. Apparent motion is the same technique used to create movies, cartoons, and other video images. The changing spatio-temporal orientation is said to create motion energy, and at the earliest stages of motion perception, detectors specially sensitive to this motion energy are activated. The first such mechanisms proposed, called Reichardt detectors, were described after studying the visual system of the fly. Reichardt detectors are tuned to direction and are inhibitory in nature; that is, when a detector is stimulated in its non-preferred direction, it sends a signal to the next detector in the line telling it to ignore the stimulus.
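The direction selectivity of a Reichardt-type detector can be sketched with two neighbouring receptors. This sketch uses the correlation form usually associated with the Reichardt model (a delayed signal from one receptor multiplied by the current signal of its neighbour, minus the mirror-image pairing) rather than an explicit inhibitory veto; the one-step delay and the discrete luminance samples are simplifying assumptions.

```python
def reichardt_response(left: list[float], right: list[float], delay: int = 1) -> float:
    """Opponent correlation-type motion detector, summed over time.

    `left` and `right` are luminance samples from two neighbouring receptors
    taken at the same time steps. Each half-detector multiplies a delayed copy
    of one receptor's signal by the current signal of the other; subtracting
    the mirror-image half gives a signed, direction-selective output:
    positive for left-to-right motion, negative for right-to-left.
    """
    if len(left) != len(right):
        raise ValueError("receptor signals must have equal length")
    total = 0.0
    for t in range(delay, len(left)):
        rightward = left[t - delay] * right[t]  # left stimulated first, right now
        leftward = right[t - delay] * left[t]   # right stimulated first, left now
        total += rightward - leftward
    return total


# A bright spot passes the left receptor at t = 2 and the right one at t = 3.
left_signal = [0.0, 0.0, 1.0, 0.0, 0.0]
right_signal = [0.0, 0.0, 0.0, 1.0, 0.0]
print(reichardt_response(left_signal, right_signal))  # 1.0 (rightward)
print(reichardt_response(right_signal, left_signal))  # -1.0 (leftward)
```

The detector only responds when the spot's displacement matches its built-in delay, which is one way of seeing why such units are described as tuned to displacement over time and to a specific direction.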
In the research studies discussed below, random dot stimuli were used. A random dot stimulus is composed of a target or targets that use apparent motion to create a coherent motion percept. A single dot traveling in apparent motion in a consistent direction creates a trajectory percept, or local motion information. Multiple dots traveling in apparent motion in a consistent, similar direction create a percept of coherent background motion, or global flow information. In a random dot display, noise (distractor information) is created by using dots that travel inconsistently over time. This can be done by having the noise dots simply pop up in different places in each frame, or by having the dots change their direction of travel randomly from frame to frame (Watamaniuk, McKee, & Grzywacz, 1995). Thornton and Gilden (2001) use an additional technique, applying motion transformations and smoothing to randomly colored black and white dots to create a percept of coherent motion within apertures of similar motion noise.
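A trajectory-in-noise display of the second kind described above (noise dots that change direction randomly on every frame) can be sketched as follows. The field size, step length, starting position, and wrap-around edges are illustrative assumptions, not the parameters used in the studies cited here.

```python
import math
import random

Point = tuple[float, float]


def random_dot_frames(n_noise: int, n_frames: int, step: float = 1.0,
                      signal_dir_deg: float = 0.0, field: float = 100.0,
                      seed: int = 0) -> list[list[Point]]:
    """Generate dot positions for a trajectory embedded in random-direction noise.

    Index 0 in every frame is the signal dot, which moves `step` units per frame
    in a fixed direction (a consistent trajectory). The other `n_noise` dots
    each step in a freshly drawn random direction on every frame. Positions
    wrap around the square field.
    """
    rng = random.Random(seed)
    dots = [(field / 2, field / 2)] + [
        (rng.uniform(0, field), rng.uniform(0, field)) for _ in range(n_noise)
    ]
    theta = math.radians(signal_dir_deg)
    frames = [list(dots)]
    for _ in range(n_frames - 1):
        moved = []
        for i, (x, y) in enumerate(dots):
            angle = theta if i == 0 else rng.uniform(0, 2 * math.pi)
            moved.append(((x + step * math.cos(angle)) % field,
                          (y + step * math.sin(angle)) % field))
        dots = moved
        frames.append(list(dots))
    return frames


frames = random_dot_frames(n_noise=50, n_frames=5)
print(frames[0][0], frames[-1][0])  # the signal dot's start and end positions
```

Each noise dot moves exactly as far per frame as the signal dot, so the only thing distinguishing the target is the consistency of its direction over time.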
Some motion, such as trajectory motion and optic flow, is perceived with very little cognitive effort. Other kinds of motion, such as rotation, require more cognitive effort to be processed accurately. By studying the differences in the way motion is processed, researchers are able to infer the role of attention in the motion system. Thornton and Gilden (2001) examined differences in the processing of three types of motion: homogeneous flow (translation), rotation (or curl), and divergence (contraction and expansion). These types of motion belong to a category known as optic flow. Optic flow is the motion created by the observer's movement relative to his or her environment, and it can be simple or complex. Translation is optic flow in a single linear direction, much like what you might see while looking out the side window of a car. Both rotation and divergence are considered computationally complex because the direction of flow depends on relative spatial location across time, that is, on how one part of the scene moves in relation to another part of the scene. This is what they consider image-based. Translation is considered computationally simple because the direction of flow can be determined by examining any single portion of the scene. This is referred to as scene-based because it does not depend on movement in relation to any one element or image within the scene. Reichardt detectors are tuned to displacement over time and to specific directions. While this is well suited to detecting linear motion, it is insufficient for detecting rotation because the detectors cannot relate motions of differing directions. Other researchers have shown that while neurons sensitive to translation are found early, in striate cortex, neurons sensitive to rotation and divergence are not found until areas MT and MST.
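The difference between simple and complex flow can be seen by sampling the local direction of each field at a few points around the centre. The three velocity fields below are the standard idealized forms (constant velocity, curl about the origin, and radial expansion); the unit rates and sample points are arbitrary choices for illustration.

```python
import math


def translation(x: float, y: float, vx: float = 1.0, vy: float = 0.0) -> tuple[float, float]:
    """Homogeneous flow: the same velocity at every location."""
    return (vx, vy)


def rotation(x: float, y: float, omega: float = 1.0) -> tuple[float, float]:
    """Curl: velocity perpendicular to the position vector about the origin."""
    return (-omega * y, omega * x)


def divergence(x: float, y: float, k: float = 1.0) -> tuple[float, float]:
    """Expansion (k > 0) or contraction (k < 0) away from the origin."""
    return (k * x, k * y)


def direction_deg(v: tuple[float, float]) -> float:
    """Direction of a velocity vector in degrees, in [0, 360)."""
    return math.degrees(math.atan2(v[1], v[0])) % 360.0


# Local direction at four points: right of, above, left of, and below the centre.
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
for name, field in [("translation", translation), ("rotation", rotation),
                    ("divergence", divergence)]:
    print(name, [round(direction_deg(field(x, y))) for x, y in points])
# translation [0, 0, 0, 0]
# rotation [90, 180, 270, 0]
# divergence [0, 90, 180, 270]
```

A single sample is enough to recover the direction of translation, but for rotation and divergence the local direction changes with location. The two complex fields even contain the same set of local directions; they differ only in how direction relates to position, which is why identifying them requires relating one part of the scene to another.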
The finding that neurons sensitive to rotation and divergence both first appear in MT and MST suggests that the neural processing of these two types of motion may be similar and, consequently, that they would place similar demands on cognitive resources. The question asked by Thornton and Gilden is what kind of representation attention uses in acquiring information that specifies direction. Their goal is to uncover the type of information that determines attentional load, which would in turn lead to a better understanding of what is perceived efficiently and why. To determine the level of representation, they used two experimental paradigms: a traditional multiple-target search examining spatial parallelism, and a signal detection approach. Using two methodologies allows the findings to be generalized across both late selection and early selection models. Contrary to the neural prediction, they found that translation and divergence produced relatively similar results: these two types of optic flow yielded search times and error rates that appeared to reflect parallel, capacity-limited processing, whereas the curl condition yielded search times consistent with serial processing. In other words, detecting translation and divergence does not appear to be cognitively demanding or scene-based, while detecting rotation seems to be image-based and requires greater cognitive effort.
Another task in motion perception that is relatively effortless under most conditions is tracking an object moving through one's field of view. An object suddenly launched through the air within one's field of view quickly draws attention, which is particularly helpful for self-preservation. Motion is considered a basic feature and can be searched in parallel (Thornton & Gilden, 2001; Treisman, 1986; Treisman & Gelade, 1980; Wolfe, 1994). Multiple mechanisms for processing motion information have been identified beyond the early motion energy mechanisms.
One such mechanism is a motion trajectory network, which would presumably exert its influence in the middle levels of motion processing, at the level of integration.
Watamaniuk, McKee, and Grzywacz (1995) examined the ability to track local motion information. Motion can easily be tracked even when it disappears temporarily behind occluders, or when it moves through or against a moving background. However, motion signals can be degraded by introducing extra information into a display, in much the same way as static visual information (Nakayama & Silverman, 1986). One of the greatest determinants of the detectability of motion is the presence of distracting motion signals that match the target motion (Watamaniuk & McKee, 1995; Watamaniuk, McKee, & Grzywacz, 1995). The human visual system also prefers objects that move consistently over time. One possible reason for the ease of tracking a trajectory is a putative trajectory network: a network of interconnected neurons whose excitatory connections propagate a local motion signal along a sequence of detectors. This trajectory network would have to come into play at a level beyond Reichardt detectors, because it is interconnected and has broad tuning (not just a single preferred linear direction), which enables it to track circular or arced paths as well as straight ones. To do this, it needs not only broad tuning but also a signal that is both pooled and propagated. In this network, motion detectors send facilitatory signals to adjacent detectors with similar directional tuning. The feed-forward nature of the network would cause the signal to strengthen over time and could plausibly explain why objects shooting through one's field of view draw attention. The noise most likely to interfere with the signal in this type of network is noise that most closely resembles the target motion. This can be explained by a probability of mismatch based on nearest neighbors, or by the likelihood that the target trajectory lies outside the distribution of the noise signals.
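The idea of facilitation passed between detectors with similar directional tuning can be illustrated with a toy chain of detectors placed along a path. The chain structure, the unit local input, the gain, and the Gaussian direction-similarity weighting are hypothetical choices for this sketch, not parameters of the network proposed by Watamaniuk, McKee, and Grzywacz (1995).

```python
import math
import random


def propagated_activation(directions_deg: list[float], gain: float = 0.8,
                          tuning_width_deg: float = 45.0) -> float:
    """Activation at the end of a chain of facilitatory motion detectors.

    Toy model: each detector receives a unit local motion signal plus a
    facilitatory input from the previous detector in the chain. That input is
    scaled by `gain` and by how similar the two local directions are (a
    Gaussian over their angular difference), so gradual direction changes,
    as on a straight or gently curving path, pass the signal on, while abrupt
    changes largely block it.
    """
    activation = 0.0
    previous = None
    for d in directions_deg:
        if previous is None:
            weight = 0.0  # the first detector has no predecessor
        else:
            diff = abs((d - previous + 180.0) % 360.0 - 180.0)  # smallest angle
            weight = math.exp(-0.5 * (diff / tuning_width_deg) ** 2)
        activation = 1.0 + gain * weight * activation
        previous = d
    return activation


rng = random.Random(3)
straight = [0.0] * 10                                  # straight trajectory
arc = [10.0 * i for i in range(10)]                    # gently curving path
noise = [rng.uniform(0.0, 360.0) for _ in range(10)]   # random-direction motion
for name, path in [("straight", straight), ("arc", arc), ("noise", noise)]:
    print(name, round(propagated_activation(path), 3))
```

Because each step feeds the previous activation forward, a consistent trajectory builds up over successive detectors, and the broad tuning lets an arced path accumulate almost as well as a straight one, while random-direction motion gains little facilitation.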
Ruth Rosenholtz (2001) also proposes an outlier rule model for search, based on the distance of the target signal from the distribution of the noise signals.
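An outlier rule of this kind can be sketched by treating each velocity as a 2-D vector and measuring how far the target lies from the distractor distribution in units of that distribution's spread (a Mahalanobis distance). This is a sketch in the spirit of a distance-from-distribution rule; the 2-D velocity representation and the small ridge term that keeps the covariance invertible are assumptions, not details taken from Rosenholtz (2001).

```python
import math


def mahalanobis_saliency(target: tuple[float, float],
                         distractors: list[tuple[float, float]]) -> float:
    """Distance of a target velocity from the distractor velocity distribution.

    The distractors are summarized by their mean and covariance; the target's
    Mahalanobis distance from that distribution is returned. Larger values mean
    the target is more of an outlier and, on this account, easier to find.
    """
    n = len(distractors)
    mx = sum(v[0] for v in distractors) / n
    my = sum(v[1] for v in distractors) / n
    # Covariance entries, with a tiny ridge so the matrix stays invertible.
    sxx = sum((v[0] - mx) ** 2 for v in distractors) / n + 1e-9
    syy = sum((v[1] - my) ** 2 for v in distractors) / n + 1e-9
    sxy = sum((v[0] - mx) * (v[1] - my) for v in distractors) / n
    det = sxx * syy - sxy * sxy
    dx, dy = target[0] - mx, target[1] - my
    # Quadratic form d^T S^-1 d using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Distractors moving roughly rightward with a little directional spread.
distractors = [(1.0, 0.1), (1.0, -0.1), (0.9, 0.0), (1.1, 0.0)]
print(mahalanobis_saliency((0.0, 1.0), distractors))   # upward target: strong outlier
print(mahalanobis_saliency((1.0, 0.05), distractors))  # target resembling the noise
```

A target whose motion resembles the distractors sits close to their distribution and gets a small score, which matches the observation that the most damaging noise is noise that looks like the target motion.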
Divided Attention in the Motion Domain
Watamaniuk and McKee (1998) later examined the ability to encode local and global motion information simultaneously. Several key findings came from this research. No global precedence was noted in the motion domain; global precedence is the tendency to process information on a large scale before processing higher-frequency information. The thresholds established for local information in this study were lower than the thresholds for lower-frequency, or global, information. Additionally, local and global information could be encoded simultaneously and did not require attention to be divided across the processing time, as indicated by the increase in thresholds for both global and trajectory motion at shorter stimulus durations. Because global and local information could be encoded simultaneously, these researchers suggest that the global percept is produced by summing local information over time. The difference would then lie at the processing level: local and global information are processed over different spatial extents rather than input at different spatial frequencies.
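Summing local motion into a global estimate can be illustrated with vector summation: each local direction sample is treated as a unit vector, and the direction of the resultant is the global direction. Vector summation is an assumed pooling rule chosen for illustration; the sketch does not claim that this is the specific combination rule Watamaniuk and McKee (1998) proposed.

```python
import math


def global_direction_deg(local_directions_deg: list[float]) -> float:
    """Global flow direction as the direction of the summed local motion vectors.

    Samples may come from many dots, from many frames, or both; summing them
    accumulates local information over space and time into one estimate.
    """
    sx = sum(math.cos(math.radians(d)) for d in local_directions_deg)
    sy = sum(math.sin(math.radians(d)) for d in local_directions_deg)
    if math.hypot(sx, sy) < 1e-12:
        raise ValueError("local directions cancel; no net global direction")
    return math.degrees(math.atan2(sy, sx)) % 360.0


# Local samples scattered around upward (90 degrees), pooled over two frames.
frame_1 = [80.0, 95.0, 100.0]
frame_2 = [85.0, 90.0, 90.0]
print(round(global_direction_deg(frame_1 + frame_2), 1))
```

Summing vectors rather than averaging angles matters for circular quantities: samples at 350 and 10 degrees pool to a direction at (or within rounding of) 0 degrees, whereas a plain arithmetic mean of the angles would give 180.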
Rather than examining attentional demands, Dobkins and Bosworth (2001) address attention as a mechanism of selection. Returning to Pashler's discussion of early and late selection, the debate is over the point at which the visual system becomes capacity limited. Early selection models suggest that attention selects what will be processed, and late selection models suggest that everything is processed but attention determines what is brought into awareness. Dobkins and Bosworth examined the effects of cueing on motion processing. They asked subjects to judge the motion direction of stimuli presented in conditions where subjects either did or did not receive a cue telling them which portion of the display to attend to. Detectability thresholds were determined for each subject in pre-cued and non-pre-cued conditions, and in single and multiple display conditions. The results suggested that observers were able to process multiple stimuli at one time and that the benefit of the pre-cue may be to eliminate the time required to orient attention; the pre-cue benefit was not significant at longer stimulus durations. The uncued noise distractors actually seemed to enhance processing of the motion stimulus, suggesting some pre-attentive processing of the noise distractors. There was a small benefit of the pre-cue in the single display condition, suggesting that the pre-cue served to decrease the time needed for the observer to orient attention to the proper location in the display. The authors conclude that attention serves both to enhance processing and to reduce the cognitive resources allocated to distractors, and they agree with Pashler (1998) that a controlled parallel process could explain the results. That is to say, we can process many things at once, but we can also willfully select objects to receive processing while ignoring items that are irrelevant.
These findings also agree with fMRI research showing that the attentional load of a task modulates the resources available for processing information that is not relevant to the task (Rees & Lavie, 2001).
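The orienting-time account of Dobkins and Bosworth's pre-cue benefit can be illustrated with a toy threshold model. Every ingredient here is a hypothetical assumption: the orienting time, the scale constant, the idea that uncued observers lose exactly that much of the stimulus, and the use of a square-root law (as for averaging independent noisy samples) to link usable viewing time to threshold.

```python
import math


def threshold(duration_ms: float, cued: bool, orient_ms: float = 100.0,
              k: float = 10.0) -> float:
    """Toy motion threshold under a pure orienting-time account of cueing.

    Without a pre-cue, the first `orient_ms` of the stimulus are spent moving
    attention to the relevant location and contribute nothing; with a pre-cue,
    the whole duration is usable. Threshold falls as k / sqrt(usable time).
    """
    usable = duration_ms if cued else duration_ms - orient_ms
    if usable <= 0:
        return math.inf  # the stimulus is gone before attention arrives
    return k / math.sqrt(usable)


for duration in (150.0, 300.0, 1000.0):
    benefit = threshold(duration, cued=False) / threshold(duration, cued=True)
    print(f"{duration:6.0f} ms  uncued/cued threshold ratio = {benefit:.2f}")
```

Under these assumptions the ratio equals sqrt(duration / (duration - orient_ms)), so the cue's advantage is large for brief stimuli and shrinks toward nothing as the stimulus lengthens, which is the pattern that would make a pre-cue benefit non-significant at longer durations.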
Bibliography
Adelson, E. (1991). Mechanisms for Motion Perception. Optics and Photonics News, 24-30.
Dobkins, K. R., & Bosworth, R. G. (2001). Effects of set-size and selective spatial attention on motion processing. Vision Research, 41, 1501-1517.
Eckstein, M. (1998). The lower visual search efficiency for conjunctions is due to noise and not serial attentional processing. Psychological Science, 9, 111-118.
Mather, G. (1998). Introduction to Motion Perception. www.biols.susx.ac.uk/home/George_Mather/Motion.
Nakayama, K., & Silverman, G. H. (1986). Serial and parallel processing of visual feature conjunctions. Nature, 320, 264-265.
Pashler, H. E. (1998). The Psychology of Attention. Cambridge, MA: The MIT Press.
Rees, G., & Lavie, N. (2001). What can functional imaging reveal about the role of attention in visual awareness? Neuropsychologia, 39, 1343-1353.
Rosenholtz, R. (2001). Search asymmetries? What search asymmetries? Perception and Psychophysics, 63, 476-489.
Thornton, T., & Gilden, D. L. (2001). Attentional Limitations in the Sensing of Motion Direction. Cognitive Psychology, 43, 23-52.
Treisman, A. (1986). Features and objects in visual processing. Scientific American, 97-110.
Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Watamaniuk, S. N. J., & McKee, S. P. (1995). Seeing Motion behind occluders. Nature, 377, 729-730.
Watamaniuk, S. N. J., & McKee, S. P. (1998). Simultaneous encoding of direction at a local and global scale. Perception and Psychophysics, 60(2), 191-200.
Watamaniuk, S. N. J., McKee, S. P., & Grzywacz, N. M. (1995). Detecting a Trajectory Embedded in Random-direction Motion Noise. Vision Research, 35(1), 65-77.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1(2), 202-238.