
Footnotes to Pylyshyn

In a footnote of his ‘Origins of Objectivity’ Tyler Burge criticised Pylyshyn’s theory of vision. Burge argued that although Pylyshyn’s experimental work was very important, Pylyshyn was to some degree misinterpreting his own experiments. Since Burge’s book was published, Fodor and Pylyshyn[1] have written a book, ‘Minds without Meanings’, in which they further develop their direct reference theory. In this blog-post I will consider their new book in light of the criticisms made by Burge and evaluate to what degree F and P’s claims stand up to critical scrutiny.

In some respects the debate between F and P and Burge is an updated and more tractable version of the Quine/Wittgenstein point that ostensive definition requires a lot of stage setting. A lot of philosophers and psychologists think that the stage setting can be provided by innate constraints on interpretation, while others think that culture sets the stage in some ways. At a more fundamental level, the level of perception, F and P seem to think that FINSTs can pick out objects independently of any stage setting; hence they argue that FINSTs do the job by locking on to distal objects. This provides their foundation as they move out towards more complex conceptual capacities. They present a variety of pieces of experimental evidence to support their claim. They note the following:

“One of the main characteristics of visual perception that led Pylyshyn (1989, 2001) to postulate FINSTs is that vision appears not only to pick out several individual objects automatically, but also to keep track of them as they move about unpredictably by using only spatio temporal information and ignoring visible properties of individual objects… Pylyshyn and his students demonstrated in hundreds of experiments (described in Pylyshyn 2001, 2003, 2007 and elsewhere), that observers could keep track of up to four or five moving objects without encoding any of their distinguishing properties (including their motion and the speed or direction of their movement).” (‘Minds Without Meanings’ pp. 100-102)

They note that in these various object tracking studies some factors do somewhat affect subject performance. The first is the distance between the objects; the second is the amount of time the objects stay close to each other (ibid p. 105).

Tracking continues unimpaired even when objects disappear behind a screen. As far as I can see, this fact seems to support a view on which object files are subject to constraints like object permanence. But F and P don’t draw this obvious conclusion and seem to think that object files are empty and are little more than indexes.

In his ‘Origins of Objectivity’ Tyler Burge criticises F and P’s interpretation of the experimental data on perception:

“In the psychological literature, some authors have taken the indexes in tracking multiple moving objects to represent ‘visual objects’, or two-dimensional ‘visual patterns’ that are ‘reliably associated’ with physical objects, or ‘proximal counterparts of real physical objects’, or ‘proximal features that are precursors in detection of real physical objects’. The representata are taken not to be physical bodies or any other environmental entities. I believe that this way of thinking is confused and deeply mistaken about what is being studied. The experiments apply to visual systems that can represent three-dimensionally shaped bodies in three-dimensional space. The representational content of the perceptions is explained, under perceptual anti-individualism and in scientific practice, by reference to the perceptual system’s evolutionary relations to bodies and other environmental entities, not merely to proximal counterparts of bodies.” (Burge ‘Origins of Objectivity’ pp. 453-454)

Above we can see a paradigm statement of anti-individualism by Burge. Despite being critical of Millikan and her notion of proper function, Burge largely agrees that we need to understand perception in light of function. He adds the further proviso that perception isn’t to be explicated in terms of proper function alone; understanding veridicality conditions is the key to understanding perception. Burge to some degree relies on the work of Pylyshyn, who has done dozens of experiments on the perceptual tracking of multiple objects. However, Burge doesn’t agree with Pylyshyn’s interpretation of the experiments. He has criticised Pylyshyn for being ambiguous as to whether the objects of indexical-like representations are proximal or distal. Burge thinks that the distal interpretation is the correct one, and he thinks that ambiguity about this distinction has led Pylyshyn into making incorrect interpretations of his experiments. Pylyshyn has argued that indexes don’t have to be accompanied by the encoding of any property, and Burge strongly disagrees with this interpretation. Burge makes the following point:

“One cannot perceive a particular without perceiving it by way of some general, repeatable grouping capacity to attribute properties, relations, kinds veridically” (ibid p. 455)

Burge’s criticism seems, on the face of it, to be intuitively correct. Pylyshyn offers no realistic story as to how we pick out particulars beyond speaking of locking on to causal patterns in the environment. It is worth discussing F and P’s arguments in a bit more detail to see if they can handle Burge’s criticisms. F and P explicate their point as follows:

“So the empiricists were right that there is a robust sense in which theories of perception are at the heart of theories of mind world semantic relations; the relevant causal relation between a symbol and its referent is relatively direct; perceptual processes are by and large ‘data driven’ (or to further the computer analogy rather precisely, these processes are “interrupt driven” rather than being initiated by test operations to inputs…) Causal interactions with things in the world give rise to sensory representations, and sensory representation give rise to perceptual beliefs.” (Fodor and Pylyshyn ‘Minds Without Meanings’ pp. 87-88)

Burge strongly disagrees with F and P on this point and has argued that the computer analogy of something being “interrupt driven” fails. He notes that there is no need to regard an “interruption” as referring to anything. Burge claims that if “interruptions” are supposed to refer to distal particulars, it is as unclear how they do so as it is how indexes (or context-bound singular applications) can do so, in the absence of perceptually representing the object as being of a certain sort or as having certain properties (Burge ‘Origins of Objectivity’ p. 455).

Burge’s claim of course directly contradicts F and P’s argument in chapter 4 of their ‘Minds Without Meanings’:

“In short, we think the causal chains that support the reference of mental representations to things-in-the-world are of two distinguishable kinds: the first kind connects distal objects that are within the perceptual circle to perceptual beliefs; the second kind connects distal objects that are outside the perceptual circle to mental representations via causal relations of the first kind” (‘Minds Without Meanings’ p. 88)

The main point of disagreement between F and P and Burge is whether reference within our perceptual circle occurs directly, without picking out properties, relations etc. Pylyshyn takes his experimental work to show that such direct reference within the perceptual circle occurs. Pylyshyn’s experimental work centres on what is known as a visual index (also known as a FINST, for ‘fingers of instantiation’). Pylyshyn parses FINSTs as mental representations which are similar to demonstratives like ‘this’ and ‘that’, and he notes that they also resemble proper names, computational pointers, and deictic terms (ibid p. 91). People can track objects extremely fast, without making mistakes, provided the cardinality is no greater than four. This ability is not influenced by shape[2], or colour, or whether the observers are pre-cued as to where the objects will appear (ibid p. 93). When the cardinality is greater than four, subjects require shape, colour and location to track the objects. Furthermore, it takes much longer for a subject to track cardinalities greater than four. F and P outline a picture where three distal objects are registered by the eye, and this results in three separate object files being created which represent the objects and are used to track them. This grabbing process involves bare particulars being picked out, though later on things like properties, colour, etc. can be added to the object files. F and P stress that the property assignment occurs after the indexes have been assigned (if at all).
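Since F and P themselves compare FINSTs to computational pointers, the picture just described can be put as a toy data structure. The following is only a sketch under my own assumptions, not anything F and P provide: the names `ObjectFile`, `IndexPool`, `grab` and `encode` are hypothetical, the limit of four is taken from the figure quoted above, and using a dictionary key to stand in for the causal link to a distal object is purely a convenience of the illustration.

```python
from dataclasses import dataclass, field

MAX_INDEXES = 4  # assumed limit, following the "no greater than four" figure above


@dataclass
class ObjectFile:
    """Hypothetical object file: a bare index plus optional, later-added properties."""
    index: int
    properties: dict = field(default_factory=dict)


class IndexPool:
    """Toy pool of visual indexes (FINSTs); illustrative only."""

    def __init__(self, limit=MAX_INDEXES):
        self.limit = limit
        # The key stands in for the causal link to a distal object (an assumption).
        self.files = {}

    def grab(self, distal_object):
        """Assign an index without consulting any property of the object."""
        if distal_object in self.files:
            return self.files[distal_object]
        if len(self.files) >= self.limit:
            return None  # no free index left for this object
        new_file = ObjectFile(index=len(self.files))
        self.files[distal_object] = new_file
        return new_file

    def encode(self, distal_object, name, value):
        """Property assignment happens only after an index is in place."""
        self.files[distal_object].properties[name] = value


pool = IndexPool()
files = [pool.grab(token) for token in "abcde"]  # five objects, four indexes
pool.encode("a", "colour", "red")                # a property added afterwards
```

On this sketch the first four objects each receive an empty object file, the fifth receives none, and a colour is attached to the first file only after it has been indexed. Burge’s objection, in these terms, is that nothing in `grab` explains how the index comes to be about one distal object rather than another.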

Philosophers like Andy Clark argue that these objects are picked out by specifying a location. But F and P disagree with this, claiming that experimental data shows that people can distinguish between objects even if the location of the relevant property tokens is the same for the two of them (ibid p. 96). They go on to note:

“The visual system cannot apply property-at-location encoding without first identifying the object to which the properties are ascribed; so it cannot escape individuating objects before it decides which properties belong to which objects” (ibid p. 97)

So on the picture sketched by F and P objects are indexed directly. Of course, while they offer evidence that properties are not necessary to pick out objects, they have offered little to say how exactly the objects are picked out.

Paul Churchland in his 2012 book ‘Plato’s Camera’ raises two main problems with the Indicator Semantics of F and P and these problems are directly relevant to the concerns raised by Burge. Churchland outlines the first problem as follows:

“Specifically, each perceptual concept, C, is said to acquire or enjoy its semantic content independently of every other concept in its possessor’s repertoire, and independently of whatever knowledge or beliefs, involving C, its possessor may or may not have. This semantic atomism is a simple consequence of the presumed fact that whether or not C bears the required law like connection, to an external feature, that makes its tokenings a reliable indicator of that feature, is a matter that is independent of whether other concepts its possessor may or may not command, and of whatever beliefs he may or may not embrace. Such semantic atomism is entirely plausible…if our prototypical examples of ‘reliable indication’ are the behaviours…such as the thermometer and the voltmeter… But a problem begins to emerge…when we shift to the typical application of our observation terms…face, cookie, doll and sock. The problem is simple. There are no laws of nature that comprehend these things qua faces, cookies, dolls or socks. All of these features, to be sure have causal effect on the sensory apparatus of humans, but those effects are diffuse, context dependent, high-dimensional, and very hard to distinguish, as a class, from the class of perceptual effects that arise from many other things.” (‘Plato’s Camera’ pp. 95-96)

In ‘Plato’s Camera’ Churchland outlines connectionist models, showing how they can handle the learning of complex facts. One of the jewels in his explication is how connectionist models can learn to recognise faces, as in Cottrell’s mature face network. Churchland notes that the upshot of these connectionist models is that there is an irreducibly holistic factor in learning. So he argues that his multidimensional models show empirically that Fodor and Pylyshyn are wrong in thinking that conceptual atomism is true in any interesting sense. For Churchland the empirical facts indicate that, beyond trivial thermostat-type cases, conceptual atomism is impossible. In other words, he argues that thermostat and voltmeter cases are not really examples of having a concept, and when we move to actual concepts we are stuck with the holism of connectionist models. He offers a slogan: “no representation without at least some comprehension” (ibid p. 97).

Churchland’s second problem with Indicator Semantics is as follows:

“All of this serves to highlight what Indicator Semantics is inclined to suppress and is ill equipped to explain, namely, that one and the same objective domain can be, and typically will be differently conceived or understood by distinct individuals and cultures at the level of our spontaneous perceptual comprehension” (ibid p. 102)

This is the problem we began with: ostensive definitions require a lot of stage setting. Churchland thinks connectionist models provide this stage setting. Clark (2016) thinks Bayesian learning can provide it. Tyler Burge, on the other hand, opts for innate domain-specific constraints to solve the stage-setting problem. The debate between these thinkers to some degree centres on whether their learning theory is sufficient to explain objective reference and conceptual capacities. Because Clark and Churchland do not attempt to deal with the developmental facts of when children begin to perceive objects, how they perceive these objects, and how their conceptual understanding of the world develops, their models remain untested. In this sense Burge’s learning theory has a serious advantage over its rivals in that it at least tries to account for the salient facts. That said, it is at least possible that when Churchland’s and Clark’s approaches are applied to actual developmental data, their models may accurately describe the process of objective reference and the development of conceptual capacities. If this were to happen, their approaches would have an advantage over Burge’s, because they are more easily integrated with neuroscience, while Burge’s approach involves an irreducible psychological aspect.

This state of affairs would appear to leave F and P in an untenable position. Their use of FINSTs is well supported experimentally, though the indexes don’t seem to be as encapsulated and automatic as F and P seem to want us to believe. Furthermore, there seems to be no way we can interpret a FINST as picking out one object rather than any of the many alternatives. F and P, however, argue that direct reference is not the only process at work, and that compositional constraints on the way concepts can come together will rule out the many alternative interpretations of concepts (‘Minds Without Meanings’ pp. 128-131). Unfortunately, they don’t manage to provide any evidence to support this conjecture. Ultimately they are left with a kind of hopeful monster. They think that direct reference is necessary in order to have a naturalistic theory of mind, and they argue that FINSTs provide this direct reference. However, the empirical data indicates that while the FINST does have a certain amount of autonomy, it is certainly not entirely isolated from all comprehension.



Burge, T. 2010. Origins of Objectivity. Oxford: Oxford University Press.

Carey, S. 2009. The Origin of Concepts. Oxford: Oxford University Press.

Churchland, P. M. 2012. Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals. Cambridge MA: The MIT Press.

Clark, A. 2016. Surfing Uncertainty: Prediction, Action and the Embodied Mind. Oxford: Oxford University Press.

Fodor, J. and Pylyshyn, Z. 2015. Minds without Meanings: An Essay on the Content of Concepts. Cambridge MA: The MIT Press.

[1] Henceforth F and P.

[2] Pylyshyn notes that shape can have some effect when the shape is something familiar like a square.