Large Language Models and the Rationalist-Empiricist Debate.

                          Introduction.

From about 1950 there was a resurgence of interest in the rationalist-empiricist debate. Chomsky was viewed as an updated rationalist carrying on the traditions of Leibniz and Descartes, while Quine and Skinner were viewed as updated empiricists carrying on the traditions of Locke and Hume. A consensus emerged that Chomsky’s rationalism won out over the empiricism of Quine and Skinner.

                In recent years, with the rise of deep convolutional neural networks and large language models, some have argued that their architecture is data driven and hence that they are an existence proof that empiricist learning is a viable way of modelling the mind (Buckner, C. 2018). This has prompted a debate in which others argue that these models don’t vindicate any kind of empiricism, as they rely on innate architecture.

                In this paper my focus will be on LLMs, because their linguistic capacities make them directly relevant to the earlier debates between Chomsky, Quine and Skinner. I will evaluate whether LLMs do indeed vindicate a type of empiricism. I will argue that LLMs are too dissimilar to human cognitive systems to be used as a model for human cognition. They make bad models of both human linguistic competence and human linguistic performance. So, I argue that they offer no vindication of empiricism (or rationalism for that matter).

 Chomsky and Quine and the Rationalist-Empiricist Debate.

Famously, the rationalist-empiricist debate between people such as Locke and Descartes, which focused on subjects such as whether humans are born with innate concepts, was revisited in the 1950s. When Noam Chomsky burst onto the scene with his review of Skinner’s book Verbal Behaviour, many viewed this as a revival of the rationalist-empiricist debate. Chomsky entitled his 1966 book ‘Cartesian Linguistics: A Chapter in the History of Rationalist Thought’, thus setting himself up as the heir apparent to Descartes’ rationalism.

                In Skinner’s Verbal Behaviour he divided language up into seven verbal operants, which he argued are controlled by his three-term contingency of antecedent, behaviour and consequence. Chomsky criticized Skinner for taking concepts from the laboratory, where they were well understood from the animal literature, and extending them into areas where there was no such experimental evidence. He charged Skinner with using the technical terms either literally, in which case his views were false, or metaphorically, in which case the technical terms were as vague as ordinary terms from folk psychology.

                In his 1965 book ‘Aspects of the Theory of Syntax’ Chomsky made his famous distinction between competence and performance. Chomsky argued that the only substantive theory of performance which would be possible would come via a theory of underlying competence. He illustrated this point by showing how elements of our underlying grammatical competence could predict and explain aspects of our linguistic performance. Since behaviourism was obviously primarily concerned with behaviour, many viewed Chomsky’s competence-performance distinction as an attack on behaviourism. Many scientists influenced by Chomsky argued that behaviourism’s lack of a competence-performance distinction meant that it couldn’t be taken seriously as a science (Jackendoff 2002, Collins 2007).

                Chomsky’s (1972, 1986) poverty of stimulus argument was deemed a further dent in the behaviourist project. Chomsky used the structure dependence of syntactic movement, such as auxiliary inversion, as an example of syntactic knowledge which a person acquires even though that person could go through much or all of their life without ever encountering evidence for the construction. The argument is that if the child learns the construction despite never encountering evidence for its structure, and the child does not engage in trial-and-error learning in which he tries out incorrect constructions which are systematically corrected by his peers until he arrives at the correct one (Crain and Nakayama 1987, Brown and Hanlon 1970), then knowledge of the construction must be built into the child innately.

                Theorists viewed poverty of stimulus arguments as further evidence that the behaviourist project was doomed to failure. Many contrasted Chomsky’s emphasis on innate knowledge with Skinner’s supposed blank-slate philosophy (Pinker 2002), thereby situating Skinner as a modern-day Locke battling with a modern-day Descartes (Chomsky). The consensus was that modern science had shown that the rationalist position was the correct one.

                       Quine and Chomsky.

                The debate between Chomsky and Skinner was primarily focused on issues in linguistics and psychology. In philosophy the rationalist-empiricist debate played out in a debate between Quine and Chomsky. Quine billed himself as an externalized empiricist whose primary aim was to explain how humans go from stimulus to science in a naturalistic manner. His entire project centred on naturalising both epistemology and metaphysics. On the epistemological side, his need to explain how we go from stimulus to science involved psychological speculations on how we acquire our language and how we develop the ability to refer to objects. Quine was explicit in these speculations that any linguistic theory was bound to be behaviourist in tone, since we acquire our language through intersubjective mouthing of words in public settings. This commitment to behaviourism set Quine at odds with Chomsky.

                In 1969 Chomsky wrote a criticism of Quine called ‘Quine’s Empirical Assumptions’. This criticism made three charges: that Quine’s notion of a pre-linguistic quality space wasn’t sufficient to account for language acquisition; that Quine’s Indeterminacy of Translation Argument was trivial and amounted to nothing more than ordinary underdetermination; and that Quine’s invocation of the notion of the probability of a sentence being spoken was meaningless.

                Skinner never replied to Chomsky[1], arguing that Chomsky so badly misunderstood his position that further dialogue was pointless. But Quine (1969) did reply; in his reply he charged Chomsky with misunderstanding his position and with attacking a strawman. On the issue of a prelinguistic quality space he argued that it was postulated as a necessary condition of acquiring the ability to learn from induction or reinforcement; he never thought it was a sufficient condition of our acquiring language. Quine argued that “the behaviourist was knowingly and cheerfully up to his neck in innate apparatus”. He further argued that indeterminacy of translation was additional to underdetermination and revealed difficulties with linguists’ and philosophers’ uncritical usage of “meanings, ideas and propositions”. And finally, he noted that Chomsky misunderstood his discussion of the probability of a sentence being spoken. Quine wasn’t speaking about the absolute probability of a sentence being spoken; rather, he was concerned with the probability of a sentence being spoken in response to queries in an experimental setting.

                This led to a series of exchanges between Chomsky and Quine. In his (1970) ‘Methodological Reflections on Current Linguistic Theory’, Quine criticized Chomsky’s notion of implicit rule following. Quine noted that there are two senses of rule-following he could make sense of: (1) Being guided by a rule: a person follows a rule they can explicitly state. (2) Fitting a rule: a person’s behaviour can conform to any of an infinite number of extensionally equivalent rules. But Quine charged Chomsky with appealing to a third type of rule: (3) A rule that the person cannot state but is nonetheless implicitly following, where this rule is a particular rule distinct from all the other extensionally equivalent rules that the person’s behaviour conforms to. Chomsky (1975) correctly responded that Quine was again arbitrarily assuming that underdetermination was somehow terminal in linguistics but harmless in physics. As Chomsky’s approach became less about rules and more about parameter switching, Quine’s rule-following critique had less and less traction.

                Chomsky’s critique of Skinner achieved the status of almost a creation myth in cognitive science, with most introductory texts in psychology or cognitive science describing Chomsky’s review of Verbal Behaviour as the death-knell of behaviourism and the birth of cognitive science. Chomsky’s criticism of Quine, by contrast, wasn’t as well known, and it received a more nuanced reading. While a lot of people came to the view that Chomsky won the debate, it didn’t attain the creation myth status that the review had. Nonetheless, it is fair to say that most philosophers accepted Chomsky’s criticisms of Quine as to the point.

                Outside the realm of academic debates, when Skinner is spoken about in the popular press he is referred to as a blank slate theorist (Pinker 2002). Skinner, and to a lesser degree Quine, are placed in the same camp as exemplars of the empiricist tradition and modern-day inheritors of John Locke’s mantle, with Chomsky a self-described exemplar of the Cartesian tradition. And the scientific consensus is that Chomsky’s rationalism has won out over Quine and Skinner’s empiricism.

       Artificial Intelligence and Empiricism.

In recent years, with AI getting more and more sophisticated, philosophers, psychologists, and linguists have begun to explore what these AI systems tell us about the rationalist-empiricist debate. Some theorists argue that empiricist architecture is responsible for the success of recent AI systems (Buckner 2017, Long 2024), while others argue that the architecture, because it needs in-built biases, in fact supports rationalism (Childers et al 2020 p. 87).

                Buckner (2017) argued that deep convolutional neural networks (DCNNs) are useful models of mammalian cognition. He further argued that these DCNNs’ use of “transformational abstraction” vindicated Hume’s empiricist conception of how humans acquire abstract ideas. Childers et al (2020) have hit back at this view and have argued that both LLMs and DCNNs require built-in biases to be successful. They further argue that the need for built-in biases is analogous to the way Quine needed to posit innate knowledge to explain language acquisition, thereby, according to them, undermining the models’ empiricist credentials (Childers et al 2020 p. 72).

                Childers et al’s reading of the rationalist-empiricist debate is extremely idiosyncratic. Their assertion that the postulation of any innate dispositions is an immediate weakening of empiricism is bizarre (Ibid p. 84). This reading of the rationalist-empiricist dispute doesn’t stand up to scrutiny. Hume, an early arch-empiricist, needed innate formation principles in the human mind to account for how we combine the ideas we receive from impressions into complex thoughts (Fodor 2003). And even Chomsky, who is viewed as a paradigm exemplar of the rationalist tradition, argued that innateness wasn’t the issue when it came to the rationalist-empiricist debate:

“The various empiricist and behaviourist approaches mentioned postulate innate principles and structures (cf. Aspects, pp. 47 f.). What is at issue is not whether there are innate principles and structures, but rather what is their character: specifically, are they of the character of empiricist or rationalist hypotheses, as there construed?” (Katz & Chomsky: 1975).

“…Each postulates innate dispositions, inclinations, and natural potentialities. The two approaches differ in what they take them to be…The crucial question is not whether there are innate potentialities or innate structure. No rational person denies this, nor has the question been at issue. The crucial question is whether this structure is of the character of E or R; whether it is of the character of “powers” or “dispositions”; whether it is a passive system of incremental data processing, habit formation, and induction, or an “active” system which is the source of “linguistic competence” as well as other systems of knowledge and belief” (Chomsky 1975 pp. 215-216).

And Watson, Quine and Skinner were consistent about this point throughout their careers: Watson 1924 p. 135, Skinner 1953 p. 90, Skinner 1966 p. 1205, Quine 1969 p. 57, Quine 1973 p. 13, Skinner 1974 p. 43.

            The point of Childers et al’s criticism was that Hume’s empiricism, with its appeal to a few laws of association, needed to be supplanted by Kant’s system, which postulated many more innate priors (Childers et al 2020 p. 87). This may have been a problem for Hume, but it is no difficulty for the likes of Quine, who was an externalized empiricist with no issues whatsoever with innate priors once they could be determined experimentally (Quine 1969 p. 57). When it comes to Artificial Intelligence there is a legitimate debate on whether it is pragmatic to build the systems on rationalist or empiricist principles. But this only has relevance to the rationalist-empiricist debate if it can be demonstrated that artificial intelligence systems learn in the same way as humans do. In the next section I will evaluate how closely AI systems model human cognition. To do this I will focus on LLMs and the degree to which they accurately model human linguistic cognition.

 Large Language Models and Human Linguistic Competence.

Theorists have argued that the similarities between LLMs’ output and human linguistic output make LLMs and the way they learn directly relevant to theoretical linguistics. Thus, Piantadosi (2023) has argued that LLMs refute central claims about language acquisition made by Chomsky et al in the generative grammar tradition. This comparison of LLMs to actual human cognition has been challenged in the literature (Chomsky et al 2023, Kodner et al 2024, Katzir, R 2023). In this section I will consider various disanalogies between LLMs and human linguistic cognition which make any comparison between them problematic. In the final section I will consider what these disanalogies imply about whether work in AI is pertinent to debates about rationalism versus empiricism.

    Poverty of Stimulus Arguments, Artificial Intelligence, & Human Linguistic Capacities.

            A clear disanalogy between human linguistic abilities and LLMs is that humans acquire their language despite a poverty of stimulus, while LLMs learn because of a richness of stimulus (Kodner et al 2023, Chomsky et al 2023, Long, R 2024, Marcus, G. 2020). To see the importance of this distinction, a brief discussion of the role that Poverty of Stimulus Arguments have played in linguistics is necessary. With this in place we can return to the stimulus which LLMs are trained on.

            Chomsky (1965) noted that people acquire syntactic knowledge despite a poverty of stimulus. Humans are exposed to limited, fragmentary data and still manage to arrive at a steady state of linguistic competence, including knowledge of syntactic rules for which they may never have encountered evidence in their primary linguistic data. Chomsky used auxiliary inversion as his paradigm example of a poverty of stimulus (Chomsky 1965, 1968, 1971, 1972, 1975, 1986, 1988[2]). Pullum and Scholz (2002) reconstructed Chomsky’s Poverty of Stimulus Argument as follows:

  1. Humans learn their language either through data driven learning or innately primed learning.
  2. If humans acquire their first language through data driven learning, then they can never acquire anything for which they lack crucial evidence.
  3. But infants do indeed learn things for which they lack crucial evidence.
  4. Thus, humans do not learn their first language by means of data-driven learning.
  5. Conclusion: humans learn their first language by means of innately primed learning (Pullum and Scholz 2002).
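
The logical skeleton of this reconstruction can be checked mechanically. What follows is a minimal Lean 4 sketch; the proposition names are labels introduced here for illustration and are not Pullum and Scholz’s own notation. It shows only that the conclusion follows from premises one to three, which is why the dispute turns on whether the premises, and the third in particular, are true.

```lean
-- Hypothetical labels, introduced only for this sketch:
--   DataDriven        : humans learn their first language by data-driven learning
--   InnatePrimed      : humans learn their first language by innately primed learning
--   LearnsUnevidenced : infants learn things for which they lack crucial evidence
theorem poverty_of_stimulus
    (DataDriven InnatePrimed LearnsUnevidenced : Prop)
    (p1 : DataDriven ∨ InnatePrimed)          -- premise 1
    (p2 : DataDriven → ¬ LearnsUnevidenced)   -- premise 2
    (p3 : LearnsUnevidenced) :                -- premise 3
    InnatePrimed :=
  -- Case split on premise 1: the data-driven case contradicts premises 2 and 3
  -- (this is step 4), so only the innately primed case remains (step 5).
  p1.elim (fun hData => absurd p3 (p2 hData)) (fun hInnate => hInnate)
```

Because the argument is valid, anyone who rejects the conclusion has to reject a premise, and the empirical work discussed next targets premise three.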

Pullum & Scholz (2002) isolated premise three as the key premise in the argument. They sought empirical evidence for the number of times constructions providing evidence for auxiliary inversion occur in a sample of written material. The material they chose to examine was Wall Street Journal back issues. They also estimated the amount of linguistic data a person is on average exposed to. To do this they relied on Hart and Risley (1997) ‘Meaningful Differences in the Everyday Experiences of Young Children’. They estimated that the average child from a middle-class background will have been exposed to about 30 million word tokens by the age of three. Pullum and Scholz argue that the child will have been exposed to about 7500 relevant examples in three years, which amounts to about 7 relevant questions per day. But a primary criticism of their work was that the Wall Street Journal wasn’t representative of the type of data that a child would be exposed to. Sampson (2002) searched the British National Corpus and argued that the child would be exposed to about 1 relevant example every 10 days.
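
The arithmetic behind these exposure rates is easy to check. The following is a rough Python sketch that uses only the figures quoted above; the function names and the 365-day year are assumptions made for illustration, not part of either study.

```python
# Back-of-envelope check of the exposure figures quoted in the text.
# Assumption: a 365-day year, ignoring leap days.
DAYS_PER_YEAR = 365
YEARS = 3


def examples_per_day(total_examples: float, years: int = YEARS) -> float:
    """Average number of relevant examples heard per day over the period."""
    return total_examples / (years * DAYS_PER_YEAR)


def total_examples_at_rate(days_between_examples: float, years: int = YEARS) -> float:
    """Total relevant examples heard if one turns up every N days."""
    return (years * DAYS_PER_YEAR) / days_between_examples


# Pullum and Scholz: about 7500 relevant examples in three years.
pullum_scholz_rate = examples_per_day(7500)

# Sampson: about 1 relevant example every 10 days.
sampson_total = total_examples_at_rate(10)

print(f"Pullum and Scholz: {pullum_scholz_rate:.1f} examples per day")
print(f"Sampson: {sampson_total:.0f} examples in three years")
```

On these figures the two estimates differ by a factor of roughly seventy, which is why the choice of corpus mattered so much to the debate.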

But the next question was whether a child would be able to learn the relevant construction from 1 example every 10 days (Lappin & Clark 2011). Reali and Christiansen (2005) and Perfors, Tenenbaum & Regier (2006) have constructed mathematical models demonstrating that children are capable of learning from the above amounts of data. However, Berwick & Chomsky et al. (2011) in their ‘Poverty of Stimulus Revisited’ have hit back, arguing that auxiliary inversion is meant as an expository example to illustrate the Argument from the Poverty of the Stimulus (APS) to the general public, and that there are much deeper syntactic properties which children could not learn from the primary linguistic data (PLD). The debate still rages on, but it is still a consensus in generative grammar that the poverty of stimulus is a real phenomenon which humans need domain specific innate knowledge to overcome.

As our discussion above indicates, there has been some push back against Poverty of Stimulus Arguments; however, acceptance of them is still the default position in linguistics. Furthermore, even those who push back against the APS would gleefully admit that the linguistic data an LLM is trained on is not analogous to the Primary Linguistic Data of the average child. Children are exposed to 10 million tokens a year, while LLMs are exposed to around 300 billion tokens, and this number is increasing exponentially (Kodner et al 2024). So, while the output of an LLM and a human may be roughly analogous, the linguistic input they receive is in no way analogous.
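
To give a sense of the scale of this gap, here is a small Python sketch using only the two figures quoted above. The variable and function names are illustrative assumptions, and the calculation treats a child’s yearly exposure as constant.

```python
# Scale comparison between child input and LLM training data,
# using the two figures quoted in the text.
CHILD_TOKENS_PER_YEAR = 10_000_000      # about 10 million tokens a year
LLM_TRAINING_TOKENS = 300_000_000_000   # about 300 billion tokens


def years_of_child_input(llm_tokens: int, child_tokens_per_year: int) -> float:
    """Years a child would need to hear as many tokens as the LLM is trained on."""
    return llm_tokens / child_tokens_per_year


years_needed = years_of_child_input(LLM_TRAINING_TOKENS, CHILD_TOKENS_PER_YEAR)

# Ratio of LLM training data to what a child has heard by the age of three.
ratio_at_age_three = LLM_TRAINING_TOKENS / (3 * CHILD_TOKENS_PER_YEAR)

print(f"{years_needed:,.0f} years of child-level input")
print(f"{ratio_at_age_three:,.0f} times a three-year-old's exposure")
```

On these numbers a child would need about 30,000 years of listening to match the training data, and the LLM has seen about 10,000 times what a three-year-old has heard.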

 Competence & Performance in LLMs & Humans.

The divergence in the linguistic data which LLMs and humans are trained on is a clear indicator that they work off different underlying competencies. Other differences emerge in the materials they use: in computational terms, chips are faster than neurons (Long 2024). Even to the degree that their outputs are similar, this doesn’t prove that they rely on the same underlying competencies (Firestone 2020, Kodner et al 2024, Milliere & Buckner 2024). Kodner et al give the example of two watches, both of which keep time accurately, but one of which is digital and the other mechanical. Despite similar performances, they achieve them through different underlying competencies (Kodner et al 2024).

   But Are Human and LLM Outputs Analogous?

The question of whether human and LLM outputs are analogous is obviously vital if we want to understand whether they operate using the same underlying competencies. We have already seen that the two systems seem to learn differently, one despite the poverty of stimulus and one because of the richness of stimulus. This points towards different underlying innate competencies. Different underlying competencies aside, this section will demonstrate that the performances of each system are also very different.

          At a superficial level LLM and human outputs appear very similar. Chat GPT-4 can to some degree fool a competent reader into thinking that a human produced the outputted sentences. Clark et al (2021) took human-created stories, news articles, and texts and got an LLM to create similarly sized texts. 130 participants were tested, and they couldn’t tell the human texts apart from the LLM texts at a rate greater than chance (Schwitzgebel et al 2023). Schwitzgebel et al (2023) created a Large Language Model which was able to simulate Daniel Dennett’s writing style. Experts were able to distinguish its outputs from Dennett’s own only at rates barely above chance, and it was surprising how close run the thing was given that the people being probed were scholars who were experts on Dennett.

While an LLM can construct sentences which appear to be analogous to ordinary human sentences, there are obviously a lot of disanalogies. LLMs can reliably produce sentences which are syntactically sound and semantically interpretable, but the words an LLM uses have no meaning to the LLM, only to the humans who interpret its output. The reason that they have no meaning is that they are not grounded in sensory experience for the LLM, whereas for humans they obviously are (Harnad 2024). The LLM, unlike the human, isn’t talking about any state of affairs in the world; rather, it is merely grouping together tokens according to how the tokens are fed into it in its training data.

When criticisms are made that LLMs’ outputs don’t have meanings, we need to be careful how we parse these statements. Obviously, the outputs have meaning in the sense that LLMs can distinguish between two sentences which are syntactically identical but which don’t have the same meaning. But the sentences do not have meaning in the sense of a referential relation between the words and a mind independent reality. However, given that the idea of explicating meaning in terms of a word-world relation has been questioned (Chomsky 2000, Quine 1973), it is difficult to know what to make of the claim that LLMs don’t have meanings because their words don’t refer to mind independent objects.

Bender & Koller (2020) used a thought experiment to illustrate why they believe that LLMs do not mean anything when they respond to queries. The thought experiment imagines two people trapped on different islands who communicate with each other in code through a wire stretched between the islands along the ocean floor. An octopus who is a statistical genius taps into the wire and, through pattern recognition, is able to communicate with the people on the islands. But though he is able to figure out what code to use, and when, from the context in which the code is used and the patterns in which its elements are grouped together, he has no understanding of what is being said. Bender & Koller argue that if a person on one of the islands asked the octopus how to build a catapult out of coconut and wood, he wouldn’t know how to answer, because he has no real-world knowledge gained from interacting with the world and is instead merely grouping brute statistical patterns together.

Piantadosi & Hill (2023), in their “Meaning Without Reference in Large Language Models”, argue that thought experiments such as the octopus one fail because they make the unwarranted assumption that meaning can be explicated in terms of reference. They argue that meaning cannot be explicated in terms of reference for the following reasons:

  • There are many terms which are meaningful to us, but which have no clear reference e.g. Justice.
  • We can think of concepts of non-existent objects. These have meanings but don’t refer to anything in the mind-independent world.
  • We have concepts of impossible objects such as a round square or a perpetual motion machine.
  • We have concepts which pick out nobody, but which are meaningful: e.g. the present King of Ireland.
  • We have concepts which have meaning but which don’t refer to concrete particulars e.g. concepts of abstract objects.
  • We have terms which have different meanings but the same reference e.g. the morning star and the evening star.

They go on to argue that conceptual role theory, in which meaning is determined in terms of entire structured domains (like Quine’s web-of-belief), plays a large role in our overall theory of the world. But they do nonetheless acknowledge that reference plays some role in grounding our concepts, just not as large a role as some theorists criticizing LLMs believe. They are surely right that as our theory of the world becomes more and more sophisticated it will, as Quine noted, face empirical checkpoints only at the periphery. Nonetheless, when humans are acquiring their language in childhood they must go through a period where they learn to use the right word in the presence of the right object, and somehow learn to triangulate with their peers in using the same word to pick out a common object in their environment.

As we saw above, Piantadosi & Hill (2023) raised concerns about crude referentialist theories of meaning and their relation to LLMs, but they did acknowledge that there are word-world connections between some words and objects in the mind independent world. Quine famously tried to connect our sentences to the world through his notion of an observation sentence. He argued that an observation sentence was a sentence which members of a verbal community would immediately assent to in the presence of the relevant non-verbal stimuli.

But Quine immediately ran into a difficulty with this approach. The difficulty stemmed from the fact that Quine found it hard to say why different members of a speech community assented to an observation sentence. Quine tried to cash out the meaning in terms of stimulus meaning where a particular observation sentence was associated with a particular pattern of sensory receptors being triggered.  But this made intersubjective assent on observation sentences difficult to explain given that different subjects would obviously have different patterns of sensory receptors triggered in various ways in response to the same observation sentences. What pattern of sensory receptors was triggered by what observation sentences would be largely the result of each subject’s long forgotten learning history. All of this made it difficult to see how Quine could make sense of a community assenting to an observation sentence being used in various circumstances.

Quine eventually made sense of intersubjective assenting to an observation sentence by appealing to the theory of evolution. He argued that there was a pre-established harmony between our subjective standards of perceptual similarity and trends in the environment. And that humans as a species were shaped by natural selection to ensure that they shared perceptual similarity standards. This fact was what made it possible for humans to share assent and dissent to observation sentences being used in certain circumstances.

For Quine, shared perceptual similarity standards and reinforcement for using certain sounds in certain circumstances gave observation sentences empirical content. But to achieve actual reference, Quine argues, we need to add things like quantifiers, pronouns, demonstratives etc. The key point, though, is that observation sentences link with the world (even if in a manner less tight than objective reference) because our shared perceptual similarity standards match objective trends in our environment. This is what makes intersubjective communication possible. LLMs are not responsive to the environment in anything like the way humans are when they are acquiring their first words. Even prior to learning the referential capacities of language, children using observation sentences are still in contact with the world. To this degree, then, Piantadosi & Hill’s concerns about reference are beside the point. As humans begin to acquire their first words, they do so through observation sentences which are connected to our environment. The fact that, once we have acquired a language complete with words and productive syntax, we can speak about theoretical items, fictional items, impossible objects etc is interesting but doesn’t speak to the LLM issue. The fact is that as humans first acquire their words they do so in response to their shared sensory environment. LLMs do not learn in this way at all. Their training is entirely the result of exposure to textual examples which they group into tokens based on the statistical likelihood of textual data occurring together. So, while humans eventually learn to speak about things which are not sensorially experienced, their first words are keyed to sensory experience, and this is a key difference between them and LLMs. In the next section I will consider some objections to this view, but first I want to recap what we have discussed so far.

                     Interlude: Brief Recap.

Thus far we have considered the rationalist-empiricist debate as it played out in debates between Skinner, Quine and Chomsky, noting that the consensus is that Chomsky’s rationalism won the day over the empiricism of Quine and Skinner. In recent years, with AI getting more and more sophisticated, there has been a resurgence of interest in AI and its relation to the rationalist-empiricist debate. Some theorists argue that modern AI indicates that empiricist theories of cognition are, contrary to what was previously believed, being realized by some current AI. I discussed whether this was the case, relating it to earlier stages of the rationalist-empiricist debate, and argued that, contrary to Childers et al, an AI can learn in an empiricist manner even if it does so via built-in constraints. I then questioned whether the fact that some AI learns in a particular manner has much bearing on human cognition. To sharpen the issue, I narrowed the question down to whether, if LLMs learn in an empiricist manner, this would tell us much about the rationalist-empiricist debate in humans.

          To decide on this issue, I considered whether LLMs and humans learn in the same manner, concluding like many others that they learn in entirely different ways. Humans learn despite the poverty of stimulus, while LLMs learn because of the richness of their stimulus. I argued that the different manners in which they learn indicate that there are probably different competencies underlying their respective performances.

     AI and the Relevance of Rationalism and Empiricism.

Above we discussed some disanalogies between LLMs and human linguistic capacities. Two main differences were noted: (1) Differences in the stimulus needed for the respective agents to acquire language, which indicate different underlying competencies. (2) LLMs’ language is not grounded, while human language is grounded. This second difference is a result of the different ways in which they acquire their language: humans begin by being responsive to the world in a triangular relation with others, while the LLM acquires its language through statistical grouping of the text it is trained on.

So, given that human and LLM outputs are the result of exposure to different quantities and types of data, which indicates different underlying competencies, a question arises as to the relevance of AI to understanding human linguistic capacities. Earlier we discussed the debate between Chomsky and Skinner & Quine and its relation to current debates on the nature of LLMs. But given the disanalogies we have noted between LLMs and humans, it is questionable whether LLMs have anything to tell us about the rationalist-empiricist debate at all.

The debate between the rationalists and empiricists was never centred on whether innate apparatus was necessary for a creature to learn a particular competency. All sides of the debate agreed that some innate apparatus was necessary to explain particular competencies, and that the degree of innate apparatus which needs to be postulated is to be determined empirically. The empiricist position was that humans learn primarily through data driven learning (supported by innate architecture), while the rationalist position was that humans learn through innate domain specific competencies being triggered by environmental input.

However, given the differences between the linguistic competencies of LLMs and humans, their relevance to each other in the rationalist-empiricist debate is in doubt. Even if we can conclusively demonstrate that an LLM or a DCNN learns in an empiricist manner, this will not tell us anything about whether humans learn in an empiricist manner. Because human competencies are so different from those of an LLM, it is simply irrelevant to the rationalist-empiricist debate whether LLMs learn in an empiricist manner or not.

It is theoretically possible that a philosopher could argue, a la Kant, that any form of cognition will a priori need to implement innate domain-specific machinery to arrive at its steady state. And one could offer empirical data to support this a priori claim by showing that very different forms of cognition, e.g. humans and LLMs, both learn by implementing innate domain-specific architecture. But this isn’t how the debate has played out in the literature. Typically, it is argued either that LLMs are largely empiricist, and that this fact vindicates empiricism in general, or that they are largely rationalist, and that this fact vindicates rationalism. I have argued here that, given the very different natures of LLMs and humans, it is irrelevant to the question of how humans acquire their knowledge whether LLMs are rationalist or empiricist.

But this isn’t to say that it is unimportant whether LLMs or other forms of AI learn in a rationalist or an empiricist manner. There are still practical issues in engineering as to whether one is more likely to be successful in building things like Artificial General Intelligence using empiricist architecture or not. Thus, people like Marcus (2020) argue that while we may be able to build AGI using empiricist principles, we will not be able to build AGI unless we build substantial innate domain-specific knowledge into the system. Marcus even argues that, to understand what innate architecture needs to be built into our AI models, we should look to our best example of an organism with general intelligence, i.e. humans.

The engineering question is extremely interesting from a practical point of view, and could motivate an interest in whether LLMs or other types of AI learn in a rationalist or an empiricist manner. But when it comes to LLMs, the question of whether they learn in an empiricist or a rationalist manner is largely irrelevant to the rationalist-empiricist question in relation to humans.

                            Bibliography

Berwick, R, & Pietroski, P, & Chomsky, N. 2011 “Poverty of the Stimulus Revisited.” Cognitive Science. Vol 35. Issue 7 pp. 1207-1242.

Bender & Koller. (2020) “Climbing Towards NLU: On Meaning, Form and Understanding in the Age of Data.” Proceedings of the 58th annual meeting of the Association for Computational Linguistics.

Brown, R. & Hanlon, C. 1970. “Derivational complexity and order of acquisition in child speech.” In Hayes, J.R. (eds). Cognition and the Development of Language. New York Wiley.

Buckner, C. 2018. “Empiricism without Magic: Transformational Abstraction in Deep Convolutional Neural Networks.” Synthese. Vol 195 pp. 5339-5372.

Childers, T, & Hvorecky, J, & Majer, O. 2023. “Empiricism and the foundations of Cognition”. AI and Society. Vol 38 pp. 67-87.

Chomsky, N. 1959. “A review of B.F. Skinner’s Verbal Behaviour”. Language, 35. pp. 26-57.

Chomsky, N. 1965. Aspects of the Theory of Syntax. MA: MIT Press.

Chomsky, N. 1966. Cartesian Linguistics. New York: Harper & Row.

Chomsky, N. 1969. “Quine’s Empirical Assumptions”. Synthese 19 pp. 53-68.

Chomsky, N. 1975. Reflections on Language. New York: Random House.

Chomsky, N. 1986. Knowledge of Language: Its Nature, Origin and Use. New York, New York: Praeger.

Chomsky, N. 2000. New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.

Chomsky, N, Roberts, I, Watumull, J. 2023. “The False Promise of ChatGPT.” The New York Times.

Chomsky, N & Katz, J. 1975. “On Innateness: A Reply to Cooper.” Philosophical Review. 84 pp. 70-84.

Clark, L & Lappin, S. 2011. Linguistic Nativism and the Poverty of the Stimulus. Wiley

Collins, J. 2007. “Linguistic Competence Without Knowledge of Language”. Philosophy Compass 2 (6) pp. 880-895

Crain, S & M, Nakayama. 1987. “Structure Dependence in Grammar Formation”. Language, Vol 63 pp. 522-543.

Firestone, C. (2022) “Competence and Performance in Human Machine Comparisons.” Proceedings of the National Academy of Sciences. 117. 43. Pp. 25662-26571.

Fodor, J. 2003. Hume Variations. Oxford University Press.

Hart, B & Risley, T, & Kirby, J. 1997. “Meaningful differences in the everyday experiences of the young American.” Canadian Journal of Education. Vol 22. Issue 3.

Harnad, S. 2024. “Language Writ Large: LLMs, ChatGPT, Grounding, Meaning, and Understanding.”

Jackendoff, R. 2002. Foundations of Language. Great Clarendon Street. Oxford University Press.

Katzir, R. 2023. “Are Large Language Models Poor Theories of Linguistic Cognition: A Reply to Piantadosi”.

Kodner, J, & Payne, S, & Heinz, J. 2023. “Why Linguistics will thrive in the 21st Century: a reply to Piantadosi”.

Long, R. 2024. “Nativism and Empiricism in Artificial Intelligence”. Philosophical Studies. Vol 181 pp 763-788.

Marcus, G. 2020. “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence”

Mollo, D, & Milliere, R. “The Vector Grounding Problem”.

Milliere, R & Buckner C. 2024. “A Philosophical Introduction to Language Models: Part 2”.

Piantadosi, S. 2023 “Modern Language Models Refute Chomsky’s Approach to Language.”

Piantadosi, S, & Hill, F. 2023. “Meaning without Reference in Large Language Models”.

Pinker, S. 2002. The Blank Slate: The Modern Denial of Human Nature. New York Viking.

Pullum, G, K. & Scholz, B. 2002 “Empirical assessment of stimulus poverty argument.” The Linguistic Review. 19 pp. 9-50.

Quine, W. 1968. “Reply to Chomsky”. Synthese 19 pp. 274-283.

Quine, W. 1970. “Methodological Reflections on Current Linguistic Theory”. Synthese 19, pp. 264-321.

Quine, W. 1974. The Roots of Reference. La Salle. Open Court Press.

Sampson, G. 2002. “Exploring the Richness of the Stimulus”. The Linguistic Review. 19 pp. 73-104.

Schwitzgebel, E, & Schwitzgebel, D, & Strasser, A. 2024. “Creating a Language Model of a Philosopher”. Mind & Language. 32 (2) 237-259.

Skinner, B.F. 1957. Verbal Behaviour. New Jersey: Prentice-Hall Inc.  


[1] For an interesting reply on Skinner’s behalf see Kenneth McCorquodale.

[2] In the following I am following Pullum and Scholz (2002), ‘Empirical Assessment of Stimulus Poverty Arguments’.

Aspects of the Theory of Syntax and Behaviourism

                          Introduction

In this paper the competence-performance distinction first proposed by Chomsky (1965) will be analysed in relation to behavioural science. The paper will consider three primary criticisms of behaviourism from the point of view of the competence-performance distinction: (1) behaviourists’ decision to stick with describing speech patterns and habits prevents them from constructing a credible theory of performance (Chomsky 1965); (2) behaviourists’ methodology of only dealing with performance and eschewing explanations in terms of competence precludes behaviourism from being a serious science (Collins 2007); (3) behaviourists don’t engage in idealisations and are committed to counting every cough in an instance of verbal behaviour, and hence reduce their science to triviality (Jackendoff 2002). It will be demonstrated, by considering developments in behavioural science, that these criticisms are not justified. To illustrate the point, I will discuss explanations in both behavioural psychology (as exemplified by relational frame theory) and behaviourism in philosophy (as exemplified by Quine). These examples will illustrate behaviourists appealing to underlying competencies to explain behaviour, as well as using idealizations.

The fact that behavioural scientists use idealizations and appeal to competencies doesn’t tell us much about the truth of their overall theories. But for interdisciplinary interaction between behaviourists and cognitivists to be fruitful, it is necessary that they understand each other’s positions. To that end it is imperative that the behaviourist position on the competence-performance distinction is explicated in detail.

Chomsky on the Competence and Performance Distinction

Sixty years ago, in his ‘Aspects of the Theory of Syntax’, Chomsky first explicitly made his distinction between competence and performance. In Aspects he is clear that he is not arguing against the study of performance as a field. Rather, he is claiming that if one wants to study performance, then one will need to do so armed with an understanding of underlying competence mechanisms (Chomsky 1965 p. 10). When discussing competence and performance, Chomsky makes a distinction between acceptability judgements made by subjects and the actual grammaticalness of sentences. He argues that acceptability judgements are performance data which can be explained by underlying competence mechanisms (Ibid p. 11).

            He goes on to state that the following factors lead to unacceptability judgements: repeated nesting, self-embedding, nesting of a long and complex element, etc. (Ibid p. 13). And he explains the unacceptability of, for example, repeated nesting in terms of the finiteness of our memory (Ibid p. 14). Chomsky notes that people have been critical of generative grammar because of its focus on competence and lack of interest in performance. But he claims that the only research into performance that has had any theoretical interest has been research led by insights from underlying competence systems (Ibid p. 15). He went on to criticize descriptivist and classificatory philosophies as standing in the way of developing an adequate theory of performance:

“It is the descriptivist limitation-in-principle to classification and organization of data, to “extracting patterns” from a corpus of observed speech, to describing “speech habits” or “habit structures,” insofar as these may exist, etc., that precludes the development of a theory of actual performance.” (Ibid p. 15)

Chomsky doesn’t specify which theorists argue in this methodologically errant manner, but it will be demonstrated that his criticisms don’t apply to behaviourists in either psychology or philosophy.

Competence and Performance and Behaviourism

In this section I will look at the competence/performance issue from the point of view of behavioural science. I will argue that contemporary behavioural science cannot be described in terms of looking for “habit structures” or extracting patterns from the classification or organisation of data. Rather, behavioural science is discovering facts about the emergence of linguistic usage in the context of tightly controlled experimental settings. These discoveries in behavioural science such as (1) Rule-Governed Behaviour’s interaction with the contingencies of reinforcement, (2) the emergence of stimulus equivalence, (3) The emergence of relational frames, are experimentally controlled emergent properties of linguistic behaviour. Their discovery goes beyond mere “description” or “extracting of data from patterns”. Furthermore, there is no reason to think that such emergent species-specific performance data is explicable in terms of “habit structures”.

            While the performance data discovered in behavioural science cannot be reduced to “description”, or “extraction of data”, or explained in terms of “habit structures”, behavioural scientists haven’t provided much by way of an explanation of how these capacities emerge. Such an explanation is wanting because of an uncritical reliance on Skinner’s crude pragmatist philosophy. However, pace Skinner, any attempt to explain emergent behavioural capacities will be reliant on a distinction between the emergent behaviours and the underlying capacities which make the behaviours possible. Any cogent explanation of emergent behaviours will rely on behavioural science adopting a distinction between competence and performance analogous to that recommended by Chomsky (1965). We will see that some behaviourists are already moving in this direction when we discuss Hayes and Sanford (2014) later in the paper.

Theorists have long critiqued behaviourists for failing to adequately account for the distinction between competence and performance. For example, the philosopher John Collins has argued that behaviourism is not a serious discipline because it doesn’t even try to explain the underlying capacities responsible for behaviour (Collins 2007 p. 883). He argues further that a focus on competence doesn’t involve ignoring performance; rather, it involves explaining performance through explicating the mechanisms underlying linguistic competence (Ibid p. 883).

            Collins’s claim that behaviourism is not a serious discipline is problematic. Experimental work in behaviourism over the last hundred or so years has yielded a wide range of results which wouldn’t have been possible without research into behavioural science. The discovery of classical conditioning and operant conditioning has revolutionized both psychology and biology. Since Chomsky wrote his ‘Aspects of the Theory of Syntax’ 60 years ago, the field of behaviourism has continued to flourish, demonstrating that it is a serious discipline which has made considerable advances. Some discoveries of note in behavioural science have been Robert Rescorla’s discoveries of predictive mechanisms underlying classical conditioning (Rescorla 1969), the use of these predictive mechanisms to explain taste aversion in rats (Hayes & Sanford 2014), experimental literature demonstrating contingency insensitivity in rule-following creatures (Galizio 1979, Shimoff et al 1981, Skinner 1984), the discovery of emergent stimulus equivalence (Sidman 1971), and the discovery of emergent relational frames (Hayes & Thompson 1989, Hayes 2001).

            Even Skinner’s much maligned book ‘Verbal Behaviour’ has spawned hundreds of experiments on human subjects which have demonstrated some experimental control over his seven verbal operants (Sauter & Leblanc 2006). Likewise, behavioural science has demonstrated its use in applied disciplines, such as Applied Behavioural Analysis. So, any claim that behaviourism isn’t a serious discipline is conclusively refuted by the incredible predictive control it gives us over certain domains of interest.

            Nonetheless, Collins does have a point. There is a reluctance among some in behavioural science to provide explanations of behavioural patterns in terms of underlying competencies, some of which may be innate. This reluctance doesn’t demonstrate that behaviourism isn’t a serious discipline, but it does seriously limit the capacity of the discipline to explain the discoveries it makes. Later in the paper I will explore some recent tentative attempts to explain the competencies underlying our capacity to relationally frame. I will argue that these tentative attempts are a step in the right direction in bridging the gap between behavioural science and cognitive science.

                What is Linguistic Competence?

            When discussing Chomsky’s distinction between competence and performance it is typical to justify the distinction in terms of idealizations which usually occur in any discipline. When Chomsky is talking about linguistic competence, he notes that he is doing so using a series of idealizations which are necessary to understand the complex object under study:

“Linguistic theory is concerned primarily with an ideal speaker listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance.” (Chomsky 1965 p. 3)[1]

A grammar will then be a description of the rules the idealized subject is using when understanding or speaking their sentences (Ibid p.4). Chomsky’s notion of an idealized subject’s competence is analogous to the idealizations which physicists use all the time to gain traction over the physical phenomena they are studying. A cliched example is of a physicist using idealizations such as studying a frictionless plane to give them an understanding of force and energy (Jackendoff 2002 p. 33).

            Chomsky’s distinction between competence and performance, which relies on idealizations, is a sensible proposal and one which has led to success in gaining traction on the nature of language. While the use of idealizations is justified and common practice in science, Chomsky’s use of idealizations has had its critics. While most theorists would agree that some level of idealization is needed, it has been argued persuasively that Chomsky’s use of idealizations and his competence/performance distinction have ossified in such a manner as to make aspects of his theories irrefutable:

“Still, one can make a distinction between “soft” and “hard” idealizations. A “soft” idealization is acknowledged to be a matter of convenience, and one hopes eventually to find a natural way to re-integrate the excluded matters. A standard example is the fiction of a frictionless plane in physics, which yields important generalizations about forces and energy. But one aspires eventually to go beyond the idealization and integrate friction into the picture. By contrast, a “hard” idealization denies the need to go beyond itself; in the end it cuts itself off from the possibility of integration into a larger context…It is my unfortunate impression that, over the years, Chomsky’s articulation of the competence-performance distinction has moved from relatively soft…to considerably harder.” (Jackendoff 2002 p. 33)

Other theorists have made similar points (Lakoff, G. 1987 p. 181, Palmer, D.  2023 p. 528).

Jackendoff isn’t arguing that we shouldn’t use idealizations or a competence-performance distinction; rather, he is warning of the possibility of an idealizing assumption becoming so hardened that it shields a theory from considering alternative ways of dealing with the data of experience. Thus, as we saw above, Chomsky championed ignoring things like memory limitations, shifts of attention, etc. But other work which doesn’t make this idealization actually uses memory limitations to explain the hierarchical embedding in our language (Christiansen and Chater 2023, Christiansen and Chater 2016). Likewise, speech errors, which Chomsky tells us to ignore in the name of idealization, have been used as productive data in explaining the cognitive processes underlying speech production (Hofstadter et al. 1989, Wijnen, F., 1992). Nonetheless, even Chomsky’s sternest critics would admit that his appeals to idealizations and the competence-performance distinction have yielded interesting linguistic generalizations.

Jackendoff argued that Chomsky needed to make the distinction between competence and performance because other disciplines had failed to make it:

“Chomsky makes the competence-performance distinction in part to ward off alternative proposals for how linguistics must be studied. In particular, he is justifiably resisting the behaviourists, who insisted that proper science requires counting every cough in the middle of a sentence as part of linguistic behaviour.” (Jackendoff 2002 p. 30).

While it is undeniable that behavioural science has difficulty in explaining its own data because it does not always explain behaviour in terms of underlying competencies, nonetheless Jackendoff in the above quote is engaging in a wild caricature of behaviourism.

            Some behaviourists do obviously argue that the subject matter they are interested in is actual instances of behaviour in a particular context (Palmer, D., 2023 p. 528). And in criticisms of Chomsky’s conception of linguistics they do argue that, as behaviourists, they are not obliged to explain possible sentences created by linguists which conform to the purported grammatical principles but are never used in actual verbal behaviour. If speakers do not actually use such verbal forms while behaving in relation to each other, the behaviourist considers it beyond their purview to explain them (Ibid, p. 529). But whatever one thinks of this behaviourist philosophy, it doesn’t entail the extremes that Jackendoff attributes to it.

            Jackendoff attributes to behaviourists the view that scientists should count “every cough in the middle of a sentence as part of linguistic behaviour”. This claim amounts to the assertion that behaviourists do not think that scientists should use idealizations. This is an absurd accusation; any science which deprived itself of idealizations would be overwhelmed with complexity and wouldn’t be able to get off the ground. It is simplifying idealizations which make it possible for a scientific theory to gain any prediction and control.

            But contrary to Jackendoff’s wild claim, behavioural science was from the start engaging in idealizations. Studying classical and operant conditioning in a laboratory setting was an idealization which assumed that these artificial experiments could explain the complex learning processes of animals in the wild. Furthermore, in his Verbal Behaviour Skinner used a variety of idealizations. Skinner divided language into seven main verbal operants and treated them separately and under different types of stimulus control. But he noted that this was an idealizing assumption and that in practice the verbal operants would be intertwined and could be acquired together (Skinner 1957 p. 188).

Furthermore, the notion of an operant is concerned with kinds of behaviour that share an effect on the environment and that, as a kind, are demonstrated to vary lawfully in their relations to other variables (Smith 1987 p. 289). Importantly, in his ‘Behaviourism and Logical Positivism’ Smith noted:

“The actual movements involved in pressing a lever, for example, might vary from instance to instance (e.g., left paw, right paw, nose), but they are equivalent with respect to producing reinforcement and they demonstrably function together in the face of changing conditions.” (ibid p. 289)

So even in the case of the operant itself idealisations are being used, which result in the focus being on classes of behaviour[2]; so Jackendoff’s notion that behaviourists are engaging in “counting every cough” is simply absurd. Behavioural science, like any other science, is up to its eyes in idealizations from the start.

            Nonetheless, while Jackendoff is incorrect in his assertion that behaviourists don’t use idealizations, he is correct that behaviourists have sometimes eschewed using the notion of competence in their explications of language. And this lack of a theory of competence does hinder their ability to explain the behavioural data which they have discovered.

         Behaviourists and Competence

Behaviourists by definition are concerned with behaviour. Hence, when Skinner wrote on language, he parsed this as the study of Verbal Behaviour. He divided language up into seven main verbal operants and recommended empirically studying how various schedules of reinforcement maintained the use of these verbal operants. While he justified this research program in terms of studies which had been done on non-human animals, in the 70 years since Verbal Behaviour was written there have been hundreds of experimental studies on the conditions responsible for maintaining the use of these verbal operants. Skinner’s theory is a paradigm case of an attempted explanation of human linguistic performance.

            Subsequent behaviourist research has moved beyond Skinner’s claims about language, e.g. Galizio (1979), Sidman (1971), Hayes & Thompson (1989) and Hayes (2001). But it still focuses on behaviour and the degree to which it can be predicted and controlled using conditioning. Much behavioural science has had a heavy emphasis on practical predictive control to aid in applied work. Thus, the Verbal Behaviour Approach inspired by Skinner is heavily involved in teaching functional communication to people with severe autism and/or an intellectual disability. Sidman’s work on stimulus equivalence sprang out of his work trying to teach people with an intellectual disability how to read. Relational frame theory, meanwhile, is used in attempts to teach people with intellectual disabilities functional communication, as well as a tool in Acceptance and Commitment Therapy. This emphasis on applied work is sometimes used as a justification for a heavy emphasis on prediction and control (Dymond & Roche pp. 220-221). From a philosophical point of view, this emphasis on prediction and control is justified by appeal to a pragmatic philosophy of the type espoused by Stephen Pepper (1942).

            This emphasis on pragmatic prediction and control of the organism clearly lays heavy weight on performance data and the histories of reinforcement which shape this performance. But while the underlying competencies are not focused on, they do play a role in the explications. Skinner noted throughout his career that phylogenetic factors are important in shaping the organism:

“Just as we point to the contingencies of survival to explain an unconditioned reflex, so we point out to contingencies of reinforcement to explain a conditioned reflex.” (Skinner 1974 p. 43)

He emphasised that natural selection shaped the structure of the organism through the scythe of survival of the fittest, in a manner analogous to the way operant conditioning shaped the behaviour of the organism through selection by consequences. In Skinner’s view selection by consequences rules in both phylogeny and ontogeny. And as early as the mid 60’s behaviourists such as the Brelands were emphasising that instinctive drift would ensure that different organisms would not be susceptible to having their behaviour shaped in the same way, because of their different instinctive natures.

            Analogously, Relational Frame Theorists don’t deny that any prediction and control we gain needs to be explained by underlying genetic, epigenetic, and neural structures. It is just that their primary emphasis is on emergent behavioural data which can be predicted and controlled through behavioural principles. Thus, while relational frames are emergent phenomena discovered through behavioural training and testing, Hayes does try to explain their emergence as resulting from a combination of genetic constraints and social learning. Thus, he appeals to group selection favouring a cooperative instinct which makes the acquisition of relational frames such as coordination more likely, and speculates on how other frames can be derived from a combination of coordination and equivalence (Hayes and Sanford 2014). Nonetheless, the primary emphasis in both relational frame theory and Skinnerian behaviourism is on prediction and control of the organism using behavioural principles.

            As discussed above, Chomsky wasn’t against performance data per se; on the contrary, he believed that the only theory of performance of any theoretical interest was one which took on board competence-based insights (Chomsky 1965 p. 16). Chomsky’s claim has a degree of truth to it, but it is obviously not the whole story. As discussed above, in the 60 years since Chomsky wrote Aspects behaviourists have discovered performance data which has ample theoretical interest. The performance data elicited by behavioural tests, such as stimulus equivalence, rule-following contingency insensitivity, and relational frames, are of great theoretical interest. But to yield a theoretically interesting account of this performance data, we will need to explain it in terms of underlying competence systems.

            Behaviourists sometimes do try to explain behaviour in terms of underlying competencies. As we saw above, Skinner’s notion of phylogenetic shaping can be used to explain the Brelands’ notion of instinctive drift, which is used in a theoretical explanation of animals’ divergent behaviour under schedules of reinforcement. Likewise, Quine appealed to a phylogenetically shaped similarity space which underlay our capacity to successfully engage in induction. However, when it comes to emergent phenomena such as stimulus equivalence and relational frames, there has been a reluctance by some theorists to explain the phenomena in terms of underlying innate mechanisms. In the next section I will give an example from psychological behaviourism which aims to explain relational framing in terms of evolved underlying competencies, and I will then discuss an example of a philosophical behaviourist explaining his data in terms of underlying competencies. This will be proof in principle that some behaviourists do make a distinction between competence and performance and do use idealizations in their theoretical endeavours.

   Quine and Relational Frame Theory.

            In this section I will outline and discuss an attempt by Hayes and Sanford (2014) to explain our species-specific capacity to engage in relational framing in terms of group selection for a cooperative instinct. This section will demonstrate that behaviourists do appeal to competencies to explain behavioural patterns and novel behaviour when necessary, and that furthermore they routinely engage in idealizations to explain data. I will then further develop this point by showing that it isn’t just psychological behaviourists who appeal to underlying competencies and idealizations to explain behaviour. Some philosophical behaviourists do so as well. This will be demonstrated by evaluating Quine’s work in this area.

Hayes and Sanford (2014) discuss the evolution of our ability to engage in verbal behaviour. To understand this capacity, they explain it in terms of abilities humans partially share with non-verbal creatures, such as (1) the ability to use vocalizations to regulate the behaviour of others (shared with many mammals), (2) social referencing (dogs and chimpanzees can do this), (3) joint attention and non-verbal forms of perspective taking (chimps and apes), and (4) non-arbitrary relational learning (all animals). And they argue that these competencies were modified through group selection. They note that a cooperative instinct within a group will give its members the ability to outcompete other groups.

With a cooperative instinct in place, suppose two humans are near an apple tree and both know how to say “apple” when they see an apple. If the apple is out of reach for person A but within reach for person B, and person A says “apple”, then person B will get it for person A (Ibid p. 121). They argue that it would take the capacity for perspective taking along with the capacity for cooperation to bridge the gap in this epistemological triangle. They call this the beginning of a person’s capacity to engage in a frame of coordination. People acquire the ability to know that “apple” → apple and apple → “apple”. Thus, they know that the sound refers to the object and the object is referred to by the sound. This relationship can then come under contextual control of the “is” relation: Apple is “Apple” and “Apple” is Apple.

Deriving this relation of identity between the word and the object will be helped through reinforcement for providing the object when the word is said, and having the object provided when you say the word. Relational Frame Theorists have argued that the ability to derive this frame (which is species specific) is made possible through co-opting capacities such as joint attention, the ability to modify others’ behaviour through vocalizations, social referencing, and non-arbitrary relational framing with our cooperative instinct.

Once this frame of coordination was acquired, humans would have the capacity to recognize mutuality in a frame. They argue that repeated application of mutuality would give an organism the ability to use combinatorial entailment (ibid p. 123). Thus, on this conception, the capacity to relationally frame is created primarily by our cooperative instinct. In effect, they explain our capacity to relationally frame in terms of underlying competencies (social referencing, joint attention, the ability to control others using vocal signals, and non-arbitrary relational responding) that were modified by a human-specific cooperative instinct shaped by natural selection. In this instance the novel performance data is explained in terms of underlying competencies. They aren’t just describing behavioural patterns; they are explaining their arrival in terms of underlying competencies.

                     Quine and Idealization

As discussed above, the competence-performance distinction is aligned with the notion of idealization, where you abstract away from aspects of a phenomenon in order to deal with more tractable subject matter. Thus, the scientist can deal with frictionless planes, or with humans not subject to memory limitations, distractions, etc. The charge made by Jackendoff and others was that behaviourists don’t use idealizations and hence have no resources to make the competence-performance distinction. We have seen that this is simply not true of behavioural science, which from the start has been up to its eyes in idealizing assumptions. This is true not just of behaviourists in the scientific sphere but also of prominent behaviourists working on philosophical problems.

Quine’s conception of language is famously terse. While he talks of a child acquiring, and being shaped by, the language of his peers, he typically focuses on things such as observation sentences. There is little in his conception of language about other uses of language such as interrogatives. Quine justifies this on the grounds that he is interested in language only in so far as it pertains to epistemology and ontology. Hence, he engages in a simplifying idealization when dealing with science. Thus, in ‘The Roots of Reference’ Quine acknowledges that requesting makes up a large part of our linguistic usage, but he doesn’t account for it because it has little relevance to his attempt to explain how we acquire our scientific theory of the world (Quine 1973 p. 46). Later in the same book he notes that he doesn’t want a factual account of how children acquire English; rather, he is concerned with telling a plausible story of how we go from infancy to developing a regimented language of science (ibid p. 92). Quine made the same point again in his ‘Mind and Verbal Dispositions’:

“One and the same little sentence may be uttered for various purposes: to warn, remind, to obtain possession, to gain confirmation, to gain admiration, or to give pleasure by pointing something out… somehow we must further divide; we must find some significant central strand to extract from the tangle…Truth will do nicely…a man understands a sentence in so far as he knows its truth conditions…this kind of understanding stops short of humour, irony, innuendo, and other literary values, but it goes a long way. In particular it is all we can ask of an understanding of science.” (Quine 2008 pp. 448-249).

Again, we can see that Quine is abstracting away from various uses of language because they aren’t useful to him in sketching his story of how we go from stimulus to science. Quine, like his behavioural science colleagues, is engaging in idealizations at every step of his philosophical project. Furthermore, Quine appeals to underlying competencies to explain how it is that humans go from stimulus to science (Quine 1980 p. 6, Quine 1989 p. 348, Quine 1998 p. 4). He appeals to an innate similarity quality space to explain our ability to be differentially reinforced, to an innate perceptual similarity space to explain our convergence on stimulus meaning, and to body-mindedness to explain children’s ability to understand object permanence.

                             Conclusion

In this paper I have demonstrated that behaviourists of both philosophical and scientific bent do indeed make use of both idealizations and a distinction between competence and performance in their work. Despite the criticisms of Chomsky and his followers, behaviourism’s focus on performance doesn’t require ignoring competence or shunning the use of idealizations. It is probable that the followers of Chomsky will be unmoved; they will note that there has been no behavioural work which can account for the grammatical regularities discovered in linguistics. With this I would agree. Behaviourism, even modern behaviourism, still hasn’t demonstrated that it has the conceptual resources to handle the syntax of natural language. Nonetheless, we are discovering more and more interesting performance regularities through behavioural research. These data do need to be explained in terms of underlying competencies. But the discovery of these interesting facts about our behaviour (including verbal behaviour) indicates that, pace Chomsky, we can discover interesting facts about performance prior to having a worked-out theory of competence. In fact, behavioural research has led us towards a greater understanding of the competencies underlying performance, not vice versa. With the sciences of biolinguistics and behavioural science still in their infancy, there is still a lot of data to acquire and experimental work to be done. But any attempt to understand either side will require greater attention to what the practitioners of each discipline are actually doing, rather than reliance on caricatures.

                                          References

Barnes-Holmes, D. 2000. “Behavioural Pragmatism: No Place for Reality and Truth”, The Behaviour Analyst 23 pp. 191-202.

Barnes-Holmes, D. 2005. “Behavioural Pragmatism is A-Ontological, Not Anti-Realist: A Reply to Tonneau”, Behaviour and Philosophy 33 pp. 67-79.

Baum, W. 2002. “From Molecular to Molar: A Paradigm Shift in Behaviour Analysis”. Journal of the Experimental Analysis of Behaviour. 78 (1) pp. 95-116.

Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge MA: MIT Press.

Christiansen, M, H, & Chater, N. 2016. “The Now-or-Never Bottleneck: A Fundamental Constraint on Language”. Behavioural and Brain Sciences 39.

Christiansen, M, H, & Chater, N. 2023. The Language Game. Random House. Penguin.

Collins, J. 2007. “Linguistic Competence Without Knowledge of Language”. Philosophy Compass 2 (6) pp. 880-895.

Dymond, S, & Roche, B. 2013. Advances in Relational Frame Theory. Context Press. New Harbinger Publications.

Galizio, M. 1979. “Contingency-shaped and rule-governed behaviour: Instructional control of human loss avoidance.” Journal of the Experimental Analysis of Behaviour. 31 pp. 53-70.

Ginsburg, S, & Jablonka, E. 2019. The Evolution of the Sensitive Soul. MIT Press. Cambridge MA.

Hayes, L, J, & Thompson, S. 1989. “Stimulus Equivalence and Rule-Following”. Journal of the Experimental Analysis of Behaviour. 52 (3) pp. 275-291.

Hayes, S, Barnes-Holmes, D, & Roche, B. 2001. Relational Frame Theory: A Post-Skinnerian Account of Language and Cognition. Springer Science & Business Media.

Hayes, S, & Sanford, B. 2014. “Cooperation Came First: Evolution and Human Cognition.” Journal of the Experimental Analysis of Behaviour 101 pp. 112-129.

Hofstadter, D, R, & Moser, D, J. 1989. “To Err is Human; To Study Error-Making is Cognitive Science”. Michigan Quarterly Review 28 (2) pp. 185-215.

Jackendoff, R. 2002. Foundations of Language. Great Clarendon Street. Oxford University Press.

Kemp, G. 2017. “Quine, Publicity and Pre-Established Harmony”, Protosociology 34 pp. 59-72.

Lakoff, G. 1987. Women, Fire, and Dangerous Things.  The University of Chicago Press.

Palmer, D. 2023. “Towards a Behavioural Interpretation of English Grammar.” Perspectives on Behaviour Science. 46 (3) pp. 521-538.

Pepper, S, C. 1942. World Hypotheses: A Study in Evidence. University of California Press.

Quine, W. 1953.  From a Logical Point of View. Harvard University Press. Cambridge MA.

Quine, W. 1974. The Roots of Reference. La Salle. Open Court Press.

Quine, W. 1995. From Stimulus to Science. Cambridge. Mass. Harvard University Press.

Quine, W. 1996. “Progress on Two Fronts”, The Journal of Philosophy. 93/4 pp. 159-163

Quine, W. 2008. “The Flowering of Thought in Language” pp. 478-484 in Follesdal & Quine (EDS) Quine: Confessions of a Confirmed Extensionalist.

Rescorla, R. 1969. “Pavlovian Conditioned Inhibition.” Psychological Bulletin 72 (2) pp. 77

Sautter, R, & Leblanc L, “Empirical Applications of Skinner’s Analysis of Verbal Behaviour with Humans”. The Analysis of Verbal Behaviour 22 pp. 35-48.

Shimoff, E, Catania, A, Matthews, B, 1981. “Unstructured Human Responding: Sensitivity of Low Rate Performances to Schedule Contingencies.” Journal of Experimental Analysis of Behaviour. 36 (2) pp.207-220.

Sidman, M. 1971. “Reading and Auditory Visual Equivalences”. Journal of Speech and Hearing Research. 14 (1) pp. 5-13.

Skinner, B.F. 1957. Verbal Behaviour. New Jersey: Prentice-Hall Inc.

Skinner, B.F. 1974. About Behaviourism. Knopf Doubleday Publishing Group.

Skinner, B.F. 1984. “Contingencies and Rules”. Behavioural and Brain Sciences. 7 (4) pp. 607-613.

Smith, L. D. 1986. Behaviourism and Logical Positivism: A Reassessment of The Alliance. California: Stanford University Press.

Wijnen, F, “Incidental word and sound errors in young speakers”, Journal of Memory and Language 31 pp.734-755.

Wilson, D, S, & Hayes, S. 2018 Evolution and Contextual Behavioural Science. Context Press. New Harbinger Publications Inc.


 

[2]  There is a debate within behavioural science about whether scientists are studying classes or individuals; see Baum, ‘From Molecular to Molar: A Paradigm Shift in Behaviour Analysis’ (2002).