Author Archives: surtymind

About surtymind

My name is David King. My primary interests are Cognitive Science, Linguistics, Neuroscience, Analytic Philosophy, Philosophy of Language, Philosophy of Mind, Philosophy of Psychology, Philosophy of Biology, and Philosophy of Psychoanalysis. This blog is dedicated to discussing a variety of issues at the intersection of philosophy and science. In particular I will be discussing issues relating to the philosophy of mind and the philosophy of linguistics. I will also explore the nature of naturalism as a philosophical issue. A lot of the blog will centre on triangulating my position with those of contemporary thinkers like Noam Chomsky, Daniel Dennett, Jerry Fodor, and Thomas Nagel. I will engage with aspects of the work of many contemporary philosophers and scientists in an attempt to construct an entirely naturalistic picture of the mind. Other topics dealt with will be intellectual disability and radical interpretation. I will also discuss Artificial Intelligence and Animal Cognition.

Thought Control

In this blog post I will discuss the fear of humans being controlled by behavioural science. This fear receded as criticisms of behaviourism made the discipline seem less capable of achieving the bombastic claims made by the likes of Skinner and Watson. But behaviourism has advanced well beyond Skinner and Watson, and current behavioural science is again looking at ways of controlling rule-governed behaviour. After discussing Skinner’s early work on control and counter-control, I will evaluate contemporary behavioural science and its potential for predicting and controlling verbal behaviour, linking behavioural control with the rise of Artificial Intelligence.

In his 1953 book ‘Science and Human Behaviour’ Skinner speculated on ways behavioural technology could improve the lives of people if implemented properly. However, there were those who didn’t like the intrusion of scientists into human affairs. Some of this opposition was simple fear that science was encroaching on subject matters typically tackled by the humanities. But others found Skinner’s emphasis on “prediction and control” chilling. People feared a future where behavioural scientists would eventually achieve control over ordinary human subjects and shape their behaviour at the whim of those in power.

            However, as we learned more about animal behaviour, and the factors impinging on such behaviour, the threat from behavioural science began to seem less and less serious. Breland and Breland’s (1963) work on animal training demonstrated that instinctive drift meant that non-human animals’ behaviour wasn’t as malleable as earlier naïve behaviourists thought. Skinner, who had long stressed that both phylogenetic and ontogenetic factors play a role in animal behaviour, welcomed the work of the Brelands. On Skinner’s way of thinking it was the behaviourist’s job to discover the different ways behaviour could be shaped and controlled through different schedules of reinforcement. The behaviourist wasn’t in the game of stipulating how malleable different organisms were. Nonetheless, despite the Brelands’ work being congenial to Skinner’s behaviourism, for the public, instinctive drift made the thought of behaviourists gaining control over people and shaping their behaviour seem less threatening.

             As behavioural scientists continued to study humans operating under schedules of reinforcement, there was more reason to think that humans couldn’t just be shaped at a whim through schedules of reinforcement. Hundreds of behavioural studies on rule-governed behaviour[1] have demonstrated that when humans operate under rules they become less sensitive to the contingencies of reinforcement. Children below the age of 5 could be shaped under schedules of reinforcement in a similar way to a rat, but once they passed 5 and could follow rules their behaviour became less sensitive to the contingencies of reinforcement (Bentall 1985).

            Like the Brelands’ work, work on rule-following sprang up within behaviourism (Skinner 1963) and demonstrated that human behaviour was more complex than the behaviour of other organisms. Sidman (1971) experimentally demonstrated that humans could derive untrained stimulus equivalence, and Hayes (1989) demonstrated that humans could derive untrained relational frames (coordination, comparison, hierarchy, etc.). With these results, the human as described by behavioural science began to resemble the human as described by cognitive scientists (built with innate constraints, having a species-specific capacity for productive reasoning, etc.) more closely than it did the human as described by early behaviourists.

            With all these facts in place, people in the popular press stopped worrying about nefarious behaviourists controlling them. The stuff of nightmares in the popular imagination is now neuroscientists controlling us, or AI controlling us. Fear of behaviourists controlling us is now a part of our quaint past, on a par with nineteen-fifties fears of visitors from the Moon.

            Nonetheless, Skinner’s point still stands. Our behaviour is controlled by facts in our ontogeny and phylogeny as well as a variety of other factors. Our behaviour is always controlled, and science, which gives us the best tools to predict and control phenomena, can help us as a species gain some degree of control over our collective behaviour. Some of this control is welcome: we are glad that people follow rules that forbid them to enter a stranger’s house without permission or to drive on the wrong side of the road.

            But when it comes to things like Verbal Behaviour people are particularly reluctant to cede control to scientists. Our ability to talk and think about whatever we want is a key component of what it means to be human. Thus, it is the stuff of our nightmares that scientists or politicians might gain the ability to control what we can say. Famously, in Orwell’s dystopian novel 1984 the party controlling society had banned the use of certain words such as ‘Freedom’ and ‘Equality’. The aim was that, to the extent that thought depends on language, if you can ban the words from public consciousness people will no longer be able to think the thoughts. Compelling evidence from cognitive science indicates that it is not necessary to have a word in public language in order to use a concept (Pinker 1994, 2002). But while cognitive science has conclusively shown that our thought is not entirely determined by our language, it is evident that our thought processes are shaped by the type of language we acquire. Lakoff (1987) has demonstrated how the metaphors we adopt influence the way we think about the world. So, the fear still exists that nefarious scientists in conjunction with political elites will somehow conspire to control what we can say, perhaps even influencing what we can think. Hence, in recent years there has been an explosion of public intellectuals, usually podcasters, who argue that there should be no limits on who can say what. These people call themselves free-speech absolutists.

            But as yet the science of verbal behaviour doesn’t demonstrate any real capacity to control Verbal Behaviour. In fact, as discussed above, once a creature becomes verbal and starts following rules, its behaviour becomes less sensitive to the contingencies of reinforcement. Furthermore, as linguists like to note, humans appear to have the capacity to construct new sentences which have never been spoken before, using their ability to use productive syntax. This type of creativity doesn’t appear to be under any kind of environmental or behavioural control.

            Nonetheless, we know anecdotally that the types of sentences we use are to some degree shaped by our history of interaction with particular groups. Skinner tells an anecdote about meeting his parents and his lecturers at his graduation. He found the conversation difficult to navigate because he had a history of reinforcement for talking to his family in a particular manner, and a different history of reinforcement for talking in a different manner to his professors. In the conversation he felt the pull of two competing verbal repertoires as he interacted with the two groups of people.

            Obviously, however, although Skinner had verbal repertoires he used when talking to people in different groups, he was nonetheless capable of creating new sentences to talk about other things which may have interested him or his interlocutors. There is no reason to think that his verbal repertoires were entirely determined by past reinforcement history. Nonetheless, the fact that he felt the pull of two competing verbal repertoires does indicate that the types of speech acts we engage in are to some degree shaped by social contingencies.

            Behaviourists have long noted that humans aren’t just passive organisms who submit to control. Humans typically don’t like being controlled and respond to control with two key tactics: (1) aggressive responses, (2) moving out of the range of controllers or passive resistance (Skinner 1953 p. 193). But in an unequal society with one group being more powerful than another, aggressive responses may be pointless. Furthermore, with the increasing reach of technology into every aspect of our lives it is more and more difficult to move out of the range of controllers. Humans spend an inordinate amount of time online, where they can be reached by a variety of different forces.

            As we discussed above, when humans begin to develop their language and start following verbal rules this makes them less sensitive to the contingencies of reinforcement. Hence, their behaviour when rule-governed is more difficult to control behaviourally. Behaviourists have noted that humans engage in a type of rule-governed behaviour which is shaped by social consequences. They call this Pliance. There is also rule-governed behaviour which is controlled by tracking, where the rules are adopted because they accurately track environmental facts, and augmenting, which alters the extent to which rule-governed behaviour is reinforcing.

Importantly for our purposes, behaviourists have started to develop techniques to help them control rule-governed behaviour. In a recent paper Spencer et al (2022) noted that people had trouble following verbal rules for three reasons: (1) lack of credibility of the speaker, (2) lack of ability of the speaker or authority figure to mediate the contingencies of rule-following, (3) implausibility of a given rule. They argue that these obstacles can be overcome through the following techniques: strengthening augmental control by connecting the rule to what the individual values; monitoring rule-following to assess Pliance rates; ensuring the contingencies supporting rule-following track reality accurately; and de-emphasizing freedom-threatening language (Spencer et al 2022).

As things stand, behavioural techniques employed by psychologists in relation to things like tracking and Pliance typically relate to therapeutic settings. Thus, in therapy a therapist may discuss with a patient how their verbal repertoire is geared toward social compliance, sometimes at the expense of tracking reality accurately. But as our understanding of rule-following behaviour and how it is controlled becomes more accurate, it may eventually be used by governments to shape how people verbally respond to societal rules. A step in this direction was Stapelton et al 2022, which explored the rule-following behaviour of people during the Covid-19 pandemic.

With the creation of Large Language Models, which will increasingly appear online, it is possible that most “people” you interact with online will be Large Language Models or other kinds of AI. Large Language Models, when being trained, can be shaped to reflect the values of the people designing them. Thus, during training using reinforcement the models will be trained to only answer in ways consistent with the biases of their creators. Children growing up who may be perennially online could end up receiving a massive amount of their linguistic data from Large Language Models.

These models sometimes hallucinate answers based on statistical expectations, and at other times they will be trained to reflect the values of their designers. It is more and more possible to track what people’s values are based on their online behaviour: what they look at, what they buy, who they interact with. If Large Language Models are linked to attractive avatars, then given their impressive ability to answer a surprising number of questions, people may start viewing them as authorities. And this is the first step towards them being treated as reliable givers of information. Children growing up could end up following the rules of unconscious LLMs whose values were shaped by their creators. This wouldn’t be verbal control of the type Orwell worried about. But it would be a way of shaping the values and verbal repertoires of people in ways we may be entirely unaware of.


[1] See for example: Weiner et al (1964), Lippman and Myer (1967), Lowe et al (1983), Hayes et al (1986).

Large Language Models and the Rationalist-Empiricist Debate.

                          Introduction.

From about 1950 there was a resurgence of interest in the Rationalist-Empiricist debate, with people viewing Chomsky as an updated rationalist carrying on the traditions of Leibniz and Descartes, and Quine and Skinner as updated empiricists carrying on the traditions of Locke and Hume. A consensus emerged that Chomsky’s rationalism won out over the empiricism of Quine and Skinner.

                In recent years, with the rise of Deep Convolutional Neural Networks and Large Language Models, some have argued that their architecture is data-driven and hence that they are an existence proof that empiricist learning is a viable way of modelling the mind (Buckner, C 2018). This has prompted debate, with others arguing that these models don’t vindicate any kind of empiricism as they rely on innate architecture.

                In this paper my focus will be LLMs, because their linguistic capacities make them directly relevant to the earlier debates between Chomsky, Quine and Skinner. I will evaluate whether LLMs do indeed vindicate a type of empiricism. I will argue that LLMs are too dissimilar to human cognitive systems to be used as a model for human cognition. They make bad models of both human linguistic competence and human linguistic performance. So, I argue that they offer no vindication of empiricism (or rationalism for that matter).

 Chomsky and Quine and the Rationalist-Empiricist Debate.

Famously, the rationalist-empiricist debate between people such as Locke and Descartes, which focused on subjects such as whether humans were born with innate concepts, was revisited in the 1950s. When Noam Chomsky burst onto the scene with his review of Skinner’s book Verbal Behaviour, many viewed this as a revival of the rationalist-empiricist debate. Chomsky entitled his 1966 book ‘Cartesian Linguistics: A Chapter in the History of Rationalist Thought’, thus setting himself up as the heir apparent to Descartes’ rationalism.

                In Verbal Behaviour Skinner divided language up into seven Verbal Operants which he argued are controlled through his three-term contingency of antecedent, behaviour, and consequence. Chomsky criticized Skinner for taking concepts from the laboratory, where they were well understood from the animal literature, and extending them into areas where there was no such experimental evidence. He charged Skinner with using the technical terms either literally, in which case his views were false, or metaphorically, in which case the technical terms were as vague as ordinary terms from folk psychology.

                In his 1965 book ‘Aspects of the Theory of Syntax’ Chomsky made his famous distinction between competence and performance. Chomsky argued that the only substantive theory of performance which would be possible would come via a theory of underlying competence. And he illustrated this point by showing how elements from our underlying grammatical competence could predict and explain aspects of our linguistic performance. Since behaviourism was obviously primarily concerned with behaviour, many viewed Chomsky’s competence-performance distinction as an attack on behaviourism. Many scientists influenced by Chomsky argued that behaviourism’s lack of a competence-performance distinction meant that it couldn’t be taken seriously as a science (Jackendoff 2002, Collins 2007).

                Chomsky’s (1972, 1986) poverty of stimulus argument was deemed a further dent in the behaviourist project. Chomsky used the structure dependence of syntactic movement, such as auxiliary inversion, as an example of syntactic knowledge which a person acquires even though they could go through much or all of their life without ever encountering evidence for the construction. The argument runs as follows: if the child learned the construction despite never encountering evidence for its structure, and the child didn’t engage in trial-and-error learning in which he tried out incorrect constructions that were systematically corrected by his peers until he arrived at the correct one (Crain and Nakayama 1987, Brown and Hanlon 1970), then knowledge of the construction must be built into the child innately.

                Theorists viewed poverty of stimulus arguments as further evidence that the behaviourist project was doomed to failure. Many contrasted Chomsky’s emphasis on innate knowledge with Skinner’s supposed blank-slate philosophy (Pinker 2002), thereby situating Skinner as a modern-day Locke battling with a modern-day Descartes (Chomsky). The consensus was that modern science had shown that the rationalist position was the correct one.

                       Quine and Chomsky.

                The debate between Chomsky and Skinner was primarily focused on issues in linguistics, and psychology. In philosophy the rationalist-empiricist debate played out in a debate between Quine and Chomsky. Quine billed himself as an externalized empiricist whose primary aim was to explain how humans go from stimulus to science in a naturalistic manner. His entire project centred on naturalising both epistemology and metaphysics. On the epistemological side of things his need to explain how we go from stimulus to science would involve psychological speculations on how we acquire our language, how we develop the ability to refer to objects. Quine was explicit in these speculations that any linguistic theory was bound to be behaviourist in tone since we acquire our language through intersubjective mouthing of words in public settings. This commitment to behaviourism set Quine at odds with Chomsky.

                In 1969 Chomsky wrote a criticism of Quine called ‘Quine’s Empirical Assumptions’. This criticism made three claims: that Quine’s notion of a pre-linguistic quality space wasn’t sufficient to account for language acquisition; that Quine’s Indeterminacy of Translation Argument was trivial and amounted to nothing more than ordinary underdetermination; and that Quine’s invocation of the notion of the probability of a sentence being spoken was meaningless.

                Skinner never replied to Chomsky[1], arguing that Chomsky so badly misunderstood his position that further dialogue was pointless. But Quine (1969) did reply; in his reply he charged Chomsky with misunderstanding his position and attacking a strawman. On the issue of a prelinguistic quality space he argued that it was postulated as a necessary condition of acquiring the ability to learn from induction or reinforcement; he never thought it was a sufficient condition of our acquiring language. Quine argued that “the behaviourist was knowingly and cheerfully up to his neck in innate apparatus”. He further argued that indeterminacy of translation was additional to underdetermination and revealed difficulties with linguists’ and philosophers’ uncritical usage of “meanings, ideas and propositions”. And finally, he noted that Chomsky had misunderstood his discussion of the probability of a sentence being spoken. Quine wasn’t speaking about the absolute probability of a sentence being spoken; rather, he was concerned with the probability of a sentence being spoken in response to queries in an experimental setting.

                This led to a series of exchanges between Chomsky and Quine. In his (1970) ‘Methodological Reflections on Current Linguistic Theory’, Quine criticized Chomsky’s notion of implicit rule following. Quine noted that there are two senses of rule-following he could make sense of: (1) being guided by a rule: a person following a rule they can explicitly state; (2) fitting a rule: a person’s behaviour can conform to any of an infinite number of extensionally equivalent rules. But Quine charged Chomsky with appealing to a third type of rule: (3) a rule that the person cannot state but is nonetheless implicitly following, where this rule is a particular rule distinct from all the other extensionally equivalent rules that the person’s behaviour conforms to. Chomsky (1975) correctly responded that Quine was again arbitrarily assuming that underdetermination was somehow terminal in linguistics but harmless in physics. As Chomsky’s approach became less about rules and more to do with the switching of parameters, Quine’s rule-following critique had less and less traction.

                Chomsky’s critique of Skinner achieved the status of almost a creation myth in cognitive science, with most introductory texts in psychology or cognitive science describing Chomsky’s review of Verbal Behaviour as the death-knell of behaviourism and the birth of cognitive science. Chomsky’s criticism of Quine, by contrast, wasn’t as well known, and it received a more nuanced reading. While a lot of people came to the view that Chomsky won the debate, it didn’t attain the creation-myth status that the review had. Nonetheless, it is fair to say that most philosophers accepted Chomsky’s criticisms of Quine as to the point.

                Outside the realm of academic debates, when Skinner is spoken about in the popular press he is referred to as a blank-slate theorist (Pinker 2002). Skinner and, to a lesser degree, Quine are placed in the empiricist camp as modern-day inheritors of John Locke’s mantle, with Chomsky a self-described exemplar of the Cartesian tradition. And the scientific consensus is that Chomsky’s rationalism has won out over Quine and Skinner’s empiricism.

       Artificial Intelligence and Empiricism.

In recent years, with AI getting more and more sophisticated, philosophers, psychologists, and linguists have begun to explore what these AI systems tell us about the rationalist-empiricist debate. Some theorists argue that empiricist architecture is responsible for the success of recent AI systems (Buckner 2017, Long 2024), while others have argued that the architecture, because it needs built-in biases, in fact supports rationalism (Childers et al 2020 p. 87).

                Buckner (2017) argued that deep convolutional neural networks (DCNNs) are useful models of mammalian cognition. And he further argued that DCNNs’ use of “transformational abstraction” vindicated Hume’s empiricist conception of how humans acquire abstract ideas. Childers et al (2020) have hit back at this view and have argued that both LLMs and DCNNs require built-in biases for them to be successful. And they further argue that the need for built-in biases is analogous to the way Quine needed to posit innate knowledge to explain language acquisition, thereby, according to them, undermining their empiricist credentials (Childers et al 2020 p. 72).

                Childers et al’s reading of the rationalist-empiricist debate is extremely idiosyncratic. Their assertion that the postulation of any innate dispositions is an immediate weakening of empiricism is bizarre (Ibid p. 84). This reading of the rationalist-empiricist dispute doesn’t stand up to scrutiny. Hume, an early arch-empiricist, needed innate formation principles in the human mind to account for how we combine the ideas we receive from impressions into complex thoughts (Fodor 2003). And even Chomsky, who is viewed as a paradigm exemplar of the rationalist tradition, argued that innateness wasn’t the issue when it came to the rationalist-empiricist debate:

“The various empiricist and behaviourist approaches mentioned postulate innate principles and structures (cf. Aspects, pp. 47 f.). What is at issue is not whether there are innate principles and structures, but rather what is their character: specifically, are they of the character of empiricist or rationalist hypotheses, as there construed?” (Katz & Chomsky 1975).

“…Each postulates innate dispositions, inclinations, and natural potentialities. The two approaches differ in what they take them to be…The crucial question is not whether there are innate potentialities or innate structure. No rational person denies this, nor has the question been at issue. The crucial question is whether this structure is of the character of E or R; whether it is of the character of “powers” or “dispositions”; whether it is a passive system of incremental data processing, habit formation, and induction, or an “active” system which is the source of “linguistic competence” as well as other systems of knowledge and belief” (Chomsky 1975 pp. 215-216).

And Watson, Quine and Skinner were consistent about this point throughout their careers: Watson 1924 p. 135, Skinner 1953 p. 90, Skinner 1966 p. 1205, Quine 1969 p. 57, Quine 1973 p. 13, Skinner 1974 p. 43.

            The point of Childers et al’s criticism was that Hume’s empiricism, with its appeal to a few laws of association, needed to be supplanted by Kant’s system, which postulated many more innate priors (Childers et al 2020 p. 87). This may have been a problem for Hume, but it is no difficulty for the likes of Quine, who was an externalized empiricist and had no issues whatsoever with innate priors once they could be determined experimentally (Quine 1969 p. 57). When it comes to Artificial Intelligence there is a legitimate debate on whether it is pragmatic to build the systems on rationalist or empiricist principles. But this only has relevance to the rationalist-empiricist debate if it can be demonstrated that artificial intelligence systems learn in the same way as humans do. In the next section I will evaluate how closely AI systems model human cognition. To do this I will focus on LLMs and the degree to which they accurately model human linguistic cognition.

 Large Language Models and Human Linguistic Competence.

Theorists have argued that the similarities between LLMs’ output and human linguistic output make LLMs and the way they learn directly relevant to theoretical linguistics. Thus, Piantadosi (2023) has argued that LLMs refute central claims about language acquisition made by Chomsky et al in the generative grammar tradition. This comparison of LLMs to actual human cognition has been challenged in the literature (Chomsky et al 2023, Kodner et al 2024, Katzir, R 2023). In this section I will consider various disanalogies between LLMs and human linguistic cognition which make any comparison between them problematic. And in the final section I will consider how these disanalogies bear on whether work in AI is pertinent to debates about Rationalism versus Empiricism.

    Poverty of Stimulus Arguments, Artificial Intelligence, & Human Linguistic Capacities.

            A clear disanalogy between human linguistic abilities and LLMs is that humans acquire their language despite a poverty of stimulus, while LLMs learn because of a richness of stimulus (Kodner et al 2023, Chomsky et al 2023, Long, R 2024, Marcus, G. 2020). To see the importance of this distinction, a brief discussion of the role that Poverty of Stimulus Arguments have played in linguistics is necessary. With this in place we can return to the stimulus which LLMs are trained on.

            Chomsky (1965) noted that people acquire syntactic knowledge despite a poverty of stimulus. Humans are exposed to limited, fragmentary data and still manage to arrive at a steady state of linguistic knowledge, including knowledge of syntactic rules for which they may never have encountered evidence in their primary linguistic data. Chomsky used auxiliary inversion as his paradigm example of a poverty of stimulus (Chomsky 1965, 1968, 1971, 1972, 1975, 1986, 1988[2]). Pullum and Scholz (2002) reconstructed Chomsky’s Poverty of Stimulus Argument as follows:

  1. Humans learn their language either through data driven learning or innately primed learning.
  2. If humans acquire their first language through data driven learning, then they can never acquire anything for which they lack crucial evidence.
  3. But infants do indeed learn things for which they lack crucial evidence.
  4. Thus, humans do not learn their first language by means of data-driven learning.
  5. Conclusion: humans learn their first language by means of innately primed learning (Pullum and Scholz 2002).
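
The logical form of this reconstruction can be checked mechanically. What follows is a minimal sketch, not anything taken from Pullum and Scholz: the propositional encoding and all of the function and variable names are my own illustrative assumptions. It enumerates every truth assignment and confirms that whenever premises 1 to 3 hold, the conclusions in 4 and 5 hold as well, so the argument is valid and any dispute has to be about the truth of a premise.

```python
from itertools import product

# Hypothetical propositional encoding of the reconstructed argument:
#   data_driven    : first-language learning is data-driven
#   innate         : first-language learning is innately primed
#   learns_without : infants learn things for which they lack crucial evidence

def premise_1(data_driven, innate, learns_without):
    # Learning is either data-driven or innately primed.
    return data_driven or innate

def premise_2(data_driven, innate, learns_without):
    # If learning is data-driven, nothing is learned without crucial evidence.
    return (not data_driven) or (not learns_without)

def premise_3(data_driven, innate, learns_without):
    # Infants do learn things for which they lack crucial evidence.
    return learns_without

def conclusion(data_driven, innate, learns_without):
    # Steps 4 and 5: learning is not data-driven, and it is innately primed.
    return (not data_driven) and innate

def is_valid(premises):
    """True if the conclusion holds in every assignment satisfying the premises."""
    return all(
        conclusion(*row)
        for row in product([True, False], repeat=3)
        if all(p(*row) for p in premises)
    )

all_premises = [premise_1, premise_2, premise_3]
without_premise_3 = [premise_1, premise_2]

print(is_valid(all_premises))       # validity with all three premises
print(is_valid(without_premise_3))  # validity once premise 3 is dropped
```

On this encoding the argument goes through with all three premises and fails once premise 3 is removed, which is one way of seeing why the empirical status of premise 3 carries so much weight.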

Pullum and Scholz (2002) isolated premise three as the key premise in the argument. They sought empirical evidence on how often constructions providing evidence for auxiliary inversion occur in a sample of written material. The material they chose to examine was Wall Street Journal back issues. They also estimated the amount of linguistic data a person is on average exposed to. To do this they relied on Hart and Risley (1997) ‘Meaningful Differences in the Everyday Experiences of Young Children’. They estimated that your average child from a middle-class background will have been exposed to about 30 million word tokens by the age of three. Pullum and Scholz argue that the child will have been exposed to about 7500 relevant examples in three years, which amounts to about 7 relevant questions per day. But a primary criticism of their work was that the Wall Street Journal wasn’t representative of the type of data that a child would be exposed to. Sampson (2002) searched the British National Corpus and argued that the child would be exposed to about 1 relevant example every 10 days.
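
The arithmetic behind these estimates is easy to reproduce. The sketch below uses only the figures quoted above; the 365-day year and the variable names are my own assumptions, and the results are rough orders of magnitude rather than precise measurements.

```python
# Figures as quoted in the text; DAYS_PER_YEAR is an assumption.
DAYS_PER_YEAR = 365
YEARS = 3

tokens_by_age_three = 30_000_000  # estimate based on Hart and Risley
relevant_examples = 7_500         # Pullum and Scholz's estimate over three years
sampson_interval_days = 10        # Sampson: one relevant example every 10 days

total_days = YEARS * DAYS_PER_YEAR

# Average yearly exposure implied by the three-year total.
tokens_per_year = tokens_by_age_three / YEARS

# Pullum and Scholz's rate of relevant examples per day.
pullum_scholz_per_day = relevant_examples / total_days

# Number of relevant examples in three years at Sampson's rate.
sampson_examples = total_days / sampson_interval_days

print(f"Tokens per year: {tokens_per_year:,.0f}")
print(f"Pullum and Scholz, examples per day: {pullum_scholz_per_day:.2f}")
print(f"Sampson, examples in three years: {sampson_examples:.1f}")
```

On these numbers the two corpus estimates differ by a factor of roughly seventy, which is why the question of which corpus resembles a child’s input matters so much.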

But the next question was whether a child would be able to learn the relevant construction from 1 example every 10 days (Lappin & Clark 2011). Reali and Christiansen (2005) and Perfors, Tenenbaum & Regier (2006) have constructed mathematical models demonstrating that children are capable of learning from the above amounts of data. However, Berwick & Chomsky et al. (2011), in their ‘Poverty of Stimulus Revisited’, have hit back, arguing that Auxiliary Inversion is meant as an expository example to illustrate the Argument from the Poverty of the Stimulus (APS) to the general public, and that there are much deeper syntactic properties which children could not learn from the primary linguistic data (PLD). The debate still rages on, but it is still a consensus in generative grammar that the Poverty of Stimulus is a real phenomenon which humans need domain-specific innate knowledge to overcome.

As our discussion above indicates, there has been some push back against Poverty of Stimulus Arguments; however, accepting them is still the default position in linguistics. Furthermore, even those who push back against the APS would gleefully admit that the linguistic data an LLM is trained on is not analogous to the Primary Linguistic Data of your average child. Children are exposed to 10 million tokens a year; LLMs are exposed to around 300 billion tokens, and this number is increasing exponentially (Kodner et al 2024). So, while the output of an LLM and a human may be roughly analogous, the linguistic input they receive is in no way analogous.
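
To get a feel for the size of this gap, the two figures can be put on a common scale. This is a rough sketch using only the numbers quoted above; the variable names are mine, and the 300 billion figure is the one cited from Kodner et al rather than a claim about any particular model.

```python
# Figures as quoted in the text.
child_tokens_per_year = 10_000_000   # a child's yearly exposure
llm_training_tokens = 300_000_000_000  # training data quoted for LLMs

# Years a child would need to hear as many tokens as the LLM is trained on.
years_to_match = llm_training_tokens / child_tokens_per_year

# The same comparison against a child's total exposure by age three.
child_tokens_by_three = 3 * child_tokens_per_year
ratio_at_age_three = llm_training_tokens / child_tokens_by_three

print(f"Years of child input needed to match the LLM: {years_to_match:,.0f}")
print(f"LLM input relative to a three-year-old's: {ratio_at_age_three:,.0f}x")
```

At the quoted rates a child would need on the order of thirty thousand years of listening to match the training data, which makes vivid how little the two learning situations have in common.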

 Competence & Performance in LLMs & Humans.

The divergence in the linguistic data which LLMs and humans are trained on is a clear indicator that they work off different underlying competencies. Other differences emerge in terms of the materials they use: in terms of computational considerations, chips are faster than neurons (Long 2024). To the degree that their outputs are similar, that doesn’t prove that they rely on the same underlying competencies (Firestone 2020, Kodner et al 2024, Milliere & Buckner 2024). Kodner et al give the example of two watches, both of which keep time accurately, but one of which is digital and the other mechanical. Despite similar performances, they achieve them through different underlying competencies (Kodner et al 2024).

   But Are Human and LLM Outputs Analogous?

The question of whether human and LLM outputs are analogous is obviously vital if we want to understand whether they operate using the same underlying competencies. We have already seen that the two systems seem to learn differently: one despite the poverty of stimulus and one because of the richness of stimulus. This points towards different underlying innate competencies. Different underlying competencies aside, the next section will demonstrate that the performances of each system are very different.

          At a superficial level LLM and human outputs appear very similar. Chat GPT-4 can to some degree fool a competent reader into thinking that a human produced the outputted sentences. Clark et al (2021) studied human-created stories, news articles, and texts and got an LLM to create similarly sized stories. 130 participants were tested, and they couldn’t tell the human texts from the LLM texts at a rate greater than chance (Schwitzgebel et al 2023). Schwitzgebel et al (2023) created a Large Language Model which was able to simulate Daniel Dennett’s writing style. Though experts were able to distinguish the model’s outputs from Dennett’s own at rates barely above chance, it was surprising how close-run the thing was, given that it was scholars who were experts on Dennett who were being probed.

While an LLM can construct sentences which appear to be analogous to ordinary human sentences, there are obviously a lot of disanalogies. LLMs can reliably produce syntactically sound sentences and sentences which are semantically interpretable, but the words the LLM uses have no meaning to the LLM, only to the humans that interpret its output. The reason that they have no meaning is that they are not grounded in sensory experience for the LLM, whereas for humans they obviously are (Harnad 2024). The LLM, unlike the human, isn’t talking about any state of affairs in the world; rather, it is merely grouping together tokens according to how the tokens are fed into it in its training data.

When criticisms are made that LLMs’ outputs don’t have meanings, we need to be careful how we parse these statements. Obviously, the outputs have meaning in the sense that LLMs can distinguish between two sentences which are syntactically identical, but which don’t have the same meaning. But the sentences do not have meaning in the sense of a referential relation between the words and a mind-independent reality. However, given that the idea of explicating meaning in terms of a word-world relation has been questioned (Chomsky 2000, Quine 1973), it is difficult to know what to make of the claim that LLMs don’t have meanings because their words don’t refer to mind-independent objects.

Bender & Koller (2020) used a thought experiment to illustrate why they believed that LLMs did not mean anything when they responded to queries. The thought experiment imagines two people trapped on different islands who are communicating with each other in code through a wire which is stretched between the islands along the ocean floor. In this thought experiment an octopus who is a statistical genius taps into the wire and can communicate with the people on the islands through pattern recognition. But though he is able to figure out what code to use, and when, from the context in which the code is used and the patterns in which its elements are grouped together, he has no understanding of what is being said. Bender & Koller argue that if a person on one of the islands asked the octopus how to build a catapult out of coconuts and wood he wouldn’t know how to answer, because he has no real-world knowledge of interacting with the world and is instead merely grouping brute statistical patterns together.

Piantadosi & Hill (2023), in their “Meaning Without Reference in Large Language Models”, argue that thought experiments such as the octopus one fail because they make the unwarranted assumption that meaning can be explicated in terms of reference. They argue that meaning cannot be explicated in terms of reference for the following reasons:

  • There are many terms which are meaningful to us, but which have no clear reference e.g. Justice.
  • We can think of concepts of non-existent objects. These have meanings but don’t refer to anything in the mind-independent world.
  • We have concepts of impossible objects such as a round square or a perpetual motion machine.
  • We have concepts which pick out nobody, but which are meaningful: e.g. the present King of Ireland.
  • We have concepts which have meaning but which don’t refer to concrete particulars e.g. concepts of abstract objects.
  • We have terms which have different meanings but the same reference {morning star-evening star}.

They go on to argue that conceptual role theory, in which meaning is determined in terms of entire structured domains (like Quine’s web-of-belief), plays a large role in our overall theory of the world. But they do nonetheless acknowledge that reference plays some role in grounding our concepts, just not as large a role as some theorists criticizing LLMs believe. They are surely right that as our theory of the world becomes more and more sophisticated it will, as Quine noted, face empirical checkpoints only at the periphery. Nonetheless, when humans are acquiring their language in childhood they must go through a period where they learn to use the right word in the presence of the right object, and somehow learn to triangulate with their peers in using the same word to pick out a common object in their environment.

As we saw above, Piantadosi & Hill (2023) shared concerns about crude referentialist theories of meaning and their relation to LLMs, but they did acknowledge that there are word-world connections between some words and objects in the mind-independent world. Quine famously tried to connect our sentences to the world through his notion of an observation sentence. He argued that an observation sentence was a sentence which members of a verbal community would immediately assent to in the presence of the relevant non-verbal stimuli.

But Quine immediately ran into a difficulty with this approach. The difficulty stemmed from the fact that Quine found it hard to say why different members of a speech community assented to an observation sentence. Quine tried to cash out the meaning in terms of stimulus meaning where a particular observation sentence was associated with a particular pattern of sensory receptors being triggered.  But this made intersubjective assent on observation sentences difficult to explain given that different subjects would obviously have different patterns of sensory receptors triggered in various ways in response to the same observation sentences. What pattern of sensory receptors was triggered by what observation sentences would be largely the result of each subject’s long forgotten learning history. All of this made it difficult to see how Quine could make sense of a community assenting to an observation sentence being used in various circumstances.

Quine eventually made sense of intersubjective assent to an observation sentence by appealing to the theory of evolution. He argued that there was a pre-established harmony between our subjective standards of perceptual similarity and trends in the environment, and that humans as a species were shaped by natural selection to ensure that they shared perceptual similarity standards. This fact was what made it possible for humans to share assent and dissent to observation sentences being used in certain circumstances.

For Quine, shared perceptual similarity standards and reinforcement for using certain sounds in certain circumstances gave observation sentences empirical content. But to achieve actual reference Quine argues that we need to add things like quantifiers, pronouns, demonstratives etc. The key point, though, is that observation sentences link with the world (even if in a manner less tight than objective reference) because our shared perceptual similarity standards match objective trends in our environment. This is what makes intersubjective communication possible. LLMs are not responsive to the environment in anything like the way humans are when they are acquiring their first words. Even prior to learning the referential capacities of language, children using observation sentences are still in contact with the world. To this degree, then, Piantadosi & Hill’s concerns about reference are beside the point. As humans begin to acquire their first words, they do so through observation sentences which are connected to our environment. The fact that when we acquire a language complete with words and productive syntax we can speak about theoretical items, fictional items, impossible objects etc is interesting but doesn’t speak to the LLM issue. The fact is that as humans first acquire their words they do so in response to their shared sensory environment. LLMs do not learn in this way at all. Their training is entirely the result of exposure to textual examples which they group into tokens based on the statistical likelihood of textual data occurring together. So, while humans eventually learn to speak about things which are not sensorially experienced, their first words are keyed to sensory experience, and this is a key difference between them and LLMs. In the next section I will consider some objections to this view, but firstly I want to recap what we have discussed so far.

                     Interlude: Brief Recap.

Thus far we have considered the Rationalist-Empiricist debate as it played out in debates between Skinner, Quine and Chomsky, noting that the consensus is that Chomsky’s rationalism won the day over the empiricism of Quine and Skinner. In recent years, with AI getting more and more sophisticated, there has been a resurgence of interest in AI and its relation to the rationalist-empiricist debate, with some theorists arguing that empiricist theories of cognition are, contrary to what was previously believed, being realized by some current AI. Following this I discussed whether this was the case, relating it to previous rounds of the rationalist-empiricist debate, and argued that, contrary to Childers et al, an AI can learn in an empiricist manner even if it does so via built-in constraints. With this argued, I then questioned whether the fact that some AI learns in a particular manner has much bearing on human cognition. To sharpen the issue, I narrowed the debate down to whether, if LLMs learned in an empiricist manner, this would tell us much about the rationalist-empiricist debate in humans.

          To decide on this issue, I considered whether LLMs and humans learned in the same manner, concluding like many others that they learn in entirely different ways. Humans learn despite the poverty of stimulus while LLMs learn because of the richness of their stimulus. I argued that the different manners in which they learn indicate that there are probably different competencies underlying their respective performances.

     AI and the Relevance of Rationalism and Empiricism.

Above we discussed some disanalogies between LLMs and human linguistic capacities. Two main differences were noted: (1) Differences in the stimulus needed for the respective agents to acquire language, which indicate different underlying competencies. (2) LLMs’ language is not grounded, and human language is grounded; this difference is a result of the different ways in which they acquire their language. Humans begin by being responsive to the world in a triangular relation with others, while the LLM acquires its language through statistical grouping of the text it is trained on.

So, given that human and LLM outputs are the result of exposure to different quantities and types of data, which indicate different underlying competencies, a question arises as to the relevance of AI to understanding human linguistic capacities. Earlier we discussed the debate between Chomsky and Skinner & Quine and its relation to current debates on the nature of LLMs. But given the disanalogies we have noted between LLMs and humans, it is questionable whether LLMs have anything to tell us about the rationalist-empiricist debate at all.

The debate between the rationalists and empiricists was never centred on whether innate apparatus was necessary for a creature to learn a particular competency. All sides of the debate agreed that some innate apparatus was necessary to explain particular competencies, and that the degree of innate apparatus which needs to be postulated is to be determined empirically. The empiricist position was one which argued that humans learned primarily through data-driven learning (supported by innate architecture), while the rationalist argued that humans learned through innate domain-specific competencies being triggered by environmental input.

However, given the differences between LLMs’ and humans’ linguistic competencies, their relevance to each other in the rationalist-empiricist debate is in doubt. Even if we can conclusively demonstrate that an LLM or a DCNN learns in an empiricist manner, this will not tell us anything about whether humans learn in an empiricist manner. Because human competencies are so different from an LLM’s, it is simply irrelevant to the rationalist-empiricist debate whether LLMs learn in an empiricist manner or not.

It is theoretically possible that a philosopher could argue a la Kant that any form of cognition will a priori need to implement innate domain-specific machinery to arrive at its steady state. And one could offer empirical data to support this a priori claim by showing that very different forms of cognition, e.g. humans and LLMs, both learn by implementing innate domain-specific architecture. But this isn’t how the debate has played out in the literature. Typically, the literature argues that LLMs are largely empiricist, and that this fact vindicates empiricism in general, or it is argued that they are largely rationalist, and that this fact vindicates rationalism. I have argued here that, given the very different nature of LLMs and humans, it is irrelevant to the question of how humans acquire their knowledge whether LLMs are rationalist or empiricist.

But this isn’t to say that it is unimportant whether LLMs or other forms of AI learn in a rationalist or an empiricist manner. There are still practical issues in engineering as to whether one is more likely to be successful in building things like Artificial General Intelligence using empiricist architecture or not. Thus, people like Marcus (2020) argue that while empiricist principles may take us some of the way, we will not be able to build AGI unless we build substantial innate domain-specific knowledge into the system. Marcus even argues that the best way to understand what innate architecture needs to be built into our AI models is to look to our best example of an organism with general intelligence, i.e. humans.

The engineering question is extremely interesting from a practical point of view and could motivate an interest in whether LLMs or other types of AI learn in a rationalist or an empiricist manner. But when it comes to LLMs, the question of whether they learn in an empiricist or a rationalist manner is largely irrelevant to the rationalist-empiricist question in relation to humans.

                            Bibliography

Berwick, R, & Pietroski, P, & Chomsky, N. 2011 “Poverty of the Stimulus Revisited.” Cognitive Science. Vol 35. Issue 7 pp. 1207-1242.

Bender, E, & Koller, A. 2020. “Climbing Towards NLU: On Meaning, Form and Understanding in the Age of Data.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Brown, R, & Hanlon, C. 1970. “Derivational complexity and order of acquisition in child speech.” In Hayes, J.R. (ed). Cognition and the Development of Language. New York: Wiley.

Buckner, C. 2018. “Empiricism without Magic: Transformational Abstraction in Deep Convolutional Neural Networks”. Synthese. Vol 195 pp. 5339-5372.

Childers, T, & Hvorecky, J, & Majer, O. 2023. “Empiricism and the foundations of Cognition”. AI and Society. Vol 38 pp. 67-87.

Chomsky, N. 1959. “A review of B.F. Skinner’s Verbal Behaviour”. Language, 35 pp. 26-57.

Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, N. 1966. Cartesian Linguistics. New York: Harper & Row.

Chomsky, N. 1969. “Quine’s Empirical Assumptions”. Synthese 19 pp. 53-68.

Chomsky, N. 1975. Reflections on Language. New York: Random House.

Chomsky, N. 1986. Knowledge of Language: Its Nature, Origin and Use. New York, New York: Praeger.

Chomsky, N. 2000. New Horizons in the Study of Language and Mind. Cambridge: MA.

Chomsky, N, Roberts, I, Watumull, J. 2023. “The False Promise of ChatGPT.” The New York Times.

Chomsky, N & Katz, J. 1975. “On Innateness: A Reply to Cooper.” Philosophical Review. 84 pp. 70-84.

Clark, A, & Lappin, S. 2011. Linguistic Nativism and the Poverty of the Stimulus. Wiley.

Collins, J. 2007. “Linguistic Competence Without Knowledge of Language”. Philosophy Compass 2 (6) pp. 880-895

Crain, S, & Nakayama, M. 1987. “Structure Dependence in Grammar Formation”. Language, Vol 63 pp. 522-543.

Firestone, C. 2020. “Competence and Performance in Human Machine Comparisons.” Proceedings of the National Academy of Sciences. 117 (43) pp. 26562-26571.

Fodor, J. 2003. Hume Variations. Oxford University Press.

Hart, B & Risley, T, & Kirby, J. 1997. “Meaningful differences in the everyday experiences of the young American.” Canadian Journal of Education. Vol 22. Issue 3.

Harnad, S. 2024. “Language Writ Large: LLMs, ChatGPT, Grounding, Meaning, and Understanding.”

Jackendoff, R. 2002. Foundations of Language. Oxford: Oxford University Press.

Katzir, R. 2023. “Are Large Language Models Poor Theories of Linguistic Cognition: A Reply to Piantadosi”.

Kodner, J, & Payne, S, & Heinz, J. 2023. “Why Linguistics will thrive in the 21st Century: a reply to Piantadosi”.

Long, R. 2024. “Nativism and Empiricism in Artificial Intelligence”. Philosophical Studies. Vol 181 pp 763-788.

Marcus, G. 2020. “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence”

Mollo, D, & Milliere, R. “The Vector Grounding Problem”.

Milliere, R & Buckner C. 2024. “A Philosophical Introduction to Language Models: Part 2”.

Piantadosi, S. 2023. “Modern Language Models Refute Chomsky’s Approach to Language.”

Piantadosi, S, & Hill, F. 2023. “Meaning without Reference in Large Language Models”.

Pinker, S. 2002. The Blank Slate: The Modern Denial of Human Nature. New York: Viking.

Pullum, G, K. & Scholz, B. 2002 “Empirical assessment of stimulus poverty argument.” The Linguistic Review. 19 pp. 9-50.

Quine, W. 1968. “Reply to Chomsky”. Synthese 19 pp. 274-283.

Quine, W. 1970. “Methodological Reflections on Current Linguistic Theory”. Synthese 19, pp. 264-321.

Quine, W. 1974. The Roots of Reference. La Salle. Open Court Press.

Sampson, G. 2002. “Exploring the Richness of the Stimulus”. The Linguistic Review. 19 pp. 73-104.

Schwitzgebel, E, & Schwitzgebel, D, & Strasser, A. 2024. “Creating a Language Model of a Philosopher”. Mind & Language. 32 (2) 237-259.

Skinner, B.F. 1957. Verbal Behaviour. New Jersey: Prentice-Hall Inc.  


[1] For an interesting reply on Skinner’s behalf see Kenneth McCorquodale.

[2] In the following I am following Pullum and Scholz (2002) ‘Empirical Assessment of Stimulus Poverty Arguments’.

Chat GPT and the Meaning of Meaning.

In the last decade there has been an explosion of literature on Large Language Models. The availability of Chat GPT-3 and Chat GPT-4 has created an interest in the topic among the general public. The resulting literature has to some degree led to a rehashing of age-old debates in philosophy which began with the onset of symbolic systems in the 1980s. John Searle, circa 1980, claimed with his Chinese Room argument that there was no reason that a syntactic system should automatically acquire semantic content. Searle had his critics, such as Dennett with his 1991 Systems Objection. It is fair to say that most computer scientists took Dennett’s side in the debate.

                With the rise of Chat GPT and other LLMs this debate has arisen again. But the debate is a bit different. Virtually nobody thinks that LLMs are conscious agents. Furthermore, nobody thinks that, as they stand, they exhibit artificial general intelligence. Nonetheless, a debate has begun as to whether, as they stand, they exhibit any meaning. Leaving aside the question of “hallucinations” which sometimes occur when people use GPT, the answers which the device provides generally seem to be pragmatically coherent, syntactically well formed, and semantically coherent from the point of view of people interpreting them.

                But despite users finding the models semantically interpretable, theorists argue that there is little reason to attribute any semantics to the systems. Now, to evaluate this claim one needs to be explicit about what is meant by semantics in such discussions. There is a long tradition in linguistics, model theory, and philosophy of thinking of semantics in terms of a referential relation. Thus, the meaning of “Apple” is a physical apple in our intersubjective world of experience. The same would be true of other words in our language. On this conception LLMs clearly have no semantics because they refer to no mind-independent properties and objects. The story would go that there is a causal interaction between humans, their socio-linguistic community, and shared objects of experience as humans causally interact with their world and others. On this view it is blindingly obvious that Chat-GPT doesn’t have any semantics in the sense of a word-world relation.

                But semantics in the sense of referential semantics isn’t the sole criterion of semantics. Critics of referential semantics abound in the literature. Some critics, such as Chomsky, who do think of language as involving a representational system, don’t think of language as being explicable in terms of a referential relation (Chomsky 2000), while philosophers such as Quine have drawn similar conclusions using different arguments (Quine 1960). However, reference aside, it is apparent that humans’ words clearly have meaning, even if this meaning cannot be accounted for in terms of a crude referential relation. And if our words have a meaning without a crude referential relation then there is no reason to think that Chat GPT-4 or any other LLM needs to use a referential relation in order to mean something by what it says.

                Nonetheless, there seems to be something almost perverse in attributing semantics to an unthinking Large Language Model. But as Sogaard (2022) has noted, we need some concept of meaning to explain how we can differentiate between syntactically identical sentences such as (1) Colourless Green Ideas sleep furiously, (2) Happy Purple dreams dance silently. We need something to decide what distinguishes these syntactically identical sentences. And a key candidate is the meanings of the individual words {however we parse meaning}. In the next section we will try to explicate what we mean by meaning and whether LLMs exhibit meaning in any of the traditional senses of meaning.

                    Meaning as Reference.

                A commonsensical parsing of what people understand by meaning is that they understand it in terms of reference. Nonetheless, experimental philosophy has revealed some cultural variation on how people conceive of the reference relation (Deutsch 2009). However, while it is interesting to note cultural variation in how people intuitively understand the reference relation, the job of the philosopher is to evaluate the logical coherence of understanding meaning in terms of a reference relation.

                Obviously, there is a class of words which clearly cannot be cashed out in terms of a reference relation. Such words are grammatical particles such as “to”, “not”, “ing”, “s”, “ed” which modify nouns and verbs. These particles obviously cannot be parsed in terms of a reference relation. The same is true of logical operators such as “And”, “Or”, “Not”, “If and Only If” etc. Any attempt to understand the grammatical particles in terms of a reference relation is defunct, and understanding the logical operators in terms of reference seems equally hopeless unless one were to appeal to reference to some non-physical Platonic Realm.

                But logical operators and grammatical particles aside, people typically think of the reference relation as best cashed out in terms of nouns and categories which describe and modify nouns such as verbs and adjectives. Thus, a sentence such as ‘The tall man runs fast’ can be cashed out referentially as referring to a particular man to whom we are ascribing certain properties. At an intuitive level we could argue that the meaning of this sentence can be cashed out in referential terms.

                But even with nouns we run into intractable problems in cashing out meanings in terms of reference. Fodor and Pylyshyn (2015) have summed up these problems. We use words to describe objects which we cannot cash out in sensory experience. We talk about extremely large entities such as the multiverse which cannot be cashed out in any simple way in terms of a referential relation, and about extremely small objects such as Quarks which cannot be directly observed (and hence cannot be cashed out in terms of a reference relation). There are impossible objects such as a Round-Square which clearly have a meaning, but which do not refer to any objects. There are fictional objects such as a Unicorn which do not refer to any objects, but which clearly have meaning.

                When combined with the fact that grammatical particles and logical constants cannot be cashed out in terms of reference, the above considerations show that many of our concepts which have meanings cannot be cashed out in terms of reference. Therefore, it is rational to doubt that meanings can be cashed out in terms of reference. But this brings us full circle to our considerations of LLMs and whether the words they use have meanings.

                 Some points to note. The sentences used by LLMs clearly have a meaning to us as interpreters. We can parse their sentences as either sensible or senseless, as true or false etc. But the question is whether the words the LLMs use have meaning for the LLM. The answer to this question is clearly no. And one of the key reasons is that they do not acquire their words in terms of a grounding in sensory experience. Hence none of the words they use mean anything to them: the words are not grounded in sensory experience and are nothing other than collections of words grouped together based on statistical patterns derived from the LLM’s training data.

                       The Contradiction.

But we seem to have arrived at a contradiction here. We have argued that LLMs do not have meanings because their words are not grounded in sensory experience, but when it comes to humans, we have argued that we cannot cash out meanings in terms of reference. But this conflict is more apparent than real. When it comes to sensory grounding, it occurs in the context of what Quine calls the mid-level scheme of ordinary enduring physical objects, or what Husserl would call the lifeworld: the intersubjective shared world which embodied humans live in.

                It will help to begin simply. Let us take the grounding of a word such as “Apple”. Before the child acquires any words, their cognitive apparatus is engaged in ontological assumptions. Children will perceptually engage in categorising the objects they experience, and they will have implicit assumptions about object behaviour such as object permanence, object solidity etc (Carey, 2009). Such ontological categories, shared by all humans, will ensure a surprising convergence between humans when they interpret the sounds others are using in relation to shared objects of experience (Pinker 1994). But in order for a child to interpret what the sounds being spoken to him signify, the child will need more than pre-linguistic ontological commitments to guide him. He will also need to know that the sounds being used in his presence are intended in a communicative sense. To do this he will need a theory of mind, the ability to interpret pointing in terms of a shared object of experience, the ability to track eye contact in terms of a shared object of experience, and a cooperative instinct.

                When thinking about this issue in terms of epistemology, Quine called this the problem of how we go from our meagre input (the impact of light and sound on our sensory receptors) to torrential output (our total scientific theory of the world). Quine recognised that we couldn’t just appeal to shared objects of experience to explain the epistemological question of how we go from stimulus to science; we must explain how this is done. And his explanations relied on both a pre-established harmony of perceptual similarity space built by natural selection and shared empathy to ensure that we understood that others who were gesturing towards objects had a psychology and motivation similar to our own.

                Sanford and Hayes (2014) cashed out our ability to go from simple referential relations between two people to more complex capacities such as the capacity to grasp the frame of coordination, and other complex relational frames. They do this by appealing to group selection as a factor which selected for cooperation amongst humans, which in turn made it easier to triangulate on shared objects of experience.

                The key point to note is that on this picture two things make it possible for a child to triangulate with a parent on a shared object of experience when they are acquiring their basic concepts: (1) the prelinguistic ontological commitments to object ontology (hinted at by Quine’s pre-established harmony and demonstrated by results in developmental psychology by Carey, Spelke et al); (2) cooperation and empathy (an empathic instinct argued for by Quine and demonstrated by Tomasello 2013, Sanford & Hayes 2014). This epistemological triangle results in the child’s early concepts being grounded in an umwelt shared with their species and social community. Now, when a child has acquired his grounded base level concepts, he will have the capacity to learn other concepts which are not grounded, e.g. Unicorn, Quark, Round-Square etc. But this process will involve using combinatorial syntax (merge), along with analogical reasoning, and perhaps the use of relational frames to create more complex concepts which cannot be cashed out in terms of our shared sensory umwelt.

                This removes our supposed contradiction. A lot of our complex concepts are not explicable in terms of our shared umwelt. Nonetheless, our base level concepts are grounded in our shared umwelt through triangulation on shared objects of experience when humans are acquiring their concepts. This triangulation is something that LLMs do not have. Rather than triangulate on shared objects of experience, they, simply put, have to detect statistical patterns in the input they receive as well as modify them depending on the biases and/or reinforcement learning the model is subjected to.

                So, let us go back to our reference problem from above. We noted that impossible objects, very small objects, very large objects, and fictional objects have meanings which cannot be cashed out in terms of reference. This created a puzzle: when LLMs answer questions, their answers appear meaningful to us, yet the sceptic would argue that they cannot be meaningful because the words aren’t picking out mind independent objects. But our consideration of human concepts shows that a lot of our concepts do not have meanings conferred in terms of mind independent objects. If cashing out meanings in terms of mind independent objects isn’t what makes human words meaningful and LLMs’ words meaningless (unless interpreted by humans), one wonders what is. The answer is that when humans are acquiring a word, they do so by triangulating on a shared object of experience, made possible by shared concepts of objects and a shared instinct to cooperate. This means that as humans acquire their base level concepts, they do so in terms of a shared world of experience in which multiple humans are causally interacting with the world while communicating with each other. While humans can learn new concepts which are not grounded directly in the lived world of sensory experience, the concepts they first acquire are grounded in the shared human world of experience. For LLMs, by contrast, none of their words are grounded; hence they mean nothing by what they say, though we as agents with grounded concepts can attribute meaning to their linguistic output (Harnad, 2024).

                                  Some Objections

                Not everyone agrees that grounding is what separates humans and LLMs. Some theorists even go as far as to argue that LLMs can derive meaning via reference in a similar manner to humans.

“Referential semantics or grounding now amounts to learning a mapping between the Transformer model vector space and this target space. But why, you may ask, would language model vector spaces be isomorphic to representations of our physical, mental, and social world? After all, language model vector spaces are induced merely from higher-order co-occurrence statistics. I think the answer is straight-forward: Words that are used together, tend to refer to things that, in our experience, occur together. When you tell someone about your recent hiking trip, you are likely to use words like mountain, trail, or camping. Such words, as a consequence, end up close in the vector space of a language model, while being also intimately connected in our mental representations of the world. If we accept the idea that our mental organization maps (is approximately isomorphic to) the structure of the world, the world-model isomorphism follows straight-forwardly (by closure of isomorphisms) from the distributional hypothesis.” (Sogaard, 2022 pp. 442-443).

Even if we accept Sogaard’s tendentious statement that the words which are statistically likely to occur together are to some degree isomorphic to structural features of the world, we still have a gap to fill: we need to be told how such vector spaces are used to refer to objective features of the mind independent world. It is a leap to go from saying that pattern A is roughly isomorphic with pattern B to saying that a subject is using pattern A as a representation of pattern B. And thus far there is no reason to think that GPT or any other LLM is using words in a representational manner at all.

                 Bender and Koller offer a thought experiment which is useful as an intuition pump for demonstrating the lack of grounding in LLMs’ speech. They ask us to think of two agents trapped on separate islands who can communicate using a code they send to each other through a wire. They then ask us to imagine an octopus, a statistical genius, who lives beneath the surface of the ocean. This genius manages to intercept the wire and learns how to interpret the signals (pick out statistical patterns in the data) and communicate with the islanders. He becomes so effective that he manages to fool Islander A into believing that he is Islander B. But Bender and Koller argue that if Islander A asked the octopus to find some coconuts, build a coconut catapult on the island, and then report its findings back, the octopus would not be able to do so. It is important to note that the fact that ChatGPT could answer this question by having access to the world’s written material is irrelevant. The point is that while the octopus can mimic the islanders based on detecting statistical patterns in their conversations, he would not have the capacity to think up an answer to the catapult question unless he was either repeating what he had been exposed to before, behaving like a large “stochastic parrot”, or was an embodied agent whose understanding of the concepts was grounded in experience interacting with the objects under discussion.

                Bender and Koller’s thought experiment reiterates what we have been arguing throughout this blogpost. While LLMs can appear surprisingly sophisticated when answering questions about various topics, there is no evidence that their words have any meaning in the sense that human words have meaning. And the reason for this is that the models are not embodied and are not causally interacting with a world they are communicating about with other agents.

Aspects of a Theory of Syntax and Behaviourism

                          Introduction

In this paper the competence-performance distinction first proposed by Chomsky (1965) will be analysed in relation to behavioural science. The paper will consider three primary criticisms of behaviourism from the point of view of the competence-performance distinction: (1) behaviourists’ decision to stick with describing speech patterns and habits prevents them from constructing a credible theory of performance (Chomsky 1965), (2) behaviourists’ methodology of only dealing with performance and eschewing explanations in terms of competence precludes behaviourism from being a serious science (Collins 2007), (3) behaviourists don’t engage in idealisations and are committed to counting every cough in an instance of verbal behaviour, and hence reduce their science to triviality (Jackendoff 2002). It will be demonstrated, by considering developments in behavioural science, that these criticisms are not justified. To illustrate the point, I will discuss explanations in both behavioural psychology (as exemplified by relational frame theory) and behaviourism in philosophy (as exemplified by Quine). These examples will illustrate behaviourists appealing to underlying competencies to explain behaviour as well as using idealizations.

The fact that behavioural scientists use idealizations and appeal to competencies doesn’t tell us much about the truth of their overall theories. But for interdisciplinary interaction between behaviourists and cognitivists to be fruitful, it is necessary that they understand each other’s positions. To that end it is imperative that the behaviourist position on the competence-performance distinction is explicated in detail.

Chomsky on the Competence and Performance Distinction

Sixty years ago, in his ‘Aspects of the Theory of Syntax’, Chomsky first explicitly made his distinction between competence and performance. In Aspects he is clear that he is not arguing against the study of performance as a field. Rather he is claiming that if one wants to study performance, then one will need to do so armed with an understanding of underlying competence mechanisms (Chomsky 1965 p. 10). When discussing competence and performance Chomsky makes a distinction between acceptability judgements made by subjects and the actual grammaticalness of sentences. He argues that acceptability judgements are performance data which can be explained by underlying competence mechanisms (Ibid p. 11).

            He goes on to state that the following factors lead to unacceptability judgements: repeated nesting, self-embedding, nesting of a long and complex element, etc. (Ibid p. 13). And he explains the unacceptability of, for example, repeated nesting in terms of the finiteness of our memory (Ibid p. 14). Chomsky notes that people have been critical of generative grammar because of its focus on competence and lack of interest in performance. But he claims that the only research into performance that has had any theoretical interest has been research led by insights from underlying competence systems (Ibid p. 15). He went on to criticize descriptivist and classificatory philosophies as standing in the way of developing an adequate theory of performance:

“It is the descriptivist limitation-in-principle to classification and organization of data, to “extracting patterns” from a corpus of observed speech, to describing “speech habits” or “habit structures,” insofar as these may exist, etc., that precludes the development of a theory of actual performance.” (Ibid p. 15)

Chomsky doesn’t specify which theorists argue in this methodologically errant manner, but it will be demonstrated that his criticisms don’t apply to behaviourists in either psychology or philosophy.

Competence and Performance and Behaviourism

In this section I will look at the competence/performance issue from the point of view of behavioural science. I will argue that contemporary behavioural science cannot be described in terms of looking for “habit structures” or extracting patterns from the classification or organisation of data. Rather, behavioural science is discovering facts about the emergence of linguistic usage in the context of tightly controlled experimental settings. These discoveries in behavioural science, such as (1) the interaction of rule-governed behaviour with the contingencies of reinforcement, (2) the emergence of stimulus equivalence, and (3) the emergence of relational frames, are experimentally controlled emergent properties of linguistic behaviour. Their discovery goes beyond mere “description” or “extracting patterns” from data. Furthermore, there is no reason to think that such emergent species-specific performance data is explicable in terms of “habit structures”.

            While the performance data discovered in behavioural science cannot be reduced to “description” or “extraction of patterns”, or explained in terms of “habit structures”, behavioural scientists haven’t provided much by way of an explanation of how these capacities emerge. Such an explanation is wanting because of an uncritical reliance on Skinner’s crude pragmatist philosophy. However, pace Skinner, any attempt to explain emergent behavioural capacities will be reliant on a distinction between the emergent behaviours and the underlying capacities which make the behaviours possible. Any cogent explanation of emergent behaviours will rely on behavioural science adopting a distinction between competence and performance analogous to that recommended by Chomsky (1965). We will see that some behaviourists are already moving in this direction when we discuss Hayes and Sanford (2014) later in the paper.

Theorists have long critiqued behaviourists for failing to adequately account for the distinction between competence and performance. For example, the philosopher John Collins has argued that behaviourism is not a serious discipline because it doesn’t even try to explain the underlying capacities responsible for behaviour (Collins 2007 p. 883). He argues further that a focus on competence doesn’t involve ignoring performance; rather, it involves explaining performance through explicating the mechanisms underlying linguistic competence (Ibid p. 883).

            Collins’s claim that behaviourism is not a serious discipline is problematic. Experimental work in behaviourism over the last hundred or so years has yielded a wide range of results which wouldn’t have been possible without the behaviourist research programme. The discovery of classical conditioning and operant conditioning has revolutionized both psychology and biology. Since Chomsky wrote his ‘Aspects of the Theory of Syntax’ 60 years ago the field of behaviourism has continued to flourish, demonstrating that it is a serious discipline which has made considerable advances in that time. Some discoveries of note in behavioural science have been Robert Rescorla’s discoveries of predictive mechanisms underlying classical conditioning (Rescorla 1969), the use of these predictive mechanisms to explain taste aversion in rats (Hayes & Sanford 2014), experimental literature demonstrating contingency insensitivity in rule following creatures (Galizio 1979, Shimoff et al 1981, Skinner 1984), the discovery of emergent stimulus equivalence (Sidman 1971), and the discovery of emergent relational frames (Hayes & Thompson 1989, Hayes 2001).

            Even Skinner’s much maligned book ‘Verbal Behaviour’ has spawned hundreds of experiments on human subjects which have demonstrated some experimental control over his seven verbal operants (Sautter & LeBlanc 2006). Likewise, behavioural science has demonstrated its use in applied disciplines, such as Applied Behaviour Analysis. So, any claim that behaviourism isn’t a serious discipline is refuted by the degree of prediction and control it gives us over certain domains of interest.

            Nonetheless, Collins does have a point. There is a reluctance on the part of some in behavioural science to provide explanations of behavioural patterns in terms of underlying competencies, some of which may be innate. This reluctance doesn’t demonstrate that behaviourism isn’t a serious discipline, but it does seriously limit the capacity of the discipline to explain the discoveries it makes. Later in the paper I will explore some recent tentative attempts to explain the competencies underlying our capacity to relationally frame. I will argue that these tentative attempts are a step in the right direction in bridging the gap between behavioural science and cognitive science.

                What is Linguistic Competence?

            When discussing Chomsky’s distinction between competence and performance it is typical to justify the distinction in terms of idealizations which usually occur in any discipline. When Chomsky is talking about linguistic competence, he notes that he is doing so using a series of idealizations which are necessary to understand the complex object under study:

“Linguistic theory is concerned primarily with an ideal speaker listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance.” (Chomsky 1965 p. 3)[1]

A grammar will then be a description of the rules the idealized subject is using when understanding or speaking their sentences (Ibid p.4). Chomsky’s notion of an idealized subject’s competence is analogous to the idealizations which physicists use all the time to gain traction over the physical phenomena they are studying. A cliched example is of a physicist using idealizations such as studying a frictionless plane to give them an understanding of force and energy (Jackendoff 2002 p. 33).

            Chomsky’s distinction between competence and performance, which relies on idealizations, is a sensible proposal and one which has led to success in gaining traction on the nature of language. But while the use of idealizations is justified and common practice in science, Chomsky’s use of them has had its critics. Most theorists would agree that some level of idealization is needed, yet it has been argued persuasively that Chomsky’s use of idealizations and his competence/performance distinction have ossified in such a manner as to make aspects of his theories irrefutable:

“Still, one can make a distinction between “soft” and “hard” idealizations. A “soft” idealization is acknowledged to be a matter of convenience, and one hopes eventually to find a natural way to re-integrate the excluded matters. A standard example is the fiction of a frictionless plane in physics, which yields important generalizations about forces and energy. But one aspires eventually to go beyond the idealization and integrate friction into the picture. By contrast, a “hard” idealization denies the need to go beyond itself; in the end it cuts itself off from the possibility of integration into a larger context…It is my unfortunate impression that, over the years, Chomsky’s articulation of the competence-performance distinction has moved from relatively soft…to considerably harder.” (Jackendoff 2002 p. 33)

Other theorists have made similar points (Lakoff, G. 1987 p. 181, Palmer, D.  2023 p. 528).

Jackendoff isn’t arguing that we shouldn’t use idealizations or a competence-performance distinction; rather, he is warning of the possibility of an idealizing assumption becoming so hardened that it shields a theory from considering alternative ways of dealing with the data of experience. Thus, as we saw above, Chomsky championed ignoring things like memory limitations, shifts of attention etc. But other work which doesn’t make this idealization actually uses memory limitations to explain the hierarchical embedding in our language (Christiansen and Chater 2023, Christiansen and Chater 2016). Likewise, speech errors, which Chomsky tells us to ignore in the name of idealization, have been used as productive data in explaining the cognitive processes underlying speech production (Hofstadter et al. 1989, Wijnen, F., 1992). Nonetheless, even Chomsky’s sternest critics would admit that his appeals to idealizations and the competence-performance distinction have yielded interesting linguistic generalizations.

Jackendoff argued that Chomsky needed to draw the distinction between competence and performance because other disciplines had failed to do so:

“Chomsky makes the competence-performance distinction in part to ward off alternative proposals for how linguistics must be studied. In particular, he is justifiably resisting the behaviourists, who insisted that proper science requires counting every cough in the middle of a sentence as part of linguistic behaviour.” (Jackendoff 2002 p. 30).

While it is undeniable that behavioural science has difficulty explaining its own data because it does not always explain behaviour in terms of underlying competencies, Jackendoff in the above quote is nonetheless engaging in a wild caricature of behaviourism.

            Some behaviourists obviously do argue that the subject matter they are interested in is actual instances of behaviour in a particular context (Palmer, D., 2023 p. 528). And in criticisms of Chomsky’s conception of linguistics they do argue that, as behaviourists, they are not obliged to explain possible sentences created by linguists which conform to the purported grammatical principles but are never used in actual verbal behaviour. If speakers do not actually use such verbal forms while behaving in relation to each other, the behaviourist considers it beyond their purview to explain them (Ibid, p. 529). But whatever one thinks of this behaviourist philosophy, it doesn’t entail the extremes that Jackendoff attributes to it.

            Jackendoff attributes to behaviourists the view that scientists should count “every cough in the middle of a sentence as part of linguistic behaviour”. This claim amounts to the assertion that behaviourists do not think that scientists should use idealizations. This is an absurd accusation; any science which deprived itself of idealizations would be overwhelmed with complexity and wouldn’t be able to get off the ground. It is simplifying idealizations which make it possible for a scientific theory to gain any prediction and control.

            But contrary to Jackendoff’s wild claim, behavioural science was from the start engaging in idealizations. Studying classical and operant conditioning in a laboratory setting was an idealization which assumed that these artificial experiments could explain the complex learning processes of animals in the wild. Furthermore, in his ‘Verbal Behaviour’ Skinner used a variety of idealizations. Skinner divided language into seven main verbal operants and treated them separately and under different types of stimulus control. But he noted that this was an idealizing assumption and that in practice the verbal operants would be intertwined and could be acquired together (Skinner 1957 p. 188).

            Furthermore, the notion of an operant is concerned with kinds of behaviour that share an effect on the environment and that, as a kind, are demonstrated to vary lawfully in their relations to other variables (Smith 1987 p. 289). Importantly, in his ‘Behaviourism and Logical Positivism’ Smith noted:

“The actual movements involved in pressing a lever, for example, might vary from instance to instance (e.g., left paw, right paw, nose), but they are equivalent with respect to producing reinforcement and they demonstrably function together in the face of changing conditions.” (ibid p. 289)

So even in the case of the operant itself idealisations are being used, which results in the focus being on classes of behaviour[2]; so Jackendoff’s notion that behaviourists are engaging in “counting every cough” is simply absurd. Behavioural science, like any other science, is up to its eyes in idealizations from the start.

            Nonetheless, while Jackendoff is incorrect in his assertion that behaviourists don’t use idealizations, he is correct that behaviourists have sometimes eschewed using the notion of competence in their explications of language. And this lack of a theory of competence does hinder their ability to explain the behavioural data which they have discovered.

         Behaviourists and Competence

Behaviourists by definition are concerned with behaviour. Hence, when Skinner wrote on language, he parsed this as the study of Verbal Behaviour. He divided language up into seven main verbal operants and recommended empirically studying how various schedules of reinforcement maintained the use of these Verbal Operants. While he justified this research program in terms of studies which had been done on non-human animals, in the 70 years since Verbal Behaviour was written there have been hundreds of experimental studies on the conditions responsible for maintaining the use of these Verbal Operants. Skinner’s theory is a paradigm case of an attempted explanation of human linguistic performance.

            Subsequent behaviourist research has moved beyond Skinner’s claims about language, e.g. Galizio (1979), Sidman (1971), Hayes & Thompson (1989) and Hayes (2001). But it still focuses on behaviour and the degree to which it can be predicted and controlled using conditioning. Much behavioural science has had a heavy emphasis on practical predictive control to aid in applied work. Thus, the Verbal Behaviour Approach inspired by Skinner is heavily involved in teaching functional communication to people with severe autism and/or an intellectual disability. Sidman’s work on Stimulus Equivalence sprang out of his work trying to teach people with an intellectual disability how to read. Relational frame theory, meanwhile, is used in attempts to teach people with intellectual disabilities functional communication, as well as being a tool in Acceptance and Commitment Therapy. This emphasis on applied work is sometimes used as a justification for a heavy emphasis on prediction and control (Dymond & Roche pp. 220-221). From a philosophical point of view, this emphasis on prediction and control is justified by appeal to a pragmatic philosophy of the type espoused by Stephen Pepper (1942).

            This emphasis on pragmatic prediction and control of the organism clearly lays heavy weight on performance data and on the histories of reinforcement which shape this performance. But while the underlying competencies are not focused on, they do play a role in the explications. Skinner noted throughout his career that phylogenetic factors are important in shaping the organism:

“Just as we point to the contingencies of survival to explain an unconditioned reflex, so we point out to contingencies of reinforcement to explain a conditioned reflex.” (Skinner 1974 p. 43)

He emphasised that natural selection shaped the structure of the organism through the scythe of survival of the fittest, in a manner analogous to the way operant conditioning shaped the behaviour of the organism through selection by consequences. In Skinner’s view selection by consequences rules in both phylogeny and ontogeny. And as early as the mid-60s the behaviourists Keller and Marian Breland were emphasising that instinctive drift would ensure that different organisms would not be susceptible to having their behaviour shaped in the same way, because of their different instinctive natures.

            Analogously, Relational Frame Theorists don’t deny that any prediction and control we gain needs to be explained by underlying genetic, epigenetic, and neural structures. It is just that their primary emphasis is on emergent behavioural data which can be predicted and controlled through behavioural principles. Thus, while relational frames are emergent phenomena discovered through behavioural training and testing, Hayes and Sanford do try to explain their emergence as resulting from a combination of genetic constraints and social learning. They appeal to group selection favouring a cooperative instinct which makes the acquisition of relational frames such as coordination more likely, and speculate on how other frames can be derived from a combination of coordination and equivalence (Hayes and Sanford 2014). Nonetheless, the primary emphasis in both relational frame theory and Skinnerian behaviourism is on prediction and control of the organism using behavioural principles.

            As discussed above, Chomsky wasn’t against performance data per se; on the contrary, he believed that the only theory of performance of any theoretical interest was a theory which took on board competence-based insights (Chomsky 1965 p. 16). Chomsky’s claim has a degree of truth to it, but it is obviously not the whole story. In the 60 years since Chomsky wrote Aspects, behaviourists have discovered performance data of ample theoretical interest. The performance data elicited by behavioural tests, such as stimulus equivalence, rule following contingency insensitivity, and relational frames, are of great theoretical interest. But to build a theoretically interesting theory of this performance data we will need to explain it in terms of underlying competence systems.

            Behaviourists sometimes do try to explain behaviour in terms of underlying competencies. As we saw above, Skinner’s notion of phylogenetic shaping can be used to explain the Brelands’ notion of instinctive drift, which is used in a theoretical explanation of animals’ divergent behaviour under schedules of reinforcement. Likewise, Quine appealed to a phylogenetically shaped similarity space which underlies our capacity to successfully engage in induction. However, when it comes to emergent phenomena such as stimulus equivalence and relational frames, there has been a reluctance by some theorists to explain the phenomena in terms of underlying innate mechanisms. In the next section I will give an example from psychological behaviourism which aims to explain relational framing in terms of evolved underlying competencies, and I will then discuss an example of a philosophical behaviourist explaining his data in terms of underlying competencies. This will be a proof of principle that some behaviourists do make a distinction between competence and performance and do use idealizations in their theoretical endeavours.

   Quine and Relational Frame Theory.

            In this section I will outline and discuss an attempt by Hayes and Sanford (2014) to explain our species-specific capacity to engage in relational framing in terms of group selection for a cooperative instinct. This section will demonstrate that behaviourists do appeal to competencies to explain behavioural patterns and novel behaviour when necessary, and furthermore that they routinely engage in idealizations to explain data. I will then further develop this point by showing that it isn’t just psychological behaviourists who appeal to underlying competencies and idealizations to explain behaviour. Some philosophical behaviourists do so as well. This will be demonstrated by evaluating Quine’s work in this area.

Hayes and Sanford (2014) discuss the evolution of our ability to engage in verbal behaviour. They explain this capacity in terms of abilities humans partially share with non-verbal creatures, such as (1) the ability to use vocalizations to regulate the behaviour of others (shared with many mammals), (2) social referencing (dogs and chimpanzees can do this), (3) joint attention and non-verbal forms of perspective taking (chimps and other apes), and (4) non-arbitrary relational learning (all animals). And they argue that these competencies were modified through group selection. They note that a cooperative instinct within a group will give that group the ability to out-compete other groups.

With a cooperative instinct in place, suppose two humans are near an apple tree and both know how to say “apple” when they see an apple. If the apple is out of reach for person A but within reach for person B, and person A says “apple”, then person B will get it for person A (Ibid p. 121). They argue that it would take the capacity for perspective taking along with the capacity for cooperation to complete this epistemological triangle. They call this the beginning of a person’s capacity to engage in a frame of coordination. People acquire the ability to know that “apple” → apple and apple → “apple”. Thus, they know that the sound refers to the object and the object is referred to by the sound. This relationship can then come under contextual control of the “is” relation: Apple is “Apple” and “Apple” is Apple.

Deriving this relation of identity between the sound and the object will be helped through reinforcement for providing the object when the word is said, and through having the object provided when you say the word. Relational Frame Theorists have argued that the ability to derive this frame (which is species specific) is made possible through coopting capacities such as joint attention, the ability to modify others’ behaviour through vocalizations, social referencing, and non-arbitrary relational responding, and combining them with our cooperative instinct.

Once this frame of coordination was acquired, humans would then have the capacity to recognize mutuality in a frame. And they argue that repeated application of mutuality would give an organism the ability to use combinatorial entailment (ibid p. 123). Thus, on this conception the capacity to relationally frame is created primarily by our cooperative instinct. So in effect they explain our capacity to relationally frame in terms of underlying competencies, such as social referencing, joint attention, the ability to control others using vocal signals, and non-arbitrary relational responding, being modified by a human specific cooperative instinct shaped by natural selection. In this instance the novel performance data is explained in terms of underlying competencies. They aren’t just describing behavioural patterns; they are explaining their arrival in terms of underlying competencies.
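For readers who find it easier to see the logic laid out mechanically, the derivation just described can be given a rough sketch. This is my own toy illustration and not a model proposed by Hayes and Sanford: the function name `derive_relations`, the stimulus labels, and the third stimulus (a written word) are all hypothetical. It treats a frame of coordination as a set of directly trained pairs, derives the reverse pairs (mutual entailment), and then chains pairs together to derive untrained ones (combinatorial entailment).

```python
from itertools import product


def derive_relations(trained):
    """Split derived coordination relations into mutual and combinatorial ones.

    `trained` is an iterable of (stimulus, stimulus) pairs that were directly
    taught. This is a toy sketch under the assumptions stated above, not an
    implementation of any published model.
    """
    trained = set(trained)

    # Mutual entailment: if A is coordinated with B, then B is coordinated with A.
    mutual = {(b, a) for (a, b) in trained} - trained
    known = trained | mutual

    # Combinatorial entailment: chain known relations (A-B and B-C give A-C),
    # repeating until no new pairs appear.
    closure = set(known)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True

    combinatorial = closure - known
    return mutual, combinatorial


# Two directly trained relations: spoken word -> object, object -> written word.
trained = [("spoken_apple", "apple_object"), ("apple_object", "written_apple")]
mutual, combinatorial = derive_relations(trained)
print("Mutually entailed:", sorted(mutual))
print("Combinatorially entailed:", sorted(combinatorial))
```

On this sketch only two relations are trained, yet four further relations are derived without any direct training. That gap between what is trained and what emerges is the performance data which the appeal to underlying competencies is meant to explain.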

                     Quine Idealization

As discussed above, the competence-performance distinction is aligned with the notion of idealizations, where you abstract away from aspects of a phenomenon and deal with more tractable subject matters. Thus, the scientist can be dealing with frictionless planes, or humans not subject to memory limitations or distractions etc. The charge made by Jackendoff and others was that behaviourists don’t use idealizations and hence have no resources to make the competence-performance distinction. We have seen that this is simply not true when it comes to behavioural science, which from the start is up to its eyes in idealizing assumptions. This is true not just of behaviourists in the scientific sphere but also of prominent behaviourists working on philosophical problems.

Quine’s conception of language is famously terse. While he talks of a child acquiring and being shaped by the language of his peers, he typically focuses on things such as observation sentences. There is little in his conception of language about other uses of language such as interrogatives. Quine justifies this because he is interested in language only in so far as it pertains to epistemology and ontology. Hence, he engages in a simplifying idealization when dealing with science. Thus in ‘The Roots of Reference’ Quine speaks of the fact that requesting makes up a large part of our linguistic usage, but he doesn’t account for it because it has little relevance to his attempts to explain how we acquire our scientific theory of the world (Quine 1974 p. 46). And later in the same book he notes that he doesn’t want a factual account of how children acquire English; rather he is concerned with telling a plausible story of how we go from infancy to developing a regimented language of science (ibid p. 92). Quine made the same point again in his ‘Mind and Verbal Dispositions’:

“One and the same little sentence may be uttered for various purposes: to warn, remind, to obtain possession, to gain confirmation, to gain admiration, or to give pleasure by pointing something out… somehow we must further divide; we must find some significant central strand to extract from the tangle…Truth will do nicely…a man understands a sentence in so far as he knows its truth conditions…this kind of understanding stops short of humour, irony, innuendo, and other literary values, but it goes a long way. In particular it is all we can ask of an understanding of science.” (Quine 2008 pp. 448-249).

Again, we can see that Quine is abstracting away from various uses of language because they aren’t useful to him in sketching his story of how we go from stimulus to science. Quine, like his behavioural science colleagues, is engaging in idealizations at every step of his philosophical project. Furthermore, Quine is appealing to underlying competencies to explain how it is that humans go from stimulus to science (Quine 1980 p. 6, Quine 1989 p. 348, Quine 1998 p. 4). He appeals to an innate similarity quality space to explain our ability to be differentially reinforced, to an innate perceptual similarity space to explain our convergence on stimulus meaning, and to body-mindedness to explain children’s ability to understand object-permanence.

                             Conclusion

In this paper I have demonstrated that behaviourists of both philosophical and scientific bent do indeed make use of both idealizations and a distinction between competence and performance in their work. Despite the criticisms of Chomsky and his followers, behaviourism’s focus on performance doesn’t necessitate ignoring competence or shunning the use of idealizations. It is probable that the followers of Chomsky will be unmoved; they will note that there has been no behavioural work which can account for the grammatical regularities which are discovered in linguistics. And with this I would agree. Behaviourism, even modern behaviourism, still hasn’t demonstrated that it has the conceptual resources to handle the syntax of natural language. Nonetheless, we are discovering more and more interesting performance regularities through behavioural research. These data do need to be explained in terms of underlying competencies. But the discovery of these interesting facts about our behaviour (including Verbal Behaviour) indicates that, pace Chomsky, we can discover interesting facts about performance prior to having a worked-out theory of competence. In fact, behavioural research has led us towards a greater understanding of the competencies underlying these performances, not vice-versa. With the sciences of biolinguistics and behavioural science still in their infancy, there is still a lot of data to acquire and experimental work to be done. But any attempt to understand either side will involve paying greater attention to what the practitioners of each discipline are doing and not relying on caricatures.

                                          References

Barnes-Holmes, D. 2000. “Behavioural Pragmatism: No Place for Reality and Truth”, The Behaviour Analyst 23 pp. 191-202.

Barnes-Holmes, D. 2005. “Behavioural Pragmatism is A-Ontological, Not Anti-Realist: A Reply to Tonneau”, Behaviour and Philosophy 33 pp. 67-79.

Baum, W. 2002. “From Molecular to Molar: A Paradigm Shift in Behaviour Analysis”. Journal of the Experimental Analysis of Behaviour. 78 (1) pp. 95-116.

Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge MA: MIT Press.

Christiansen, M, H, & Chater, N. 2016. “The Now-or-Never Bottleneck: A Fundamental Constraint on Language”. Behavioural and Brain Sciences 39.

Christiansen, M, H, & Chater, N. 2023. The Language Game. Random House. Penguin.

Collins, J. 2007. “Linguistic Competence Without Knowledge of Language”. Philosophy Compass 2 (6) pp. 880-895.

Dymond, S, & Roche, B, (2013) Advances in Relational Frame Theory. Context Press. New Harbinger Publications.

Galizio, M. (1979). “Contingency-shaped and rule-governed behaviour: Instructional control of human loss avoidance.” Journal of the Experimental Analysis of Behaviour. 31, pp. 53-70.

Ginsburg, S, & Jablonka, E. 2019. The Evolution of the Sensitive Soul. MIT Press. Cambridge MA.

Hayes, L, J, & Thompson, S. 1989. “Stimulus Equivalence and Rule-Following”. Journal of the Experimental Analysis of Behaviour. 52 (3) pp. 275-291.

Hayes, S, & Barnes-Holmes, D, & Roche, B. 2001. Relational Frame Theory: A Post-Skinnerian Account of Human Language and Cognition. Springer Science & Business Media.

Hayes, S, & Sanford, B. 2014. “Cooperation Came First: Evolution and Human Cognition.” Journal of the Experimental Analysis of Behaviour 101 pp. 112-129.

Hofstadter, D, R, & Moser, D, J. “To Err is Human; To Study Error-Making is Cognitive Science”. Michigan Quarterly Review, 28 (2), pp. 185-215.

Jackendoff, R. 2002. Foundations of Language. Great Clarendon Street. Oxford University Press.

Kemp, G. 2017. “Quine, Publicity and Pre-Established Harmony”, Protosociology 34 pp. 59-72.

Lakoff, G. 1987. Women, Fire, and Dangerous Things.  The University of Chicago Press.

Palmer, D. 2023. “Towards a Behavioural Interpretation of English Grammar.” Perspectives on Behaviour Science. 46 (3) pp. 521-538.

Pepper, S, C. 1942. World Hypotheses: A Study in Evidence. University of California Press.

Quine, W. 1953.  From a Logical Point of View. Harvard University Press. Cambridge MA.

Quine, W. 1974. The Roots of Reference. La Salle. Open Court Press.

Quine, W. 1995. From Stimulus to Science. Cambridge. Mass. Harvard University Press.

Quine, W. 1996. “Progress on Two Fronts”, The Journal of Philosophy. 93/4 pp. 159-163

Quine, W. 2008. “The Flowering of Thought in Language” pp. 478-484 in Follesdal & Quine (EDS) Quine: Confessions of a Confirmed Extensionalist.

Rescorla, R. 1969. “Pavlovian Conditioned Inhibition.” Psychological Bulletin 72 (2) pp. 77

Sautter, R, & Leblanc L, “Empirical Applications of Skinner’s Analysis of Verbal Behaviour with Humans”. The Analysis of Verbal Behaviour 22 pp. 35-48.

Shimoff, E, Catania, A, & Matthews, B. 1981. “Uninstructed Human Responding: Sensitivity of Low Rate Performances to Schedule Contingencies.” Journal of the Experimental Analysis of Behaviour. 36 (2) pp. 207-220.

Sidman, M. 1971. “Reading and Auditory Visual Equivalences”. Journal of Speech and Hearing Research. 14 (1) pp. 5-13.

Skinner, B.F. 1957. Verbal Behaviour. New Jersey: Prentice-Hall Inc.

Skinner, B.F. 1974. About Behaviourism. Knopf Doubleday Publishing Group.

Skinner, B.F. 1984. “Contingencies and Rules”. Behavioural and Brain Sciences. 7 (4) pp. 607-613.

Smith, L. D. 1986. Behaviourism and Logical Positivism: A Reassessment of The Alliance. California: Stanford University Press.

Wijnen, F, “Incidental word and sound errors in young speakers”, Journal of Memory and Language 31 pp.734-755.

Wilson, D, S, & Hayes, S. 2018. Evolution and Contextual Behavioural Science. Context Press. New Harbinger Publications Inc.


 

[2] There is a debate within behavioural science on whether scientists are studying classes or individuals; see Baum ‘From Molecular to Molar: A Paradigm Shift in Behaviour Analysis’ (2002).

A short blog-post re-written by ChatGPT in the style of Richard Dawkins

The Setting

The act of referring to objects, so effortlessly performed in everyday discourse, belies the complex web of philosophical inquiry it has spurred since the dawn of analytic philosophy. In the annals of this intellectual tradition, a recurring motif emerges: the relentless quest to mend the perceived fractures in the relationship between words and the world they represent. As the century unfolded, a dichotomy crystallized within the realm of linguistic philosophy. On one side stood the cognitivists, disciples of Chomsky, who espoused the view of language as an internal computational apparatus for thought. According to Chomsky’s doctrine, language itself did not possess inherent reference but served as a conduit for expressing thoughts about the external realm. On the opposing front were the behaviourists, led by Skinner, who conceptualized language as verbal behavior moulded by environmental contingencies. Skinner’s contention that the traditional notion of reference was dispensable gave rise to the notion of the “tact,” a verbal operant subject to control by non-verbal stimuli and shaped by the three-term contingency.

Despite their discordant philosophies, both camps found common ground in their dismissal of the reference relation as dissected by analytic philosophers. This scepticism toward linguistic philosophy’s pursuits of clarity and repair continues. But work over the last half a century has laid the groundwork for some convergence between analytic philosophy and behavioural science. In this exploration, we delve into the fertile terrain where analytic philosophy and behavioural science intersect, focusing on the burgeoning field of Relational Frame Theory (RFT). Through an examination of the perspectives of Quine and relational frame theorists, we embark on a journey to unravel the intricacies of observation sentences—how they are acquired, developed, and intertwined with our perception of the world. Join us as we navigate the intricate labyrinth of language and cognition, seeking to illuminate the elusive nature of referentiality and its implications for our understanding of the human mind.

              The Evolution of language

In their groundbreaking work, Hayes and Sanford (2014) delve into the evolutionary origins of our verbal behavior, painting a vivid picture of how our linguistic abilities evolved over time. They dissect our communication skills, drawing parallels between humans and other non-verbal creatures, highlighting shared traits such as vocalizations to influence behavior and social referencing. Arguing from the vantage point of group selection, Hayes and Sanford suggest that our cooperative instincts played a pivotal role in shaping our linguistic capacities. They illustrate this through a compelling scenario: two individuals near an apple tree, both equipped with the word “apple.” When one person vocalizes the word and the fruit is out of reach, the other steps in to retrieve it, fostering cooperation and mutual understanding. This cooperative dynamic, they contend, marks the genesis of our ability to engage in a frame of coordination, where words seamlessly align with their referents. Through reinforcement and shared experiences, individuals develop a nuanced understanding of the relationship between sounds and objects, paving the way for combinatorial entailment and the intricate web of relational frames. In essence, Hayes and Sanford propose that our cooperative instincts underpin our capacity to frame and derive meaning from language, ushering in a new era of understanding in the evolutionary origins of human communication.

Once this frame of coordination was acquired humans would then have the capacity to recognize mutuality in a frame. And they argue that repeated application of mutuality would give an organism the ability to use combinatorial entailment (ibid p.123). Thus, on this conception the capacity to relationally frame is created primarily by our cooperative instinct.

   Individual or Shared Behavioural Streams

In the intricate narrative spun by Hayes and Sanford (2014), the spotlight falls on the profound interplay between human cognition and cooperative instincts, casting a revealing light on the evolution of verbal behaviour. Central to their thesis is the notion of coordination between words and objects, a feat achieved through social derivation and contextual control over the “is” relation. Illustrating their argument with the allegory of two individuals wielding the word “apple” as a Mand, Hayes and Sanford beckon us into a world where language becomes a conduit for shared experiences and mutual understanding. Yet, lurking beneath the surface lies a critique of individualistic interpretations, epitomized by Barnes-Holmes (2000), who dissected such instances of triangulation within the confines of individual behavioural streams. In this intricate dance of cognition and cooperation, the tale takes a twist as the narrative shifts to the realm of joint attention. Here, the shared object of experience emerges as the linchpin, transcending individual streams and fostering a communal understanding. While behavioural pragmatists grapple with the ontological underpinnings, their reluctance to delve into intersubjective agreement leaves a void in their narrative, one that can only be filled by acknowledging the symbiotic relationship between individual streams and communal experiences. Thus, as the story unfolds, Hayes and Sanford beckon us to embark on a journey through the labyrinth of human cognition, where language and cooperation intertwine to shape our collective understanding of the world.

         Cometh the Hour Cometh the Man

Enter Quine, the philosophical voyager navigating the treacherous waters of linguistic ambiguity and cognitive convergence. Armed with intellect and inquiry, Quine set out to unravel the enigma of shared understanding amidst the cacophony of sensory receptors. In Quine’s labyrinth of inquiry, the conundrum of observation sentences loomed large, casting shadows of doubt upon the very fabric of linguistic communion. While his behavioural contemporaries grappled with the intricacies of Mand and Tact, Quine delved into the depths of epistemological and semantic inquiry, seeking to unveil the secrets of meaning and consensus. Yet, amidst the philosophical fog, parallels emerged between Quine’s quest and the behavioural pragmatists’ plight. Both faced the daunting task of reconciling individual streams of experience with the communal tapestry of cognition, each seeking solace in the embrace of naturalistic explanation. Quine’s solution, an appeal to perceptual harmony and empathetic resonance, stood as a beacon of enlightenment amidst the turbulent seas of inquiry. Through the lens of empathy, Quine discerned the threads of joint attention and social referencing, weaving a tapestry of understanding that transcended individual perspectives. And yet, the tension persisted between Quine’s ontological aspirations and the pragmatists’ yearning for epistemological clarity. As Barnes-Holmes et al. posited the notion of separate behavioural streams, Quine’s empathy-driven paradigm stood as a testament to the interconnectedness of human cognition. In the end, Quine’s journey served as a testament to the intricate dance of intellect and instinct, weaving a narrative of inquiry and insight that continues to resonate across the landscape of philosophical discourse.

Barnes-Holmes, Subjective Idealism and Behaviourism

                           Introduction

Dermot Barnes-Holmes (2000), in his ‘Behavioural Pragmatism: No Place for Reality and Truth’, criticized Quine for offering a non-solution to the problem of homology. The problem of homology arose when Quine was trying to explain how different subjects could assent to or dissent from observation sentences. Quine tried to cash out the stimulus meaning of observation sentences in terms of the triggering of sensory receptors. But this led to the difficulty of explaining how two people whose sensory receptors could be triggered differently could converge on stimulus meanings. Quine’s non-solution was that the scientist in practice could ignore the problem because it wouldn’t affect their experiments (Quine 1974 p. 24). Barnes-Holmes argued that while Quine was counselling us to ignore the problem based on pragmatic principles, contextual behaviourists could remove the problem altogether by adopting a consistent pragmatist approach (Barnes-Holmes 2000 p. 197).

                Quine’s Darwinian Solution

            But Barnes-Holmes didn’t note that Quine (1996) had already come up with a different solution to the problem of homology which didn’t involve merely ignoring it on pragmatic grounds (Barnes-Holmes 2000 p. 196). Quine’s original definition of stimulus meaning as being provided by the triggering of sensory receptors was difficult to sustain because of his publicity criterion in relation to observation sentences (Kemp 2017 p. 60). Given that people’s sensory receptors could be triggered in a myriad of different ways, there is no real justification for assuming that shared sensory triggerings are responsible for our agreement on whether to assent to or dissent from an observation sentence at all.

            Quine, though, thought that shared stimulus meanings could be accounted for by assuming that humans have a shared perceptual similarity space which underlies their grouping of things together (Quine 1996 p. 161). This similarity space would be partly innate though modifiable through training (Quine 1995 p. 21). He argued that this perceptual similarity space would have survival value and hence would be passed on through natural selection. Put in Skinnerian terminology, Quine speculated that the phylogenetic contingencies of survival would be responsible for our perceptual similarity space. And he argued that the nature of these similarity spaces could be determined empirically:

“But perception for all its mentalistic overtones, is accessible to behavioural criteria. It shows itself in conditioning responses. Thus, suppose we provide an animal with a screen to look at and a lever to press. He finds that the pressed lever brings a pellet of food when the screen shows a circular stripe, and that it brings a shock when the screen shows four spots spaced in a semi-circular arc. Now we present him with these same four spots, arranged as before, but supplemented with three more to suggest the complementary semicircle. If the animal pressed the lever, he may be said to have perceived the circular Gestalt rather than the component spots.” (Quine 1974 p. 4)

All this is well within the remit of typical animal studies.

Quine was bringing in the notion of a partially innate perceptual similarity space shared amongst humans to explain how we converge on the same stimulus meanings for observation sentences. And observation sentences were meant as the empirical checkpoint for our theory of the world. The purpose of this was to give us a naturalistic theory of how we go from Stimulus to Science. And shared innate perceptual similarity standards, cashed out in terms of natural selection and neural structures, are sufficiently objective for Quine to rest satisfied with the explanation.

    Barnes-Holmes’ Behavioural Pragmatism

Barnes-Holmes never mentioned Quine’s Darwinian attempt at a solution to the problem of homology, so he never framed any reply. Before considering what he might think of Quine’s attempted solution, I will first briefly outline his own behavioural pragmatist solution to the homology problem and then relate it to Quine’s Darwinian Solution.

 When outlining his behavioural pragmatism Barnes-Holmes notes that it relies on three key assumptions to justify the position. Assumption 1: What is known is always a behavioural function. Assumption 2: The activity of each organism participates in a different behavioural stream. Assumption 3: The activity of the behavioural pragmatist participates in a behavioural stream (Ibid p. 198).

            Each of these assumptions is extremely controversial. Assumption 1 argues that what is known is always a behavioural function, not a behaviour-independent reality. It isn’t particularly controversial to state that we know things through a function of our behaviour. Barnes-Holmes gives the example of an apple and states that we know the apple through its functions, such as how it elicits responses like salivation, or its being a discriminative stimulus for Verbal Behaviour such as ‘this is an apple’, or a reinforcing stimulus for saying something like ‘give me an apple’ (Ibid p. 197). The few examples he gives involve the ontogenetic interaction of a human subject with an apple. He doesn’t mention any phylogenetic factors which would go into the human’s behavioural interaction with the apple, though presumably he would admit that phylogenetic factors may play such a role. Skinner long stressed that some phylogenetic factors will play a role in an organism’s behaviour (Skinner 1974 p. 228). Likewise, Hayes and Sanford (2014) suggest that to ensure that asking for an apple will have reinforcing consequences we will have to assume a cooperative instinct.

            None of this will be particularly controversial. There are empirical details to fill in about the nature of our behavioural interactions with entities such as apples, and debates about the role of perception as opposed to brute behaviour in our knowledge of things like apples. But depending on how we construe behaviour, it is eminently sensible to suggest that we only know something through our behavioural functions with it. But Barnes-Holmes goes further than this. He isn’t content to claim we only know something through behavioural interactions with it; he goes on to contrast this with a belief in a physical apple at all:

“In commonsense terms, the apple is a physical thing that exists independently of behavior. For the behavioural pragmatist, however, the apple is defined only in terms of its behavioural functions that emerge in a particular stream of behavioural interactions.” (Ibid p. 197)

In the above quote the notion of a physical thing which exists independent of behaviour is rejected. He isn’t claiming that a mind-independent apple doesn’t exist; rather he merely states that the behavioural pragmatist’s definition of an apple doesn’t rely on notions of physical objects that exist independent of behaviour. He does go on to say that the behavioural pragmatist will sometimes talk as if some objects exist independent of behaviour (Ibid p. 198). But such talk involves no ontological commitment, as Verbal Behaviour in the technical sense doesn’t ‘refer’ or ‘correspond’ to an external reality (Ibid p. 199).

            It is important to be clear about the commitments being made. When Barnes-Holmes makes claims about defining objects in terms of their behavioural functions, the examples he gives of behaviour are of classical conditioning, where an object elicits salivation in an organism; of a discriminative stimulus which increases the probability of a particular Tact being used; and of operant processes which increase the likelihood of a Mand being used (Ibid p. 197). A difficulty with this conception is that, internal to theorising about the origins of Verbal Behaviour, there is a consensus that Verbal Behaviour first emerged about 100,000 years ago (Hayes and Sanford 2014 p. 114). Furthermore, there is compelling evidence that the capacity for a creature to learn through conditioning of any kind began about 520 million years ago (Ginsburg, S, & Jablonka, E. 2019 p. 293). Now consider things such as Cyanobacteria, which our best theories tell us existed for billions of years before the evolution of either a capacity to be conditioned or a capacity to engage in Verbal Behaviour (Schirrmeister et al 2015 p. 777). Given the capacities Barnes-Holmes uses to illustrate the behavioural functions we use to define an object, he appears to be committed to the view that Cyanobacteria did not exist until creatures emerged on the scene with behavioural capacities sophisticated enough to develop a verbal and non-verbal repertoire in relation to them. It could be argued that this position is a behavioural version of Berkeleyan Subjective Idealism, where we must deny the existence of a behaviour-independent world.

            Barnes-Holmes would deny that he is engaging in any type of Idealism. He has argued that he is not making some anti-realist argument; rather he is making an a-ontological claim about reality. He says that the anti-realist is arguing that either nothing exists beyond scientific language, or that scientific language doesn’t capture reality as it really is (Barnes-Holmes 2005 p. 68), whereas he defines a-ontological claims as claims which remain silent on behaviour-independent reality (Ibid p. 68). Furthermore, he would argue that Cyanobacteria do enter into the behavioural stream of our working scientists, so we are therefore justified in postulating their existence internal to our overall theory of the world. But given that our best theories tell us that entities existed prior to creatures with behavioural capacities like the ones Barnes-Holmes mentions, why remain silent about their ontological status? Why remain happy to say that it is sometimes ok to talk as if these entities existed, or that to the extent that these entities enter a scientist’s behavioural stream we are justified in saying they exist?

            Why not accord these entities robust realist ontological status? The following passage is instructive as an explanation of Barnes-Holmes’ reluctance:

“Assumption 3, however, appears to preclude the possibility, in behavioural pragmatism, of finding a scientific truth statement that corresponds to an ontological reality. In effect, if the scientific activity of the behavioural pragmatist is the product of a behavioural history, then he or she can never claim to have found an ontological truth, because a different or more extended history may have produced a different truth (an ontological truth, by definition, is immutable, absolute, and final).” (Barnes-Holmes 2000 p. 198).

So, assumption 3 notes that even the behaviourist’s behaviour is a product of their learning history as they interact with their own behavioural stream, which means that a different learning history would give them a different theory. From this Barnes-Holmes concludes that since an ontological truth is by definition “immutable, absolute, and final”, contingent creatures such as us can never arrive at such ontological truths.

            But one wonders why he thinks that we need to parse ontological truths as truths which are immutable, absolute and final. Ontology is the philosophical discipline which aims to discover the basic furniture of the universe: to discover what there is, in the most general sense possible. In traditional philosophy ontology was opposed to epistemology. Famously, Kant argued that reality as it is in itself is unknowable, and that we can only know reality in so far as it conforms to our mode of cognition. So true reality in its ontological form is something which we can never know. Hence, Kant argued that metaphysics should give up the proud name of ontology. In this sense of ontology, we cannot say anything about ontology; even statements about whether it is “immutable, absolute, and final” would be out of place, because whatever the nature of reality beyond our mode of cognition, we would not be justified in speaking about it. So it is doubtful whether Barnes-Holmes meant ontology in the Kantian sense.

            Older versions of ontology, which began with Plato, involve studying entities in the world and trying to discover their accidental and essential natures. The essences discovered would, in this sense, typically be conceived of as “immutable, absolute and final”. Metaphysics in this sense has ontological purport and is still studied to this day in philosophy departments. If Barnes-Holmes is critical of this type of ontology, that is one thing; however, it doesn’t follow that because the behavioural scientist is critical of ontology in Plato’s sense, behavioural science should therefore become a-ontological.

            There is another strand of ontology which doesn’t involve appeals to “immutable, absolute and final” properties, and that version of ontology was developed by Quine, whom Barnes-Holmes discussed in the paper. It is surprising therefore that he didn’t criticise Quine’s view of ontology directly.

            Barnes-Holmes argues that scientific truth is defined ultimately in terms of whether it achieves certain goals; and for the behaviourist the ultimate goal is prediction and control. So, ontology doesn’t come into the issue at all. He explicates his a-ontological position by arguing that “no fundamental or final or absolute assumptions are ever made about the nature or substance of behaviour independent reality” (Barnes-Holmes 2005 p. 68). In effect, behavioural pragmatists ignore issues in relation to realism-vs-anti-realism and stick to doing behavioural science involving the prediction and control of the organisms under study.

“Functional relations, at least in behavior analysis, are correlational, and no mentalistic, cognitive, or intentional act of reference from the response to an ontologically real stimulus is implied when functional-analytic terms are used in a behavioural explanation. For the behavioural pragmatist, therefore, a technical analysis of ontological talk will be cast in terms of patterns of stimulus-response-stimulus interactions, not semantic reference, literal meaning, or some form of word-referent correspondence. The procedural instruction “set the tone to between x and y cycles per second,” for example, could be interpreted as a relational network of derived stimulus relations (Barnes-Holmes, Hayes, & Dymond, 2001; Barnes-Holmes, O’Hora, et. al., 2001) or an instructional stimulus composed of Tacts, intraverbals, relational autoclitics, and the like (Skinner, 1957), or perhaps a combination of both interpretations (Barnes-Holmes, et. al., 2000). In neither case, however, is semantic reference or literal correspondence to an ontological reality included as part of the explanatory nomenclature. The technical terms of behavior analysis are simply empty with respect to ontological reality, and thus neither realism nor antirealism is implied.” (Barnes-Holmes 2005 pp. 73-74)

Despite writing above that the technical terms of behavioural analysis are ontologically empty, he notes that ontological talk is regularly used in report sections of daily articles and in ordinary scientific activities (ibid p. 74), arguing that such talk is fine as long as it doesn’t involve talking about the fundamental nature of reality.

            The idea is that it is ok for behavioural analysts to talk about a behaviour-independent reality when describing the results of their experiments, but that this talk isn’t to be taken seriously as a description of reality. Part of the motivation is that Barnes-Holmes wants to avoid being committed to making assumptions about the fundamental or absolute nature of reality. So committed is he to avoiding talk of absolute reality that he goes as far as to define something as true in so far as it achieves certain scientific goals (prediction and control). This assumption is arbitrary: why, for example, is prediction and control given priority over explanatory depth?

            More importantly, the behaviourist can achieve their goals without bending over backwards to eschew all talk about a mind-independent reality. A more realistic goal would be to treat ontology as a part of science, and to treat ontological commitment as the working out of which theoretical presuppositions cannot be done away with if we are to make sense of our total theory of the world. If we cannot make sense of our experimental results without presupposing certain entities, then we are justified in admitting them into our ontology. Quine argued that a good technique for laying bare the ontological commitments of a science is to translate its theory into the syntax of first-order logic, holding that:

“a theory is committed to those and only those entities to which bound variables of the theory must be capable of referring in order that the affirmations made in the theory be true” (Quine, 1953 pp. 13-14)

He used this technique to try to disentangle philosophical disputes about ontological commitment in subjects like mathematics.
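As a rough illustration of how the criterion works (the predicate names below are my own shorthand, not notation taken from Quine), consider regimenting the sentences "some dogs are white" and "there are prime numbers larger than a million" in first-order logic:

```latex
% "Some dogs are white"
\exists x\,\bigl(\mathrm{Dog}(x) \land \mathrm{White}(x)\bigr)

% "There are prime numbers larger than a million"
\exists x\,\bigl(\mathrm{Prime}(x) \land x > 10^{6}\bigr)
```

For the first sentence to be true, the bound variable x must range over at least one white dog, so the theory is committed to dogs. It is not thereby committed to doghood or whiteness as entities, since no variable has to take those as values. For the second sentence to be true, x must take a number as its value, so a theory that affirms it is committed to numbers.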

            It could be argued that, just as mathematicians and physicists can do their work without solving debates about ontological commitment, all behavioural analysts are asking for is the same freedom. However, I don’t think that this is a fair way of interpreting the debate. The behaviourists aren’t just eschewing ontological talk in some innocent theory-independent way; rather they are making truth relative to their own personal goals, and judging a theory by whether it meets those goals adequately. Anything outside of these personal goals is deemed irrelevant.

     Quine and the Pre-Established Harmony.

Barnes-Holmes describes the problem of homology as a problem which arises when we assume a correspondence between observation sentences and ontological reality (Barnes-Holmes 2000 p. 195). Quine’s attempted solution does indeed make assumptions about ontological reality. His assumptions are largely physicalist. He assumes a mind-independent world with features that an organism must track if it is to survive, and he assumes that creatures whose neural structures embody innate standards of perceptual similarity which tend to harmonize with trends in the environment will be more likely to survive than animals whose innate structures don’t harmonize with the environment (Quine 2008 p.204). This assumption is then used to explain how humans’ shared standards of perceptual similarity lead them to converge on the stimulus meanings of observation sentences.

            One may or may not agree with Quine’s account of how humans’ shared perceptual similarity standards lead to similar stimulus meanings, which in turn account for agreement on observation sentences. But his account is no more ontologically profligate than behavioural scientists’ appeal to group selection for cooperation to explain the emergence of frames of coordination in children’s ontogeny (Hayes 2014 p. 123), or behaviourists’ explanation of taste aversion in rats through natural selection resulting in the temporal parameters of classical conditioning becoming distorted (Wilson and Hayes 2018 p. 53).

            It could be replied that behaviourists like Barnes-Holmes have difficulties not with appeals to neural structures, or to sensory receptors being impinged on. Rather, they want such explanations cashed out functionally instead of in terms of shared structures (Barnes-Holmes 2000 p. 201). But this reading cannot be correct. Behaviourists emphasize the central importance of prediction in their theories, and Quine’s appeal to shared structures makes predictions which are eminently testable by future science. It would be churlish to critique Quine for appealing to shared structures if this appeal involves empirical check points and predictions which could be tested. Furthermore, the explanations of distortions of classical conditioning in taste aversion, which behaviourists unproblematically study, involve appeals to shared structure in rats.

            Barnes-Holmes, though, could argue that it is not just Quine’s appeals to shared structure which he has difficulties with. He also has difficulties with Quine’s appeals to observation sentences as the check points which are used to ensure that theories are tested for truth. Barnes-Holmes has difficulties with any non-pragmatic appeal to truth, as his assumption 3, that scientists’ views develop as a result of their learning history, means that with a different learning history they may have held different theories of the world. Barnes-Holmes believes that assumption 3 means that we can never hold “immutable, final, absolute” ontological truths. But when Quine speaks about ontology, he is not speaking in terms of “absolute, immutable, and final” truths. Quine’s entire philosophy is built around the radical revisability of our overall theories of the world: no aspect of our web of belief is immune from potential revision, including mathematics and logic. Therefore Barnes-Holmes’s criticisms completely miss the mark when it comes to Quine.

                         Bibliography

Barnes-Holmes, D. 2000. “Behavioural Pragmatism: No Place for Reality and Truth”, The Behaviour Analyst 23 pp. 191-202.

Barnes-Holmes, D. 2005. “Behavioural Pragmatism is A-Ontological, Not Anti-Realist: A Reply to Tonneau”, Behaviour and Philosophy 33 pp. 67-79.

Ginsburg, S., & Jablonka, E. 2019. The Evolution of the Sensitive Soul. MIT Press. Cambridge MA.

Hayes, S. 2014. “Cooperation Came First: Evolution and Human Cognition.” Journal of the Experimental Analysis of Behaviour 101 pp. 112-129.

Kemp, G. 2017. “Quine, Publicity and Pre-Established Harmony”, Protosociology 34 pp. 59-72.  

Quine, W. 1953.  From a Logical Point of View. Harvard University Press. Cambridge MA.

Quine, W. 1974. The Roots of Reference. La Salle. Open Court Press.

Quine, W. 1995. From Stimulus to Science. Cambridge. Mass. Harvard University Press.

Quine, W. 1996. “Progress on Two Fronts”, The Journal of Philosophy. 93/4 pp. 159-163

Quine, W. 2008. “The Flowering of Thought in Language” pp. 478-484 in Follesdal & Quine (EDS) Quine: Confessions of a Confirmed Extensionalist.

Wilson, D. S., & Hayes, S. 2018. Evolution and Contextual Behavioural Science. Context Press. New Harbinger Publications Inc.

Schirrmeister BE, Gugger M, Donoghue PCJ. 2015. “Cyanobacteria and the Great Oxidation Event: evidence from genes and fossils”. Palaeontology 58: 769–785.

Skinner, B.F. 1974. About Behaviourism. Vintage Books. New York.

Chomsky ‘Psychology and Ideology’ 50 Years On.

Introduction

            In this blog post I will discuss Noam Chomsky’s 1971 paper ‘Psychology and Ideology’, in which Chomsky critiques Skinner’s popular science book ‘Beyond Freedom and Dignity’. In ‘Beyond Freedom and Dignity’ Skinner claimed that free will was an illusion that an effective behavioural science could explain away. Once we had done this, we would be in a position to use behavioural science to engineer a more effective society than the one we currently live in.

I will argue that while Chomsky sometimes caricatures Skinner, and is blind to the strengths of behaviourism as a discipline, his criticisms still hit the mark. Skinner’s remarks in ‘Beyond Freedom and Dignity’ exaggerated what the science of behaviour was capable of in 1971, and what he described is still beyond what behavioural science can achieve 50 years later. To this end, Chomsky’s 1971 paper did the fields of psychology and philosophy a favour with its terse criticism of Skinner’s attempt at popular science.

Psychology and Ideology

 At the beginning of ‘Psychology and Ideology’ Chomsky notes that when reflecting on psychological claims we need to ask two different kinds of questions: (1) What is the scientific status of the claims? (2) What social or ideological needs do they serve? He correctly notes that these two questions are logically independent. He argues that Skinner’s empirical claims are vacuous and completely without scientific merit, and that because of their null scientific status they can serve as rhetoric for any would-be dictator, whether on the left or the right.

            Chomsky even goes as far as to argue that the thesis of ‘Beyond Freedom and Dignity’ renders the whole project incoherent:

 “But if his thesis is true, then there is also no point in his having written the book or our reading it. For the only point could be to modify behavior, and behavior, according to the thesis, is entirely controlled by arrangement of reinforcers. Therefore, reading the book can modify behavior only if it is a reinforcer, that is, if reading the book increases the probability of the behavior which led to reading the book (assuming an appropriate state of deprivation).” (Psychology and Ideology: p.21).

This is a forced choice; it is probable that there are some true claims in the book and some false ones. The cause of our reading the book could be specified in various ways: having been reinforced for reading that class of book in the past, or a recommendation from a friend whose recommendations have been reinforcing in the past. But Chomsky doesn’t ask what the cause of reading it is; he asks what the point of reading it is. He says that on Skinner’s central thesis the only point of reading the book is that it will modify behaviour, and it will only modify behaviour if it is a reinforcer, that is, if reading the book increases the probability of the behaviour which led to reading the book (ibid p. 21). This is a strange interpretation of Skinner’s project. We have already discussed possible causes of reading the book. One possible consequence of reading the book is negative reinforcement: reading the book takes away boredom and increases the probability of reading more books like this in the future. Or the reader could be punished as a result of finding the book unintelligible and excruciating to read, which will decrease the probability of reading books of this class in the future. Or the person could find the book positively reinforcing, and this may lead to reading more books of this class and possibly seeking a career in behavioural psychology.

            It is important to note that behavioural psychology doesn’t stand or fall based on what point each individual gets from reading one popular book by B.F. Skinner; there are now over a hundred years of behavioural research. Behavioural science should be evaluated on its own terms, namely the degree to which its principles have given us the ability to predict and control the behaviour of various organisms.

            Chomsky goes on to argue as follows:

“Consider the claim that reading the book might reinforce such behavior. Unfortunately, the claim is clearly false, if we use the term ‘reinforce’ with anything like its technical meaning. Recall that reading the book reinforces the desired behavior only if it is a consequence of the behavior; and obviously putting our fate in the hands of behavioural technologists is not behavior that led to (and hence can be reinforced by) reading Skinner’s book. Therefore, the claim can be true only if we deprive the term ‘reinforce’ of its technical meaning.” (Ibid p. 22)

Reading books has been positively reinforced in our school and college environments; doing so has led to reinforcing consequences in the past. Reading behavioural books in the past (Watson’s ‘Psychology as the Behaviourist Views It’) led Skinner to pursue a career in behaviourism, and this proved reinforcing both through the discoveries he made and through the long career that followed. Similarly, for us, the general public, reading has been reinforced in the past. If ‘Beyond Freedom and Dignity’ is in the stimulus class of popular science books people have found reinforcing, then this may lead to people reading it. Lots of people reading it could bring about behavioural change which alters the probability of how people behave in the future: what courses they study, how they manage selection by consequences, etc[1].

Chomsky’s critique, though, goes beyond the idea of reinforcing consequences of reading ‘Beyond Freedom and Dignity’. He is also critical of Skinner’s casual manner of translating ordinary discourse into behaviourist language, and of the perceived lack of progress in behavioural science.

“Because of this unwillingness, there is also no discernible progress – today’s formulations in this domain are hardly different from those of 15 or 20 years ago – and no convincing refutation, for those who are untroubled by the fact that explanations can be invented on the spot, whatever the facts may be, within a system that is devoid of substance.” (Ibid pp 29-30)

Here it is fair to say Chomsky does have a point. Skinner had a penchant for inventing explanations of any behaviour or cognitive capacity in terms of reinforcement. And there was little indication that Skinner was overly concerned with testing the empirical validity of his claims that various complex behaviours are explicable in terms of reinforcement. Even today some behaviourists who have been heavily influenced by Skinner are critical of him for this tendency:

Evolution was for many years dramatically gene-centric…ontogenetic evolution was virtually ignored…behaviour analysis seemed to have made the opposite error…A good example is provided by the transcript of the recorded interview between B.F. Skinner and E.O. Wilson, in which almost every specialized, evolutionarily established behaviour put forward by Wilson was promptly interpreted by [Skinner] in Operant terms (Hayes and Sanford 2014 p. 115).

So, Chomsky’s criticism of Skinner casually translating every complex trait into something explicable in terms of operant conditioning is to the point. However, his criticism about a lack of advancement in behavioural science doesn’t stand up to critical scrutiny. Chomsky wrote ‘Psychology and Ideology’ in 1971, and his claim that there had been no real advance in behavioural science since around 1950 isn’t true.

             Behavioural science had undergone rapid changes since 1950. Breland and Breland’s (1961) work on animal training demonstrated that, because of instinctive drift, non-human animals’ behaviour wasn’t as malleable as earlier naïve behaviourists thought. Skinner, who had long stressed that both phylogenetic and ontogenetic factors play a role in animal behaviour, welcomed the work of the Brelands. On Skinner’s way of thinking it was the behaviourist’s job to discover the different ways behaviour could be shaped and controlled through different schedules of reinforcement. The behaviourist wasn’t in the game of stipulating how malleable different organisms were. Nonetheless, despite the Brelands’ work being congenial to Skinner’s behaviourism, for the public instinctive drift made the thought of behaviourists gaining control over people and shaping their behaviour seem less threatening.

             As behavioural scientists continued to study humans operating under schedules of reinforcement, there was more reason to think that humans couldn’t just be shaped at a whim through schedules of reinforcement. Hundreds of behavioural studies on rule governed behaviour[2] have demonstrated that when humans operate under rules they become less sensitive to the contingencies of reinforcement. Children below the age of 5 could be shaped under schedules of reinforcement in a similar way to a rat, but once they passed 5 and could follow rules their behaviour became less sensitive to the contingencies of reinforcement (Bentall).

            Like the Brelands’ work, work on rule-following sprang up from within behaviourism (Skinner 1963) and demonstrated that human behaviour was more complex than the behaviour of other organisms. In the year Chomsky wrote ‘Psychology and Ideology’, Sidman (1971) experimentally demonstrated that humans could derive untrained stimulus equivalence. And in the years since, Steven Hayes (1989) demonstrated that humans could derive untrained relational frames (coordination, comparison, hierarchy, etc). The human as described by behavioural science began to resemble the human as described by cognitive scientists (built with innate constraints, having a species-specific capacity for productive reasoning, etc) more closely than it did the human as described by early behaviourists.

Now Chomsky would parse some of these studies as resulting in the death of behaviourism as conceived by Skinner. However, things don’t have to be parsed in this manner. One could look at the experimental refutations of previously held behaviourist beliefs by behaviourists such as Rescorla, Breland, Sidman, and Lowe et al as a sign of a healthy, evolving discipline.

One thing that should be emphasised is that when Skinner was writing ‘Beyond Freedom and Dignity’, work on rule following in behavioural science was already demonstrating that we couldn’t simply reinforce the behaviours we wanted repeated: when people operated under verbal rules, they were less sensitive to the contingencies of reinforcement than non-verbal animals. Thus, even for behaviourists at the time Skinner wrote ‘Beyond Freedom and Dignity’, his work was outdated. Society couldn’t be shaped in the manner Skinner wanted.

Chomsky goes on to quote Skinner’s claims about techniques we could use to control speech. Chomsky notes, correctly, that Skinner’s science isn’t up to the task of doing such a job. But he notes that it would be abhorrent if such controls could be put in place:

Or consider freedom of speech. Skinner’s approach suggests that control of speech by direct punishment should be avoided, but that it is quite appropriate for speech to be controlled, say, by restricting good jobs to people who say what is approved by the designer of the culture. In accordance with Skinner’s ideas, these would be no violation of ‘academic freedom’ if promotions were granted only to those who conform, in their speech and writings, to the rules of the culture, though it would be wrong to go farther and punish those who deviate by saying what they believe to be true. Such deviants will simply remain in a state of deprivation. In fact, by giving people strict rules to follow, so that they know just what to say to be ‘reinforced’ by promotion, we will be ‘making the world safer’ and thus achieving the ends of behavioural technology (74,81). The literature of freedom would, quite properly, reject and abhor such controls. (Ibid pp. 30-31).

What Chomsky doesn’t note though is that Skinner’s philosophy always had safeguards in place so those under control had a means to resist any science of behavioural engineering.

            Since 1953 Skinner had written about countercontrol as a way organisms had of resisting being controlled by others. Spencer et al (2022) define countercontrol as follows:

“Countercontrol is a Skinnerian Operant concept that posits that an individual’s attempts to exert control over another person’s behaviour may evoke a countercontrolling response from the person being controlled that functions to avoid or escape potentially aversive conditions generated by the controller.” (Spencer et al p. 457)

Skinner had targeted our notion that people are free. He argued that people only described themselves as free when they could not identify the variables which were controlling their behaviour. He also noted that when people are under the control of positive reinforcers, they often describe their behaviour as freely chosen. He gave the example of the state lottery, which works as an implicit tax on people, and noted that people think they freely choose to play the lottery. He emphasised that people value what they call freedom because the behaviour in question is controlled by positive reinforcement and as a result does not occasion countercontrol (Delprato 2002 p. 195).

            Skinner warned that treating a lack of countercontrol as an indication of “freedom” was dangerous. The belief that we are acting freely can lead to our inadvertently being subject to long-term aversive consequences resulting from our behaviour. This can happen when the controller is aware of these long-term consequences and the controllee is the ultimate loser (ibid p. 195).

            Skinner argues that the correct solution to control is not to abolish it (he thinks this is impossible), but to analyse it, to ask whether this is the type of control and consequence we want, and if not to work out a different controlling system to work within:

“Humans need (a) to eliminate aversive control (often a practical impossibility), (b) to identify positive reinforcement and other inconspicuous forms of control that have deferred aversive consequences, and (c) to substitute positive reinforcement contingencies without such consequences.” (Delprato p. 196).

Delprato argues that the above sequence is practically impossible, so we are in effect stuck with the use of countercontrol.

            As discussed above, we have known since the mid-sixties that once people begin to follow verbal rules their behaviour isn’t shaped by the contingencies of reinforcement in the same way as non-human behaviour. Nonetheless, control and countercontrol are still facts in any society we live in. Relational Frame Theorists, behaviourists who have been studying emergent properties of verbal behaviour, have recently tried to tie countercontrol in with derived relational responding to see if we could use the concept now that our understanding of rule-following behaviour has expanded beyond Skinner’s conception (Spencer et al 2022).

            As things stand, 50 years after Skinner wrote ‘Beyond Freedom and Dignity’, we are still nowhere near developing the capacity to behaviourally engineer our society. While Chomsky sometimes caricatured Skinner’s behaviourist position, he was surely correct to note that Skinner’s proclamations about the capacity of behavioural science to engineer our culture went well beyond anything possible in 1971 or anything possible today.


[1] I am not here arguing that we can explain people reading books entirely in terms of reinforcement; I am merely demonstrating that Chomsky’s quick argument for incoherence doesn’t work.

[2] See for example: Weiner et al (1964), Lippman and Myer (1967), Lowe et al (1983), Hayes et al (1986).

Are large language models the intellectual descendants of Behaviorism?

Childers et al (2023) argue that the debate between Chomsky, on one side, and Quine and Skinner, on the other, is revisited in contemporary debates on Large Language Models and Chomskian linguistics. They go on to argue that proponents of the view that data-driven large language models mirror human natural language make the same mistakes for which Chomsky critiqued people like Quine and Skinner 50 years ago. In this blog post I will argue that Childers et al largely misinterpret Skinner’s and Quine’s projects, and hence any connections they draw with Large Language Models and Connectionism are problematic.

            Childers et al argue that Quine and Skinner’s empiricism collapsed under criticisms from Chomsky, and that Quine responded to this collapse by modifying his empiricism to accommodate Chomsky’s position by appealing to innate mechanisms. They note that once such an appeal is accepted we are no longer empiricists, and our position is closer to that of rationalists. I would argue that this is an idiosyncratic reading of empiricism. Empiricists like Hume and Locke appealed to innate mechanisms to explain how we acquire our knowledge of the world. It is just that the innate mechanisms they appealed to wouldn’t be sufficient to account for the complexity of human language and cognition. Quine’s externalized empiricism is of a piece with Hume’s, except that Quine advocates for the innate mechanisms to be determined experimentally. Childers et al call this hybridised empiricism and note that it is empiricism only in name. It is unclear to me why Quine’s arguing that we determine what innate structures are necessary based on behavioural tests should be considered anything other than ordinary empiricism externalized.

            They do go on to make the further point that Quine’s speculations about the innate principles necessary are extremely vague. I would agree with them on this point; when Quine talks about analogical synthesis being the method by which we connect sentences with sentences he is extremely vague. Gibson (1987) wrote about Quine’s analogical synthesis as a postulated innate structure to be mapped by future scientists. In the 35 years since Gibson wrote those words there has been very little work done by philosophers influenced by Quine to fill in the details of this project.

            There has, though, been experimental research into analogy in both behavioural science and cognitive science. Relational Frame Theorists are behavioural scientists whose experimental work on language has led them to abandon some of Skinner’s account of language. Thus, they argue that their research shows that rule following in language changes how people respond to schedules of reinforcement, and they argue for human-specific emergent properties such as the ability to derive stimulus equivalence and relations of coordination, hierarchy, etc. These relational frames, under contextual control, exhibit the properties of mutual entailment, combinatorial entailment, and transfer of function (Relational Frame Theory an Overview p.62). Relational Frame Theorists have demonstrated empirically that analogy is a relational frame which typically emerges at about 5 years of age, defining analogy as the capacity to derive sameness amongst equivalence relations, or equivalence-equivalence responding. They think that these derived relational frames may play a role in linguistic productivity. However, it would be unfair to argue that they vindicate Quine’s concept of analogical synthesis, as Quine’s vague formulations played no role in the experimental paradigm. Furthermore, it is impossible to tell the degree to which RFT is consistent with Quine, as his formulations are too vague to map onto it.
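To make the terminology concrete, here is a minimal toy sketch of how untrained relations can be derived from trained ones in the simplest case, the frame of coordination (sameness). It is only an illustration under my own assumptions: the function name `derive_coordination` and the stimulus labels A, B and C are hypothetical, the code is not a formalism used by Relational Frame Theorists, and it ignores contextual control and transfer of function altogether.

```python
from itertools import product


def derive_coordination(trained):
    """Close a set of trained (x, y) 'same as' pairs under
    mutual entailment (if x-y then y-x) and
    combinatorial entailment (if x-y and y-z then x-z)."""
    relations = set(trained)
    changed = True
    while changed:
        changed = False
        # Mutual entailment: each relation implies its reverse.
        for x, y in list(relations):
            if (y, x) not in relations:
                relations.add((y, x))
                changed = True
        # Combinatorial entailment: chain two relations through a shared stimulus.
        for (x, y), (u, z) in product(list(relations), repeat=2):
            if y == u and x != z and (x, z) not in relations:
                relations.add((x, z))
                changed = True
    return relations


# Only A-B and B-C are directly trained.
trained = {("A", "B"), ("B", "C")}
# Everything else is derived without further training.
derived = derive_coordination(trained) - trained
print(sorted(derived))
```

With only A-B and B-C trained, the derived set contains B-A and C-B (mutual entailment) and A-C and C-A (combinatorial entailment). This is the pattern reported in stimulus equivalence experiments, where participants respond correctly to relations they were never directly taught.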

            In cognitive science analogical reasoning has been researched in detail; see for example Hofstadter (2013), and Gentner and Hoyos (2017). Gentner has been doing experimental work on analogies for over 40 years. Based on her experimental work she argues that children start deriving analogies from 7 months of age. The discrepancy between her chronology of when analogies are acquired and the timeline in RFT can be accounted for by Gentner’s lower criteria for what counts as an analogy. Gentner argues that analogies involve transfer of knowledge from one area to another, whereas RFT theorists argue that stimulus-stimulus equivalence is necessary for something to count as analogy.

            In their ‘Analogy and Abstraction’ Gentner and Hoyos note a difficulty in explaining how children acquire abstract analogies, and their difficulty is like the one which faces Quine. We know from the psychological literature that (1) children prefer to extend their concepts based on bare perceptual similarities, (2) this results in them making concrete analogies, and (3) children can extend their comparison classes through multiple exemplars.

Two of the primary ways of making an analogy are through projective alignment and through mutual alignment. Projective alignment is when people use an already well understood domain to illuminate another domain. However, young children do not have a large store of well understood concepts, so it is difficult for them to use already understood concepts to explain a different concept. Therefore, young children typically use mutual alignment in their analogical abstraction. In mutual alignment analogies one discovers commonalities which were not obvious in either analogue (ibid p. 3). Mutual alignment involves establishing a structural alignment between two representations based on matching relations between analogues (ibid p.4).

The difficulty is to explain how children can go from using concrete analogies based on bare perceptual similarities to more abstract relational concepts. One problem is that presenting young children with exemplars which are not perceptually similar will not be helpful as the children won’t have the capacity to mutually align the two analogs. So, we are left with a mystery as to how young children form more abstract relational analogs. Gentner and Hoyos argue that we overcome this barrier through a process called progressive alignment. Experimental studies have shown that if young children are presented with abstract relational analogues they cannot pick up on the relationship. However, if they are first trained on concrete analogues and then later retested on the abstract relational analogues their performance improves dramatically.

This progressive alignment gives young children the capacity to move beyond bare similarity and acquire more abstract relations. As in the case of RFT, it is possible to use this experimental work to fill out Quine’s speculations. However, given the vagueness of Quine’s speculations, it would be a stretch to call this work Quinean. Thus, I would agree with Childers et al that Quine’s notion of similarity and analogy is too vague to do the work he set for it.

When it comes to Skinner, Childers et al err in two major ways. Firstly, they argue that Skinner’s book ‘Verbal Behaviour’ would be unheard of today if it wasn’t connected in people’s minds with Quine’s empiricist model. However, ‘Verbal Behaviour’ has in fact spawned a massive experimental literature, with hundreds of experimental studies into Skinner’s Verbal Operants (Sautter and LeBlanc 2006, Jennings et al 2021). At first, research inspired by Skinner focused on simpler verbal operants such as Mands and Tacts, with a lot of it being applied research with people with developmental disabilities (Ochs and Dixon 1989). But over the last 15 years there has been an explosion of research into more complex verbal operants such as the intraverbal, and there has been a massive increase in non-applied experiments and in studies of people without an intellectual disability or autism (over 50% of the people studied have no diagnosis) (Jennings et al 2021). As the years go by, the pace of experimental tests of Skinner’s Verbal Operants is rapidly increasing (Aguirre et al 2016). There is no evidence, or reason to think, that any of this research was inspired by Skinner’s name being associated with Quine. In fact, Quine’s name is virtually never cited in papers on Verbal Behaviour, or in papers about behavioural offshoots from Verbal Behaviour.

Secondly, Kenneth MacCorquodale (1970) replied to Chomsky’s review of Skinner’s ‘Verbal Behaviour’ and noted that when Chomsky criticized Skinner’s notion of the probability of a Tact occurring, he confused the probability of a Tact being used at a particular moment with the probability of a Tact being used at any point in a person’s life. Childers et al criticised this notion of “momentary probability” as an idiosyncratic use of probability and went on to make the following point:

“More importantly, his claim that under certain circumstances the relevant probability becomes “extremely high” is unwarranted, unless we already know how the language functions.” (Childers et al p. 221).

It is unclear precisely what Childers et al mean by “under certain circumstances”. One circumstance could be studies on intraverbal acquisition. If a child is being taught to use an intraverbal, using multi-exemplar training involving the frame “the wheels on the bus go_”, what would be the probability of the child saying “round and round”? Are Childers et al seriously suggesting that the probability cannot be estimated in these circumstances? And given that we have established developmental milestones for things like stimulus equivalence, and the training procedures used to elicit them, are Childers et al seriously suggesting that we cannot assign probabilities in these experimental settings? Granted, Chomsky’s point still stands: it is probably not possible to assign probabilities to certain words being spoken as people interact daily. But this has little bearing on experimental control in precise settings, which is what Skinner and Relational Frame Theorists were interested in.

Ultimately, Childers et al argue that what both Quine and Skinner had to do in response to Chomsky’s criticisms was to progressively modify their positions to include more and more innate architecture. There is little evidence to support this interpretation of Quine; see Gibson (1987) for a detailed exposition of the extended debate between Quine and Chomsky on innate architecture. However, I largely agree with Childers et al that Quine’s vague sketch of analogical synthesis wasn’t detailed enough or clear enough to account for our linguistic development. This is important because Quine was committed to giving a mechanistic explanation for our behavioural capacities. He argued consistently throughout his career that it is ultimately at the neurological level that we should be looking for our explanations, and that behaviour was just data to point us towards underlying mechanisms. If the data turned out to support a Chomskian type of architecture, I don’t think it would have made much difference to Quine’s overall project of naturalizing epistemology.

Skinner, on the other hand, was interested in discovering behavioural regularities. He was interested in underlying neuroscientific explanations only insofar as they helped in functional control of the organism in particular circumstances. Whether people are impressed with the literature inspired by Skinner’s Verbal Behaviour and the predictions it makes is one thing. But this literature needs to be engaged with; we cannot stipulate a priori how much experimental control has been gained in a particular experimental setting. As things stand, there is little empirical evidence that Skinner’s account will ever be able to handle linguistic productivity[1]. And this will make it extremely difficult to ever gain predictive control over more complex linguistic behaviour beyond Tacts, Mands and simple Intraverbals. Behaviourists inspired by Skinner, such as Relational Frame Theorists, claim that they can handle linguistic productivity, but since there is little engagement with linguistic data it is hard to test these claims. Ultimately these behaviourist studies are engaged in testing how the organism behaves, not in stipulating the nature of the organism’s brain. The historical story Childers et al tell, of Quine, Skinner and Chomsky’s dispute being mirrored in contemporary debates on connectionism, Large Language Models, and standard computational theories, is therefore hard to sustain. The works of Quine and Skinner are very different from each other, and both of their views have very little in common with the work ongoing in Artificial Intelligence.


[1] See David C Palmer 2023 ‘Towards a Behavioural Interpretation of English Grammar’ for a recent Skinnerian-inspired attempt at understanding grammar behaviourally.

The Extended Mind and Intellectual Disability.

The Extended Mind is a thesis by Andy Clark and David Chalmers which states that the mind extends beyond the brain and encompasses aspects of the physical world. They give the example of a person with dementia who keeps a diary to remind him of things he needs to do, such as when to take medication, where things are stored etc. If the person with dementia has reliable access to this diary most of the time, then they argue that the information in the diary is part of his extended mind.

In ordinary circumstances, if I remember when to do something it is because the information is stored in my brain and I can access that information to make decisions. I don’t always have to have access to the information; sometimes I may forget, but in general I have reliable access to it. Chalmers and Clark argue that it is arbitrary to consider the information stored in a brain which you can reliably access to be part of your mind but to think that information in your diary which you can reliably access isn’t.

The thesis is counterintuitive. And some people reject it because of this counterintuitive feel, arguing that the thesis extends our ordinary concept of cognition too far. However, this counterargument has little force. There is little reason to assume that our theoretical understanding of a particular phenomenon should be intuitive at first. Logical coherence should be the test of the theory, not whether it chimes with your folk-psychological concepts.

The argument of Chalmers and Clark focused on information within a diary, but today, with the phones we carry everywhere storing so much information, the argument becomes even more radical: it implies that aspects of the internet that we can reliably access are part of our extended mind.

Intellectual Disability

Psychologist J.J. Gibson wrote about affordances, which are relational aspects of our environment which we can interact with. Affordances relate not just to features of the environment but to the suitability of the environment to an observer or agent. Thus, steep stairs are an affordance for a person who can walk, but to a person in a wheelchair they are not an affordance. Our natural and social environments contain affordances for some people but not for all. What counts as an affordance depends on the intentions, capabilities, and interests of the individual. In general, we tend to build our environment in such a way that it helps people access the affordances they need, for example, by building wheelchair ramps.

People with an intellectual disability who are non-verbal have interests and desires, but as a result of being non-verbal they will have difficulties in accessing various affordances in their environment. It is for this reason that there are practices and regulations in place which ensure that organisations who care for people with an intellectual disability do everything possible to facilitate their communication capacities. Doing this involves giving them access to speech and language therapists (SLT), occupational therapists, and a full Multi-Disciplinary Team. Furthermore, staff working with them must be trained in techniques to facilitate non-verbal people communicating their needs. Various types of augmentative communication devices are used on the recommendations of SLT, and things are in place such as visual schedules and the Picture Exchange Communication System (PECS), an augmentative system based on Skinner’s Verbal Behaviour.

We saw above, when discussing the patient with dementia who had reliable access to his diary, that this could be considered a part of his extended mind. In the case of a non-verbal person in a service, there are affordances in their environment they may want to access, such as going for coffee, visiting friends, going for a bus drive etc. If they can use PECS cards to indicate what they want, and those cards are not reliably available, then you are taking away a part of their extended mind. It would be analogous to taking away a person’s prosthetic leg. A prosthetic leg may be artificial, but it is still a part of the person’s way of accessing affordances in their environment, and the same thing would apply to augmentative communicative systems.

I would argue that a similar thing is true of staff working with non-verbal people in a service. The staffing team ends up as part of the extended mind of the non-verbal person they are supporting. Just as the diary of the person with dementia contains information he can access, and my iPhone contains information I can access, staff working with non-verbal people with an intellectual disability hold information and provide affordances which a non-verbal person cannot access on their own.

As discussed above, not providing a person with reliable access to their prosthetic leg denies them access to affordances in their environment, and likewise not providing a non-verbal person with experienced staff who know them and are versed in communication training is denying them affordances in their environment. It is for this reason that there is such a massive push in policies to ensure that effective communication training is available, consistent staffing is maintained etc. However, in a lot of the literature this is spelled out in atomistic terms. Thinking of these issues in terms of the extended mind helps people think more relationally, and emphasises how our environment, social network, and social supports are partially constitutive of our own minds.

Quine on the Interdependence of Mands and Tacts.

In this blogpost I will consider Quine’s relation to two of B.F. Skinner’s Verbal Operants: the Mand and the Tact. I will argue that Quine’s Observation Sentence is analogous to Skinner’s Tact, but that Quine makes very little use of the notion of a Mand. The Mand is one of the most studied verbal operants, and it is the first verbal operant that is targeted by behavioural scientists if a child is experiencing language delays. Yet Quine gives the Mand a very small role in his overall theory of language acquisition. I will discuss Quine’s reasons for not considering things like Manding as important in sketching his naturalized epistemology, and then discuss the degree to which these verbal operants are separable and how their separability bears on Quine’s choice to downplay the role of Mands in his story of how we go from stimulus to science.

Quine’s Relation to Empirical Psychology

While Quine didn’t use the same vocabulary as Skinner, some of his concepts map effortlessly onto Skinner’s concepts. Thus, Quine’s notion of an Observation Sentence is the same as Skinner’s notion of a Tact (both are shaped by discriminative stimuli and social feedback). Skinner treated his different Verbal Operants, such as the Mand and the Tact, as functionally independent of one another. Nonetheless, despite arguing for this functional independence, Skinner noted that for ordinary speakers the two may be entwined. Thus, if a person has acquired a label as a Tact, the chances are that they will be able to use it as a Mand. Skinner gives four reasons to support his argument for interdependence. (1) Tact emergence may be facilitated by the acquisition of a Mand in the presence of the Manded stimulus. (2) The stimulus that evokes a Tact and that which evokes a Mand may be similar enough to effect a transfer. (3) Transfer may occur if caregivers reinforce one operant as if it were the other. (4) Children early in life may acquire generalized verbal skills which result in both the Mand and the Tact being acquired (Petursdottir et al p. 60). Skinner was speculating about these matters, but in the 60 years since he wrote ‘Verbal Behaviour’ their partial interdependence has been confirmed.

Quine showed no interest in Mands and hence he had little interest in how Mands and Tacts related to each other and affected the process of acquiring a language. He was clear that because of his interest in ontology and epistemology he was giving an idealized conception of how we acquire language. He was interested in observation sentences because they are our entering wedge into language and hence to our theory of the world. He is explicit that the story he gives of how we go from stimulus to science is meant to be an impressionistic one. He fully acknowledges that his story may deviate from the story told by a fully worked out science.

Quine’s position on this matter is dubious. He was critical of Carnap for engaging in make-believe in place of making use of current scientific psychology.

“But why all this creative reconstruction, all this make-believe? The stimulation of his sensory receptors is all anybody has to go on, ultimately, in arriving at his picture of the world. Why not see how this reconstruction really proceeds? Why not settle for empirical psychology?” (Epistemology Naturalised p. 74)

 Yet Quine himself is in effect engaged in a made-up story about how we go from stimulus to science. He justifies this as follows:

“Much of what is earliest and most urgent in language learning, furthermore, is a matter of neither stating nor assenting nor acting upon statements, but of importuning…But statement learning is what is relevant to our study, which aims at understanding the acquisition of scientific theory.” (The Roots of Reference p. 46)

“Anyway, I am not bent even upon a factual account of the learning of English, welcome though it would be. My concern with the essential psychogenesis of reference would be fulfilled in fair measure with a plausible account of how one might proceed from infancy step by step to a logically regimented language of science, even bypassing English.” (Ibid p. 92)

 Because of Quine’s emphasis on cognitive language, he is ignoring the messier pragmatic aspects of learning a language that are described by people such as the later Wittgenstein, and Skinner. There is a sense in which we can justify this; it is after all standard practice in the sciences to engage in idealizations. Quine could be parsed as sketching a scientific story of how we go from stimulus to science; we can use this abstract sketch and fill in the details as we learn more about the acquisition process. Nonetheless, Quine does seem to be guilty of holding Carnap to higher standards than he holds himself to.

Because Quine is only interested in descriptive language and its role in us acquiring our theory of the world, he claims he doesn’t need to think about things such as importuning or, as Skinner would call it, Manding. Skinner had noted that Mands such as asking for water would be controlled by an establishing operation of deprivation, such as thirst, and the subsequent reinforcement of the thirst being relieved, while the Tact for water could be controlled by a non-verbal discriminative stimulus, in whose presence the person was reinforced for saying the word. Thus, you would have two different operants controlling the one sound ‘Water’. As we saw above, Skinner thought that these two operants would in practice end up entwined. If indeed Mands and Tacts are intertwined, then this would affect any proposed psychologically realistic story of how we go from stimulus to science.

Lamarre and Holland (1985) tested the independence of Tacts and Mands in a study with preschool children. They trained the children on the relations “on the left” and “on the right” both as Mands and as Tacts. The study found that when the children learned them as Tacts, they couldn’t generalize them to Mands without training, while when they learned them as Mands they couldn’t generalize them to Tacts without training. This indicated that the two operants were, for young children, at first functionally independent and that transfer from one to the other wasn’t automatic.

Lamarre and Holland’s original study has been criticised by Wallace et al (2006). They noted that Lamarre and Holland’s establishing operations may not have been clear. There is no indication whether the items the child is Manding for, or the reinforcement they are receiving for Manding, are items the child desires. When this was controlled for in other experiments, the transfer from Tacts to Mands occurred for preferred items. Wallace et al claim their study demonstrates how responses taught as Tacts can facilitate the establishment of Mands for high preference items. And they noted that their experiments showed a difficulty with Tact-to-Mand transfer for low preference items (in line with Lamarre and Holland’s study). This suggests that the lack of transfer in Lamarre and Holland’s study was more than likely caused by not using sufficiently motivating reinforcers.

Gamba et al (2016) have done a meta-analysis of studies into whether the functional independence of the Mand and the Tact has been demonstrated empirically. They noted that there have been 28 empirical studies into the functional independence of the Mand and the Tact since Lamarre and Holland’s original study. They noted that there have been 13 studies which have demonstrated the functional independence of Mands and Tacts, but that in studies in which the stimuli Tacted and Manded were preferred items, transfer of function occurred (Gamba et al p. 27). Whether these studies are sufficient to cast doubt on the functional independence of the Mand and the Tact is open to interpretation. Gamba et al note that in some of these experiments the Manded items were present, which may have served to evoke previous Tacts, and that this may need to be controlled for in future experiments (ibid p. 31).

Thus far the experimental results are not sufficient to conclusively demonstrate the functional independence of Mands and Tacts. Quine appears to have been agnostic on this issue. However, one questions whether he has a right to this agnosticism. He was critical of Carnap for engaging in make-believe in his epistemology, but his own Naturalized Epistemology engages in as much make-believe. Quine’s focus on observation sentences at the expense of things like Mands serves to distort our picture of how we acquire our language and scientific heritage. In abstracting away from these details Quine is giving a hyper-intellectual fantasy of how we acquire language; he is letting the metaphor of the scientist as a disinterested theorist keying observation sentences to stimuli blind him to the more pragmatic aspects of language acquisition.

In this blogpost I discussed Quine’s relation to Skinner’s Verbal Operants of Mands and Tacts. In my next post I will focus on linking Quine’s concept of association of sentences to sentences with recent empirical work on Skinner’s Verbal Operant the Intraverbal.