Are large language models the intellectual descendants of Behaviorism?

Childers et al (2023) argue that the debate between Chomsky on one side and Quine and Skinner on the other is being replayed in contemporary debates about Large Language Models and Chomskian linguistics. They go on to argue that proponents of the view that data-driven Large Language Models mirror human natural language make the same mistakes for which Chomsky criticised Quine and Skinner 50 years ago. In this blog post I will argue that Childers et al largely misinterpret Skinner’s and Quine’s projects, and hence that the connections they draw with Large Language Models and Connectionism are problematic.

Childers et al argue that Quine and Skinner’s empiricism collapsed under criticism from Chomsky, and that Quine responded to this collapse by modifying his empiricism to accommodate Chomsky’s position through an appeal to innate mechanisms. They note that once such an appeal is accepted we are no longer empiricists, and our position is closer to that of the rationalists. I would argue that this is an idiosyncratic reading of empiricism. Empiricists like Hume and Locke appealed to innate mechanisms to explain how we acquire our knowledge of the world. It is just that the innate mechanisms they appealed to would not be sufficient to account for the complexity of human language and cognition. Quine’s externalized empiricism is of a piece with Hume’s, except that Quine advocates for the innate mechanisms to be determined experimentally. Childers et al call this hybridised empiricism and note that it is empiricism in name only. It is unclear to me why Quine’s argument that we should determine which innate structures are necessary on the basis of behavioural tests should be considered anything other than ordinary empiricism externalized.

They do go on to make the further point that Quine’s speculations about the necessary innate principles are extremely vague. I would agree with them on this point; when Quine talks about analogical synthesis as the method by which we connect sentences with sentences, he is extremely vague. Gibson (1987) described Quine’s analogical synthesis as a postulated innate structure to be mapped by future scientists. In the 35 years since Gibson wrote those words, very little work has been done by philosophers influenced by Quine to fill in the details of this project.

There has, though, been research into analogy in both behavioural science and cognitive science. Relational Frame Theorists are behavioural scientists whose experimental work on language has led them to abandon some of Skinner’s account of language. They argue that their research shows that rule following in language changes how people respond to schedules of reinforcement, and that there are human-specific emergent properties such as the ability to derive stimulus equivalence and relations of coordination, hierarchy, etc. These relational frames, under contextual control, exhibit the properties of mutual entailment, combinatorial entailment, and transfer of function (Relational Frame Theory: An Overview, p. 62). Relational Frame Theorists have demonstrated empirically that analogy is a relational frame which typically emerges at about 5 years of age, defining analogy as the capacity to derive sameness amongst equivalence relations, or equivalence-equivalence responding. They think that these derived relational frames may play a role in linguistic productivity. However, it would be unfair to argue that they vindicate Quine’s concept of analogical synthesis, as Quine’s vague formulations played no role in the experimental paradigm. Furthermore, it is impossible to tell the degree to which RFT is consistent with Quine, as his formulations are too vague to map onto it.

In cognitive science, analogical reasoning has been researched in detail; see, for example, Hofstadter (2013) and Gentner and Hoyos (2017). Gentner has been doing experimental work on analogies for over 40 years. Based on this work she argues that children start deriving analogies from 7 months of age. The discrepancy between her chronology of when analogies are acquired and the timeline in RFT can be accounted for by the lower criterion Gentner sets for what counts as an analogy. Gentner argues that analogies involve transfer of knowledge from one area to another, whereas Relational Frame Theorists argue that stimulus-stimulus equivalence is necessary for something to count as analogy.

In their ‘Analogy and Abstraction’, Gentner and Hoyos note a difficulty in explaining how children acquire abstract analogies, and this difficulty is like the one which faces Quine. We know from the psychological literature that (1) children prefer to extend their concepts based on bare perceptual similarities, (2) this results in their making concrete analogies, and (3) children can extend their comparison classes through multiple exemplars.

Two of the primary ways of making an analogy are through projective alignment and through mutual alignment. Projective alignment is when people use an already well understood domain to illuminate another domain. However, young children do not have a large store of well understood concepts, so it is difficult for them to use already understood concepts to explain a different concept. Therefore, young children typically use mutual alignment in their analogical abstraction. In mutual alignment analogies one discovers commonalities which were not obvious in either analogue (ibid p. 3). Mutual alignment involves establishing a structural alignment between two representations based on matching relations between analogues (ibid p. 4).

The difficulty is to explain how children can go from using concrete analogies based on bare perceptual similarities to more abstract relational concepts. One problem is that presenting young children with exemplars which are not perceptually similar will not be helpful, as the children won’t have the capacity to mutually align the two analogues. So, we are left with a mystery as to how young children form more abstract relational analogues. Gentner and Hoyos argue that we overcome this barrier through a process called progressive alignment. Experimental studies have shown that if young children are presented with abstract relational analogues, they cannot pick up on the relationship. However, if they are first trained on concrete analogues and then later retested on the abstract relational analogues, their performance improves dramatically.

This progressive alignment gives young children the capacity to move beyond bare similarity and acquire more abstract relations. As in the case of RFT, it is possible to use this experimental work to fill out Quine’s speculations; however, given the vagueness of those speculations, it would be a stretch to call this work Quinean. Thus, I would agree with Childers et al that Quine’s notion of similarity and analogy is too vague to do the work he set for it.

When it comes to Skinner, Childers et al err in two major ways. Firstly, they argue that Skinner’s book Verbal Behaviour would be unheard of today if it were not connected in people’s minds with Quine’s empiricist model. However, Verbal Behaviour has in fact spawned a massive experimental literature, with hundreds of experimental studies into Skinner’s Verbal Operants (Sautter and LeBlanc 2006, Jennings et al 2021). At first, research inspired by Skinner focused on simpler verbal operants such as Mands and Tacts, with much of it being applied research with people with developmental disabilities (Ochs and Dixon 1989). But over the last 15 years there has been an explosion of research into more complex verbal operants such as the intraverbal, and a massive increase in non-applied experiments and in studies of people without an intellectual disability or autism (over 50% of the people studied have no diagnosis) (Jennings et al 2021). As the years go by, the pace of experimental tests of Skinner’s Verbal Operants is rapidly increasing (Aguirre et al 2016). There is no evidence, or reason to think, that any of this research was inspired by Skinner’s name being associated with Quine. In fact, Quine is virtually never cited in papers on Verbal Behaviour, or in papers about behavioural offshoots from Verbal Behaviour.

Secondly, Kenneth MacCorquodale (1970) replied to Chomsky’s review of Skinner’s Verbal Behaviour and noted that when Chomsky criticized Skinner’s notion of the probability of a Tact occurring, he confused the probability of a Tact being used at a particular moment with the probability of a Tact being used at any point in a person’s life. Childers et al criticised the notion of “momentary probability” as an idiosyncratic use of probability and went on to make the following point:

“More importantly, his claim that under certain circumstances the relevant probability becomes “extremely high” is unwarranted, unless we already know how the language functions.” (Childers et al p. 221).

It is unclear precisely what Childers et al mean by “under certain circumstances”. One circumstance could be a study of intraverbal acquisition. If a child is being taught to use an intraverbal, using multi-exemplar training involving the frame “the wheels on the bus go_”, what would be the probability of the child saying “round and round”? Are Childers et al seriously suggesting that the probability cannot be estimated in these circumstances? And given that we have established developmental milestones for things like stimulus equivalence, and the training procedures used to elicit them, are Childers et al seriously suggesting that we cannot assign probabilities in these experimental settings? Granted, Chomsky’s point still stands: it is probably not possible to assign probabilities to particular words being spoken as people interact daily. But this has little bearing on experimental control in precise settings, which is what Skinner and Relational Frame Theorists were interested in.

Ultimately, Childers et al argue that what both Quine and Skinner had to do in response to Chomsky’s criticisms was to progressively modify their positions by adding more and more innate architecture. There is little evidence to support this interpretation of Quine; see Gibson (1987) for a detailed exposition of the extended debate between Quine and Chomsky on innate architecture. However, I largely agree with Childers et al that Quine’s vague sketch of analogical synthesis wasn’t detailed enough or clear enough to account for our linguistic development. This is important because Quine was committed to giving a mechanistic explanation for our behavioural capacities. He argued consistently throughout his career that it is ultimately at the neurological level that we should be looking for our explanations, and that behaviour was just data to point us towards underlying mechanisms. If the data turned out to support a Chomskian type of architecture, I don’t think it would have made much difference to Quine’s overall project of naturalizing epistemology.

Skinner, on the other hand, was interested in discovering behavioural regularities. He was interested in underlying neuroscientific explanations only insofar as they helped in functional control of the organism in particular circumstances. Whether people are impressed with the literature inspired by Skinner’s Verbal Behaviour and the predictions it makes is one thing. But this literature needs to be engaged with; we cannot stipulate a priori how much experimental control has been gained in a particular experimental setting. As things stand, there is little empirical evidence that Skinner’s account will ever be able to handle linguistic productivity[1]. And this will make it extremely difficult ever to gain predictive control over more complex linguistic behaviour beyond Tacts, Mands and simple Intraverbals. Behaviourists inspired by Skinner, such as Relational Frame Theorists, claim that they can handle linguistic productivity, but since there is little engagement with linguistic data it is hard to test these claims. Ultimately, these behaviourist studies are engaged in testing how the organism behaves, not in stipulating the nature of the organism’s brain. The historical story Childers et al tell, of Quine, Skinner and Chomsky’s dispute being mirrored in contemporary debates on connectionism, Large Language Models, and standard computational theories, is therefore problematic. The work of Quine and the work of Skinner are very different from each other, and both have very little in common with work ongoing in Artificial Intelligence.


[1] See David C Palmer 2023 ‘Towards a Behavioural Interpretation of English Grammar’ for a recent Skinnerian-inspired attempt to understand grammar behaviourally.
