How do LLMs and humans differ in the way they learn and use language?

Marina Pantcheva | 19 Sep 2023 | 6 min read
The linguistic abilities of large language models (LLMs) have improved to a point where they seem to rival the language competence of humans. But is this truly the case? To find out, we take a close look at how LLMs and humans learn and produce language, highlight the differences, and recommend the most suitable language-related applications of LLMs.  
 
In March 2023, OpenAI reported that its GPT-4 model outperformed 93% of human test takers on the SAT Reading and Writing test. On the one hand, it is a remarkable achievement that an artificial neural network can outcompete humans in language-related tasks. On the other hand, it does so after being trained on trillions of sentences encompassing much of the World Wide Web. Meanwhile, the biological neural networks in the brains of SAT-passing students are trained on a fraction of the data consumed by a large language model (LLM). Even more astonishingly, young children achieve fluency in their native language with exposure to no more than 5 million tokens, in contrast to the petabytes of data an LLM needs to reach fluency.
 
Why is it that humans need so little training data while LLMs must ingest vast language corpora to achieve comparable language competence?  
 
The answer is simple: because LLMs and humans learn language differently.

Language learning: Humans have Universal Grammar; LLMs use statistics

The question of how children learn language is central to modern linguistics. Numerous theories have been put forth to explain this process. To mention a few:
  • Social interactionist theory suggests that feedback and corrections play a pivotal role in language acquisition, emphasizing the social interaction between the child and linguistically knowledgeable adults.
  • Behaviorist theory posits that children learn language by mimicking those around them and receiving positive reinforcement for their linguistic endeavors.
  • Statistical learning theory proposes that, whether consciously or not, children use the natural statistical properties of language to deduce its structure, including sound patterns, words, and grammar. Thus, children are sensitive to how frequently particular syllable and word combinations occur compared with others.
  • Universal grammar theory argues for the existence of innate constraints on what the grammar of a possible human language can look like (Universal Grammar). When children are exposed to linguistic stimuli, they adopt rules that conform to this Universal Grammar. In essence, children possess an innate biological component, aptly called a “Language Instinct” by Steven Pinker, that enables the rapid development of language despite the limited and often imperfect linguistic input they receive.
Among the theories listed above, Universal Grammar offers some of the most compelling explanations for the “magic” behind children’s effortless acquisition of linguistic competence from very little linguistic data. In computational terms, Universal Grammar serves as a learning bias: an innate set of constraints that enables children to learn language from a small dataset. Without this learning bias that effectively constrains the hypothesis space, children would not converge on their grammar in a reasonably short timeframe, i.e., before their brains lose plasticity.
 
Large language models lack the innate linguistic bias that children have. They gain linguistic competence from the surface statistics of the training data, much in line with the tenets of statistical learning theory. That said, the way LLMs learn is not entirely unconstrained: machine learning scientists keep tinkering with layers, gates, and hyperparameters. But for an artificial neural network to learn as efficiently as a child, it would need to be enriched with non-trivial structural priors similar to the human language instinct. Only LLMs with a human-like linguistic learning bias can achieve human-level linguistic competence when trained on datasets as small as the ones humans learn from.
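
To make the notion of learning from surface statistics concrete, here is a toy sketch in Python. It is purely illustrative (real LLMs learn neural representations, not lookup tables, and the tiny corpus is invented for the example): a bigram model that "learns" which word follows which purely from co-occurrence counts.

```python
# Toy illustration of surface-statistics learning: a bigram model that
# predicts the next word purely from co-occurrence counts.
from collections import Counter, defaultdict

corpus = ("the boy met mary . mary will talk to the boy . "
          "the boy will talk to mary .").split()

# Count how often each word is followed by each other word.
following = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    following[w1][w2] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("talk"))  # -> "to": frequency, not grammar, drives the choice
print(predict_next("will"))  # -> "talk"
```

A model like this has no notion of hierarchical structure; it only knows which sequences are frequent, which is, in essence, what an LLM's next-token objective optimizes at vastly greater scale.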

Language production: Humans use hierarchical structures; LLMs follow sequential order

The contrast between large language models and humans becomes evident not only in their language learning skills but also in how they generate language. Human linguistic performance is grounded in a grammar with a complex hierarchical structure. In contrast, LLMs generate language in a linear manner, predicting the next most likely token in a sequence. This difference may not be noticeable in simple, commonly used, and grammatically straightforward constructions. However, it becomes apparent when we look at intricate and infrequent grammatical structures. In these cases, the "flat grammar" employed by LLMs gives wrong results. 
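
The following minimal sketch shows this strictly left-to-right process in action, generating one token at a time with the small, openly available GPT-2 model via the Hugging Face transformers library. The prompt and the number of generated tokens are arbitrary choices for the example.

```python
# A minimal sketch of linear, token-by-token (autoregressive) generation.
# GPT-2 is used here because its weights are openly available.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("This is the boy that Mary", return_tensors="pt").input_ids
for _ in range(8):  # extend the sequence by eight tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits    # scores for every vocabulary item
    next_id = logits[0, -1].argmax()  # greedily take the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

At no point does the model build or consult a syntactic tree; each step is a single "what usually comes next?" decision.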
 
Consider, for instance, the following two sentences, where the first is grammatical, albeit complex, while the second is judged as incorrect by English speakers (indicated by an asterisk *).
  1. This is the boy that Mary met yesterday and will talk to tomorrow.
  2. *This is the boy that Mary met yesterday and will talk to you tomorrow.
When asked which of the two sentences is correct, GPT-3.5 (as well as other tested LLMs) erroneously identifies the second sentence as the correct one. GPT-3.5 justifies this choice by pointing out that the second sentence clearly specifies to whom Mary will talk tomorrow. The reason for this wrong judgement is that GPT-3.5 lacks knowledge of the underlying deep hierarchical structure of the sentence. It simply judges the sentence's grammaticality by the likelihood that the tokens “talk to” would be followed by the token “you” rather than by the token “tomorrow”. It appears that there are too few instances of such structures in the GPT-3.5 training data for it to make the correct prediction.
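This likelihood-based behaviour can be observed directly by scoring the two sentences with a language model. The sketch below is illustrative only: GPT-2 stands in for GPT-3.5, whose weights are not public, and the helper function is invented for the example. It computes each sentence's total log-probability, which is the quantity a purely statistical judge compares.

```python
# Scoring sentences by their probability under a causal language model.
# A higher score reflects surface statistics, not syntactic well-formedness.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def total_logprob(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean cross-entropy per predicted token
    return -loss.item() * (ids.shape[1] - 1) # total log-probability of the sequence

s1 = "This is the boy that Mary met yesterday and will talk to tomorrow."
s2 = "This is the boy that Mary met yesterday and will talk to you tomorrow."
print(total_logprob(s1), total_logprob(s2))  # whichever scores higher "wins"
```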
 
Thanks to their structure-based linguistic knowledge, children can accurately judge the first sentence as correct and the second as incorrect despite having limited exposure to examples like these.

Where can we utilize the linguistic abilities of large language models?

Undoubtedly, large language models stand out as an immensely valuable tool, demonstrating remarkable results across a range of linguistic tasks. Their exceptional performance can be attributed to their vast context window, which acts as an expansive working memory. Taking GPT-3.5 as an example, its context window of several thousand tokens has no equivalent in human capabilities: it is virtually impossible for a human to hold a perfect recollection of thousands of recently encountered words in working memory. For these reasons, LLMs excel at tasks that benefit from a large working memory and do not require syntactic knowledge of the depth a human has. Examples of such tasks are content summarization, terminology extraction, definition generation, sentiment analysis, and more. The immense vocabulary of LLMs and their knowledge of common syntactic patterns also make them well suited for tasks like altering the writing style of a text, converting or neutralizing gender references, unifying voice and register, and others.
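As a quick illustration of the summarization use case, here is a minimal sketch using the OpenAI Python SDK (version 1.x). The model name, prompt wording, and placeholder text are assumptions made for the example, not recommendations.

```python
# A minimal summarization sketch with the OpenAI Python SDK (v1.x).
# The model name and instruction are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

document = "Paste the text to be summarized here."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarize the user's text in three sentences."},
        {"role": "user", "content": document},
    ],
)
print(response.choices[0].message.content)
```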
 
At the same time, as we’ve seen, LLMs still lag behind humans when it comes to understanding the deep structure of language. Their grammaticality judgements are based on probability rather than on knowledge of syntax. Consequently, the involvement of human linguistic experts in validating LLM output, designing prompts, and fine-tuning remains crucial.
 
Today’s rapid advancements in machine learning and AI are transforming the way linguistic tasks are done. Tune in to our Globally Speaking podcast, where Marina and Bart Maczynski will discuss the topic in more detail. And if you’re interested in learning more, contact us for a consultation on how RWS can support you on your AI journey. 
Marina Pantcheva
Senior Group Manager
Marina is a Senior Group Manager at RWS. She holds a PhD in Theoretical Linguistics and is passionate about data, technology, and exploring new areas of knowledge. She leads a team that develops solutions for Crowd Localization, covering tech solutions, BI, linguistic quality, community management, and more.