Autoregressive LLMs and the Limits of the Law of Accelerated Returns
The rapid rise of artificial intelligence (AI), particularly in the domain of language models, has sparked debate about the long-term trajectory of the technology. One concept often invoked is the "Law of Accelerated Returns," which predicts exponential technological growth. When it comes to autoregressive large language models (LLMs), however, there is reason to believe that their advancement does not follow an exponential curve. This article examines autoregression, autoregressive LLMs, and why the finite availability of high-quality data limits their future progress, with evidence that the pace of advancement has already slowed.
What is Autoregression?
Autoregression is a concept in time-series analysis where future values in a sequence are predicted based on past values. In statistical modeling, autoregressive models are employed to forecast time-dependent data by assuming that the future is a function of prior observations.
For example, if we’re analyzing daily stock prices, the price on Day 2 might depend on Day 1, and the price on Day 3 could depend on both Day 1 and Day 2. This recursive mechanism forms the core of autoregressive models.
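In standard notation, an AR(p) model expresses each value as a weighted sum of the p previous observations plus noise; the stock example above, in which Day 3 depends on Days 1 and 2, corresponds to p = 2:

x_t = c + \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t

where the coefficients \varphi_i are fitted to past data, c is a constant, and \varepsilon_t is random noise.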
What Are Autoregressive LLMs?
Autoregressive LLMs are a class of AI language models that generate text by predicting the next word (or token) in a sequence, given the previous tokens. Unlike traditional statistical models, autoregressive LLMs can generate human-like responses by training on large amounts of text data. At each step, the model computes a probability distribution over possible next tokens based on the preceding sequence, samples a token from that distribution (or simply takes the most probable one), and repeats the process until the text is complete.
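A minimal Python sketch of this loop, with a toy bigram table standing in for the neural network (the vocabulary and probabilities here are invented for illustration; a real LLM conditions on the entire preceding context, not just the last token):

```python
import random

# Toy "model": next-token probabilities conditioned only on the previous
# token. A real LLM computes this distribution with a neural network
# over the entire preceding sequence.
BIGRAM_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens: int = 10) -> str:
    """Generate text one token at a time, feeding each choice back in."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAM_PROBS[tokens[-1]]        # P(next token | context)
        choices, weights = zip(*dist.items())
        nxt = random.choices(choices, weights=weights)[0]  # sample
        if nxt == "</s>":                      # end-of-sequence token
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])

print(generate())  # e.g. "the cat sat"
```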
The most prominent example of autoregressive LLMs is OpenAI's GPT series, which excels at tasks like text completion, question answering, and conversational AI. These models rely on vast datasets, often comprising hundreds of billions of tokens, to develop sophisticated linguistic capabilities. They are "autoregressive" because the next token in the sequence depends on the context provided by the previous tokens, in much the same way a statistical autoregressive model predicts time-series data.
The Role of Data in Autoregressive LLMs
One of the most critical components in the functioning of autoregressive LLMs is data. These models require enormous datasets to be effective. Data in this context consists of "tokens": the smallest units the model operates on, typically words, subword fragments, or individual characters. For an LLM to perform well, it needs exposure to a wide variety of tokens from diverse sources.
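As a quick illustration, OpenAI's open-source tiktoken library (assumed installed here via pip install tiktoken) shows how a sentence decomposes into subword tokens:

```python
import tiktoken  # pip install tiktoken

# Load the byte-pair-encoding tokenizer used by several OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Autoregression predicts the future from the past.")
print(ids)                             # integer IDs, one per token
print([enc.decode([i]) for i in ids])  # the text fragment each ID maps to
```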
Training an LLM involves feeding it a large corpus of text and optimizing it through millions of iterations, during which the model learns the probabilistic relationships between tokens. More tokens generally result in better model performance, as the model is exposed to a richer set of linguistic structures and meanings. However, as we’ll explore, this dependency on data also imposes limits on how far these models can improve.
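Concretely, the standard objective is next-token cross-entropy: shift each training sequence by one position and teach the model to predict every token from the ones before it. A minimal PyTorch sketch, with a toy embedding-plus-linear model and random token IDs standing in for a real transformer and corpus:

```python
import torch
import torch.nn as nn

VOCAB = 16
# Toy stand-in for an LLM: embeds the previous token and predicts the next.
model = nn.Sequential(nn.Embedding(VOCAB, 32), nn.Linear(32, VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

data = torch.randint(0, VOCAB, (64, 9))      # 64 random "sentences"
inputs, targets = data[:, :-1], data[:, 1:]  # predict token t+1 from token t

for step in range(100):
    logits = model(inputs)                   # (batch, seq, vocab) scores
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
# On random tokens the loss plateaus near ln(VOCAB); on real text, more
# and better data keeps paying off, until the data runs out.
```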
The Law of Accelerated Returns: A Brief Overview
The Law of Accelerated Returns (see The Law of Accelerated Returns) holds that technological progress follows an exponential trajectory. According to this view, advances in one area (such as computational power or data storage) accelerate progress in other areas, leading to faster and faster growth over time. The pattern has been observed in many fields, most famously in computing, where Moore's Law describes the number of transistors on a chip doubling approximately every two years.
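Formally, a fixed doubling period compounds exponentially. With N_0 transistors today and a two-year doubling time,

N(t) = N_0 \cdot 2^{t/2}

so a decade of Moore's Law multiplies the transistor count by 2^5 = 32.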
The idea behind this law is that as technology improves, so do the tools used to develop the next generation of technologies, leading to a compounding effect. If LLMs were to follow this path, we would expect their performance to increase exponentially as more powerful algorithms and hardware become available.
Why Autoregressive LLMs Don’t Follow the Law of Accelerated Returns
Despite the remarkable progress of LLMs, there is no clear evidence that their development follows the Law of Accelerated Returns. Several factors suggest that autoregressive LLMs, in particular, may not continue to improve at an exponential rate. In fact, progress has already begun to slow, especially when compared to the leap that accompanied the introduction of models such as GPT-3 and GPT-4 and the applications built on them, such as ChatGPT. Here's why:
1. Finite Availability of High-Quality Data
The most fundamental limiting factor is the finite amount of high-quality data available to train these models. While the internet contains vast amounts of text, not all of it is useful or high-quality. The performance of LLMs depends heavily on the quality, diversity, and relevance of their training data: low-quality or skewed data degrades performance and introduces biases, errors, and undesirable behavior.
Even setting aside legal and ethical considerations such as copyright and social representation, the raw amount of meaningful linguistic data is limited. Certain domains, such as specialized scientific or medical knowledge, have inherently little publicly available text. As LLMs grow larger and more powerful, they consume more of the existing data, and the returns on performance begin to diminish.
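Empirical scaling-law studies quantify this diminishing-returns pattern. Kaplan et al. (2020), for instance, report that test loss falls only as a small power of dataset size D, roughly

L(D) \approx (D_c / D)^{\alpha_D}, \qquad \alpha_D \approx 0.1

where D_c is a fitted constant. With an exponent that small, each doubling of the data improves the loss by only a few percent, and the absolute gains keep shrinking.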
2. Tokenization and Its Relation to Data
Tokens are the basic building blocks in autoregressive LLMs, and their number is finite in any given language. While tokenization strategies have improved, allowing models to compress and generalize language efficiently, there is still a limited set of possible tokens and combinations in natural languages. The upper bound on meaningful tokens constrains the extent to which LLMs can learn novel relationships or develop new linguistic capabilities.
Over time, we see diminishing returns as models reach the limits of what current tokenized data can offer. Once all available high-quality data has been used, improvements to LLM performance will become more incremental, requiring increasingly refined training techniques rather than relying solely on additional data.
3. Hardware vs. Data Constraints
Much of the improvement in LLMs over recent years has been driven by advances in computational hardware, such as Graphics Processing Units (GPUs) and specialized chips like Tensor Processing Units (TPUs). While better hardware can enable training on larger datasets and increase model sizes, it cannot overcome the fundamental limitation posed by the availability of data.
Unlike computing power, however, data is not governed by any comparable law of exponential growth. Even with an abundance of hardware, the performance of LLMs will eventually plateau unless new sources of data or fundamentally different training approaches are developed. New data (in other words, additional unique human information) accumulates at the pace of human activity, which is roughly linear rather than exponential. (See Exponential Technology vs. Linear Biology)
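A rough, commonly cited approximation makes this tension concrete: training compute C grows with parameter count N and training tokens D as

C \approx 6\,N\,D

and compute-optimal analyses such as DeepMind's Chinchilla study suggest scaling N and D in roughly equal proportion, so the optimal token budget grows as D \propto \sqrt{C}. On that recipe, every order-of-magnitude jump in hardware demands correspondingly more unique tokens, which hardware alone cannot create.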
Progress Has Already Slowed
When GPT-3 and the generative applications built on it, such as ChatGPT, were first introduced, they marked a major leap in the capabilities of LLMs. Since then, however, the rate of innovation has noticeably slowed. Early advances were driven by massive increases in model size and the integration of vast datasets, which produced rapid improvements in language understanding and generation.
But as these models have grown larger, the marginal gains from adding more parameters or slightly more data have started to decrease. This slowdown aligns with the broader trend that suggests LLM development is reaching a phase of diminishing returns. While improvements continue to happen, they are no longer as dramatic or exponential as they were during the initial phases of deployment, providing further evidence that the Law of Accelerated Returns does not hold for this specific technology.
The Parallels to Semiconductors: A Finite Growth Model
A useful analogy for understanding the trajectory of LLM development is the semiconductor industry. For decades, Moore’s Law held true, with the number of transistors on a chip doubling every two years. However, as we approach the physical limits of semiconductor fabrication, this exponential growth has slowed, and we no longer see the same rapid advances in processing power.
Autoregressive LLMs face a similar fate. While their progress has been impressive, the finite amount of high-quality data imposes a natural limit on their continued exponential growth. Just as semiconductor technology has reached the bounds of what current materials and methods allow, LLMs are constrained by the availability and quality of the tokens they can consume.
Conclusion
Autoregressive LLMs represent a remarkable advancement in AI, capable of generating human-like text by predicting the next token in a sequence. Despite this progress, however, they are unlikely to follow the Law of Accelerated Returns. The finite amount of high-quality data and the limits of tokenization constrain how much further these models can improve, much as the semiconductor industry has approached the physical limits of its materials and methods.
Progress has already slowed compared to the early days of LLMs, when models like GPT-3 and GPT-4 delivered transformative leaps in capability. While future innovations in algorithms and hardware may drive incremental improvements, there is no reason to believe that LLMs will continue to advance exponentially. Instead, we should expect progress to slow as the field reaches the limits of what current data and methodologies can achieve. As with semiconductors, new paradigms will be needed to unlock the next wave of advancements.