Autoregressive LLMs and the Limits of the Law of Accelerated Returns

Content, including text and images, © Aditya Mohan. All Rights Reserved. Robometrics, Amelia, Living Interface and Skive it are trademarks of Skive it, Inc. The content is meant for human readers only under 17 U.S. Code § 106. Access, learning, analysis or reproduction by Artificial Intelligence (AI) of any form, directly or indirectly, including but not limited to AI agents, LLMs, foundation models, and content scrapers, is prohibited. These views are not legal advice but business opinion based on reading some English text written by a set of intelligent people.

The rapid rise of artificial intelligence (AI), particularly in the domain of language models, has sparked debates about the long-term trajectory of their development. One concept often invoked is the "Law of Accelerated Returns," which predicts exponential technological growth. However, when it comes to autoregressive large language models (LLMs), there is reason to believe that their innovation and advancement do not necessarily follow an exponential curve. This article examines autoregression, autoregressive LLMs, and why the finite availability of high-quality data imposes limitations on their future progress, with evidence that the pace of advancement has already slowed.

Futuristic Agricultural Fields: A Glimpse into 2050's Technological and Natural Harmony. This vintage-style, black-and-white photograph captures a zoomed-in view of futuristic agricultural fields in the year 2050, where advanced technology and nature intersect. Vertical farming towers, managed by precision drones and robots, showcase an efficient and sustainable approach to food production. The detailed shot highlights the intricate design of sleek irrigation systems powered by renewable energy sources, while the texture of the image evokes a nostalgic, old-world aesthetic. The photo subtly reflects the theme of technological progress reaching natural limits, symbolizing the finite boundaries of innovation, much like the relationship between autoregressive LLMs and data availability, where further advancement is constrained by diminishing resources.

What is Autoregression?

Autoregression is a concept in time-series analysis where future values in a sequence are predicted based on past values. In statistical modeling, autoregressive models are employed to forecast time-dependent data by assuming that the future is a function of prior observations.

For example, if we’re analyzing daily stock prices, the price on Day 2 might depend on Day 1, and the price on Day 3 could depend on both Day 1 and Day 2. This recursive mechanism forms the core of autoregressive models.
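The stock-price example above can be sketched as a simple second-order autoregressive model. The weights and the constant term below are invented purely for illustration; in practice they would be estimated from historical data.

```python
# Minimal sketch of an AR(2) model: the next value is a weighted
# sum of the two most recent observations plus a constant.
# The weights (w1, w2) and constant (c) are illustrative, not fitted.
def ar2_predict(series, w1=0.6, w2=0.3, c=1.0):
    """Predict the next value from the last two observations."""
    return c + w1 * series[-1] + w2 * series[-2]

prices = [100.0, 102.0, 101.5]  # Days 1-3 of a toy price series
next_price = ar2_predict(prices)  # forecast for Day 4
```

Each new prediction can itself be appended to the series and fed back in, which is exactly the recursive mechanism described above.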

What Are Autoregressive LLMs?

Autoregressive LLMs are a class of AI language models that generate text by predicting the next word (or token) in a sentence, given the previous words or tokens. Unlike traditional statistical models, autoregressive LLMs can generate human-like responses by training on large amounts of text data. At each step, the model computes a probability distribution over possible next tokens based on the preceding sequence, then samples from that distribution (or simply picks the most probable token) and repeats the process until the entire text is generated.
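The generation loop can be illustrated with a toy "language model" whose entire knowledge is a hand-written table of next-token probabilities (a real LLM computes this distribution with a neural network; the table and probabilities here are invented).

```python
# Toy next-token distribution: P(next token | previous token).
# All tokens and probabilities are invented for illustration.
bigram_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"ran": 0.6, "<end>": 0.4},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(start="the", max_len=10):
    """Autoregressive loop: condition on the last token, pick the
    most probable next token (greedy decoding), and repeat until
    an end-of-sequence marker is produced."""
    tokens = [start]
    for _ in range(max_len):
        dist = bigram_probs[tokens[-1]]
        next_tok = max(dist, key=dist.get)  # greedy: highest probability
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return tokens

generate()  # → ["the", "cat", "sat"]
```

A real model conditions on the entire preceding sequence rather than just the last token, and typically samples from the distribution instead of always taking the argmax, but the iterate-and-append structure is the same.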

The most prominent example of autoregressive LLMs is OpenAI's GPT series, which excels in tasks like text completion, question-answering, and conversational AI. These models rely on vast datasets, often consisting of billions of words, to develop sophisticated linguistic capabilities. They are "autoregressive" because the next token in the sequence depends on the context provided by the previous tokens, in much the same way a statistical autoregressive model predicts time-series data.

The Role of Data in Autoregressive LLMs

One of the most critical components in the functioning of autoregressive LLMs is data. These models require enormous datasets to be effective. Data in this context consists of "tokens" — which represent the smallest meaningful units, often individual words, characters, or subword fragments. For an LLM to perform well, it needs exposure to a wide variety of tokens from diverse sources.

Training an LLM involves feeding it a large corpus of text and optimizing it through millions of iterations, during which the model learns the probabilistic relationships between tokens. More tokens generally result in better model performance, as the model is exposed to a richer set of linguistic structures and meanings. However, as we’ll explore, this dependency on data also imposes limits on how far these models can improve.
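To make the notion of subword tokens concrete, here is a sketch of greedy longest-match segmentation over a tiny fixed vocabulary. The vocabulary is invented for illustration; production tokenizers learn their vocabularies (e.g., via byte-pair encoding) from the training corpus itself, which is one reason data quality and coverage matter so much.

```python
# Toy greedy subword tokenizer over a tiny, hand-picked vocabulary.
vocab = {"un", "believ", "able", "token", "ize"}

def tokenize(word):
    """Segment a word into known subwords, longest match first,
    falling back to single characters for unknown spans."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown: emit a single character
            i += 1
    return tokens

tokenize("unbelievable")  # → ["un", "believ", "able"]
```

Because the set of meaningful subword units in a language is bounded, enlarging the model does not enlarge this inventory; only new text supplies new combinations for the model to learn from.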

The Law of Accelerated Returns: A Brief Overview

The Law of Accelerated Returns (See The Law of Accelerated Returns) suggests that technological progress follows an exponential trajectory. According to this view, advances in one area (such as computational power or data storage) accelerate progress in other areas, leading to faster and faster growth over time. This law has been applied to many fields, including computing (as seen in Moore’s Law), where the number of transistors on a chip doubles approximately every two years.

The idea behind this law is that as technology improves, so do the tools used to develop the next generation of technologies, leading to a compounding effect. If LLMs were to follow this path, we would expect their performance to increase exponentially as more powerful algorithms and hardware become available.

Why Autoregressive LLMs Don’t Follow the Law of Accelerated Returns

Despite the remarkable progress of LLMs, there is no clear evidence that their development follows the Law of Accelerated Returns. Several key factors suggest that autoregressive LLMs, in particular, may not continue to improve at an exponential rate. In fact, progress in recent years has already started to slow, especially when compared to the significant leap that accompanied large models like GPT-4 and applications such as ChatGPT. Here's why:

1. Finite Availability of High-Quality Data

The most fundamental limiting factor is the finite amount of good quality data available to train these models. While the internet contains vast amounts of text, not all of it is useful or high-quality. The performance of LLMs heavily depends on the quality, diversity, and relevance of the data they are trained on. Low-quality or biased data can result in poor model performance, introducing biases, errors, or undesirable behavior.

Even if we set aside ethical considerations such as copyright concerns or issues of social representation, the raw amount of meaningful linguistic data is limited. For instance, certain domains such as specialized scientific or medical knowledge have inherently less publicly available data. As LLMs get larger and more powerful, they consume more of the existing data, and the returns on performance start to diminish.

2. Tokenization and Its Relation to Data

Tokens are the basic building blocks in autoregressive LLMs, and their number is finite in any given language. While tokenization strategies have improved, allowing models to compress and generalize language efficiently, there is still a limited set of possible tokens and combinations in natural languages. The upper bound on meaningful tokens constrains the extent to which LLMs can learn novel relationships or develop new linguistic capabilities.

Over time, we see diminishing returns as models reach the limits of what current tokenized data can offer. Once all available high-quality data has been used, improvements to LLM performance will become more incremental, requiring increasingly refined training techniques rather than relying solely on additional data.

3. Hardware vs. Data Constraints

Much of the improvement in LLMs over recent years has been driven by advances in computational hardware, such as Graphics Processing Units (GPUs) and specialized chips like Tensor Processing Units (TPUs). While better hardware can enable training on larger datasets and increase model sizes, it cannot overcome the fundamental limitation posed by the availability of data.

In contrast to Moore’s Law, which describes exponential growth in computing power, no analogous law applies to data quality. Even with an abundance of hardware, the performance of LLMs will eventually plateau unless new sources of data or fundamentally different approaches to training models are developed. New sources of data, in other words additional unique human information, are a product of human activity and evolution, which is linear. (See Exponential Technology vs. Linear Biology)

The Process Has Already Slowed

When GPT-3 and, later, generative applications like ChatGPT were first introduced, they marked a major leap in the capabilities of LLMs. However, the rate of innovation since then has noticeably slowed. Early advancements in LLM technology were driven by the massive increase in model sizes and the integration of vast datasets, leading to rapid improvements in language understanding and generation.

But as these models have grown larger, the marginal gains from adding more parameters or slightly more data have started to decrease. This slowdown aligns with the broader trend that suggests LLM development is reaching a phase of diminishing returns. While improvements continue to happen, they are no longer as dramatic or exponential as they were during the initial phases of deployment, providing further evidence that the Law of Accelerated Returns does not hold for this specific technology.

The Parallels to Semiconductors: A Finite Growth Model

A useful analogy for understanding the trajectory of LLM development is the semiconductor industry. For decades, Moore’s Law held true, with the number of transistors on a chip doubling every two years. However, as we approach the physical limits of semiconductor fabrication, this exponential growth has slowed, and we no longer see the same rapid advances in processing power.

Autoregressive LLMs face a similar fate. While their progress has been impressive, the finite amount of high-quality data imposes a natural limit on their continued exponential growth. Just as semiconductor technology has reached the bounds of what current materials and methods allow, LLMs are constrained by the availability and quality of the tokens they can consume.

Conclusion

Autoregressive LLMs represent a remarkable advancement in AI, capable of generating human-like text by predicting the next token in a sequence. However, despite their progress, they are unlikely to follow the Law of Accelerated Returns. The finite amount of high-quality data and the limitations of tokenization constrain how much further these models can improve, much as the semiconductor industry has approached its own physical limits.

The process has already slowed compared to the early days of LLMs, when models like GPT-4 showed transformative leaps in capabilities. While future innovations in algorithms and hardware may drive incremental improvements, there is no reason to believe that LLMs will continue to advance exponentially. Instead, we should expect progress to slow as the field reaches the limits of what current data and methodologies can achieve. As with semiconductors, new paradigms will be necessary to unlock the next wave of advancements.

Further reading