The Case Against Regulating Matrix Multiplication


The White House Executive Order mandating detailed reporting for AI models trained with more than 10^26 floating-point operations (FLOPs) and computing clusters with a theoretical maximum of 10^20 FLOPs per second poses significant challenges, particularly when considering the computational complexity involved in training large generative AI models like GPT-4o and Claude 3 Opus.
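
To put the cluster threshold in perspective, the following Python sketch converts it into an approximate accelerator count. The per-accelerator throughput is an assumed ballpark figure for a modern datacenter GPU at reduced precision, not a quoted specification for any particular chip.

    # Rough sketch: how many accelerators would a cluster need before it hits
    # the Executive Order's 10^20 FLOPs-per-second reporting threshold?
    # The per-GPU figure is an assumption, not a vendor specification.

    CLUSTER_THRESHOLD_FLOPS_PER_SEC = 1e20   # theoretical maximum in the Executive Order
    ASSUMED_FLOPS_PER_GPU = 1e15             # assumed peak throughput per accelerator

    gpus_needed = CLUSTER_THRESHOLD_FLOPS_PER_SEC / ASSUMED_FLOPS_PER_GPU
    print(f"Accelerators needed to reach the threshold: {gpus_needed:,.0f}")  # 100,000

Under that assumption, the threshold corresponds to a cluster on the order of one hundred thousand accelerators, which is why it targets only the very largest training installations.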

Total FLOPs for Matrix Multiplication

The total number of floating-point operations needed to compute the product of an m × n matrix A and an n × p matrix B is given by:

FLOPs = 2 × m × n × p

Each of the m × p elements of the resulting matrix requires roughly n multiplications and n additions, which is where the factor of 2 comes from.
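
As a quick sanity check, the following Python sketch evaluates the formula for a pair of example matrices (the dimensions are arbitrary) and confirms that it matches the operation count of a textbook triple-loop multiplication.

    # Minimal sketch of the 2 * m * n * p FLOP count for C = A @ B,
    # where A is (m x n) and B is (n x p). The triple loop is for
    # illustration only; real frameworks use optimized kernels.

    def matmul_flops(m: int, n: int, p: int) -> int:
        """Multiplications plus additions for the product of A (m x n) and B (n x p)."""
        return 2 * m * n * p

    def naive_matmul_op_count(m: int, n: int, p: int) -> int:
        """Count the operations performed by a textbook triple-loop matmul."""
        ops = 0
        for _ in range(m):
            for _ in range(p):
                # Each output element needs n multiplications and n additions
                # (counting the accumulation into the running sum).
                ops += 2 * n
        return ops

    # Example dimensions (arbitrary, for illustration):
    m, n, p = 1024, 4096, 4096
    assert matmul_flops(m, n, p) == naive_matmul_op_count(m, n, p)
    print(f"{matmul_flops(m, n, p):,} FLOPs")  # 34,359,738,368 FLOPs

Even a single multiplication at these modest dimensions costs tens of billions of operations, and a large model performs many such multiplications per layer, per token.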

Generative AI models, such as GPT (Generative Pre-trained Transformer) and other foundation models, rely heavily on matrix multiplications during both the training and inference phases. These models typically consist of many layers, each involving several matrix multiplications. Given the size and depth of state-of-the-art models, the totals are enormous: a single forward pass can require trillions of floating-point operations, and complete training runs are estimated to come within an order of magnitude of the 10^26-FLOP threshold cited in the Executive Order.
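
For a rough sense of scale, the following Python sketch applies a common back-of-the-envelope approximation for dense transformers, roughly 6 FLOPs per parameter per training token (about 2 for the forward pass and 4 for the backward pass), to a hypothetical model. The parameter and token counts are illustrative assumptions, not published figures for any real model.

    # Hedged estimate of total training FLOPs using the common
    # "~6 FLOPs per parameter per token" rule of thumb for dense transformers.
    # The model size and dataset size below are hypothetical.

    def training_flops(num_parameters: float, num_tokens: float) -> float:
        return 6.0 * num_parameters * num_tokens

    EO_THRESHOLD = 1e26  # reporting threshold in the Executive Order

    # Hypothetical model: 1 trillion parameters trained on 10 trillion tokens.
    flops = training_flops(1e12, 10e12)
    print(f"Estimated training FLOPs: {flops:.2e}")                   # 6.00e+25
    print(f"Fraction of 1e26 threshold: {flops / EO_THRESHOLD:.2f}")  # 0.60

Under these assumptions, a single large training run already lands within a factor of two of the reporting threshold, which illustrates how quickly frontier development runs up against the rule.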

Frank Rosenblatt developed the Perceptron in the 1950s, an early neural network model illustrating the beginnings of machine learning and matrix computations. He joined Cornell Aeronautical Laboratory in Buffalo, New York, where he served as a research psychologist, senior psychologist, and head of the cognitive systems section. There, he conducted pioneering work on perceptrons, leading to the development and hardware construction of the Mark I Perceptron in 1960. This was the first computer capable of learning new skills through trial and error, simulating human thought processes with a neural network.

Relationship Between Parameters, Matrix Sizes, and FLOPs

In neural networks, each parameter corresponds to an element in a weight matrix that connects layers of neurons. Parameters are the weights (and biases) the model learns during training and then uses to predict the next token in a sequence. The matrix multiplication formula, FLOPs = 2 × m × n × p, where m is the number of input rows (for example, tokens in a batch), n is the input dimension, and p is the output dimension of a layer's weight matrix, illustrates how the parameter count directly affects the computational load: the weight matrix itself holds n × p parameters, and every one of them participates in the multiplication. More neurons and layers in the model mean more weights connecting those neurons, leading to a higher number of parameters and therefore more FLOPs, as the sketch below illustrates.
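
The following Python sketch makes this link concrete for a single feed-forward projection. The hidden size and sequence length are assumptions chosen for illustration; the actual dimensions of GPT-4o and Claude 3 Opus have not been published.

    # Minimal sketch (assumed layer shapes, not from any specific model) showing
    # how a layer's parameter count and its matrix-multiplication FLOPs both grow
    # with the weight-matrix dimensions. For a weight matrix of shape
    # (d_in x d_out) applied to t token vectors:
    #   parameters = d_in * d_out
    #   FLOPs      = 2 * t * d_in * d_out   (the 2 * m * n * p formula with m = t)

    def layer_parameters(d_in: int, d_out: int) -> int:
        return d_in * d_out

    def layer_flops(tokens: int, d_in: int, d_out: int) -> int:
        return 2 * tokens * d_in * d_out

    # Illustrative hidden size and sequence length (assumptions only).
    d_model, seq_len = 8192, 2048
    params = layer_parameters(d_model, 4 * d_model)      # one feed-forward projection
    flops = layer_flops(seq_len, d_model, 4 * d_model)   # FLOPs for one forward pass

    print(f"Parameters in this projection: {params:,}")  # 268,435,456
    print(f"FLOPs per forward pass:        {flops:,}")   # 1,099,511,627,776

A single projection of this assumed size contributes over a trillion operations per forward pass, and a full model stacks hundreds of such multiplications.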

Specifics for GPT-4o and Claude 3 Opus

Challenges Imposed by the Executive Order

Conclusion

While the intent of the Executive Order is to enhance oversight and prevent misuse of powerful AI technologies, the stringent reporting requirements could hinder progress by imposing significant constraints on development. The relationship between the parameter count and the complexity of matrix multiplications, which scales with the size and architecture of models like GPT-4o and Claude 3 Opus, highlights the immense computational resources involved. Additionally, the rapid increase in FLOPs requirements, evolving model architectures, and the potential for smaller, modular models to replace large ones make the current regulatory approach potentially outdated and restrictive. Balancing the need for oversight with the encouragement of innovation is crucial to the continued advancement of AI technologies.

Further reading