The Case Against Regulating Matrix Multiplication


The White House Executive Order mandating detailed reporting for AI models trained with more than 10^26 floating-point operations (FLOPs) and computing clusters with a theoretical maximum of 10^20 FLOPs per second poses significant challenges, particularly when considering the computational complexity involved in training large generative AI models like GPT-4o and Claude 3 Opus.
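
To put the cluster threshold in perspective, the following Python sketch converts it into an approximate accelerator count. The per-accelerator throughput is an assumed ballpark figure for a modern datacenter GPU at reduced precision, not a quoted specification for any particular chip.

    # Rough sketch: how many accelerators would a cluster need before it hits
    # the Executive Order's 10^20 FLOPs-per-second reporting threshold?
    # The per-GPU figure is an assumption, not a vendor specification.

    CLUSTER_THRESHOLD_FLOPS_PER_SEC = 1e20   # theoretical maximum in the Executive Order
    ASSUMED_FLOPS_PER_GPU = 1e15             # assumed peak throughput per accelerator

    gpus_needed = CLUSTER_THRESHOLD_FLOPS_PER_SEC / ASSUMED_FLOPS_PER_GPU
    print(f"Accelerators needed to reach the threshold: {gpus_needed:,.0f}")  # 100,000

Under that assumption, the threshold corresponds to a cluster on the order of one hundred thousand accelerators, which is why it targets only the very largest training installations.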

Total FLOPs for Matrix Multiplication

The total number of floating-point operations needed to compute the product of an m × n matrix A and an n × p matrix B is given by:

FLOPs = 2 × m × n × p

Each of the m × p elements of the resulting matrix requires roughly n multiplications and n additions, which is where the factor of 2 comes from.
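
As a quick sanity check, the following Python sketch evaluates the formula for a pair of example matrices (the dimensions are arbitrary) and confirms that it matches the operation count of a textbook triple-loop multiplication.

    # Minimal sketch of the 2 * m * n * p FLOP count for C = A @ B,
    # where A is (m x n) and B is (n x p). The triple loop is for
    # illustration only; real frameworks use optimized kernels.

    def matmul_flops(m: int, n: int, p: int) -> int:
        """Multiplications plus additions for the product of A (m x n) and B (n x p)."""
        return 2 * m * n * p

    def naive_matmul_op_count(m: int, n: int, p: int) -> int:
        """Count the operations performed by a textbook triple-loop matmul."""
        ops = 0
        for _ in range(m):
            for _ in range(p):
                # Each output element needs n multiplications and n additions
                # (counting the accumulation into the running sum).
                ops += 2 * n
        return ops

    # Example dimensions (arbitrary, for illustration):
    m, n, p = 1024, 4096, 4096
    assert matmul_flops(m, n, p) == naive_matmul_op_count(m, n, p)
    print(f"{matmul_flops(m, n, p):,} FLOPs")  # 34,359,738,368 FLOPs

Even a single multiplication at these modest dimensions costs tens of billions of operations, and a large model performs many such multiplications per layer, per token.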

Generative AI models, such as GPT (Generative Pre-trained Transformer) and other foundation models, rely heavily on matrix multiplications during both the training and inference phases. These models typically consist of many layers, each involving several matrix multiplications. Given the size and depth of state-of-the-art models, the totals are enormous: a single forward pass can require trillions of floating-point operations, and complete training runs are estimated to come within an order of magnitude of the 10^26-FLOP threshold cited in the Executive Order.
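
For a rough sense of scale, the following Python sketch applies a common back-of-the-envelope approximation for dense transformers, roughly 6 FLOPs per parameter per training token (about 2 for the forward pass and 4 for the backward pass), to a hypothetical model. The parameter and token counts are illustrative assumptions, not published figures for any real model.

    # Hedged estimate of total training FLOPs using the common
    # "~6 FLOPs per parameter per token" rule of thumb for dense transformers.
    # The model size and dataset size below are hypothetical.

    def training_flops(num_parameters: float, num_tokens: float) -> float:
        return 6.0 * num_parameters * num_tokens

    EO_THRESHOLD = 1e26  # reporting threshold in the Executive Order

    # Hypothetical model: 1 trillion parameters trained on 10 trillion tokens.
    flops = training_flops(1e12, 10e12)
    print(f"Estimated training FLOPs: {flops:.2e}")                   # 6.00e+25
    print(f"Fraction of 1e26 threshold: {flops / EO_THRESHOLD:.2f}")  # 0.60

Under these assumptions, a single large training run already lands within a factor of two of the reporting threshold, which illustrates how quickly frontier development runs up against the rule.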

Frank Rosenblatt developed the Perceptron in the 1950s, an early neural network model illustrating the beginnings of machine learning and matrix computations. He joined Cornell Aeronautical Laboratory in Buffalo, New York, where he served as a research psychologist, senior psychologist, and head of the cognitive systems section. There, he conducted pioneering work on perceptrons, leading to the development and hardware construction of the Mark I Perceptron in 1960. This was the first computer capable of learning new skills through trial and error, simulating human thought processes with a neural network.

Relationship Between Parameters, Matrix Sizes, and FLOPs

In neural networks, each parameter corresponds to an element in a weight matrix that connects layers of neurons. Parameters are the weights (and biases) the model learns during training and then uses to predict the next token in a sequence. The matrix multiplication formula, FLOPs = 2 × m × n × p, where m is the number of input rows (for example, tokens in a batch), n is the input dimension, and p is the output dimension of a layer's weight matrix, illustrates how the parameter count directly affects the computational load: the weight matrix itself holds n × p parameters, and every one of them participates in the multiplication. More neurons and layers in the model mean more weights connecting those neurons, leading to a higher number of parameters and therefore more FLOPs, as the sketch below illustrates.
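
The following Python sketch makes this link concrete for a single feed-forward projection. The hidden size and sequence length are assumptions chosen for illustration; the actual dimensions of GPT-4o and Claude 3 Opus have not been published.

    # Minimal sketch (assumed layer shapes, not from any specific model) showing
    # how a layer's parameter count and its matrix-multiplication FLOPs both grow
    # with the weight-matrix dimensions. For a weight matrix of shape
    # (d_in x d_out) applied to t token vectors:
    #   parameters = d_in * d_out
    #   FLOPs      = 2 * t * d_in * d_out   (the 2 * m * n * p formula with m = t)

    def layer_parameters(d_in: int, d_out: int) -> int:
        return d_in * d_out

    def layer_flops(tokens: int, d_in: int, d_out: int) -> int:
        return 2 * tokens * d_in * d_out

    # Illustrative hidden size and sequence length (assumptions only).
    d_model, seq_len = 8192, 2048
    params = layer_parameters(d_model, 4 * d_model)      # one feed-forward projection
    flops = layer_flops(seq_len, d_model, 4 * d_model)   # FLOPs for one forward pass

    print(f"Parameters in this projection: {params:,}")  # 268,435,456
    print(f"FLOPs per forward pass:        {flops:,}")   # 1,099,511,627,776

A single projection of this assumed size contributes over a trillion operations per forward pass, and a full model stacks hundreds of such multiplications.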

Specifics for GPT-4o and Claude 3 Opus

Challenges Imposed by the Executive Order

Conclusion

While the intent of the Executive Order is to enhance oversight and prevent misuse of powerful AI technologies, the stringent reporting requirements could hinder progress by imposing significant constraints on development. The relationship between the parameter count and the complexity of matrix multiplications, which scales with the size and architecture of models like GPT-4o and Claude 3 Opus, highlights the immense computational resources involved. Additionally, the rapid increase in FLOPs requirements, evolving model architectures, and the potential for smaller, modular models to replace large ones make the current regulatory approach potentially outdated and restrictive. Balancing the need for oversight with the encouragement of innovation is crucial to the continued advancement of AI technologies.

Further reading