AI’s compute fragmentation: what matrix multiplication teaches us
AI is powered by a virtuous circle of data, algorithms (“models”), and compute. Growth in one increases demand on the others and can significantly affect the developer experience, particularly usability and performance. Today, we have more data and more AI model research than ever before, but compute isn’t scaling at the same speed due to … well, physics.

If AI serving tech can’t solve today’s problems, how do we scale into the future?
The progress made in AI over the last ten years is breathtaking, from AlexNet in 2012 to the recent release of ChatGPT, which has taken large foundation models and conversational AI to another level.

Part 2: Increasing development velocity of giant AI models
The first four requirements address one fundamental problem with how we've been using MLIR: weights are constant data, but they shouldn't be managed like other MLIR attributes. Until now, we've been trying to fit a square peg into a round hole, creating a lot of wasted space that costs us development velocity (and, therefore, costs users of the tools money).

Increasing development velocity of giant AI models
Machine learning models are getting larger and larger — some might even say, humongous. The world’s most advanced technology companies have been in an arms race to see who can train the largest model (MUM, OPT, GPT-3, Megatron), while other companies focused on production systems have scaled their existing models to great effect. Through all the excitement, what’s gone unsaid is the myriad of practical challenges larger models present for existing AI infrastructure and developer workflows.