A high-performance inference framework for AI
MAX provides powerful libraries and tools to develop, optimize, and deploy AI on GPUs fast.
Why developers use MAX
Incredible Performance
MAX was built from the ground up to deliver out-of-the-box performance for AI workloads. See how we measure performance.
Hardware Portability
MAX provides portability across CPU+GPU generations and gives you incredible utilization benefits, driving real compute cost savings.
Complete Control
Optimize your model's performance, write custom ops, or build your own model. MAX gives you full control over every layer of the stack.
Deploy Gen AI in seconds with MAX
Develop custom GPU research with MAX
MAX for Research
Advanced tools and libraries for model, kernel, and hardware developers to deliver even more precise control. Some of the many tools include:
Build custom graphs
Control single to multi-GPU scaling
Program heterogeneous compute
Write custom GPU code
Low-level host and device control

FREE for everyone
Paid support for scaled enterprise deployments
MAX Self Managed
FREE FOREVER
MAX is available FREE for everyone to self-manage
Incredible performance for LLMs, PyTorch, and ONNX models
Deploy MAX yourself on-prem or on any cloud provider
Community support through Discord and GitHub
MAX Enterprise
PAY AS YOU GO
Support the largest deployments needed by your enterprise
SLA support with guaranteed response time.
Dedicated Slack channel and account manager.
Access to the world’s best AI engineering team.
Developer Approved

"after wrestling with CUDA drivers for years, it felt surprisingly… smooth. No, really: for once I wasn’t battling obscure libstdc++ errors at midnight or re-compiling kernels to coax out speed. Instead, I got a peek at writing almost-Pythonic code that compiles down to something that actually flies on the GPU."
"This is about unlocking freedom for devs like me, no more vendor traps or rewrites, just pure iteration power. As someone working on challenging ML problems, this is a big thing."
“The more I benchmark, the more impressed I am with the MAX Engine.”
“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”
“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”
“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”
“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”
"Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators."
“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the 'two-language' problem. Having Mojo - as one language all the way through would be awesome.”
“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”
“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”
“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”
“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”
“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”
"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."
“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”
"It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing."
“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”
“The Community is incredible and so supportive. It’s awesome to be part of.”
"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."
Start building with Modular
Quick start resources
Get started guide
With just a few commands, you can install MAX as a conda package and deploy a GenAI model on a local endpoint.
Browse open source models
500+ supported models, most of which have been optimized for lightning-fast speed on the Modular platform.
Find examples
Follow step-by-step recipes to build agents, chatbots, and more with MAX.