Fast, flexible, & private enterprise AI

An open GenAI inference platform for GPUs

Get faster inference from your GPUs, so you can spend less on compute.

The AI inference platform built for Enterprise AI

  • Deploy AI at scale

    🌎

    Get immediate performance wins with torch.compile interoperability (see the sketch after this list).

  • Lower your TCO

    🤑

    Reduce your total cost of ownership with unparalleled performance right out of the box.

  • Own your AI

    🎛️

    Avoid hardware lock-in and gain full control over performance, security, and optimization.
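
To make the torch.compile interoperability point above concrete, here is a minimal, generic PyTorch sketch. The model, shapes, and comments are illustrative assumptions about plain torch.compile usage, not MAX-specific APIs.

# A minimal torch.compile sketch in plain PyTorch (illustrative model and shapes).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
compiled_model = torch.compile(model)  # drop-in replacement for the original module

x = torch.randn(32, 128)
out = compiled_model(x)  # first call triggers compilation; later calls reuse the compiled graph
print(out.shape)  # torch.Size([32, 10])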

Book a demo of MAX

SOTA performance for GenAI workloads

Output throughput: 3,860 req/s

Write once and deploy AI to GPUs

No code changes or reconfiguring

Iterate quickly from your laptop

Develop, test, and deploy in a unified environment that eliminates inconsistencies and accelerates your time to production.

OR

Deploy to any cloud VPC or Kubernetes, using the same codebase

Deploy to any cloud provider with ease, ensuring flexibility and scalability without having to reconfigure for different environments.

A portable GPU software stack that gives you options

Easy to switch

Have a PoC that's working with a closed, proprietary model, and now you're ready to own your AI stack? That's what MAX does best!

Get started with MAX

Use any open source model

Run Llama 3.1 with MAX now, or bring up another open source model by following the tutorial linked below.

Read Tutorial

Run on GPU or CPU

Get great performance and utilization across all your instances.

Deploy to any cloud

Run the same codebase in any cloud VPC or Kubernetes cluster, with no reconfiguration required.

Talk to us

Own, control, and secure your AI future

Use any open source model

Run Llama 3.1 with MAX now, or bring up another open source model by following the tutorial linked below.

Manage your data privacy & compliance

Get peace of mind with MAX. Own your ML pipelines and avoid sending your proprietary data to external services.

Own your IP

Control every layer of your stack. Get your weights from anywhere. Customize down to the kernel if needed.

“Once we got to proof of concept, we knew we needed to bring it in house.”

MRWilliams12

FAQ

How do I use MAX?

The Modular Accelerated Xecution (MAX) platform is a unified set of APIs and tools that simplifies the process of building and deploying your own high-performance AI endpoint. To get started with MAX either locally or via a Docker container, just Install MAX or follow one of our tutorials, such as Deploying Llama 3 on GPU with MAX Serve.

What does MAX replace?

We created MAX to address the fragmented and confusing array of AI tools that plagues the industry today. Our unified toolkit is designed to help the world build high-performance AI pipelines and deploy them to any hardware, removing the need for hardware vendor-specific libraries. You can read more in our blog post.

How much do I have to pay to use MAX?

MAX is a free and permissive AI inference framework that enables developers and enterprises to develop and deploy AI inference workloads on any hardware type, in any kind of environment (including production). We offer MAX Enterprise for organizations seeking enterprise support; you can read more on our Pricing Page.

Is MAX compatible with my current stack?

Almost certainly. MAX is built without vendor-specific hardware libraries, enabling it to scale effortlessly across a wide range of CPUs and GPUs. We tightly integrate with AI ecosystem tools such as Python, PyTorch, and Hugging Face, and offer a fully extensible API surface. MAX Serve ships as a ready-to-deploy container and exposes an OpenAI-compatible API endpoint, so it works and deploys easily with Docker and Kubernetes. Read more here.
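
As a concrete illustration of that OpenAI-compatible endpoint, the sketch below queries a locally served model with the official openai Python client. The base URL, port, placeholder API key, and model name are assumptions for this example, not documented MAX defaults.

# Hypothetical sketch: querying a local OpenAI-compatible endpoint.
# The base_url, api_key placeholder, and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What does an inference server do?"}],
)
print(response.choices[0].message.content)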

What models does MAX currently support?

MAX supports model formats provided by Hugging Face, PyTorch, ONNX, and MAX Graphs (our own model format). We have a fully integrated LLM serving and execution stack that provides SOTA performance out of the box. You can read more about the models here.
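
Since weights for open models typically come from Hugging Face, here is a small, hedged sketch of fetching them with the huggingface_hub library; the repo id is an illustrative assumption, and gated models such as Llama require approved access.

# Hypothetical sketch: downloading open model weights from Hugging Face.
# The repo id is an assumption; gated repos require an access token.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("meta-llama/Llama-3.1-8B-Instruct")
print(f"Weights downloaded to: {local_dir}")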

Free to start. Scale as you grow.

MAX is FREE for anyone to self-manage. Looking for enterprise solutions and dedicated support? Book a demo or reach out to our sales team.

Get exclusive access to MAX Enterprise.

Our team of world-class AI engineers is looking for the hardest AI problems to solve. Ensure you get the most out of MAX and get VIP treatment along the way.

12 openings available for Q1 2025

  • FREE Enterprise support

    🤑

    Get SLA-backed support directly from the world’s best AI engineering team.

  • Custom Performance Gains

    🏎️

    See what our built-from-the-ground-up solution can do for your team.

  • Prioritized feature requests

    📣

    Have a say in what comes next: your requests are fast-tracked on our development roadmap.

Developer Approved 👍

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

NL

“Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators.”

jeremyphoward

“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the "two-language" problem. Having Mojo - as one language all the way through would be awesome.”

fnands

“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”

scrumtuous

“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”

scrumtuous

“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”

mytechnotalent

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”

pagilgukey

“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”

svpino

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

svpino

“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”

dorjeduck

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

drdude81

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

Aydyn

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

“It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.”

Adalseno

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

Aydyn

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.

Adalseno

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

Aydyn

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.

Adalseno

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

Aydyn

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing.

Adalseno

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273

MAX on GPU waiting list

Be the first to get lightning-fast inference speeds on your GPUs. Be the envy of all your competitors and lower your compute spend.