The world’s fastest inference engine. Accelerate your AI deployment.
The MAX Engine executes all of your TensorFlow and PyTorch models with no model rewriting or conversions. Bring your model as-is and deploy it anywhere, across server and edge, with unparalleled usability and performance.
Modular
Accelerated
Xecution
MAX Engine is everything you need to deploy low-latency, high-throughput inference pipelines into production. Consolidate the bespoke AI toolchains you are using and simplify your AI deployment by orders of magnitude.
Support all your generative and traditional AI use cases
MAX provides drop-in compatibility with any model from any framework. It supports all framework operations, quantized types, dynamic shapes, and your custom operations.
Train in any framework,
deploy anywhere
Train in TensorFlow or PyTorch as you do today, then deploy the same model anywhere — across server and edge — with no rewrites or conversions.
Graph APIs
Low-level control over the engine with minimal external dependencies and direct programmability over hardware: high-level abstractions when you want them, with the ability to drop down when you need it.
Deploy directly to cloud
MAX is free to download and run locally on your machine for development and experimentation, and can be deployed to production via our BYOC cloud SaaS offering.
Maximize performance, minimize costs
Reduce latency, increase throughput, and improve resource efficiency across CPUs, GPUs, and accelerators. Productionize larger models and significantly reduce your computing costs.
Works with your existing AI libraries and tools
Modular is designed to drop into your existing workflows and use cases. Our tools are... well... modular. They integrate with industry-standard infrastructure and open-source tools to minimize migration cost.
01.
Easily integrate the engine into your own custom server image, or use Modular's off-the-shelf NVIDIA Triton and TensorFlow Serving builds.
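For example, serving a model through an off-the-shelf Triton build looks like any other Triton deployment: you describe the model in a standard `config.pbtxt`. This is a minimal sketch — the backend name `max`, the model name, and the tensor shapes are assumptions for illustration, not Modular's documented configuration.

```
# config.pbtxt — hypothetical Triton model configuration for a MAX-backed model.
name: "resnet50"            # model name (assumption)
backend: "max"              # backend identifier (assumption)
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]   # CHW image tensor
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]          # class logits
  }
]
```

Because the engine slots in as a backend, existing Triton clients, batching, and model-repository workflows carry over unchanged.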
02.
Deploy the engine on-prem, in your own VPC on any major cloud provider, or get up and running faster with our hosted solutions.
03.
The MAX Engine integrates seamlessly with industry-standard open-source tooling like Prometheus and Grafana.
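Concretely, pointing Prometheus at a serving deployment is a standard scrape-config entry. This sketch assumes a Triton-style setup exposing metrics on port 8002 (Triton's default); the job name and target are placeholders, not a documented MAX endpoint.

```yaml
# prometheus.yml — scrape the inference server's /metrics endpoint.
scrape_configs:
  - job_name: "max-engine"            # label for this deployment (assumption)
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8002"]   # Triton's default metrics port
```

From there, the scraped latency and throughput series can be charted in Grafana like any other Prometheus data source.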
MAX Engine works with all the rest of the suite
Modular MAX Engine can be used in combination with MAX Serving, and is extensible with Mojo 🔥, the fast and portable programming language for your AI applications.
Our engine integrates with the rest of our suite of MAX products, while being usable on its own.