The world's fastest unified AI inference engine. Get models into production, faster.

The Modular Engine executes all of your TensorFlow and PyTorch models with no model rewriting or conversions. Bring your model as-is and deploy it anywhere, across server and edge, with unparalleled usability and performance.

[Chart: Inference latency at the P50, P90, P95, and P99 percentiles for TensorFlow, PyTorch, and the Modular Engine. Model: DLRM RMC1. Instance: AWS c6g.4xlarge (Graviton2). Batch size: 1.]

Train in any framework, deploy anywhere

Consolidate the bespoke AI toolchains you are using and make your AI deployment orders of magnitude simpler.

Cloud & On-Prem
Frameworks
modular Engine
Server & edge

Framework optionality

Easily deploy models trained in any framework, such as TensorFlow or PyTorch, without retraining, conversions, or pre-optimization steps, using a unified set of APIs. There are no tricks and no hacks - the Engine just works, incredibly fast.
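As a sketch of what that unified flow could look like, the Python below loads a TensorFlow SavedModel and a TorchScript model through one API. The `modular_engine` module, `InferenceSession` class, and `load`/`execute` methods are hypothetical placeholders for illustration, not the Engine's published API.

```python
# Hypothetical sketch: module, class, and method names below are illustrative
# placeholders, not the Modular Engine's published API.
import numpy as np
import modular_engine  # hypothetical package name

session = modular_engine.InferenceSession()

# The same call loads models from either framework, with no conversion step.
tf_model = session.load("models/bert_savedmodel/")      # TensorFlow SavedModel
pt_model = session.load("models/dlrm_torchscript.pt")   # TorchScript module

# Inference is uniform regardless of the source framework.
inputs = {"input_ids": np.zeros((1, 128), dtype=np.int64)}
outputs = tf_model.execute(inputs)
```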

Compute portability

Seamlessly move your workloads to the best hardware for the job without rewriting or recompiling your models. Avoid lock-in and take advantage of price efficiencies and performance improvements without migration costs.

Maximize performance, minimize costs

Reduce latency, increase throughput, and improve resource efficiency across CPUs, GPUs, and accelerators. Productionize larger models and significantly reduce your computing costs.
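Latency and throughput claims like these are easy to check with a small harness. The sketch below measures single-request latency percentiles and qps for any model callable; `run_inference` stands in for whatever inference call your stack exposes.

```python
import time
import numpy as np

def benchmark(run_inference, inputs, iterations=1000, warmup=50):
    """Measure single-request latency percentiles and throughput (qps)."""
    for _ in range(warmup):                  # warm caches and compile paths
        run_inference(inputs)
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        run_inference(inputs)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    p50, p90, p95, p99 = np.percentile(np.array(latencies) * 1000,
                                       [50, 90, 95, 99])
    print(f"Latency (ms): P50={p50:.2f} P90={p90:.2f} "
          f"P95={p95:.2f} P99={p99:.2f}")
    print(f"Throughput: {iterations / elapsed:.1f} qps")
```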

Explore our performance dashboard
[Chart: Throughput and cost comparison. Model: DLRM RMC1. Instance: AWS c6g.4xlarge (Graviton2). Batch size: 1. Throughput: TensorFlow 17 qps, PyTorch 28 qps, Modular Engine 125 qps. Cost: TensorFlow $0.89, PyTorch $0.54, Modular Engine $0.12.]
Modular Engine speedup by model family and compute type:

| Model Family       | vs TensorFlow (Intel) | vs TensorFlow (AMD) | vs TensorFlow (ARM) | vs PyTorch (Intel) | vs PyTorch (AMD) | vs PyTorch (ARM) |
|--------------------|-----------------------|---------------------|---------------------|--------------------|------------------|------------------|
| Language Models    | 3x                    | 3.2x                | 5.3x                | 1.4x               | 2.1x             | 4x               |
| Recommender Models | 6.5x                  | 5x                  | 7.5x                | 1.1x               | 1.2x             | 4.3x             |
| Vision Models      | 2.1x                  | 2.2x                | 1.7x                | 1.5x               | 1.5x             | 1.3x             |

Compute types: Intel (c5.4xlarge), AMD (c5a.4xlarge), ARM (c6g.4xlarge).

Works with your existing AI libraries and tools

Modular is designed to drop into your existing workflows and use cases. Our tools are... well... modular. They integrate with industry-standard infrastructure and open-source tools to minimize migration cost.

Request access

01.

Easily integrate the engine into your own custom server image, or use Modular's off-the-shelf NVIDIA Triton and TensorFlow Serving builds.
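In the Triton path, each model in the repository carries a `config.pbtxt`. The minimal example below follows Triton's standard model configuration schema, but the `modular` backend name and the tensor names and shapes are illustrative assumptions.

```
# config.pbtxt -- minimal sketch. The "modular" backend name and the tensor
# names/shapes are illustrative assumptions, not a published configuration.
name: "dlrm_rmc1"
backend: "modular"
max_batch_size: 64
input [
  {
    name: "dense_features"
    data_type: TYPE_FP32
    dims: [ 13 ]
  }
]
output [
  {
    name: "prediction"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
```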

02.

Deploy the engine on-prem, in your own VPC on any major cloud provider, or get up and running more quickly with our hosted solutions.
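For a self-managed deployment, a Triton-based build could be launched as a standard container. In this sketch, `modular/inference-engine` is a placeholder image name, and ports 8000/8001/8002 are Triton's default HTTP, gRPC, and metrics ports.

```bash
# Hypothetical sketch: "modular/inference-engine" is a placeholder image name.
docker run --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/models:/models" \
  modular/inference-engine:latest \
  tritonserver --model-repository=/models
```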

03.

The Modular Inference Engine works with industry-standard open-source tooling like Prometheus and Grafana.
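For instance, pointing Prometheus at the server's metrics endpoint takes a single standard scrape job. The excerpt below assumes Triton's default metrics port (8002) and uses `inference-server` as a placeholder hostname.

```yaml
# prometheus.yml (excerpt) -- minimal sketch; "inference-server" is a
# placeholder hostname, and 8002 is Triton's default metrics port.
scrape_configs:
  - job_name: "modular-inference-engine"
    scrape_interval: 15s
    static_configs:
      - targets: ["inference-server:8002"]
```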

Ready to try a preview?

Contact us to get early access to the Modular Inference Engine.

Read the Modular Inference Engine docs