The world’s fastest inference engine. Accelerate your AI deployment.
The MAX Engine executes all of your TensorFlow and PyTorch models with no model rewriting or conversions. Bring your model as-is and deploy it anywhere, across server and edge, with unparalleled usability and performance.
Modular
Accelerated
Xecution
MAX Engine is everything you need to deploy low-latency, high-throughput inference pipelines into production. Consolidate the bespoke AI toolchains you are using and simplify your AI deployment by orders of magnitude.
Support all your generative and traditional AI use cases
MAX provides drop-in compatibility with any model from any framework. It supports all framework operations, quantized types, dynamic shapes, and your custom operations.
Train in any framework,
deploy anywhere
Train in TensorFlow or PyTorch as you do today, then deploy the same model anywhere — across server and edge — with no rewrites or conversions.
Graph APIs
Low-level control over the engine with minimal external dependencies and direct programmability over hardware: high-level abstractions when you want them, with the ability to drop down when you need it.
Deploy directly to cloud
MAX is free to download and run locally on your machine for development and experimentation, and can be deployed to production via our BYOC cloud SaaS offering.
Maximize performance, minimize costs
Reduce latency, increase throughput, and improve resource efficiency across CPUs, GPUs, and accelerators. Productionize larger models and significantly reduce your computing costs.
Works with your existing AI libraries and tools
Modular is designed to drop into your existing workflows and use cases. Our tools are... well... modular. They integrate with industry-standard infrastructure and open-source tools to minimize migration cost.
01.
Easily integrate the engine into your own custom server image, or use Modular's off-the-shelf NVIDIA Triton and TensorFlow Serving builds.
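For example, serving a model through an off-the-shelf Triton build looks like any other Triton deployment: you describe the model in a standard `config.pbtxt`. This is a minimal sketch — the backend name `max`, the model name, and the tensor shapes are assumptions for illustration, not Modular's documented configuration.

```
# config.pbtxt — hypothetical Triton model configuration for a MAX-backed model.
name: "resnet50"            # model name (assumption)
backend: "max"              # backend identifier (assumption)
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]   # CHW image tensor
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]          # class logits
  }
]
```

Because the engine slots in as a backend, existing Triton clients, batching, and model-repository workflows carry over unchanged.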
02.
Deploy the engine on-prem, in your own VPC on any major cloud provider, or get up and running faster with our hosted solutions.
03.
The MAX Engine integrates seamlessly with industry-standard open-source tooling like Prometheus and Grafana.
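Concretely, pointing Prometheus at a serving deployment is a standard scrape-config entry. This sketch assumes a Triton-style setup exposing metrics on port 8002 (Triton's default); the job name and target are placeholders, not a documented MAX endpoint.

```yaml
# prometheus.yml — scrape the inference server's /metrics endpoint.
scrape_configs:
  - job_name: "max-engine"            # label for this deployment (assumption)
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8002"]   # Triton's default metrics port
```

From there, the scraped latency and throughput series can be charted in Grafana like any other Prometheus data source.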
MAX Engine works with all the rest of the suite
Modular MAX Engine can be used in combination with MAX Serving, and is extensible with Mojo 🔥, the fast and portable programming language for your AI applications.
Our engine integrates with the rest of our suite of MAX products, while being usable on its own.