Modular Accelerated Xecution
MAX is composed of the MAX Engine, MAX Serving, and the Mojo programming language – everything you need to deploy low-latency, high-throughput inference pipelines into production.
Unified, simple infrastructure
MAX simplifies inference by unifying AI development frameworks and hardware backends.
Unparalleled performance
MAX delivers industry-leading latency and efficiency gains, helping you productionize larger models and lower costs.
Just works
MAX works out of the box without asking you to rewrite your stack or configure a bunch of knobs.
Plug into what you already use.
MAX Engine
Drop-in compatible with all your existing models, including all AI framework ops, quantized types, dynamic shapes, and your custom ops.
MAX Serving
Integrations with industry standard inference servers, including Triton, and seamless deployment to existing cloud systems, such as Kubernetes.
Mojo
Interoperable with your existing Python and C/C++ programs, including industry-standard data libraries like Pandas and NumPy.
Support your GenAI and traditional AI use cases
MAX is built from the ground up to power all your generative and traditional AI pipelines. Using the most performant infrastructure doesn’t mean further fragmenting your stack.
Support your whole pipeline with just one set of tools
MAX provides a composable set of technologies that optimize your end-to-end inference pipeline, from input processing, to model execution and optimization, to deploying to production.
Optimize your model input loading and transformations with Mojo 🔥
Models aren’t always the bottleneck — rewrite your data loading and input processing (e.g., tokenization) with high-performance Mojo code and get more out of your models.
Execute any model on any hardware with SOTA performance on the MAX Engine
Execute models from any AI framework (e.g., PyTorch) on any AI hardware (e.g., AMD CPU) with the MAX Engine to achieve unparalleled out-of-the-box latency and throughput wins.
Extend your models with custom operations using Mojo 🔥
Use Mojo to extend your model with custom operations that MAX Engine can natively analyze and fuse, producing a highly optimized model with incredible speed.
Streamline deployment to any cloud service with MAX Serving
Deploy MAX Engine to any cloud service, fully interoperable with existing inference systems such as Triton, with support for dynamic batching, load balancing, and more.
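For context, Triton enables dynamic batching per model through its `config.pbtxt`. A minimal sketch follows; the model name, backend, tensor shapes, and batching parameters here are illustrative assumptions, not MAX-specific values.

```protobuf
# Illustrative Triton model configuration (config.pbtxt).
# Model name, backend, and tensor shapes are placeholder assumptions.
name: "example_model"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Let Triton coalesce individual requests into server-side batches,
# waiting up to 100 microseconds to fill a preferred batch size.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

With a configuration like this, batching happens server-side: clients send single requests and Triton groups them before invoking the backend, trading a bounded queue delay for higher throughput.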
MAX can be downloaded free for local development and experimentation, and deployed through our Cloud SaaS for production usage.
| License | Non-commercial usage | Production usage |
|---|---|---|
| Pricing | Free | Consumption based |
| Availability | Download now | Early access |
| | Get Started | Contact Sales |