December 4, 2023

Modular to bring NVIDIA Accelerated Computing to the MAX Platform

The era of Generative AI is upon us. Companies around the world are exploring how it can transform their businesses, yet most are finding it challenging to economically and efficiently deploy these larger and more complex models into production.

Today, Modular is excited to announce that it is collaborating with NVIDIA to bring the power of NVIDIA GPUs, CPUs, and CUDA software to the Modular Accelerated Execution (MAX) Platform. This collaboration will enable developers and enterprises to build and ship AI into production like never before: MAX unifies and simplifies the AI software stack, delivering the performance, usability, and extensibility needed to make scalable AI a reality.

Modular’s MAX platform will provide state-of-the-art support for NVIDIA H100 Tensor Core, A100, and L40S GPUs, along with the NVIDIA GH200 Grace Hopper Superchip and the NVIDIA Grace CPU Superchip. This NVIDIA accelerated computing infrastructure and CUDA software will be deeply integrated with MAX, including the MAX Engine, MAX Serving, and Mojo, bringing true heterogeneous computing to AI. Developers will have one toolchain that scales to all their AI use cases, GenAI and traditional AI alike, unlocking novel CPU+GPU programming models for higher performance at lower cost.

MAX will enable AI developers to execute their existing TensorFlow, PyTorch, and ONNX models on NVIDIA GPUs with full compatibility and industry-leading out-of-the-box performance. The MAX platform also provides new state-of-the-art Graph APIs to build and accelerate custom, specialized models like those implemented in GGML and whisper.cpp. Lastly, for data transformations and the newest model innovations, the Mojo programming language makes the MAX Engine fully extensible. Read more about MAX in our blog post on key announcements at ModCon 2023.

Modular leverages the power of CUDA for programming NVIDIA GPUs, letting Mojo code run seamlessly on NVIDIA GPUs as well as on the Grace CPU. Mojo’s high-level abstractions make both CPU and GPU programming easy, while still giving developers the low-level control needed to write GPU-specific programs. And of course, Mojo works with the existing tooling that NVIDIA developers love, including Nsight, NVIDIA’s performance analysis, debugging, and visualization tooling.

“Developers everywhere are helping their companies adopt and implement generative AI applications that are customized with the knowledge and needs of their business,” said Dave Salvator, director of AI and Cloud at NVIDIA. “Adding full-stack NVIDIA accelerated computing support to the MAX platform brings the world’s leading AI infrastructure to Modular’s broad developer ecosystem, supercharging and scaling the work that is fundamental to companies’ business transformation.”

Go to modular.com to sign up for early access to MAX GPU and stay tuned for more updates at NVIDIA GTC in March 2024.

Eric Johnson, Product Lead
Shashank Prasanna, AI Developer Advocate

Eric Johnson

Product Lead

Product leader who has built and scaled AI applications and infrastructure. Eric led the TensorFlow API, Compiler, and Runtime teams at Google Brain and Core Systems, including the founding of TFRT and the productionization of JAX. He holds an MBA from Wharton and an MS in Computer Science from Penn, and loves soccer, fitness, and the great outdoors.

eric@modular.com

Shashank Prasanna

AI Developer Advocate

Shashank is an engineer, educator, and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI accelerators), and AI infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB), and Oracle in developer relations and marketing, product management, and software development roles, and holds an M.S. in electrical engineering.