NVIDIA Case Study

The Modular Platform provides state-of-the-art support for NVIDIA Blackwell, Hopper, Ampere, and Ada Lovelace GPUs, as well as NVIDIA Grace Superchips.

Partners

- 10+ GPU Architectures
- 500+ Models
- 70%+ Performance Wins vs vLLM

"Developers everywhere are helping their companies adopt and implement generative AI applications that are customized with the knowledge and needs of their business. Adding full-stack NVIDIA accelerated computing support to the MAX platform brings the world’s leading AI infrastructure to Modular’s broad developer ecosystem, supercharging and scaling the work that is fundamental to companies’ business transformation."

Dave Salvator

Director, AI and Cloud

Problem

The Opportunity

The AI development landscape is a fragmented collection of open source projects, and even advanced developers struggle with the complexity, particularly when they want to deliver full-stack performance optimizations. Existing solutions force developers to trade off model compatibility against production-grade performance on GPUs, and closing that gap often requires manual optimization, custom CUDA kernels, and deep GPU programming expertise. The industry lacks a unified approach to heterogeneous computing that can seamlessly leverage both CPUs and GPUs, which slows progress and innovation.

The Partnership

NVIDIA, as the world leader in accelerated compute, believes in helping solve these challenges and is supporting our efforts at Modular to improve the world for GPU developers everywhere. Together, we’re bringing comprehensive NVIDIA GPU support to the Modular Platform, creating a unified solution for heterogeneous AI computing:

- State-of-the-Art Hardware Support: The Modular Platform provides state-of-the-art support for NVIDIA B200, H200, H100, A100, and L40S GPUs, along with NVIDIA Grace Superchips. This broad hardware support ensures compatibility with NVIDIA's latest accelerated computing infrastructure.

- Unified Development Experience: Developers now have one toolchain that scales to all their AI use cases – GenAI and traditional AI alike – unlocking novel CPU+GPU programming models. This unified approach simplifies development and deployment across diverse workloads.

- Mojo GPU Programming: Mojo code works seamlessly on a wide range of NVIDIA GPUs and on NVIDIA Grace CPUs. Mojo's high-level abstractions simplify both CPU and GPU programming using a single language and standard library, while still providing low-level control when needed.

- Custom Model Acceleration: The Modular Platform also provides a Python-based graph API that allows you to build custom, state-of-the-art models that are immediately accelerated on NVIDIA GPUs.

The Outcome

The partnership between Modular and NVIDIA has delivered transformative benefits for AI developers and enterprises:

- Simplified GPU Adoption: Organizations can now deploy AI models on NVIDIA GPUs without specialized CUDA expertise. The Modular Platform handles the complexity of GPU programming, making advanced acceleration accessible to a broader developer audience.

- Unparalleled Performance: By combining Modular's optimization capabilities with NVIDIA's hardware acceleration, users achieve industry-leading performance without manual optimization work.

- True Write-Once, Deploy-Anywhere: Developers can write code once and deploy it seamlessly across CPUs and GPUs. This flexibility allows organizations to choose the most cost-effective hardware for each workload without code modifications.

- Novel Programming Models: The deep integration unlocks new heterogeneous computing capabilities, allowing developers to create innovative applications that intelligently distribute work between CPUs and GPUs for optimal performance.

- Reduced Time to Market: Organizations report significantly faster development cycles, as they no longer need to maintain separate codebases for different hardware targets or spend months optimizing for specific accelerators.

The collaboration represents a major step forward in democratizing GPU-accelerated AI, making it possible for organizations of all sizes to leverage NVIDIA's powerful hardware through Modular's unified, developer-friendly platform.

About

NVIDIA

NVIDIA is the pioneer and leader in accelerated computing, transforming the world's largest industries through AI and digital twins. The company invented the GPU in 1999, which sparked the growth of the PC gaming market, redefined computer graphics, and ignited the modern AI revolution.

You can read the partnership announcement here.

