All Articles


Company

Modular Raises $250M to Scale AI's Unified Compute Layer

Modular Raises $250M in Third Round to Unify AI Compute

September 24, 2025


Modular Team


Product

Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple

We’re excited to announce Modular Platform 25.6, a major milestone in our mission to build AI’s unified compute layer. With 25.6, we’re delivering the clearest proof yet of that mission: a unified compute layer that spans from laptops to the world’s most powerful datacenter GPUs.

September 22, 2025


Modular Team


Community

Modverse #51: Modular x Inworld x Oracle, Modular Meetup Recap, and Community Projects

The Modular community has been buzzing this month, from our Los Altos Meetup talks and fresh engineering docs to big wins with Inworld and Oracle. Catch the highlights, new tutorials, and open-source contributions in this edition of Modverse.

September 19, 2025


Caroline Frasca


Engineering

Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA

In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. By the end of the post, we’ll have improved performance by another 15%, reaching 1772 TFLOPs and exceeding the current SOTA.

September 19, 2025


Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak


Engineering

Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance

In this post, we continue the journey and discuss how to leverage the 2SM technique, along with pipelining, to increase our performance by about 5x and get within 85% of state-of-the-art (SOTA) performance.

September 12, 2025


Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak


Engineering

Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul

In this post, we continue our journey and improve performance to more than 50x that of our initial kernel benchmark. Along the way, we explain more GPU programming concepts and leverage novel Blackwell features.

September 5, 2025


Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak


Engineering

Matrix Multiplication on Blackwell: Part 1 - Introduction

This series of blog posts showcases how one can: 1. Write a high-performance GPU kernel on Blackwell with performance competitive with NVIDIA's cuBLAS implementation. 2. Leverage Mojo's special features to keep the kernel as simple as possible.

August 28, 2025


Ali Taha

Jiexiang Liu

Hengjie Wang


Community

Modverse #50: Modular Platform 25.5, Community Meetups, and Mojo's Debut in the Stack Overflow Developer Survey

This past month brought a wave of community projects and milestones across the Modular ecosystem! Modular Platform 25.5 landed with Large Scale Batch Inference, leaner packages, and new integrations that make scaling AI easier than ever. It’s already powering production deployments like SF Compute’s Large Scale Inference Batch API, cutting costs by up to 80% while supporting more than 15 leading models.

August 21, 2025


Caroline Frasca


Product

Modular Platform 25.5: Introducing Large Scale Batch Inference

Modular Platform 25.5 is here, introducing Large Scale Batch Inference: a highly asynchronous, at-scale batch API built on open standards and powered by Mammoth. We're launching this new capability through our partner SF Compute, enabling high-volume AI performance with a fast, accurate, and efficient platform that seamlessly scales workloads across any hardware.

August 5, 2025


Modular Team


Company

SF Compute and Modular Partner to Revolutionize AI Inference Economics

Modular has partnered with SF Compute to address a fundamental asymmetry in the AI ecosystem: while model capabilities advance exponentially, the economic structures governing compute costs remain anchored in legacy paradigms.

July 31, 2025


Modular Team

SF Compute Team
