Blog

Democratizing AI Compute Series
Go behind the scenes of the AI industry with Chris Lattner

Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA
In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. By the end of the post we’ll improve our performance by another 15% and achieve 1772 TFLOPS, exceeding the current SOTA.

Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul
In this post we are going to continue our journey and improve our performance by more than 50x over our initial kernel benchmark. Along the way we are going to explain more GPU programming concepts and leverage novel Blackwell features.

Matrix Multiplication on Blackwell: Part 1 - Introduction
This series of blog posts will showcase how one can: 1. Write a high-performance GPU kernel on Blackwell that offers performance competitive with NVIDIA's cuBLAS implementation. 2. Leverage Mojo's special features to make the kernel as simple as possible.

How is Modular Democratizing AI Compute? (Democratizing AI Compute, Part 11)
Given time, budget, and expertise from a team of veterans who’ve built this stack before, Modular set out to solve one of the defining challenges of our era: how to Democratize AI Compute. But what does that really mean—and how does it all add up?

Democratizing Compute
Go behind the scenes of the AI industry in this blog series by Chris Lattner. Trace the evolution of AI compute, dissect its current challenges, and discover how Modular is raising the bar with the world’s most open inference stack.

Matrix Multiplication on Blackwell
Learn how to write a high-performance GPU kernel on Blackwell that offers performance competitive to that of NVIDIA's cuBLAS implementation while leveraging Mojo's special features to make the kernel as simple as possible.

Structured Mojo Kernels
Learn how Mojo simplifies GPU programming with modular kernel architecture, compile-time abstractions, and zero-cost performance across modern GPU hardware.

Software Pipelining for GPU Kernels
Explore software pipelining for GPU kernels from first principles. We formalize dependencies as a graph, solve for the optimal schedule with a constraint solver, and show how it all integrates into MAX via pure Mojo.
