Blog


Latest


News

Company

Modverse #54: From GTC to Edinburgh, a Community Building Momentum

This edition covers one of the busiest stretches in Modular's recent history: four days at GTC, a new office on another continent, fresh community builds, and a release that expands what MAX and Mojo🔥 can do. Here's everything that's been happening across the ecosystem.

March 31, 2026

Inaara Walji


News

Engineering

Software Pipelining for GPU Kernels: Part 1 - The Pipeline Problem

Flash Attention is a simple algorithm: tiled back-to-back matmuls with an online softmax algorithm in between. The algorithm fits in a few dozen lines of pseudocode. Yet Flash Attention 4's production kernel is 2,875 lines, and the hardest part to get right isn't the math. It's the async execution and pipelining synchronization, all hand-derived from a schedule that no standard debugging tool can verify.

March 30, 2026

Yingbo Ma


News

Engineering

Structured Mojo Kernels Part 3 - Composition in Practice

This post shows the practical benefit of this modular design. We take two real kernel families, conv2d and block-scaled matmul, and trace exactly how they are built around the matmul foundation. In both cases, a new kernel family requires changing one component while leaving the rest untouched. The conv2d kernel adds roughly 130 lines of new code, while block-scaled matmul adds roughly 200, with no performance degradation.

March 26, 2026

Fabio Riccardi

Modular Kernel Team


News

Product

Modular 26.2: State-of-the-Art Image Generation and Upgraded AI Coding with Mojo

Today’s 26.2 release expands the Modular Platform’s modality support to include image generation and image editing workflows, extending our existing support for text and audio generation. Version 26.2 supports Black Forest Labs' FLUX.2 model variants with more than a 4x speedup over the state of the art.

March 19, 2026

Modular Team


News

Community

Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200

Each spring, San Jose fills up with people who have strong opinions about GPUs, and we're happily among them. Find us this week at NVIDIA GTC, Booth #3004, where we’ll be running demos all week on Blackwell.

March 16, 2026

Modular Team


News

Engineering

Structured Mojo Kernels Part 2 - The Three Pillars

This post explains the components of Structured Mojo Kernels: TileIO, TilePipeline, and TileOp. Each component forms a node in a kernel execution pipeline, and the links between them create a logical separation of concerns that makes kernels easier to extend and update. That organization matters because GPU kernels don't stay static. By abstracting hardware-optimized implementations into patterns, the same kernel structure can adapt across NVIDIA and AMD hardware generations with minimal rewriting.

March 11, 2026

Fabio Riccardi

Modular Kernel Team


News

Community

Modverse #53: Community Builds, Research Milestones, and a Growing Ecosystem

This edition captures everything happening across the Modular ecosystem, from developers building with MAX and Mojo🔥 to the broader impact Modular is having across AI infrastructure. Here's a look at what's been happening lately.

March 6, 2026

Inaara Walji


News

Engineering

Structured Mojo Kernels Part 1 - Peak Performance, Half the Code

GPU programming has always demanded precision, but the cost of that precision keeps rising. A production matmul kernel written in C++ spans 3,000–5,000 lines of tightly coupled code where a misplaced barrier silently corrupts results. That complexity gatekeeps hardware that should be available to far more developers, and it's a direct product of how GPUs have evolved: with each architecture generation, more of the orchestration burden has shifted onto the programmer.

March 5, 2026

Fabio Riccardi

Modular Kernel Team


News

Engineering

The Claude C Compiler: What It Reveals About the Future of Software

Compilers occupy a special place in computer science. They're the subject of a canonical course in computer science education, and building one is a rite of passage. It forces you to confront how software actually works, by examining languages, abstractions, hardware, and the boundary between human intent and machine execution.

February 18, 2026

Chris Lattner


News

Company

BentoML Joins Modular

Today, BentoML is joining Modular.

February 10, 2026

Chris Lattner

Chaoyu Yang

Tim Davis
