Blog


Democratizing AI Compute Series

Go behind the scenes of the AI industry with Chris Lattner

Latest

News

Engineering

The Five Eras of KVCache

vLLM, SGLang, TensorRT-LLM, and MAX Serve are all built on top of increasingly sophisticated KVCache management. This blog explores the evolution and role of the KVCache in these inference engines.

February 5, 2026 / Brian Zhang

News

Product

Modular 26.1: A Big Step Towards More Programmable and Portable AI Infrastructure

Today we’re releasing Modular 26.1, a major step toward making high-performance AI computing easier to build, debug, and deploy across heterogeneous hardware. This release is focused squarely on developer velocity and programmability—helping advanced AI teams reduce time to market for their most important innovations.

January 29, 2026 / Modular Team

News

Community

How to Beat Unsloth's CUDA Kernel Using Mojo—With Zero GPU Experience

Traditional GPU programming has a steep learning curve. The performance gains are massive, but the path to get there (CUDA, PTX, memory hierarchies, occupancy tuning) stops most developers before they start. Mojo aims to flatten that curve: Python-like syntax, systems-level performance, no interop gymnastics, and the same performance gains.

January 14, 2026 / David Robertson

News

Community

🔥 Modular 2025 Year in Review

Our four-part series documenting the path to record-breaking matrix multiplication performance became essential reading for anyone serious about LLM optimization. The series walks through every optimization step—from baseline implementations to advanced techniques like warp specialization and async copies—showing you exactly how to extract maximum performance from cutting-edge hardware.

December 19, 2025 / Michael Dunn-OConnor

News

Product

The path to Mojo 1.0

While we are excited about this milestone, this of course won’t be the end of Mojo development! Some commonly requested capabilities for more general systems programming won’t be completed for 1.0, such as a robust async programming model and support for private members. Read below for more information on that!

December 5, 2025 / Modular Team

News

Community

Modverse #52: Advancing AI Together — Community Projects & Platform Milestones

The Modular universe is buzzing! From next-level community projects to recognition across the AI and developer space, here’s the latest from our growing ecosystem.

December 3, 2025 / Inaara Walji

News

Product

Modular 25.7: Faster Inference, Safer GPU Programming, and a More Unified Developer Experience

Today, we’re excited to release Modular Platform 25.7, an update that deepens our vision of a unified, high-performance compute layer for AI. With a fully open MAX Python API, an experimental next-generation modeling API, expanded hardware support for NVIDIA Grace superchips, and a safer, more capable Mojo GPU programming experience, this release moves us closer to an ecosystem where developers spend less time fighting infrastructure and more time advancing what AI can do.

November 20, 2025 / Modular Team

News

Company

"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis

Today the "Inworld TTS 1 Max" model (powered by the Modular Platform) holds the #1 position on the Artificial Analysis speech leaderboard!

November 7, 2025 / Modular Team

News

Community

PyTorch and LLVM in 2025 — Keeping up With AI Innovation

Along with several teammates, I had the privilege of attending two recent developer events in the AI software stack: PyTorch Conference 2025 (October 22-23) in San Francisco and LLVM Developers' Meeting (October 28-29) in Santa Clara. In this post, I’ll share some observations that stood out among all the conference sessions and conversations I had with developers.

November 6, 2025 / Michael Dunn-OConnor

News

Engineering

Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days

In late August, AMD and TensorWave reached out to collaborate on a presentation for AMD’s Media Tech Day—they asked if we could demo MAX on AMD Instinct™ MI355 on September 16th. There was just one problem: no one at Modular had access to an MI355.

October 17, 2025 / Tracy Sharpe, Anand Pratap Singh, Prince Jain, Abdul Dakkak


Build the future of AI with Modular

View Editions
  • Get started guide

    Install MAX with a few commands and deploy a GenAI model locally.

    Read Guide
  • Browse open models

    500+ models, many optimized for lightning-fast performance

    Browse models