Blog

Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations

May 18, 2026

Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more

May 7, 2026

Software Pipelining for GPU Kernels: Part 1 - The Pipeline Problem

March 30, 2026

Latest

Engineering

Why LLM Inference Needs a New Kind of Router - Part 2

In Part 1, we argued that LLM routing is qualitatively different from HTTP routing. Inference backends hold state that traditional load balancers ignore. This post covers the first of the three layers we identified: the data layer that makes that state queryable on the hot path of every inference request.

May 21, 2026

Aayush Deshpande

Deep Dhillon

Alexandr Nikitin

Michael Dunn-OConnor

Community

How I built a pure Mojo app (and 10 libraries) with AI agents

To build it, I needed libraries that did not exist yet or did not support the exact required features. So I built them:

May 19, 2026

Ehsan M. Kermani

Company

Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations

Every millisecond matters in real-time voice, and at Hippocratic AI's scale, latency gains compound directly into better patient experience and per-node efficiency. Production deployments run across multiple frameworks, including SGLang and vLLM, with ongoing evaluation of emerging frameworks for additional latency headroom, alongside a hardware roadmap spanning NVIDIA, AMD, and future-generation accelerators.

May 18, 2026

Modular Team

Product

Translating to Mojo via AI Agents

At Modular, we’re always experimenting with the latest agentic programming tools, integrating the best ones into our workflows, and learning quite a few lessons along the way. One thing we realized is that the Mojo language is ideally suited to the needs of modern AI coding agents.

May 13, 2026

Brad Larson

Modular Team

Product

Inkwell: Why Your Inference Platform Matters As Much As Your Model

Inkwell is a web app that lets users create interactive storybooks with a custom character along infinite branching paths. When a user opens a story, the first page of text and image art streams in: text appears character by character via WebSocket within the first second, the illustration paints in as they read, and by the time they tap a choice, the next page is already written and illustrated. Creating a user experience around the seamless generation of new content requires an inference layer that can perform at scale.

May 12, 2026

Tim Davis

Engineering

Why LLM Inference Needs a New Kind of Router - Part 1

HTTP routing has been a solved problem for many years. Round-robin, consistent hashing, least-connections. Pick one, put it in front of a pool of identical servers, and the traffic spreads pretty evenly.

May 8, 2026

Aayush Deshpande

Deep Dhillon

Alexandr Nikitin

Michael Dunn-OConnor

Product

Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more

Surprise: Mojo 1.0 is officially in beta! Modular’s 26.3 release includes new features and modalities, but the headline is that we’ve officially hit beta for Mojo 1.0, with a clear plan to finalize Mojo 1.0 in the coming months. We share details below, alongside other key announcements in our 26.3 release including video generation in MAX with Wan 2.2 and MAX framework updates.

May 7, 2026

Modular Team

Community

Modverse #54: AMD AI DevDay, New Modular Offices, and a Community That Keeps Shipping

There was a lot to celebrate in April: the community shipped GPU renderers, FFmpeg bindings, raylib wrappers, BLAS routines, and a 2D graphics API, just to name a few. The team connected with tons of developers at AMD AI DevDay and our joint meetup with AMD, two new Modular offices opened on two different continents, and Gemma 4 launched with same-day support on NVIDIA and AMD. Here’s the April roundup.

May 4, 2026

Caroline Frasca

Case Study

How Frontier Coding Agents Built a Video Diffusion Pipeline on MAX

In a clear demonstration of how rapidly AI coding agents are becoming capable of challenging systems engineering work, two of the five agents produced a working MAX pipeline. The models we tested were:

April 16, 2026

Rajan Agarwal

Evan Chu

Tim Davis

Eric Johnson

Engineering

TileTensor Part 1 - Safer, More Efficient GPU Kernels

Suppose you want to load a 2D tile of a matrix, where the tile is stored in shared memory in a specific interleaved layout to avoid bank conflicts. This example uses a toy XOR swizzle to illustrate the class of bugs; real kernels use hardware- and layout-specific swizzles and vectorized accesses. Without a layout abstraction, here is how you would launch a kernel with a block size of (32,8):

April 13, 2026

Lukas Hermann
