Blog

Democratizing AI Compute Series

Go behind the scenes of the AI industry with Chris Lattner

Latest

Engineering

Why LLM Inference Needs a New Kind of Router - Part 2

In Part 1, we argued that LLM routing is qualitatively different from HTTP routing. Inference backends hold state that traditional load balancers ignore. This post covers the first of the three layers we identified: the data layer that makes that state queryable on the hot path of every inference request.

May 21, 2026 / Aayush Deshpande, Deep Dhillon, Alexandr Nikitin, Michael Dunn-OConnor

Community

How I built a pure Mojo app (and 10 libraries) with AI agents

To build it, I needed libraries that did not exist yet or did not support the exact required features. So I built them:

May 19, 2026 / Ehsan M. Kermani

Company

Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations

Every millisecond matters in real-time voice, and at Hippocratic AI's scale latency gains compound directly into better patient experience and per-node efficiency. Production deployments run across multiple frameworks, including SGLang and vLLM, with ongoing evaluation of emerging frameworks for additional latency headroom, alongside a hardware roadmap spanning NVIDIA, AMD, and future-generation accelerators.

May 18, 2026 / Modular Team

Product

Translating to Mojo via AI Agents

At Modular, we’re always experimenting with the latest agentic programming tools, integrating the best ones into our workflows, and learning quite a few lessons along the way. One thing we realized is that the Mojo language is ideally suited to the needs of modern AI coding agents.

May 13, 2026 / Brad Larson, Modular Team

Product

Inkwell: Why Your Inference Platform Matters As Much As Your Model

Inkwell is a web app that lets users create interactive storybooks with a custom character along infinite branching paths. When you open a story, the first page of text and image art streams in: text appears character by character via WebSocket within the first second, the illustration paints in as you read, and by the time you tap a choice, the next page is already written and illustrated. Creating a user experience around the seamless generation of new content requires an inference layer that can perform at scale.

May 12, 2026 / Tim Davis

Engineering

Why LLM Inference Needs a New Kind of Router - Part 1

HTTP routing has been a solved problem for many years. Round-robin, consistent hashing, least-connections. Pick one, put it in front of a pool of identical servers, and the traffic spreads pretty evenly.

May 8, 2026 / Aayush Deshpande, Deep Dhillon, Alexandr Nikitin, Michael Dunn-OConnor

Product

Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more

Surprise: Mojo 1.0 is officially in beta! Modular’s 26.3 release includes new features and modalities, but the headline is that we’ve officially hit beta for Mojo 1.0, with a clear plan to finalize Mojo 1.0 in the coming months. We share details below, alongside other key announcements in our 26.3 release including video generation in MAX with Wan 2.2 and MAX framework updates.

May 7, 2026 / Modular Team

Community

Modverse #54: AMD AI DevDay, New Modular Offices, and a Community That Keeps Shipping

There was a lot to celebrate in April: the community shipped GPU renderers, FFmpeg bindings, raylib wrappers, BLAS routines, and a 2D graphics API, just to name a few. The team connected with tons of developers at AMD AI DevDay and our joint meetup with AMD, two new Modular offices opened on two different continents, and Gemma 4 launched with same-day support on NVIDIA and AMD. Here’s the April roundup.

May 4, 2026 / Caroline Frasca

Case Study

How Frontier Coding Agents Built a Video Diffusion Pipeline on MAX

In a clear demonstration of how rapidly AI coding agents are becoming capable of challenging systems engineering work, two of the five agents produced a working MAX pipeline. The models we tested were:

April 16, 2026 / Rajan Agarwal, Evan Chu, Tim Davis, Eric Johnson

Engineering

TileTensor Part 1 - Safer, More Efficient GPU Kernels

Suppose you want to load a 2D tile of a matrix, where the tile is stored in shared memory in a specific interleaved layout to avoid bank conflicts. This example uses a toy XOR swizzle to illustrate the class of bugs; real kernels use hardware- and layout-specific swizzles and vectorized accesses. Without a layout abstraction, here is how you would launch a kernel with a block size of (32,8):

April 13, 2026 / Lukas Hermann
