Blog

Latest

Company

Qualcomm to Acquire Modular

NEW YORK – June 24, 2026 – Qualcomm Incorporated (NASDAQ: QCOM), a connected computing leader at the center of the AI era, today announced that it has reached an agreement to acquire Modular Inc, strengthening Qualcomm Technologies, Inc.’s software foundation for generative and agentic AI across data center and edge environments.

June 24, 2026

Modular Team

Product

Modular 26.4: SOTA MoE Serving, Model Bringup via Agent Skills, Mojo 1.0 Beta 2 and More

Modular 26.4 brings state-of-the-art mixture-of-experts (MoE) serving to Modular Cloud, expands MAX support for the newest open-weight models, and takes another step toward Mojo 1.0.

June 18, 2026

Modular Team

Company

ModCon 2026: Modular’s Developer Conference

For years, the AI stack has been locked to specific hardware, and switching meant rewriting everything you'd already shipped. At ModCon 2026, we’ll enable hardware flexibility and showcase what changes when the same model, code, and container run across NVIDIA, AMD, and new hardware to be announced, with the performance and cost numbers to back it up.

June 17, 2026

Modular Team

Company

Day Zero: MiniMax M3 Open Weights on Modular Cloud

To avoid the repeated loads, MSA inverts the mapping by grouping the queries by the KV block they selected, that is, executing in key-block-major order, which MiniMax calls “KV outer gather Q”. This improves arithmetic intensity: each block is loaded once, partial attention is computed for all of the queries that selected it, and the partial results are then merged.

June 11, 2026

Modular Team
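
The key-block-major idea in the excerpt above can be illustrated with a small NumPy sketch. This is not MiniMax's or Modular's kernel: the function names, the `selected` list of per-query block IDs, and the omission of the usual softmax scaling factor are all assumptions made for illustration. It only shows that visiting each KV block once and merging partial results with a running softmax gives the same output as the query-major loop.

```python
import numpy as np

def query_major(q, k, v, selected, block):
    """Reference: every query loads each KV block it selected."""
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, blocks in enumerate(selected):
        idx = np.concatenate(
            [np.arange(b * block, (b + 1) * block) for b in blocks]
        )
        s = q[i] @ k[idx].T
        p = np.exp(s - s.max())
        out[i] = (p / p.sum()) @ v[idx]
    return out

def key_block_major(q, k, v, selected, block):
    """Visit each KV block once and gather the queries that selected it."""
    n = q.shape[0]
    m = np.full(n, -np.inf)           # running max score per query
    l = np.zeros(n)                   # running softmax denominator
    acc = np.zeros((n, v.shape[1]))   # running unnormalized output

    # Invert the mapping: block id -> queries that selected it.
    by_block = {}
    for i, blocks in enumerate(selected):
        for b in blocks:
            by_block.setdefault(b, []).append(i)

    for b, qs in by_block.items():
        qs = np.array(qs)
        kb = k[b * block:(b + 1) * block]   # block loaded once
        vb = v[b * block:(b + 1) * block]
        s = q[qs] @ kb.T
        m_new = np.maximum(m[qs], s.max(axis=1))
        scale = np.exp(m[qs] - m_new)       # rescale earlier partials
        p = np.exp(s - m_new[:, None])
        l[qs] = l[qs] * scale + p.sum(axis=1)
        acc[qs] = acc[qs] * scale[:, None] + p @ vb
        m[qs] = m_new

    return acc / l[:, None]
```

The sketch assumes every query selects at least one block and never lists the same block twice.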

Community

Modverse #55: Mojo 1.0 Beta, Community Mojo Libraries, and Real-Time Patient Conversations Powered by MAX

This edition captures everything happening across the Modular ecosystem, from developers building with MAX and Mojo🔥 to the broader impact Modular is having across AI infrastructure. Here's a look at what's been happening lately.

June 10, 2026

Caroline Frasca

Engineering

Why LLM Inference Needs a New Kind of Router - Part 3

Most routing stacks ship with a fixed set of algorithms: round-robin, least-requests, consistent hashing, etc. These are generally independent implementations rather than composable components. As a result, when a customer asks for "consistent hashing with a concurrency cap" or "cache-aware with session stickiness," it requires adding a new algorithm from scratch. Disaggregated prefill/decode increases this proliferation. Every variant traditionally has its own HTTP handler, discovery logic, proxy code, and session management. That requires hundreds of lines of additional plumbing per variant.

June 5, 2026

Aayush Deshpande

Deep Dhillon

Alexandr Nikitin

Michael Dunn-OConnor
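
As a rough illustration of what "composable components" could mean for the excerpt above, here is a minimal Python sketch in which routing policies are small stages chained together. All names (`Pod`, `hash_order`, `concurrency_cap`, `compose`) are hypothetical and not taken from the article, and rendezvous hashing stands in for consistent hashing to keep the example short.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    in_flight: int = 0

def hash_order(key_fn):
    """Stage: order pods by a rendezvous hash of the request key."""
    def stage(request, pods):
        key = key_fn(request)
        def score(pod):
            return hashlib.sha256(f"{key}|{pod.name}".encode()).hexdigest()
        return sorted(pods, key=score)
    return stage

def concurrency_cap(limit):
    """Stage: drop pods that already have `limit` requests in flight."""
    def stage(request, pods):
        return [p for p in pods if p.in_flight < limit]
    return stage

def compose(*stages):
    """Chain stages; the first surviving pod wins, or None if none survive."""
    def route(request, pods):
        for stage in stages:
            pods = stage(request, pods)
        return pods[0] if pods else None
    return route
```

With this shape, "consistent hashing with a concurrency cap" becomes `compose(hash_order(lambda r: r["session"]), concurrency_cap(8))` rather than a new algorithm written from scratch.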

Engineering

Three trends from MLSys 2026

The shared conclusion of these talks was that agentic engineering requires substantially greater rigor in specification, design, and validation.

May 29, 2026

Michael Dunn-OConnor

Brian Zhang

Shouzheng Liu

Engineering

Why LLM Inference Needs a New Kind of Router - Part 2

To route a request to the pod with the best cached prefix, you need to know which blocks are cached on which pod. That sounds simple until you look at the numbers. You may have hundreds of pods, each with thousands of cached blocks. State can change hundreds of times per second. Despite this scale and churn, queries need to return in microseconds because they sit on the critical path of every inference request.

May 21, 2026

Aayush Deshpande

Deep Dhillon

Alexandr Nikitin

Michael Dunn-OConnor
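
To make the lookup problem in the excerpt above concrete, here is a minimal Python sketch of a block-to-pod index that returns the pod holding the longest cached prefix. The class and function names are hypothetical, blocks are identified by a chained hash of their tokens as an assumption, and the sketch makes no attempt at the microsecond latencies or update rates the article describes.

```python
from collections import defaultdict

def block_hashes(tokens, block_size):
    """Chained hashes so each block's ID depends on the whole prefix."""
    hashes, prev = [], 0
    full = len(tokens) - len(tokens) % block_size
    for i in range(0, full, block_size):
        prev = hash((prev, tuple(tokens[i:i + block_size])))
        hashes.append(prev)
    return hashes

class PrefixIndex:
    def __init__(self):
        self._pods_by_block = defaultdict(set)

    def add(self, pod, block_hash):
        self._pods_by_block[block_hash].add(pod)

    def evict(self, pod, block_hash):
        self._pods_by_block[block_hash].discard(pod)

    def best_pod(self, hashes):
        """Return (pod, matched_blocks) for the longest cached prefix."""
        candidates, best = None, (None, 0)
        for depth, h in enumerate(hashes, start=1):
            holders = self._pods_by_block.get(h, set())
            candidates = set(holders) if candidates is None else candidates & holders
            if not candidates:
                break
            # min() gives a deterministic tie-break between equal matches.
            best = (min(candidates), depth)
        return best
```

A caller would hash the incoming prompt with `block_hashes`, ask `best_pod` for the deepest match, and fall back to another policy when the result is `(None, 0)`.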

Community

How I built a pure Mojo app (and 10 libraries) with AI agents

To build it, I needed libraries that did not exist yet or did not support the exact required features. So I built them:

May 19, 2026

Ehsan M. Kermani

Company

Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations

Every millisecond matters in real-time voice, and at Hippocratic AI's scale, latency gains compound directly into better patient experience and per-node efficiency. Production deployments run across multiple frameworks, including SGLang and vLLM, with ongoing evaluation of emerging frameworks for additional latency headroom, alongside a hardware roadmap spanning NVIDIA, AMD, and future-generation accelerators.

May 18, 2026

Modular Team
