Blog

Democratizing AI Compute Series
Go behind the scenes of the AI industry with Chris Lattner
Modverse #50: Modular Platform 25.5, Community Meetups, and Mojo's Debut in the Stack Overflow Developer Survey
This past month brought a wave of community projects and milestones across the Modular ecosystem! Modular Platform 25.5 landed with Large Scale Batch Inference, leaner packages, and new integrations that make scaling AI easier than ever. It’s already powering production deployments like SF Compute’s Large Scale Inference Batch API, cutting costs by up to 80% while supporting more than 15 leading models.
Modverse #49: Modular Platform 25.4, Modular 🤝 AMD, and Modular Hack Weekend
Between a global hackathon, a major release, and standout community projects, last month was full of progress across the Modular ecosystem! Modular Platform 25.4 launched on June 18th, alongside the announcement of our official partnership with AMD, bringing full support for AMD Instinct™ MI300X and MI325X GPUs. You can now deploy the same container across both AMD and NVIDIA hardware with no code changes, no vendor lock-in, and no additional configuration!

Modverse #48: Modular Platform 25.3, MAX AI Kernels, and the Modular GPU Kernel Hackathon
May has been a whirlwind of major open source releases, packed in-person events, and deep technical content! We kicked it off with the release of Modular Platform 25.3 on May 6th, a major milestone in open source AI. This drop included more than 450k lines of Mojo and MAX code, featuring the full Mojo standard library, the MAX AI Kernels, and the MAX serving library. It’s all open source, and you can install it in seconds with pip install modular, whether you’re working locally or in Colab with A100 or L4 GPUs.

Modverse #47: MAX 25.2 and an evening of GPU programming at Modular HQ
MAX 25.2 is turning heads — and for good reason. This powerful update delivers industry-leading performance for large language models on NVIDIA GPUs, all without CUDA. MAX 25.2 builds on the momentum of 25.1 and introduces major upgrades to help you build GenAI systems that are faster, leaner, and easier to scale.

MAX is here! What does that mean for Mojo🔥?
When we started Modular, building a programming language wasn't our goal; it ended up being a solution to a set of problems. Specifically, as we were building our platform to unify the world’s ML/AI infrastructure, we realized that programming across the entire stack was too complicated.

Mojo 🔥 Advent of Code 2023
Advent of Code is an annual online coding event that takes place during the holiday season, starting on December 1st and continuing until December 25th. It consists of a series of small programming puzzles that are released daily, each becoming available at midnight EST (UTC-5). Participants from around the world compete for fun, honing their coding abilities and often learning new programming concepts in the process.

Community Spotlight: How I built llama2.🔥 by Aydyn Tairov
The Mojo SDK was released in September 2023. As someone who relies on the simplicity of Python and also cares about the high performance delivered by languages like C, I was excited to try out Mojo. I felt the same joy and thrill I had experienced when I first discovered programming and ran “Hello World” in QBasic and Turbo Pascal.

Democratizing Compute
Go behind the scenes of the AI industry in this blog series by Chris Lattner. Trace the evolution of AI compute, dissect its current challenges, and discover how Modular is raising the bar with the world’s most open inference stack.

Matrix Multiplication on Blackwell
Learn how to write a high-performance GPU kernel on Blackwell that is competitive with NVIDIA's cuBLAS implementation, while leveraging Mojo's special features to keep the kernel as simple as possible.

Structured Mojo Kernels
Learn how Mojo simplifies GPU programming with modular kernel architecture, compile-time abstractions, and zero-cost performance across modern GPU hardware.

Software Pipelining for GPU Kernels
Explore software pipelining for GPU kernels from first principles. We formalize dependencies as a graph, solve for the optimal schedule with a constraint solver, and show how it all integrates into MAX via pure Mojo.

Why LLM Inference Needs a New Kind of Router
This series walks through why traditional HTTP routing breaks down under LLM workloads and how Modular Cloud solves it with a three-layer architecture built for cache-aware routing.

TileTensor
This series walks through how Modular built TileTensor, a Mojo tensor type that lets kernel authors express complex memory layouts precisely, safely, and efficiently.


