Blog

Democratizing AI Compute Series
Go behind the scenes of the AI industry with Chris Lattner
.jpeg)
What’s the difference between the AI Engine and Mojo?
On May 2nd, we announced our next-generation AI developer platform with two exciting breakthrough technologies — the Mojo programming language and the Modular AI Engine. In just over two months, more than 110k developers have signed up for the Mojo Playground to learn Mojo and experience its performance firsthand, over 30k developers have signed up to our waitlist for the AI engine, and our Modular community on Discord has grown to 17k developers! We’re incredibly excited to see developers sharing their experience with Mojo, providing product feedback, and learning from each other.
.jpeg)
Accelerating AI model serving with the Modular AI Engine
A few weeks ago, we announced the world’s fastest unified AI inference engine. The Modular AI Engine provides significant usability, portability, and performance gains for the leading AI frameworks — PyTorch and TensorFlow — and delivers world-leading execution performance for all cloud-available CPU architectures.

Democratizing Compute
Go behind the scenes of the AI industry in this blog series by Chris Lattner. Trace the evolution of AI compute, dissect its current challenges, and discover how Modular is raising the bar with the world’s most open inference stack.

Matrix Multiplication on Blackwell
Learn how to write a high-performance GPU kernel on Blackwell that offers performance competitive to that of NVIDIA's cuBLAS implementation while leveraging Mojo's special features to make the kernel as simple as possible.

Structured Mojo Kernels
Learn how Mojo simplifies GPU programming with modular kernel architecture, compile-time abstractions, and zero-cost performance across modern GPU hardware.

Software Pipelining for GPU Kernels
Explore software pipelining for GPU kernels from first principles. We formalize dependencies as a graph, solve for the optimal schedule with a constraint solver, and show how it all integrates into MAX via pure Mojo.

Why LLM Inference Needs a New Kind of Router
This series walks through why traditional HTTP routing breaks down under LLM workloads and how Modular Cloud solves it with a three-layer architecture built for cache-aware routing.

TileTensor
This series walks through how Modular built TileTensor, a Mojo tensor type that lets kernel authors express complex memory layouts precisely, safely, and efficiently.
No items found within this category
We couldn’t find anything. Try changing or resetting your filters.

Sign up today
Signup to our Cloud Platform today to get started easily.
Sign Up
Browse open models
Browse our model catalog, or deploy your own custom model
Browse models
