Blog

News

Series

What about OpenCL and CUDA C++ alternatives? (Democratizing AI Compute, Part 5)

March 5, 2025

Chris Lattner

News

Series

Modverse #46: MAX 25.1, MAX Builds, and Democratizing AI Compute

We recently introduced MAX 25.1, a major leap forward in AI development. This release enhances agentic and LLM workflows, introduces MAX Builds as a central hub for GenAI models and application recipes, and debuts a new GPU programming interface. Developers can now take advantage of GPU-accelerated embeddings, OpenAI-compatible function calling, structured output generation, and high-performance LLM optimizations like paged attention and prefix caching for improved efficiency.

February 27, 2025

Caroline Frasca

News

Series

CUDA is the incumbent, but is it any good? (Democratizing AI Compute, Part 4)

Answering the question of whether CUDA is “good” is much trickier than it sounds.

February 20, 2025

Chris Lattner

News

Series

How did CUDA succeed? (Democratizing AI Compute, Part 3)

If we as an ecosystem hope to make progress, we need to understand how the CUDA software empire became so dominant.

February 12, 2025

Chris Lattner

News

Series

What exactly is “CUDA”? (Democratizing AI Compute, Part 2)

February 5, 2025

Chris Lattner

News

Series

DeepSeek's Impact on AI (Democratizing AI Compute, Part 1)

Part 1 of an article that explores the future of hardware acceleration for AI beyond CUDA, framed in the context of the release of DeepSeek.

January 30, 2025

Chris Lattner

Series
Democratizing Compute
Go behind the scenes of the AI industry in this blog series by Chris Lattner. Trace the evolution of AI compute, dissect its current challenges, and discover how Modular is raising the bar with the world’s most open inference stack.
11-part series
Series
Matrix Multiplication on Blackwell
Learn how to write a high-performance GPU kernel on Blackwell that is competitive with NVIDIA's cuBLAS implementation while leveraging Mojo's special features to keep the kernel as simple as possible.
4-part series
Series
Structured Mojo Kernels
Learn how Mojo simplifies GPU programming with modular kernel architecture, compile-time abstractions, and zero-cost performance across modern GPU hardware.
4-part series
Series
Software Pipelining for GPU Kernels
Explore software pipelining for GPU kernels from first principles. We formalize dependencies as a graph, solve for the optimal schedule with a constraint solver, and show how it all integrates into MAX via pure Mojo.
1-part series
Series
Why LLM Inference Needs a New Kind of Router
This series walks through why traditional HTTP routing breaks down under LLM workloads and how Modular Cloud solves it with a three-layer architecture built for cache-aware routing.
2-part series
Series
TileTensor
This series walks through how Modular built TileTensor, a Mojo tensor type that lets kernel authors express complex memory layouts precisely, safely, and efficiently.
1-part series
