Blog

Democratizing AI Compute Series

Go behind the scenes of the AI industry with Chris Lattner

Latest

Engineering

Hands-on with Mojo 24.6

Mojo 24.6 introduces key improvements in argument conventions, memory management, and reference tracking, enhancing code clarity and safety with features like 'mut' for mutable arguments, 'origins' for references, and new collection types.

January 21, 2025 / Ehsan M. Kermani

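The Mojo 24.6 post above highlights the renamed argument conventions. Here is a minimal sketch of the idea, assuming Mojo 24.6 syntax in which the former 'inout' convention is spelled 'mut'; the function and variable names are hypothetical, not taken from the post:

    fn increment(mut counter: Int):
        # 'mut' marks an argument the callee may modify in place
        counter += 1

    fn main():
        var total = 0
        increment(total)
        print(total)  # prints 1

In 24.5 and earlier the same signature would have used 'inout counter: Int'; the rename changes how the convention reads, not how it behaves.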

Engineering

Evaluating Llama Guard with MAX 24.6 and Hugging Face

MAX 24.6 enables enterprise AI teams to seamlessly run a vast range of cutting-edge AI models from Hugging Face on NVIDIA GPUs, unlocking open innovation while keeping Gen AI deployments secure, reliable, and enterprise-ready.

December 19, 2024 / Bill Welense


Engineering

MAX GPU: State of the Art Throughput on a New GenAI platform

Measuring state-of-the-art GPU performance on Modular's MAX 24.6 compared with vLLM.

December 17, 2024 / Max Hutchinson, Tyler Kenney


Product

Introducing MAX 24.6: A GPU Native Generative AI Platform

MAX 24.6 release blog featuring MAX GPU

December 17, 2024 / Modular Team


Engineering

Build a Continuous Chat Interface with Llama 3 and MAX Serve

Learn how to build a continuous chat application with Llama 3 and MAX Serve.

December 17, 2024 / Ehsan M. Kermani


Engineering

Understanding SIMD: Infinite Complexity of Trivial Problems

A deep dive into the complexities of optimizing code for SIMD instruction sets across multiple platforms.

October 25, 2024 / Ash Vardanian

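For readers who have not worked with vector types before, a tiny illustration of the basic idea in Mojo, which exposes width-parameterized SIMD values directly (an assumed width-4 float32 example only; it is not drawn from the post, which focuses on the much harder cross-platform cases):

    fn main():
        # Two 4-wide vectors of 32-bit floats
        var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
        var b = SIMD[DType.float32, 4](10.0, 20.0, 30.0, 40.0)
        # One elementwise add operates on all four lanes at once
        print(a + b)  # [11.0, 22.0, 33.0, 44.0]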

Community

Community Spotlight: Writing Mojo with Cursor

October 10, 2024 / Julian Acero, Caroline Frasca


Engineering

Hands-on with Mojo 24.5

Get hands-on with Mojo 24.5 and learn how to apply the new language features in your code.

October 1, 2024 / Ehsan M. Kermani


Product

MAX 24.5 - With SOTA CPU Performance for Llama 3.1

We’re excited to announce the release of MAX 24.5, which ships with significant improvements to Llama 3.1 CPU performance, new Python graph API bindings, our biggest update to Mojo ever, industry-standard packaging, and a clarified license.

September 13, 2024 / Modular Team


Engineering

Announcing stack-pr: an open source tool for managing stacked PRs on GitHub

We are pleased to announce the release of a new tool aimed at simplifying the management of stacked pull requests (PRs) on GitHub - stack-pr. This tool is still in its early development days, but we are excited to share it with the community and welcome your contributions.

July 23, 2024 / Mikhail Zolotukhin

